Introduction to Computer Programming, 2005
Program Assignment Ⅸ
2005/12/13
Scope: Chapter 2 to Chapter 11 in the textbook
Notice:
1. Plagiarism is prohibited.
2. The TA will randomly choose 10 students to demonstrate their programs.
3. Please make your code readable. Reference for coding style:
http://www.cs.usyd.edu.au/~scilect/tpop/handouts/Style.htm
Problem: Query Your File - In this assignment, you are asked to write a program that helps people search for words in a file. To speed up queries, you have to preprocess the input file first and build a word database. Each entry in the database consists of a word, its word count, and its occurrences (the line number and the order within the line). In this problem, arrays are used to store the word database. Because the array for word occurrences is static, you can ignore the remaining occurrences once the array is full. Users can query the database in case-sensitive or case-insensitive mode. Your program should output the word count, the word occurrences, and the status of the occurrence array (full or not full).
Deadline: 2006/1/4, 12:00 pm
Description:
1. We will provide a stop-word list and punctuation marks so that you can filter out stop words.
2. Use a struct to maintain the word database. Each entry consists of a word, its word count, and its occurrences (line number and order within the line).
3. Use a fixed-size array to store word occurrences. If a word has more occurrences than the array can hold, just ignore the remaining ones. Each array can store 20 pairs of line number and order.
4. Note that the user can change the input file name to test any file.
5. Don't worry about special cases such as the dash (She wants high-style clothes, an elegant apartment, a modern car—in short, she wants money.), the hyphen (one-year plan), and the apostrophe ('s).
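The struct described in items 2 and 3 could be laid out as in the following minimal sketch. The names (word_entry, entry_add, and so on) and the 64-character word limit are assumptions for illustration only. It also assumes that the word count keeps growing after the occurrence array is full, which the assignment does not state explicitly.

```c
#include <string.h>

#define MAX_WORD_LEN 64     /* assumed limit; not given in the assignment */
#define MAX_OCCURRENCES 20  /* from the assignment: 20 (line, order) pairs */

/* One position of a word: line number and order within that line. */
struct occurrence {
    int line;
    int order;
};

/* One entry of the word database (hypothetical layout). */
struct word_entry {
    char word[MAX_WORD_LEN];
    int count;   /* total number of times the word was seen */
    int stored;  /* number of pairs actually kept, at most 20 */
    struct occurrence occ[MAX_OCCURRENCES];
};

/* Initialize an entry for a new word; overly long words are truncated. */
void entry_init(struct word_entry *e, const char *word)
{
    strncpy(e->word, word, MAX_WORD_LEN - 1);
    e->word[MAX_WORD_LEN - 1] = '\0';
    e->count = 0;
    e->stored = 0;
}

/* Record one occurrence. Assumption: the count always grows, while the
   (line, order) pair is kept only if the array still has room.
   Returns 1 if the pair was stored, 0 if it was ignored. */
int entry_add(struct word_entry *e, int line, int order)
{
    e->count++;
    if (e->stored >= MAX_OCCURRENCES)
        return 0;
    e->occ[e->stored].line = line;
    e->occ[e->stored].order = order;
    e->stored++;
    return 1;
}

/* The occurrence array is full once all 20 slots are used. */
int entry_is_full(const struct word_entry *e)
{
    return e->stored == MAX_OCCURRENCES;
}
```

With this layout, the "full / not full" status in the output is simply the result of entry_is_full(), and the whole database can be an array of struct word_entry.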
Hint:
1. string tokens
strtok()
strtok_r()
usage:
http://www.freebsd.org/cgi/man.cgi?query=strtok&apropos=0&sektion=0&manpath=FreeBSD+6.0-RELEASE+and+Ports&format=html
2. locate a substring in a string
strstr()
strcasestr()
strnstr()
usage:
http://www.freebsd.org/cgi/man.cgi?query=strstr&apropos=0&sektion=0&manpath=FreeBSD+6.0-RELEASE+and+Ports&format=html
3. separate strings
strsep()
usage:
http://www.freebsd.org/cgi/man.cgi?query=strsep&apropos=0&sektion=0&manpath=FreeBSD+6.0-RELEASE+and+Ports&format=html
4. locate character in string
strchr()
strrchr()
usage:
http://www.freebsd.org/cgi/man.cgi?query=strchr&apropos=0&sektion=0&manpath=FreeBSD+6.0-RELEASE+and+Ports&format=html
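As an illustration of the first hint, the sketch below splits one line into words with strtok() and strips punctuation from both ends of each token. The helper names trim_punct and split_line are hypothetical, and ispunct() stands in for the punctuation list that will be provided. The order of a word in the line is then its index in the output array plus one. Counting every whitespace-separated token this way, stop words included, appears to match the example output, but treat that as an assumption.

```c
#include <ctype.h>
#include <string.h>

/* Strip leading and trailing punctuation in place, e.g. "industries,"
   becomes "industries". Characters inside the token are left alone.
   Assumption: ispunct() approximates the provided punctuation list. */
char *trim_punct(char *tok)
{
    size_t len;

    while (*tok != '\0' && ispunct((unsigned char)*tok))
        tok++;
    len = strlen(tok);
    while (len > 0 && ispunct((unsigned char)tok[len - 1]))
        tok[--len] = '\0';
    return tok;
}

/* Split line on whitespace with strtok() and store up to max pointers to
   the trimmed tokens in out[]. Note that strtok() modifies line.
   Returns the number of tokens stored; out[i] has order i + 1. */
int split_line(char *line, char *out[], int max)
{
    int n = 0;
    char *tok = strtok(line, " \t\r\n");

    while (tok != NULL && n < max) {
        out[n++] = trim_punct(tok);
        tok = strtok(NULL, " \t\r\n");
    }
    return n;
}
```

Because strtok() writes '\0' bytes into its argument, pass it a copy of the line if the original text is still needed. strtok_r() and strsep() from the hints work similarly without strtok()'s hidden static state.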
Sample Input:
input.txt
Last year Kenya, with one of Africa s most developed tourism industries, hosted about 600,000 tourists and pocketed $577 million or about 12 percent of its GDP. In the coming year, Kenya tourism ministry official Rebecca Nabutola said Kenya expects to attract 1 million to 1.6 million tourists.
"When tourism is thriving, we get better schools, better hospitals and better infrastructure. When tourism does well, so do our other industries," said Nabutola.
Some critics charge that in the rush to collect tourist dollars, indigenous cultures are changed forever or forced to move away from ancestral lands to make way for new hotels, restaurants, roads or airports.
Example Output:
== Query Your File ==
Please input the input file name: input.txt
Parsing input.txt: Done
Please input your query word: tourism
Case sensitive search: Yes
Word Count: 4
Occurrence: 1:11, 1:33, 2:2, 2:15
Occurrence array for “tourism”: not full
Please input your query word: q
Bye.
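The two query modes and the "Occurrence:" line shown above could be handled roughly as in the following sketch. The function names words_equal and format_occurrences are hypothetical. A portable loop over tolower() is used here instead of the non-standard strcasecmp().

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Compare two words. When case_sensitive is 0, letter case is ignored.
   Returns 1 if the words are equal, 0 otherwise. */
int words_equal(const char *a, const char *b, int case_sensitive)
{
    if (case_sensitive)
        return strcmp(a, b) == 0;
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Write n "line:order" pairs separated by ", " into buf, for example
   "1:11, 1:33". The text is truncated if buf is too small. */
void format_occurrences(char *buf, size_t size,
                        const int lines[], const int orders[], int n)
{
    size_t used = 0;
    int i;

    if (size == 0)
        return;
    buf[0] = '\0';
    for (i = 0; i < n && used < size; i++) {
        int w = snprintf(buf + used, size - used, "%s%d:%d",
                         i == 0 ? "" : ", ", lines[i], orders[i]);
        if (w < 0)
            break;
        used += (size_t)w;
    }
}
```

In case-insensitive mode, a query such as "kenya" would also match "Kenya". How the counts of differently cased forms are combined is left to your design.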
Hand-in:
1. Pack your program (HW9_STUDENTID.c) and document (HW9_STUDENTID.doc (or pdf)) into a compressed file (HW9_STUDENTID.rar (or zip)).
2. In the document, explain how you implemented the program and use a simple flow chart to show its architecture. You can also describe the problems you met while developing the program and how you solved them. Note: use one page to document your program. You must use a flow chart to illustrate this homework assignment.
3. Send the compressed file to [email protected] with mail subject [cprog2005]HW9_STUDENTID.
4. For example, if your student ID is b94902200, the program file name is HW9_b94902200.c, the document file name is HW9_b94902200.doc, and the compressed file name is HW9_b94902200.rar.