Exercise 6
First part:
Write the word counting program with parallel lists. Your program should read in all of alice.txt and compute the count for each word.
Print the frequency of the following words: “Alice”, “cassowary”, “roses”, “rabbit”, “mirror”.
If you do nothing fancy with punctuation and case (in other words, you just split each line of the text on whitespace), you should get the following answers:
Frequency of Alice : 221
Frequency of cassowary : 0
Frequency of roses : 1
Frequency of rabbit : 1
Frequency of mirror : 0
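One possible shape for the parallel-list approach is sketched below. It is a starting point under stated assumptions, not the official solution: the names count_words and frequency are made up for illustration, and the sample lines are invented stand-ins for the real text, so the counts printed here are not the alice.txt counts.

```python
def count_words(lines):
    """Count whitespace-separated words using two parallel lists."""
    words = []   # words[i] is the i-th distinct word seen so far
    counts = []  # counts[i] is how many times words[i] has appeared
    for line in lines:
        # split() with no argument splits on any run of whitespace
        for word in line.split():
            if word in words:
                counts[words.index(word)] += 1
            else:
                words.append(word)
                counts.append(1)
    return words, counts


def frequency(word, words, counts):
    """Look up a word's count; a word that never appeared has count 0."""
    if word in words:
        return counts[words.index(word)]
    return 0


# Invented sample lines (hypothetical data, not taken from alice.txt).
sample = ["Alice was beginning to get very tired", "So Alice sat and sat"]
words, counts = count_words(sample)
for target in ["Alice", "sat", "cassowary"]:
    print("Frequency of", target, ":", frequency(target, words, counts))
```

To run it on the real text, pass an open file instead of the sample list, for example count_words(open("alice.txt")), assuming alice.txt is in the current directory. Note that every lookup scans the words list from the start, which is worth keeping in mind when reading about efficiency.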
Second part:
Read the "efficiency interlude" in the notes.
Third part:
For additional practice, try some of the following:
- Write the "max" function (this is a Python builtin... but write it anyway). max([1, 3, 4, 0, 2]) should be 4.
- Write a function that finds the index of a sublist in a longer list. For instance, findSublist([1, 2, 3], [-1, 0, 1, 2, 3, 4, 5]) should return 2.
- Write a function that finds the longest sublist shared by two lists. For instance, longestSublist(["cassowary", "emu", "rhea"], ["ostrich", "cassowary", "emu", "kiwi"]) should return ["cassowary", "emu"]. This function is used in many computational morphology applications, since it's a good starting point for finding shared roots or affixes.
- Find all the words in alice.txt which appear both capitalized and uncapitalized. For which of them is the ratio of counts greatest?
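As a hint for the sublist-index problem above, here is a minimal sketch that tries every possible starting position and compares a slice of the longer list against the sublist. The exercise does not say what to return when the sublist is absent; returning -1 is an assumption made here.

```python
def findSublist(sub, lst):
    """Return the first index at which sub occurs inside lst."""
    n = len(sub)
    # Only starting positions that leave room for all of sub are tried.
    for start in range(len(lst) - n + 1):
        if lst[start:start + n] == sub:
            return start
    return -1  # assumption: -1 signals "not found"


print(findSublist([1, 2, 3], [-1, 0, 1, 2, 3, 4, 5]))  # prints 2
```

The same "try each start, compare a slice" idea is one way to begin the longest-shared-sublist problem, although that one needs an extra loop over lengths.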