Exercises
Project Gutenberg is a great resource for downloading text files. It is a library of over 70,000 free eBooks, mostly older works for which U.S. copyright has expired. You can download them (including as plain text files) or read them online.
Here is a small extract from the book "Photographic investigations of faint nebulae" by Edwin Hubble (1920). The complete text is also available on the Project Gutenberg site.
Exercise 1:
Write a function that reads bookSampleHubble.txt and prints only the words with more than 10 characters (not counting whitespace).
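One possible approach to Exercise 1 is sketched below. The helper name long_words, the function name print_long_words, and the min_length parameter are illustrative choices, not part of the exercise. The sketch also assumes that punctuation attached to a word (such as a trailing comma) should not count toward its length, which the exercise does not specify.

```python
import string


def long_words(text, min_length=10):
    """Return the words in text that have more than min_length characters."""
    result = []
    for word in text.split():
        # Assumption: surrounding punctuation is not part of the word,
        # so "nebulae," is measured as "nebulae".
        cleaned = word.strip(string.punctuation)
        if len(cleaned) > min_length:
            result.append(cleaned)
    return result


def print_long_words(filename, min_length=10):
    """Print every word in the file that is longer than min_length."""
    with open(filename, encoding="utf-8") as f:
        for word in long_words(f.read(), min_length):
            print(word)
```

With the sample file in the working directory, calling print_long_words("bookSampleHubble.txt") would print one long word per line.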
Exercise 2:
In 1939 Ernest Vincent Wright published a 50,000-word novel called "Gadsby" that does not contain the letter 'e'. Since 'e' is the most common letter in English, that's not easy to do.
Write a function called has_no_e(filename) that returns True if the given text file does not contain the letter 'e'.
Write a function no_e_percentage(filename) that computes the percentage of the words in the file that contain no 'e'.
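A minimal sketch of both functions follows. It assumes that an uppercase 'E' also counts as the letter e, that words are separated by whitespace, and that an empty file should give 0.0 percent; none of these is stated in the exercise. The helper word_has_no_e is a hypothetical name introduced for the sketch.

```python
def word_has_no_e(word):
    """Return True if word contains neither 'e' nor 'E'."""
    return "e" not in word.lower()


def has_no_e(filename):
    """Return True if the file does not contain the letter e at all."""
    with open(filename, encoding="utf-8") as f:
        # Checking line by line avoids loading a large book into memory.
        return all("e" not in line.lower() for line in f)


def no_e_percentage(filename):
    """Return the percentage (0 to 100) of words in the file without an e."""
    with open(filename, encoding="utf-8") as f:
        words = f.read().split()
    if not words:
        # Assumption: an empty file yields 0.0 rather than an error.
        return 0.0
    count = sum(1 for word in words if word_has_no_e(word))
    return 100.0 * count / len(words)
```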
Exercise 3: Gadsby
Write a function named avoids(filename, forbidden) that takes a text file's name and a string of forbidden letters, and that returns the set of words that don't use any of the forbidden letters.
Modify your program to find a combination of 5 forbidden letters that excludes the smallest number of words.
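The following sketch shows one way to tackle both parts of Exercise 3. The helpers read_words, avoids_words, and least_excluding, as well as the k and alphabet parameters, are hypothetical names introduced here. The sketch assumes words are lowercased and stripped of surrounding punctuation, and it counts distinct words rather than occurrences. The search is a plain brute force over every combination of letters, which is simple but can be slow on a long text.

```python
import string
from itertools import combinations


def read_words(filename):
    """Return the lowercased words of the file, without surrounding punctuation."""
    with open(filename, encoding="utf-8") as f:
        stripped = (w.strip(string.punctuation).lower() for w in f.read().split())
        return [w for w in stripped if w]


def avoids_words(words, forbidden):
    """Return the set of words that use none of the forbidden letters."""
    forbidden = set(forbidden.lower())
    return {w for w in words if forbidden.isdisjoint(w)}


def avoids(filename, forbidden):
    """Return the set of words in the file that avoid all forbidden letters."""
    return avoids_words(read_words(filename), forbidden)


def least_excluding(words, k=5, alphabet=string.ascii_lowercase):
    """Brute-force search for k letters that exclude the fewest distinct words.

    Returns (letters, number_of_excluded_words). Ties keep the first
    combination found in alphabetical order.
    """
    letter_sets = [set(w) for w in set(words)]
    best_combo, best_excluded = None, None
    for combo in combinations(alphabet, k):
        combo_set = set(combo)
        # A word is excluded when it shares at least one letter with combo.
        excluded = sum(1 for s in letter_sets if not combo_set.isdisjoint(s))
        if best_excluded is None or excluded < best_excluded:
            best_combo, best_excluded = "".join(combo), excluded
    return best_combo, best_excluded
```

For the full alphabet there are 65,780 combinations of 5 letters, so least_excluding(read_words("bookSampleHubble.txt")) may take a while on a large file.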
Exercise 4:
Write a function named redact_uses_only(inputfile, outputfile, letters) that reads the text file inputfile, redacts every word that is not composed solely of characters from the string letters, and writes the redacted text to outputfile. For example, if letters == 'ehlo', the text 'Hello, I am in hell' should be redacted to 'Hello, _ __ __ hell'. As the example shows, each letter of a redacted word is replaced by an underscore, punctuation is kept, and the comparison ignores case.
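A possible sketch of the redaction is given below. It treats a word as a run of ASCII letters, so punctuation and whitespace pass through unchanged and an apostrophe splits a word in two. That definition, and the helper name redact_text, are assumptions made for this sketch.

```python
import re


def redact_text(text, letters):
    """Replace each word not built solely from letters with underscores."""
    allowed = set(letters.lower())

    def replace(match):
        word = match.group(0)
        # Keep the word only if every letter (ignoring case) is allowed.
        if set(word.lower()) <= allowed:
            return word
        return "_" * len(word)

    # Assumption: a word is a maximal run of ASCII letters.
    return re.sub(r"[A-Za-z]+", replace, text)


def redact_uses_only(inputfile, outputfile, letters):
    """Read inputfile, redact it, and write the result to outputfile."""
    with open(inputfile, encoding="utf-8") as src:
        text = src.read()
    with open(outputfile, "w", encoding="utf-8") as dst:
        dst.write(redact_text(text, letters))
```

Keeping the string logic in redact_text makes it easy to check against the example from the exercise without touching any files.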