Tokens

1)

a) Using one of the corpora from the last lab, calculate the average number of tokens per sentence.
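One possible way to approach part (a) is sketched below. It assumes the corpus is read through an NLTK corpus reader whose sents() method yields each sentence as a list of tokens. The assignment does not say which corpora the last lab covered, so the Brown corpus is used purely as an assumption, and the helper names are hypothetical.

```python
def average_tokens_per_sentence(sentences):
    """Return the mean number of tokens per sentence.

    `sentences` is any iterable of token lists, for example the value
    returned by an NLTK corpus reader's sents() method.
    """
    sentence_count = 0
    token_count = 0
    for sentence in sentences:
        sentence_count += 1
        token_count += len(sentence)
    if sentence_count == 0:
        raise ValueError("corpus contains no sentences")
    return token_count / sentence_count


def brown_average():
    # Assumption: the Brown corpus stands in for "one of the corpora from
    # the last lab". It needs a one-time nltk.download("brown").
    from nltk.corpus import brown
    return average_tokens_per_sentence(brown.sents())


# Small hand-made check that needs no downloaded data: (4 + 3) / 2.
toy = [["I", "like", "tea", "."], ["Do", "you", "?"]]
print(average_tokens_per_sentence(toy))  # 3.5
```

Note that NLTK counts punctuation marks as tokens, so the average will be somewhat higher than a count of words alone.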

b) Using the same corpus or a different one, which category has the longest sentences on average, and which has the shortest?
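For part (b), a rough sketch under the assumption that the corpus is a categorized NLTK corpus (one that offers categories() and sents(categories=...), as the Brown corpus does). ToyCorpus is a hypothetical stand-in so the example runs without any downloaded data; the function name is also made up for illustration.

```python
def average_length_by_category(corpus):
    """Map each category name to its mean tokens per sentence."""
    averages = {}
    for category in corpus.categories():
        sentence_count = 0
        token_count = 0
        for sentence in corpus.sents(categories=category):
            sentence_count += 1
            token_count += len(sentence)
        if sentence_count:  # skip categories with no sentences
            averages[category] = token_count / sentence_count
    return averages


class ToyCorpus:
    """Hypothetical stand-in mimicking the two reader methods used above."""
    _data = {
        "news": [["Rates", "rose", "."], ["Markets", "fell", "sharply", "."]],
        "fiction": [["She", "smiled", "."]],
    }

    def categories(self):
        return list(self._data)

    def sents(self, categories):
        return self._data[categories]


averages = average_length_by_category(ToyCorpus())
longest = max(averages, key=averages.get)
shortest = min(averages, key=averages.get)
print(longest, shortest)  # news fiction
```

With NLTK installed and the data downloaded, passing nltk.corpus.brown in place of ToyCorpus() should work the same way.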

2) Download your own corpus from https://www.gutenberg.org/

a) How many sentences are in the document (use NLTK to split the sentences)? How does this differ from the number of lines in the file (readlines)?
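A minimal sketch of the comparison in part (a). The sentence splitter is passed in as a parameter so the counting logic can be tried without NLTK; gutenberg_counts shows how NLTK's sent_tokenize might be plugged in. The function names and the file path argument are hypothetical, and sent_tokenize needs the Punkt data downloaded first.

```python
def count_lines_and_sentences(lines, split_sentences):
    """Return (number of lines, number of sentences).

    `lines` is the list returned by file.readlines();
    `split_sentences` is any function that turns a string into a list
    of sentence strings, such as nltk.tokenize.sent_tokenize.
    """
    text = "".join(lines)
    return len(lines), len(split_sentences(text))


def gutenberg_counts(path):
    # Assumption: `path` points to a plain-text UTF-8 file downloaded
    # from Project Gutenberg.
    from nltk.tokenize import sent_tokenize
    with open(path, encoding="utf-8") as handle:
        lines = handle.readlines()
    return count_lines_and_sentences(lines, sent_tokenize)


# Toy check with a naive splitter on full stops (not NLTK's algorithm).
def naive_split(text):
    return [part for part in text.split(".") if part.strip()]


sample = ["One sentence\n", "on two lines. Another.\n"]
print(count_lines_and_sentences(sample, naive_split))  # (2, 2)
```

The two counts usually differ because plain-text files are typically wrapped at a fixed width, so a line break says nothing about where a sentence ends, and blank lines and headers count as lines too.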

b) After tokenizing the sentences, find 3 errors and describe why you think each error might have occurred. What in the algorithm might have gone wrong?
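Reading every sentence by hand is slow, so one option is to surface likely candidates first. The sketch below uses a made-up heuristic, not anything built into NLTK: it flags sentences that start with a lowercase letter or that contain very few words, since both often point to a split after an abbreviation or a heading. The function name and the min_words threshold are assumptions, and each flagged item still needs to be checked by eye.

```python
def suspicious_sentences(sentences, min_words=2):
    """Return (index, sentence) pairs that look like bad splits.

    Heuristic only: flags sentences starting with a lowercase letter
    or having fewer than `min_words` whitespace-separated words.
    """
    flagged = []
    for index, sentence in enumerate(sentences):
        stripped = sentence.strip()
        if not stripped:
            continue
        starts_lowercase = stripped[0].islower()
        too_short = len(stripped.split()) < min_words
        if starts_lowercase or too_short:
            flagged.append((index, stripped))
    return flagged


# Hand-made list imitating the output of a sentence splitter.
toy_sentences = [
    "The ship left at 4 p.m.",
    "and never returned.",
    "Chapter II",
    "No.",
]
print(suspicious_sentences(toy_sentences))
# [(1, 'and never returned.'), (3, 'No.')]
```

In the toy list, item 1 is the kind of error the question asks about: the full stop in "p.m." was treated as a sentence boundary. Item 3 is a false positive, which is why the output is only a starting point.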
