How to Tokenize Sentences and Words in NLTK


NLTK Tokenizing Sentences: 

You need to import the sent_tokenize function from the nltk library for this program. It splits text into sentences using punctuation. (If you haven't used NLTK's tokenizers before, you may first need to download the Punkt tokenizer data with nltk.download('punkt').)

from nltk.tokenize import sent_tokenize

text = "I love python. I love nlp"

print(sent_tokenize(text))

 

Output:

['I love python.', 'I love nlp']
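To see roughly what sent_tokenize is doing, here is a minimal sketch using only the standard library. Note this is an illustration, not NLTK's actual algorithm: the real tokenizer (Punkt) is statistically trained and correctly handles abbreviations like "Dr.", which this naive regex would split on.

```python
import re

def naive_sent_tokenize(text):
    # Split after a ., !, or ? that is followed by whitespace;
    # the lookbehind keeps the punctuation attached to its sentence.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

print(naive_sent_tokenize("I love python. I love nlp"))
# ['I love python.', 'I love nlp']
```

For this simple input the result matches sent_tokenize, but for real-world text you should prefer NLTK's trained tokenizer.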

 

NLTK Tokenizing Words: 

Just like sentence tokenization, you use the word_tokenize function to split text into words.

from nltk.tokenize import word_tokenize

text = "I love python. I love nlp"

print(word_tokenize(text))

 

Output:

['I', 'love', 'python', '.', 'I', 'love', 'nlp']
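Notice that word_tokenize treats punctuation as its own token ('python' and '.' come out separately). A rough sketch of that behavior using only the standard library might look like this; the real word_tokenize does more, such as splitting contractions ("don't" becomes "do" and "n't"), which this regex does not attempt.

```python
import re

def naive_word_tokenize(text):
    # Match either a run of word characters or a single
    # non-word, non-space character (i.e. a punctuation mark),
    # so "python." yields ['python', '.'].
    return re.findall(r"\w+|[^\w\s]", text)

print(naive_word_tokenize("I love python. I love nlp"))
# ['I', 'love', 'python', '.', 'I', 'love', 'nlp']
```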

 
