Chapter [ ]: Natural Language Processing
What does NLP stand for?
Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input; others involve natural language generation.
Write code to count the number of words in a document using any programming language. Now, extend this for bi-grams.
# Simple Python
word_count = {}
# Read-only access is enough; the with block closes the file for us
with open('doc.txt', 'r') as file:
    for word in file.read().split():
        if word not in word_count:
            word_count[word] = 1
        else:
            word_count[word] += 1
for k, v in word_count.items():
    print(k, v)
# Counter
from collections import Counter
with open('doc.txt', 'r') as file:
    # Counter tallies each whitespace-separated token in one pass
    word_count = Counter(file.read().split())
for item in word_count.items():
    print("{}\t{}".format(*item))
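# bi-grams
import re
from collections import Counter
from itertools import islice
with open('doc.txt', 'r') as file:
    words = re.findall(r"\w+", file.read())
# Pair each word with its successor; zip stops when the shifted copy runs out
print(Counter(zip(words, islice(words, 1, None))))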
# Optional exercise - Build a function that gets the frequency of any n-gram
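One possible answer to the optional exercise, given as a minimal sketch rather than a reference solution. It generalizes the bi-gram approach above by zipping n staggered copies of the word list. The function name ngram_counts and the sample sentence are made up for illustration, and the \w+ tokenization is carried over from the bi-gram snippet as an assumption.

```python
import re
from collections import Counter
from itertools import islice


def ngram_counts(text, n):
    """Count every run of n consecutive words in text (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Same tokenization as the bi-gram example: runs of word characters
    words = re.findall(r"\w+", text)
    # n staggered views of the word list: words[0:], words[1:], ..., words[n-1:]
    shifted = (islice(words, i, None) for i in range(n))
    # zip stops at the shortest view, so only complete n-grams are counted
    return Counter(zip(*shifted))


# Illustrative input; in practice the text would come from the document file
counts = ngram_counts("the cat sat on the mat the cat ran", 2)
print(counts.most_common(1))  # [(('the', 'cat'), 2)]
```

With n = 1 this reduces to the plain word count, and a text shorter than n words yields an empty Counter.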