Chapter [ ]: Natural Language Processing

What does NLP stand for?

NLP stands for natural language processing, a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input; others involve natural language generation.

Write code to count the number of words in a document using any programming language. Now, extend this for bi-grams.

# Simple Python
word_count = {}

# Read the whole file (read-only) and split it on whitespace
with open('doc.txt', 'r') as file:
    for word in file.read().split():
        if word not in word_count:
            word_count[word] = 1
        else:
            word_count[word] += 1

for k, v in word_count.items():
    print(k, v)

# Counter
from collections import Counter

with open('doc.txt', 'r') as file:
    word_count = Counter(file.read().split())

# Print one tab-separated "word, count" pair per line
for item in word_count.items():
    print("{}\t{}".format(*item))

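# bi-grams (Python 3: the built-in zip replaces itertools.izip)
import re
from collections import Counter
from itertools import islice

with open('doc.txt', 'r') as file:
    words = re.findall(r"\w+", file.read())

# Pair each word with its successor to form bi-grams, then count the pairs
print(Counter(zip(words, islice(words, 1, None))))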

# Optional exercise - Build a function that gets the frequency of any n-gram
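
One possible answer to the optional exercise, given as a minimal sketch rather than a reference solution. It generalizes the bi-gram approach above by zipping n staggered views of the word list. The function name `ngram_counts` and the sample sentence are hypothetical and chosen only for illustration. Like the bi-gram snippet, it tokenizes with `\w+` and does not normalize case.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n consecutive words in text (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    words = re.findall(r"\w+", text)
    # words[0:], words[1:], ..., words[n-1:] are n staggered views of the list;
    # zip stops at the shortest view, so each tuple holds n adjacent words.
    return Counter(zip(*(words[i:] for i in range(n))))


# Illustrative input only, not taken from any real document
sample = "the cat sat on the mat and the cat slept"
print(ngram_counts(sample, 2).most_common(1))  # [(('the', 'cat'), 2)]
```

With n = 1 the function reduces to the plain word count (keys are one-element tuples), and when the text has fewer than n words it returns an empty Counter.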
