Blog

Tools for computational linguistics

Here at AIDevel we do a lot of Natural Language Processing. Over the years we have stumbled upon many tools, and a few have proven consistently useful. Here they are.

 

Python

Yes, Python. Besides being a fantastic general-purpose programming language, it is also well suited to processing natural language thanks to the many NLP packages available for it.

 

NLTK (Natural Language ToolKit)

This is an amazing package providing a lot of useful functionality: text tokenization, POS tagging, parsing, text classification, named entity recognition, corpus readers, an interface to Weka classifiers, and a very handy interface to WordNet.
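To give a flavour of the API, here is a minimal sketch of tokenization and POS tagging with NLTK; it assumes the punkt and averaged_perceptron_tagger models have been downloaded first:

import nltk

# One-time model downloads (safe to re-run).
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

sentence = "NLTK makes natural language processing in Python easy."
tokens = nltk.word_tokenize(sentence)   # ['NLTK', 'makes', 'natural', ...]
tagged = nltk.pos_tag(tokens)           # [('NLTK', 'NNP'), ('makes', 'VBZ'), ...]
print(tagged)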

 

WordNet

WordNet is an amazing resource for doing NLP. It is basically a lexical database of English: nouns, verbs, adjectives and adverbs are grouped into synsets (sets of synonyms). Other relations between words are also encoded: antonyms, hyponyms, hypernyms, etc. WordNet is especially useful for word sense disambiguation. For tasks like sentiment analysis or opinion mining there is also a WordNet extension, SentiWordNet. Check it out!
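Since NLTK ships a WordNet corpus reader, you can explore synsets and their relations straight from Python. A minimal sketch (assuming the wordnet corpus has been downloaded via nltk.download):

import nltk
nltk.download("wordnet")
from nltk.corpus import wordnet as wn

# The first few senses (synsets) of the word "good".
for synset in wn.synsets("good")[:3]:
    print(synset.name(), "-", synset.definition())

# Lexical relations: synonyms, antonyms, hypernyms.
adj = wn.synsets("good", pos=wn.ADJ)[0]
lemma = adj.lemmas()[0]
print("synonyms:", [l.name() for l in adj.lemmas()])
print("antonyms:", [a.name() for a in lemma.antonyms()])
print("hypernyms of dog:", wn.synset("dog.n.01").hypernyms())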

 

Pattern.en

This package is especially useful for writing inflectors: singularizing/pluralizing nouns, choosing indefinite articles (a/an), producing comparative/superlative forms of adjectives, conjugating verbs, etc. The package has a ton of other useful features: parsing, segmentation, tools for analysing sentiment or mood, and lots more.
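Here is a minimal sketch of the inflection functions, assuming pattern.en is installed; the outputs in the comments are illustrative, as the functions are rule-based and can occasionally get irregular forms wrong:

from pattern.en import pluralize, singularize, article
from pattern.en import comparative, superlative, conjugate, sentiment

print(pluralize("analysis"))    # 'analyses'
print(singularize("wolves"))    # 'wolf'
print(article("apple"))         # 'an'
print(comparative("bad"))       # 'worse'
print(superlative("bad"))       # 'worst'
print(conjugate("be", "3sg"))   # 'is' (third person singular, present tense)
print(sentiment("A truly great tool!"))  # (polarity, subjectivity) tuple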

 

Stanford Parser

This is probably the best-known English parser, and for good reason: the Stanford Parser is a great tool for tagging, parsing and dependency extraction. The parser is written in Java, so you cannot use it from Python directly. Fortunately there are ways of integrating it into your Python application; here are some of them (a usage sketch follows the list):

http://projects.csail.mit.edu/spatial/Stanford_Parser

https://github.com/dasmith/stanford-corenlp-python

http://jpype.sourceforge.net/
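As an example of the second option, here is a minimal sketch using the stanford-corenlp-python wrapper linked above. The class and method names follow that project's README, and it assumes the Stanford CoreNLP jars are installed as the README describes:

# Assumes the dasmith/stanford-corenlp-python wrapper and the
# Stanford CoreNLP jars are set up per the project's README.
from corenlp import StanfordCoreNLP

parser = StanfordCoreNLP()  # launches CoreNLP in the background (slow at startup)
result = parser.parse("Stanford parsers are written in Java.")
print(result)  # JSON with tokens, POS tags, the parse tree and dependencies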