Package pt.tumba.cage

This package implements a utility for extracting named entities from text through lexicons and a variety of orthographic and contextual features.

Class Summary
Cage Extracting named entities (names, places, dates, and other words and phrases that establish the meaning of a body of text) is critical to software systems that process large amounts of unstructured data coming from sources such as email, document files, and the Web.
DefaultWordFinder A word finder for normal text documents, which searches text for sequences of words and text blocks. This class also defines common methods and behaviour for the various word-finding subclasses.
ExtractAbbrev A simple algorithm for extracting abbreviations and their definitions from text.
MathEvaluator A mathematical expression evaluator.
NamedEntity A Named Entity recognized in the text.
Numex Implements methods for recognizing numeric expressions in both Portuguese and English texts.
StringUtils A collection of String handling utility methods.
TeXWordFinder A word finder for TeX and LaTeX documents, which searches text for sequences of letters but ignores commands and environments, including math environments.
XMLWordFinder A word finder for XML documents, which searches text for sequences of letters, but ignores tags.
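
The word finders summarized above are described as searching text for sequences of letters while skipping markup. The following is a minimal, self-contained sketch of that idea for tag-delimited text. It is not the pt.tumba.cage implementation: the class name TagSkippingWordFinder and the method findWords are hypothetical names chosen for illustration, and the actual XMLWordFinder API may differ.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only (hypothetical class, not part of pt.tumba.cage):
 * collects maximal runs of letters that occur outside <...> tags.
 */
public class TagSkippingWordFinder {

    /** Returns the letter sequences found in {@code text}, ignoring anything inside tags. */
    public static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inTag = false;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inTag) {
                // Skip everything up to and including the closing '>'.
                if (c == '>') {
                    inTag = false;
                }
                continue;
            }
            if (c == '<') {
                // A tag starts here; it also terminates any word in progress.
                inTag = true;
                flush(current, words);
            } else if (Character.isLetter(c)) {
                current.append(c);
            } else {
                // Whitespace, digits, and punctuation end the current word.
                flush(current, words);
            }
        }
        flush(current, words);
        return words;
    }

    /** Moves a pending word, if any, into the result list. */
    private static void flush(StringBuilder current, List<String> words) {
        if (current.length() > 0) {
            words.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Prints: [Named, entity, extraction]
        System.out.println(findWords("<p>Named <b>entity</b> extraction</p>"));
    }
}
```

This sketch treats every '<' as the start of a tag and every '>' as its end, so it does not handle comments, CDATA sections, entity references, or a '>' inside an attribute value; a production word finder would need to account for those cases.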
 
