java.lang.Object
  pt.tumba.cage.Cage
public class Cage
Extracting named entities (names, places, dates, and other words and phrases that establish the meaning of a body of text) is critical to software systems that process large amounts of unstructured data coming from sources such as email, document files, and the Web. The purpose of named entity recognition is to locate certain types of phrases, and associate them with a category. This allows text analysis software to create a map of the concepts in a document.
Cage is a utility for extracting named entities from text using a morphological approach. It works with language-specific features such as punctuation, capitalization, the words themselves, word forms, and affixes. In particular, it matches names against lexicons of persons, geographical places, and organizations, and combines this with a mechanism for matching common patterns associated with these entities.
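The lexicon-plus-patterns idea described above can be illustrated with a minimal sketch. This is not the Cage implementation: the class `LexiconMatchSketch`, its method names, the category names used as tag names, and the title pattern are all hypothetical, chosen only to show how lexicon lookup and a capitalization-based pattern might each find an entity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical sketch, not the Cage implementation.
class LexiconMatchSketch {

    // Wraps every lexicon entry found in the text in a tag named after its
    // category. Assumes entries start and end with a letter or digit, so that
    // the \b word-boundary anchors behave as expected.
    static String tagFromLexicon(String text, Map<String, String> lexicon) {
        if (lexicon.isEmpty()) {
            return text;
        }
        // Longest entries first, so "Maria Silva" is preferred over "Maria".
        List<String> entries = new ArrayList<>(lexicon.keySet());
        entries.sort((a, b) -> Integer.compare(b.length(), a.length()));

        StringBuilder alternation = new StringBuilder();
        for (String entry : entries) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(entry));
        }

        // A single pass over the text, so inserted tags are never re-scanned.
        Matcher m = Pattern.compile("\\b(?:" + alternation + ")\\b").matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String category = lexicon.get(m.group());
            String tagged = "<" + category + ">" + m.group() + "</" + category + ">";
            m.appendReplacement(out, Matcher.quoteReplacement(tagged));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Hypothetical context pattern: a title followed by capitalized words.
    private static final Pattern TITLED_NAME = Pattern.compile(
            "\\b(?:Mr|Mrs|Dr|Prof)\\.\\s+(\\p{Lu}\\p{L}+(?:\\s+\\p{Lu}\\p{L}+)*)");

    // Returns the capitalized word sequences that follow a known title.
    static List<String> findTitledNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = TITLED_NAME.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }
}
```

With a lexicon mapping `Maria Silva` to `PERSON` and `Lisbon` to `PLACE`, `tagFromLexicon("Maria Silva visited Lisbon.", lexicon)` returns `<PERSON>Maria Silva</PERSON> visited <PLACE>Lisbon</PLACE>.`, and `findTitledNames("Dr. Ana Costa arrived.")` returns a list containing `Ana Costa`. Sorting entries longest-first is a design choice of the sketch, so that a multi-word name wins over a shorter entry it contains.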
This class encapsulates the needed state information, and provides a command line interface for the program.
Constructor Summary | |
---|---|
Cage()
Simple constructor for invocation by subclass constructors, typically implicit. |
|
Cage(java.lang.String namesFile,
java.lang.String placesFile,
java.lang.String organizationsFile)
Constructor for Cage |
Method Summary | |
---|---|
java.lang.String |
findEntities(java.lang.String sentence)
Finds named entities in a given text block. |
NamedEntity[] |
getNamedEntities()
Returns an array of all the entities found in the text. |
java.lang.String |
getText()
Returns the input text to the named entity recognizer. |
java.lang.String |
getTextFinal()
Returns the text given to the named entity recognizer, with the named entities surrounded by appropriate SGML tags. |
void |
load(java.lang.String namesFile,
java.lang.String placesFile,
java.lang.String organizationsFile)
Loads the data from the lexicon files and corresponding pattern rule files. |
static void |
main(java.lang.String[] args)
Main method, used to test named entity recognition. |
void |
setRegularTextMode()
Sets the text processing mode to handle regular text files. |
void |
setTeXMode()
Sets the text processing mode to handle TeX/LaTeX files. |
void |
setText(java.lang.String text)
Sets the text to be processed by the named entity recognizer. |
void |
setXMLMode()
Sets the text processing mode to handle XML and SGML files. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public Cage()
public Cage(java.lang.String namesFile, java.lang.String placesFile, java.lang.String organizationsFile) throws java.io.IOException
Parameters:
namesFile - Path for the names lexicon file
placesFile - Path for the places lexicon file
organizationsFile - Path for the organizations lexicon file
Throws:
java.io.IOException
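The constructor takes one path per lexicon, but this page does not describe the lexicon file format. As a sketch under a stated assumption (one entry per line, UTF-8, blank lines ignored), reading such a file might look like the following. The class `LexiconLoaderSketch` and its methods are hypothetical and are not part of the Cage API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative only: assumes a one-entry-per-line lexicon format,
// which the Cage documentation does not confirm.
class LexiconLoaderSketch {

    // Reads entries from an already opened reader: trims each line,
    // skips blank lines, and drops duplicates while keeping file order.
    static Set<String> readLexicon(BufferedReader reader) throws IOException {
        Set<String> entries = new LinkedHashSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String entry = line.trim();
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    // Opens the file at the given path as UTF-8 and reads its entries.
    // An IOException propagates to the caller, as in the documented constructor.
    static Set<String> readLexicon(String path) throws IOException {
        try (BufferedReader reader =
                Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            return readLexicon(reader);
        }
    }
}
```

Splitting the reader-based method from the path-based one keeps the parsing logic testable without touching the file system.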
Method Detail |
---|
public java.lang.String findEntities(java.lang.String sentence)
DefaultWordFinder class.
Parameters:
sentence - The text block.
public void load(java.lang.String namesFile, java.lang.String placesFile, java.lang.String organizationsFile) throws java.io.IOException
Parameters:
namesFile - Path for the names lexicon file.
placesFile - Path for the places lexicon file.
organizationsFile - Path for the organizations lexicon file.
Throws:
java.io.IOException - A problem occurred while reading the lexicons.
public NamedEntity[] getNamedEntities()
public void setText(java.lang.String text)
Parameters:
text - The text to be processed.
public java.lang.String getText()
public java.lang.String getTextFinal()
public void setTeXMode()
public void setXMLMode()
public void setRegularTextMode()
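This page does not say what the three mode setters change internally. One plausible reading of the XML/SGML mode is that markup is kept out of the matching step, so that only character data is examined. The sketch below illustrates that idea only. The class `MarkupAwareSketch`, the method `applyOutsideTags`, and the simple `<...>` tag pattern are assumptions, not Cage behavior.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical way to keep markup out of entity matching.
class MarkupAwareSketch {

    // Deliberately simple tag pattern; it does not handle comments,
    // CDATA sections, or '>' inside attribute values.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Applies textStep to the text between tags and copies tags unchanged.
    static String applyOutsideTags(String input, UnaryOperator<String> textStep) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(input);
        int last = 0;
        while (m.find()) {
            // Text before this tag goes through the processing step.
            out.append(textStep.apply(input.substring(last, m.start())));
            // The tag itself is copied verbatim.
            out.append(m.group());
            last = m.end();
        }
        // Remaining text after the final tag.
        out.append(textStep.apply(input.substring(last)));
        return out.toString();
    }
}
```

For example, `applyOutsideTags("<p class=\"Lisbon\">Lisbon</p>", s -> s.replace("Lisbon", "[Lisbon]"))` returns `<p class="Lisbon">[Lisbon]</p>`: the attribute value is left alone and only the element text is changed.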
public static void main(java.lang.String[] args) throws java.lang.Exception
Parameters:
args - The command line input, tokenized.
Throws:
java.lang.Exception