java.lang.Object
  extended by pt.tumba.cage.DefaultWordFinder
public class DefaultWordFinder
A word finder for plain-text documents, which searches text for sequences of words and text blocks. This class also defines the common methods and behaviour shared by the various word-finding subclasses.
See Also:
StringTokenizer, BreakIterator, TeXWordFinder, XMLWordFinder

| Constructor Summary | |
|---|---|
| DefaultWordFinder() | Constructs a DefaultWordFinder with no input text. |
| DefaultWordFinder(java.lang.String inText) | Constructs a DefaultWordFinder for the given input text. |

| Method Summary | | |
|---|---|---|
| java.lang.String | current() | Returns the current word in the text. |
| java.lang.String | currentNGram(int n) | Returns the current character n-gram from the input. |
| java.lang.String | currentSegment() | Returns the current text segment from the input. |
| java.lang.String | currentWordGram(int n) | Returns the current word n-gram from the input. |
| java.lang.String | getText() | Returns the text associated with this DefaultWordFinder. |
| boolean | hasNext() | Tests whether more words are available in the text. |
| java.lang.String | lookAhead() | Returns the next word without advancing the tokenizer, checking whether the character separating the two words is a space. |
| java.lang.String | next() | Scans the text from the end of the last word and returns the next word. |
| java.lang.String | nextSegment() | Returns the next text segment from the input. |
| void | replace(java.lang.String newWord) | Replaces the current word in the text. |
| void | replaceBigram(java.lang.String newBigram) | Replaces the current bigram (the current word and the next word, as returned by lookAhead()) in the text. |
| void | replaceSegment(java.lang.String newSegment) | Replaces the current text segment. |
| void | setText(java.lang.String newText) | Changes the text associated with this DefaultWordFinder. |
| static java.lang.String[] | splitNGrams(java.lang.String text, int n) | Splits a given String into an array of its constituent character n-grams. |
| static java.lang.String[] | splitSegments(java.lang.String text) | Splits a given String into an array of its constituent text segments. |
| static java.lang.String[] | splitWordGrams(java.lang.String text, int n) | Splits a given String into an array of its constituent word n-grams. |
| static java.lang.String[] | splitWords(java.lang.String text) | Splits a given String into an array of its constituent words. |
| boolean | startsSentence() | Checks whether the current word marks the beginning of a sentence. |
| java.lang.String | toString() | Produces a string representation of this word finder by returning the associated text. |

| Methods inherited from class java.lang.Object |
|---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |

| Constructor Detail |
|---|
public DefaultWordFinder(java.lang.String inText)
Parameters:
inText - A String with the input text to tokenize.

public DefaultWordFinder()

| Method Detail |
|---|
public java.lang.String currentWordGram(int n)
Parameters:
n - Number of consecutive words in each n-gram.

public java.lang.String currentNGram(int n)
Parameters:
n - Number of consecutive characters in each n-gram.

public java.lang.String currentSegment()

public java.lang.String nextSegment()

public void replaceSegment(java.lang.String newSegment)
Parameters:
newSegment - A String with the new text segment.

public java.lang.String getText()

public void setText(java.lang.String newText)
Parameters:
newText - The new String with the input text to tokenize.

public java.lang.String current()

public boolean hasNext()

public void replace(java.lang.String newWord)
Parameters:
newWord - A String with the replacement word.

public void replaceBigram(java.lang.String newBigram)
Parameters:
newBigram - A String with the replacement bigram.

public java.lang.String lookAhead()

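The summary describes lookAhead() as peeking at the next word without moving the cursor, provided a space separates it from the current word, and replaceBigram() as overwriting the current word plus that next word. The sketch below illustrates this contract on a StringBuilder. It is a hypothetical class (BigramSketch), not the pt.tumba.cage code; it assumes that a word is a run of Unicode letters or digits, that lookAhead() returns null when the separator is not exactly one space, and that the replacement becomes the new current word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of lookAhead()/replaceBigram(); not the pt.tumba.cage implementation. */
final class BigramSketch {
    // Assumption: a "word" is a maximal run of Unicode letters or digits.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");
    private final StringBuilder text;
    private int curStart = -1; // start offset of the current word
    private int curEnd = -1;   // offset just past the current word

    BigramSketch(String text) {
        this.text = new StringBuilder(text);
    }

    /** Advances to the next word and returns it, or returns null when no word is left. */
    String next() {
        Matcher m = WORD.matcher(text);
        if (!m.find(Math.max(curEnd, 0))) {
            return null;
        }
        curStart = m.start();
        curEnd = m.end();
        return m.group();
    }

    /** Peeks at the following word without moving; null unless exactly one space separates it. */
    String lookAhead() {
        if (curEnd < 0 || curEnd >= text.length() || text.charAt(curEnd) != ' ') {
            return null;
        }
        Matcher m = WORD.matcher(text);
        if (!m.find(curEnd + 1) || m.start() != curEnd + 1) {
            return null;
        }
        return m.group();
    }

    /** Replaces "current word + space + lookAhead()" with newBigram; it becomes the current word. */
    void replaceBigram(String newBigram) {
        String following = lookAhead();
        if (following == null) {
            throw new IllegalStateException("no space-separated word follows the current one");
        }
        int stop = curEnd + 1 + following.length();
        text.replace(curStart, stop, newBigram);
        curEnd = curStart + newBigram.length();
    }

    String getText() {
        return text.toString();
    }
}
```

For example, on "visit New York today", advancing to "New" and calling replaceBigram("New_York") yields "visit New_York today", and the next call to next() returns "today".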
public boolean startsSentence()
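The documentation does not say how startsSentence() decides that a word opens a sentence. One common heuristic, shown here purely as an assumption, is to look back from the word's start offset: the word starts a sentence if only whitespace precedes it, or if the closest preceding non-whitespace character is '.', '!' or '?'. SentenceStartSketch and its parameters are hypothetical.

```java
/** Hypothetical sentence-start heuristic; not the pt.tumba.cage implementation. */
final class SentenceStartSketch {

    /**
     * Returns true if only whitespace precedes wordStart, or if the nearest
     * preceding non-whitespace character is a sentence terminator.
     */
    static boolean startsSentence(String text, int wordStart) {
        for (int i = wordStart - 1; i >= 0; i--) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                continue; // skip the gap between words
            }
            return c == '.' || c == '!' || c == '?';
        }
        return true; // the word is at the very start of the text
    }
}
```

A heuristic this simple misjudges abbreviations such as "Mr. Smith", so a real word finder would likely need extra rules.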
public java.lang.String toString()
Overrides:
toString in class java.lang.Object

public java.lang.String next()

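The hasNext()/next()/current() trio follows the familiar iterator pattern: next() scans forward from the end of the last word returned, and current() repeats that word. The sketch below shows this cursor contract with java.text.BreakIterator, one of the classes listed under See Also. MiniWordFinder is a hypothetical stand-in, not DefaultWordFinder itself, and it assumes that a word is any BreakIterator segment containing a letter or digit.

```java
import java.text.BreakIterator;
import java.util.Locale;
import java.util.NoSuchElementException;

/** Hypothetical cursor-style word finder; not the pt.tumba.cage implementation. */
final class MiniWordFinder {
    private final String text;
    private final BreakIterator boundaries = BreakIterator.getWordInstance(Locale.ROOT);
    private int end;        // offset just past the last word returned by next()
    private String current; // last word returned by next(), or null before the first call

    MiniWordFinder(String text) {
        this.text = text;
        boundaries.setText(text);
        end = boundaries.first();
    }

    /** Finds {start, end} of the next word at or after 'end', or returns null if none remains. */
    private int[] scan() {
        int start = end;
        while (start < text.length()) {
            int stop = boundaries.following(start);
            if (stop == BreakIterator.DONE) {
                break;
            }
            // Skip segments made only of spaces or punctuation.
            if (text.substring(start, stop).codePoints().anyMatch(Character::isLetterOrDigit)) {
                return new int[] {start, stop};
            }
            start = stop;
        }
        return null;
    }

    boolean hasNext() {
        return scan() != null;
    }

    String next() {
        int[] span = scan();
        if (span == null) {
            throw new NoSuchElementException("no more words");
        }
        end = span[1];
        current = text.substring(span[0], span[1]);
        return current;
    }

    String current() {
        return current;
    }
}
```

A caller would typically loop with while (finder.hasNext()) { String w = finder.next(); ... }, which is presumably how DefaultWordFinder is meant to be driven as well.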
public static java.lang.String[] splitWords(java.lang.String text)
Parameters:
text - The String to split.

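The documentation does not specify where splitWords() places word boundaries. As a rough illustration only, this sketch splits text with java.text.BreakIterator and keeps the segments that contain at least one letter or digit. WordSplitSketch is a hypothetical name, and the real method may treat punctuation, hyphens, or markup differently.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical word splitter based on BreakIterator; not the pt.tumba.cage implementation. */
final class WordSplitSketch {

    static String[] splitWords(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> words = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String candidate = text.substring(start, end);
            // BreakIterator also reports spaces and punctuation as segments; drop those.
            if (candidate.codePoints().anyMatch(Character::isLetterOrDigit)) {
                words.add(candidate);
            }
        }
        return words.toArray(new String[0]);
    }
}
```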
public static java.lang.String[] splitSegments(java.lang.String text)
Parameters:
text - The String to split.

public static java.lang.String[] splitWordGrams(java.lang.String text, int n)
Parameters:
text - The String to split.
n - Number of consecutive words in each n-gram.

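A word n-gram is a window of n consecutive words. The sketch below assumes overlapping windows that advance one word at a time, with words joined by a single space and an empty result when the text has fewer than n words. To keep it short it splits words on whitespace, which the real splitWordGrams() may well not do. WordGramSketch is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical word n-gram splitter; not the pt.tumba.cage implementation. */
final class WordGramSketch {

    static String[] splitWordGrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        String trimmed = text.trim();
        // Assumption: words are separated by runs of whitespace.
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        int count = Math.max(0, words.length - n + 1);
        String[] grams = new String[count];
        for (int i = 0; i < count; i++) {
            // Join words i .. i+n-1 into a single n-gram.
            grams[i] = String.join(" ", Arrays.copyOfRange(words, i, i + n));
        }
        return grams;
    }
}
```

Under these assumptions, splitWordGrams("the quick brown fox", 2) gives "the quick", "quick brown" and "brown fox".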
public static java.lang.String[] splitNGrams(java.lang.String text, int n)
Parameters:
text - The String to split.
n - Number of consecutive characters in each n-gram.
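Character n-grams are usually taken as a sliding window of n characters. This sketch assumes overlapping windows with a step of one, no padding at the edges, and an empty result for text shorter than n; the actual splitNGrams() might pad or work word by word instead. CharNGramSketch is a hypothetical name, and it counts UTF-16 chars, so a surrogate pair can be split across n-grams.

```java
/** Hypothetical character n-gram splitter; not the pt.tumba.cage implementation. */
final class CharNGramSketch {

    static String[] splitNGrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        int count = Math.max(0, text.length() - n + 1);
        String[] grams = new String[count];
        for (int i = 0; i < count; i++) {
            grams[i] = text.substring(i, i + n); // window of n chars starting at i
        }
        return grams;
    }
}
```

Under these assumptions, splitNGrams("word", 2) gives "wo", "or" and "rd".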