java.lang.Object
  pt.tumba.cage.DefaultWordFinder
public class DefaultWordFinder
A word finder for normal text documents, which searches text for sequences of words and text blocks. This class also defines common methods and behaviour for the various word-finding subclasses.
See Also: StringTokenizer, BreakIterator, TeXWordFinder, XMLWordFinder
Constructor Summary
Constructor | Description |
---|---|
DefaultWordFinder() | Constructor for DefaultWordFinder. |
DefaultWordFinder(java.lang.String inText) | Constructor for DefaultWordFinder. |
Method Summary
Type | Method | Description |
---|---|---|
java.lang.String | current() | Returns the current word in the text. |
java.lang.String | currentNGram(int n) | Returns the current character N-gram from the input. |
java.lang.String | currentSegment() | Returns the current text segment from the input. |
java.lang.String | currentWordGram(int n) | Returns the current word N-gram from the input. |
java.lang.String | getText() | Returns the text associated with this DefaultWordFinder. |
boolean | hasNext() | Tests if there are more words available from the text. |
java.lang.String | lookAhead() | Returns the next word without advancing the tokenizer, checking whether the character separating the two words is a space. |
java.lang.String | next() | Scans the text from the end of the last word and returns a String corresponding to the next word. |
java.lang.String | nextSegment() | Returns the next text segment from the input. |
void | replace(java.lang.String newWord) | Replaces the current word in the text. |
void | replaceBigram(java.lang.String newBigram) | Replaces the current bigram (the current word and the next word, as returned by lookAhead()) in the text. |
void | replaceSegment(java.lang.String newSegment) | Replaces the current text segment. |
void | setText(java.lang.String newText) | Changes the text associated with this DefaultWordFinder. |
static java.lang.String[] | splitNGrams(java.lang.String text, int n) | Splits a given String into an array with its constituent character n-grams. |
static java.lang.String[] | splitSegments(java.lang.String text) | Splits a given String into an array with its constituent text segments. |
static java.lang.String[] | splitWordGrams(java.lang.String text, int n) | Splits a given String into an array with its constituent word n-grams. |
static java.lang.String[] | splitWords(java.lang.String text) | Splits a given String into an array with its constituent words. |
boolean | startsSentence() | Checks if the current word marks the beginning of a sentence. |
java.lang.String | toString() | Produces a string representation of this word finder by returning the associated text. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public DefaultWordFinder(java.lang.String inText)
inText - A String with the input text to tokenize.
public DefaultWordFinder()
Method Detail |
---|
public java.lang.String currentWordGram(int n)
n - Number of consecutive words in the n-gram.
public java.lang.String currentNGram(int n)
n - Number of consecutive characters in the n-gram.
public java.lang.String currentSegment()
public java.lang.String nextSegment()
public void replaceSegment(java.lang.String newSegment)
newSegment - A String with the new text segment.
public java.lang.String getText()
public void setText(java.lang.String newText)
newText - The new String with the input text to tokenize.
public java.lang.String current()
public boolean hasNext()
public void replace(java.lang.String newWord)
newWord - A String with the replacement word.
public void replaceBigram(java.lang.String newBigram)
newBigram - A String with the replacement bigram.
public java.lang.String lookAhead()
public boolean startsSentence()
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public java.lang.String next()
public static java.lang.String[] splitWords(java.lang.String text)
text - A String.
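This page does not show how splitWords is implemented. Since BreakIterator is listed among the related classes, the following is a minimal sketch of word splitting that assumes word boundaries are detected with java.text.BreakIterator. The class name SplitWordsSketch and the rule of keeping only tokens that contain a letter or digit are assumptions made for illustration, not the library's documented behaviour.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for DefaultWordFinder.splitWords(String).
class SplitWordsSketch {

    static String[] splitWords(String text) {
        List<String> words = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);

        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            // Assumption: whitespace and punctuation tokens are not words,
            // so keep only tokens with at least one letter or digit.
            if (token.codePoints().anyMatch(Character::isLetterOrDigit)) {
                words.add(token);
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String word : splitWords("The quick brown fox")) {
            System.out.println(word);
        }
    }
}
```

The real method may treat hyphens, apostrophes, or markup differently; the TeXWordFinder and XMLWordFinder subclasses suggest that format-specific rules exist.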
public static java.lang.String[] splitSegments(java.lang.String text)
text - A String.
public static java.lang.String[] splitWordGrams(java.lang.String text, int n)
text - A String.
n - Number of consecutive words in each n-gram.
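To illustrate what a word n-gram is, here is a minimal sketch of a sliding window over words. It assumes that words are separated by whitespace, that consecutive n-grams overlap, and that the words in each n-gram are joined with a single space; the page confirms none of these details, and the class name WordGramSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for DefaultWordFinder.splitWordGrams(String, int).
class WordGramSketch {

    static String[] splitWordGrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        String trimmed = text.trim();
        // Assumption: words are delimited by runs of whitespace.
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");

        List<String> grams = new ArrayList<>();
        // Slide a window of n words over the text, one word at a time.
        for (int i = 0; i + n <= words.length; i++) {
            grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return grams.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Prints: [to be, be or, or not]
        System.out.println(Arrays.toString(splitWordGrams("to be or not", 2)));
    }
}
```

With this windowing, a text of w words yields w - n + 1 n-grams, or none when the text has fewer than n words.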
public static java.lang.String[] splitNGrams(java.lang.String text, int n)
text - A String.
n - Number of consecutive characters in each n-gram.
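Character n-grams can be sketched the same way, with the window sliding over characters instead of words. This is a minimal sketch under assumptions: it treats the input as one sequence of UTF-16 chars, lets n-grams span whitespace, and applies no padding. Whether the library does any of these is not stated on this page, and the class name CharNGramSketch is hypothetical.

```java
import java.util.Arrays;

// Hypothetical stand-in for DefaultWordFinder.splitNGrams(String, int).
class CharNGramSketch {

    static String[] splitNGrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        // A text of length L contains L - n + 1 overlapping n-grams (or none).
        int count = Math.max(0, text.length() - n + 1);
        String[] grams = new String[count];
        for (int i = 0; i < count; i++) {
            grams[i] = text.substring(i, i + n);
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prints: [wor, ord, rds]
        System.out.println(Arrays.toString(splitNGrams("words", 3)));
    }
}
```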