DefaultWordFinder

pt.tumba.cage
Class DefaultWordFinder

java.lang.Object
  pt.tumba.cage.DefaultWordFinder

Direct Known Subclasses:: TeXWordFinder, XMLWordFinder

public class DefaultWordFinder
extends java.lang.Object

A word finder for normal text documents, which searches text for sequences of words and text blocks. This class also defines common methods and behaviour for the various word finding subclasses.

Author:: Bruno Martins
See Also:: StringTokenizer, BreakIterator, TeXWordFinder, XMLWordFinder

Constructor Summary
`DefaultWordFinder()` Constructor for DefaultWordFinder.
`DefaultWordFinder(java.lang.String inText)` Constructor for DefaultWordFinder.

Method Summary
`java.lang.String`	`current()` Returns the current word in the text.
`java.lang.String`	`currentNGram(int n)` Returns the current character N-gram from the input.
`java.lang.String`	`currentSegment()` Returns the current text segment from the input.
`java.lang.String`	`currentWordGram(int n)` Returns the current word N-gram from the input.
`java.lang.String`	`getText()` Returns the text associated with this DefaultWordFinder.
`boolean`	`hasNext()` Tests if there are more words available from the text.
`java.lang.String`	`lookAhead()` Returns the next word without advancing the tokenizer, checking whether the character separating both words is a space.
`java.lang.String`	`next()` This method scans the text from the end of the last word, and returns a String corresponding to the next word.
`java.lang.String`	`nextSegment()` Returns the next text segment from the input.
`void`	`replace(java.lang.String newWord)` Replaces the current word in the text.
`void`	`replaceBigram(java.lang.String newBigram)` Replaces the current bigram (the current word and the next one, as returned by lookAhead()) in the text.
`void`	`replaceSegment(java.lang.String newSegment)` Replaces the current text segment.
`void`	`setText(java.lang.String newText)` Changes the text associated with this DefaultWordFinder.
`static java.lang.String[]`	`splitNGrams(java.lang.String text, int n)` Splits a given String into an array with its constituent character n-grams.
`static java.lang.String[]`	`splitSegments(java.lang.String text)` Splits a given String into an array with its constituent text segments.
`static java.lang.String[]`	`splitWordGrams(java.lang.String text, int n)` Splits a given String into an array with its constituent word n-grams.
`static java.lang.String[]`	`splitWords(java.lang.String text)` Splits a given String into an array with its constituent words.
`boolean`	`startsSentence()` Checks if the current word marks the beginning of a sentence.
`java.lang.String`	`toString()` Produces a string representation of this word finder by returning the associated text.

Methods inherited from class java.lang.Object
`equals, getClass, hashCode, notify, notifyAll, wait, wait, wait`

Constructor Detail

DefaultWordFinder

public DefaultWordFinder(java.lang.String inText)

Constructor for DefaultWordFinder.

Parameters:: inText - A String with the input text to tokenize.

DefaultWordFinder

public DefaultWordFinder()

Constructor for DefaultWordFinder.

Method Detail

currentWordGram

public java.lang.String currentWordGram(int n)

Returns the current word N-gram from the input. An N-gram is defined as the word sequence between the current position and the next n words.

Parameters:: n - Number of consecutive words in the n-gram.
Returns:: A String with the current word N-gram.

currentNGram

public java.lang.String currentNGram(int n)

Returns the current character N-gram from the input. An N-gram is defined as the character sequence between the current position and the next n characters.

Parameters:: n - Number of consecutive characters in the n-gram.
Returns:: A String with the current character N-gram.

currentSegment

public java.lang.String currentSegment()

Returns the current text segment from the input. A segment is defined as the character sequence between the current position and the next non-alphanumeric character, considering also white spaces.

Returns:: A String with the current text segment.

nextSegment

public java.lang.String nextSegment()

Returns the next text segment from the input. A segment is defined as the character sequence between the current position and the next non-alphanumeric character, considering also white spaces. If there are no more segments to return, it returns null.

Returns:: A String with the next text segment.
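
The segment definition above leaves some room for interpretation. The following is a minimal sketch of one possible reading, in which letters, digits and white space belong to a segment and any other character (punctuation, for instance) ends it. The class SegmentSketch and its method are hypothetical names used for illustration only; they are not part of pt.tumba.cage and may not match what DefaultWordFinder actually does.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: one reading of "segment", not the pt.tumba.cage code. */
final class SegmentSketch {

    /** Splits text into runs of letters, digits and white space. */
    static String[] segments(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c) || Character.isWhitespace(c)) {
                current.append(c);      // still inside the segment
            } else {
                flush(current, out);    // assumption: punctuation ends a segment
            }
        }
        flush(current, out);
        return out.toArray(new String[0]);
    }

    /** Stores the pending segment (trimmed) if it is not empty. */
    private static void flush(StringBuilder current, List<String> out) {
        String segment = current.toString().trim();
        if (!segment.isEmpty()) {
            out.add(segment);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        for (String s : segments("Hello world, this is a test. Done!")) {
            System.out.println(s);
        }
    }
}
```

Under this reading, the sample sentence yields three segments: "Hello world", "this is a test" and "Done".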

replaceSegment

public void replaceSegment(java.lang.String newSegment)

Replaces the current text segment. After a call to this method, a call to currentSegment() returns the new text segment and a call to getText() returns the text supplied to this WordFinder with the current segment replaced.

Parameters:: newSegment - A String with the new text segment.

getText

public java.lang.String getText()

Returns the text associated with this DefaultWordFinder.

Returns:: A String with the text associated with this DefaultWordFinder.

setText

public void setText(java.lang.String newText)

Changes the text associated with this DefaultWordFinder.

Parameters:: newText - The new String with the input text to tokenize.

current

public java.lang.String current()

Returns the current word in the text.

Returns:: A String with the current word in the text.

hasNext

public boolean hasNext()

Tests if there are more words available from the text.

Returns:: true if and only if there is at least one word in the string after the current position, and false otherwise.

replace

public void replace(java.lang.String newWord)

Replaces the current word in the text. After a call to this method, a call to current() returns the new word and a call to getText() returns the text supplied to this WordFinder with the current word replaced.

Parameters:: newWord - A string with the replacement word.

replaceBigram

public void replaceBigram(java.lang.String newBigram)

Replaces the current bigram (the current word and the next one, as returned by lookAhead()) in the text. After a call to this method, a call to current() returns the new bigram and a call to getText() returns the text supplied to this WordFinder with the current bigram replaced.

Parameters:: newBigram - A string with the replacement bigram.

lookAhead

public java.lang.String lookAhead()

Returns the next word without advancing the tokenizer, checking whether the character separating both words is a space. This is useful for getting bigrams from the text.

Returns:: The next word in the text, or null.
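
As a rough illustration of the separator check described above, the sketch below peeks at the word that follows a given position and returns it only when the two words are separated by a single space. The class LookAheadSketch, its method and the currentEnd parameter are hypothetical; the real method takes no arguments and its exact rules are not documented here.

```java
/** Illustrative only: a stand-alone peek helper, not the pt.tumba.cage code. */
final class LookAheadSketch {

    /**
     * Returns the word starting right after the space at currentEnd,
     * or null when the separator is not a space or no word follows.
     * currentEnd is the index just past the current word (an assumption
     * made for this sketch).
     */
    static String lookAhead(String text, int currentEnd) {
        if (currentEnd < 0 || currentEnd + 1 >= text.length()
                || text.charAt(currentEnd) != ' ') {
            return null;
        }
        int start = currentEnd + 1;
        int end = start;
        while (end < text.length() && Character.isLetterOrDigit(text.charAt(end))) {
            end++;
        }
        return end > start ? text.substring(start, end) : null;
    }

    public static void main(String[] args) {
        String text = "New York, USA";
        System.out.println(lookAhead(text, 3)); // after "New"
        System.out.println(lookAhead(text, 8)); // after "York"
    }
}
```

With the text "New York, USA", peeking after "New" gives "York", while peeking after "York" gives null because a comma, not a space, follows the word. A bigram such as "New York" is then the current word joined to the peeked word.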

startsSentence

public boolean startsSentence()

Checks if the current word marks the beginning of a sentence.

Returns:: true if the current word marks the beginning of a sentence and false otherwise.
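
The documentation does not say how a sentence start is detected. A simple heuristic, shown here purely as an assumption, is to treat a word as starting a sentence when it is the first word of the text or when the nearest preceding non-blank character is '.', '!' or '?'. SentenceStartSketch and the wordStart parameter are hypothetical names.

```java
/** Illustrative heuristic only: not the pt.tumba.cage implementation. */
final class SentenceStartSketch {

    /** wordStart is the index of the first character of the word. */
    static boolean startsSentence(String text, int wordStart) {
        int i = wordStart - 1;
        // Skip the white space that separates the word from what precedes it.
        while (i >= 0 && Character.isWhitespace(text.charAt(i))) {
            i--;
        }
        if (i < 0) {
            return true; // nothing but white space before the word
        }
        char previous = text.charAt(i);
        return previous == '.' || previous == '!' || previous == '?';
    }

    public static void main(String[] args) {
        String text = "It rains. We stay in.";
        System.out.println(startsSentence(text, text.indexOf("We")));
        System.out.println(startsSentence(text, text.indexOf("rains")));
    }
}
```

A heuristic like this misfires on abbreviations such as "e.g. this", so a real implementation may well apply additional rules.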

toString

public java.lang.String toString()

Produces a string representation of this word finder by returning the associated text.

Overrides:: toString in class java.lang.Object

next

public java.lang.String next()

This method scans the text from the end of the last word, and returns a String corresponding to the next word. If there are no more words to return, it returns null.

Returns:: the next word.
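
The hasNext()/next() contract above (next() returns null once the words run out) can be sketched with java.text.BreakIterator, which this class lists under See Also. WordCursorSketch is a hypothetical stand-in written for illustration; whether DefaultWordFinder uses BreakIterator internally, and which tokens it counts as words, are assumptions.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Illustrative only: mimics the documented contract, not the real class. */
final class WordCursorSketch {
    private final String text;
    private final BreakIterator boundaries;
    private int start;
    private String nextWord;

    WordCursorSketch(String text) {
        this.text = text;
        this.boundaries = BreakIterator.getWordInstance(Locale.ENGLISH);
        this.boundaries.setText(text);
        this.start = boundaries.first();
        advance();
    }

    /** Moves to the next token that begins with a letter or digit. */
    private void advance() {
        nextWord = null;
        for (int end = boundaries.next(); end != BreakIterator.DONE; end = boundaries.next()) {
            String token = text.substring(start, end);
            start = end;
            if (Character.isLetterOrDigit(token.codePointAt(0))) {
                nextWord = token;
                return;
            }
        }
    }

    boolean hasNext() {
        return nextWord != null;
    }

    /** Returns the next word, or null when no words remain. */
    String next() {
        String word = nextWord;
        if (word != null) {
            advance();
        }
        return word;
    }

    public static void main(String[] args) {
        WordCursorSketch cursor = new WordCursorSketch("Hello, world!");
        while (cursor.hasNext()) {
            System.out.println(cursor.next());
        }
    }
}
```

Because next() signals exhaustion with null rather than an exception, callers can loop on hasNext() as shown or test the returned value directly.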

splitWords

public static java.lang.String[] splitWords(java.lang.String text)

Splits a given String into an array with its constituent words.

Parameters:: text - A String.
Returns:: An array with the words extracted from the String.

splitSegments

public static java.lang.String[] splitSegments(java.lang.String text)

Splits a given String into an array with its constituent text segments.

Parameters:: text - A String.
Returns:: An array with the text segments extracted from the String.

splitWordGrams

public static java.lang.String[] splitWordGrams(java.lang.String text,
                                                int n)

Splits a given String into an array with its constituent word n-grams.

Parameters:: text - A String; n - Number of consecutive words in each n-gram.
Returns:: An array with the word n-grams extracted from the String.
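
To make the notion of word n-grams concrete, here is a minimal sliding-window sketch over white-space-separated words. WordGramSketch is a hypothetical class used for illustration; the tokenization rules of the real splitWordGrams (for example, how punctuation is handled) are not documented here and may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: not the pt.tumba.cage implementation. */
final class WordGramSketch {

    /** Returns every run of n consecutive white-space-separated words. */
    static String[] wordGrams(String text, int n) {
        String trimmed = text.trim();
        if (n <= 0 || trimmed.isEmpty()) {
            return new String[0];
        }
        String[] words = trimmed.split("\\s+");
        List<String> grams = new ArrayList<>();
        // Slide a window of n words across the word list.
        for (int i = 0; i + n <= words.length; i++) {
            grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return grams.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String gram : wordGrams("the quick brown fox", 2)) {
            System.out.println(gram);
        }
    }
}
```

For "the quick brown fox" and n = 2 the sketch produces "the quick", "quick brown" and "brown fox"; a text with fewer than n words produces an empty array.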

splitNGrams

public static java.lang.String[] splitNGrams(java.lang.String text,
                                             int n)

Splits a given String into an array with its constituent character n-grams.

Parameters:: text - A String; n - Number of consecutive characters in each n-gram.
Returns:: An array with the character n-grams extracted from the String.
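
Character n-grams can be sketched the same way, as every substring of length n taken at successive positions. CharGramSketch is a hypothetical class for illustration; whether the real splitNGrams pads the text, skips white space or crosses word boundaries is unknown and assumed away here.

```java
/** Illustrative only: not the pt.tumba.cage implementation. */
final class CharGramSketch {

    /** Returns every substring of length n, in order of position. */
    static String[] charGrams(String text, int n) {
        if (n <= 0 || text.length() < n) {
            return new String[0];
        }
        String[] grams = new String[text.length() - n + 1];
        for (int i = 0; i < grams.length; i++) {
            grams[i] = text.substring(i, i + n);
        }
        return grams;
    }

    public static void main(String[] args) {
        for (String gram : charGrams("word", 2)) {
            System.out.println(gram);
        }
    }
}
```

For "word" and n = 2 this yields "wo", "or" and "rd".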
