pt.tumba.cage
Class TeXWordFinder

java.lang.Object
  extended by pt.tumba.cage.DefaultWordFinder
      extended by pt.tumba.cage.TeXWordFinder

public class TeXWordFinder
extends DefaultWordFinder

A word finder for TeX and LaTeX documents. It searches the text for sequences of letters, ignoring commands and environments as well as math environments.
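The skipping behaviour described above can be illustrated with a small standalone sketch. It is not the pt.tumba.cage implementation: the class TexWordSketch and its method words are hypothetical, and the sketch only assumes a few common TeX constructs. These are control words (\textbf), control symbols (\%), the environment name following \begin or \end, inline and display math ($...$, $$...$$, \[...\], \(...\)), and % comments. Math environments written as \begin{equation}...\end{equation} are not handled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the pt.tumba.cage implementation: extracts letter
// sequences from TeX source while skipping commands, environment names, math and comments.
final class TexWordSketch {

    static List<String> words(String tex) {
        List<String> out = new ArrayList<>();
        int i = 0, n = tex.length();
        while (i < n) {
            char c = tex.charAt(i);
            if (c == '%') {                              // comment: skip to end of line
                while (i < n && tex.charAt(i) != '\n') i++;
            } else if (c == '$') {                       // inline $...$ or display $$...$$
                boolean display = i + 1 < n && tex.charAt(i + 1) == '$';
                String close = display ? "$$" : "$";
                int end = tex.indexOf(close, i + close.length());
                i = end < 0 ? n : end + close.length();
            } else if (c == '\\') {
                int j = i + 1;
                while (j < n && Character.isLetter(tex.charAt(j))) j++;
                if (j == i + 1) {                        // control symbol such as \% or \[
                    char s = j < n ? tex.charAt(j) : ' ';
                    if (s == '[' || s == '(') {          // \[...\] or \(...\) math
                        String close = s == '[' ? "\\]" : "\\)";
                        int end = tex.indexOf(close, j + 1);
                        i = end < 0 ? n : end + 2;
                    } else {
                        i = Math.min(j + 1, n);
                    }
                } else {                                 // control word such as \textbf
                    String name = tex.substring(i + 1, j);
                    i = j;
                    boolean envDelimiter = name.equals("begin") || name.equals("end");
                    if (envDelimiter && i < n && tex.charAt(i) == '{') {
                        int end = tex.indexOf('}', i);
                        i = end < 0 ? n : end + 1;       // skip the environment name
                    }
                }
            } else if (Character.isLetter(c)) {          // a word: maximal run of letters
                int j = i;
                while (j < n && Character.isLetter(tex.charAt(j))) j++;
                out.add(tex.substring(i, j));
                i = j;
            } else {
                i++;                                     // digits, braces, punctuation, space
            }
        }
        return out;
    }
}
```

For example, `TexWordSketch.words("Hello \\textbf{world} $x+y$ done")` yields the words Hello, world and done. The command name and the math content are dropped, while the braced argument of \textbf is kept as ordinary text.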

Author:
Bruno Martins
See Also:
DefaultWordFinder

Field Summary
static int REG_EXPR
          Constant value specifying that user-defined ignores are regular expressions.
static int STRING_EXPR
          Constant value specifying that user-defined ignores are plain strings.
 
Constructor Summary
TeXWordFinder()
          Constructor for TeXWordFinder.
TeXWordFinder(java.lang.String inText)
          Constructor for TeXWordFinder.
 
Method Summary
 void addUserDefinedIgnores(java.util.Collection expressions, int regex)
          Imports a user-defined set of either strings or regular expressions to ignore.
 java.lang.String currentSegment()
          Returns the current text segment from the input.
 java.lang.String next()
          This method scans the text from the end of the last word, and returns a String corresponding to the next word.
 void setIgnoreComments(boolean ignore)
          Sets whether TeX comments should be ignored.
 
Methods inherited from class pt.tumba.cage.DefaultWordFinder
current, currentNGram, currentWordGram, getText, hasNext, lookAhead, nextSegment, replace, replaceBigram, replaceSegment, setText, splitNGrams, splitSegments, splitWordGrams, splitWords, startsSentence, toString
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

STRING_EXPR

public static final int STRING_EXPR
Constant value specifying that user-defined ignores are plain strings.

See Also:
Constant Field Values

REG_EXPR

public static final int REG_EXPR
Constant value specifying that user-defined ignores are regular expressions.

See Also:
Constant Field Values
Constructor Detail

TeXWordFinder

public TeXWordFinder(java.lang.String inText)
Constructor for TeXWordFinder.

Parameters:
inText - A String with the input text to tokenize.

TeXWordFinder

public TeXWordFinder()
Constructor for TeXWordFinder.

Method Detail

currentSegment

public java.lang.String currentSegment()
Returns the current text segment from the input. A segment is defined as the character sequence between the current position and the next non-alphanumeric character, taking white space into account as well.

Overrides:
currentSegment in class DefaultWordFinder
Returns:
A String with the current text segment.

next

public java.lang.String next()
This method scans the text from the end of the last word and returns a String corresponding to the next word. If there are no more words to return, it returns null.

Overrides:
next in class DefaultWordFinder
Returns:
The next word, or null if there are no more words.
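Because next() signals the end of the input by returning null, callers typically drain it in a single loop. The sketch below shows that idiom against a hypothetical interface NullTerminatedSource. The interface, the DrainSketch class and its array-backed of helper are illustrative assumptions, not part of pt.tumba.cage.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for any finder whose next() returns null when exhausted,
// as TeXWordFinder.next() is documented to do.
interface NullTerminatedSource {
    String next();
}

final class DrainSketch {

    /** Collects words until next() returns null. */
    static List<String> drain(NullTerminatedSource source) {
        List<String> words = new ArrayList<>();
        String word;
        while ((word = source.next()) != null) {
            words.add(word);
        }
        return words;
    }

    /** Hypothetical source backed by a fixed array, for demonstration only. */
    static NullTerminatedSource of(String... items) {
        return new NullTerminatedSource() {
            private int pos = 0;

            @Override
            public String next() {
                return pos < items.length ? items[pos++] : null;
            }
        };
    }
}
```

With the documented class, the same loop would be written directly against a TeXWordFinder built with the String constructor, calling finder.next() until it yields null.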

addUserDefinedIgnores

public void addUserDefinedIgnores(java.util.Collection expressions,
                                  int regex)
Imports a user-defined set of either strings or regular expressions to ignore.

Parameters:
expressions - a collection of objects whose toString() value is the expression; typically String objects.
regex - an integer specifying the type of expression, either REG_EXPR or STRING_EXPR.
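The difference between the two expression types can be sketched with java.util.regex. This is an assumption about how such ignores could be applied, not the library's code. The class IgnoreSketch and its apply method are hypothetical, and the constant values 0 and 1 are placeholders, because the actual values are listed only on the Constant Field Values page. Literal strings are wrapped with Pattern.quote so that characters such as '.' match themselves. Matches are blanked with spaces rather than deleted, which keeps character offsets stable; that is a design choice of this sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; STRING_EXPR and REG_EXPR values below are placeholders,
// not the values defined by TeXWordFinder. Requires Java 11+ (String.repeat).
final class IgnoreSketch {
    static final int STRING_EXPR = 0;
    static final int REG_EXPR = 1;

    private final List<Pattern> ignores = new ArrayList<>();

    /** Mirrors the documented contract: each element's toString() is the expression. */
    void addUserDefinedIgnores(Collection<?> expressions, int regex) {
        for (Object o : expressions) {
            String expr = o.toString();
            // Plain strings are quoted so regex metacharacters lose their meaning.
            Pattern p = (regex == REG_EXPR)
                    ? Pattern.compile(expr)
                    : Pattern.compile(Pattern.quote(expr));
            ignores.add(p);
        }
    }

    /** Replaces every ignored match with the same number of spaces. */
    String apply(String text) {
        String result = text;
        for (Pattern p : ignores) {
            result = p.matcher(result).replaceAll(m -> " ".repeat(m.group().length()));
        }
        return result;
    }
}
```

Registering "e.g." as a STRING_EXPR and "\d+" as a REG_EXPR, for instance, blanks the abbreviation and any digit runs before words are extracted.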

setIgnoreComments

public void setIgnoreComments(boolean ignore)
Sets whether TeX comments should be ignored.

Parameters:
ignore - true if TeX comments should be ignored and false otherwise.
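In TeX, an unescaped % starts a comment that runs to the end of the line. The percent sign is escaped when it is preceded by an odd number of backslashes (\%), but not after an even number (\\ followed by % is a line break and then a comment). The hypothetical helper below strips comments under that rule, roughly what a finder with setIgnoreComments(true) might do. It is a sketch, not part of pt.tumba.cage, and unlike TeX it keeps the newline so that words on adjacent lines stay separate.

```java
// Hypothetical helper, not part of pt.tumba.cage: removes TeX comments while
// keeping line breaks and escaped percent signs (\%).
final class CommentStripSketch {

    static String stripComments(String tex) {
        StringBuilder out = new StringBuilder(tex.length());
        int backslashes = 0;   // consecutive backslashes immediately before the current char
        int i = 0;
        while (i < tex.length()) {
            char c = tex.charAt(i);
            if (c == '%' && backslashes % 2 == 0) {
                // Unescaped '%': drop everything up to, but not including, the newline.
                while (i < tex.length() && tex.charAt(i) != '\n') i++;
                backslashes = 0;
                continue;
            }
            out.append(c);
            backslashes = (c == '\\') ? backslashes + 1 : 0;
            i++;
        }
        return out.toString();
    }
}
```

For example, `stripComments("50\\% off % note\nnext")` keeps the escaped \% and the line break but drops " note".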