pt.tumba.geoclass
Class GKBParser

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by pt.tumba.geoclass.GKBParser
All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

public class GKBParser
extends org.xml.sax.helpers.DefaultHandler

A SAX2 event handler class, used for parsing RDF data from our geographical knowledge base (GKB). Essentially, the knowledge base contains information about geographical features, encoded in the form of an OWL ontology.

The ontology uses a semantic location model in which concepts are defined relative to given universes of discourse (e.g. architecture, physical geography, political geography or city planning). The aim of this perspective is to link a mathematical definition of position (i.e. a geometric model) to a more human-friendly notion of place.

The data parsed from the RDF file is used both for recognizing named entities in text and for building a probabilistic graphical model of geographical concepts, which is later used to help classify web pages according to their geographical scopes.

Author:
Bruno Martins

Constructor Summary
GKBParser()
          Constructor for GKBParser.
GKBParser(java.io.PrintStream output)
          Constructor for GKBParser.
 
Method Summary
 void characters(char[] ch, int start, int length)
          Receive notification of character data inside an element.
 void endElement(java.lang.String uri, java.lang.String name, java.lang.String qName)
          Receive notification of the end of an element.
static ClassNetwork getClassNetwork(java.io.File file)
          Returns a network of geographical features parsed from a given RDF file.
static ClassNetwork getClassNetwork(java.io.File file, java.io.PrintStream output)
          Returns a network of geographical features parsed from a given RDF file.
static ClassNetwork getClassNetwork(java.lang.String file)
          Returns a network of geographical features parsed from a given RDF file.
static ClassNetwork getClassNetwork(java.lang.String file, java.io.PrintStream output)
          Returns a network of geographical features parsed from a given RDF file.
static java.util.Map getFeatures(java.io.File file, java.io.PrintStream output)
          Returns a map of geographical features parsed from a given RDF file.
static void main(java.lang.String[] args)
          The main method.
 void setOutput(java.io.Writer output)
          Sets up a writer for printing the list of features from the RDF file.
 void startDocument()
          Receive notification of the beginning of the document, and initialization of parsing variables.
 void startElement(java.lang.String uri, java.lang.String name, java.lang.String qName, org.xml.sax.Attributes atts)
          Receive notification of the start of an element.
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GKBParser

public GKBParser()
Constructor for GKBParser.


GKBParser

public GKBParser(java.io.PrintStream output)
Constructor for GKBParser.

Parameters:
output - A PrintStream for printing the list of features from the RDF file.
Method Detail

startDocument

public void startDocument()
Receive notification of the beginning of the document, and initialization of parsing variables.

Specified by:
startDocument in interface org.xml.sax.ContentHandler
Overrides:
startDocument in class org.xml.sax.helpers.DefaultHandler
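
Resetting parsing variables in startDocument matters when one handler instance is reused for several documents. The following is a minimal sketch of that idea using only the JDK's SAX classes; CountingHandler and its element counter are hypothetical and are not part of GKBParser.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that counts elements; not part of GKBParser. */
final class CountingHandler extends DefaultHandler {
    private int elementCount;

    @Override
    public void startDocument() {
        // Reset per-document state so the same instance can be reused safely.
        elementCount = 0;
    }

    @Override
    public void startElement(String uri, String name, String qName, Attributes atts) {
        elementCount++;
    }

    int getElementCount() {
        return elementCount;
    }

    /** Parses an XML string with the given handler. */
    static void parse(String xml, DefaultHandler handler) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
    }
}
```

Without the reset, a second parse with the same handler would keep adding to the count left over from the first document.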

startElement

public void startElement(java.lang.String uri,
                         java.lang.String name,
                         java.lang.String qName,
                         org.xml.sax.Attributes atts)
Receive notification of the start of an element. The Parser will invoke this method at the beginning of every element in the RDF document; there will be a corresponding endElement event for every startElement event (even when the element is empty). All of the element's content will be reported, in order, before the corresponding endElement event.

This event allows up to three name components for each element:

  • the Namespace URI;
  • the local name; and
  • the qualified (prefixed) name.

Any or all of these may be provided, depending on the values of the http://xml.org/sax/features/namespaces and the http://xml.org/sax/features/namespace-prefixes properties:

  • the Namespace URI and local name are required when the namespaces property is true (the default), and are optional when the namespaces property is false (if one is specified, both must be);
  • the qualified name is required when the namespace-prefixes property is true, and is optional when the namespace-prefixes property is false (the default).

The attribute list will contain attributes used for Namespace declarations (xmlns* attributes) only if the http://xml.org/sax/features/namespace-prefixes property is true (it is false by default, and support for a true value is optional).

    Specified by:
    startElement in interface org.xml.sax.ContentHandler
    Overrides:
    startElement in class org.xml.sax.helpers.DefaultHandler
    Parameters:
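    uri - The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed.
    name - The local name (without prefix), or the empty string if Namespace processing is not being performed.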
    qName - The qualified name (with prefix), or the empty string if qualified names are not available.
    atts - The attributes attached to the element. If there are no attributes, it shall be an empty Attributes object.
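
The effect of Namespace processing on the three name components can be seen with a small experiment. The sketch below uses the JDK's JAXP factory, which (unlike the SAX2 feature default described above) is not namespace-aware unless setNamespaceAware(true) is called. NameComponentsDemo and startTags are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class; not part of the geoclass package. */
final class NameComponentsDemo {

    /** Returns one "uri|localName|qName" entry per start tag in the document. */
    static List<String> startTags(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        SAXParser parser = factory.newSAXParser();

        final List<String> seen = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String name, String qName, Attributes atts) {
                // Record all three components exactly as the parser reports them.
                seen.add(uri + "|" + name + "|" + qName);
            }
        });
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"/>";
        System.out.println(startTags(xml, true));
        System.out.println(startTags(xml, false));
    }
}
```

A handler that compares element names should therefore decide up front whether it relies on the local name or on the qualified name, and configure the parser to match.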

    endElement

    public void endElement(java.lang.String uri,
                           java.lang.String name,
                           java.lang.String qName)
    Receive notification of the end of an element.

    The SAX2 parser will invoke this method at the end of every element in the RDF document; there will be a corresponding startElement event for every endElement event (even when the element is empty).

    For information on the names, see startElement.

    Specified by:
    endElement in interface org.xml.sax.ContentHandler
    Overrides:
    endElement in class org.xml.sax.helpers.DefaultHandler
    Parameters:
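    uri - The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed.
    name - The local name (without prefix), or the empty string if Namespace processing is not being performed.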
    qName - The qualified XML 1.0 name (with prefix), or the empty string if qualified names are not available.
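
The pairing of start and end events, including for empty elements, can be checked with a short sketch that logs both callbacks. ElementPairingDemo is a hypothetical helper built on the JDK's SAX parser and does not reflect how GKBParser itself handles these events.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class; logs the order of start and end events. */
final class ElementPairingDemo {

    static List<String> events(String xml) throws Exception {
        final List<String> log = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String name, String qName, Attributes atts) {
                        log.add("start:" + qName);
                    }

                    @Override
                    public void endElement(String uri, String name, String qName) {
                        // Fired even for an empty element such as <b/>.
                        log.add("end:" + qName);
                    }
                });
        return log;
    }
}
```

For the input `<a><b/></a>`, the log contains a start and an end entry for b nested between the start and end entries for a.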

    characters

    public void characters(char[] ch,
                           int start,
                           int length)
    Receive notification of character data inside an element.

    Depending on the XML tags appearing in the document, this method takes specific actions for each chunk of character data (such as assigning the data to the appropriate variables).

    Specified by:
    characters in interface org.xml.sax.ContentHandler
    Overrides:
    characters in class org.xml.sax.helpers.DefaultHandler
    Parameters:
    ch - The characters.
    start - The start position in the character array.
    length - The number of characters to use from the character array.
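
Because a SAX parser may deliver the text of one element in several chunks, a handler usually appends each chunk to a buffer and reads the buffer when the element ends. The sketch below shows that pattern with the (ch, start, length) triple; CharactersDemo and the sample element names are hypothetical and say nothing about the variables GKBParser actually maintains.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class; collects the text of leaf elements. */
final class CharactersDemo {

    static Map<String, String> textByElement(String xml) throws Exception {
        final Map<String, String> result = new LinkedHashMap<>();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    private final StringBuilder buffer = new StringBuilder();

                    @Override
                    public void startElement(String uri, String name, String qName, Attributes atts) {
                        buffer.setLength(0); // start collecting text for the new element
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        // Only the slice [start, start + length) belongs to this event.
                        buffer.append(ch, start, length);
                    }

                    @Override
                    public void endElement(String uri, String name, String qName) {
                        String text = buffer.toString().trim();
                        if (!text.isEmpty()) {
                            result.put(qName, text);
                        }
                        buffer.setLength(0);
                    }
                });
        return result;
    }
}
```

Reading the whole ch array, or treating length as an end index, would pick up unrelated characters from the parser's internal buffer.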

    getClassNetwork

    public static ClassNetwork getClassNetwork(java.lang.String file)
                                        throws org.xml.sax.SAXException,
                                               java.io.IOException
    Returns a network of geographical features parsed from a given RDF file.

    Parameters:
    file - The path to the RDF file.
    Returns:
    A network with the geographical features described in the file.
    Throws:
    org.xml.sax.SAXException - A problem occurred while parsing the RDF data.
    java.io.IOException - A problem occurred while reading the File.

    getClassNetwork

    public static ClassNetwork getClassNetwork(java.lang.String file,
                                               java.io.PrintStream output)
                                        throws org.xml.sax.SAXException,
                                               java.io.IOException
    Returns a network of geographical features parsed from a given RDF file.

    Parameters:
    file - The path to the RDF file.
    output - A PrintStream for printing the list of features from the RDF file.
    Returns:
    A network with the geographical features described in the file.
    Throws:
    org.xml.sax.SAXException - A problem occurred while parsing the RDF data.
    java.io.IOException - A problem occurred while reading the File.

    getClassNetwork

    public static ClassNetwork getClassNetwork(java.io.File file)
                                        throws org.xml.sax.SAXException,
                                               java.io.IOException
    Returns a network of geographical features parsed from a given RDF file.

    Parameters:
    file - The RDF File.
    Returns:
    A network with the geographical features described in the file.
    Throws:
    org.xml.sax.SAXException - A problem occurred while parsing the RDF data.
    java.io.IOException - A problem occurred while reading the File.

    getClassNetwork

    public static ClassNetwork getClassNetwork(java.io.File file,
                                               java.io.PrintStream output)
                                        throws org.xml.sax.SAXException,
                                               java.io.IOException
    Returns a network of geographical features parsed from a given RDF file.

    Parameters:
    file - The RDF File.
    output - A PrintStream for printing the list of features from the RDF file.
    Returns:
    A network with the geographical features described in the file.
    Throws:
    org.xml.sax.SAXException - A problem occurred while parsing the RDF data.
    java.io.IOException - A problem occurred while reading the File.

    getFeatures

    public static java.util.Map getFeatures(java.io.File file,
                                            java.io.PrintStream output)
                                     throws org.xml.sax.SAXException,
                                            java.io.IOException
    Returns a map of geographical features parsed from a given RDF file.

    Parameters:
    file - The RDF File.
    output - A PrintStream for printing the list of features from the RDF file.
    Returns:
    A map with the geographical features described in the file.
    Throws:
    org.xml.sax.SAXException - A problem occurred while parsing the RDF data.
    java.io.IOException - A problem occurred while reading the File.
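
The general shape of such a static entry point (parse a File with a SAX handler, optionally print each feature, return a Map, and surface SAXException and IOException) can be sketched as follows. Everything specific here is an assumption made for illustration: the class FeatureMapSketch, the method readFeatures, the element names Feature and name, and the choice of mapping rdf:about identifiers to names. The actual keys and values returned by getFeatures are not documented on this page.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for a getFeatures-style helper; not the real GKBParser. */
final class FeatureMapSketch {

    static Map<String, String> readFeatures(File file, final PrintStream output)
            throws SAXException, IOException {
        final Map<String, String> features = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private String currentId;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String name, String qName, Attributes atts) {
                if ("Feature".equals(qName)) {           // assumed element name
                    currentId = atts.getValue("rdf:about");
                }
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String name, String qName) {
                if ("name".equals(qName) && currentId != null) { // assumed element name
                    String label = text.toString().trim();
                    features.put(currentId, label);
                    if (output != null) {
                        output.println(currentId + "\t" + label);
                    }
                } else if ("Feature".equals(qName)) {
                    currentId = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(file, handler);
        } catch (ParserConfigurationException e) {
            // Keep the declared exception types by wrapping configuration failures.
            throw new SAXException(e);
        }
        return features;
    }
}
```

Passing null for the PrintStream in this sketch simply suppresses the printed list while still returning the map.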

    setOutput

    public void setOutput(java.io.Writer output)
    Sets up a writer for printing the list of features from the RDF file. If null, no output will be produced.

    Parameters:
    output - A writer for printing the list of features from the RDF file.
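
The "null means no output" convention can be implemented with a single guard in the handler. The following sketch assumes a hypothetical OptionalOutputHandler that writes each element name on its own line; the real format written by GKBParser is not specified here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler with an optional Writer; not part of GKBParser. */
final class OptionalOutputHandler extends DefaultHandler {
    private Writer output; // null means "produce no output"

    void setOutput(Writer output) {
        this.output = output;
    }

    @Override
    public void startElement(String uri, String name, String qName, Attributes atts)
            throws SAXException {
        if (output == null) {
            return; // silent mode
        }
        try {
            output.write(qName);
            output.write('\n');
        } catch (IOException e) {
            // SAX callbacks cannot throw IOException directly, so wrap it.
            throw new SAXException(e);
        }
    }

    /** Parses an XML string with the given handler. */
    static void parse(String xml, OptionalOutputHandler handler) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
    }
}
```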

    main

    public static void main(java.lang.String[] args)
                     throws java.lang.Exception
    The main method.

    Parameters:
    args - The command line options, tokenized.
    Throws:
    java.lang.Exception