SDARTS

edu.columbia.cs.sdarts.backend.impls.xml
Class XMLDocumentEnum

java.lang.Object
  |
  +--edu.columbia.cs.sdarts.backend.doc.lucene.DocumentEnum
        |
        +--edu.columbia.cs.sdarts.backend.impls.xml.XMLDocumentEnum

public final class XMLDocumentEnum
extends DocumentEnum

An implementation of DocumentEnum that parses XML files for indexing by the Lucene search engine. There is no need to instantiate this class; XMLBackEndLSP uses it automatically. It reads the doc_style.xsl file to learn how to extract fields from each document. The SDARTS Design Document contains more information on how this works.

XSL processing is currently carried out by the Apache Xalan XSL processor, and all Xalan-related code is confined to this class. A future version could hide the Xalan code behind a separate interface to make it easier to switch to another XSL processor.
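The idea of hiding the XSL processor behind an interface can be sketched as follows. This is only an illustration, not SDARTS code: the names XslProcessor and JaxpXslProcessor are hypothetical, and the sketch assumes the standard JAXP API (javax.xml.transform), which delegates to whichever XSLT engine is installed, Xalan or otherwise.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical seam: callers depend on this interface, never on a specific XSL engine. */
interface XslProcessor {
    /** Applies the stylesheet this processor was built with to the given XML text. */
    String transform(String xml) throws TransformerException;
}

/** One possible implementation on top of JAXP (an assumption, not the SDARTS design). */
final class JaxpXslProcessor implements XslProcessor {
    // Compiled stylesheet; a Templates object can be reused for many documents.
    private final Templates templates;

    JaxpXslProcessor(String stylesheet) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        this.templates = factory.newTemplates(new StreamSource(new StringReader(stylesheet)));
    }

    @Override
    public String transform(String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        // A fresh Transformer per call, because Transformer instances are not thread-safe.
        templates.newTransformer().transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

With a seam like this, switching XSL processors would mean supplying another XslProcessor implementation, while the code that builds Lucene documents stays unchanged.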


Fields inherited from class edu.columbia.cs.sdarts.backend.doc.lucene.DocumentEnum
BATCH_SIZE
 
Constructor Summary
XMLDocumentEnum()
           
 
Method Summary
 com.lucene.document.Document createDocument(java.io.File f, org.omg.CORBA.IntHolder storeTokenCountHere)
          Builds a Lucene Document from an XML file
 
Methods inherited from class edu.columbia.cs.sdarts.backend.doc.lucene.DocumentEnum
getDocConfig, getDocuments, initialize, isEmpty, makeValue, parseDate
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XMLDocumentEnum

public XMLDocumentEnum()

Method Detail

createDocument

public com.lucene.document.Document createDocument(java.io.File f,
                                                   org.omg.CORBA.IntHolder storeTokenCountHere)
                                            throws BackEndException
Builds a Lucene Document from an XML file
Overrides:
createDocument in class DocumentEnum
Following copied from class: edu.columbia.cs.sdarts.backend.doc.lucene.DocumentEnum
Parameters:
file - the File to turn into a Lucene Document
storeTokenCountHere - an OUT parameter; an implementor of this method should write the number of tokens in the file into the value field of this IntHolder
Returns:
a Lucene Document generated from the file
Throws:
BackEndException - if something goes wrong
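
The OUT-parameter convention described for storeTokenCountHere can be illustrated with a small sketch. Everything here is an assumption made for illustration: IntHolder is a local stand-in that mirrors the public value field of org.omg.CORBA.IntHolder (which is absent from recent JDKs), and readAndCount is a hypothetical helper, not part of SDARTS or Lucene. Token counting by whitespace is also just a placeholder for whatever tokenization an implementor uses.

```java
import java.util.StringTokenizer;

/** Local stand-in for org.omg.CORBA.IntHolder: a mutable box with a public int field. */
final class IntHolder {
    public int value;
}

final class OutParamDemo {
    /**
     * Hypothetical helper showing the OUT-parameter pattern: the method returns
     * its main result normally and reports a second result (the token count)
     * by writing into the holder supplied by the caller.
     */
    static String readAndCount(String text, IntHolder storeTokenCountHere) {
        // Whitespace-delimited tokens; a real implementation may tokenize differently.
        storeTokenCountHere.value = new StringTokenizer(text).countTokens();
        return text;
    }
}
```

A caller creates the holder, passes it in, and reads the count from its value field after the call returns.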
