SDARTS

edu.columbia.cs.sdarts.backend.impls.text
Class TextDocumentEnum

java.lang.Object
  |
  +--edu.columbia.cs.sdarts.backend.doc.lucene.DocumentEnum
        |
        +--edu.columbia.cs.sdarts.backend.impls.text.TextDocumentEnum

public final class TextDocumentEnum
extends DocumentEnum

An implementation of DocumentEnum that parses plain text files for indexing by the Lucene search engine. There is no need to instantiate this class directly; the TextBackEndLSP uses it automatically. It extracts fields from each file by applying the regular expressions held in DocFieldDescriptors.
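The regex-driven field extraction described above can be illustrated with a minimal, self-contained sketch. It does not use the SDARTS or Lucene classes: FieldRule is a hypothetical stand-in for a DocFieldDescriptor (assumed here to pair a field name with a regular expression whose first capture group holds the value), and extractFields is an illustrative helper, not part of this API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldExtractionSketch {

    // Hypothetical stand-in for a DocFieldDescriptor: a field name plus a
    // regular expression whose first capture group holds the field value.
    static final class FieldRule {
        final String name;
        final Pattern pattern;

        FieldRule(String name, String regex) {
            this.name = name;
            // MULTILINE lets ^ and $ match at the start and end of each line.
            this.pattern = Pattern.compile(regex, Pattern.MULTILINE);
        }
    }

    // Applies every rule to the text and collects the first match of each one.
    // Rules that do not match contribute no entry to the result.
    static Map<String, String> extractFields(String text, FieldRule[] rules) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (FieldRule rule : rules) {
            Matcher m = rule.pattern.matcher(text);
            if (m.find()) {
                fields.put(rule.name, m.group(1).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String text = "Title: Metasearching basics\nAuthor: A. Writer\n\nBody text here.";
        FieldRule[] rules = {
            new FieldRule("title", "^Title:[ \\t]*(.*)$"),
            new FieldRule("author", "^Author:[ \\t]*(.*)$")
        };
        System.out.println(extractFields(text, rules));
    }
}
```

In a real back end, each extracted name and value pair would presumably become a field on the Lucene Document; the map here only stands in for that step.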


Fields inherited from class edu.columbia.cs.sdarts.backend.doc.lucene.DocumentEnum
BATCH_SIZE
 
Constructor Summary
TextDocumentEnum()
           
 
Method Summary
 com.lucene.document.Document createDocument(java.io.File f, org.omg.CORBA.IntHolder storeTokenCountHere)
          Builds a Lucene Document from a plain text file.
 
Methods inherited from class edu.columbia.cs.sdarts.backend.doc.lucene.DocumentEnum
getDocConfig, getDocuments, initialize, isEmpty, makeValue, parseDate
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TextDocumentEnum

public TextDocumentEnum()

Method Detail

createDocument

public com.lucene.document.Document createDocument(java.io.File f,
                                                   org.omg.CORBA.IntHolder storeTokenCountHere)
                                            throws BackEndException
Builds a Lucene Document from a plain text file.
Overrides:
createDocument in class DocumentEnum
Following copied from class: edu.columbia.cs.sdarts.backend.doc.lucene.DocumentEnum
Parameters:
f - the File to turn into a Lucene Document
storeTokenCountHere - an OUT parameter; an implementor of this method should write the number of tokens in the file into the value field of this IntHolder
Returns:
a Lucene Document generated from the file
Throws:
BackEndException - if something goes wrong
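
The OUT-parameter convention for storeTokenCountHere can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the nested IntHolder class is a local stand-in that mirrors the public value field of org.omg.CORBA.IntHolder (the CORBA package is absent from recent JDKs), buildDocument is a hypothetical method that returns a String in place of a Lucene Document, and whitespace splitting is an assumed tokenization that may differ from what the Lucene analyzer does.

```java
public class TokenCountSketch {

    // Local stand-in mirroring org.omg.CORBA.IntHolder, which exposes
    // a public int field named value.
    static final class IntHolder {
        public int value;
    }

    // Hypothetical analogue of createDocument: returns the "document"
    // (here just the trimmed text) and writes the token count into the
    // holder, since a Java method cannot return two values directly.
    static String buildDocument(String text, IntHolder storeTokenCountHere) {
        String trimmed = text.trim();
        // Assumed tokenization: split on runs of whitespace.
        storeTokenCountHere.value =
            trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return trimmed;
    }

    public static void main(String[] args) {
        IntHolder count = new IntHolder();
        String doc = buildDocument("plain text files for indexing", count);
        System.out.println(doc + " has " + count.value + " tokens");
    }
}
```

The caller allocates the holder, passes it in, and reads value after the call returns.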
