public class SolrContentHandler extends DefaultHandler implements ExtractingParams
The class responsible for handling Tika events and translating them into SolrInputDocuments.
This class is not thread-safe.
This class cannot be reused; you must create a new instance per document.
Users may wish to override this class to provide their own functionality.
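The one-instance-per-document rule follows from the handler accumulating state while SAX events arrive. A minimal sketch of that pattern, using only the JDK's SAX classes; `CatchAllHandler` and `extract` are hypothetical names for illustration and are not part of Solr:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a stateful SAX handler; not Solr's implementation.
class CatchAllHandler extends DefaultHandler {
    // Accumulates all character data seen during one parse.
    private final StringBuilder catchAll = new StringBuilder();

    @Override
    public void characters(char[] chars, int offset, int length) {
        catchAll.append(chars, offset, length);
    }

    @Override
    public void ignorableWhitespace(char[] chars, int offset, int length) {
        // Treated the same as any other characters.
        characters(chars, offset, length);
    }

    String text() {
        return catchAll.toString();
    }

    // Creates a fresh handler per document so state never leaks between parses.
    static String extract(String xml) {
        CatchAllHandler handler = new CatchAllHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.text();
    }

    public static void main(String[] args) {
        System.out.println(extract("<doc><p>Hello</p><p>World</p></doc>"));
    }
}
```

Because the builder is an instance field, sharing one handler across documents or threads would mix their content, which is why a fresh instance is created inside `extract`.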
| Modifier and Type | Field and Description |
|---|---|
| `protected boolean` | `captureAttribs` |
| `protected StringBuilder` | `catchAllBuilder` |
| `static String` | `contentFieldName` |
| `protected Collection<String>` | `dateFormats` |
| `protected String` | `defaultField` |
| `protected SolrInputDocument` | `document` |
| `protected Map<String,StringBuilder>` | `fieldBuilders` |
| `protected boolean` | `lowerNames` |
| `protected org.apache.tika.metadata.Metadata` | `metadata` |
| `protected SolrParams` | `params` |
| `protected IndexSchema` | `schema` |
| `protected String` | `unknownFieldPrefix` |
Fields inherited from interface ExtractingParams: BOOST_PREFIX, CAPTURE_ATTRIBUTES, CAPTURE_ELEMENTS, DEFAULT_FIELD, EXTRACT_FORMAT, EXTRACT_ONLY, IGNORE_TIKA_EXCEPTION, LITERALS_OVERRIDE, LITERALS_PREFIX, LOWERNAMES, MAP_PREFIX, PASSWORD_MAP_FILE, RESOURCE_NAME, RESOURCE_PASSWORD, STREAM_TYPE, UNKNOWN_FIELD_PREFIX, XPATH_EXPRESSION

| Constructor and Description |
|---|
| `SolrContentHandler(org.apache.tika.metadata.Metadata metadata, SolrParams params, IndexSchema schema)` |
| `SolrContentHandler(org.apache.tika.metadata.Metadata metadata, SolrParams params, IndexSchema schema, Collection<String> dateFormats)` |
| Modifier and Type | Method and Description |
|---|---|
| `protected void` | `addCapturedContent()` Add the per-field captured content to the Solr Document. |
| `protected void` | `addContent()` Add in the catch-all content to the field. |
| `protected void` | `addField(String fname, String fval, String[] vals)` |
| `protected void` | `addLiterals()` Add in the literals to the document using the params and the ExtractingParams.LITERALS_PREFIX. |
| `protected void` | `addMetadata()` Add in any metadata using metadata as the source. |
| `void` | `characters(char[] chars, int offset, int length)` |
| `void` | `endElement(String uri, String localName, String qName)` |
| `protected String` | `findMappedName(String name)` Get the name mapping. |
| `protected float` | `getBoost(String name)` Get the value of any boost factor for the mapped name. |
| `void` | `ignorableWhitespace(char[] chars, int offset, int length)` Treat the same as any other characters. |
| `SolrInputDocument` | `newDocument()` This is called by a consumer when it is ready to deal with a new SolrInputDocument. |
| `void` | `startElement(String uri, String localName, String qName, Attributes attributes)` |
| `protected String` | `transformValue(String val, SchemaField schFld)` Can be used to transform input values based on their SchemaField. This implementation only formats dates using the DateUtil. |
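The addLiterals() and getBoost(String) entries both read request parameters by prefix. A minimal sketch of that lookup, assuming the prefixes are `literal.` and `boost.` and that a missing boost falls back to 1.0; a plain `Map` stands in for SolrParams, and `PrefixedParams` is a hypothetical helper, not a Solr class:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; a plain Map stands in for SolrParams.
class PrefixedParams {
    static final String LITERALS_PREFIX = "literal."; // assumed prefix value
    static final String BOOST_PREFIX = "boost.";      // assumed prefix value

    // Collects every "literal.<field>" parameter as field -> value.
    static Map<String, String> literals(Map<String, String> params) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (e.getKey().startsWith(LITERALS_PREFIX)) {
                out.put(e.getKey().substring(LITERALS_PREFIX.length()), e.getValue());
            }
        }
        return out;
    }

    // Returns the boost for a field, or 1.0f (assumed default) when none is given.
    static float boost(Map<String, String> params, String name) {
        String raw = params.get(BOOST_PREFIX + name);
        return raw == null ? 1.0f : Float.parseFloat(raw);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", "doc1");
        params.put("boost.title", "2.5");
        System.out.println(literals(params) + " " + boost(params, "title"));
    }
}
```

Parameters without either prefix are simply ignored by both lookups in this sketch.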
Methods inherited from class DefaultHandler: endDocument, endPrefixMapping, error, fatalError, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning

public static final String contentFieldName
protected final SolrInputDocument document
protected final Collection<String> dateFormats
protected final org.apache.tika.metadata.Metadata metadata
protected final SolrParams params
protected final StringBuilder catchAllBuilder
protected final IndexSchema schema
protected final Map<String,StringBuilder> fieldBuilders
protected final boolean captureAttribs
protected final boolean lowerNames
protected final String unknownFieldPrefix
protected final String defaultField
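The lowerNames, unknownFieldPrefix, and defaultField fields suggest how an incoming name might be resolved to a schema field. The following is a rough sketch under stated assumptions, not the class's actual logic: it assumes lowercasing also replaces non-alphanumeric characters with underscores, that explicit mappings are applied next, and that the unknown-field prefix is tried before the default field. `FieldNameResolver` and its parameters are hypothetical, and a `Set` stands in for IndexSchema:

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of field-name resolution; a Set stands in for IndexSchema.
class FieldNameResolver {
    // Lowercases letters and digits; every other character becomes '_' (assumed rule).
    static String lowerName(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char ch = name.charAt(i);
            sb.append(Character.isLetterOrDigit(ch) ? Character.toLowerCase(ch) : '_');
        }
        return sb.toString();
    }

    static String resolve(String name, boolean lowerNames, Map<String, String> mappings,
                          Set<String> schemaFields, String unknownFieldPrefix,
                          String defaultField) {
        String n = lowerNames ? lowerName(name) : name;
        n = mappings.getOrDefault(n, n);          // explicit mapping, if any
        if (schemaFields.contains(n)) {
            return n;                             // known to the schema
        }
        if (!unknownFieldPrefix.isEmpty()) {
            return unknownFieldPrefix + n;        // e.g. route to a dynamic field
        }
        if (!defaultField.isEmpty()) {
            return defaultField;                  // last-resort fallback
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(resolve("Content-Type", true,
                Map.of("content_type", "mime"), Set.of("mime"), "ignored_", ""));
    }
}
```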
public SolrContentHandler(org.apache.tika.metadata.Metadata metadata,
SolrParams params,
IndexSchema schema)
public SolrContentHandler(org.apache.tika.metadata.Metadata metadata,
SolrParams params,
IndexSchema schema,
Collection<String> dateFormats)
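The second constructor accepts a collection of date format patterns. A minimal sketch of how such patterns might be tried in order to normalize a value to ISO 8601 in UTC; it swaps in the JDK's `SimpleDateFormat` for Solr's DateUtil, and `DateNormalizer` is a hypothetical name used only for illustration:

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Collection;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch; uses SimpleDateFormat rather than Solr's DateUtil.
class DateNormalizer {
    // Returns an ISO 8601 UTC timestamp if any pattern matches the whole value,
    // otherwise returns the value unchanged.
    static String normalize(String val, Collection<String> dateFormats) {
        for (String pattern : dateFormats) {
            SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
            fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
            fmt.setLenient(false);
            ParsePosition pos = new ParsePosition(0);
            Date parsed = fmt.parse(val, pos);    // null when the pattern does not match
            if (parsed != null && pos.getIndex() == val.length()) {
                return parsed.toInstant().toString();
            }
        }
        return val;
    }

    public static void main(String[] args) {
        List<String> formats = List.of("yyyy-MM-dd'T'HH:mm:ss", "yyyy-MM-dd");
        System.out.println(normalize("2015-03-01", formats));
    }
}
```

Checking that the parse consumed the entire input avoids accepting a value whose prefix merely happens to match a shorter pattern.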
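public SolrInputDocument newDocument()
This is called by a consumer when it is ready to deal with a new SolrInputDocument.
Returns: The SolrInputDocument.
See Also: addMetadata(), addCapturedContent(), addContent(), addLiterals()

protected void addCapturedContent()
Add the per-field captured content to the Solr Document. Default implementation uses the fieldBuilders info.

protected void addContent()
Add in the catch-all content to the field. Default implementation uses the contentFieldName and the catchAllBuilder.

protected void addLiterals()
Add in the literals to the document using the params and the ExtractingParams.LITERALS_PREFIX.

protected void addMetadata()
Add in any metadata using metadata as the source.

public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException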
Specified by: startElement in interface ContentHandler
Overrides: startElement in class DefaultHandler
Throws: SAXException

public void endElement(String uri, String localName, String qName) throws SAXException
Specified by: endElement in interface ContentHandler
Overrides: endElement in class DefaultHandler
Throws: SAXException

public void characters(char[] chars, int offset, int length) throws SAXException
Specified by: characters in interface ContentHandler
Overrides: characters in class DefaultHandler
Throws: SAXException

public void ignorableWhitespace(char[] chars, int offset, int length) throws SAXException
Treat the same as any other characters.
Specified by: ignorableWhitespace in interface ContentHandler
Overrides: ignorableWhitespace in class DefaultHandler
Throws: SAXException

protected String transformValue(String val, SchemaField schFld)
Can be used to transform input values based on their SchemaField. This implementation only formats dates using the DateUtil.
Parameters:
val - The value to transform
schFld - The SchemaField

protected float getBoost(String name)
Get the value of any boost factor for the mapped name.
Parameters:
name - The name of the field to see if there is a boost specified

Copyright © 2000-2015 Apache Software Foundation. All Rights Reserved.