org.apache.hadoop.streaming
Class StreamXmlRecordReader
java.lang.Object
org.apache.hadoop.streaming.StreamBaseRecordReader
org.apache.hadoop.streaming.StreamXmlRecordReader
- All Implemented Interfaces:
- RecordReader<Text,Text>
public class StreamXmlRecordReader
- extends StreamBaseRecordReader
A way to interpret XML fragments as Mapper input records.
Values are XML subtrees delimited by configurable tags.
Keys could be the value of a certain attribute in the XML subtree,
but this is left to the stream processor application.
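Since key extraction is left to the application, a stream processor might derive the key from the record value itself. The following is a minimal sketch of that idea, not part of Hadoop: the class name `AttributeKeySketch`, the method `attributeValue`, and the sample `<page id="...">` record are all hypothetical. It uses a regular expression rather than an XML parser, so it only handles simple double-quoted attributes and returns the first textual match anywhere in the record.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Hadoop: derives a key from a record value. */
class AttributeKeySketch {

    /**
     * Returns the value of the first attrName="..." occurrence in the record,
     * or null if there is none.
     */
    static String attributeValue(String record, String attrName) {
        // \b keeps "id" from matching inside a longer name such as "pageid".
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(attrName) + "\\s*=\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(record);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String record = "<page id=\"42\" lang=\"en\"><title>Hadoop</title></page>";
        System.out.println(attributeValue(record, "id")); // prints 42
    }
}
```

An application that needs namespace handling, single-quoted attributes, or entity decoding would be better served by a real XML parser applied to each record.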
The name-value properties that StreamXmlRecordReader understands are:
String begin (chars marking beginning of record)
String end (chars marking end of record)
int maxrec (maximum record size)
int lookahead (maximum lookahead to sync CDATA)
boolean slowmatch
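To make the roles of begin and end concrete, here is a minimal sketch of tag-delimited record extraction over an in-memory string. It is not the reader's actual implementation: the class `XmlRecordSplitSketch` and the method `split` are hypothetical, the sketch ignores maxrec, lookahead, and slowmatch entirely, and keeping both marks in each value is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not Hadoop code: cuts text into begin..end records. */
class XmlRecordSplitSketch {

    /** Returns every substring that starts at a begin mark and ends with the next end mark. */
    static List<String> split(String input, String begin, String end) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = input.indexOf(begin, pos);
            if (start < 0) {
                break; // no further record start
            }
            int stop = input.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // unterminated record: dropped in this sketch
            }
            stop += end.length(); // keep the end mark in the value
            records.add(input.substring(start, stop));
            pos = stop;
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<doc><page id=\"1\">a</page>\n<page id=\"2\">b</page></doc>";
        for (String record : split(xml, "<page", "</page>")) {
            System.out.println(record);
        }
    }
}
```

Note that the begin mark here is `<page` without the closing bracket, so that start tags carrying attributes still match.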
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
StreamXmlRecordReader
public StreamXmlRecordReader(FSDataInputStream in,
FileSplit split,
Reporter reporter,
JobConf job,
FileSystem fs)
throws IOException
- Throws:
IOException
init
public void init()
throws IOException
- Throws:
IOException
next
public boolean next(Text key,
Text value)
throws IOException
- Description copied from class:
StreamBaseRecordReader
- Read a record. Implementations should call numRecStats at the end.
- Specified by:
next in interface RecordReader<Text,Text>
- Specified by:
next in class StreamBaseRecordReader
- Parameters:
key - the key to read data into
value - the value to read data into
- Returns:
- true iff a key/value was read, false if at EOF
- Throws:
IOException
seekNextRecordBoundary
public void seekNextRecordBoundary()
throws IOException
- Description copied from class:
StreamBaseRecordReader
- Implementations should seek the input stream in_ forward to the first byte of the next record.
The initial byte offset in the stream is arbitrary.
- Specified by:
seekNextRecordBoundary in class StreamBaseRecordReader
- Throws:
IOException
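The boundary-seeking contract can be illustrated on a plain byte array: starting from an arbitrary offset, possibly in the middle of a record, scan forward to the first byte of the next begin mark. This is a sketch under stated assumptions, not the class's implementation. The names `SeekBoundarySketch` and `seekNextBoundary` are hypothetical, the scan works on an in-memory buffer instead of a stream, and it does not account for CDATA sections.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch, not Hadoop code: finds the next record start in a byte buffer. */
class SeekBoundarySketch {

    /**
     * Returns the offset of the first occurrence of mark at or after from,
     * or -1 if the mark does not occur before the end of data.
     */
    static int seekNextBoundary(byte[] data, int from, byte[] mark) {
        for (int i = Math.max(from, 0); i + mark.length <= data.length; i++) {
            int j = 0;
            while (j < mark.length && data[i + j] == mark[j]) {
                j++;
            }
            if (j == mark.length) {
                return i; // first byte of the next record
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "<page>a</page><page>b</page>".getBytes(StandardCharsets.UTF_8);
        byte[] mark = "<page>".getBytes(StandardCharsets.UTF_8);
        // Offset 3 falls in the middle of the first record.
        System.out.println(seekNextBoundary(data, 3, mark)); // prints 14
    }
}
```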
Copyright © 2009 The Apache Software Foundation