|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||
java.lang.Objectorg.apache.hadoop.mapreduce.InputFormat<K,V>
public abstract class InputFormat<K,V>
InputFormat describes the input-specification for a
Map-Reduce job.
The Map-Reduce framework relies on the InputFormat of the
job to:
InputSplits, each of
which is then assigned to an individual Mapper.
RecordReader implementation to be used to glean
input records from the logical InputSplit for processing by
the Mapper.
The default behavior of file-based InputFormats, typically
sub-classes of FileInputFormat, is to split the
input into logical InputSplits based on the total size, in
bytes, of the input files. However, the FileSystem blocksize of
the input files is treated as an upper bound for input splits. A lower bound
on the split size can be set via
mapred.min.split.size.
Clearly, logical splits based on input-size is insufficient for many
applications since record boundaries are to respected. In such cases, the
application has to also implement a RecordReader on whom lies the
responsibility to respect record-boundaries and present a record-oriented
view of the logical InputSplit to the individual task.
InputSplit,
RecordReader,
FileInputFormat| Constructor Summary | |
|---|---|
InputFormat()
|
|
| Method Summary | |
|---|---|
abstract RecordReader<K,V> |
createRecordReader(InputSplit split,
TaskAttemptContext context)
Create a record reader for a given split. |
abstract List<InputSplit> |
getSplits(JobContext context)
Logically split the set of input files for the job. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public InputFormat()
| Method Detail |
|---|
public abstract List<InputSplit> getSplits(JobContext context)
throws IOException,
InterruptedException
Each InputSplit is then assigned to an individual Mapper
for processing.
Note: The split is a logical split of the inputs and the
input files are not physically split into chunks. For e.g. a split could
be <input-file-path, start, offset> tuple. The InputFormat
also creates the RecordReader to read the InputSplit.
context - job configuration.
InputSplits for the job.
IOException
InterruptedException
public abstract RecordReader<K,V> createRecordReader(InputSplit split,
TaskAttemptContext context)
throws IOException,
InterruptedException
RecordReader.initialize(InputSplit, TaskAttemptContext) before
the split is used.
split - the split to be readcontext - the information about the task
IOException
InterruptedException
|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||