@InterfaceAudience.Public @InterfaceStability.Stable public class DBInputFormat<T extends DBWritable> extends InputFormat<LongWritable,T> implements Configurable
DBInputFormat emits LongWritables containing the record number as key and DBWritables as value. The SQL query and input class can be set using one of the two setInput methods.
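To make the arguments of the table-based setInput overload more concrete, here is a minimal Java sketch of how a table name, conditions, an ordering and field names could be combined into a SELECT statement and a matching count query. QuerySketch, selectQuery and countQuery are hypothetical names used only for illustration. The exact SQL that DBInputFormat generates is not stated on this page, so the query layout shown here is an assumption.

```java
public class QuerySketch {

    // Hypothetical helper: assembles a SELECT statement from the same pieces
    // that the table-based setInput overload accepts. Not Hadoop's own code.
    public static String selectQuery(String tableName, String conditions,
                                     String orderBy, String... fieldNames) {
        StringBuilder sql = new StringBuilder("SELECT ");
        // Fall back to "*" when no field names are given (an assumption).
        sql.append(fieldNames.length == 0 ? "*" : String.join(", ", fieldNames));
        sql.append(" FROM ").append(tableName);
        if (conditions != null && !conditions.isEmpty()) {
            sql.append(" WHERE (").append(conditions).append(")");
        }
        if (orderBy != null && !orderBy.isEmpty()) {
            sql.append(" ORDER BY ").append(orderBy);
        }
        return sql.toString();
    }

    // Hypothetical helper: a count query over the same table and conditions.
    public static String countQuery(String tableName, String conditions) {
        StringBuilder sql = new StringBuilder("SELECT COUNT(*) FROM ").append(tableName);
        if (conditions != null && !conditions.isEmpty()) {
            sql.append(" WHERE (").append(conditions).append(")");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(selectQuery("Mytable", "length > 0", "f1", "f1", "f2", "f3"));
        System.out.println(countQuery("Mytable", "length > 0"));
    }
}
```

The query-based setInput overload skips this assembly step: the caller supplies both the input query and the count query as complete strings.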
| Modifier and Type | Field and Description | 
|---|---|
| protected String | conditions | 
| protected Connection | connection | 
| protected DBConfiguration | dbConf | 
| protected String | dbProductName | 
| protected String[] | fieldNames | 
| protected String | tableName | 
| Constructor and Description | 
|---|
| DBInputFormat() | 
| Modifier and Type | Method and Description | 
|---|---|
| protected void | closeConnection() | 
| Connection | createConnection() | 
| protected RecordReader<LongWritable,T> | createDBRecordReader(org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit split, Configuration conf) |
| RecordReader<LongWritable,T> | createRecordReader(InputSplit split, TaskAttemptContext context): Create a record reader for a given split. |
| Configuration | getConf(): Return the configuration used by this object. |
| Connection | getConnection() |
| protected String | getCountQuery(): Returns the query for getting the total number of rows, subclasses can override this for custom behaviour. |
| DBConfiguration | getDBConf() |
| String | getDBProductName() |
| List<InputSplit> | getSplits(JobContext job): Logically split the set of input files for the job. |
| void | setConf(Configuration conf): Set the configuration to be used by this object. |
| static void | setInput(Job job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery): Initializes the map-part of the job with the appropriate input settings. |
| static void | setInput(Job job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames): Initializes the map-part of the job with the appropriate input settings. |
protected String dbProductName
protected String conditions
protected Connection connection
protected String tableName
protected String[] fieldNames
protected DBConfiguration dbConf
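public void setConf(Configuration conf)
Set the configuration to be used by this object.
Specified by: setConf in interface Configurable
Parameters:
conf - configuration to be used

public Configuration getConf()
Return the configuration used by this object.
Specified by: getConf in interface Configurable

public DBConfiguration getDBConf()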
public Connection getConnection()
public Connection createConnection()
public String getDBProductName()
protected RecordReader<LongWritable,T> createDBRecordReader(org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit split, Configuration conf) throws IOException
Throws: IOException

public RecordReader<LongWritable,T> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException
Create a record reader for a given split. The framework will call RecordReader.initialize(InputSplit, TaskAttemptContext) before the split is used.
Specified by: createRecordReader in class InputFormat<LongWritable,T extends DBWritable>
Parameters:
split - the split to be read
context - the information about the task
Throws: IOException, InterruptedException

public List<InputSplit> getSplits(JobContext job) throws IOException
Logically split the set of input files for the job. Each InputSplit is then assigned to an individual Mapper for processing.
Note: The split is a logical split of the inputs and the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple. The InputFormat also creates the RecordReader to read the InputSplit.
Specified by: getSplits in class InputFormat<LongWritable,T extends DBWritable>
Parameters:
job - job configuration.
Returns: InputSplits for the job.
Throws: IOException

protected String getCountQuery()
Returns the query for getting the total number of rows, subclasses can override this for custom behaviour.
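The logical splitting described for getSplits can be pictured with a small Java sketch: given a total row count (such as the result of running the count query) and a desired number of splits, it produces row ranges without touching the data. SplitSketch and splitRows are hypothetical names. This page does not say how DBInputFormat divides rows, so the even division with the last split absorbing the remainder is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // Hypothetical helper: divides rowCount rows into numSplits half-open
    // ranges [start, end). Each long[] holds {start, end}.
    // Assumption: the last range takes any rows left over by integer division.
    public static List<long[]> splitRows(long rowCount, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        long chunkSize = rowCount / numSplits;
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            long start = i * chunkSize;
            long end = (i + 1 == numSplits) ? rowCount : start + chunkSize;
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 10 rows over 3 splits: [0, 3), [3, 6), [6, 10)
        for (long[] r : splitRows(10, 3)) {
            System.out.println("rows [" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

Each range is only a description of which rows a Mapper would read. In this sketch nothing is copied or moved, which mirrors the note that splits are logical and not physical.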
public static void setInput(Job job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames)
Initializes the map-part of the job with the appropriate input settings.
Parameters:
job - The map-reduce job
inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
tableName - The table to read data from
conditions - The condition with which to select data, e.g. '(updated > 20070101 AND length > 0)'
orderBy - the fieldNames in the orderBy clause.
fieldNames - The field names in the table
See Also: setInput(Job, Class, String, String)

public static void setInput(Job job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery)
Initializes the map-part of the job with the appropriate input settings.
Parameters:
job - The map-reduce job
inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
inputQuery - the input query to select fields. Example: "SELECT f1, f2, f3 FROM Mytable ORDER BY f1"
inputCountQuery - the input query that returns the number of records in the table. Example: "SELECT COUNT(f1) FROM Mytable"
See Also: setInput(Job, Class, String, String, String, String...)

protected void closeConnection()
Copyright © 2024 Apache Software Foundation. All rights reserved.