Hive is a popular tool to query data in HDFS. Oracle SQL Connector for HDFS can read data directly from a Hive table in version 2.0. Also, the command-line tool of Oracle SQL Connector for HDFS has been simplified greatly. The connector only needs the Hive table and Hive database name, specified as Hadoop configuration properties, to access data from a Hive table.
oracle.hadoop.exttab.sourceType=hive
oracle.hadoop.exttab.hive.tableName=<hive table name>
oracle.hadoop.exttab.hive.databaseName=<hive
database name>
A command-line tool creates an external table in Oracle
Database. This external table is queried
to access data in the Hive table. Since an
external table can be queried like any other table in the database, Oracle SQL
functionality is available to query the Hive table.
The name of the external table and database connection information are also specified as Hadoop configuration properties.
hadoop jar
$OSCH_HOME/jlib/orahdfs.jar \
oracle.hadoop.exttab.ExternalTable
-D oracle.hadoop.exttab.sourceType=hive
\
-D
oracle.hadoop.exttab.hive.tableName=<hive table name> \
-D
oracle.hadoop.exttab.hive.databaseName=<hive database name> \
-D
oracle.hadoop.connection.url=<database connection url> \
-D oracle.hadoop.exttab.tableName=<external
table name> \
….
-createTable
While creating the external table the Hive table data types are
automatically mapped to Oracle data types in the external table. Below are some sample mappings:
Hive Tinyint, Smallint, Int Oracle NUMBER
Hive Float, Double Oracle NUMBER
(default precision)
Hive Decimal Oracle NUMBER
Hive Boolean Oracle VARCHAR2(5)
Hive String Oracle VARCHAR2(4000)
Hive Timestamp Oracle TIMESTAMP