This blog describes the steps necessary to configure Oracle
SQL Connector for Hadoop Distributed File System (OSCH) to
enable Oracle Database running on Oracle Solaris to access and
analyze data residing in Cloudera HDFS running on an Oracle
Big Data Appliance. The steps have been verified against a
Cloudera Distribution including Apache Hadoop running in the Oracle
Big Data Lite Virtual Machine. My configuration adventures
regarding the Oracle Big Data Lite Virtual Machine and VirtualBox
are shown in this
previous blog entry.
Although similar steps are expected to work on many similar hardware and software configurations, this document has been tested using the two configurations summarized at the end of this post.
Part A: Preparing the initial environment
Step 1: Oracle Database 12c deployment
* Follow the typical database deployment guides:
Oracle Database Online Documentation 12c Release 1 (12.1)
-> Installing and Upgrading
--> Under "Solaris Installation Guides", follow the guide that suits your needs:
---> Grid Infrastructure Installation Guide for Oracle Solaris
---> Database Installation Guide for Oracle Solaris
---> Database Quick Installation Guide for Oracle Solaris on SPARC (64-Bit)
Step 2: Oracle Big Data deployment
* For the Big Data Appliance, follow the instructions in the Oracle Big Data Documentation.
* For the Oracle Big Data Lite Virtual Machine, follow the Quick Deployment Step-by-step instructions. Also, see my configuration adventures recounted in this previous blog entry.
Part B: Enable the Cloudera HDFS client on Solaris
* It is assumed that both the Oracle database and the Hadoop cluster are up and running independently before you attempt to install the connector.
Step 1: Make sure that Java is installed on Solaris
$ sudo pkg install --accept jdk-7
$ /usr/java/bin/java -version
java version "1.7.0_65"
Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
Java HotSpot(TM) Server VM (build 24.65-b04, mixed mode)
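If you are scripting the setup, the version check can be automated. A minimal sketch; the `is_java7` helper name is mine, for illustration, not part of any Oracle tooling:

```shell
# is_java7: succeed only when the first line of `java -version` output
# reports a 1.7 release. (Helper name is mine, for illustration.)
is_java7() {
  case "$1" in
    *\"1.7.*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Example against the output shown above:
is_java7 'java version "1.7.0_65"' && echo "Java 7 detected"
```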
Step 2: Determine the Version of Cloudera that is running on your Big Data Appliance:
* Starting with the BDA up and running:
* Click on the "Support" pull-down, then "About":
* In my case, I was using Cloudera Express 5.1.2
Step 3: On the Oracle Database servers, install the version of Hadoop that matches your cluster
* Visit Cloudera Downloads, click on "CDH Download Now", and choose your version. In my case, CDH 5.1.2.
* This took me to the Packaging and Tarballs page for all of the CDH components. Download the Apache Hadoop tarball:
Place this tarball onto your Solaris server or, in the case of Oracle RAC, onto every node. There are many ways to get the tarball onto your Solaris server; in my case, the most convenient method was wget:
$ export http_proxy=http://the-proxy-server.bigcorp.com:80
$ wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.1.2.tar.gz
$ scp hadoop-2.3.0-cdh5.1.2.tar.gz oracle@my-rac-node-2:
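With more than two RAC nodes, the copy step is easier to script. A sketch with placeholder hostnames; `COPY` defaults to a dry run that only prints the commands, so switch it to plain `scp` for the real transfer:

```shell
# Distribute the CDH tarball to every RAC node. Hostnames are placeholders.
TARBALL=hadoop-2.3.0-cdh5.1.2.tar.gz
NODES="my-rac-node-1 my-rac-node-2"
COPY="echo scp"   # dry run; set COPY=scp to actually transfer
for node in $NODES; do
  $COPY "$TARBALL" "oracle@${node}:"
done
```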
Unpack the tarball on your Solaris server, or on every RAC node:
$ tar xzf ../Downloads/hadoop-2.3.0-cdh5.1.2.tar.gz
And verify:
$ ls hadoop-2.3.0-cdh5.1.2
bin examples libexec
bin-mapreduce1 examples-mapreduce1 sbin
cloudera include share
etc lib src
Step 4: Download the hdfs configuration files to the Solaris RDBMS server
* In the Cloudera Manager, go to the hdfs page:
* From the hdfs page, download the client configuration, then place it on the Solaris server(s):
* Unpack the client configuration on your Solaris server(s):
$ unzip ./Downloads/hdfs-clientconfig.zip
$ ls hadoop-conf
core-site.xml hdfs-site.xml topology.map
hadoop-env.sh log4j.properties topology.py
Step 5: Configure the Hadoop client software on the Solaris server(s)
* Edit hadoop-conf/hadoop-env.sh to set JAVA_HOME correctly:
export JAVA_HOME=/usr/jdk/instances/jdk1.7.0
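This edit can also be made non-interactively. A sketch; it operates on a throwaway temp file here so it is safe to try as-is, but in practice point `CONF` at your real `hadoop-conf/hadoop-env.sh`:

```shell
# Set JAVA_HOME in a hadoop-env.sh, replacing an existing export or
# appending one. CONF points at a throwaway copy for illustration.
CONF=$(mktemp)
echo '# JAVA_HOME is not set by default' > "$CONF"
JH=/usr/jdk/instances/jdk1.7.0
if grep -q '^export JAVA_HOME=' "$CONF"; then
  # Rewrite the existing setting (avoiding non-portable `sed -i`).
  sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$JH|" "$CONF" > "$CONF.tmp" \
    && mv "$CONF.tmp" "$CONF"
else
  echo "export JAVA_HOME=$JH" >> "$CONF"
fi
```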
* Move the configuration files into place:
$ cp hadoop-conf/* hadoop-2.3.0-cdh5.1.2/etc/hadoop/
* Add the Hadoop bin directory to your PATH. You may want to do this in your login shell or .bashrc:
$ export PATH=~/hadoop-2.3.0-cdh5.1.2/bin:$PATH
Step 6: Test the Hadoop client software on Solaris
* Verify that the remote hdfs filesystem is visible from your Solaris server(s)
$ hdfs dfs -ls
14/12/12 09:35:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 6 items
drwx------ - oracle oracle 0 2014-08-25 02:55 .Trash
drwx------ - oracle oracle 0 2014-09-23 10:25 .staging
drwxr-xr-x - oracle oracle 0 2014-01-12 15:15 moviedemo
drwxr-xr-x - oracle oracle 0 2014-09-24 06:38 moviework
drwxr-xr-x - oracle oracle 0 2014-09-08 12:50 oggdemo
drwxr-xr-x - oracle oracle 0 2014-09-20 10:59 oozie-oozi
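The same check can be wrapped for use in scripts. A sketch; `check_hdfs` is my own helper name, and it assumes the client binary is on PATH as set in Step 5:

```shell
# check_hdfs: report whether the named hdfs client can list the remote
# filesystem root. (Helper name is mine, for illustration.)
check_hdfs() {
  if "$1" dfs -ls / >/dev/null 2>&1; then
    echo "HDFS reachable"
  else
    echo "HDFS NOT reachable"
  fi
}

check_hdfs hdfs
```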
Part C: Install "Oracle SQL Connector for HDFS" (OSCH) on your Solaris server(s)
Step 1: Download OSCH
* Download Oracle SQL Connector for Hadoop Distributed File System Release 3.1.0 from the Oracle Big Data Connectors Downloads page.
* Unzip OSCH
$ unzip oraosch-3.1.0.zip
Archive: oraosch-3.1.0.zip
extracting: orahdfs-3.1.0.zip
inflating: README.txt
$ unzip orahdfs-3.1.0
Archive: orahdfs-3.1.0.zip
replace orahdfs-3.1.0/doc/README.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
inflating: orahdfs-3.1.0/doc/README.txt
inflating: orahdfs-3.1.0/jlib/ojdbc6.jar
inflating: orahdfs-3.1.0/jlib/osdt_cert.jar
inflating: orahdfs-3.1.0/jlib/orahdfs.jar
inflating: orahdfs-3.1.0/jlib/oraclepki.jar
inflating: orahdfs-3.1.0/jlib/osdt_core.jar
inflating: orahdfs-3.1.0/jlib/ora-hadoop-common.jar
inflating: orahdfs-3.1.0/bin/hdfs_stream
inflating: orahdfs-3.1.0/examples/sql/mkhive_unionall_view.sql
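Before proceeding, it is worth confirming that the pieces OSCH needs actually landed on disk, especially when repeating this on several RAC nodes. A sketch, using file names from the listing above; `OSCH_DIR` is a placeholder for wherever you unpacked the kit:

```shell
# Verify the unpacked OSCH kit; prints one line per missing file.
OSCH_DIR=${OSCH_DIR:-$HOME/orahdfs-3.1.0}
for f in jlib/orahdfs.jar jlib/ora-hadoop-common.jar bin/hdfs_stream; do
  [ -f "$OSCH_DIR/$f" ] || echo "missing: $f"
done
```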
Step 2: Install OSCH
* Follow the instructions in the Connectors User's Guide, which is available as part of the Oracle Big Data Documentation. Also see
Getting Started with Oracle Big Data Connectors.
* I'm tempted to cut & paste everything from "1.4.3 Installing Oracle SQL Connector for HDFS" into this document, but I won't.
* Your mileage may vary, but for me, it looked like this:
$ tail -1 /var/opt/oracle/oratab
dbmc1:/u01/app/oracle/product/12.1.0.2/dbhome_1:N # line added by Agent
$ export PATH=/usr/local/bin:$PATH
$ export ORACLE_SID=dbmc1
$ . /usr/local/bin/oraenv
ORACLE_SID = [dbmc1] ?
The Oracle base has been set to /u01/app/oracle
$ env | grep ORA
ORACLE_SID=dbmc1
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1
$ srvctl status database -d dbmc1
Instance dbmc11 is running on node etc20dbadm01
Instance dbmc12 is running on node etc20dbadm02
$ export OSCH_HOME=/export/home/oracle/orahdfs-3.1.0
$ export HADOOP_CLASSPATH=$OSCH_HOME/jlib/*
$ sqlplus / as sysdba
SQL> CREATE USER hdfsuser IDENTIFIED BY n0ne0fyourBusiness
DEFAULT TABLESPACE users
QUOTA UNLIMITED ON users;
SQL> GRANT CREATE SESSION, CREATE TABLE, CREATE VIEW TO hdfsuser;
SQL> GRANT EXECUTE ON sys.utl_file TO hdfsuser;
SQL> CREATE OR REPLACE DIRECTORY osch_bin_path AS '/export/home/oracle/orahdfs-3.1.0/bin';
SQL> GRANT READ, EXECUTE ON DIRECTORY osch_bin_path TO hdfsuser;
Notice that MOVIEDEMO_DIR needs to be on shared storage that is visible to both RAC nodes. From a Solaris shell prompt, create the MOVIEDEMO_DIR, substituting your ZFS_SA InfiniBand hostname for XXX, below. Then allow the database user to access the directory:
$ mkdir /net/XXX/export/test/hdfsuser
$ sqlplus / as sysdba
SQL> CREATE OR REPLACE DIRECTORY MOVIEDEMO_DIR AS '/net/XXX/export/test/hdfsuser';
SQL> GRANT READ, WRITE ON DIRECTORY MOVIEDEMO_DIR TO hdfsuser;
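For convenience, the environment this walk-through depends on can be collected into one sourceable snippet. Every value below is the example value used above (SID, paths, CDH version), so substitute your own:

```shell
# Environment for the OSCH walk-through (example values from above).
export ORACLE_SID=dbmc1
export OSCH_HOME=/export/home/oracle/orahdfs-3.1.0
# Quotes keep the glob literal; Hadoop expands it at class-load time.
export HADOOP_CLASSPATH="$OSCH_HOME/jlib/*"
export PATH=/usr/local/bin:$HOME/hadoop-2.3.0-cdh5.1.2/bin:$PATH
```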
Step 3: Test using the Movie Demo
Test using the movie demo, which is documented in the Big Data Connectors User's Guide. Cut and paste moviefact_hdfs.sh and moviefact_hdfs.xml from Example 2-1, "Accessing HDFS Data Files from Oracle Database".
In moviefact_hdfs.sh, I needed to change the paths to OSCH_HOME and moviefact_hdfs.xml for my configuration.
In moviefact_hdfs.xml, I needed to change two properties, as follows. For the database connection, use the Oracle Single Client Access Name (SCAN).
<property>
<name>oracle.hadoop.connection.url</name>
<value>jdbc:oracle:thin:@sc-scan:1521/dbmc1</value>
</property>
<property>
<name>oracle.hadoop.connection.user</name>
<value>hdfsuser</value>
</property>
Run the script:
$ sh moviefact_hdfs.sh
Oracle SQL Connector for HDFS Release 3.1.0 - Production
Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.
[Enter Database Password:]
14/12/12 12:36:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The create table command succeeded.
CREATE TABLE "HDFSUSER"."MOVIE_FACTS_EXT"
(
"CUST_ID" VARCHAR2(4000),
"MOVIE_ID" VARCHAR2(4000),
"GENRE_ID" VARCHAR2(4000),
"TIME_ID" TIMESTAMP(9),
"RECOMMENDED" NUMBER,
"ACTIVITY_ID" NUMBER,
"RATING" NUMBER,
"SALES" NUMBER
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY "MOVIEDEMO_DIR"
ACCESS PARAMETERS
(
RECORDS DELIMITED BY 0X'0A'
CHARACTERSET AL32UTF8
PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
FIELDS TERMINATED BY 0X'09'
MISSING FIELD VALUES ARE NULL
(
"CUST_ID" CHAR(4000),
"MOVIE_ID" CHAR(4000),
"GENRE_ID" CHAR(4000),
"TIME_ID" CHAR DATE_FORMAT TIMESTAMP MASK 'YYYY-MM-DD:HH:MI:SS',
"RECOMMENDED" CHAR,
"ACTIVITY_ID" CHAR,
"RATING" CHAR,
"SALES" CHAR
)
)
LOCATION
(
'osch-20141212123602-2522-1',
'osch-20141212123602-2522-2',
'osch-20141212123602-2522-3',
'osch-20141212123602-2522-4'
)
) PARALLEL REJECT LIMIT UNLIMITED;
The following location files were created.
osch-20141212123602-2522-1 contains 1 URI, 12754882 bytes
12754882 hdfs://ofsaa-bdl.us.oracle.com:8020/user/oracle/moviework/data/part-00001
osch-20141212123602-2522-2 contains 1 URI, 438 bytes
438 hdfs://ofsaa-bdl.us.oracle.com:8020/user/oracle/moviework/data/part-00002
osch-20141212123602-2522-3 contains 1 URI, 432 bytes
432 hdfs://ofsaa-bdl.us.oracle.com:8020/user/oracle/moviework/data/part-00003
osch-20141212123602-2522-4 contains 1 URI, 202 bytes
202 hdfs://ofsaa-bdl.us.oracle.com:8020/user/oracle/moviework/data/part-00004
Final verification with SQL Developer:
![test.png]()
| Tested Configuration 1 | | |
| --- | --- | --- |
| RDBMS | Connector | Hadoop |
| | | |

| Tested Configuration 2 | | |
| --- | --- | --- |
| RDBMS | Connector | Hadoop |
| | | |