
Oracle SQL Connector for HDFS on SPARC

This blog describes the steps necessary to configure Oracle SQL Connector for Hadoop Distributed File System (OSCH) to enable Oracle Database running on Oracle Solaris to access and analyze data residing in Cloudera HDFS running on an Oracle Big Data Appliance. The steps have been verified against a Cloudera Distribution including Apache Hadoop running in the Oracle Big Data Lite Virtual Machine. My configuration adventures regarding the Oracle Big Data Lite Virtual Machine and VirtualBox are shown in this previous blog entry.

Although similar steps are expected to work on other, similar hardware and software combinations, this document has been tested using the following two configurations:

Tested Configuration 1

RDBMS
  • Oracle SPARC T4-2 Server
  • Oracle Solaris 11.2
  • Oracle Database 12c Enterprise Edition Release 12.1.0.2.0

Connector
  • Oracle SQL Connector for Hadoop Distributed File System v3.1.0

Hadoop
  • Oracle Big Data Appliance
  • Oracle Enterprise Linux 6.4
  • Cloudera Manager (5.1.0)
  • Cloudera Distribution including Apache Hadoop (CDH 5.1.0)



Tested Configuration 2

RDBMS
  • SPARC SuperCluster T4-4
  • Oracle Solaris 11.1
  • Oracle Database 12c Enterprise Edition Release 12.1.0.2.0

Connector
  • Oracle SQL Connector for Hadoop Distributed File System v3.1.0

Hadoop
  • Oracle Big Data Lite Virtual Machine 4.0.1
  • Oracle Enterprise Linux 6.4
  • Cloudera Manager (5.1.2)
  • Cloudera Distribution including Apache Hadoop (CDH 5.1.2)


        Part A: Preparing the initial environment

        Step 1: Oracle Database 12c deployment

        * Follow the typical database deployment guides:

        Oracle Database Online Documentation 12c Release 1 (12.1)
        -> Installing and Upgrading
--> Under "Solaris Installation Guides", follow the guide that suits your needs:
        ---> Grid Infrastructure Installation Guide for Oracle Solaris
        ---> Database Installation Guide for Oracle Solaris
        ---> Database Quick Installation Guide for Oracle Solaris on SPARC (64-Bit)


Step 2: Oracle Big Data deployment

        * For the Big Data Appliance, follow the instructions in the Oracle Big Data Documentation.

        * For the Oracle Big Data Lite Virtual Machine, follow the Quick Deployment Step-by-step instructions. Also, see my configuration adventures recounted in this previous blog entry.


        Part B: Enable the Cloudera HDFS client on Solaris

* It is assumed that both the Oracle database and the Hadoop cluster are up and running independently before you attempt to install the connector.
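
Before going further, it is worth a quick sanity check of both sides. A minimal sketch, assuming you can log in to the Solaris database server and to a node of the Hadoop cluster (or the Big Data Lite VM):

# On the Solaris database server: is the instance open?
$ sqlplus -s / as sysdba <<EOF
select instance_name, status from v\$instance;
EOF

# On a cluster node: is HDFS healthy and reporting DataNodes?
$ hdfs dfsadmin -report | head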

        Step 1: Make sure that Java is installed on Solaris

        $ sudo pkg install --accept jdk-7

        $ /usr/java/bin/java -version
        java version "1.7.0_65"
        Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
        Java HotSpot(TM) Server VM (build 24.65-b04, mixed mode)


        Step 2: Determine the Version of Cloudera that is running on your Big Data Appliance:

        * Starting with the BDA up and running:

[Screenshot: initial_state.png]

* Click on the "Support" pull-down, then "About":

[Screenshot: about_1.png]

        * In my case, I was using Cloudera Express 5.1.2

[Screenshot: about.png]
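
If you have shell access to a cluster node, the CDH version can also be confirmed from the command line; the output below is what I would expect for CDH 5.1.2, so treat it as illustrative:

$ hadoop version
Hadoop 2.3.0-cdh5.1.2
...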



Step 3: On the Oracle Database servers, install the version of Hadoop that matches your cluster

        * Visit Cloudera Downloads, click on "CDH Download Now", and choose your version. In my case, CDH 5.1.2.

[Screenshot: download_1.png]

* This took me to the Packaging and Tarballs page for all of the CDH components. Download the Apache Hadoop tarball:

[Screenshot: download_2.png]

Place this tarball onto your Solaris server or, in the case of Oracle RAC, copy the tarball to every node. There are many ways to get the tarball onto your Solaris server. In my case, the most convenient method was to use wget:

        $ export http_proxy=http://the-proxy-server.bigcorp.com:80
        $ wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.1.2.tar.gz
        $ scp hadoop-2.3.0-cdh5.1.2.tar.gz oracle@my-rac-node-2:


        Unpack the tarball on your Solaris server, or on every RAC node:

        $ tar xzf ../Downloads/hadoop-2.3.0-cdh5.1.2.tar.gz

        And verify:

        $ ls hadoop-2.3.0-cdh5.1.2
        bin                  examples             libexec
        bin-mapreduce1       examples-mapreduce1  sbin
        cloudera             include              share
        etc                  lib                  src


Step 4: Download the HDFS client configuration files to the Solaris RDBMS server

* In the Cloudera Manager, go to the HDFS page:

[Screenshot: client_1.png]

* From the HDFS page, download the client configuration. Place the client configuration on the Solaris server(s):

[Screenshot: client_2.png]
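
Alternatively, Cloudera Manager can serve the same client configuration zip over its REST API. A sketch, assuming the default port 7180, an admin account, and an HDFS service named "hdfs"; the host name is a placeholder and the API version may differ on your cluster:

$ curl -u admin:admin -o hdfs-clientconfig.zip \
    "http://bda-cm-host:7180/api/v6/clusters/<cluster-name>/services/hdfs/clientConfig"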


        * Unpack the client configuration on your Solaris server(s):

        $ unzip ./Downloads/hdfs-clientconfig.zip

        $ ls hadoop-conf
        core-site.xml     hdfs-site.xml     topology.map
        hadoop-env.sh     log4j.properties  topology.py



        Step 5: Configure the Hadoop client software on the Solaris server(s)

* Edit hadoop-conf/hadoop-env.sh to set JAVA_HOME correctly:

        export JAVA_HOME=/usr/jdk/instances/jdk1.7.0


        * Move the configuration files into place:

        $ cp hadoop-conf/* hadoop-2.3.0-cdh5.1.2/etc/hadoop/


* Add the Hadoop bin directory to your PATH. You may want to do this in your current shell or in your .bashrc:

        $ export PATH=~/hadoop-2.3.0-cdh5.1.2/bin:$PATH


        Step 6: Test the Hadoop client software on Solaris

* Verify that the remote HDFS filesystem is visible from your Solaris server(s):

        $ hdfs dfs -ls
        14/12/12 09:35:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
        Found 6 items
        drwx------   - oracle oracle          0 2014-08-25 02:55 .Trash
        drwx------   - oracle oracle          0 2014-09-23 10:25 .staging
        drwxr-xr-x   - oracle oracle          0 2014-01-12 15:15 moviedemo
        drwxr-xr-x   - oracle oracle          0 2014-09-24 06:38 moviework
        drwxr-xr-x   - oracle oracle          0 2014-09-08 12:50 oggdemo
        drwxr-xr-x   - oracle oracle          0 2014-09-20 10:59 oozie-oozi



        Part C: Install "Oracle SQL Connector for HDFS" (OSCH) on your Solaris server(s)

        Step 1: Download OSCH


        * Download Oracle SQL Connector for Hadoop Distributed File System Release 3.1.0 from the Oracle Big Data Connectors Downloads page.

        * Unzip OSCH

        $ unzip oraosch-3.1.0.zip
        Archive:  oraosch-3.1.0.zip
         extracting: orahdfs-3.1.0.zip      
          inflating: README.txt             

        $ unzip orahdfs-3.1.0
        Archive:  orahdfs-3.1.0.zip
        replace orahdfs-3.1.0/doc/README.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
          inflating: orahdfs-3.1.0/doc/README.txt 
          inflating: orahdfs-3.1.0/jlib/ojdbc6.jar 
          inflating: orahdfs-3.1.0/jlib/osdt_cert.jar 
          inflating: orahdfs-3.1.0/jlib/orahdfs.jar 
          inflating: orahdfs-3.1.0/jlib/oraclepki.jar 
          inflating: orahdfs-3.1.0/jlib/osdt_core.jar 
          inflating: orahdfs-3.1.0/jlib/ora-hadoop-common.jar 
          inflating: orahdfs-3.1.0/bin/hdfs_stream 
          inflating: orahdfs-3.1.0/examples/sql/mkhive_unionall_view.sql  


        Step 2: Install OSCH

* Follow the instructions in the Connectors User's Guide, which is available as part of the Oracle Big Data Documentation. Also see Getting Started with Oracle Big Data Connectors.

        * I'm tempted to cut & paste everything from "1.4.3 Installing Oracle SQL Connector for HDFS" into this document, but I won't.

        * Your mileage may vary, but for me, it looked like this:

        $ tail -1 /var/opt/oracle/oratab
        dbmc1:/u01/app/oracle/product/12.1.0.2/dbhome_1:N               # line added by Agent

        $ export PATH=/usr/local/bin:$PATH

        $ export ORACLE_SID=dbmc1

        $ . /usr/local/bin/oraenv
        ORACLE_SID = [dbmc1] ?
        The Oracle base has been set to /u01/app/oracle

        $ env | grep ORA
        ORACLE_SID=dbmc1
        ORACLE_BASE=/u01/app/oracle
        ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1

        $ srvctl status database -d dbmc1
        Instance dbmc11 is running on node etc20dbadm01
        Instance dbmc12 is running on node etc20dbadm02

        $ export OSCH_HOME=/export/home/oracle/orahdfs-3.1.0

        $ export HADOOP_CLASSPATH=$OSCH_HOME/jlib/*


        $ sqlplus / as sysdba

        SQL> CREATE USER hdfsuser IDENTIFIED BY n0ne0fyourBusiness
           DEFAULT TABLESPACE users
           QUOTA UNLIMITED ON users;

        SQL> GRANT CREATE SESSION, CREATE TABLE, CREATE VIEW TO hdfsuser;

        SQL> GRANT EXECUTE ON sys.utl_file TO hdfsuser;

SQL> CREATE OR REPLACE DIRECTORY osch_bin_path AS '/export/home/oracle/orahdfs-3.1.0/bin';
        SQL> GRANT READ, EXECUTE ON DIRECTORY osch_bin_path TO hdfsuser;
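
One item from "1.4.3 Installing Oracle SQL Connector for HDFS" that is worth calling out: the hdfs_stream preprocessor does not inherit your login environment, so it may need PATH (and JAVA_HOME) set inside the script in order to find the Hadoop client and the JDK. A minimal sketch, assuming the locations used earlier in this post:

# Near the top of /export/home/oracle/orahdfs-3.1.0/bin/hdfs_stream
# (paths follow this post's layout; adjust for your installation)
export PATH=/export/home/oracle/hadoop-2.3.0-cdh5.1.2/bin:/usr/bin:/bin
export JAVA_HOME=/usr/jdk/instances/jdk1.7.0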

Notice that MOVIEDEMO_DIR needs to be on shared storage, visible to both RAC nodes. From a Solaris shell prompt, create the MOVIEDEMO_DIR directory, substituting the ZFS_SA InfiniBand hostname for XXX, below. Then allow the database user to access the directory:

        $ mkdir /net/XXX/export/test/hdfsuser

        $ sqlplus / as sysdba
        SQL>
        CREATE OR REPLACE DIRECTORY MOVIEDEMO_DIR AS '/net/XXX/export/test/hdfsuser';
        SQL> GRANT READ, WRITE ON DIRECTORY MOVIEDEMO_DIR TO hdfsuser;

        Step 3: Test using the Movie Demo

Test using the movie demo, which is documented in the Big Data Connectors User's Guide. Cut and paste moviefact_hdfs.sh and moviefact_hdfs.xml from Example 2-1, "Accessing HDFS Data Files from Oracle Database".

In moviefact_hdfs.sh, for my configuration I needed to change the paths to OSCH_HOME and to moviefact_hdfs.xml.
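
For reference, the script is essentially a thin wrapper around the connector's ExternalTable tool. A minimal sketch of the command it runs, assuming the OSCH_HOME used earlier in this post and a placeholder path for the XML file:

# Sketch only -- the real moviefact_hdfs.sh comes from Example 2-1 in the User's Guide
export OSCH_HOME=/export/home/oracle/orahdfs-3.1.0
export HADOOP_CLASSPATH=$OSCH_HOME/jlib/*

hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
    oracle.hadoop.exttab.ExternalTable \
    -conf /path/to/moviefact_hdfs.xml \
    -createTable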

        In moviefact_hdfs.xml, I needed to change two properties, as follows. For the database connection, use the Oracle Single Client Access Name (SCAN).

            <property>
              <name>oracle.hadoop.connection.url</name>
              <value>jdbc:oracle:thin:@sc-scan:1521/dbmc1</value>
            </property>

            <property>
              <name>oracle.hadoop.connection.user</name>
              <value>hdfsuser</value>
            </property>
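
The remaining properties came straight from Example 2-1 and describe the external table and the HDFS data location. Roughly, they look like the sketch below; the property names are the connector's documented oracle.hadoop.exttab.* settings and the values are consistent with the output shown further down, but verify them against your copy of the example:

    <property>
      <name>oracle.hadoop.exttab.tableName</name>
      <value>MOVIE_FACTS_EXT</value>
    </property>
    <property>
      <name>oracle.hadoop.exttab.defaultDirectory</name>
      <value>MOVIEDEMO_DIR</value>
    </property>
    <property>
      <name>oracle.hadoop.exttab.dataPaths</name>
      <value>/user/oracle/moviework/data/part*</value>
    </property>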


        Run the script:

        $ sh moviefact_hdfs.sh
        Oracle SQL Connector for HDFS Release 3.1.0 - Production

        Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.

        [Enter Database Password:]
        14/12/12 12:36:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
        The create table command succeeded.

        CREATE TABLE "HDFSUSER"."MOVIE_FACTS_EXT"
        (
         "CUST_ID"                        VARCHAR2(4000),
         "MOVIE_ID"                       VARCHAR2(4000),
         "GENRE_ID"                       VARCHAR2(4000),
         "TIME_ID"                        TIMESTAMP(9),
         "RECOMMENDED"                    NUMBER,
         "ACTIVITY_ID"                    NUMBER,
         "RATING"                         NUMBER,
         "SALES"                          NUMBER
        )
        ORGANIZATION EXTERNAL
        (
           TYPE ORACLE_LOADER
           DEFAULT DIRECTORY "MOVIEDEMO_DIR"
           ACCESS PARAMETERS
           (
             RECORDS DELIMITED BY 0X'0A'
             CHARACTERSET AL32UTF8
             PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
             FIELDS TERMINATED BY 0X'09'
             MISSING FIELD VALUES ARE NULL
             (
               "CUST_ID" CHAR(4000),
               "MOVIE_ID" CHAR(4000),
               "GENRE_ID" CHAR(4000),
               "TIME_ID" CHAR DATE_FORMAT TIMESTAMP MASK 'YYYY-MM-DD:HH:MI:SS',
               "RECOMMENDED" CHAR,
               "ACTIVITY_ID" CHAR,
               "RATING" CHAR,
               "SALES" CHAR
             )
           )
           LOCATION
           (
             'osch-20141212123602-2522-1',
             'osch-20141212123602-2522-2',
             'osch-20141212123602-2522-3',
             'osch-20141212123602-2522-4'
           )
        ) PARALLEL REJECT LIMIT UNLIMITED;

        The following location files were created.

        osch-20141212123602-2522-1 contains 1 URI, 12754882 bytes

            12754882 hdfs://ofsaa-bdl.us.oracle.com:8020/user/oracle/moviework/data/part-00001

        osch-20141212123602-2522-2 contains 1 URI, 438 bytes

                 438 hdfs://ofsaa-bdl.us.oracle.com:8020/user/oracle/moviework/data/part-00002

        osch-20141212123602-2522-3 contains 1 URI, 432 bytes

                 432 hdfs://ofsaa-bdl.us.oracle.com:8020/user/oracle/moviework/data/part-00003

        osch-20141212123602-2522-4 contains 1 URI, 202 bytes

                 202 hdfs://ofsaa-bdl.us.oracle.com:8020/user/oracle/moviework/data/part-00004



        Final verification with SQL Developer:

[Screenshot: test.png]
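
If you prefer sqlplus to SQL Developer, a quick query against the new external table exercises the whole path (Oracle Database -> hdfs_stream -> HDFS):

$ sqlplus hdfsuser/n0ne0fyourBusiness@sc-scan:1521/dbmc1

SQL> SELECT COUNT(*) FROM movie_facts_ext;

SQL> SELECT cust_id, movie_id, rating FROM movie_facts_ext WHERE ROWNUM <= 5;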
