Are you one of the organizations adopting ‘big data systems’
to manage and analyze the class of data typically referred to as big data? If so,
you may know that big data includes data that may be structured, semi-structured,
or unstructured, each of which originates from a variety of sources. Big data is also
commonly characterized by its volume, velocity, and variety. Because of its promise to help harness the data
deluge we face, the adoption of big data solutions is becoming quite
pervasive. In this blog post I’d like to discuss how to leverage Oracle
GoldenGate’s real-time replication for big data systems.
The term 'big data systems' is an umbrella term used
to describe a wide variety of technologies, each of which serves
a specific purpose. Broadly speaking, big data technologies address
batch, transactional, and real-time processing requirements, and choosing the appropriate technology
depends heavily on the use case being addressed.
While gaining business intelligence from transactional data
continues to be a dominant factor in the decision-making process, businesses
have realized that gaining intelligence from the other forms of data they have been
collecting will enable them to achieve a
more complete view, address additional business objectives, and make better
decisions. The following table
shows examples of industry verticals, the forms of data involved, and the
objective the business attempts to achieve using those other forms of data.
| Industry | Data | Objective |
| --- | --- | --- |
| Healthcare | Practitioner’s notes, machine statistics | Best practices and reduced hospitalization |
| Retail | Weblogs, click streams | Micro-segmentation recommendations |
| Banking | Weblogs, fraud reports | Fraud detection, risk analysis |
| Utilities | Smart meter readings, call center data | Real-time and predictive utilization analysis |
Role of Transactional Data
When other forms of data are used for analytics, better
contextual intelligence is obtained when the analysis is combined with
transactional data. Low-latency transactional data, in particular, brings
value to dynamically changing operations that day-old data cannot deliver. In most organizations, the vast majority of applications'
transactional data is captured in relational databases. To ensure an efficient supply of transactional
data for big data analytics, the data integration solution should address several
requirements:
- Reliable change data capture and delivery mechanism
- Minimal resource consumption when extracting data from the relational data source
- Secured data delivery
- Ability to customize data delivery
- Support for heterogeneous database sources
- Easy to install, configure, and maintain
A solution that
can reliably stream database transactions to the desired target ensures that
effort is spent on data analysis rather than data acquisition. And when the
solution is non-intrusive and has minimal impact on the source database, it reduces
the need for additional resources and changes on the source database.
Oracle GoldenGate is a time-tested, proven product for
real-time, heterogeneous relational database replication. It addresses the challenges
listed above and is widely used by organizations for mission-critical data
replication among relational databases. Furthermore, GoldenGate moves
transactional data in real time to support timely operational business
intelligence needs.
Oracle GoldenGate Integration Options for Big Data Analytics
A variety of integration options are available with the
Oracle GoldenGate product for delivering transactions from
relational databases into non-relational targets.
Oracle GoldenGate provides pre-built
adapters that integrate with flat files and messaging systems. Please refer to the Oracle
GoldenGate for Java - Administration Guide and the Oracle
GoldenGate for Flat Files - Administration Guide for more information.
Oracle GoldenGate also provides Java APIs and a framework for
developing custom integrations to Java-enabled targets. Using this capability, custom
adapters or handlers can be developed to address specific requirements. In this blog post I’d like to focus on the Oracle
GoldenGate Java APIs for developing custom integrations to big data systems.
As mentioned earlier, 'big data systems' is an umbrella term used to describe a
wide variety of technologies, each of which serves a specific
purpose. Among the various big data
systems, Hadoop and its suite of
technologies are widely adopted by organizations for processing big
data. The diagram below illustrates a general high-level architecture for
integrating with Hadoop.
![General high-level architecture for integrating Oracle GoldenGate with Hadoop]()
You can implement a custom adapter or handler for
the big data system using Oracle GoldenGate's Java API. The custom adapter is deployed as an integral
part of the Oracle GoldenGate Pump process. The Pump and the custom adapter are configured through the Pump
parameter file and the adapter's properties file, respectively. Depending on the requirements, the
properties for the custom adapter need to be determined and implemented, as sketched below.
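As an illustration only, here is a minimal sketch of what the two files might contain. The group name, handler name, class name, and paths are assumptions made for this example; the exact parameter and property keys are documented in the Oracle GoldenGate for Java - Administration Guide and may differ by release.

```
-- Pump parameter file (e.g., dirprm/hdpump.prm) -- illustrative names
EXTRACT hdpump
SOURCEDEFS dirdef/source.def
-- Load the Java user exit library that hosts the custom adapter
CUSEREXIT libggjava_ue.so CUSEREXIT PASSTHRU
TABLE SALES.*;
```

```
# Adapter properties file (e.g., dirprm/hdpump.properties) -- illustrative values
# Register the custom handler and point it at its implementation class
gg.handlerlist=hdfswriter
gg.handler.hdfswriter.type=sample.SampleHdfsHandler
# Embedded JVM settings; the classpath must include the adapter and Hadoop client jars
goldengate.userexit.writers=javawriter
javawriter.bootoptions=-Xmx512m -Djava.class.path=dirprm:/path/to/sample-adapter.jar:/path/to/hadoop/client/*
```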
The Pump process executes the adapter in its
address space. The Pump reads the Trail File created by the Oracle GoldenGate
Capture process and passes the transactions to the adapter, which, based on its
configuration, writes the transactions into Hadoop.
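To make that flow concrete, below is a minimal, simplified handler sketch. It assumes the handler framework described in the Oracle GoldenGate for Java - Administration Guide (the com.goldengate.atg.datasource package; class and method names can vary by release) together with the standard Hadoop HDFS client API. The class name and target path are made up for this example, and a real adapter would add error handling, partitioning, and format mapping.

```java
// Minimal sketch of a custom handler -- not a production adapter.
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import com.goldengate.atg.datasource.AbstractHandler;
import com.goldengate.atg.datasource.DsEvent;
import com.goldengate.atg.datasource.DsOperation;
import com.goldengate.atg.datasource.DsTransaction;
import com.goldengate.atg.datasource.GGDataSource.Status;

public class SampleHdfsHandler extends AbstractHandler {

    private FileSystem hdfs;                         // Hadoop file system client
    private FSDataOutputStream out;                  // file receiving change records
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public Status transactionBegin(DsEvent e, DsTransaction tx) {
        super.transactionBegin(e, tx);
        buffer.setLength(0);                         // start a fresh buffer per transaction
        return Status.OK;
    }

    @Override
    public Status operationAdded(DsEvent e, DsTransaction tx, DsOperation op) {
        super.operationAdded(e, tx, op);
        // Render each captured operation as one text line; a real adapter would map
        // columns into the target format (delimited text, Avro, JSON, Hive, etc.).
        buffer.append(op.getOperationType()).append('|')
              .append(op.getTableName()).append('|')
              .append(op).append('\n');
        return Status.OK;
    }

    @Override
    public Status transactionCommit(DsEvent e, DsTransaction tx) {
        super.transactionCommit(e, tx);
        try {
            if (hdfs == null) {
                // Picks up core-site.xml/hdfs-site.xml from the adapter's classpath.
                hdfs = FileSystem.get(new Configuration());
                out = hdfs.create(new Path("/gg/sample/trail-output.txt"), true); // example path
            }
            out.write(buffer.toString().getBytes(StandardCharsets.UTF_8));
            out.hflush();                            // flush the committed transaction to HDFS
        } catch (IOException ex) {
            return Status.ABEND;                     // signal the Pump to abend on write failure
        }
        return Status.OK;
    }

    @Override
    public String reportStatus() {
        return "SampleHdfsHandler: " + buffer.length() + " chars buffered";
    }
}
```

In practice, a handler like this is packaged into a jar, placed on the classpath configured in the adapter properties file, and registered through the handler type property shown earlier.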
Enabling the co-existence of big data systems with relational
systems helps organizations better serve customers and improve
decision-making capabilities. Oracle
GoldenGate, which has an excellent record of empowering IT across the various
aspects of data management, provides the capability to integrate
with big data systems. In upcoming
blog posts, we will discuss in depth the implementation and configuration
of integrating Oracle GoldenGate with Hadoop technologies.