This article starts with a brief overview of Hadoop and follows with an
example of setting up a Hadoop cluster with a NameNode, a secondary
NameNode, and three DataNodes using Oracle Solaris Zones.
The following are benefits of using Oracle Solaris Zones for a Hadoop cluster:
· Fast provisioning of new cluster members using the zone cloning feature
· Very high network throughput between the zones for DataNode replication
· Better disk I/O performance through ZFS built-in compression
· Secure data at rest using ZFS encryption
Hadoop uses the Hadoop Distributed File System (HDFS) to store data.
HDFS provides high-throughput access to application data and is suitable
for applications that have large data sets.
The Hadoop cluster building blocks are as follows:
· NameNode: The centerpiece of HDFS, which stores file system metadata,
directs the slave DataNode daemons to perform the low-level I/O tasks,
and also runs the JobTracker process.
· Secondary NameNode: Periodically checkpoints the NameNode metadata by merging the NameNode transaction log into the file system image.
· DataNodes: Nodes that store the data in HDFS. They are also known as slaves, and each one runs the TaskTracker process.
In the example presented in this article, all the Hadoop cluster
building blocks are installed using Oracle Solaris Zones, ZFS,
and network virtualization technologies.
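
As a rough illustration of the layout described above, the following Python sketch models the five cluster members and the Hadoop daemons each one runs. It is only a planning aid, not part of the installation: the zone names (name-node, sec-name-node, data-node1 through data-node3) and the build_cluster helper are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    zone: str        # hypothetical zone name, not taken from the article
    role: str        # Hadoop building block hosted in this zone
    daemons: tuple   # Hadoop daemons that run in this zone


def build_cluster(num_datanodes=3):
    """Return the planned cluster members, one Node per zone."""
    nodes = [
        # The NameNode also runs the JobTracker process.
        Node("name-node", "NameNode", ("NameNode", "JobTracker")),
        Node("sec-name-node", "Secondary NameNode", ("SecondaryNameNode",)),
    ]
    # Each DataNode stores HDFS data and runs the TaskTracker process.
    for i in range(1, num_datanodes + 1):
        nodes.append(
            Node(f"data-node{i}", "DataNode", ("DataNode", "TaskTracker"))
        )
    return nodes


cluster = build_cluster()
for node in cluster:
    print(f"{node.zone:15} {node.role:20} {', '.join(node.daemons)}")
```

Calling build_cluster with a different num_datanodes value shows how the plan grows when more DataNode zones are added, for example by cloning an existing zone.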