The property graph feature of Oracle Big Data Spatial and Graph has two important components: the data access layer and the in-memory analyst. The data access layer lets you store, manage, index, query, and traverse property graph data in a horizontally scalable database (Apache HBase or Oracle NoSQL Database). The in-memory analyst offers a rich set of out-of-the-box graph analytics and graph operations. Together, these two components provide a solid framework for building graph-based applications.
In this blog post, I demonstrate how graph traversal, an important function supported by the data access layer, can be combined with graph analytics.
- Setup
If you haven't already, download Oracle Big Data Lite Virtual Machine v4.4.0 (or newer) from the following page.
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
- Retrieve the latest property graph Hands-on-Lab/Demo scripts
- Log in to the Big Data Lite 4.4.0 VM
- Click the Refresh Samples icon on the desktop, then follow the instructions to download the latest property graph HoL/Demo scripts. (Kudos to Marty Gubar and Nigel Bayliss, who designed this very cool script that automatically fetches the latest content from GitHub!)
- Open the following page using the Firefox browser
file:///home/oracle/src/hol/property_graph_hol_2015_Nov/property_graph_hol_2015_Nov.html
- Load example property graph data
- Follow steps 2.3 to 2.4.2 (for Oracle NoSQL Database) or steps 4.10 to 4.11 (for Apache HBase) to load an example property graph.
- Traverse the graph with Blueprints APIs and Gremlin syntax
In the built-in Groovy shell, you can easily navigate the graph using the Blueprints Java APIs, Gremlin syntax, or both. A few examples follow:
// find a start vertex using Blueprints Java API
opg-nosql> v=opg.getVertex(1l);
==>Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority}
opg-nosql> din=com.tinkerpop.blueprints.Direction.IN; dout= com.tinkerpop.blueprints.Direction.OUT;
==>OUT
// get in edges (using Java API)
opg-nosql> v.getEdges(din);
==>Edge ID 1078 from Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress} =[collaborates]=> Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority} edgeKV[{weight:flo:1.0}]
...
// get out edges (using Gremlin Syntax)
opg-nosql> v.outE
==>Edge ID 1000 from Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority} =[collaborates]=> Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress} edgeKV[{weight:flo:1.0}]
...
// follow "collaborates" edges and add a filter on religion
opg-nosql> v.outE('collaborates').inV.filter{it.religion != 'Christianity'}
==>Vertex ID 3 {country:str:United States, name:str:Charlie Rose, role:str:talk show host journalist, show:str:Charlie Rose}
...
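To make the semantics of these shell steps concrete, here is a minimal, self-contained Java sketch that models a property graph as a map of vertex properties plus a flat edge list. It is not the Oracle or Blueprints API: the `TraversalSketch` class and its methods are hypothetical. Vertices 1 to 3 and edges 1000 and 1078 mirror the transcript above; vertex 4 and edge IDs 2000 and 2001 are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TraversalSketch {
    // A directed, labeled edge (hypothetical stand-in for a Blueprints Edge).
    static final class Edge {
        final long id;
        final long from;
        final long to;
        final String label;

        Edge(long id, long from, long to, String label) {
            this.id = id;
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    // Vertex ID -> property map, plus a flat list of edges.
    final Map<Long, Map<String, String>> props = new HashMap<>();
    final List<Edge> edges = new ArrayList<>();

    void addVertex(long id, Map<String, String> p) { props.put(id, p); }

    void addEdge(long id, long from, long to, String label) {
        edges.add(new Edge(id, from, to, label));
    }

    // Like v.getEdges(Direction.IN): edges that point at v.
    List<Edge> inEdges(long v) {
        return edges.stream().filter(e -> e.to == v).collect(Collectors.toList());
    }

    // Like v.outE: edges that leave v.
    List<Edge> outEdges(long v) {
        return edges.stream().filter(e -> e.from == v).collect(Collectors.toList());
    }

    // Like v.outE('collaborates').inV.filter{it.religion != 'Christianity'}.
    // A vertex with no "religion" property passes the filter.
    List<Long> collaboratorsNotChristian(long v) {
        return outEdges(v).stream()
                .filter(e -> e.label.equals("collaborates"))
                .map(e -> e.to)
                .filter(id -> !"Christianity".equals(props.get(id).get("religion")))
                .collect(Collectors.toList());
    }

    // Toy graph; vertex 4 and edges 2000/2001 are hypothetical.
    static TraversalSketch sample() {
        TraversalSketch g = new TraversalSketch();
        g.addVertex(1, Map.of("name", "Barack Obama", "religion", "Christianity"));
        g.addVertex(2, Map.of("name", "Beyonce"));
        g.addVertex(3, Map.of("name", "Charlie Rose"));
        g.addVertex(4, Map.of("name", "Hypothetical", "religion", "Christianity"));
        g.addEdge(1000, 1, 2, "collaborates");
        g.addEdge(1078, 2, 1, "collaborates");
        g.addEdge(2000, 1, 3, "collaborates");
        g.addEdge(2001, 1, 4, "collaborates");
        return g;
    }

    public static void main(String[] args) {
        TraversalSketch g = sample();
        System.out.println(g.inEdges(1).size());            // 1 (edge 1078)
        System.out.println(g.outEdges(1).size());           // 3
        System.out.println(g.collaboratorsNotChristian(1)); // [2, 3]
    }
}
```

A vertex without a `religion` property (such as Charlie Rose in the transcript) passes the filter, just as `it.religion` evaluates to null inside the Gremlin closure.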
- Use a PipeFunction to combine Gremlin traversal and in-memory analysis
The following script creates a session and an in-memory analyst, computes a PageRank value for every vertex, and then starts a simple Gremlin traversal from the vertex with ID 1 that keeps only vertices whose PageRank value is above a threshold (0.01).
// Create an in-memory analytics session and analyst
opg-nosql> session=Pgx.createSession("session_ID_1");
opg-nosql> analyst=session.createAnalyst();
// Read the graph from the database into memory
opg-nosql> pgxGraph = session.readGraphWithProperties(opg.getConfig());
// Run PageRank: tolerance 0.00000001, damping factor 0.85, at most 5000 iterations
opg-nosql> rank=analyst.pagerank(pgxGraph, 0.00000001, 0.85, 5000);
opg-nosql> import com.tinkerpop.gremlin.java.*;
opg-nosql> import com.tinkerpop.pipes.*;
// Follow "collaborates" edges from vertex 1; keep only vertices whose PageRank value exceeds 0.01
opg-nosql> pipe = new GremlinPipeline(opg.getVertex(1l).out("collaborates").filter(new PipeFunction<Vertex, Boolean>() { public Boolean compute(Vertex v) { return rank.get(v.getId()) > 0.01; } }));
// Traversal results shown below
==>Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress}
...
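For readers curious about what `analyst.pagerank(pgxGraph, 0.00000001, 0.85, 5000)` computes, the following Java sketch implements textbook power-iteration PageRank with the same three parameters: a convergence tolerance, a damping factor, and an iteration cap. It is an illustration only; PGX's built-in implementation may differ in details such as how vertices without out-edges are treated and how convergence is measured. In this sketch, such dangling vertices spread their score evenly, and iteration stops once the total absolute change drops below the tolerance. The `PageRankSketch` class and the example graph are hypothetical.

```java
import java.util.Arrays;

public class PageRankSketch {
    // Power-iteration PageRank over an adjacency list, where out[u] lists the
    // targets of u's out-edges. e: tolerance on the total absolute change per
    // iteration, d: damping factor, max: iteration cap.
    static double[] pagerank(int[][] out, double e, double d, int max) {
        int n = out.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int iter = 0; iter < max; iter++) {
            double[] next = new double[n];
            double dangling = 0.0; // score held by vertices with no out-edges
            for (int u = 0; u < n; u++) {
                if (out[u].length == 0) {
                    dangling += rank[u];
                } else {
                    double share = rank[u] / out[u].length;
                    for (int v : out[u]) {
                        next[v] += share;
                    }
                }
            }
            double change = 0.0;
            for (int v = 0; v < n; v++) {
                // Teleport term plus damped incoming score; dangling score is spread evenly.
                next[v] = (1 - d) / n + d * (next[v] + dangling / n);
                change += Math.abs(next[v] - rank[v]);
            }
            rank = next;
            if (change < e) {
                break;
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical 3-vertex graph: 0 -> 1, 1 -> 0, 2 -> 0.
        int[][] out = { {1}, {0}, {0} };
        double[] rank = pagerank(out, 0.00000001, 0.85, 5000);
        System.out.println(Arrays.toString(rank));
    }
}
```

In the example graph, vertex 0 receives links from both other vertices and ends up with the highest score, while vertex 2, which nothing links to, keeps only the teleport share (1 - 0.85) / 3 = 0.05. Because the scores in this sketch sum to 1, a fixed cut-off such as 0.01 selects a larger or smaller share of vertices depending on graph size.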
The key part of this traversal is the TinkerPop PipeFunction implementation: for each vertex the traversal produces, it looks up that vertex's analytic result (its PageRank value from the parallel in-memory computation) and uses that value to guide the traversal.
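The same idea, an analytic result steering a traversal, can be sketched without TinkerPop. In the minimal Java example below, `java.util.function.Predicate` stands in for `PipeFunction<Vertex, Boolean>`, a precomputed map plays the role of the PageRank property, and the single hop from the post is extended to several hops so the filter visibly prunes the search: a rejected vertex is never expanded. The `RankGuidedTraversal` class, the graph, the vertex IDs, and the rank values are all hypothetical, and every edge is assumed to be a "collaborates" edge.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class RankGuidedTraversal {
    // Breadth-first traversal from start for up to maxHops hops. A neighbor is
    // visited, and later expanded, only if keep accepts it, so a rejected vertex
    // also cuts off everything reachable only through it.
    static List<Long> traverse(Map<Long, List<Long>> out, long start,
                               int maxHops, Predicate<Long> keep) {
        List<Long> visited = new ArrayList<>();
        Set<Long> seen = new HashSet<>();
        seen.add(start);
        List<Long> frontier = List.of(start);
        for (int hop = 0; hop < maxHops; hop++) {
            List<Long> next = new ArrayList<>();
            for (long u : frontier) {
                for (long v : out.getOrDefault(u, List.of())) {
                    // seen.add returns false for vertices already handled.
                    if (seen.add(v) && keep.test(v)) {
                        next.add(v);
                        visited.add(v);
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }

    // Hypothetical "collaborates" adjacency: 1 -> 2, 1 -> 3, 2 -> 5, 3 -> 6.
    static Map<Long, List<Long>> sampleGraph() {
        return Map.of(1L, List.of(2L, 3L), 2L, List.of(5L), 3L, List.of(6L));
    }

    // Hypothetical precomputed PageRank values.
    static Map<Long, Double> sampleRank() {
        return Map.of(2L, 0.05, 3L, 0.005, 5L, 0.02, 6L, 0.03);
    }

    public static void main(String[] args) {
        Map<Long, Double> rank = sampleRank();
        // Plays the role of the PipeFunction: consult the analytic result per vertex.
        Predicate<Long> highRank = id -> rank.getOrDefault(id, 0.0) > 0.01;
        System.out.println(traverse(sampleGraph(), 1L, 2, highRank)); // [2, 5]
    }
}
```

Vertex 6 scores above the threshold but is never reached, because its only path runs through vertex 3, which the predicate rejects.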
Acknowledgement: thanks to Jay Banerjee for his input on this blog post.