Channel: Oracle Bloggers

Combining Graph Traversal with Powerful Graph Analytics


The property graph feature of Oracle Big Data Spatial and Graph has two important components: the data access layer and the in-memory analyst. The data access layer lets you store, manage, index, query, and traverse property graph data in a horizontally scalable database (Apache HBase or Oracle NoSQL Database). The in-memory analyst offers a rich set of out-of-the-box graph analytics and graph operations. Together, the two components provide a solid framework for building graph-based applications.

In this blog post, I am going to demonstrate how graph traversal, an important function supported by the data access layer, can be used together with graph analytics.


  • Setup


If you haven't already, download Oracle Big Data Lite Virtual Machine v4.4.0 (or newer) from the following page.
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html

  • Retrieve the latest property graph Hands-on-Lab/Demo scripts


- Log in to the Big Data Lite 4.4.0 VM

- Click the Refresh Samples icon on the desktop, follow the instructions, and download the latest property graph HoL/Demo scripts. (Kudos to Marty Gubar and Nigel Bayliss, who designed this very cool script that automatically fetches the latest content from GitHub!)

- Open the following page using the Firefox browser
  file:///home/oracle/src/hol/property_graph_hol_2015_Nov/property_graph_hol_2015_Nov.html

  • Load example property graph data


- Follow the steps described in 2.3 to 2.4.2 (if you are using Oracle NoSQL Database) or the steps in 4.10 to 4.11 (for Apache HBase) to load an example property graph.

  • Traverse the graph with Blueprints APIs and Gremlin syntax

In the built-in Groovy shell, one can easily navigate the graph using the Blueprints Java APIs, Gremlin syntax, or both. A few examples follow:

// find a start vertex using Blueprints Java API
opg-nosql> v=opg.getVertex(1l);
==>Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority}

opg-nosql> din=com.tinkerpop.blueprints.Direction.IN; dout= com.tinkerpop.blueprints.Direction.OUT;
==>OUT

// get in edges (using Java API)
opg-nosql> v.getEdges(din);
==>Edge ID 1078 from Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress} =[collaborates]=> Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority} edgeKV[{weight:flo:1.0}]
...

// get out edges (using Gremlin Syntax)
opg-nosql> v.outE
==>Edge ID 1000 from Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority} =[collaborates]=> Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress} edgeKV[{weight:flo:1.0}]
...

// follow "collaborates" edges and add a filter on religion
opg-nosql> v.outE('collaborates').inV.filter{it.religion != 'Christianity'}
==>Vertex ID 3 {country:str:United States, name:str:Charlie Rose, role:str:talk show host journalist, show:str:Charlie Rose}
...
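
To make the semantics of the last step chain concrete, here is a minimal standalone Java sketch of what outE(label).inV.filter{...} does on a toy graph. The classes Vertex and Edge, the "team" property, and the "admires" label are hypothetical and invented for illustration; this is not the Blueprints API and the data is not from the example graph. Note one detail that the Charlie Rose result above also shows: a vertex that lacks the property entirely passes a "not equal" filter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Standalone illustration: these toy classes are hypothetical, not the Blueprints API.
final class TraversalSketch {
    static final class Vertex {
        final long id;
        final Map<String, String> props = new HashMap<>();
        Vertex(long id) { this.id = id; }
    }

    static final class Edge {
        final Vertex from;
        final String label;
        final Vertex to;
        Edge(Vertex from, String label, Vertex to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    // Mirrors start.outE(label).inV.filter{ it.<key> != value } on the toy model.
    static List<Long> outFiltered(List<Edge> edges, Vertex start,
                                  String label, String key, String value) {
        List<Long> ids = new ArrayList<>();
        for (Edge e : edges) {
            // outE(label): keep only edges leaving 'start' with the given label
            if (e.from != start || !e.label.equals(label)) continue;
            // inV: move to the vertex the edge points to
            Vertex head = e.to;
            // filter: a missing property (null) is "not equal", so the vertex passes
            if (!value.equals(head.props.get(key))) ids.add(head.id);
        }
        return ids;
    }

    // Builds a made-up four-vertex graph and runs the traversal from vertex 1.
    static List<Long> demo() {
        Vertex v1 = new Vertex(1), v2 = new Vertex(2), v3 = new Vertex(3), v4 = new Vertex(4);
        v1.props.put("team", "red");
        v3.props.put("team", "red");
        v4.props.put("team", "blue");
        // v2 deliberately has no "team" property
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge(v1, "collaborates", v2));
        edges.add(new Edge(v1, "collaborates", v3));
        edges.add(new Edge(v1, "collaborates", v4));
        edges.add(new Edge(v1, "admires", v4));
        edges.add(new Edge(v2, "collaborates", v1));
        return outFiltered(edges, v1, "collaborates", "team", "red");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [2, 4]
    }
}
```

Vertex 3 is dropped because its "team" is "red", vertex 4 is kept because its value differs, and vertex 2 is kept because it has no such property at all.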

  • Use PipeFunction to combine Gremlin traversal and in-memory analysis

The following script creates a session and an in-memory analyst, computes the PageRank value of the vertices, and then starts a simple Gremlin traversal from the vertex with ID 1, limiting the visited vertices to those whose PageRank value is above a threshold (0.01 here).

// Create in-memory analytics session and analyst
session=Pgx.createSession("session_ID_1");
analyst=session.createAnalyst();

// Read the graph from database into memory
pgxGraph = session.readGraphWithProperties(opg.getConfig());

// Execute PageRank (arguments: graph, maximum error tolerance, damping factor, maximum number of iterations)
rank=analyst.pagerank(pgxGraph, 0.00000001, 0.85, 5000);

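// Pipeline classes; the Vertex import is harmless if the shell already provides it
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.gremlin.java.*;
import com.tinkerpop.pipes.*;

// Keep only the "collaborates" out-neighbors of vertex 1 whose PageRank value exceeds 0.01
opg-nosql> pipe = new GremlinPipeline(opg.getVertex(1).out("collaborates").filter(new PipeFunction<Vertex, Boolean>() { public Boolean compute(Vertex v) { return rank.get(v.getId()) > 0.01; } }));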

// Traversal results shown below
==>Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress}
...

The important part of the above traversal is that it includes a TinkerPop PipeFunction implementation. Upon receiving a vertex from the traversal, this function looks up the analytical result for that vertex (from a parallel in-memory PageRank computation) and uses that information to guide the traversal.
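
For readers who want to see the idea without the Oracle libraries, the following standalone Java sketch computes a textbook PageRank on a toy graph and then uses the scores to filter the out-neighbors of a start vertex, which is the role the PipeFunction plays above. Everything here is an assumption made for illustration: the class RankGuidedTraversal, the four-vertex graph, and the 0.3 threshold are invented, and the algorithm is a plain power iteration that may differ in detail from the in-memory analyst's implementation. The tolerance, damping factor, and iteration cap are used in the same order as in the pagerank call above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone illustration: textbook PageRank on a toy graph, not the PGX implementation.
final class RankGuidedTraversal {
    // out[i] lists the vertices that vertex i points to.
    static double[] pageRank(int[][] out, double tolerance, double damping, int maxIterations) {
        int n = out.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by vertices without out-edges
            for (int v = 0; v < n; v++) {
                if (out[v].length == 0) { dangling += rank[v]; continue; }
                double share = rank[v] / out[v].length;
                for (int w : out[v]) next[w] += share;
            }
            double diff = 0.0;
            for (int v = 0; v < n; v++) {
                next[v] = (1 - damping) / n + damping * (next[v] + dangling / n);
                diff += Math.abs(next[v] - rank[v]);
            }
            rank = next;
            if (diff < tolerance) break; // total change is below the error tolerance
        }
        return rank;
    }

    // Out-neighbors of 'start' whose rank exceeds 'threshold' (the filter step).
    static List<Integer> outAboveThreshold(int[][] out, double[] rank, int start, double threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int w : out[start]) {
            if (rank[w] > threshold) kept.add(w);
        }
        return kept;
    }

    // Made-up graph: 0 -> 1, 0 -> 2, 1 -> 0, 2 -> 1, 3 -> 1
    static int[][] toyGraph() {
        return new int[][] { {1, 2}, {0}, {1}, {1} };
    }

    public static void main(String[] args) {
        int[][] g = toyGraph();
        double[] rank = pageRank(g, 0.00000001, 0.85, 5000);
        System.out.println(Arrays.toString(rank));
        System.out.println(outAboveThreshold(g, rank, 0, 0.3));
    }
}
```

In this toy graph, vertex 0 points to vertices 1 and 2, but only vertex 1 collects enough rank to pass the 0.3 threshold, so the filtered traversal from vertex 0 visits vertex 1 alone. The analytics run once up front and the traversal only performs cheap lookups, which is the same division of labor as in the script above.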

Acknowledgement: thanks to Jay Banerjee for his input on this blog post.


