Quantcast
Channel: Oracle Bloggers
Viewing all articles
Browse latest Browse all 19780

Pain Point #4: “Recoding R (or other) models into SQL, C, or Java takes time and is error prone”

$
0
0

In the previous post in this series Addressing Analytic Pain Points, I focused on some issues surrounding production deployment of advanced analytics solutions. One specific aspect of production deployment involves how to get predictive model results (e.g., scores) from R or leading vendor tools into applications that are based on programming languages such as SQL, C, or Java. In certain environments, one way to integrate predictive models involves recoding them into one of these languages. Recoding involves identifying the minimal information needed for scoring, i.e., making predictions, and implementing that in a language that is compatible with the target environment. For example, consider a linear regression model with coefficients. It can be fairly straightforward to write a SQL statement or a function in C or Java to produce a score using these coefficients. This translated model can then be integrated with production applications or systems.

While recoding has been a technique used for decades, it suffers from several drawbacks: latency, quality, and robustness. Latency refers to the time delay between the data scientist developing the solution and leveraging that solution in production. Customers recount historic horror stories where the process from analyst to software developers to application deployment took months. Quality comes into play on two levels: the coding and testing quality of the software produced, and the freshness of the model itself. In fast changing environments, models may become “stale” within days or weeks. As a result, latency can impact quality. In addition, while a stripped down implementation of the scoring function is possible, it may not account for all cases considered by the original algorithm implementer. As such, robustness, i.e., the ability to handle greater variation in the input data, may suffer.

One way to address this pain point is to make it easy to leverage predictive models immediately (especially open source R and in-database Oracle Advanced Analytics models), thereby eliminating the need to recode models. Since enterprise applications normally know how to interact with databases via SQL, as soon as a model is produced, it can be placed into production via SQL access. In the case of R models, these can be accessed using Oracle R Enterprise embedded R execution in parallel via ore.rowApply and, for select models, the ore.predict capability performs automatic translation of native R models for execution inside the database. In the case of native SQL Oracle Advanced Analytics interface algorithms, as found in Oracle Data Mining and exposed through an R interface in Oracle R Enterprise, users can perform scoring directly in Oracle Database. This capability minimizes or even eliminates latency, dramatically increases quality, and leverages the robustness of the original algorithm implementations.


Viewing all articles
Browse latest Browse all 19780

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>