Introduction
In our ongoing series of posts explaining the ins and outs of Hive User Defined Functions, we're starting with the simplest case. Of the three little UDFs, today's entry builds the straw house: simple, easy to put together, but limited in applicability. We'll walk through the important parts of the code, but you can grab the whole source from GitHub here.
Extending UDF
The first few lines of interest are very straightforward:
@Description(name = "moving_avg",
             value = "_FUNC_(x, n) - Returns the moving mean of a set of numbers over a window of n observations")
@UDFType(deterministic = false, stateful = true)
public class UDFSimpleMovingAverage extends UDF {
We're extending the UDF class with some decoration. The decoration is important for usability and functionality. The Description decorator lets us give Hive some information to show users about how to use our UDF and what its method signature will be. The UDFType decoration tells Hive what sort of behavior to expect from our function.
A deterministic UDF will always return the same output given a particular input. A square-root-computing UDF will always return the same square root for 4, so we can say it is deterministic; a call to get the system time would not be. The stateful annotation of the UDFType decoration is relatively new to Hive (e.g., CDH4 and above). The stateful directive allows Hive to keep some static variables available across rows. The simplest example of this is a "row-sequence," which maintains a static counter that increments with each row processed.
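To make the stateful idea concrete, here is a minimal sketch of such a row-sequence UDF. The class and field names are illustrative rather than part of today's function, and it assumes the same Hive UDF API we use below:

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;
import org.apache.hadoop.io.LongWritable;

@Description(name = "row_sequence", value = "_FUNC_() - Returns a sequential row number")
@UDFType(deterministic = false, stateful = true)
public class UDFRowSequence extends UDF {
  // Because the UDF is stateful, Hive keeps this counter alive
  // across all the rows a single task processes.
  private LongWritable result = new LongWritable(0);

  public LongWritable evaluate() {
    result.set(result.get() + 1);
    return result;
  }
}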
Since square-root and row-counting aren't terribly interesting, we'll use the stateful annotation to build a simple moving average function. We'll return to the notion of a moving average later when we build a UDAF, so as to compare the two approaches.
private DoubleWritable result = new DoubleWritable();
private static ArrayDeque<Double> window;
int windowSize;

public UDFSimpleMovingAverage() {
  result.set(0);
}
The above code is basic initialization. We make a double in which to hold the result, but it needs to be of class DoubleWritable so that MapReduce can properly serialize the data. We use a deque to hold our sliding window, and we need to keep track of the window's size. Finally, we implement a constructor for the UDF class.
public DoubleWritable evaluate(DoubleWritable v, IntWritable n) {
  double sum = 0.0;
  double moving_average;
  double residual;

  if (window == null) {
    window = new ArrayDeque<Double>();
  }
Here's the meat of the class: the evaluate method. This method will be called on each row by the map tasks. For any given row, we can't say whether or not our sliding window exists, so we initialize it if it's null.
  // slide the window
  if (window.size() == n.get())
    window.pop();
  window.addLast(new Double(v.get()));

  // compute the average
  for (Iterator<Double> i = window.iterator(); i.hasNext();)
    sum += i.next().doubleValue();
Here we deal with the deque and compute the sum of the window's elements. Deques are essentially double-ended queues, so they make excellent sliding windows. If the window is full, we pop the oldest element and add the current value.
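To see the sliding behavior in isolation, here is a small, self-contained Java snippet (outside Hive, with made-up values) that runs a three-element window over a stream of numbers in the same way:

import java.util.ArrayDeque;

public class WindowDemo {
  public static void main(String[] args) {
    ArrayDeque<Double> window = new ArrayDeque<Double>();
    int windowSize = 3;
    double[] stream = {1.0, 2.0, 3.0, 4.0, 5.0};

    for (double v : stream) {
      if (window.size() == windowSize)
        window.pop();        // drop the oldest value from the front
      window.addLast(v);     // append the newest value at the back

      double sum = 0.0;
      for (double d : window)
        sum += d;
      System.out.println(window + " -> " + (sum / window.size()));
    }
  }
}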
  moving_average = sum / window.size();
  result.set(moving_average);

  return result;
}
Computing the moving average without weighting is simply a matter of dividing the sum of our window by its size. We then set that value in our Writable variable and return it. The value is then emitted as part of the map task executing the UDF.
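If you want to sanity-check the arithmetic before deploying the JAR, a small local driver along these lines should do it, assuming hive-exec and hadoop-common are on the classpath and that the DoubleWritable import matches the one used in the UDF itself (the sample values are made up):

import org.apache.hadoop.hive.serde2.io.DoubleWritable;  // use the same DoubleWritable as the UDF
import org.apache.hadoop.io.IntWritable;

public class MovingAverageDriver {
  public static void main(String[] args) {
    UDFSimpleMovingAverage udf = new UDFSimpleMovingAverage();
    IntWritable windowSize = new IntWritable(3);
    double[] values = {10.0, 20.0, 30.0, 40.0};  // made-up sample data

    for (double v : values) {
      DoubleWritable avg = udf.evaluate(new DoubleWritable(v), windowSize);
      System.out.println("value = " + v + ", moving_avg = " + avg.get());
    }
    // Expected: 10.0, 15.0, 20.0, then 30.0 once the window is full.
  }
}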
Going Further
The stateful annotation made it simple for us to compute a moving average, since we could keep the deque static. But how would we compute a moving average if there were no notion of state between Hadoop tasks? At the end of the series we'll examine a UDAF that does this, but the algorithm ends up being quite different. In the meantime, I challenge the reader to think about what sort of approach is needed to compute the window.