Hive User Defined Function
A Hive User Defined Function is a user-defined function that extends Apache Hive's built-in function library and can be invoked from a HiveQL query.
- AKA: Hive UDF.
- See: SQL Database Stored Procedure, Map/Reduce Job.
References
2012
- https://cwiki.apache.org/Hive/languagemanual-udf.html
- In the CLI, use the commands below to show the latest documentation:
SHOW FUNCTIONS;
DESCRIBE FUNCTION <function_name>;
DESCRIBE FUNCTION EXTENDED <function_name>;
- Tom White. (2012). “Hadoop: The Definitive Guide, 3rd Edition.” O'Reilly Media. ISBN: 978-1-4493-1152-0 http://my.safaribooksonline.com/book/software-engineering-and-development/9781449328917/12dot-hive/id2490083
- QUOTE: Sometimes the query you want to write can’t be expressed easily (or at all) using the built-in functions that Hive provides. By writing a user-defined function (UDF), Hive makes it easy to plug in your own processing code and invoke it from a Hive query.
UDFs have to be written in Java, the language that Hive itself is written in. For other languages, consider using a SELECT TRANSFORM query, which allows you to stream data through a user-defined script (MapReduce Scripts).
There are three types of UDF in Hive: (regular) UDFs, user-defined aggregate functions (UDAFs), and user-defined table-generating functions (UDTFs). They differ in the numbers of rows that they accept as input and produce as output.
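For the SELECT TRANSFORM alternative mentioned in the quote above, a minimal streaming query might look like the following; the script, table, and column names (my_script.py, docs, line, word, count) are hypothetical:
ADD FILE my_script.py;        -- ship the user-defined script to the cluster
SELECT TRANSFORM (line)       -- stream this column through the script, row by row
USING 'python my_script.py'   -- the script reads tab-separated rows on stdin
AS (word, count)              -- its stdout is parsed back into these columns
FROM docs;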
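To make the UDF/UDAF distinction concrete, below is a minimal sketch of an aggregate function written against Hive's simple UDAF interface (org.apache.hadoop.hive.ql.exec.UDAF, the style in use at the time of these references); the package name and the chosen example (an integer maximum) are illustrative, not taken from the sources above:
package com.example.hive.udaf; // illustrative package name

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.io.IntWritable;

// A minimal UDAF that computes the maximum of a column of integers.
public class Maximum extends UDAF {

  public static class MaximumIntUDAFEvaluator implements UDAFEvaluator {
    private IntWritable result;

    // Reset the evaluator's state.
    public void init() {
      result = null;
    }

    // Called once per input row.
    public boolean iterate(IntWritable value) {
      if (value == null) {
        return true;
      }
      if (result == null) {
        result = new IntWritable(value.get());
      } else {
        result.set(Math.max(result.get(), value.get()));
      }
      return true;
    }

    // Return the partial aggregate computed so far (map side).
    public IntWritable terminatePartial() {
      return result;
    }

    // Combine a partial aggregate from another fragment.
    public boolean merge(IntWritable other) {
      return iterate(other);
    }

    // Return the final result.
    public IntWritable terminate() {
      return result;
    }
  }
}
Unlike a regular UDF, which maps one input row to one output value, the evaluator above is fed many rows through iterate() and folds them into a single result, with terminatePartial()/merge() letting Hive combine partial aggregates across map tasks.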
2009
- http://dev.bizo.com/2009/06/custom-udfs-and-hive.html
- QUOTE: We just started playing around with Hive. Basically, it lets you write your hadoop map/reduce jobs using a SQL-like language. This is pretty powerful. Hive also seems to be pretty extendable -- custom data/serialization formats, custom functions, etc.
It turns out that writing your own UDF (user defined function) for use in hive is actually pretty simple.
All you need to do is extend UDF, and write one or more evaluate methods with a hadoop Writable return type. Here's an example of a complete implementation for a lower case function:
package com.bizo.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class Lower extends UDF {
  public Text evaluate(final Text s) {
    if (s == null) {
      return null;
    }
    return new Text(s.toString().toLowerCase());
  }
}
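Once compiled and packaged into a jar, such a UDF can be registered and invoked from a Hive session roughly as follows; the jar path, function alias, table, and column names are hypothetical:
ADD JAR /path/to/hive-udfs.jar;   -- make the compiled class available to Hive
CREATE TEMPORARY FUNCTION my_lower AS 'com.bizo.hive.udf.Lower';
SELECT my_lower(title) FROM items;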