Reposted

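Using a UDF (User-Defined Function) in Hive: An Example

These are quick notes on a simple procedure, kept for later reference. The goal is to define a custom function and use it in Hive, for example one that prepends "Hello" to a string.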

hive> select hello(firstname) from people limit 10;
OK
Hello hehe

Environment

AWS EMR 5.20.0

Building the JAR for the UDF

The source code in this Git repository [1] serves as the example.

$ git clone https://github.com/rathboma/hive-extension-examples.git

The example has one small problem: the class is declared without the public modifier, so we need to add it first. Modify hive-extension-examples/src/main/java/com/matthewrathbone/example/SimpleUDFExample.java as follows:

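package com.matthewrathbone.example;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Declared public here; the upstream example omits the modifier.
public class SimpleUDFExample extends UDF {

  // Returns NULL for a NULL input, otherwise the input prefixed with "Hello ".
  public Text evaluate(Text input) {
    if (input == null) return null;
    return new Text("Hello " + input.toString());
  }
}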

This defines SimpleUDFExample, a class that extends UDF. It is the class that Hive will later use as the function. All it does is return the input string prefixed with "Hello ".
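
The behaviour of evaluate() can be tried out without a Hive runtime. The following is only a minimal sketch: HelloLogicSketch is a hypothetical class that is not part of the repository, and it uses java.lang.String in place of org.apache.hadoop.io.Text so that it compiles with the JDK alone.

```java
// Hypothetical stand-alone mirror of SimpleUDFExample.evaluate().
// Assumption: String stands in for org.apache.hadoop.io.Text so no Hive/Hadoop JARs are needed.
final class HelloLogicSketch {

    // Same contract as the UDF: null in, null out; otherwise prefix "Hello ".
    static String evaluate(String input) {
        if (input == null) return null;
        return "Hello " + input;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("hehe")); // prints "Hello hehe"
        System.out.println(evaluate(null));   // prints "null"
    }
}
```

The null check matters because a column value in Hive can be NULL, and the function then passes NULL through rather than throwing an exception.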

Modify hive-extension-examples/pom.xml as follows so that the compiled JAR is compatible with the EMR environment [2].

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.8</version>
        </plugin>
        <plugin>
          <artifactId>maven-assembly-plugin</artifactId>
          <configuration>
            <archive>
              <manifest>
                <mainClass>com.matthewrathbone.example.RawMapreduce</mainClass>
              </manifest>
            </archive>
            <descriptorRefs>
              <descriptorRef>jar-with-dependencies</descriptorRef>
            </descriptorRefs>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>

  <modelVersion>4.0.0</modelVersion>
  <groupId>com.matthewrathbone.example</groupId>
  <artifactId>hive-extensions</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>hive-extensions</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.8.5-amzn-1</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>2.3.4-amzn-0</version>
      <scope>provided</scope>
    </dependency>
    <!-- TEST DEPENDENCIES -->
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-io</artifactId>
      <version>1.3.2</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>commons-httpclient</groupId>
      <artifactId>commons-httpclient</artifactId>
      <version>3.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-test</artifactId>
      <version>2.8.5-amzn-1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.8.2</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <repositories>
    <repository>
      <id>emr-5.20.0-artifacts</id>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
      <url>https://s3.us-west-2.amazonaws.com/us-west-2-emr-artifacts/emr-5.20.0/repos/maven/</url>
    </repository>
  </repositories>
</project>

Compile and package:

$ cd hive-extension-examples
$ mvn compile
$ mvn assembly:single

Copy the generated JAR to a location that Hive can access, for example:

$ cp target/hive-extensions-1.0-SNAPSHOT-jar-with-dependencies.jar /tmp/
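
Before registering the function, it can be useful to confirm that the JAR really contains the UDF class. The sketch below is one possible way to check this with the JDK's java.util.jar API. JarClassCheck and containsClass are hypothetical names introduced for illustration, and the default path assumes the JAR was copied to /tmp as above.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: reports whether a JAR holds the .class entry for a class name.
final class JarClassCheck {

    // Converts "a.b.C" to the entry path "a/b/C.class" and looks it up in the JAR.
    static boolean containsClass(String jarPath, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the JAR path defaults to the /tmp location used in this article.
        String jarPath = args.length > 0
            ? args[0]
            : "/tmp/hive-extensions-1.0-SNAPSHOT-jar-with-dependencies.jar";
        String className = "com.matthewrathbone.example.SimpleUDFExample";
        boolean found = containsClass(jarPath, className);
        System.out.println(className + (found ? " found in " : " NOT found in ") + jarPath);
    }
}
```

If the class is reported as missing, the name passed to create temporary function will not resolve, so the build output is the first thing to check.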

Loading the UDF into Hive

hive> create table people(firstname String);
OK
Time taken: 0.816 seconds

hive> INSERT INTO TABLE people VALUES ('hehe');

hive> ADD JAR /tmp/hive-extensions-1.0-SNAPSHOT-jar-with-dependencies.jar;
Added [/tmp/hive-extensions-1.0-SNAPSHOT-jar-with-dependencies.jar] to class path
Added resources: [/tmp/hive-extensions-1.0-SNAPSHOT-jar-with-dependencies.jar]

hive> create temporary function hello as 'com.matthewrathbone.example.SimpleUDFExample';
OK
Time taken: 0.017 seconds

hive> select hello(firstname) from people limit 10;
OK
Hello hehe
Time taken: 2.513 seconds, Fetched: 1 row(s)

Links

[1] https://github.com/rathboma/hive-extension-examples

[2] https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-artifact-repository.html

Original article: https://feichashao.com/hive_udf/