Hive's built-in functions
Definition:
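UDF (User-Defined Function): a user-defined function that processes data, taking one row in and returning one value out.
UDTF (User-Defined Table-Generating Function): handles the case where one input row produces multiple output rows (a one-to-many mapping).
UDAF (User-Defined Aggregation Function): a user-defined aggregate function that operates on multiple input rows and produces a single output row.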
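To make the UDF case concrete, here is a minimal sketch of what a custom UDF class might look like. The class name ToUpperUdf and its behavior are hypothetical, chosen only for illustration. In a real project the class would be public and extend org.apache.hadoop.hive.ql.exec.UDF, with the hive-exec jar on the classpath. The superclass is left out here so the sketch compiles on its own.

```java
import java.util.Locale;

// Standalone sketch of the shape of a Hive UDF (hypothetical example).
// In a real project this would be declared as
//     public class ToUpperUdf extends org.apache.hadoop.hive.ql.exec.UDF
// The superclass is omitted here only so the file compiles without Hive.
class ToUpperUdf {

    // Hive picks the evaluate method whose parameters match the call.
    // One argument: upper-case the input; a NULL input yields null.
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        return input.toUpperCase(Locale.ROOT);
    }

    // Overload with two arguments: upper-case the input, then append a suffix.
    public String evaluate(String input, String suffix) {
        if (input == null || suffix == null) {
            return null;
        }
        return evaluate(input) + suffix;
    }
}
```

Once packaged in a jar, a class like this is typically made available with ADD JAR and CREATE TEMPORARY FUNCTION, and then called in a select statement like any built-in function.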
Usage:
1. A UDF can be used directly in a select statement to format the query result before it is output.
2. When writing a UDF, pay attention to the following points:
a) A custom UDF must extend org.apache.hadoop.hive.ql.exec.UDF.
b) It must implement an evaluate function.
c) The evaluate function supports overloading.
Hive's local mode:
Most Hadoop jobs need the full scalability that Hadoop provides in order to process big data. Sometimes, however, the input to Hive is very small. In that case, the overhead of launching the tasks for a query may take much longer than the actual job execution. For most such cases, Hive can run all tasks on a single machine in local mode, which significantly reduces execution time for small data sets compared with submitting the tasks to the cluster.
Configure the following parameter to enable Hive's local mode:
hive> set hive.exec.mode.local.auto=true; (default is false)
Local mode is actually used only when a job meets all of the following conditions:
1. The input data size of the job must be smaller than the parameter hive.exec.mode.local.auto.inputbytes.max (default 128MB).
2. The number of map tasks of the job must be smaller than the parameter hive.exec.mode.local.auto.tasks.max (default 4).
3. The number of reduce tasks of the job must be 0 or 1.
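The three conditions can be read as a single boolean check. The sketch below models that check in plain Java. It is not Hive's own code: the class LocalModeCheck and the method canRunLocally are hypothetical names, and the default limits are simply the values quoted above.

```java
// Hypothetical model of the local-mode conditions; not part of Hive.
final class LocalModeCheck {

    // Defaults as quoted in the text: 128MB of input and 4 map tasks.
    static final long DEFAULT_MAX_INPUT_BYTES = 128L * 1024 * 1024;
    static final int DEFAULT_MAX_TASKS = 4;

    // True only when all three conditions hold at the same time.
    static boolean canRunLocally(long inputBytes, int mapTasks, int reduceTasks,
                                 long maxInputBytes, int maxTasks) {
        boolean smallInput = inputBytes < maxInputBytes;      // condition 1
        boolean fewMaps = mapTasks < maxTasks;                // condition 2
        boolean fewReduces = reduceTasks == 0 || reduceTasks == 1; // condition 3
        return smallInput && fewMaps && fewReduces;
    }

    // Convenience overload that applies the default limits.
    static boolean canRunLocally(long inputBytes, int mapTasks, int reduceTasks) {
        return canRunLocally(inputBytes, mapTasks, reduceTasks,
                DEFAULT_MAX_INPUT_BYTES, DEFAULT_MAX_TASKS);
    }

    public static void main(String[] args) {
        // 10MB of input, 2 map tasks, 1 reduce task: prints true.
        System.out.println(canRunLocally(10L * 1024 * 1024, 2, 1));
        // 200MB of input exceeds the 128MB limit: prints false.
        System.out.println(canRunLocally(200L * 1024 * 1024, 2, 1));
    }
}
```

Note that both size limits are strict: a job with exactly 4 map tasks fails the second condition under the default setting.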