转载

学习钻MapR Sandbox Lesson3:对复杂数据类型进行查询《译》

目标

这节课重点练习查询自我描述数据和复杂数据类型函数和运算符。钻提供了直观的SQL扩展使用这些数据并提供高性能查询与一组复杂数据架构。

查询在这节课中

现在您已经运行ANSI SQL查询不同的表格和关系数据文件,你可以尝试一些例子包括复杂类型。

  • 在一个单一的SELECT声明语句访问文件的目录和子目录。

  • 演示简单的方法访问JSON文件中复杂的数据。

  • 演示repeated_count函数在一个数组中聚合值。

查询分区目录

您可以在你的工作区路径在drill中使用特殊变量参考子目录:

  • dir0

  • dir1

注意,这些变量是基于分区的文件系统动态确定的。 不需要预先定义分区存在。这是一个视觉的例子展示它是如何工作的:

设置dfs.logs工作区:

0: jdbc:drill:> use dfs.logs; +-------+---------------------------------------+ |  ok   |                summary                | +-------+---------------------------------------+ | true  | Default schema changed to [dfs.logs]  | +-------+---------------------------------------+ 1 row selected 

为特定的一年查询日志数据:

0: jdbc:drill:> select * from logs where dir0='2013' limit 10; +-------+-------+-----------+-------------+-----------+----------+---------+--------+----------+-----------+----------+-------------+ | dir0  | dir1  | trans_id  |    date     |   time    | cust_id  | device  | state  | camp_id  | keywords  | prod_id  | purch_flag  | +-------+-------+-----------+-------------+-----------+----------+---------+--------+----------+-----------+----------+-------------+ | 2013  | 8     | 12104     | 08/29/2013  | 09:34:37  | 962      | IOS5    | ma     | 3        | milhouse  | 17       | false       | | 2013  | 8     | 12132     | 08/23/2013  | 01:11:25  | 4        | IOS7    | mi     | 11       | hi        | 439      | false       | | 2013  | 8     | 12177     | 08/14/2013  | 13:48:50  | 23       | AOS4.2  | il     | 14       | give      | 382      | false       | | 2013  | 8     | 12180     | 08/03/2013  | 20:48:45  | 1509     | IOS7    | ca     | 0        | i'm       | 340      | false       | | 2013  | 8     | 12187     | 08/16/2013  | 10:28:07  | 0        | IOS5    | ny     | 16       | clicking  | 11       | false       | | 2013  | 8     | 12190     | 08/10/2013  | 14:16:50  | 9        | IOS5    | va     | 3        | a         | 495      | false       | | 2013  | 8     | 12200     | 08/02/2013  | 20:54:38  | 42219    | IOS5    | ia     | 0        | what's    | 346      | false       | | 2013  | 8     | 12210     | 08/05/2013  | 20:12:24  | 8073     | IOS5    | sc     | 5        | if        | 33       | false       | | 2013  | 8     | 12235     | 08/28/2013  | 07:49:45  | 595      | IOS5    | tx     | 2        | that      | 51       | false       | | 2013  | 8     | 12239     | 08/13/2013  | 03:24:31  | 2        | IOS5    | or     | 6        | haw-haw   | 40       | false       | +-------+-------+-----------+-------------+-----------+----------+---------+--------+----------+-----------+----------+-------------+ 10 rows selected 

这个查询约束文件内的子目录命名为2013。变量dir0指日志第一级列,dir1是下一个水平列,等等。这个查询返回2013年8月份的10行。

使用用多谓语进一步限制查询结果:

这个查询返回一个在2013年8月通过IOS5设备购购物的用户id列表。

 0: jdbc:drill:> select dir0 as yr, dir1 as mth, cust_id from logs where dir0='2013' and dir1='8' and device='IOS5' and purch_flag='true' order by `date`; +-------+------+----------+ |  yr   | mth  | cust_id  | +-------+------+----------+ | 2013  | 8    | 4        | | 2013  | 8    | 521      | | 2013  | 8    | 1        | | 2013  | 8    | 2        |  ...   

对于一个给定年份返回每个月每个客户的数量:

0: jdbc:drill:> select cust_id, dir1 month_no, count(*) month_count from logs where dir0=2014 group by cust_id, dir1 order by cust_id, month_no limit 10; +----------+-----------+--------------+ | cust_id  | month_no  | month_count  | +----------+-----------+--------------+ | 0        | 1         | 143          | | 0        | 2         | 118          | | 0        | 3         | 117          | | 0        | 4         | 115          | | 0        | 5         | 137          | | 0        | 6         | 117          | | 0        | 7         | 142          | | 0        | 8         | 19           | | 1        | 1         | 66           | | 1        | 2         | 59           | +----------+-----------+--------------+ 10 rows selected   

这个查询为2014年组聚合客户ID和月函数。

复杂数据查询

钻提供了一些专门的操作符和函数,您可以使用它 分析嵌套数据(本身没有转换)。如果您熟悉 JavaScript注释,你就会知道这些扩展是如何工作的。

设置dfs.clicks工作区:

 0: jdbc:drill:> use dfs.clicks; +-------+-----------------------------------------+ |  ok   |                 summary                 | +-------+-----------------------------------------+ | true  | Default schema changed to [dfs.clicks]  | +-------+-----------------------------------------+ 1 row selected  

探测点击流数据:

注意,user_info和trans_info列包含嵌套数据:数组和在数组内的数组。以下查询展示如何访问这些复杂的数据。

 0: jdbc:drill:> select * from `clicks/clicks.json` limit 5; +-----------+-------------+-----------+---------------------------------------------------+---------------------------------------------------------------------------+ | trans_id  |    date     |   time    |                     user_info                     |                                trans_info                                 | +-----------+-------------+-----------+---------------------------------------------------+---------------------------------------------------------------------------+ | 31920     | 2014-04-26  | 12:17:12  | {"cust_id":22526,"device":"IOS5","state":"il"}    | {"prod_id":[174,2],"purch_flag":"false"}                                  | | 31026     | 2014-04-20  | 13:50:29  | {"cust_id":16368,"device":"AOS4.2","state":"nc"}  | {"prod_id":[],"purch_flag":"false"}                                       | | 33848     | 2014-04-10  | 04:44:42  | {"cust_id":21449,"device":"IOS6","state":"oh"}    | {"prod_id":[582],"purch_flag":"false"}                                    | | 32383     | 2014-04-18  | 06:27:47  | {"cust_id":20323,"device":"IOS5","state":"oh"}    | {"prod_id":[710,47],"purch_flag":"false"}                                 | | 32359     | 2014-04-19  | 23:13:25  | {"cust_id":15360,"device":"IOS5","state":"ca"}    | {"prod_id":[0,8,170,173,1,124,46,764,30,711,0,3,25],"purch_flag":"true"}  | +-----------+-------------+-----------+---------------------------------------------------+---------------------------------------------------------------------------+ 5 rows selected  

打开user_info列:

0: jdbc:drill:> select t.user_info.cust_id as custid, t.user_info.device as device, t.user_info.state as state from `clicks/clicks.json` t limit 5; +---------+---------+--------+ | custid  | device  | state  | +---------+---------+--------+ | 22526   | IOS5    | il     | | 16368   | AOS4.2  | nc     | | 21449   | IOS6    | oh     | | 20323   | IOS5    | oh     | | 15360   | IOS5    | ca     | +---------+---------+--------+ 5 rows selected (0.171 seconds)    
原文  http://segmentfault.com/a/1190000004325125
正文到此结束
Loading...