Hive:sql练习题之一(用户商品推荐)

JerryXia 发表于 , 阅读 (44)

前言

一道sql练习题,在hive上跑的,用到了hive的一些窗函数。

练习题

数据

建表语句:

drop table if exists `dante_test.rank`;create external table `dante_test.rank` (  `uid` int COMMENT '用户id',  `product_skn` int COMMENT '商品id',  `brand_id` int COMMENT '品牌id',  `rank` int '排序')row format delimited fields terminated by ','location '/tmp/dante/rank';load data inpath '/tmp/dante/rank' into table dante_test.rank

数据集

1,50001,11,11,50002,11,21,50003,12,31,50004,13,42,50002,11,12,50004,13,22,50005,13,32,50006,14,4

描述

就是对一个用户,在他偏好的商品中,把每个品牌中排第一的排前面,内部顺序还是依照原来的顺序;后面再跟着每个品牌内排第二的;再跟每个品牌内排第三的。

答案

select uid, product_skn, brand_id, dense_rank() over (distribute by uid sort by product_rank asc, rank asc) as rank from (select uid, product_skn, brand_id, rank, row_number() over (distribute by uid, brand_id sort by rank asc) as product_rank    from dante_test.rank) as a  order by uid asc, rank asc;

结果

Total MapReduce CPU Time Spent: 8 seconds 540 msecOK1	50001	11	11	50003	12	21	50004	13	31	50002	11	42	50002	11	12	50004	13	22	50006	14	32	50005	13	4