I have a Presto table. Assume it has the columns [id, name, update_time] and the data
(1, Amy, 2018-08-01), (1, Amy, 2018-08-02), (1, Amyyyyyyy, 2018-08-03), (2, Bob, 2018-08-01)
Now I want to run a SQL query whose result is
(1, Amyyyyyyy, 2018-08-03), (2, Bob, 2018-08-01)
Currently, my best approach to deduplication in Presto is the following:
select t1.id, t1.name, t1.update_time from table_name t1 join (select id, max(update_time) as update_time from table_name group by id) t2 on t1.id = t2.id and t1.update_time = t2.update_time
Is there a better way to deduplicate in Presto?
In PrestoDB, I would be inclined to use row_number():
select id, name, update_time from (select t.*, row_number() over (partition by id order by update_time desc) as seqnum from table_name t ) t where seqnum = 1;
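As a quick sanity check, the row_number() approach can be tried on the sample rows from the question. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Presto (SQLite 3.25 or newer is needed for window functions), it partitions by id and orders by update_time because the question wants the latest row per id, and the names `SAMPLE_ROWS`, `DEDUP_SQL`, and `latest_row_per_id` are hypothetical helpers, not part of Presto.

```python
import sqlite3

# Sample rows from the question: (id, name, update_time).
SAMPLE_ROWS = [
    (1, "Amy", "2018-08-01"),
    (1, "Amy", "2018-08-02"),
    (1, "Amyyyyyyy", "2018-08-03"),
    (2, "Bob", "2018-08-01"),
]

# Number the rows within each id from newest to oldest, then keep row 1.
# The trailing "order by id" is added only to make the output deterministic.
DEDUP_SQL = """
select id, name, update_time
from (
    select t.*,
           row_number() over (partition by id order by update_time desc) as seqnum
    from table_name t
) t
where seqnum = 1
order by id
"""


def latest_row_per_id(rows):
    """Load rows into an in-memory SQLite table and return the newest row per id."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Presto here
    try:
        conn.execute(
            "create table table_name (id integer, name text, update_time text)"
        )
        conn.executemany("insert into table_name values (?, ?, ?)", rows)
        return conn.execute(DEDUP_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_row_per_id(SAMPLE_ROWS):
        print(row)
```

Unlike the join on max(update_time), this form returns exactly one row per id even when two rows share the same latest update_time; which of the tied rows is kept is then arbitrary unless a tie-breaking column is added to the order by.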