Select only the rows where a column has changed from the previous row, given a unique ID

admin

sql

I have a PostgreSQL database in which I want to record how a specific column changes over time. Table 1:

personID | status | unixtime | column d | column e | column f
    1        2       213214      x            y        z
    1        2       213325      x            y        z
    1        2       213326      x            y        z
    1        2       213327      x            y        z
    1        2       213328      x            y        z
    1        3       214330      x            y        z
    1        3       214331      x            y        z
    1        3       214332      x            y        z
    1        2       324543      x            y        z

I want to track every status over time. Based on the data above, I want a new table, table2, containing the following rows:

personID | status | unixtime | column d | column e | column f
    1        2       213214      x            y        z
    1        3       214330      x            y        z
    1        2       324543      x            y        z

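x, y, z are variables that can and will change from row to row. The table also holds thousands of other personIDs whose status changes I want to capture as well. A single GROUP BY status, personID is not enough (as far as I can see), because the same personID can return to a status it had before, and each such return should be stored as a separate status change.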
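To illustrate why grouping alone falls short, here is a minimal Python sketch that mimics `GROUP BY personID, status` with `MIN(unixtime)` on the rows of table 1. Columns d, e and f are left out, and the names `rows`, `earliest` and `group_by_result` are made up for this illustration.

```python
# (personID, status, unixtime) rows copied from table 1, already sorted by unixtime
rows = [
    (1, 2, 213214), (1, 2, 213325), (1, 2, 213326), (1, 2, 213327),
    (1, 2, 213328), (1, 3, 214330), (1, 3, 214331), (1, 3, 214332),
    (1, 2, 324543),
]

# Rough equivalent of:
#   SELECT personID, status, MIN(unixtime) FROM table1 GROUP BY personID, status
earliest = {}
for person, status, ts in rows:
    key = (person, status)
    earliest[key] = min(ts, earliest.get(key, ts))

group_by_result = sorted((p, s, ts) for (p, s), ts in earliest.items())
print(group_by_result)
```

Only two rows come back, one per distinct status. The return to status 2 at unixtime 324543 is merged into the first status-2 group, so the third status change is lost.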

I am currently doing this in Python, but it is slow (and I guess it is also I/O-heavy):

for person in personid:
    status = -1  # sentinel: no status seen yet for this person
    records = getPersonRecords(person)  # sorted by unixtime in the query
    newrecords = []
    for record in records:
        # keep only the first record of each run of identical statuses
        if record.status != status:
            status = record.status
            newrecords.append(record)
    appendtoDB(newrecords)

2021-06-07

1 answer

admin

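This is a gaps-and-islands problem. You want the start of each island, which you can identify by comparing the status of the current row with the status of the "previous" record for the same person.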

Window functions come in handy for this:

select t.*
from (
    select t.*, lag(status) over(partition by personID order by unixtime) lag_status
    from mytable t
) t
where lag_status is null or status <> lag_status
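As a quick way to try the query, the sketch below runs it from Python against an in-memory SQLite database and materializes the result as `table2`. This is only an approximation of the PostgreSQL setup in the question: it assumes a Python build whose bundled SQLite is 3.25 or newer (the first version with window functions), it keeps only the three columns that matter, and it wraps the query in `create table ... as`, a form PostgreSQL also accepts.

```python
import sqlite3

# (personID, status, unixtime) rows copied from table 1 of the question
rows = [
    (1, 2, 213214), (1, 2, 213325), (1, 2, 213326), (1, 2, 213327),
    (1, 2, 213328), (1, 3, 214330), (1, 3, 214331), (1, 3, 214332),
    (1, 2, 324543),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (personID int, status int, unixtime int)")
conn.executemany("insert into mytable values (?, ?, ?)", rows)

# Keep a row only when it is the first row for its person (lag_status is null)
# or when its status differs from the previous row's status.
conn.execute("""
    create table table2 as
    select personID, status, unixtime
    from (
        select t.*,
               lag(status) over (partition by personID order by unixtime) as lag_status
        from mytable t
    ) t
    where lag_status is null or status <> lag_status
""")

table2 = conn.execute(
    "select personID, status, unixtime from table2 order by personID, unixtime"
).fetchall()
print(table2)
conn.close()
```

The result keeps one row per run of identical statuses, namely the rows at unixtime 213214, 214330 and 324543, and all of the work happens inside the database instead of in a per-person Python loop.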

2021-06-07