一尘不染

SQL - Convert time-series events into On/Off pairs (handling a possibly missing On or Off)

sql

In SQL Server, I have a set of time-series On/Off events like the ones below (for simplicity I only show one alarm number, but the same table contains many):

'Alarms' Table:
AlarmNumber   Time                      AlarmState
1592          2020-01-02 01:52:02       1
1592          2020-01-02 01:58:07       0
1592          2020-04-28 03:46:49       1
1592          2020-04-28 06:19:10       0
1592          2020-06-04 00:25:22       1
1592          2020-08-27 01:57:03       1
1592          2020-08-27 05:16:32       0
1592          2020-09-17 02:51:57       0

I am trying to convert this into On/Off pairs:

Output I am trying to achieve, ideally as an SQL View:
AlarmNumber   StartTime                 EndTime
1592          2020-01-02 01:52:02       2020-01-02 01:58:07
1592          2020-04-28 03:46:49       2020-04-28 06:19:10
1592          2020-06-04 00:25:22       NULL
1592          2020-08-27 01:57:03       2020-08-27 05:16:32
1592          NULL                      2020-09-17 02:51:57

If I had a clean data set with no missing On or Off events, I could achieve this with the following:

select tOn.AlarmNumber, tOn.Time StartTime, tOff.Time EndTime
from (
select AlarmNumber, Time, 
       ROW_NUMBER() Over(Partition by AlarmNumber order by Time) EventID
from Alarms where AlarmState = 1
) tOn
LEFT JOIN (
select AlarmNumber, Time, 
       ROW_NUMBER() Over(Partition by AlarmNumber order by Time) EventID
from Alarms where AlarmState = 0
) tOff
on (tOn.AlarmNumber = tOff.AlarmNumber and tOn.EventID = tOff.EventID)

(Code adapted from Adriano Carneiro's answer to "T-SQL Start and End Date Times from a single column")

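My question: can anyone think of an efficient way to process the 'Alarms' table to produce my example output, one that handles missing On/Off events (shown as NULL in the example output)?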

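My fallback is to use a cursor and a WHILE loop, but I was hoping this could be done by grouping the On/Off events into pairs; I just can't get that to work. I have 500k+ events, so that is a lot of data to iterate over.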

Any ideas welcome!

Thanks, Thomas

------ Update, 1 November 2020 ------

Two excellent solutions were provided. Both work correctly and gave identical results on a sample of 80,000 rows of messy real-world data.

  • GMB's solution is easier to read, but runs slightly slower
  • gotqn's solution uses more lines of code, but ran about 50% faster on my test server

2021-03-08

1 answer

一尘不染

Once the order of the rows is established, SELECT them in separate parts and combine the results with UNION ALL:

DECLARE @DataSource TABLE
(
    [AlarmNumber] INT
   ,[Time] DATETIME2(0)
   ,[AlarmState] INT
);

INSERT INTO @DataSource ([AlarmNumber], [Time], [AlarmState])
VALUES (1592, '2020-01-02 01:52:02', 1)
      ,(1592, '2020-01-02 01:58:07', 0)
      ,(1592, '2020-04-28 03:46:49', 1)
      ,(1592, '2020-04-28 06:19:10', 0)
      ,(1592, '2020-06-04 00:25:22', 1)
      ,(1592, '2020-08-27 01:57:03', 1)
      ,(1592, '2020-08-27 05:16:32', 0)
      ,(1592, '2020-09-17 02:51:57', 0);

-- Add a rowID column to the data
WITH DataSource AS
(
    SELECT * ,ROW_NUMBER() Over(Partition by AlarmNumber order by [Time]) rowID
    FROM @DataSource
)

-- This is just here so we can sort the result at the end
SELECT * FROM (

-- Select rows of DataSource where there is an ON and subsequent OFF event (DS1 Alarm is ON and DS2 Alarm is OFF)
-- This also catches where there is an ON, but no subsequent OFF (DS2.Time will be NULL)
    SELECT DS1.AlarmNumber
            ,DS1.Time As StartTime
            ,DS2.Time As EndTime
    FROM DataSource DS1
    LEFT JOIN DataSource DS2
        ON DS1.[rowID] = DS2.[rowID] - 1
        AND DS1.AlarmNumber = DS2.AlarmNumber
        AND DS2.[AlarmState] = 0
    WHERE DS1.[AlarmState] = 1

    UNION ALL

    -- Select rows of DataSource where there is an OFF and there is no matching ON (aka it turned OFF without ever turning ON)
    SELECT DS2.AlarmNumber
            ,NULL As StartTime
            ,DS2.Time As EndTime
    FROM DataSource DS2

    INNER JOIN DataSource DS1
        ON DS2.[rowID] -1 = DS1.[rowID]
        AND DS1.[AlarmState] = 0
        AND DS2.AlarmNumber = DS1.AlarmNumber

    WHERE DS2.[AlarmState] = 0

    UNION ALL

    -- Select rows of DataSource where the first event for this alarm number is an OFF (it would otherwise be missed by the above)
    SELECT DS1.AlarmNumber
            ,NULL As StartTime
            ,DS1.Time As EndTime
    FROM DataSource DS1
    WHERE DS1.[AlarmState] = 0 AND DS1.rowID = 1
) z
ORDER BY COALESCE(StartTime,EndTime), AlarmNumber
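
For comparison, the same pairing rules can also be written without self-joins by looking at each row's neighbours with the window functions LAG and LEAD (available from SQL Server 2012). This is only a sketch under stated assumptions, not the method of either answer mentioned in the question: the table variable @Alarms and the column aliases PrevState, NextState and NextTime are names made up for this illustration, AlarmState is assumed to hold only 0 or 1, and no two events of the same alarm are assumed to share a timestamp.

```sql
-- Hypothetical sample table for illustration; in practice this would be the Alarms table.
DECLARE @Alarms TABLE
(
    [AlarmNumber] INT
   ,[Time] DATETIME2(0)
   ,[AlarmState] INT
);

INSERT INTO @Alarms ([AlarmNumber], [Time], [AlarmState])
VALUES (1592, '2020-01-02 01:52:02', 1)
      ,(1592, '2020-01-02 01:58:07', 0)
      ,(1592, '2020-04-28 03:46:49', 1)
      ,(1592, '2020-04-28 06:19:10', 0)
      ,(1592, '2020-06-04 00:25:22', 1)
      ,(1592, '2020-08-27 01:57:03', 1)
      ,(1592, '2020-08-27 05:16:32', 0)
      ,(1592, '2020-09-17 02:51:57', 0);

-- Attach the previous and next event of the same alarm to every row
WITH Ordered AS
(
    SELECT [AlarmNumber]
          ,[Time]
          ,[AlarmState]
          ,LAG([AlarmState])  OVER (PARTITION BY [AlarmNumber] ORDER BY [Time]) AS PrevState
          ,LEAD([AlarmState]) OVER (PARTITION BY [AlarmNumber] ORDER BY [Time]) AS NextState
          ,LEAD([Time])       OVER (PARTITION BY [AlarmNumber] ORDER BY [Time]) AS NextTime
    FROM @Alarms
)
SELECT [AlarmNumber]
       -- An ON row supplies the start; an orphan OFF row has no start
      ,CASE WHEN [AlarmState] = 1 THEN [Time] END AS StartTime
       -- An ON row ends at the next event only if that event is an OFF;
       -- an orphan OFF row is its own end
      ,CASE WHEN [AlarmState] = 1
            THEN CASE WHEN NextState = 0 THEN NextTime END
            ELSE [Time]
       END AS EndTime
FROM Ordered
WHERE [AlarmState] = 1                                  -- every ON produces a row
   OR ([AlarmState] = 0
       AND (PrevState = 0 OR PrevState IS NULL))        -- OFF not directly preceded by an ON
ORDER BY [Time], [AlarmNumber];
```

Each output row is driven by exactly one source row (an ON, or an OFF with no ON directly before it), so sorting by that row's own [Time] gives the same order as COALESCE(StartTime, EndTime). If this were wrapped in a view, the ORDER BY would need to move to the query that selects from the view, since SQL Server does not allow ORDER BY in a view definition without TOP or OFFSET.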


2021-03-08