I have a legacy product that I have to maintain. One of its tables looks something like the following example:
    DECLARE @t TABLE ( id INT, DATA NVARCHAR(30) );

    INSERT INTO @t
    SELECT 1, 'name: Jim Ey' UNION ALL
    SELECT 2, 'age: 43' UNION ALL
    SELECT 3, '----------------' UNION ALL
    SELECT 4, 'name: Johnson Dom' UNION ALL
    SELECT 5, 'age: 34' UNION ALL
    SELECT 6, '----------------' UNION ALL
    SELECT 7, 'name: Jason Thwe' UNION ALL
    SELECT 8, 'age: 22';

    SELECT * FROM @t;

    /* You will get the following result

    id          DATA
    ----------- ------------------------------
    1           name: Jim Ey
    2           age: 43
    3           ----------------
    4           name: Johnson Dom
    5           age: 34
    6           ----------------
    7           name: Jason Thwe
    8           age: 22
    */
Now I want to get the information in the following form:
    name           age
    -------------- --------
    Jim Ey         43
    Johnson Dom    34
    Jason Thwe     22
What is the easiest way to do this? Thanks.
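Out of (slightly morbid) curiosity, I tried to come up with a way to transform the exact input data you provided.
Far better, of course, would be to structure the original data properly. With a legacy system that may not be possible, but an ETL process could be created to bring this information into an intermediate location, so that an ugly query like this one never has to run in real time.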
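This example assumes that all IDs are consistent and sequential (otherwise, an additional ROW_NUMBER() column or a new identity column would be needed to guarantee that the remainder operations on the ID work correctly).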
    SELECT
        Name = REPLACE( Name, 'name: ', '' ),
        Age = REPLACE( Age, 'age: ', '' )
    FROM
    (
        SELECT
            Name = T2.Data,
            Age = T1.Data,
            RowNumber = ROW_NUMBER() OVER( ORDER BY T1.Id ASC )
        FROM @t T1
            INNER JOIN @t T2
                ON T1.id = T2.id + 1 -- offset by one to combine two rows
        WHERE T1.id % 3 != 0 -- skip delimiter records
    ) Q1
    -- skip every other record (minus delimiters, which have already been stripped)
    WHERE RowNumber % 2 != 0;
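Since the query above leans on the ids running 1, 2, 3, ... without gaps, it may be worth checking that assumption before trusting the result. A minimal sketch of such a check, using a shortened copy of the sample data; the `Numbered` alias and `ExpectedId` column are names made up for this illustration:

```sql
-- Shortened copy of the sample table from the question
DECLARE @t TABLE ( id INT, DATA NVARCHAR(30) );
INSERT INTO @t ( id, DATA ) VALUES
    ( 1, N'name: Jim Ey' ), ( 2, N'age: 43' ), ( 3, N'----------------' ),
    ( 4, N'name: Johnson Dom' ), ( 5, N'age: 34' );

-- Compare each id with its position in id order.
-- Any row returned points at a gap, a duplicate, or a start value other than 1.
SELECT id, ExpectedId
FROM
(
    SELECT id, ExpectedId = ROW_NUMBER() OVER( ORDER BY id ASC )
    FROM @t
) Numbered
WHERE id <> ExpectedId;
```

For the sample rows this returns nothing. If it does return rows, the `id % 3` arithmetic cannot be relied on and the rows need to be renumbered first.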
This is a more practical example, since the actual ID values do not matter, only the row order does.
    DECLARE @NumberedData TABLE( RowNumber INT, Data VARCHAR( 100 ) );

    INSERT @NumberedData( RowNumber, Data )
    SELECT
        RowNumber = ROW_NUMBER() OVER( ORDER BY id ASC ),
        Data
    FROM @t;

    SELECT
        Name = REPLACE( N2.Data, 'name: ', '' ),
        Age = REPLACE( N1.Data, 'age: ', '' )
    FROM @NumberedData N1
        INNER JOIN @NumberedData N2
            ON N1.RowNumber = N2.RowNumber + 1
    WHERE ( N1.RowNumber % 3 ) = 2;

    DELETE @NumberedData;
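Again, it would be best to avoid running a query like this in real time and to use a scheduled, transactional ETL process instead. In my experience, semi-structured data like this is prone to anomalies.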
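What such a transactional load might look like is sketched below. This is only an illustration under stated assumptions: `#Person` is a hypothetical staging table (a permanent table would be used in a real job), the ids are assumed to be contiguous starting at 1, and `THROW` requires SQL Server 2012 or later.

```sql
-- Stand-in for the legacy table from the question
DECLARE @t TABLE ( id INT, DATA NVARCHAR(30) );
INSERT INTO @t ( id, DATA ) VALUES
    ( 1, N'name: Jim Ey' ), ( 2, N'age: 43' ), ( 3, N'----------------' ),
    ( 4, N'name: Johnson Dom' ), ( 5, N'age: 34' );

-- Hypothetical staging table; not part of the original schema
CREATE TABLE #Person ( Name NVARCHAR(30) NOT NULL, Age INT NOT NULL );

BEGIN TRY
    BEGIN TRANSACTION;

    DELETE FROM #Person; -- full refresh on every run

    INSERT #Person( Name, Age )
    SELECT
        REPLACE( N2.DATA, N'name: ', N'' ),
        -- a row in an age position that is not an age row yields NULL,
        -- which the NOT NULL constraint then rejects
        CASE WHEN N1.DATA LIKE N'age: %'
             THEN CAST( REPLACE( N1.DATA, N'age: ', N'' ) AS INT ) END
    FROM @t N1
        INNER JOIN @t N2
            ON N1.id = N2.id + 1
    WHERE N1.id % 3 = 2;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- any conversion or constraint failure undoes the whole load
    IF( @@TRANCOUNT > 0 ) ROLLBACK TRANSACTION;
    THROW;
END CATCH;

SELECT Name, Age FROM #Person;
DROP TABLE #Person;
```

Because the delete and the insert share one transaction, a malformed row leaves the previous contents of the staging table in place rather than a half-loaded result.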
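While examples #1 and #2 (and the solutions provided by others) demonstrate clever ways of working with the data, a more practical way to transform it would be a cursor. Why? It may actually perform better (no nested queries, recursion, pivoting, or row numbering), and even if it turns out to be slower, it offers much better opportunities for error handling.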
    -- this could be a table variable, temp table, or staging table
    DECLARE @Results TABLE
    (
        Name VARCHAR( 100 ),
        Age INT
    );

    DECLARE
        @Index INT = 0,
        @Data VARCHAR( 100 ),
        @Name VARCHAR( 100 ),
        @Age INT;

    DECLARE Person_Cursor CURSOR FOR
        SELECT Data FROM @t ORDER BY id; -- explicit order; without it the row order is not guaranteed

    OPEN Person_Cursor;
    FETCH NEXT FROM Person_Cursor INTO @Data;

    WHILE( 1 = 1 )
    BEGIN -- busy loop so we can handle the iteration following completion
        IF( @Index = 2 )
        BEGIN
            -- third row of a group (the delimiter): store the completed person
            INSERT @Results( Name, Age ) VALUES( @Name, @Age );
            SET @Index = 0;
        END
        ELSE
        BEGIN
            -- optional: examine @Data for integrity
            IF( @Index = 0 ) SET @Name = REPLACE( @Data, 'name: ', '' );
            IF( @Index = 1 ) SET @Age = CAST( REPLACE( @Data, 'age: ', '' ) AS INT );
            SET @Index = @Index + 1;
        END

        -- optional: examine @Index to see that there are no superfluous trailing
        -- rows or rows omitted at the end.
        IF( @@FETCH_STATUS != 0 ) BREAK;
        FETCH NEXT FROM Person_Cursor INTO @Data;
    END

    CLOSE Person_Cursor;
    DEALLOCATE Person_Cursor;
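The "examine @Data for integrity" step in the cursor could be filled in along the following lines. This is a sketch rather than part of the original answer: the `@Errors` table and its `Reason` text are hypothetical, and `TRY_CAST` assumes SQL Server 2012 or later. It handles a single value in isolation so that it can be run on its own.

```sql
-- Stand-in for one row fetched by the cursor; deliberately malformed
DECLARE @Data VARCHAR( 100 ) = 'age: forty-three', @Age INT;

-- Hypothetical error log; not part of the original example
DECLARE @Errors TABLE ( Data VARCHAR( 100 ), Reason VARCHAR( 100 ) );

IF( @Data NOT LIKE 'age: %' )
BEGIN
    INSERT @Errors( Data, Reason ) VALUES( @Data, 'expected an age row' );
END
ELSE
BEGIN
    -- TRY_CAST returns NULL instead of raising an error when the conversion fails
    SET @Age = TRY_CAST( REPLACE( @Data, 'age: ', '' ) AS INT );
    IF( @Age IS NULL )
    BEGIN
        INSERT @Errors( Data, Reason ) VALUES( @Data, 'age is not an integer' );
    END
END

SELECT Data, Reason FROM @Errors;
```

Inside the loop, a row that lands in the error log could be skipped or could abort the run, depending on how strict the load needs to be.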
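I created 100K rows of sample source data, and the three examples above appear roughly equivalent for transforming the data.
I then created one million rows of source data, and a query like the following gives excellent performance for selecting a subset of rows (such as would be used in a grid on a web page or in a report).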
    -- INT IDENTITY( 1, 1 ) numbers the rows for us
    DECLARE @NumberedData TABLE( RowNumber INT IDENTITY( 1, 1 ), Data VARCHAR( 100 ) );

    -- subset selection; ordering/filtering can be done here but it will need to preserve
    -- the original 3 rows-per-result structure and it will impact performance
    INSERT @NumberedData( Data )
    SELECT TOP 1000 Data FROM @t;

    SELECT
        N1.RowNumber,
        Name = REPLACE( N2.Data, 'name: ', '' ),
        Age = REPLACE( N1.Data, 'age: ', '' )
    FROM @NumberedData N1
        INNER JOIN @NumberedData N2
            ON N1.RowNumber = N2.RowNumber + 1
    WHERE ( N1.RowNumber % 3 ) = 2;

    DELETE @NumberedData;
I'm seeing execution times of 4-10 ms against a set of 1M records (i7-3960x).
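For anyone who wants to try a comparable measurement, the following is one possible way to build test data of that size and time a query against it. It is a sketch, not the script used for the figures above: the generated names and ages are synthetic, and it assumes that `sys.all_objects` cross-joined with itself yields at least one million rows, which is typical but not guaranteed.

```sql
DECLARE @t TABLE ( id INT, DATA NVARCHAR(30) );

-- Build 1,000,000 synthetic rows in the same name / age / delimiter pattern.
-- sys.all_objects is used only as a convenient row source.
INSERT INTO @t ( id, DATA )
SELECT
    n,
    CASE n % 3
        WHEN 1 THEN N'name: Person ' + CAST( ( n + 2 ) / 3 AS NVARCHAR(10) )
        WHEN 2 THEN N'age: ' + CAST( 18 + n % 60 AS NVARCHAR(10) )
        ELSE N'----------------'
    END
FROM
(
    SELECT TOP ( 1000000 ) n = ROW_NUMBER() OVER( ORDER BY ( SELECT NULL ) )
    FROM sys.all_objects a
        CROSS JOIN sys.all_objects b
) Numbers;

SET STATISTICS TIME ON;  -- CPU and elapsed time are reported with the query messages

SELECT COUNT(*) FROM @t; -- placeholder: put the query being measured here

SET STATISTICS TIME OFF;
```

The table variable only lives for the duration of the batch, so the query being measured has to run in the same batch as the data generation.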