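I am trying to identify potential duplicate customers in our database based on the last 4 digits of the SSN, the last name, and the DOB. The stored procedure I wrote does identify potential duplicates, but it lists each pair on a single row. For reporting reasons, I am trying to split them into separate rows.
My T-SQL looks like: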
    DECLARE @StartDate DATE = '1/1/2017', @EndDate DATE = '3/1/2017';

    SELECT
          DENSE_RANK() OVER (ORDER BY c.socialSecurityNumber) AS [SSNRanking]
        , ROW_NUMBER() OVER (PARTITION BY c.socialSecurityNumber ORDER BY c.socialSecurityNumber) AS [RowNumb]
        , c.socialSecurityNumber AS [SSN]
        , c.id AS [CustomerID]
        , c.firstName AS [FirstName]
        , c.lastName AS [lastName]
        , c.birthDate AS [birthdate]
        , c.createDate AS [CreateDate]
        , c2.socialSecurityNumber AS [DupSSN]
        , c2.id AS [DupCustomerID]
        , c2.firstName AS [DupFirstName]
        , c2.lastName AS [DupLastName]
        , c2.birthDate AS [DupBirthDate]
        , c2.createDate AS [DupCreateDate]
    FROM dbo.Customers AS [c]
    INNER JOIN dbo.Customers AS [c2]
        ON (    SUBSTRING(c.socialSecurityNumber, 6, 4) = SUBSTRING(c2.socialSecurityNumber, 6, 4)
            AND c.birthDate = c2.birthDate
            AND c.lastName = c2.lastName
            AND c.id <> c2.id )
    LEFT JOIN dbo.CustomerAddresses AS [CA]
        ON c.id = CA.customerID
    LEFT OUTER JOIN dbo.Common_Orders AS [co]
        ON co.customerID = c.id
    WHERE c.customerStatusTypeID <> 'M'
      AND c2.customerStatusTypeID <> 'M'
      AND c.mergedTo IS NULL
      AND c2.mergedTo IS NULL
      AND CAST(co.orderDate AS DATE) >= @StartDate
      AND CAST(co.orderDate AS DATE) <= @EndDate
      AND c.id = 1234439
    GROUP BY
          c.socialSecurityNumber
        , c.id
        , c.firstName
        , c.lastName
        , c.birthDate
        , c.createDate
        , c2.socialSecurityNumber
        , c2.id
        , c2.firstName
        , c2.lastName
        , c2.birthDate
        , c2.createDate
    ORDER BY CAST(c.socialSecurityNumber AS INT) ASC;
My result set looks like this:
    SSNRanking  RowNumb  SSN        CustomerID  FirstName  lastName  birthdate   CreateDate  DupSSN     DupCustomerID  DupFirstName  DupLastName  DupBirthDate  DupCreateDate
    1           1        000009915  1234439     GREG       GARRETT   1900-01-01  2014-02-25  000009915  1166084        ADAM          GARRETT      1900-01-01    2013-08-29
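In this particular instance, I have two customers with the same last 4 digits of the SSN, the same last name, and the same DOB, but different first names.
How can I get these two records to appear on separate rows? Ideally, I would like to see: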
    SSNRanking  RowNumb  SSN        CustomerID  FirstName  lastName  birthdate   CreateDate
    1           1        000009915  1234439     GREG       GARRETT   1900-01-01  2014-02-25
    1           2        000009915  1166084     ADAM       GARRETT   1900-01-01  2013-08-29
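But I am not sure how to accomplish this when joining the table to itself. Any suggestions?
I have linked a script that creates the two tables in question and inserts sample data. I hope that is acceptable: SQL script
This is called "unpivoting". You can use the UNPIVOT operator, but I prefer CROSS APPLY ... VALUES.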
I'll wrap your query in a CTE as is, without looking into it in detail, and use CROSS APPLY to split each row into two.
    DECLARE @StartDate DATE = '1/1/2017', @EndDate DATE = '3/1/2017';

    WITH CTE AS
    (
        SELECT
              DENSE_RANK() OVER (ORDER BY c.socialSecurityNumber) AS [SSNRanking]
            , ROW_NUMBER() OVER (PARTITION BY c.socialSecurityNumber ORDER BY c.socialSecurityNumber) AS [RowNumb]
            , c.socialSecurityNumber AS [SSN]
            , c.id AS [CustomerID]
            , c.firstName AS [FirstName]
            , c.lastName AS [lastName]
            , c.birthDate AS [birthdate]
            , c.createDate AS [CreateDate]
            , c2.socialSecurityNumber AS [DupSSN]
            , c2.id AS [DupCustomerID]
            , c2.firstName AS [DupFirstName]
            , c2.lastName AS [DupLastName]
            , c2.birthDate AS [DupBirthDate]
            , c2.createDate AS [DupCreateDate]
        FROM dbo.Customers AS [c]
        INNER JOIN dbo.Customers AS [c2]
            ON (    SUBSTRING(c.socialSecurityNumber, 6, 4) = SUBSTRING(c2.socialSecurityNumber, 6, 4)
                AND c.birthDate = c2.birthDate
                AND c.lastName = c2.lastName
                AND c.id <> c2.id )
        LEFT JOIN dbo.CustomerAddresses AS [CA]
            ON c.id = CA.customerID
        LEFT JOIN dbo.Common_Orders AS [co]
            ON co.customerID = c.id
        WHERE c.customerStatusTypeID <> 'M'
          AND c2.customerStatusTypeID <> 'M'
          AND c.mergedTo IS NULL
          AND c2.mergedTo IS NULL
          AND CAST(co.orderDate AS DATE) >= @StartDate
          AND CAST(co.orderDate AS DATE) <= @EndDate
          AND c.id = 1234439
        GROUP BY
              c.socialSecurityNumber
            , c.id
            , c.firstName
            , c.lastName
            , c.birthDate
            , c.createDate
            , c2.socialSecurityNumber
            , c2.id
            , c2.firstName
            , c2.lastName
            , c2.birthDate
            , c2.createDate
    )
    SELECT
          CA.SSNRanking
        , CA.RowNumb
        , CA.SSN
        , CA.CustomerID
        , CA.FirstName
        , CA.lastName
        , CA.birthdate
        , CA.CreateDate
    FROM CTE
    CROSS APPLY
    (
        VALUES
            (CTE.SSNRanking, CTE.RowNumb, CTE.SSN, CTE.CustomerID, CTE.FirstName, CTE.lastName, CTE.birthdate, CTE.CreateDate),
            (CTE.SSNRanking, CTE.RowNumb, CTE.DupSSN, CTE.DupCustomerID, CTE.DupFirstName, CTE.DupLastName, CTE.DupBirthDate, CTE.DupCreateDate)
    ) AS CA (SSNRanking, RowNumb, SSN, CustomerID, FirstName, lastName, birthdate, CreateDate)
    ORDER BY CAST(CA.SSN AS INT) ASC;
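To see the mechanics in isolation, here is a minimal sketch that can be run without the real tables. The table variable @Pairs and its reduced column list are hypothetical stand-ins for the CTE's output; only the CROSS APPLY ... VALUES part mirrors the query above.

```sql
-- Hypothetical one-row table standing in for the CTE's output (fewer columns).
DECLARE @Pairs TABLE
(
    CustomerID    INT,
    FirstName     VARCHAR(50),
    DupCustomerID INT,
    DupFirstName  VARCHAR(50)
);

INSERT INTO @Pairs (CustomerID, FirstName, DupCustomerID, DupFirstName)
VALUES (1234439, 'GREG', 1166084, 'ADAM');

-- Each input row is paired with the two rows built by VALUES:
-- one from the "original" columns and one from the "Dup" columns.
SELECT
      p.CustomerID AS PairKey   -- shared by both output rows of a pair
    , v.CustomerID
    , v.FirstName
FROM @Pairs AS p
CROSS APPLY
(
    VALUES
        (p.CustomerID,    p.FirstName),
        (p.DupCustomerID, p.DupFirstName)
) AS v (CustomerID, FirstName);
```

The single input row comes back as two rows, one per customer. Any column that is passed unchanged into both VALUES rows (PairKey here, SSNRanking and RowNumb in the full query) carries the same value on both output rows.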
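By the way,
    ROW_NUMBER() OVER (PARTITION BY ColumnA ORDER BY ColumnA)
does not make sense when you partition and order by the same column: every row within a partition has the same value of ColumnA, so the order in which the row numbers are assigned is arbitrary. I am not sure what you were trying to achieve there.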
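If the intent was to number the customers within each group of potential duplicates, one possible approach is to partition by the matching key and order by something that differs between the rows. The sketch below is only an illustration under that assumption: @Customers is a hypothetical table variable holding the two sample rows, and ordering by createDate and then id is a guess at a sensible tie-breaker, not something stated in the question.

```sql
-- Hypothetical sample data; column names follow the question's dbo.Customers.
DECLARE @Customers TABLE
(
    id                   INT PRIMARY KEY,
    socialSecurityNumber CHAR(9),
    firstName            VARCHAR(50),
    lastName             VARCHAR(50),
    birthDate            DATE,
    createDate           DATE
);

INSERT INTO @Customers (id, socialSecurityNumber, firstName, lastName, birthDate, createDate)
VALUES
    (1234439, '000009915', 'GREG', 'GARRETT', '19000101', '20140225'),
    (1166084, '000009915', 'ADAM', 'GARRETT', '19000101', '20130829');

SELECT
      ROW_NUMBER() OVER
      (
          -- The partition is the duplicate-matching key from the question.
          PARTITION BY SUBSTRING(c.socialSecurityNumber, 6, 4), c.lastName, c.birthDate
          -- Assumed tie-breaker: id is unique, so the numbering is deterministic.
          ORDER BY c.createDate, c.id
      ) AS RowNumb
    , c.socialSecurityNumber AS SSN
    , c.id AS CustomerID
    , c.firstName AS FirstName
    , c.lastName AS lastName
FROM @Customers AS c;
```

Because the ORDER BY ends in a unique column, the same row always receives the same number, which is not guaranteed when the partition and order columns are identical.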