我正在使用SQL Server 2005,当我想在IN子句中使用子查询时要过滤某些结果时,我注意到了一些奇怪的事情。例如,这是我当前的查询,平均运行70秒:
select Phone, ZipCode, sum(Calls) as Calls, sum(Sales) as Sales from Archive where CustomerID = 20 and ReportDate = '2/3/2011' and Phone in ( select Phone from PlanDetails where Phone is not null and Length is not null and PlannedImp > 0 and CustomerID = 20 and (StatusID <> 2 and StatusID <> 7) and SubcategoryID = 88 ) group by Phone, ZipCode
但是,如果我将它们分解为2个独立的查询,则每个查询要花费不到1秒的时间。
select Phone from PlanDetails where Phone is not null and Length is not null and PlannedImp > 0 and CustomerID = 20 and (StatusID <> 2 and StatusID <> 7) and SubcategoryID = 88
和
select Phone, ZipCode, sum(Calls) as Calls, sum(Sales) as Sales from Archive where CustomerID = 20 and ReportDate = '2/3/2011' group by Phone, ZipCode
最后,如果执行此操作,它将返回与第一个查询相同的结果,但大约需要2-3秒:
select Phone into #tempTable from PlanDetails where Phone is not null and Length is not null and PlannedImp > 0 and CustomerID = 20 and (StatusID <> 2 and StatusID <> 7) and SubcategoryID = 88 select Phone, ZipCode, sum(Calls) as Calls, sum(Sales) as Sales from Archive where CustomerID = 20 and ReportDate = '2/3/2011' and Phone in ( select Phone from #tempTable ) group by Phone, ZipCode
在过去的几周中,我一直注意到,不仅此查询的速度很慢,而且任何在IN子句中使用(有点复杂)子查询的查询都会破坏性能。是什么原因呢?
这些查询中唯一可以使用的索引是两个表的CustomerID上的非聚集索引。我查看了慢查询和快速查询的执行计划,发现Archive表上的非聚集索引查找是迄今为止成本最高的百分比(80-90%)。但是,唯一的区别是慢查询中的这一步的CPU成本为7.1,而快查询中的这一步的CPU成本为1.7。
它取决于数据库系统,版本,设置等,但是通常最终会发生的是数据库无法(或拒绝)缓存该内部查询,因此将在外部查询的 每次迭代 中执行该数据库。您正在将问题从O(n)效率类更改为O(n ^ 2)。