我有一个数据库,其中包含电话号码,orgno,公司名称,邮政编码等数据。
我的数据库中也有一些重复项,例如相同的公司名称,相同的orgno,相同的邮政编码,但不同的电话号码。因此,我想使用脚本删除这些重复项。我该如何编写这种脚本?
CREATE TABLE TEST ( address_id bigint, name varchar(255), marketing_name varchar(255), co_address varchar(255), box_address1 varchar(255), box_address2 varchar(255), box_postal_code varchar(45), visit_address1 varchar(255), org_nr varchar(255), visit_postal_code varchar(255), county_id varchar(255), contact varchar(255), complete tinyint, deleted tinyint, district_id varchar(255), region varchar(255), phone1 varchar(255), mobile_phone varchar(255), )
首先,您需要确定将要使用的重复条件是什么。例如,我要检查电话号码是否在同一Org Nro,公司名称和邮政编码中重复。我可以做这个:
第一张表进行测试
CREATE TABLE TEST ( id int, phonenumber int, orgno int, companyname varchar(100), postalcode varchar(4) )
然后测试数据
insert into TEST values (1,4533660,1,'COMPANY 1',1234) insert into TEST values (2,4533660,1,'COMPANY 1',1234) insert into TEST values (3,954189547,1,'COMPANY 2',4444) insert into TEST values (4,954189547,1,'COMPANY 2',4444) insert into TEST values (5,3652591,1,'COMPANY 3',4444) insert into TEST values (6,4201580,1,'COMPANY 4',4444) insert into TEST values (7,3337788,1,'COMPANY 5',4444)
最后,查询重复项,并选择条件。
Select phonenumber, orgno, companyname, postalcode, COUNT(*) from test group by phonenumber, orgno, companyname, postalcode HAVING COUNT(*) > 1
通过此查询,您可以轻松找到重复项,最重要的是查看重复项的条件。
编辑
如果要删除除1之外的所有重复行,可以执行以下操作:
DELETE A FROM TEST A INNER JOIN ( SELECT ROW_NUMBER()OVER(PARTITION BY a.phonenumber, a.orgno, a.companyname, a.postalcode ORDER BY a.id)AS POS, a.phonenumber, a.orgno, a.companyname, a.postalcode, a.id FROM TEST A JOIN ( SELECT phonenumber, orgno, companyname, postalcode, COUNT(*) AS CONTADOR FROM test GROUP BY phonenumber, orgno, companyname, postalcode HAVING COUNT(*) > 1 ) TB ON A.companyname = TB.companyname AND A.orgno = TB.orgno AND A.phonenumber = TB.phonenumber AND A.postalcode = TB.postalcode ) TB_2 ON a.id = tb_2.id and tb_2.pos > 1
使用提供的数据结构进行编辑,编写脚本。
DELETE FROM TEST B JOIN ( SELECT ROW_NUMBER()OVER(PARTITION BY a.org_nr, a.phone1, a.phone2, a.name, a.marketing_name ORDER BY a.org_nr, a.phone1, a.phone2, a.name, a.marketing_name )AS POS, a.address_id, a.org_nr, a.phone1, a.phone2, a.name, a.marketing_name FROM TEST A JOIN ( SELECT org_nr, phone1, phone2, name, marketing_name FROM TEST GROUP BY org_nr, phone1, phone2, name, marketing_name HAVING COUNT(*) > 1) TB ON a.org_nr = tb.org_nr and a.phone1 = tb.phone1 and a.phone2 = tb.phone2 and a.name = tb.name and a.marketing_name = tb.marketing_name ) TB_2 ON b.address_id = tb_2.address_id and tb_2.pos > 1