主頁(yè) > 知識(shí)庫(kù) > 基于SQL Server中如何比較兩個(gè)表的各組數(shù)據(jù) 圖解說(shuō)明

基于SQL Server中如何比較兩個(gè)表的各組數(shù)據(jù) 圖解說(shuō)明

熱門標(biāo)簽:浙江穩(wěn)定外呼系統(tǒng)供應(yīng)商 慶陽(yáng)地圖標(biāo)注 承德地圖標(biāo)注公司名需要花錢嗎 北京400電話辦理多少錢 咸陽(yáng)電腦外呼系統(tǒng)運(yùn)營(yíng)商 美團(tuán)地圖標(biāo)注商戶認(rèn)證注冊(cè) 怎么給高德做地圖標(biāo)注 榕城市地圖標(biāo)注 電銷外呼系統(tǒng)軟件功能

開(kāi)始

前一陣子,在項(xiàng)目中碰到這樣一個(gè)SQL查詢需求,有兩個(gè)相同結(jié)構(gòu)的表(table_left table_right),如下:

圖1.

檢查表table_left的各組(groupId),是否在表table_right中存在有一組(groupId)數(shù)據(jù)(data)與它的數(shù)據(jù)(data)完全相等.

如圖1. 可以看出表table_left和table_right存在兩組數(shù)據(jù)完整相等:

圖2.

分析

從上面的兩個(gè)表,可以知道它們存放的是一組一組的數(shù)據(jù);那么,接下來(lái)我借助數(shù)學(xué)集合的列舉法和運(yùn)算進(jìn)行分析。

先通過(guò)集合的列舉法描述兩個(gè)表的各組數(shù)據(jù):

圖3.

這里只有兩種情況,相等和不相等。對(duì)于不相等,可再分為部分相等、包含、和完全不相等。使用集合描述,可使用交集,子集,并集。如下面圖4.,我列舉出這幾種常見(jiàn)的情況:

圖4.

實(shí)現(xiàn)

在數(shù)據(jù)庫(kù)中,要找出表table_left和表table_right存在相同數(shù)據(jù)的組,方法很多,這里我列出兩種常用的方法。

(下面的SQL腳本,是以圖4.的數(shù)據(jù)為基礎(chǔ)參考)

方法1:

通過(guò)"Select … From …Order by … xml for path('') "把各組的data列數(shù)據(jù)連串起來(lái)(如,圖4.把table_left的組#11的列data連串起來(lái)成"data1-data2-data3"),其他分組(包含表table_right)以此方法實(shí)現(xiàn)data列數(shù)據(jù)連串起來(lái);然后通過(guò)比較兩表的連串后字段是否存在相等,若是相等就說(shuō)明這比較多兩組數(shù)據(jù)相等,由此可以判斷出表table_left的哪組數(shù)據(jù)在表table_right存在與它數(shù)據(jù)完全相等的組。

針對(duì)方法1,需要對(duì)原表增加一個(gè)字段dataPath,用于存儲(chǔ)data列數(shù)據(jù)連串的結(jié)果,如:

復(fù)制代碼 代碼如下:

alter table table_left add dataPath nvarchar(200)
alter table table_right add dataPath nvarchar(200)

分組連串data列數(shù)據(jù)并update至剛新增的列dataPath,如:

復(fù)制代碼 代碼如下:

update a
    set dataPath=b.dataPath
    from table_left a
        cross apply(select (select '-'+x.data from table_left x where x.groupId=a.groupId order by x.data for xml path('')) as dataPath)b

update a
    set dataPath=b.dataPath
    from table_right a
        cross apply(select (select '-'+x.data from table_right x where x.groupId=a.groupId order by x.data for xml path('')) as dataPath)b

接下來(lái)就是查詢了,如:

復(fù)制代碼 代碼如下:

select distinct a.groupId
    from table_left a
    where exists(select 1 from table_right x where x.dataPath=a.dataPath)

完整代碼:

復(fù)制代碼 代碼如下:

View Code
use tempdb
go
if object_id('table_left') is not null drop table table_left
if object_id('table_right') is not null drop table table_right
go
create table table_left(groupId nvarchar(5),data nvarchar(10))
create table table_right(groupId nvarchar(5),data nvarchar(10))
go
alter table table_left add dataPath nvarchar(200)
alter table table_right add dataPath nvarchar(200)
go
create nonclustered index ix_left on table_left(dataPath)
create nonclustered index ix_right on table_right(dataPath)
go
set nocount on
go
insert into table_right(groupId,data)
select '#1','data1' union all
select '#1','data2' union all
select '#1','data3' union all
select '#2','data55' union all
select '#2','data55' union all
select '#3','data91' union all
select '#3','data92' union all
select '#4','data65' union all
select '#4','data66' union all
select '#4','data67' union all
select '#4','data68' union all
select '#4','data69' union all
select '#5','data77' union all
select '#5','data79'
insert into table_left(groupId,data)
select '#11','data1' union all
select '#11','data2' union all
select '#11','data3' union all
select '#22','data55' union all
select '#22','data57' union all
select '#33','data99' union all
select '#33','data99' union all
select '#44','data66' union all
select '#44','data68' union all
select '#55','data77' union all
select '#55','data78' union all
select '#55','data79'
go
update a
    set dataPath=b.dataPath
    from table_left a
        cross apply(select (select '-'+x.data from table_left x where x.groupId=a.groupId order by x.data for xml path('')) as dataPath)b
update a
    set dataPath=b.dataPath
    from table_right a
        cross apply(select (select '-'+x.data from table_right x where x.groupId=a.groupId order by x.data for xml path('')) as dataPath)b
--
select distinct a.groupId
    from table_left a
    where exists(select 1 from table_right x where x.dataPath=a.dataPath)

方法2:

通過(guò)SQL Sever提供的集運(yùn)算符"Except",判斷兩組非重復(fù)的數(shù)據(jù)。如果兩組針對(duì)對(duì)方都不存在非重復(fù)的數(shù)據(jù),就說(shuō)明這兩組數(shù)據(jù)完全相等。如,表table_left中的組#11和表 table_right中的組#1,對(duì)列data進(jìn)行"Except"集運(yùn)算,無(wú)任是(#11 à #1)進(jìn)行Except集運(yùn)算,還是(#1 à #11 )進(jìn)行Except集合運(yùn)算,都返回空結(jié)果,這就說(shuō)明組#1 和#11的data數(shù)據(jù)完全相等,如:

復(fù)制代碼 代碼如下:

select data from table_left where groupId='#11' except select data from table_right where groupId='#1'
select data from table_right where groupId='#1' except select data from table_left where groupId='#11'

同樣道理,我們把表table_left中的組#11和表 table_right中的組#2,對(duì)列data進(jìn)行"Except"集運(yùn)算,如:

復(fù)制代碼 代碼如下:

select data from table_left where groupId='#11' except select data from table_right where groupId='#2'
select data from table_right where groupId='#2' except select data from table_left where groupId='#11'

只要(#11 à #2 )或 (#2 à #11 )的"Except"集運(yùn)算結(jié)果有記錄,就說(shuō)明兩組的數(shù)據(jù)不相等。

兩張表的所有組都進(jìn)行比較,我們需要通過(guò)以下SQL腳本實(shí)現(xiàn),如:

復(fù)制代碼 代碼如下:

select distinct a.groupId
    from table_left a
        inner join table_right b on b.data=a.data
    where not exists(select x.data from table_left x where x.groupId=a.groupId except select y.data from table_right y where y.groupId=b.groupId )
        and not exists(select x.data from table_right x where x.groupId=b.groupId except select y.data from table_left y where y.groupId=a.groupId )

 完整代碼:

復(fù)制代碼 代碼如下:

View Code
use tempdb
go
if object_id('table_left') is not null drop table table_left
if object_id('table_right') is not null drop table table_right
go
create table table_left(groupId nvarchar(5),data nvarchar(10))
create table table_right(groupId nvarchar(5),data nvarchar(10))
go
create nonclustered index ix_left on table_left(data)
create nonclustered index ix_right on table_right(data)
go
set nocount on
go
insert into table_right(groupId,data)
select '#1','data1' union all
select '#1','data2' union all
select '#1','data3' union all
select '#2','data55' union all
select '#2','data55' union all
select '#3','data91' union all
select '#3','data92' union all
select '#4','data65' union all
select '#4','data66' union all
select '#4','data67' union all
select '#4','data68' union all
select '#4','data69' union all
select '#5','data77' union all
select '#5','data79'
insert into table_left(groupId,data)
select '#11','data1' union all
select '#11','data2' union all
select '#11','data3' union all
select '#22','data55' union all
select '#22','data57' union all
select '#33','data99' union all
select '#33','data99' union all
select '#44','data66' union all
select '#44','data68' union all
select '#55','data77' union all
select '#55','data78' union all
select '#55','data79'
go
--select
select distinct a.groupId
    from table_left a
        inner join table_right b on b.data=a.data
    where not exists(select x.data from table_left x where x.groupId=a.groupId except select y.data from table_right y where y.groupId=b.groupId )
        and not exists(select x.data from table_right x where x.groupId=b.groupId except select y.data from table_left y where y.groupId=a.groupId )

方法1 Vs. 方法2 :

方法1和方法2都能找出表table_left在table_right存在數(shù)據(jù)完全相等的組#11。但性能角度上,方法2比方法1略勝一籌,可以看它們執(zhí)行過(guò)程的統(tǒng)計(jì)信息:

方法1:

圖5.

方法2:

圖6.

如果,數(shù)據(jù)量大情況下,那么方法2比方法1更具有明顯的優(yōu)點(diǎn)。因?yàn)榉椒?,多兩個(gè)更新dataPath的部分,數(shù)據(jù)量隨著增加,這里位置的更新就耗很多的資源;如果dataPath列數(shù)據(jù)大小超過(guò)900字節(jié),會(huì)導(dǎo)致無(wú)法在dataPath創(chuàng)建索引,影響后面的Select查詢性能。

擴(kuò)展

這里說(shuō)擴(kuò)展,主要是針對(duì)上面的方法2來(lái)說(shuō)。在當(dāng)列data的數(shù)據(jù)大小超過(guò)900字節(jié),或者含有多個(gè)數(shù)據(jù)列要進(jìn)行比較,看是否存在兩組(groupId)的各對(duì)應(yīng)列數(shù)據(jù)一一相等。

圖7.

這樣的情況,可對(duì)字段dataSub1 dataSub2 創(chuàng)建一個(gè)哈希索引,如:

復(fù)制代碼 代碼如下:

alter table table_left add dataChecksum as checksum(dataSub1,dataSub2)
alter table table_right add dataChecksum as checksum(dataSub1,dataSub2)
go
create nonclustered index ix_table_left_cs on table_right(dataChecksum)
create nonclustered index table_right_cs on table_right(dataChecksum)

后面的select查詢語(yǔ)句,在Inner Join 部分稍改動(dòng)下即可,如:

復(fù)制代碼 代碼如下:

select distinct a.groupId
    from table_left a
        inner join table_right b on b.dataChecksum=a.dataChecksum
            and b.dataSub1=a.dataSub1
            and b.dataSub2=a.dataSub2
    where not exists(select x.dataSub1,x.dataSub2 from table_left x where x.groupId=a.groupId except select y.dataSub1,y.dataSub2 from table_right y where y.groupId=b.groupId )
        and not exists(select x.dataSub1,x.dataSub2 from table_right x where x.groupId=b.groupId except select y.dataSub1,y.dataSub2 from table_left y where y.groupId=a.groupId )

 完整代碼:

復(fù)制代碼 代碼如下:

View Code
use tempdb
go
if object_id('table_left') is not null drop table table_left
if object_id('table_right') is not null drop table table_right
go
create table table_left(groupId nvarchar(5),dataSub1 nvarchar(10),dataSub2 nvarchar(10))
create table table_right(groupId nvarchar(5),dataSub1 nvarchar(10),dataSub2 nvarchar(10))
go
alter table table_left add dataChecksum as checksum(dataSub1,dataSub2)
alter table table_right add dataChecksum as checksum(dataSub1,dataSub2)
go
create nonclustered index ix_table_left_cs on table_left(dataChecksum)
create nonclustered index table_right_cs on table_right(dataChecksum)
go
set nocount on
go
insert into table_right(groupId,dataSub1,dataSub2)
select '#1','data1','data7' union all
select '#1','data2','data8' union all
select '#1','data3','data9' union all
select '#2','data55','data4' union all
select '#2','data55','data5'
insert into table_left(groupId,dataSub1,dataSub2)
select '#11','data1','data7' union all
select '#11','data2','data8' union all
select '#11','data3','data9' union all
select '#22','data55','data0' union all
select '#22','data57','data2' union all
select '#33','data99','data4' union all
select '#33','data99','data6'
go
--select
select distinct a.groupId
    from table_left a
        inner join table_right b on b.dataChecksum=a.dataChecksum
            and b.dataSub1=a.dataSub1
            and b.dataSub2=a.dataSub2
    where not exists(select x.dataSub1,x.dataSub2 from table_left x where x.groupId=a.groupId except select y.dataSub1,y.dataSub2 from table_right y where y.groupId=b.groupId )
        and not exists(select x.dataSub1,x.dataSub2 from table_right x where x.groupId=b.groupId except select y.dataSub1,y.dataSub2 from table_left y where y.groupId=a.groupId )

小結(jié)

對(duì)于這個(gè)問(wèn)題,可能還有其他的或更優(yōu)的解決方法.而且在實(shí)際的生產(chǎn)環(huán)境中,可能碰到的情況會(huì)有所不同,無(wú)論如何,需要多分析,多動(dòng)手多實(shí)驗(yàn),找到最優(yōu)的解決方法。

標(biāo)簽:新鄉(xiāng) 江蘇 重慶 昭通 拉薩 貴州 上海 呼和浩特

巨人網(wǎng)絡(luò)通訊聲明:本文標(biāo)題《基于SQL Server中如何比較兩個(gè)表的各組數(shù)據(jù) 圖解說(shuō)明》,本文關(guān)鍵詞  基于,SQL,Server,中,如何,比較,;如發(fā)現(xiàn)本文內(nèi)容存在版權(quán)問(wèn)題,煩請(qǐng)?zhí)峁┫嚓P(guān)信息告之我們,我們將及時(shí)溝通與處理。本站內(nèi)容系統(tǒng)采集于網(wǎng)絡(luò),涉及言論、版權(quán)與本站無(wú)關(guān)。
  • 相關(guān)文章
  • 下面列出與本文章《基于SQL Server中如何比較兩個(gè)表的各組數(shù)據(jù) 圖解說(shuō)明》相關(guān)的同類信息!
  • 本頁(yè)收集關(guān)于基于SQL Server中如何比較兩個(gè)表的各組數(shù)據(jù) 圖解說(shuō)明的相關(guān)信息資訊供網(wǎng)民參考!
  • 推薦文章