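I am doing some code practice and, while merging data frames, I get this user warning:
/usr/lib64/python2.7/site-packages/pandas/core/frame.py:6201: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default. To accept the future behavior, pass 'sort=True'. To retain the current behavior and silence the warning, pass sort=False
It is raised on the lines of code below. Could you please help me find a solution for this warning?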
    placement_video = [self.read_sql_vdx_summary, self.read_sql_video_km]
    placement_video_summary = reduce(lambda left, right: pd.merge(left, right, on='PLACEMENT', sort=False),
                                     placement_video)
    placement_by_video = placement_video_summary.loc[:, ["PLACEMENT", "PLACEMENT_NAME", "COST_TYPE", "PRODUCT",
                                                         "VIDEONAME", "VIEW0", "VIEW25", "VIEW50", "VIEW75",
                                                         "VIEW100", "ENG0", "ENG25", "ENG50", "ENG75", "ENG100",
                                                         "DPE0", "DPE25", "DPE50", "DPE75", "DPE100"]]
    # print (placement_by_video)
    placement_by_video["Placement# Name"] = placement_by_video[["PLACEMENT", "PLACEMENT_NAME"]].apply(
        lambda x: ".".join(x), axis=1)
    placement_by_video_new = placement_by_video.loc[:, ["PLACEMENT", "Placement# Name", "COST_TYPE", "PRODUCT",
                                                        "VIDEONAME", "VIEW0", "VIEW25", "VIEW50", "VIEW75",
                                                        "VIEW100", "ENG0", "ENG25", "ENG50", "ENG75", "ENG100",
                                                        "DPE0", "DPE25", "DPE50", "DPE75", "DPE100"]]
    placement_by_km_video = [placement_by_video_new, self.read_sql_km_for_video]
    placement_by_km_video_summary = reduce(
        lambda left, right: pd.merge(left, right, on=['PLACEMENT', 'PRODUCT'], sort=False),
        placement_by_km_video)
    #print (list(placement_by_km_video_summary))
    #print(placement_by_km_video_summary)
    #exit()
    # print(placement_by_video_new)

    """Conditions for 25%view"""
    mask17 = placement_by_km_video_summary["PRODUCT"].isin(['Display', 'Mobile'])
    mask18 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM", "CPCV"])
    mask19 = placement_by_km_video_summary["PRODUCT"].isin(["InStream"])
    mask20 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM", "CPE+", "CPCV"])
    mask_video_video_completions = placement_by_km_video_summary["COST_TYPE"].isin(["CPCV"])
    mask21 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE+"])
    mask22 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM"])
    mask23 = placement_by_km_video_summary["PRODUCT"].isin(['Display', 'Mobile', 'InStream'])
    mask24 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM", "CPE+"])
    choice25video_eng = placement_by_km_video_summary["ENG25"]
    choice25video_vwr = placement_by_km_video_summary["VIEW25"]
    choice25video_deep = placement_by_km_video_summary["DPE25"]
    placement_by_km_video_summary["25_pc_video"] = np.select(
        [mask17 & mask18, mask19 & mask20, mask17 & mask21],
        [choice25video_eng, choice25video_vwr, choice25video_deep])

    """Conditions for 50%view"""
    choice50video_eng = placement_by_km_video_summary["ENG50"]
    choice50video_vwr = placement_by_km_video_summary["VIEW50"]
    choice50video_deep = placement_by_km_video_summary["DPE50"]
    placement_by_km_video_summary["50_pc_video"] = np.select(
        [mask17 & mask18, mask19 & mask20, mask17 & mask21],
        [choice50video_eng, choice50video_vwr, choice50video_deep])

    """Conditions for 75%view"""
    choice75video_eng = placement_by_km_video_summary["ENG75"]
    choice75video_vwr = placement_by_km_video_summary["VIEW75"]
    choice75video_deep = placement_by_km_video_summary["DPE75"]
    placement_by_km_video_summary["75_pc_video"] = np.select(
        [mask17 & mask18, mask19 & mask20, mask17 & mask21],
        [choice75video_eng, choice75video_vwr, choice75video_deep])

    """Conditions for 100%view"""
    choice100video_eng = placement_by_km_video_summary["ENG100"]
    choice100video_vwr = placement_by_km_video_summary["VIEW100"]
    choice100video_deep = placement_by_km_video_summary["DPE100"]
    choicecompletions = placement_by_km_video_summary['COMPLETIONS']
    placement_by_km_video_summary["100_pc_video"] = np.select(
        [mask17 & mask22, mask19 & mask24, mask17 & mask21, mask23 & mask_video_video_completions],
        [choice100video_eng, choice100video_vwr, choice100video_deep, choicecompletions])

    """conditions for 0%view"""
    choice0video_eng = placement_by_km_video_summary["ENG0"]
    choice0video_vwr = placement_by_km_video_summary["VIEW0"]
    choice0video_deep = placement_by_km_video_summary["DPE0"]
    placement_by_km_video_summary["Views"] = np.select(
        [mask17 & mask18, mask19 & mask20, mask17 & mask21],
        [choice0video_eng, choice0video_vwr, choice0video_deep])
    #print (placement_by_km_video_summary)
    #exit()

    #final Table
    placement_by_video_summary = placement_by_km_video_summary.loc[:, ["PLACEMENT", "Placement# Name", "PRODUCT",
                                                                       "VIDEONAME", "COST_TYPE", "Views",
                                                                       "25_pc_video", "50_pc_video", "75_pc_video",
                                                                       "100_pc_video", "ENGAGEMENTS", "IMPRESSIONS",
                                                                       "DPEENGAMENTS"]]
    #placement_by_km_video = [placement_by_video_summary, self.read_sql_km_for_video]
    #placement_by_km_video_summary = reduce(lambda left, right: pd.merge(left, right, on=['PLACEMENT', 'PRODUCT']),
    #placement_by_km_video)
    #print(placement_by_video_summary)
    #exit()
    # dup_col =["IMPRESSIONS","ENGAGEMENTS","DPEENGAMENTS"]
    # placement_by_video_summary.loc[placement_by_video_summary.duplicated(dup_col),dup_col] = np.nan
    # print ("Dhar",placement_by_video_summary)

    '''adding views based on conditions'''
    #filter maximum value from videos
    placement_by_video_summary_new = placement_by_km_video_summary.loc[
        placement_by_km_video_summary.reset_index().groupby(['PLACEMENT', 'PRODUCT'])['Views'].idxmax()]
    #print (placement_by_video_summary_new)
    #exit()
    # print (placement_by_video_summary_new)
    # mask22 = (placement_by_video_summary_new.PRODUCT.str.upper ()=='DISPLAY') & (placement_by_video_summary_new.COST_TYPE=='CPE')
    placement_by_video_summary_new.loc[mask17 & mask18, 'Views'] = placement_by_video_summary_new['ENGAGEMENTS']
    placement_by_video_summary_new.loc[mask19 & mask20, 'Views'] = placement_by_video_summary_new['IMPRESSIONS']
    placement_by_video_summary_new.loc[mask17 & mask21, 'Views'] = placement_by_video_summary_new['DPEENGAMENTS']
    #print (placement_by_video_summary_new)
    #exit()
    placement_by_video_summary = placement_by_video_summary.drop(placement_by_video_summary_new.index).append(
        placement_by_video_summary_new).sort_index()
    placement_by_video_summary["Video Completion Rate"] = placement_by_video_summary["100_pc_video"] / \
                                                          placement_by_video_summary["Views"]
    placement_by_video_final = placement_by_video_summary.loc[:, ["Placement# Name", "PRODUCT", "VIDEONAME", "Views",
                                                                  "25_pc_video", "50_pc_video", "75_pc_video",
                                                                  "100_pc_video", "Video Completion Rate"]]
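tl;dr:
concat and append currently sort the non-concatenation index (e.g. the columns, if you are adding rows) when the columns do not match. In pandas 0.23 this started generating a warning; pass the parameter sort=True to silence it. In the future the default will change to not sorting, so it is best to specify either sort=True or sort=False now, or better yet, ensure that your non-concatenation indices match.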
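One way to make the non-concatenation indices match is to reorder the columns of one frame to follow the other before concatenating. The following is only a minimal sketch using small made-up frames; the name df2_aligned is introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=["b", "a"])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=["a", "b"])

# Reorder df2's columns to follow df1, so the column axis is already aligned
# and the result does not depend on the sort argument.
df2_aligned = df2.reindex(columns=df1.columns)

result = pd.concat([df1, df2_aligned])
print(result)
```

With the columns aligned up front, the column order of the result is simply that of df1.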
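The warning is new in pandas 0.23.0:
"In a future version of pandas, pandas.concat() and DataFrame.append() will no longer sort the non-concatenation axis when it is not already aligned. The current behavior is the same as the previous (sorting), but now a warning is issued when sort is not specified and the non-concatenation axis is not aligned." (link)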
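More information comes from the linked, very old GitHub issue, in a comment by smcinerney:
"When concat'ing DataFrames, the column names get alphanumerically sorted if there are any differences between them. If they're identical across DataFrames, they don't get sorted.
This sort is undocumented and unwanted. Certainly the default behavior should be no-sort."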
After some time, the parameter sort was implemented in pandas.concat and DataFrame.append:
sort : boolean, default None
Sort the non-concatenation axis if it is not already aligned when join is 'outer'. The current default of sorting is deprecated and will change to not sorting in a future version of pandas.
Explicitly pass sort=True to silence the warning and sort. Explicitly pass sort=False to silence the warning and not sort.
This has no effect when join='inner', which already preserves the order of the non-concatenation axis.
So if both DataFrames have the same columns in the same order, there is no warning and no sorting:
    df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['a', 'b'])
    df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['a', 'b'])

    print (pd.concat([df1, df2]))
       a  b
    0  1  0
    1  2  8
    0  4  7
    1  5  3

    df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['b', 'a'])
    df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['b', 'a'])

    print (pd.concat([df1, df2]))
       b  a
    0  0  1
    1  8  2
    0  7  4
    1  3  5
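But if the DataFrames have different columns, or the same columns in a different order, pandas returns a warning when the sort parameter is not set explicitly (sort=None is the default value):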
    df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['b', 'a'])
    df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['a', 'b'])

    print (pd.concat([df1, df2]))
    FutureWarning: Sorting because non-concatenation axis is not aligned.
       a  b
    0  1  0
    1  2  8
    0  4  7
    1  5  3

    print (pd.concat([df1, df2], sort=True))
       a  b
    0  1  0
    1  2  8
    0  4  7
    1  5  3

    print (pd.concat([df1, df2], sort=False))
       b  a
    0  0  1
    1  8  2
    0  7  4
    1  3  5
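If the DataFrames have different columns but some columns are shared, the shared columns are matched to each other correctly (columns a and b from df1 with a and b from df2 in the example below), because they exist in both. For the other columns, which exist in only one of the DataFrames, missing values are created.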
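Lastly, if you pass sort=True, the columns are sorted alphanumerically. If you pass sort=False and the second DataFrame has columns that are not in the first, they are appended to the end without sorting: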
    df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8], 'e':[5, 0]},
                       columns=['b', 'a','e'])
    df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3], 'c':[2, 8], 'd':[7, 0]},
                       columns=['c','b','a','d'])

    print (pd.concat([df1, df2]))
       a  b    c    d    e
    0  1  0  NaN  NaN  5.0
    1  2  8  NaN  NaN  0.0
    0  4  7  2.0  7.0  NaN
    1  5  3  8.0  0.0  NaN

    print (pd.concat([df1, df2], sort=True))
       a  b    c    d    e
    0  1  0  NaN  NaN  5.0
    1  2  8  NaN  NaN  0.0
    0  4  7  2.0  7.0  NaN
    1  5  3  8.0  0.0  NaN

    print (pd.concat([df1, df2], sort=False))
       b  a    e    c    d
    0  0  1  5.0  NaN  NaN
    1  8  2  0.0  NaN  NaN
    0  7  4  NaN  2.0  7.0
    1  3  5  NaN  8.0  0.0
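If you would rather check the resulting column order programmatically than read it off the printed frames, a short sketch along these lines may help. It passes sort explicitly in both calls, so it should not depend on which default your pandas version uses; the names sorted_cols and unsorted_cols are made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8], "e": [5, 0]},
                   columns=["b", "a", "e"])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3], "c": [2, 8], "d": [7, 0]},
                   columns=["c", "b", "a", "d"])

# sort=True: the union of the columns, sorted alphanumerically.
sorted_cols = list(pd.concat([df1, df2], sort=True).columns)

# sort=False: df1's columns first, then df2's extra columns in order of appearance.
unsorted_cols = list(pd.concat([df1, df2], sort=False).columns)

print(sorted_cols)    # ['a', 'b', 'c', 'd', 'e']
print(unsorted_cols)  # ['b', 'a', 'e', 'c', 'd']
```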
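In your code, the warning comes from the append call (the sort argument of pd.merge is unrelated to it), so pass sort there:
    placement_by_video_summary = (placement_by_video_summary.drop(placement_by_video_summary_new.index)
                                                            .append(placement_by_video_summary_new, sort=True)
                                                            .sort_index())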
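On newer pandas versions DataFrame.append is no longer available (it was deprecated in 1.4 and removed in 2.0), so the same drop / append / sort_index step would have to go through pd.concat. The sketch below assumes tiny hypothetical stand-ins, summary and updated, for placement_by_video_summary and placement_by_video_summary_new; only the pattern is meant to carry over.

```python
import pandas as pd

# Hypothetical stand-ins for placement_by_video_summary and
# placement_by_video_summary_new (columns deliberately in a different order).
summary = pd.DataFrame({"PLACEMENT": ["p1", "p2", "p3"], "Views": [10, 20, 30]})
updated = pd.DataFrame({"Views": [99], "PLACEMENT": ["p2"]}, index=[1])

# Drop the rows being replaced, add the updated rows back, and restore row order.
# sort=False keeps the column order of the first frame.
result = pd.concat([summary.drop(updated.index), updated], sort=False).sort_index()
print(result)
```

Use sort=True instead if you want the columns in alphanumeric order, as in the append call above.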