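I have the following pandas DataFrame, Top15: [image]
I created a column that estimates the number of citable documents per person:
Top15['PopEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['PopEst']
I want to know the correlation between the number of citable documents per capita and the energy supply per capita, so I use the .corr() method (Pearson's correlation):
data = Top15[['Citable docs per Capita', 'Energy Supply per Capita']]
correlation = data.corr(method='pearson')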
Without the actual data it is hard to answer this question, but I guess you are looking for something like this:
Top15['Citable docs per Capita'].corr(Top15['Energy Supply per Capita'])
That calculates the correlation between your two columns 'Citable docs per Capita' and 'Energy Supply per Capita'.
To give an example:
import pandas as pd

df = pd.DataFrame({'A': range(4), 'B': [2*i for i in range(4)]})

   A  B
0  0  0
1  1  2
2  2  4
3  3  6
Then
df['A'].corr(df['B'])
gives 1 as expected.
Now, if you change a value, e.g.
df.loc[2, 'B'] = 4.5

   A    B
0  0  0.0
1  1  2.0
2  2  4.5
3  3  6.0
the command
df['A'].corr(df['B'])
returns
0.99586
which is still close to 1, as expected.
If you apply .corr directly to your dataframe, it will return all pairwise correlations between your columns; that’s why you then observe 1s at the diagonal of your matrix (each column is perfectly correlated with itself).
df.corr()
will therefore return
          A         B
A  1.000000  0.995862
B  0.995862  1.000000
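To make the link between the two calls concrete, here is a minimal sketch on the same toy df as above (not your Top15 data). It reads one coefficient out of the matrix returned by df.corr() with .loc and compares it with the Series-level call; the variable names are only illustrative.

```python
import pandas as pd

# Same toy frame as in the example above, with the modified value in B.
df = pd.DataFrame({'A': [0, 1, 2, 3], 'B': [0.0, 2.0, 4.5, 6.0]})

matrix = df.corr(method='pearson')   # all pairwise correlations
from_matrix = matrix.loc['A', 'B']   # off-diagonal entry for the pair (A, B)
direct = df['A'].corr(df['B'])       # the same pair, computed directly

print(round(from_matrix, 6), round(direct, 6))
```

Both values should agree, so if you only need a single coefficient, either route works.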
In the figure you show, only the upper-left corner of the correlation matrix is displayed (I assume).
There can be cases where you get NaNs in your solution; check this post for an example.
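One way a NaN can arise (not necessarily the case the linked post covers) is a column that never varies: its standard deviation is zero, so the Pearson coefficient is undefined. A minimal sketch with made-up data, where column 'C' is a hypothetical constant column:

```python
import math
import pandas as pd

# Hypothetical data: column 'C' never changes, so its variance is zero.
df = pd.DataFrame({'A': [0, 1, 2, 3], 'C': [5.0, 5.0, 5.0, 5.0]})

r = df['A'].corr(df['C'])           # Series-level call
r_matrix = df.corr().loc['A', 'C']  # same pair, read from the full matrix

print(math.isnan(r), math.isnan(r_matrix))
```

Depending on your pandas and NumPy versions, the Series-level call may also emit a RuntimeWarning about an invalid division.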
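If you want to filter entries above or below a certain threshold, you can check this question. If you want to plot a heatmap of the correlation coefficients, you can check this answer, and if you then run into the issue of overlapping axis labels, check the following post.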
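For the threshold filtering, one possible approach (a sketch, not necessarily what the linked question suggests) is to keep only the upper triangle of the correlation matrix, flatten it, and select pairs by absolute value. The extra column 'C' and the 0.9 threshold are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data; column 'C' and the threshold are illustrative assumptions.
df = pd.DataFrame({
    'A': [0, 1, 2, 3],
    'B': [0.0, 2.0, 4.5, 6.0],
    'C': [3, 1, 2, 0],
})
threshold = 0.9

corr = df.corr()

# True strictly above the diagonal, so each pair appears once
# and the trivial self-correlations are skipped.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Flatten to a Series indexed by (row, column) pairs, then filter.
pairs = corr.where(upper).stack()
strong = pairs[pairs.abs() > threshold]

print(strong)
```

Here only the pair (A, B) passes the filter; the comparison is False for any NaN entries, so masked cells never slip through.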