I have two columns of a dataframe, df['a'] and df['b'], that I want to compute the correlation coefficient for. There are around 15 to 20 rows of data.

I assign these to "col1" and "col2" and try to call the corr method:

    col1 = df['a']
    col2 = df['b']
    corr = col1.corr(col2,method="pearson")

I get an error: 'float' object has no attribute 'shape'

If I import the stats module (from scipy) and try:

    corr, pval = stats.pearsonr(col1, col2)

I get a correlation coefficient. So what did I do wrong on the first one?

In answer to one of the comments: I checked the type of col1 and col2, and both are Series. I expected this to work because the documentation for Series.corr (https://pandas.pydata.org/docs/reference/api/pandas.Series.corr.html) gives no indication that you need to specify that the input is a Series rather than a DataFrame.

I also checked the type of the full dataframe:

    print(type(df))

It comes back as DataFrame. The full dataframe has 21 columns plus an index. I only want the correlation coefficient for two of the columns. Here is a subset of what I get when I print col1 and col2:

col1:

    Country
    Indonesia 9.3659e-05
    Japan 0.000388417
    Canada 0.001638514
    ...
    Name: a, dtype: object

col2:

    Country
    Indonesia 65
    Japan 194
    Canada 167
    ...
    Name: b, dtype: object

Is the index of Country causing the problem?

Accepted answer

Either df is actually a Series rather than a DataFrame:

>>> df
a    10.0
b    12.0
dtype: float64

or one of the columns of your dataframe has the wrong dtype:

>>> df
      a     b
0  10.0  20.0
1  12.0  22.0

>>> df.dtypes
a    float64
b     object
dtype: object
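
The output printed in the question shows dtype: object for both col1 and col2, so the second case looks like the likely one here. A minimal sketch of one possible fix, assuming the values are numbers stored in object columns: convert each column with pd.to_numeric before calling corr. The small dataframe below is made up from the three rows shown in the question and only stands in for the real data.

```python
import pandas as pd

# Stand-in data built from the three rows printed in the question.
# dtype=object reproduces the "dtype: object" columns (an assumption
# about how the real dataframe ended up in that state).
df = pd.DataFrame(
    {
        "a": [9.3659e-05, 0.000388417, 0.001638514],
        "b": [65, 194, 167],
    },
    index=pd.Index(["Indonesia", "Japan", "Canada"], name="Country"),
    dtype=object,
)
print(df.dtypes)  # both columns report object

# Convert to numeric dtypes; errors="coerce" turns unparseable values into NaN.
col1 = pd.to_numeric(df["a"], errors="coerce")
col2 = pd.to_numeric(df["b"], errors="coerce")

# With numeric columns, Series.corr can compute the Pearson coefficient.
corr = col1.corr(col2, method="pearson")
print(corr)
```

The Country index is not the problem in this sketch: both Series share the same index, so corr aligns them row by row. If the conversion produces NaN values, it is worth checking which entries in the original columns were not valid numbers.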
