小能豆

OneHotEncoder not behaving?

py

I was given a code chunk to run in Jupyter to learn about one-hot encoding, but when I run it an error shows up.

    from sklearn.preprocessing import OneHotEncoder as ohc

    enc = ohc(drop='if_binary', sparse_output=False).set_output(transform='pandas')

    df = enc.fit_transform(default[["student"]])

    default_enc = default.assign(student = df['student_Yes'])

Then I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-f958840e2f7e> in <module>
      1 from sklearn.preprocessing import OneHotEncoder as ohc
      2 default = pd.read_csv("default.csv", index_col=[0])
----> 3 enc = ohc(drop = 'if_binary',sparse_output=False).set_output(transform='pandas')
      4 df = enc.fit_transform(default[["student"]])
      5 default_enc = default.assign(student = df['student_Yes'])

/usr/local/lib64/python3.6/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

TypeError: __init__() got an unexpected keyword argument 'sparse_output'

I have tried updating Anaconda and sklearn. The code is supposed to work, and the next few problems rely on editing it to see how different parts affect it.


2023-12-23

1 answer

小能豆

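The error means the installed scikit-learn does not recognize sparse_output as an argument to the OneHotEncoder constructor, which points to a version mismatch.

sparse_output was added in scikit-learn 1.2, where it replaced the older sparse parameter; the set_output method called on the same line was also introduced in 1.2. Your traceback shows the code running under Python 3.6 (/usr/local/lib64/python3.6/site-packages/...), and the last scikit-learn releases that support Python 3.6 are the 0.24.x series, so updating sklearn inside that environment cannot get you to 1.2. To run the code exactly as given, you need an environment with a newer Python (3.8 or later) and scikit-learn 1.2 or later.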
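One way to see which spelling an installed encoder accepts is to inspect its constructor signature rather than guess from the version number. The sketch below is only an illustration: dense_output_kwargs is a hypothetical helper name, and the two small classes are stand-ins that mimic the older and newer constructor signatures so the block runs without scikit-learn installed.

```python
import inspect


def dense_output_kwargs(encoder_cls):
    """Return the constructor keyword that asks `encoder_cls` for dense output."""
    params = inspect.signature(encoder_cls).parameters
    if "sparse_output" in params:  # spelling used by scikit-learn 1.2 and later
        return {"sparse_output": False}
    if "sparse" in params:  # spelling used by older releases
        return {"sparse": False}
    raise TypeError(
        f"{encoder_cls.__name__} accepts neither 'sparse_output' nor 'sparse'"
    )


# Hypothetical stand-ins (not scikit-learn classes) with the two signatures.
class OldStyleEncoder:
    def __init__(self, drop=None, sparse=True):
        self.drop, self.sparse = drop, sparse


class NewStyleEncoder:
    def __init__(self, drop=None, sparse_output=True):
        self.drop, self.sparse_output = drop, sparse_output


print(dense_output_kwargs(OldStyleEncoder))  # {'sparse': False}
print(dense_output_kwargs(NewStyleEncoder))  # {'sparse_output': False}
```

With scikit-learn installed, the same helper could be applied to the real class, for example OneHotEncoder(drop='if_binary', **dense_output_kwargs(OneHotEncoder)). Printing sklearn.__version__ in the notebook is also worth doing, to confirm which installation Jupyter is actually using.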

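If you have to stay on the older scikit-learn (0.23 or 0.24, which already support drop='if_binary'), you can modify your code like this:

    from sklearn.preprocessing import OneHotEncoder
    import pandas as pd

    # Before scikit-learn 1.2 the parameter is called `sparse`, not `sparse_output`
    enc = OneHotEncoder(drop='if_binary', sparse=False)
    encoded = enc.fit_transform(default[["student"]])  # dense NumPy array

    # Before scikit-learn 1.0 the method is get_feature_names (later: get_feature_names_out)
    df = pd.DataFrame(encoded, columns=enc.get_feature_names(["student"]), index=default.index)
    default_enc = default.assign(student=df['student_Yes'])

In this code:

  • sparse=False is the pre-1.2 spelling of sparse_output=False, so fit_transform returns a dense NumPy array and no toarray() call is needed.
  • set_output(transform='pandas') is dropped because it does not exist before 1.2; the DataFrame is built by hand instead.
  • get_feature_names is used to get the column names for the one-hot encoded features (here, student_Yes).
  • pd.DataFrame is given index=default.index so the encoded rows line up with the original rows, which were read with index_col=[0].
  • Finally, default.assign(student=df['student_Yes']) replaces the student column with its encoded version, as in the original code.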
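To try the encoding without the original CSV, here is a minimal sketch on a made-up frame. The column values and the balance column are invented for illustration, and the try/except and getattr fallbacks are just one way to let the same cell run on both older and newer scikit-learn releases; it assumes a release recent enough to support drop='if_binary' (0.23 or later).

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the asker's `default` DataFrame (values are made up).
default = pd.DataFrame(
    {"student": ["No", "Yes", "No", "Yes"],
     "balance": [729.5, 817.2, 1073.5, 529.3]},
    index=[1, 2, 3, 4],
)

try:
    # scikit-learn 1.2 and later
    enc = OneHotEncoder(drop="if_binary", sparse_output=False)
except TypeError:
    # older releases only know the `sparse` spelling
    enc = OneHotEncoder(drop="if_binary", sparse=False)

encoded = enc.fit_transform(default[["student"]])  # dense array, one column

# get_feature_names_out exists from 1.0 on; fall back to get_feature_names otherwise
name_fn = getattr(enc, "get_feature_names_out", None) or enc.get_feature_names
df = pd.DataFrame(encoded, columns=name_fn(["student"]), index=default.index)

# Replace the text column with its 0/1 encoding, keeping the original index
default_enc = default.assign(student=df["student_Yes"])
print(default_enc)
```

Because the column has exactly two categories, drop='if_binary' keeps only student_Yes, so the student column ends up as 0.0 for "No" and 1.0 for "Yes".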

Make sure to adjust the code according to your specific requirements and the structure of your dataset.

2023-12-23