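I have been trying to develop a very simple initial model that predicts the amount of fines a nursing home can expect to pay based on its location.
Here is my class definition: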
# Initial model to predict the amount of fines a nursing home might expect to pay based on its location
from sklearn.base import BaseEstimator, RegressorMixin, TransformerMixin

class GroupMeanEstimator(BaseEstimator, RegressorMixin):
    # Defines what a group is by using grouper
    # Initialises an empty dictionary for group averages
    def __init__(self, grouper):
        self.grouper = grouper
        self.group_averages = {}

    # Any calculation I require for my predict method goes here
    # Specifically, I want to group by the group that grouper is set to
    # I then want to find the mean penalty for each group
    # X is the data containing the groups
    # y is fine_totals
    # Map each state to its mean fine_tot
    def fit(self, X, y):
        # Use self.group_averages to store the average penalty by group
        Xy = X.join(y)  # Joining X and y together
        state_mean_series = Xy.groupby(self.grouper)[y.name].mean()  # Creating a series of state:mean penalties

        # Populating a dictionary with state:mean key:value pairs
        for row in state_mean_series.iteritems():
            self.group_averages[row[0]] = row[1]
        return self

    # The fine an observation is likely to receive is based on its group mean
    # Want to first populate the list with the number of observations
    # For each observation in the list, find its group and set the likely fine to that group's mean
    # Return the list
    def predict(self, X):
        dictionary = self.group_averages
        group = self.grouper
        list_of_predictions = []  # Initialising a list to store our return values
        for row in X.itertuples():  # Iterating through each row in X
            prediction = dictionary[row.STATE]  # Getting the value from group_averages dict using key row.group
            list_of_predictions.append(prediction)
        return list_of_predictions
This works:
state_model.predict(data.sample(5))
But it breaks when I try this:
state_model.predict(pd.DataFrame([{'STATE': 'AS'}]))
My model cannot handle a state it has never seen, and I would like help fixing that.
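The problem I see is in your fit method: on a DataFrame, iteritems iterates over columns rather than rows (it was also removed in pandas 2.0, so on recent versions the call fails outright). You should use itertuples, which gives you the data row by row. Just change the loop in your fit method to: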
for row in pd.DataFrame(state_mean_series).itertuples():
    # row format is (STATE, mean_value)
    self.group_averages[row[0]] = row[1]
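To check what each iteration style actually yields, here is a small sketch on made-up data (the state codes and fine amounts are invented for illustration, and `FINE_TOT` is an assumed column name). It compares the itertuples loop with `Series.items()`, which yields (index label, value) pairs.

```python
import pandas as pd

# Invented toy data: one row per nursing home.
toy = pd.DataFrame({
    "STATE": ["TX", "TX", "CA", "CA"],
    "FINE_TOT": [100.0, 300.0, 50.0, 150.0],
})
state_mean_series = toy.groupby("STATE")["FINE_TOT"].mean()

# Series.items() yields (index label, value) pairs.
from_items = {state: mean for state, mean in state_mean_series.items()}

# DataFrame.itertuples() yields one tuple per row:
# row[0] is the index label (the state), row[1] is the first column (the mean).
from_tuples = {}
for row in pd.DataFrame(state_mean_series).itertuples():
    from_tuples[row[0]] = row[1]

print(from_items)
print(from_tuples)
```

Both dictionaries should end up with the same state-to-mean mapping, so either loop can populate group_averages.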
Then, in your predict method, just add a fail-safe lookup:
prediction = dictionary.get(row.STATE, None)  # None is the default here in case a key such as 'AS' doesn't exist; replace it with whatever you want
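Putting the two changes together, a minimal sketch of the estimator might look like the following. Two things here are my assumptions, not part of the original code: unseen groups fall back to the overall mean of y instead of None, and the scikit-learn base classes are left out so the sketch only needs pandas (add `BaseEstimator, RegressorMixin` back for use in a pipeline). The toy data and the `FINE_TOT` name are invented.

```python
import pandas as pd


class GroupMeanEstimator:
    """Predict each row's target as the mean of its group seen during fit."""

    def __init__(self, grouper):
        self.grouper = grouper

    def fit(self, X, y):
        # Mean of y within each group; X and y are assumed to share an index.
        self.group_averages_ = y.groupby(X[self.grouper]).mean().to_dict()
        # Assumption: use the overall mean for groups never seen in training.
        self.fallback_ = y.mean()
        return self

    def predict(self, X):
        # dict.get returns the fallback instead of raising KeyError.
        return [
            self.group_averages_.get(group, self.fallback_)
            for group in X[self.grouper]
        ]


# Invented toy data for illustration.
X_train = pd.DataFrame({"STATE": ["TX", "TX", "CA"]})
y_train = pd.Series([100.0, 300.0, 50.0], name="FINE_TOT")

state_model = GroupMeanEstimator("STATE").fit(X_train, y_train)
preds = state_model.predict(pd.DataFrame([{"STATE": "AS"}, {"STATE": "TX"}]))
print(preds)
```

Reading the column named by grouper, rather than hard-coding row.STATE, also lets the same class group by any other column.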