Data Analysis Models: Multiple Regression in Python - 勿要


Multiple regression

# Import modules
import pandas as pd
import statsmodels.api as sm
from sklearn import model_selection

# Load the data
Profit = pd.read_excel(r'Predict to Profit.xlsx')

# Split the dataset into training and test sets
train, test = model_selection.train_test_split(Profit, test_size=0.2, random_state=1234)

# Fit the model on the training set
model = sm.formula.ols('Profit ~ RD_Spend + Administration + Marketing_Spend + C(State)', data=train).fit()
# print('Partial regression coefficients of the model:\n', model.params)

# Drop Profit from the test set and predict from the remaining predictors
test_X = test.drop(labels='Profit', axis=1)
pred = model.predict(exog=test_X)
print('Predicted vs. actual values:\n', pd.DataFrame({'Prediction': pred, 'Real': test.Profit}))
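The printed table compares predictions and actual values row by row. A single summary number is often easier to read, so here is a minimal sketch of the root mean squared error in plain Python. The function name `rmse` and the sample numbers are made up for illustration; they are not part of the original post or its data.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up numbers standing in for test.Profit and pred
real = [100.0, 150.0, 200.0]
prediction = [110.0, 140.0, 205.0]
print(rmse(real, prediction))
```

With the objects from the listing above, the call would presumably be `rmse(test.Profit, pred)`, since both are pandas Series of the same length.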

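Because the dummy variables generated from the State variable are perfectly collinear (together they add up to the intercept column), the formula interface automatically drops one level to serve as the reference group. You can also pick the reference level yourself by building the dummy variables manually.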
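To see why one level has to go, the following sketch one-hot encodes a small list of states in plain Python and checks that the dummy columns sum to the intercept column. The state list and variable names are made up for illustration and are not taken from Predict to Profit.xlsx.

```python
# Made-up State values for illustration only
states = ["New York", "California", "Florida", "New York", "Florida"]
levels = sorted(set(states))  # one dummy column per distinct state

# Full one-hot encoding: one 0/1 column per level
one_hot = [[int(s == level) for level in levels] for s in states]

# Every row sums to 1, which is exactly the intercept column, so the
# columns are linearly dependent and OLS has no unique solution
intercept = [1] * len(states)
row_sums = [sum(row) for row in one_hot]
print(row_sums == intercept)

# Dropping the reference level removes the dependence
reference = "New York"
kept = [level for level in levels if level != reference]
reduced = [[int(s == level) for level in kept] for s in states]
print(kept)
print(reduced)
```

Rows belonging to the reference level become all zeros, so the coefficients of the remaining dummies are read as differences from that level.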

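# Create dummy variables from the State column (dtype=int gives 0/1 columns rather than booleans)
dummies = pd.get_dummies(Profit.State, dtype=int)
# Concatenate the dummy variables with the original dataset column-wise
Profit_New = pd.concat([Profit, dummies], axis=1)
# Drop the State and New York columns (State is now encoded by the dummies; New York is the reference group)
Profit_New.drop(labels=['State', 'New York'], axis=1, inplace=True)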

# Split the Profit_New dataset
train, test = model_selection.train_test_split(Profit_New, test_size=0.2, random_state=1234)
# Fit the model
model2 = sm.formula.ols('Profit ~ RD_Spend + Administration + Marketing_Spend + Florida + California', data=train).fit()
print('Partial regression coefficients of the model:\n', model2.params)
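As an alternative to building the dummy columns by hand, the reference group can usually be requested inside the formula itself with patsy's `Treatment(reference=...)` coding. The sketch below assumes statsmodels with its patsy formula backend and uses synthetic data; the DataFrame `df`, the coefficients used to generate it, and the name `model3` are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data, made up for illustration (not Predict to Profit.xlsx)
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "RD_Spend": rng.uniform(0, 100, n),
    "State": np.tile(["California", "Florida", "New York"], n // 3),
})
df["Profit"] = 50 + 0.8 * df["RD_Spend"] + rng.normal(0, 5, n)

# Treatment(reference=...) makes New York the baseline level,
# so only California and Florida get their own coefficients
formula = 'Profit ~ RD_Spend + C(State, Treatment(reference="New York"))'
model3 = smf.ols(formula, data=df).fit()
print(model3.params)
```

The State coefficients are then interpreted relative to New York, the same reading as for the manually built Florida and California columns above.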

Source: utcz.com/z/388856.html
