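Environment: Windows 7, Python 3.6
Goal: run a regression analysis on several variables and obtain the regression equation, the P-values and the other regression statistics.
Create statsmodels_test.py
Copy the following code into that .py file: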
from pandas import DataFrame
import statsmodels.api as sm
#import statsmodels.regression.linear_model as sm
import pandas as pd
'''
# Test data set (this whole block is disabled by the surrounding triple quotes)
Stock_Market = {'Year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
'Month': [12, 11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719]
}
df = DataFrame(Stock_Market,columns=['Year','Month','Interest_Rate','Unemployment_Rate','Stock_Index_Price'])
X = df[['Interest_Rate','Unemployment_Rate']] # two predictors for multiple regression; for simple linear regression use a single variable, e.g. X = df['Interest_Rate'], or add more column names inside the brackets
Y = df['Stock_Index_Price']
X = sm.add_constant(X) # adding a constant
model = sm.OLS(Y, X).fit()
predictions = model.predict(X)
print_model = model.summary()
print(print_model)
'''
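# Read the file
datafile = u'cig_data.xlsx'  # file location; the u prefix guards against non-ASCII (e.g. Chinese) characters in the path on Python 2, has no effect on Python 3, and can be omitted here
data = pd.read_excel(datafile)  # datafile is an Excel file, so use read_excel; for a CSV file use read_csv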
examDf = DataFrame(data)
print("GOOD")
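# .ix was deprecated in pandas 0.20 and removed in pandas 1.0; .iloc selects by position instead
new_examDf = examDf.iloc[1:, 1:]  # drop the first row and the first column
X = new_examDf.iloc[:, :4]  # first four remaining columns: the predictors
Y = new_examDf.iloc[:, 4]  # fifth remaining column: the response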
X = sm.add_constant(X) # adding a constant
model = sm.OLS(Y, X).fit()
predictions = model.predict(X)
print_model = model.summary()
print(print_model)
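The script picks its predictors and response purely by column position, which is hard to follow without the spreadsheet at hand. Below is a minimal sketch of the same positional selection written with `.iloc`. It assumes a layout like the one implied by the regression output (one leading column, four predictors, then the response). The `ID` column and all values are invented stand-ins, not the contents of cig_data.xlsx.

```python
import pandas as pd

# Hypothetical stand-in for cig_data.xlsx; the ID column and every value are made up.
exam_df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5],
    "Age": [31, 45, 52, 38, 60, 27],
    "Cig_Day": [10, 20, 15, 5, 30, 12],
    "CO": [12.0, 25.0, 18.0, 8.0, 33.0, 14.0],
    "LogCOadj": [1.1, 1.4, 1.3, 0.9, 1.5, 1.2],
    "Day_abs": [3, 40, 12, 90, 1, 25],
})

# Drop the first row and the first column, by position.
trimmed = exam_df.iloc[1:, 1:]

# The first four remaining columns are the predictors, the fifth is the response.
X = trimmed.iloc[:, :4]
Y = trimmed.iloc[:, 4]

print(list(X.columns))  # ['Age', 'Cig_Day', 'CO', 'LogCOadj']
print(Y.name)           # Day_abs
print(X.shape)          # (5, 4)
```

Printing the selected column names before fitting is a cheap way to confirm that the positional slices line up with the intended variables.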
The data.xlsx file that is read (cig_data.xlsx in the code above): link
Output of running the script:
                            OLS Regression Results
==============================================================================
Dep. Variable:                 Day_abs   R-squared:                       0.056
Model:                             OLS   Adj. R-squared:                  0.039
Method:                  Least Squares   F-statistic:                     3.238
Date:                 Mon, 15 Jun 2020   Prob (F-statistic):             0.0132
Time:                         00:54:57   Log-Likelihood:                -1392.7
No. Observations:                  223   AIC:                             2795.
Df Residuals:                      218   BIC:                             2812.
Df Model:                            4
Covariance Type:             nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         62.1170     85.299      0.728      0.467    -105.999     230.233
Age            0.1967      0.692      0.284      0.777      -1.168       1.561
Cig_Day        1.3202      0.705      1.873      0.062      -0.069       2.710
CO            -0.2645      0.103     -2.566      0.011      -0.468      -0.061
LogCOadj       0.0313      0.069      0.458      0.648      -0.104       0.166
==============================================================================
Omnibus:                        54.065   Durbin-Watson:                   1.813
Prob(Omnibus):                   0.000   Jarque-Bera (JB):               86.116
Skew:                            1.475   Prob(JB):                     2.00e-19
Kurtosis:                        3.756   Cond. No.                     1.45e+04
==============================================================================
Publisher: 全栈程序员栈长. Please credit the source when reposting: https://javaforall.cn/2153.html