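I have a model whose residuals I want to analyze. Ultimately, I want to identify, for each day, the extreme results that fall outside a confidence interval. However, I am struggling to calculate the pointwise standard deviation of the residuals across the individual models in a bagging regressor.
My sample code is below: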
import datetime
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.ensemble import BaggingRegressor
# Sample DataFrame
df = pd.DataFrame(np.random.randint(0,200,size=(500, 4)), columns=list('ABCD'))
# Add dates to sample data
base = datetime.datetime.today()
date_list = [base - datetime.timedelta(days=x) for x in range(500)]
df['date'] = date_list
df['date'] = df['date'].astype('str')
# Split dataset into testing and training
train = df[:int(len(df)*0.80)]
test = df[int(len(df)*0.80):]
X_train = train[['B','C','D','date']]
X_test = test[['B','C','D','date']]
y_train = train[['A']]
# Copy so that columns can be added later without a SettingWithCopyWarning
y_test = test[['A']].copy()
# Function to one-hot encode a column and drop the original
def encode_and_bind(data_in, feature_to_encode):
    dummies = pd.get_dummies(data_in[[feature_to_encode]])
    data_out = pd.concat([data_in, dummies], axis=1)
    data_out = data_out.drop([feature_to_encode], axis=1)
    return data_out
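# One-hot encode the date column, then align the test columns to the training columns
X_train_final = encode_and_bind(X_train, 'date')
X_test_final = encode_and_bind(X_test, 'date').reindex(columns=X_train_final.columns, fill_value=0)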
# Define Model
svr_lin = SVR(kernel="linear", C=100, gamma="auto")
# Note: the keyword is `estimator` in scikit-learn 1.2+; older releases call it `base_estimator`
regr = BaggingRegressor(estimator=svr_lin, random_state=5).fit(X_train_final, y_train.values.ravel())
# Predictions
y_pred = regr.predict(X_test_final)
# Join the predictions back into the original dataframe
y_test['predict'] = y_pred
# Calculate residuals
y_test['residuals'] = y_test['A'] - y_test['predict']
I found this approach online:
raw_pred = [x.predict([[0, 0, 0, 0]]) for x in regr.estimators_]
But I am not sure what to pass in place of [[0, 0, 0, 0]] in x.predict([[0, 0, 0, 0]]), since I have more than 4 features.
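For context, here is a minimal sketch of how I understand that snippet would look if each estimator were given the full feature matrix instead of a single hard-coded row. It runs on synthetic stand-in data (6 made-up features, not my real frame), and the 1.96 multiplier is only an assumed normal-approximation threshold. The band it produces describes the spread between the bagged models, which may not be a true confidence interval.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR

# Synthetic stand-in data (hypothetical): 6 features, linear target plus noise.
rng = np.random.default_rng(0)
coef = rng.normal(size=6)
X_train = rng.normal(size=(200, 6))
y_train = X_train @ coef + rng.normal(scale=0.5, size=200)
X_test = rng.normal(size=(50, 6))
y_test = X_test @ coef + rng.normal(scale=0.5, size=50)

# The base estimator is passed positionally so the call does not depend on
# the keyword name (`estimator` vs. the older `base_estimator`).
regr = BaggingRegressor(SVR(kernel="linear", C=1.0), n_estimators=10, random_state=5)
regr.fit(X_train, y_train)

# One row per fitted estimator, one column per test sample.
# estimators_features_ holds the column indices each estimator was trained on.
raw_pred = np.array([
    est.predict(X_test[:, feats])
    for est, feats in zip(regr.estimators_, regr.estimators_features_)
])

# Residuals of every individual model, then their spread at each test point.
residuals_per_model = y_test - raw_pred            # shape (n_estimators, n_samples)
pointwise_std = residuals_per_model.std(axis=0, ddof=1)

# Residuals of the ensemble prediction (the mean over the estimators).
residuals = y_test - regr.predict(X_test)

# Assumed threshold: flag points whose residual exceeds 1.96 pointwise std.
z = 1.96
outside = np.abs(residuals) > z * pointwise_std
```

With my own data, X_test would presumably be X_test_final.to_numpy(), so each call to predict receives one row per test day with all encoded columns rather than a single row of four zeros.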