
Financial Data Analysis Challenge 1: Predicting Bank Customer Product Subscription

Date: 2023-12-14 17:02:29


Competition Background

The competition is framed around predicting bank product subscriptions: the task is to predict whether a client will purchase one of the bank's products. During contacts with each client, the bank recorded the number of contacts, the duration of the last contact, and the interval since the previous contact. The bank's systems also hold basic client information, including age, occupation, marital status, prior defaults, and whether the client has a housing loan. In addition, indicators of current market conditions were collected: employment figures, consumer data, and the interbank lending rate.

Competition Task

Output the prediction results.

Model Training

```python
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier,
                              AdaBoostClassifier)
from xgboost import XGBRFClassifier
from lightgbm import LGBMClassifier
import time

# Baseline classifiers to compare
clf_lr = LogisticRegression(random_state=0, solver='lbfgs', multi_class='multinomial')
clf_dt = DecisionTreeClassifier()
clf_rf = RandomForestClassifier()
clf_gb = GradientBoostingClassifier()
clf_adab = AdaBoostClassifier()
clf_xgbrf = XGBRFClassifier()
clf_lgb = LGBMClassifier()
```
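The article imports `cross_val_score` but never shows it in use. A minimal sketch of how these baseline classifiers could be compared with it, using synthetic data from `make_classification` as a stand-in (the real competition files are not available here):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the competition data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidates = {
    'lr': LogisticRegression(max_iter=1000),
    'dt': DecisionTreeClassifier(random_state=0),
    'rf': RandomForestClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate
scores = {name: cross_val_score(model, X, y, cv=5, scoring='accuracy').mean()
          for name, model in candidates.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f'{name}: {acc:.3f}')
```

The same loop extends naturally to the other classifiers instantiated above.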

```python
import pandas as pd  # needed for read_csv; this import was missing in the original
from sklearn.model_selection import train_test_split

train_new = pd.read_csv('train_new.csv')
test_new = pd.read_csv('test_new.csv')

# Everything except the target column is a feature
feature_columns = [col for col in train_new.columns if col not in ['subscribe']]
train_data = train_new[feature_columns]
target_data = train_new['subscribe']
```
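The article does not show what the `subscribe` target looks like. Assuming it is stored as yes/no strings (an assumption, not confirmed by the source), it would need to be mapped to 0/1 integers before training, e.g.:

```python
import pandas as pd

# Hypothetical illustration: a yes/no target column mapped to 0/1
df = pd.DataFrame({'age': [35, 52, 41], 'subscribe': ['no', 'yes', 'no']})
df['subscribe'] = df['subscribe'].map({'no': 0, 'yes': 1})
print(df['subscribe'].tolist())  # → [0, 1, 0]
```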

```python
# Hyper-parameter tuning
from lightgbm import LGBMClassifier
from sklearn.metrics import classification_report, accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    train_data, target_data, test_size=0.2, shuffle=True,
    random_state=0)  # the random_state value was missing in the original
# X_test, X_valid, y_test, y_valid = train_test_split(
#     X_test, y_test, test_size=0.5, shuffle=True, random_state=0)

n_estimators = [300]
learning_rate = [0.02]            # 0.02 was optimal among the values tried
subsample = [0.6]
colsample_bytree = [0.7]          # 0.6 was optimal among [0.5, 0.6, 0.7]
max_depth = [9, 11, 13]           # 11 was optimal among [7, 9, 11, 13]
is_unbalance = [False]
early_stopping_rounds = [300]
num_boost_round = [5000]
metric = ['binary_logloss']
feature_fraction = [0.6, 0.75, 0.9]
bagging_fraction = [0.6, 0.75, 0.9]
bagging_freq = [2, 4, 5, 8]
lambda_l1 = [0, 0.1, 0.4, 0.5]
lambda_l2 = [0, 10, 15, 35]
cat_smooth = [1, 10, 15, 20]

param = {'n_estimators': n_estimators,
         'learning_rate': learning_rate,
         'subsample': subsample,
         'colsample_bytree': colsample_bytree,
         'max_depth': max_depth,
         'is_unbalance': is_unbalance,
         'early_stopping_rounds': early_stopping_rounds,
         'num_boost_round': num_boost_round,
         'metric': metric,
         'feature_fraction': feature_fraction,
         'bagging_fraction': bagging_fraction,
         'bagging_freq': bagging_freq,  # defined but missing from the grid in the original
         'lambda_l1': lambda_l1,
         'lambda_l2': lambda_l2,
         'cat_smooth': cat_smooth}

model = LGBMClassifier()
clf = GridSearchCV(model, param, cv=3, scoring='accuracy', verbose=1, n_jobs=-1)
clf.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)])
print(clf.best_params_, clf.best_score_)
```

Submitting the Results
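The article stops before showing how the submission file is produced. A hedged sketch of that final step, with a stand-in model and synthetic data (in the article, the tuned `GridSearchCV` object would predict on `test_new`); the column names `id` and `subscribe` and the yes/no label format are assumptions, not confirmed by the source:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the tuned model and the competition test set
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

pred = model.predict(X[:50])
# Map 0/1 predictions back to the assumed yes/no labels and write the file
submission = pd.DataFrame({
    'id': range(50),
    'subscribe': pd.Series(pred).map({0: 'no', 1: 'yes'}),
})
submission.to_csv('submission.csv', index=False)
print(submission.head(3))
```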
