Data Mining: Predicting Bank Customer Product Subscription

Date: 2024-01-31 06:06:18

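Data source: the Alibaba Tianchi learning competition 【教学赛】金融数据分析赛题1:银行客户认购产品预测 ([Teaching Competition] Financial Data Analysis Task 1: Predicting Bank Customer Product Subscription).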

Here is the code:

import pandas as pd
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

# Load the training data and work on a copy
filename = r'train.csv'
train = pd.read_csv(filename)
data = train.copy()

# Encode the target: yes -> 1, no -> 0
subscribe_dict = {'yes': 1, 'no': 0}
data['subscribe'] = data['subscribe'].map(subscribe_dict)

# The remaining object-dtype columns are the categorical features
features_list = list(data.select_dtypes(include=['object']).columns)

# Features: everything except the first column (id) and the last (subscribe)
X = data.iloc[:, 1:-1]
y = data.iloc[:, -1]
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=15, shuffle=True)

model = CatBoostClassifier(iterations=400, learning_rate=0.2, max_depth=10,
                           loss_function='Logloss', one_hot_max_size=13,
                           eval_metric='AUC')
model.fit(x_train, y_train, cat_features=features_list,
          eval_set=(x_test, y_test), verbose=False, use_best_model=True)

importance = list(zip(model.feature_names_, model.feature_importances_))
pred = model.predict(x_test)

# Accuracy on the training split and on the hold-out split
print(model.score(x_train, y_train))
print(model.score(x_test, y_test))
print(sorted(importance, key=lambda x: x[1], reverse=True))

# Predict on the competition test set and write the submission file
filename1 = r'test.csv'
test = pd.read_csv(filename1)
result = pd.DataFrame()
result['id'] = test['id']
subscribe_dict1 = {1: 'yes', 0: 'no'}
pre = model.predict(test.iloc[:, 1:])
result['subscribe'] = pre
result['subscribe'] = result['subscribe'].map(subscribe_dict1)
result.to_csv(r'result.csv', index=False)

Model scores (training accuracy, then hold-out accuracy) and feature importances:

0.9222518518518519
0.8787555555555555
[('duration', 31.562158049271915),
 ('emp_var_rate', 12.698946664792766),
 ('month', 8.367504457082596),
 ('pdays', 5.792981302062174),
 ('campaign', 5.037045154538229),
 ('age', 4.735792398997339),
 ('lending_rate3m', 3.560042990101733),
 ('nr_employed', 3.1324586348858263),
 ('cons_conf_index', 3.1090591899823705),
 ('cons_price_index', 2.9085820533638276),
 ('previous', 2.852789204585),
 ('contact', 2.7002900783006725),
 ('loan', 2.568280006753603),
 ('marital', 2.271147360645339),
 ('day_of_week', 2.150930241458051),
 ('poutcome', 1.9447035460281297),
 ('default', 1.7135445811448156),
 ('job', 1.3418564919909857),
 ('housing', 0.8936975936995399),
 ('education', 0.6581900035796278)]
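To make the importance list easier to read, here is a small sketch in plain Python that sums the printed scores and computes how much of the total the top-ranked features carry. The numbers are copied from the output above; the helper name top_k_share is only illustrative. The scores appear to be normalized so that they add up to roughly 100, which lets each one be read as a percentage share.

```python
# Feature importances copied from the printed output above: (name, importance).
importance = [
    ("duration", 31.562158049271915), ("emp_var_rate", 12.698946664792766),
    ("month", 8.367504457082596), ("pdays", 5.792981302062174),
    ("campaign", 5.037045154538229), ("age", 4.735792398997339),
    ("lending_rate3m", 3.560042990101733), ("nr_employed", 3.1324586348858263),
    ("cons_conf_index", 3.1090591899823705), ("cons_price_index", 2.9085820533638276),
    ("previous", 2.852789204585), ("contact", 2.7002900783006725),
    ("loan", 2.568280006753603), ("marital", 2.271147360645339),
    ("day_of_week", 2.150930241458051), ("poutcome", 1.9447035460281297),
    ("default", 1.7135445811448156), ("job", 1.3418564919909857),
    ("housing", 0.8936975936995399), ("education", 0.6581900035796278),
]


def top_k_share(pairs, k):
    """Return the summed importance of the k highest-scoring features."""
    ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
    return sum(score for _, score in ranked[:k])


total = sum(score for _, score in importance)
top3 = top_k_share(importance, 3)

print(f"total importance: {total:.2f}")
print(f"share of top 3 features: {top3:.2f}")
```

On these numbers, duration, emp_var_rate and month together carry a little over half of the total importance.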

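The submission scored an accuracy of 0.9529, ranking 116th. There was essentially no EDA or feature engineering: the data quality is good, so I simply identified the categorical variables and passed them straight to the model, which reached 0.95 accuracy right away. The only tuning was a manual adjustment of one_hot_max_size. CatBoost one-hot encodes categorical features whose number of distinct values does not exceed this parameter, and the default is small (2 in the standard CPU classification setting, according to the CatBoost documentation). The categorical variable with the most distinct values here is job, with 12, and the training and test data take the same set of values, so setting the parameter to 13 one-hot encodes every categorical feature. That change raised accuracy from 0.9295 to 0.9529.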
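The effect of the one_hot_max_size threshold can be illustrated with a short plain-Python sketch. This is not CatBoost's implementation, only the selection rule, assuming the "less than or equal to" comparison given in the CatBoost documentation. Only the count for job (12) comes from the post; the other cardinalities and the function name split_by_encoding are hypothetical placeholders.

```python
def split_by_encoding(cardinalities, one_hot_max_size):
    """Split feature names into (one-hot encoded, handled by other encodings).

    A feature is one-hot encoded when its number of distinct values
    is less than or equal to one_hot_max_size.
    """
    one_hot, other = [], []
    for name, n_unique in cardinalities.items():
        if n_unique <= one_hot_max_size:
            one_hot.append(name)
        else:
            other.append(name)
    return one_hot, other


# job = 12 is taken from the post; the other counts are hypothetical.
cardinalities = {"job": 12, "marital": 4, "contact": 2}

small_threshold = split_by_encoding(cardinalities, one_hot_max_size=2)
large_threshold = split_by_encoding(cardinalities, one_hot_max_size=13)

print(small_threshold)
print(large_threshold)
```

With a threshold of 13, every feature in this toy dictionary lands in the one-hot group, which matches the reasoning for choosing a value just above the largest cardinality.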
