Author | Thomas Ciha
Translator | 刘旭坤
Editor | Jane
Produced by | AI科技大本营
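[Introduction] There are generally no shortcuts to optimizing a machine learning model. Which architecture, optimization algorithm, and hyperparameters to use depends partly on how well we understand the dataset and partly on repeated trial and error. The ability to build and test models quickly is therefore critical to moving a project forward. In this article we build a pipeline for producing models that helps you tune hyperparameters quickly.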
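For a deep learning model, the parameters we can control include the number and size of the hidden layers, the activation functions, the optimization method, the learning rate, the batch size, the preprocessing method, and the regularization. We start by writing these parameters into a dictionary, model_info, that stores the model's configuration: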
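model_info = {}
model_info['Hidden layers'] = [100] * 6              # number of units in each hidden layer
model_info['Input size'] = og_one_hot.shape[1] - 1   # og_one_hot is the one-hot encoded data array loaded further below
model_info['Activations'] = ['relu'] * 6             # activation function for each hidden layer
model_info['Optimization'] = 'adadelta'              # optimization method
model_info['Learning rate'] = .005
model_info['Batch size'] = 32
model_info['Preprocessing'] = 'Standard'             # preprocessing method
model_info['Lambda'] = 0
model_info['Regularization'] = 'l2'                  # regularization type
model_info['Reg param'] = 0.0005                     # regularization parameter (lambda for L2)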
Our goal here is binary classification on a dataset, which you can download as a CSV file from the link below.
https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset
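The most intuitive way to get to know a dataset is to visualize it. I used PCA and t-SNE for dimensionality reduction, and judging from the accompanying plots, t-SNE separates the classes most clearly. (Personally, I find that the StandardScaler that ships with scikit-learn works well for preprocessing the data.)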
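As a rough illustration of what "Standard" preprocessing does, the sketch below standardizes each column to zero mean and unit variance using only the Python standard library. The function name standardize_columns is made up for this example; it mirrors the idea behind scikit-learn's StandardScaler but is not the author's preprocessing code.

```python
import statistics


def standardize_columns(rows):
    """Scale every column of a list-of-rows table to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; a constant column is divided by 1 instead of 0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(value - m) / s for value, m, s in zip(row, means, stds)]
            for row in rows]
```

In a real project the means and standard deviations would be computed on the training split only and then reused for the test split.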
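Next, we can use the parameters in model_info to build a deep learning model. The build_nn function below builds a model from the parameters in the model_info it receives and returns it compiled: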
from keras import optimizers
from keras.layers import Activation, BatchNormalization, Dense, Dropout, InputLayer
from keras.models import Sequential
from keras.regularizers import l2


def build_nn(model_info):
    """
    This function builds and compiles a NN given a hash table of the model's parameters.
    :param model_info: dict of model parameters
    :return: compiled Keras model
    """
    lambda_, batch_norm, keep_prob = 0, False, False  # defaults: no regularization
    reg = model_info.get('Regularization')
    if reg == 'l2':  # L2 regularization
        lambda_ = model_info['Reg param']  # get lambda parameter
    elif reg == 'Batch norm':  # batch normalization regularization
        batch_norm = model_info['Reg param']  # 'before' or 'after' the activation
        if batch_norm not in ['before', 'after']:  # ensure we have a valid reg param
            raise ValueError("'Reg param' must be 'before' or 'after' for batch norm")
    elif reg == 'Dropout':  # dropout regularization
        keep_prob = model_info['Reg param']

    hidden, acts = model_info['Hidden layers'], model_info['Activations']
    model = Sequential(name=model_info.get('Name'))
    model.add(InputLayer((model_info['Input size'],)))  # create input layer

    for lay, act in zip(hidden, acts):  # create all the hidden layers
        # the Dense layer carries no activation of its own, so that batch norm
        # can be placed before or after the separate Activation layer
        if lambda_ > 0:  # if we're doing L2 regularization
            model.add(Dense(lay, kernel_regularizer=l2(lambda_)))
        else:  # if we're not regularizing
            model.add(Dense(lay))

        if batch_norm == 'before':
            model.add(BatchNormalization())  # add batch normalization layer

        model.add(Activation(act))  # activation layer is part of the hidden layer

        if batch_norm == 'after':
            model.add(BatchNormalization())  # add batch normalization layer

        if keep_prob:
            model.add(Dropout(1 - keep_prob))  # Keras Dropout takes the fraction of units to drop

    # --------- Adding Output Layer -------------
    model.add(Dense(1))  # add output layer
    if batch_norm == 'before':  # if we're using batch norm regularization
        model.add(BatchNormalization())
    model.add(Activation('sigmoid'))  # apply output layer activation
    if batch_norm == 'after':
        # note: normalizing after the sigmoid means outputs are no longer guaranteed to lie in [0, 1]
        model.add(BatchNormalization())

    lr = model_info['Learning rate']  # recent Keras versions name this argument learning_rate, older ones lr
    if model_info['Optimization'] == 'adagrad':  # setting an optimization method
        opt = optimizers.Adagrad(learning_rate=lr)
    elif model_info['Optimization'] == 'rmsprop':
        opt = optimizers.RMSprop(learning_rate=lr)
    elif model_info['Optimization'] == 'adadelta':
        opt = optimizers.Adadelta()  # uses the Keras default learning rate
    elif model_info['Optimization'] == 'adam':
        opt = optimizers.Adam(learning_rate=lr)
    elif model_info['Optimization'] == 'adamax':
        opt = optimizers.Adamax(learning_rate=lr)
    else:
        opt = optimizers.Nadam(learning_rate=lr)
    model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])  # compile model

    return model
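With build_nn in place, we can pass it different model_info dictionaries to create models quickly. Below, I use five different numbers of hidden layers to compare how different architectures perform on the classification task.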
def create_five_nns(input_size, hidden_size, act=None):
    """
    Creates 5 neural networks to be used as a baseline in determining the influence model depth & width has on performance.
    :param input_size: input layer size
    :param hidden_size: number of units in each hidden layer
    :param act: activation function to use for each layer
    :return: list of model_info hash tables
    """
    act = act or 'relu'  # default activation = 'relu'
    # (model name, number of hidden layers), from shallow to really deep
    depths = [('Shallow NN', 1), ('Medium NN', 3), ('Deep NN 1', 6),
              ('Deep NN 2', 11), ('Deep NN 3', 20)]
    nns = []  # list of model info hash tables
    for name, depth in depths:
        nns.append({
            'Name': name,
            'Hidden layers': [hidden_size] * depth,
            'Input size': input_size,
            'Activations': [act] * depth,
            'Optimization': 'adadelta',
            'Learning rate': .005,
            'Batch size': 32,
            'Preprocessing': 'Standard',
        })
    return nns
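Perhaps because our data is fairly nonlinear, I found that test performance improved as the number of hidden layers and the number of units grew: the more hidden layers, the better the results. Every model built from a given set of parameters was evaluated with five-fold cross-validation. In short, five-fold cross-validation splits the dataset into five parts, trains the model on four of them, and tests it on the remaining one. The parts are rotated over five rounds so that each serves as the test set exactly once, and the mean of the five test results is taken as the model's score. We measured accuracy and AUC; the results are shown in the accompanying figure.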
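The rotation described above can be sketched in a few lines of standard-library Python. This is a minimal sketch, not the author's cross-validation code: the names k_fold_indices and cross_validate are made up for illustration, and score_fn stands for any caller-supplied function that trains on the training indices and returns a metric for the test indices.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs so that every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(score_fn, n_samples, k=5, seed=0):
    """Return the mean of score_fn(train_idx, test_idx) over the k folds."""
    scores = [score_fn(train, test) for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)
```

In this article's setting, score_fn would build a fresh model with build_nn, fit it on the rows selected by train_idx, and return the accuracy or AUC on the rows selected by test_idx.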
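If cross-validation takes too long and you have enough data, you can instead split the dataset directly into a training subset and a test subset, as in the code below: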
import os

from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard
from sklearn.metrics import auc, roc_curve


def quick_nn_test(model_info, data_dict, save_path):
    model = build_nn(model_info)  # use model info to build and compile a nn
    # stop once training accuracy has not improved for 5 consecutive epochs
    # (recent Keras versions log this metric as 'accuracy'; older ones used 'acc')
    stop = EarlyStopping(patience=5, monitor='accuracy', verbose=1)
    model_dir = os.path.join(save_path, model_info['Name'])  # per-model output directory
    os.makedirs(model_dir, exist_ok=True)
    tensorboard = TensorBoard(log_dir=model_dir, histogram_freq=0, write_graph=True, write_images=True)  # create tensorboard callback
    save_model = ModelCheckpoint(filepath=os.path.join(model_dir, model_info['Name'] + '_saved_.h5'))  # save model after every epoch

    model.fit(data_dict['Training data'], data_dict['Training labels'], epochs=150,  # fit model
              batch_size=model_info['Batch size'], callbacks=[save_model, stop, tensorboard])
    train_acc = model.evaluate(data_dict['Training data'], data_dict['Training labels'],  # evaluate train accuracy
                               batch_size=model_info['Batch size'], verbose=0)
    test_acc = model.evaluate(data_dict['Test data'], data_dict['Test labels'],  # evaluate test accuracy
                              batch_size=model_info['Batch size'], verbose=0)

    # Get Train AUC
    y_pred = model.predict(data_dict['Training data']).ravel()  # predict on training data
    fpr, tpr, thresholds = roc_curve(data_dict['Training labels'], y_pred)  # compute fpr and tpr
    auc_train = auc(fpr, tpr)  # compute AUC metric
    # Get Test AUC
    y_pred = model.predict(data_dict['Test data']).ravel()  # same as above with test data
    fpr, tpr, thresholds = roc_curve(data_dict['Test labels'], y_pred)
    auc_test = auc(fpr, tpr)

    return train_acc, test_acc, auc_train, auc_test
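Some books suggest grid search for hyperparameter optimization, but grid search is simply exhaustive enumeration and is rarely practical in real projects. A more common approach is to go from coarse to fine, progressively narrowing the range that contains the best parameters.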
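To see why exhaustive enumeration gets expensive, the sketch below counts the models a modest grid would require. The depths, optimizers, and batch sizes are taken from the code in this article; the four learning-rate values and the helper name grid_candidates are assumptions made for this example.

```python
import itertools

# Candidate values per hyperparameter (the learning rates are illustrative).
grid = {
    'Hidden layers': [[100] * depth for depth in (1, 3, 6, 11, 20)],
    'Optimization': ['adagrad', 'rmsprop', 'adadelta', 'adam', 'adamax', 'nadam'],
    'Batch size': [16, 32, 64, 128, 256, 512],
    'Learning rate': [1e-4, 1e-3, 1e-2, 1e-1],
}


def grid_candidates(grid):
    """Yield one dict per combination of the values in `grid`."""
    keys = list(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


n_models = sum(1 for _ in grid_candidates(grid))
print(n_models)  # 5 * 6 * 6 * 4 = 720 models to train
```

Every additional hyperparameter multiplies this count again, which is why sampling a few dozen configurations and then narrowing the ranges is usually the more practical route.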
"""This section of code allows us to create and test many neural networks and save the results of a quick
test into a CSV file. Once that CSV file has been created, we will continue to add results onto the existing
file."""
import os

import numpy as np
import pandas as pd

rapid_testing_path = 'YOUR PATH HERE'
data_path = 'YOUR DATA PATH'
results_file = os.path.join(rapid_testing_path, 'Results.csv')

column_list = ['Model', 'Train Accuracy', 'Test Accuracy', 'Train AUC', 'Test AUC', 'Preprocessing',
               'Batch size', 'Learn Rate', 'Optimization', 'Activations', 'Hidden layers',
               'Regularization', 'Reg Param']

try:  # try to load existing csv
    rapid_mlp_results = pd.read_csv(results_file)
    index = rapid_mlp_results.shape[0]  # next row index = number of rows already stored
except FileNotFoundError:  # if no csv exists yet, create a DF
    rapid_mlp_results = pd.DataFrame(columns=column_list)
    index = 0

og_one_hot = np.array(pd.read_csv(data_path))  # load one hot data

model_info = {}  # create model_info dicts for all the models we want to test
model_info['Hidden layers'] = [100] * 6  # specifies the number of hidden units per layer
model_info['Input size'] = og_one_hot.shape[1] - 1  # input data size
model_info['Activations'] = ['relu'] * 6  # activation function for each layer
model_info['Optimization'] = 'adadelta'  # optimization method
model_info["Learning rate"] = .005  # learning rate for optimization method
model_info["Batch size"] = 32
model_info["Preprocessing"] = 'Standard'  # specifies the preprocessing method to be used

model_0 = model_info.copy()  # create model 0
model_0['Name'] = 'Model0'

model_1 = model_info.copy()  # create model 1
model_1['Hidden layers'] = [110] * 3
model_1['Activations'] = ['relu'] * 3  # keep one activation per hidden layer
model_1['Name'] = 'Model1'

model_2 = model_info.copy()  # try best model so far with several regularization parameter values
model_2['Hidden layers'] = [110] * 6
model_2['Name'] = 'Model2'
model_2['Regularization'] = 'l2'
model_2['Reg param'] = 0.0005

model_3 = model_info.copy()
model_3['Hidden layers'] = [110] * 6
model_3['Name'] = 'Model3'
model_3['Regularization'] = 'l2'
model_3['Reg param'] = 0.05

# .... create more models ....

# -------------- REGULARIZATION OPTIONS -------------
# L2 Regularization:      Regularization: 'l2',         Reg param: lambda value
# Dropout:                Regularization: 'Dropout',    Reg param: keep_prob
# Batch normalization:    Regularization: 'Batch norm', Reg param: 'before' or 'after'

models = [model_0, model_1, model_2, model_3]  # make a list of model_info hash tables

for model in models:  # for each model_info in list of models to test, test model and record results
    # preprocess_data and split_data are the author's helper functions, which are not shown in this article
    train_data, labels = preprocess_data(og_one_hot, model['Preprocessing'], True)  # preprocess raw data
    data_dict = split_data(0.9, 0, np.concatenate((train_data, labels.reshape(-1, 1)), axis=1))  # split data
    train_acc, test_acc, auc_train, auc_test = quick_nn_test(model, data_dict, save_path=rapid_testing_path)  # quickly assess model

    reg = model.get('Regularization', 'None')  # regularization settings if given, else placeholders
    reg_param = model.get('Reg param', 'NA')

    val_lis = [model['Name'], train_acc[1], test_acc[1], auc_train, auc_test, model['Preprocessing'],
               model["Batch size"], model["Learning rate"], model["Optimization"], str(model["Activations"]),
               str(model["Hidden layers"]), reg, reg_param]

    df = pd.DataFrame(dict(zip(column_list, val_lis)), index=[index])  # one-row frame for this model
    rapid_mlp_results = pd.concat([rapid_mlp_results, df])  # DataFrame.append was removed in pandas 2.0
    rapid_mlp_results.to_csv(results_file, index=False)
    index += 1
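The script above rewrites the whole results file after every model. As an alternative sketch (not the author's method), the standard-library csv module can append a single row per model and write the header only when the file is new. The function name append_result is hypothetical.

```python
import csv
import os


def append_result(csv_path, row, columns):
    """Append one result row to csv_path, writing the header only for a new file."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, 'a', newline='') as f:
        # restval fills in columns that the row does not provide,
        # for example the regularization fields of an unregularized model.
        writer = csv.DictWriter(f, fieldnames=columns, restval='NA')
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

Because each call only appends, results from an interrupted run are kept, and several test scripts can add to the same file one after another.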
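We first need a rough direction for the optimization and approximate ranges for the parameters. Only then can we sample parameters at random within those ranges and narrow the ranges further based on the results. The code below adds randomness to the process of generating models (more precisely, the model_info dictionaries used to generate them):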
import numpy as np


def generate_random_model():
    """Randomly sample a model_info dict; the caller still has to set 'Name', 'Input size' and 'Preprocessing'."""
    optimization_methods = ['adagrad', 'rmsprop', 'adadelta', 'adam', 'adamax', 'nadam']  # possible optimization methods
    activation_functions = ['sigmoid', 'relu', 'tanh']  # possible activation functions
    batch_sizes = [16, 32, 64, 128, 256, 512]  # possible batch sizes
    range_hidden_units = range(5, 250)  # range of possible hidden units
    model_info = {}  # create hash table
    same_units = np.random.choice([0, 1], p=[1/5, 4/5])  # dictates whether all hidden layers will have the same number of units
    same_act_fun = np.random.choice([0, 1], p=[1/10, 9/10])  # will each hidden layer have the same activation function?
    if np.random.rand() < 0.8:  # 80% of the time pick between 1 and 9 hidden layers, weighted
        num_layers = int(np.random.choice(range(1, 10), p=[.1, .2, .2, .2, .05, .05, .05, .1, .05]))
    else:  # 20% of the time permit really deep architectures with 6 to 19 hidden layers
        num_layers = int(np.random.choice(range(6, 20)))

    def pick_act():  # draw one activation function, favouring relu
        return str(np.random.choice(activation_functions, p=[0.25, 0.5, 0.25]))

    def pick_units():  # draw one hidden layer size
        return int(np.random.choice(range_hidden_units))

    model_info["Activations"] = [pick_act()] * num_layers if same_act_fun else [pick_act() for _ in range(num_layers)]  # choose activation functions
    model_info["Hidden layers"] = [pick_units()] * num_layers if same_units else [pick_units() for _ in range(num_layers)]  # create hidden layers
    model_info["Optimization"] = str(np.random.choice(optimization_methods))  # choose an optimization method at random
    model_info["Batch size"] = int(np.random.choice(batch_sizes))  # choose batch size
    model_info["Learning rate"] = 10 ** (-4 * np.random.rand())  # choose a learning rate on a logarithmic scale between 1e-4 and 1
    model_info["Training threshold"] = 0.5  # set threshold for training
    return model_info
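The "sample at random, then narrow the range" loop can be sketched for a single hyperparameter such as the learning rate. This is a minimal standard-library sketch under stated assumptions: the names sample_log_uniform and coarse_to_fine, the number of rounds, and the shrink schedule are all choices made for this example rather than the author's code, and score_fn stands for any caller-supplied function where higher is better.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose log10 is uniform between log10(low) and log10(high)."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def coarse_to_fine(score_fn, low, high, rounds=3, samples=20, shrink=10.0, seed=0):
    """Random search that narrows a log-scale range around the best value each round."""
    rng = random.Random(seed)
    best_x, best_score = None, float('-inf')
    for _ in range(rounds):
        for _ in range(samples):
            x = sample_log_uniform(low, high, rng)
            score = score_fn(x)
            if score > best_score:
                best_x, best_score = x, score
        # Keep one `shrink` factor on either side of the best value so far,
        # without ever leaving the current range.
        low, high = max(low, best_x / shrink), min(high, best_x * shrink)
        shrink = math.sqrt(shrink)  # narrow more tightly in the next round
    return best_x, best_score, (low, high)
```

In practice score_fn would set the sampled value in a model_info dictionary, run quick_nn_test, and return the test accuracy or AUC; the same loop can be applied to other numeric parameters such as the L2 lambda.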
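To sum up, our approach to rapid optimization comes down to two ideas: build models automatically, and narrow the search progressively. Automatic model building is handled by the build_nn function, while progressive narrowing is done by judging parameter ranges and sampling at random within them. Once you have this approach down, I believe you will be able to tune machine learning models quickly, deep learning models in particular.
Original article: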
https://towardsdatascience.com/how-to-rapidly-test-dozens-of-deep-learning-models-in-python-cb839b518531
[End]