Author : Emre Okcular
Date : June 7th, 2021
In this post, we will take a closer look at the hyperparameters of a deep neural network listed below. We will also cover how to tune these parameters with an approach similar to scikit-learn's GridSearch. Ray Tune is a Python library for experiment execution and hyperparameter tuning at any scale.
Important hyperparameters for Neural Networks:
Learning rate is the most important hyperparameter to tune when training neural networks. If the learning rate is too small, training is slow; if it is too large, training diverges. As an initial approach, we can fix the learning rate and then monotonically decrease it during training.
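One common way to implement that "fix, then decrease" strategy in PyTorch is a learning-rate scheduler. Below is a minimal sketch using torch.optim.lr_scheduler.StepLR; the model, step size, and decay factor are illustrative choices, not values from this post:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)                      # toy model, stands in for a real network
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Halve the learning rate every 10 epochs.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

lrs = []
for epoch in range(30):
    # ... one epoch of training would run here ...
    optimizer.step()       # normally called once per batch
    scheduler.step()       # decay once per epoch
    lrs.append(optimizer.param_groups[0]["lr"])

print(lrs[0], lrs[10], lrs[20])  # 0.1 0.05 0.025
```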
Dropout is a regularization technique that prevents overfitting in neural networks. At each training stage (batch), individual nodes are either dropped out of the net with probability p or kept with probability 1-p, so that a reduced network is left.
Training Phase
For each hidden layer, for each training sample, for each iteration, ignore (zero out) a random fraction, p, of nodes (and corresponding activations).
Testing Phase
Use all activations, but reduce them by a factor 1-p (to account for the missing activations during training).
Neural networks learn co-adaptations of hidden units that work for the training data but do not generalize to unseen data. Random dropout breaks up these co-adaptations by making the presence of any particular hidden unit unreliable.
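Worth noting: PyTorch's nn.Dropout implements the equivalent "inverted" formulation — it scales the kept activations by 1/(1-p) during training, so at test time the activations can be used as-is instead of being reduced by 1-p. A tiny sketch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                 # for a reproducible dropout mask
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()                         # training phase: drop ~p of the activations,
y_train = drop(x)                    # survivors are scaled by 1/(1-p) = 2
drop.eval()                          # testing phase: identity, nothing dropped
y_eval = drop(x)

print(y_train)                       # entries are either 0.0 or 2.0
print(y_eval)                        # all ones
```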
Weight decay is usually a parameter of the optimization function, such as torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False). In other words, it is an L2 penalty added to the error function. Larger values of λ tend to shrink the weights toward zero; typically, cross-validation is used to estimate λ.
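The equivalence between weight_decay and an L2 penalty can be seen directly: with SGD, weight decay λ simply adds λ·w to each parameter's gradient. A small sketch with illustrative values:

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1, weight_decay=0.5)

loss = (w * 0.0).sum()   # zero loss: any update comes from the penalty alone
loss.backward()
opt.step()

print(w.detach())        # tensor([ 0.9500, -1.9000]) -- weights shrink toward zero
```

With a zero loss gradient, the update is w ← w − lr·λ·w, i.e. pure shrinkage toward zero.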
Batch size is the number of samples processed before the model weights are updated.
Why use minibatches? Compared with processing the full dataset at once, minibatches fit in memory, give more frequent weight updates, and the gradient noise they introduce can itself act as a regularizer.
The number of epochs is the number of complete passes through the training dataset.
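Together, batch size and the number of epochs determine how many weight updates training performs. A quick back-of-the-envelope calculation with hypothetical numbers:

```python
n_samples = 1000     # training set size
batch_size = 32
epochs = 10

# With drop_last=True, each epoch yields floor(n_samples / batch_size) updates.
updates_per_epoch = n_samples // batch_size
total_updates = updates_per_epoch * epochs
print(updates_per_epoch, total_updates)  # 31 310
```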
While tuning these parameters manually, you might feel like you are running gradient descent in a multidimensional space, looking for a pattern (increase or decrease). In scikit-learn pipelines, we use methods like HalvingGridSearchCV, RandomizedSearchCV, and GridSearchCV to solve this search problem. It may take some time (and CPU heat) with big datasets, but eventually you will have a tuned model with the best estimator and hyperparameters. With the same approach, we can tune neural network hyperparameters using the Ray Tune package in an efficient and distributed way.
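For comparison, here is what that search looks like in scikit-learn — a minimal GridSearchCV sketch on synthetic data (the estimator and grid are illustrative, not the model used in this post):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)
grid = {"C": [0.01, 0.1, 1.0, 10.0]}        # one hyperparameter, four candidates
search = GridSearchCV(LogisticRegression(max_iter=1000), grid, cv=3)
search.fit(X, y)                            # exhaustively tries all four candidates
print(search.best_params_)
```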
We will import the packages below from PyTorch and Ray, along with a few helpers used by the rest of the code.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from ray import tune # For hyperparameter tuning in Neural Networks.
from ray.tune import Analysis # For analyzing tuning results.
import numpy as np
import pandas as pd
from collections import Counter
from tqdm import tqdm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
Let's use a simple NN model architecture with one hidden layer, dropout, a ReLU activation function, and batch norm.
class MulticlassClassification(nn.Module):
    def __init__(self, num_feature, num_class, d, M):
        super(MulticlassClassification, self).__init__()
        self.layer_1 = nn.Linear(num_feature, M)
        self.layer_out = nn.Linear(M, num_class)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(p=d)
        self.batchnorm1 = nn.BatchNorm1d(M)

    def forward(self, x):
        x = self.layer_1(x)
        x = self.batchnorm1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.layer_out(x)
        return x
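Before wiring the model into the tuning loop, it helps to sanity-check the shape flow. Here the same layer sequence is rebuilt with nn.Sequential so the snippet is self-contained; the feature and class counts are illustrative:

```python
import torch
import torch.nn as nn

num_feature, num_class, d, M = 20, 4, 0.5, 50
model = nn.Sequential(
    nn.Linear(num_feature, M),
    nn.BatchNorm1d(M),
    nn.ReLU(),
    nn.Dropout(p=d),
    nn.Linear(M, num_class),
)
model.eval()                                 # disable dropout, use BatchNorm running stats
out = model(torch.randn(8, num_feature))     # batch of 8 samples
print(out.shape)                             # torch.Size([8, 4])
```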
To use the tuning method, we need to convert our training code into a function and add a tune.report(...) call to our training loop, reporting the training/validation accuracy and loss so that Ray Tune stores these metrics for each epoch.
def train_model(model, optimizer, train_loader, val_loader, EPOCHS, accuracy_stats, loss_stats):
    # criterion (e.g. nn.CrossEntropyLoss) and the multi_acc accuracy helper
    # are assumed to be defined at module level.
    for e in tqdm(range(1, EPOCHS + 1)):
        train_epoch_loss = 0
        train_epoch_acc = 0
        model.train()
        for X_train_batch, y_train_batch in train_loader:
            optimizer.zero_grad()
            y_train_pred = model(X_train_batch)
            train_loss = criterion(y_train_pred, y_train_batch)
            train_acc = multi_acc(y_train_pred, y_train_batch)
            train_loss.backward()
            optimizer.step()
            train_epoch_loss += train_loss.item()
            train_epoch_acc += train_acc.item()
        # VALIDATION
        with torch.no_grad():
            val_epoch_loss = 0
            val_epoch_acc = 0
            model.eval()
            for X_val_batch, y_val_batch in val_loader:
                y_val_pred = model(X_val_batch)
                val_loss = criterion(y_val_pred, y_val_batch)
                val_acc = multi_acc(y_val_pred, y_val_batch)
                val_epoch_loss += val_loss.item()
                val_epoch_acc += val_acc.item()
        loss_stats['train'].append(train_epoch_loss / len(train_loader))
        loss_stats['val'].append(val_epoch_loss / len(val_loader))
        accuracy_stats['train'].append(train_epoch_acc / len(train_loader))
        accuracy_stats['val'].append(val_epoch_acc / len(val_loader))
        tune.report(training_accuracy=train_epoch_acc / len(train_loader),
                    val_accuracy=val_epoch_acc / len(val_loader),
                    training_loss=train_epoch_loss / len(train_loader),
                    val_loss=val_epoch_loss / len(val_loader))
    return accuracy_stats
def get_loaders():
    # CancerDataset is a simple torch Dataset wrapper defined elsewhere.
    df = pd.read_csv("/Users/emre/Dev/GitHub/train_ml2_2021.csv")
    y = df['target']
    X = df.drop(columns='target')
    test = pd.read_csv('/Users/emre/Dev/GitHub/test0.csv')
    X_test = test.drop(columns=['target', "obs_id"])
    y_test = test["target"]
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=21)
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)
    X_val = scaler.transform(X_val)
    X_test = scaler.transform(X_test)
    X_train, y_train = np.array(X_train), np.array(y_train)
    X_val, y_val = np.array(X_val), np.array(y_val)
    X_test, y_test = np.array(X_test), np.array(y_test)
    train_dataset = CancerDataset(torch.from_numpy(X_train).float(), torch.from_numpy(y_train).long())
    val_dataset = CancerDataset(torch.from_numpy(X_val).float(), torch.from_numpy(y_val).long())
    test_dataset = CancerDataset(torch.from_numpy(X_test).float(), torch.from_numpy(y_test).long())
    target_list = []
    for _, t in train_dataset:
        target_list.append(t)
    target_list = torch.tensor(target_list)
    target_list = target_list[torch.randperm(len(target_list))]
    # Weight each sample inversely to its class frequency to balance the classes.
    c = Counter(y_train)
    class_count = [count for _, count in sorted(c.items())]
    class_weights = 1. / torch.tensor(class_count, dtype=torch.float)
    class_weights_all = class_weights[target_list]
    weighted_sampler = WeightedRandomSampler(
        weights=class_weights_all,
        num_samples=len(class_weights_all),
        replacement=True)
    return weighted_sampler, train_dataset, val_dataset, len(X.columns), y.nunique()
Finally, we will wrap all the helpers in a train function that Ray Tune can call with a config.
def train(config):
    accuracy_stats = {
        'train': [],
        'val': []
    }
    loss_stats = {
        'train': [],
        'val': []
    }
    w, tr, val, num_feat, num_cl = get_loaders()
    train_loader = DataLoader(dataset=tr,
                              batch_size=config["bs"],
                              sampler=w, drop_last=True)
    val_loader = DataLoader(dataset=val, shuffle=True, batch_size=1)
    model = MulticlassClassification(num_feature=num_feat, num_class=num_cl,
                                     d=config["dr"], M=config["m"])
    optimizer = optim.Adam(model.parameters(), lr=config["lr"], weight_decay=config["wd"])
    train_model(model, optimizer, train_loader, val_loader,
                EPOCHS=EPOCH_FOR_TUNING,  # EPOCH_FOR_TUNING is hardcoded at module level!
                accuracy_stats=accuracy_stats, loss_stats=loss_stats)
Before the hyperparameter search, we need to specify the search space in a dictionary.
param_space = {"lr": tune.grid_search([0.00003, 0.0005, 0.001, 0.0007]),  # Learning rate
               "bs": tune.grid_search([8, 16, 32, 64, 128, 256, 512]),    # Batch size
               "dr": tune.grid_search([0.3, 0.5, 0.7, 0.85]),             # Dropout
               "m": tune.grid_search([30, 50, 100, 200, 300]),            # Number of neurons
               "wd": tune.grid_search([0])}                               # Weight decay for the optimizer
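Keep in mind that tune.grid_search is exhaustive: Ray Tune launches one trial per combination, so the grid above already implies a substantial number of runs:

```python
# Sizes of the candidate lists in param_space above.
n_lr, n_bs, n_dr, n_m, n_wd = 4, 7, 4, 5, 1
n_trials = n_lr * n_bs * n_dr * n_m * n_wd
print(n_trials)  # 560
```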
When we run the method below, it will start searching the parameter space, creating one folder per trial under the results directory (~/ray_results by default).
analysis = tune.run(train, config=param_space, verbose=1, name="epoch_" + str(EPOCH_FOR_TUNING))
You can see the folders in the path /Users/emre/ray_results/epoch_300. Here, epoch_300 is the name for the tuning process that we specified above in the run() method.
drwxr-xr-x 7 emre staff 224 6 May 11:35 train_cancer_c3378_00000_0_bs=8,dr=0.3,lr=3e-05,m=30,wd=0_2021-05-06_11-35-01
drwxr-xr-x 7 emre staff 224 6 May 11:35 train_cancer_c3378_00001_1_bs=16,dr=0.3,lr=3e-05,m=30,wd=0_2021-05-06_11-35-01
drwxr-xr-x 7 emre staff 224 6 May 11:35 train_cancer_c3378_00002_2_bs=32,dr=0.3,lr=3e-05,m=30,wd=0_2021-05-06_11-35-01
drwxr-xr-x 7 emre staff 224 6 May 11:35 train_cancer_c3378_00003_3_bs=64,dr=0.3,lr=3e-05,m=30,wd=0_2021-05-06_11-35-01
drwxr-xr-x 7 emre staff 224 6 May 11:35 train_cancer_c3378_00004_4_bs=128,dr=0.3,lr=3e-05,m=30,wd=0_2021-05-06_11-35-01
drwxr-xr-x 7 emre staff 224 6 May 11:35 train_cancer_c3378_00005_5_bs=256,dr=0.3,lr=3e-05,m=30,wd=0_2021-05-06_11-35-01
drwxr-xr-x 7 emre staff 224 6 May 11:35 train_cancer_c3378_00007_7_bs=8,dr=0.5,lr=3e-05,m=30,wd=0_2021-05-06_11-35-01
drwxr-xr-x 7 emre staff 224 6 May 11:35 train_cancer_c3378_00008_8_bs=16,dr=0.5,lr=3e-05,m=30,wd=0_2021-05-06_11-35-01
drwxr-xr-x 7 emre staff 224 6 May 11:35 train_cancer_c3378_00009_9_bs=32,dr=0.5,lr=3e-05,m=30,wd=0_2021-05-06_11-35-08
drwxr-xr-x 7 emre staff 224 6 May 11:35 train_cancer_c3378_00010_10_bs=64,dr=0.5,lr=3e-05,m=30,wd=0_2021-05-06_11-35-35
drwxr-xr-x 7 emre staff 224 6 May 11:35 train_cancer_c3378_00011_11_bs=128,dr=0.5,lr=3e-05,m=30,wd=0_2021-05-06_11-35-35
drwxr-xr-x 7 emre staff 224 6 May 11:35 train_cancer_c3378_00012_12_bs=256,dr=0.5,lr=3e-05,m=30,wd=0_2021-05-06_11-35-36
drwxr-xr-x 8 emre staff 256 6 May 11:35 train_cancer_c3378_00013_13_bs=512,dr=0.5,lr=3e-05,m=30,wd=0_2021-05-06_11-35-41
drwxr-xr-x 7 emre staff 224 6 May 11:35 train_cancer_c3378_00014_14_bs=8,dr=0.7,lr=3e-05,m=30,wd=0_2021-05-06_11-35-47
drwxr-xr-x 7 emre staff 224 6 May 11:35 train_cancer_c3378_00015_15_bs=16,dr=0.7,lr=3e-05,m=30,wd=0_2021-05-06_11-35-53
drwxr-xr-x 7 emre staff 224 6 May 11:36 train_cancer_c3378_00016_16_bs=32,dr=0.7,lr=3e-05,m=30,wd=0_2021-05-06_11-35-53
drwxr-xr-x 7 emre staff 224 6 May 11:36 train_cancer_c3378_00017_17_bs=64,dr=0.7,lr=3e-05,m=30,wd=0_2021-05-06_11-36-05
drwxr-xr-x 7 emre staff 224 6 May 11:36 train_cancer_c3378_00018_18_bs=128,dr=0.7,lr=3e-05,m=30,wd=0_2021-05-06_11-36-07
drwxr-xr-x 7 emre staff 224 6 May 11:36 train_cancer_c3378_00019_19_bs=256,dr=0.7,lr=3e-05,m=30,wd=0_2021-05-06_11-36-19
drwxr-xr-x 7 emre staff 224 6 May 11:36 train_cancer_c3378_00021_21_bs=8,dr=0.85,lr=3e-05,m=30,wd=0_2021-05-06_11-36-29
drwxr-xr-x 8 emre staff 256 6 May 11:36 train_cancer_c3378_00020_20_bs=512,dr=0.7,lr=3e-05,m=30,wd=0_2021-05-06_11-36-29
drwxr-xr-x 7 emre staff 224 6 May 11:36 train_cancer_c3378_00022_22_bs=16,dr=0.85,lr=3e-05,m=30,wd=0_2021-05-06_11-36-31
drwxr-xr-x 7 emre staff 224 6 May 11:36 train_cancer_c3378_00023_23_bs=32,dr=0.85,lr=3e-05,m=30,wd=0_2021-05-06_11-36-37
drwxr-xr-x 2 emre staff 64 6 May 11:36 train_cancer_c3378_00024_24_bs=64,dr=0.85,lr=3e-05,m=30,wd=0_2021-05-06_11-36-56
-rw-r--r-- 1 emre staff 2108895 6 May 11:36 experiment_state-2021-05-06_11-34-59.json
-rw-r--r-- 1 emre staff 6215 6 May 11:36 basic-variant-state-2021-05-06_11-34-59.json
drwxr-xr-x 8 emre staff 256 6 May 17:52 train_cancer_c3378_00006_6_bs=512,dr=0.3,lr=3e-05,m=30,wd=0_2021-05-06_11-35-01
Let's take a look inside one of the folders.
-rw-r--r-- 1 emre staff 63 6 May 11:35 params.json
-rw-r--r-- 1 emre staff 64 6 May 11:35 params.pkl
-rw-r--r-- 1 emre staff 80366 6 May 11:36 progress.csv
-rw-r--r-- 1 emre staff 208199 6 May 11:36 result.json
-rw-r--r-- 1 emre staff 169618 6 May 11:36 events.out.tfevents.1620326101.Emre-MacBook-Pro.local
Here you can find the trial's hyperparameters (params.json and params.pkl), the logged metrics (progress.csv and result.json), and the TensorBoard event logs.
print("Best config: ", analysis.get_best_config(mode ="max" ,metric="val_accuracy"))
Best config: {'lr': 0.0005, 'bs': 128, 'dr': 0.3, 'm': 300, 'wd': 0}
The first epochs of one trial (trial_id c3378_00000) in progress.csv look like this (timesteps_total and episodes_total are empty for this run):

training_accuracy | val_accuracy | training_loss | val_loss | time_this_iter_s | done | timesteps_total | episodes_total | training_iteration | experiment_id | date | timestamp | time_total_s | pid | hostname | node_ip | time_since_restore | timesteps_since_restore | iterations_since_restore | trial_id
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
45.24390243902439 | 40.54054054054054 | 0.7499703270633046 | 0.7465202832544172 | 3.6098971366882324 | False | | | 1 | 55bb3e915a12438a9574e1cecc1423dd | 2021-05-06_11-35-08 | 1620326108 | 3.6098971366882324 | 40040 | Emre-MacBook-Pro.local | 192.168.0.155 | 3.6098971366882324 | 0 | 1 | c3378_00000
49.97560975609756 | 35.13513513513514 | 0.7230090389891368 | 0.7733307557331549 | 0.1837749481201172 | False | | | 2 | 55bb3e915a12438a9574e1cecc1423dd | 2021-05-06_11-35-08 | 1620326108 | 3.7936720848083496 | 40040 | Emre-MacBook-Pro.local | 192.168.0.155 | 3.7936720848083496 | 0 | 2 | c3378_00000
48.19512195121951 | 35.13513513513514 | 0.7115627977906204 | 0.7763204888717549 | 0.1500110626220703 | False | | | 3 | 55bb3e915a12438a9574e1cecc1423dd | 2021-05-06_11-35-08 | 1620326108 | 3.94368314743042 | 40040 | Emre-MacBook-Pro.local | 192.168.0.155 | 3.94368314743042 | 0 | 3 | c3378_00000
54.829268292682926 | 37.83783783783784 | 0.6684811834881945 | 0.7417761256565919 | 0.17262482643127441 | False | | | 4 | 55bb3e915a12438a9574e1cecc1423dd | 2021-05-06_11-35-09 | 1620326109 | 4.116307973861694 | 40040 | Emre-MacBook-Pro.local | 192.168.0.155 | 4.116307973861694 | 0 | 4 | c3378_00000
52.41463414634146 | 40.54054054054054 | 0.6830459309787285 | 0.7908904548432376 | 0.11512398719787598 | False | | | 5 | 55bb3e915a12438a9574e1cecc1423dd | 2021-05-06_11-35-09 | 1620326109 | 4.23143196105957 | 40040 | Emre-MacBook-Pro.local | 192.168.0.155 | 4.23143196105957 | 0 | 5 | c3378_00000
56.63414634146341 | 45.945945945945944 | 0.6508946157083279 | 0.7613682513301437 | 0.13743996620178223 | False | | | 6 | 55bb3e915a12438a9574e1cecc1423dd | 2021-05-06_11-35-09 | 1620326109 | 4.3688719272613525 | 40040 | Emre-MacBook-Pro.local | 192.168.0.155 | 4.3688719272613525 | 0 | 6 | c3378_00000
59.73170731707317 | 45.945945945945944 | 0.62467926449892 | 0.7794352487937825 | 0.14998722076416016 | False | | | 7 | 55bb3e915a12438a9574e1cecc1423dd | 2021-05-06_11-35-09 | 1620326109 | 4.518859148025513 | 40040 | Emre-MacBook-Pro.local | 192.168.0.155 | 4.518859148025513 | 0 | 7 | c3378_00000
You can save the results as a dataframe.
analysis.dataframe().to_csv("df_ep_100.csv")
For example, you can draw accuracy and loss graphs using this data.
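A sketch of such a plot, using the metric columns that tune.report logged. Here a tiny DataFrame with the same column names stands in for a real progress.csv (in practice, load one with pd.read_csv from a trial folder); the numbers are placeholders:

```python
import matplotlib
matplotlib.use("Agg")                 # headless backend, safe for scripts
import matplotlib.pyplot as plt
import pandas as pd

progress = pd.DataFrame({             # placeholder rows; use pd.read_csv(".../progress.csv")
    "training_accuracy": [45.2, 50.0, 54.8, 59.7],
    "val_accuracy": [40.5, 35.1, 37.8, 45.9],
    "training_loss": [0.75, 0.72, 0.67, 0.62],
    "val_loss": [0.75, 0.77, 0.74, 0.78],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
progress[["training_accuracy", "val_accuracy"]].plot(ax=ax1, title="Accuracy")
progress[["training_loss", "val_loss"]].plot(ax=ax2, title="Loss")
fig.savefig("tuning_curves.png")
```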
If you want to take a look at a previous tuning analysis, you can load it with the method below.
analysis2 = Analysis("/Users/emre/ray_results/train_cancer_2021-05-01_22-32-12")
To wrap up, hyperparameter tuning in neural networks is an exhaustive search that costs a lot of time and CPU power. Be careful with the parameter space you use: the search returns the best parameters within that space, but they can still be far from the true optimum if the space does not cover it.
Thank you for reading.