Getting Jumps in mini-batch loss when training YoloV2

ohad a on 2 May 2019
Answered: Zahra Moayed on 5 Aug 2019
Hello.
I'm trying to train YOLO v2 on my person-detector data set.
For some reason I get large jumps in the training loss in the middle of training. I can also see that the temporary checkpoint model files shrink dramatically in size (e.g. from 59 MB to 1.5 MB).
I'm using about 170 images with 1 to 6 bounding boxes each.
Here is the code:
% Define the image input size.
imageSize = [450 800 3];
% Define the number of object classes to detect.
numClasses = width(personDataSet)-1;
anchorBoxes = [
    76 43
    208 147
    103 68
    158 106
    198 137
    129 81
    73 40
    ];
baseNetwork = resnet50;
% Specify the feature extraction layer.
featureLayer = 'activation_49_relu';
analyzeNetwork(baseNetwork);
%reorgLayer = 'activation_47_relu';
% Create the YOLO v2 object detection network.
% lgraph = yolov2Layers(imageSize,numClasses,anchorBoxes,baseNetwork,featureLayer,'ReorglayerSource',reorgLayer);
lgraph = yolov2Layers(imageSize,numClasses,anchorBoxes,baseNetwork,featureLayer);
% Configure the training options.
% * Lower the learning rate to 1e-3 to stabilize training.
% * Set CheckpointPath to save detector checkpoints to a temporary
% location. If training is interrupted due to a system failure or
% power outage, you can resume training from the saved checkpoint.
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 34, ...
    'InitialLearnRate', 1e-3, ...
    'MaxEpochs', 30, ...
    'VerboseFrequency', 2, ...
    'CheckpointPath', tempdir);
    %'LearnRateSchedule','piecewise', ...
    %'LearnRateDropPeriod',10 , ...
    %'Shuffle','every-epoch');
% Train YOLO v2 detector.
[detector,info] = trainYOLOv2ObjectDetector(trainingData,lgraph,options);
As the commented-out lines in the code show, I also tried 'LearnRateSchedule' and 'Shuffle', as well as different learning rates, batch sizes and numbers of epochs, and I get the same results.
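For reference, here is a minimal sketch of the learning rate that the commented-out 'LearnRateSchedule','piecewise' and 'LearnRateDropPeriod',10 options would produce per epoch. It uses base MATLAB only. The drop factor of 0.1 is an assumption (it is the documented default of 'LearnRateDropFactor', which the posted code does not set), and the formula is my reading of how the piecewise schedule behaves, not something confirmed in this thread.

```matlab
% Values taken from the posted training options.
initialLearnRate = 1e-3;
dropPeriod       = 10;    % from the commented-out 'LearnRateDropPeriod'
maxEpochs        = 30;

% Assumption: default 'LearnRateDropFactor', not set in the posted code.
dropFactor = 0.1;

% Assumed piecewise schedule: multiply the rate by dropFactor
% after every dropPeriod epochs.
epochs    = 1:maxEpochs;
learnRate = initialLearnRate .* dropFactor .^ floor((epochs - 1) ./ dropPeriod);

% Print the rate at the first epoch of each period.
for e = 1:dropPeriod:maxEpochs
    fprintf('Epoch %2d: learning rate %g\n', e, learnRate(e));
end
```

Under these assumptions the rate would already have been reduced once by epoch 15, which is where the jump appears in the run below with a fixed rate.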
This is the training output for the options exactly as posted (with the three extra lines commented out):
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 8).
Training on single CPU.
|========================================================================================|
| Epoch | Iteration | Time Elapsed (hh:mm:ss) | Mini-batch RMSE | Mini-batch Loss | Base Learning Rate |
|========================================================================================|
| 1 | 1 | 00:00:37 | 8.56 | 73.2 | 0.0010 |
| 1 | 2 | 00:01:14 | 3.55 | 12.6 | 0.0010 |
| 1 | 4 | 00:02:27 | 2.15 | 4.6 | 0.0010 |
| 2 | 6 | 00:03:44 | 2.81 | 7.9 | 0.0010 |
| 2 | 8 | 00:04:57 | 2.89 | 8.4 | 0.0010 |
| 2 | 10 | 00:06:10 | 2.91 | 8.5 | 0.0010 |
| 3 | 12 | 00:07:26 | 2.80 | 7.8 | 0.0010 |
| 3 | 14 | 00:08:39 | 2.65 | 7.0 | 0.0010 |
| 4 | 16 | 00:09:55 | 2.18 | 4.7 | 0.0010 |
| 4 | 18 | 00:11:08 | 2.23 | 5.0 | 0.0010 |
| 4 | 20 | 00:12:21 | 2.32 | 5.4 | 0.0010 |
| 5 | 22 | 00:13:37 | 2.40 | 5.8 | 0.0010 |
| 5 | 24 | 00:14:50 | 2.42 | 5.9 | 0.0010 |
| 6 | 26 | 00:16:06 | 2.53 | 6.4 | 0.0010 |
| 6 | 28 | 00:17:18 | 2.59 | 6.7 | 0.0010 |
| 6 | 30 | 00:18:31 | 2.37 | 5.6 | 0.0010 |
| 7 | 32 | 00:19:47 | 2.29 | 5.2 | 0.0010 |
| 7 | 34 | 00:20:59 | 2.34 | 5.5 | 0.0010 |
| 8 | 36 | 00:22:15 | 2.24 | 5.0 | 0.0010 |
| 8 | 38 | 00:23:28 | 2.69 | 7.2 | 0.0010 |
| 8 | 40 | 00:24:41 | 2.86 | 8.2 | 0.0010 |
| 9 | 42 | 00:25:56 | 1.63 | 2.7 | 0.0010 |
| 9 | 44 | 00:27:09 | 1.71 | 2.9 | 0.0010 |
| 10 | 46 | 00:28:25 | 1.65 | 2.7 | 0.0010 |
| 10 | 48 | 00:29:37 | 1.68 | 2.8 | 0.0010 |
| 10 | 50 | 00:30:50 | 1.65 | 2.7 | 0.0010 |
| 11 | 52 | 00:32:07 | 1.68 | 2.8 | 0.0010 |
| 11 | 54 | 00:33:20 | 1.71 | 2.9 | 0.0010 |
| 12 | 56 | 00:34:35 | 1.65 | 2.7 | 0.0010 |
| 12 | 58 | 00:35:47 | 1.63 | 2.7 | 0.0010 |
| 12 | 60 | 00:36:58 | 1.62 | 2.6 | 0.0010 |
| 13 | 62 | 00:38:13 | 1.70 | 2.9 | 0.0010 |
| 13 | 64 | 00:39:25 | 1.79 | 3.2 | 0.0010 |
| 14 | 66 | 00:40:40 | 1.66 | 2.8 | 0.0010 |
| 14 | 68 | 00:41:52 | 1.66 | 2.7 | 0.0010 |
| 14 | 70 | 00:43:04 | 2.08 | 4.3 | 0.0010 |
| 15 | 72 | 00:44:19 | 4.30 | 18.5 | 0.0010 |
| 15 | 74 | 00:45:30 | 9.76 | 95.2 | 0.0010 |
| 16 | 76 | 00:46:42 | 9.08 | 82.5 | 0.0010 |
| 16 | 78 | 00:47:54 | 8.59 | 73.8 | 0.0010 |
| 16 | 80 | 00:49:05 | 8.25 | 68.1 | 0.0010 |
| 17 | 82 | 00:50:17 | 8.10 | 65.6 | 0.0010 |
| 17 | 84 | 00:51:30 | 7.86 | 61.7 | 0.0010 |
| 18 | 86 | 00:52:41 | 7.09 | 50.2 | 0.0010 |
| 18 | 88 | 00:53:52 | 6.51 | 42.3 | 0.0010 |
| 18 | 90 | 00:55:04 | 6.66 | 44.4 | 0.0010 |
| 19 | 92 | 00:56:16 | 6.70 | 45.0 | 0.0010 |
| 19 | 94 | 00:57:27 | 6.65 | 44.2 | 0.0010 |
| 20 | 96 | 00:58:39 | 6.18 | 38.3 | 0.0010 |
| 20 | 98 | 00:59:50 | 5.88 | 34.6 | 0.0010 |
| 20 | 100 | 01:01:01 | 6.15 | 37.8 | 0.0010 |
| 21 | 102 | 01:02:13 | 5.88 | 34.5 | 0.0010 |
| 21 | 104 | 01:03:25 | 6.09 | 37.0 | 0.0010 |
| 22 | 106 | 01:04:37 | 6.14 | 37.7 | 0.0010 |
| 22 | 108 | 01:05:48 | 5.12 | 26.2 | 0.0010 |
| 22 | 110 | 01:06:59 | 5.99 | 35.9 | 0.0010 |
| 23 | 112 | 01:08:10 | 5.95 | 35.4 | 0.0010 |
| 23 | 114 | 01:09:21 | 6.21 | 38.6 | 0.0010 |
| 24 | 116 | 01:10:33 | 6.07 | 36.9 | 0.0010 |
| 24 | 118 | 01:11:44 | 5.80 | 33.7 | 0.0010 |
| 24 | 120 | 01:12:55 | 6.30 | 39.7 | 0.0010 |
| 25 | 122 | 01:14:07 | 5.90 | 34.9 | 0.0010 |
| 25 | 124 | 01:15:18 | 6.17 | 38.0 | 0.0010 |
| 26 | 126 | 01:16:31 | 5.85 | 34.2 | 0.0010 |
| 26 | 128 | 01:17:42 | 5.53 | 30.6 | 0.0010 |
| 26 | 130 | 01:18:53 | 5.91 | 35.0 | 0.0010 |
| 27 | 132 | 01:20:05 | 5.88 | 34.6 | 0.0010 |
| 27 | 134 | 01:21:16 | 6.14 | 37.8 | 0.0010 |
| 28 | 136 | 01:22:28 | 6.03 | 36.4 | 0.0010 |
| 28 | 138 | 01:23:40 | 5.26 | 27.6 | 0.0010 |
| 28 | 140 | 01:24:53 | 5.90 | 34.8 | 0.0010 |
| 29 | 142 | 01:26:04 | 5.86 | 34.3 | 0.0010 |
| 29 | 144 | 01:27:16 | 6.14 | 37.7 | 0.0010 |
| 30 | 146 | 01:28:28 | 5.60 | 31.3 | 0.0010 |
| 30 | 148 | 01:29:40 | 5.76 | 33.2 | 0.0010 |
| 30 | 150 | 01:30:52 | 5.89 | 34.7 | 0.0010 |
|========================================================================================|
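
To pin down where the jump starts, a small sketch like the following can be run on the logged loss values. It uses only base MATLAB and the numbers copied from the log above; the threshold `jumpFactor` is a hypothetical choice, and the image count of 170 is the approximate figure from the question. The same check could presumably be applied to the per-iteration loss in the `info` output of trainYOLOv2ObjectDetector (I believe the field is named TrainingLoss, but treat that as an assumption).

```matlab
% Loss values copied from the log around the jump (iterations 60 to 76).
iteration = [60  62  64  66  68  70  72   74   76];
loss      = [2.6 2.9 3.2 2.8 2.7 4.3 18.5 95.2 82.5];

% Lowest loss seen up to and including each logged iteration.
runningMin = cummin(loss);

% Ratio of each loss value to the best loss seen before it.
ratio = loss(2:end) ./ runningMin(1:end-1);

% Hypothetical threshold: flag the first value that is more than
% three times the best loss seen so far.
jumpFactor    = 3;
idx           = find(ratio > jumpFactor, 1, 'first') + 1;
jumpIteration = iteration(idx);

% Cross-check against the epoch column of the log.
numImages          = 170;   % "about 170" in the question
miniBatchSize      = 34;
iterationsPerEpoch = floor(numImages / miniBatchSize);
jumpEpoch          = ceil(jumpIteration / iterationsPerEpoch);

fprintf('Jump flagged at iteration %d (epoch %d)\n', jumpIteration, jumpEpoch);
```

With 5 iterations per epoch, 30 epochs give 150 iterations, which matches the last row of the log.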

Answers (2)

ping.jiang on 13 Jun 2019
So, what is your question?

Zahra Moayed on 5 Aug 2019
I had the same issue, but it finally worked once I set the input size to [224 224 3], which is the input size of ResNet, and resized the anchor boxes to match. However, it only worked with a single class.
I also used a MiniBatchSize of 16 and set Shuffle to 'every-epoch', but the main change was the input size.
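
As an illustration of the resizing step, here is a minimal sketch that rescales the anchor boxes from the question to a [224 224] input. It is not the code used in this answer. It assumes the anchors are given as [height width] pairs, which is my understanding of what yolov2Layers expects, and it uses base MATLAB only.

```matlab
% Image sizes as [height width]: original from the question,
% target from this answer.
oldSize = [450 800];
newSize = [224 224];

% Anchor boxes from the question, assumed to be [height width].
anchorBoxes = [
    76 43
    208 147
    103 68
    158 106
    198 137
    129 81
    73 40
    ];

% Scale heights and widths independently, then round to whole pixels.
scale         = newSize ./ oldSize;
scaledAnchors = round(anchorBoxes .* scale);

% Guard against anchors collapsing to zero size after rounding.
scaledAnchors = max(scaledAnchors, 1);

disp(scaledAnchors);
```

Because the height and width factors differ, the aspect ratio of the boxes changes. The training images and their ground-truth boxes would presumably have to be resized with the same factors for the anchors to stay meaningful.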

Version

R2019a
