Training Insight Reports
The Training Insight Report is a guide that helps you identify ways to improve your model’s training performance. It covers the details of your dataset and your model’s training progress. Reading the report for the first time can be challenging, so this tutorial walks you through it section by section. By the end, you will have a good understanding of how to read the report and adjust your training flows accordingly.
Diagnosing a model starts with checking the dataset. Let’s use a dataset of pets as an example.
Suppose we trained a multi-class classification model to predict dogs, cats, birds, and fish. We want to improve the model’s performance. What do we do?
Using the Insight Summary, we can see that within our dataset, the bird and fish classes are under-represented. We need to increase the number of images for birds and fish so that the dataset becomes well balanced (like the case with dogs and cats).
We suggest starting with the seriously under-represented classes (if there are any) and then moving on to the mildly under-represented ones. If all your classes are well represented, there’s nothing to worry about at this step.
Q: How many images do I add?
A: Try to add enough images to each under-represented class to surpass the Cutoff Threshold shown in the plot (e.g. 56 labels for our example).
Shown below are several possible shapes your dataset could take, followed by a short sketch of one way to check your own label counts against the Cutoff Threshold.
Dataset A: The case we just described, with seriously under-represented classes (birds and fish).
Dataset B: Slightly better than A, since there aren’t any seriously under-represented classes, but there are still two mildly under-represented classes. Try adding more images to those classes.
Dataset C: It’s already a good enough dataset since every class has a sample size larger than the Cutoff Threshold. However, there is still room for improvement if you are willing to put in more effort.
Dataset D: This would be the ideal option since all the classes are equally represented (all near ~25%).
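If you keep your labels locally, a quick way to see where each class stands is to count the labels per class and compare them against the Cutoff Threshold from the plot. Below is a minimal sketch; the class names, counts, and the cutoff value of 56 (taken from the example above) are placeholders to replace with your own numbers.

```python
from collections import Counter

# Hypothetical per-image labels for a multi-class dataset (replace with your own).
labels = ["dog"] * 120 + ["cat"] * 110 + ["bird"] * 30 + ["fish"] * 20

CUTOFF_THRESHOLD = 56  # the example cutoff shown in the Insight Summary plot above

counts = Counter(labels)
total = sum(counts.values())

for cls, count in sorted(counts.items(), key=lambda kv: kv[1]):
    share = 100.0 * count / total
    status = "OK" if count >= CUTOFF_THRESHOLD else "under-represented: add more images"
    print(f"{cls:>5}: {count:4d} labels ({share:4.1f}%) -> {status}")
```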
What was discussed above was for multi-class classification. We can extend the same principles to multi-label classification, object detection, and semantic segmentation as well.
Multi-label Classification:
The Insight Summary plot is generated by counting the number of images that contain each label. For example, the image in Fig-3a contributes three positive labels (dog, cat, and bird) to the dataset. To increase the label count for class X, add images that include a positive label for class X among their labels.
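As an illustration of this counting rule, here is a minimal sketch. The annotation layout (a set of class names per image) is an assumption made for the example, not the platform’s storage format.

```python
from collections import Counter

# Hypothetical multi-label annotations: each image lists every class present in it.
annotations = [
    {"dog", "cat", "bird"},   # an image like Fig-3a contributes dog, cat, and bird
    {"dog"},
    {"cat", "fish"},
]

label_counts = Counter()
for labels in annotations:
    label_counts.update(labels)     # each class counts once per image containing it

for cls, count in sorted(label_counts.items()):
    print(f"{cls}: {count}")        # bird: 1, cat: 2, dog: 2, fish: 1
```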
Object Detection:
Insight Summary counts the number of bounding boxes belonging to each class. For example, Fig-3b contributes one dog, one cat, and two bird bounding boxes. To increase the label count for class X, add images that contain many bounding boxes of class X.
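The counting rule for object detection can be sketched the same way. The annotation layout (a list of (class, box) pairs per image) is again just an assumption for the example.

```python
from collections import Counter

# Hypothetical detection annotations: one list of (class_name, box) pairs per image,
# with boxes given as (x_min, y_min, x_max, y_max).
annotations = [
    [("dog", (10, 10, 80, 90)), ("cat", (100, 20, 160, 95)),
     ("bird", (30, 5, 45, 25)), ("bird", (60, 5, 75, 25))],   # like Fig-3b
    [("dog", (5, 5, 50, 60))],
]

box_counts = Counter(cls for boxes in annotations for cls, _ in boxes)
for cls, count in sorted(box_counts.items()):
    print(f"{cls}: {count} bounding boxes")   # bird: 2, cat: 1, dog: 2
```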
Semantic Segmentation:
Insight Summary counts the number of images that contain a mask for each class. For example, Fig-3c contributes one dog mask, one cat mask, and one bird mask (the masks of the two birds are contained in a single layer, so they count once). To increase the label count for class X, add images that contain a mask for class X.
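For segmentation, the count is per image rather than per object. The sketch below assumes masks are stored as one binary array per class per image; that layout is an assumption for the example only.

```python
from collections import Counter
import numpy as np

# Hypothetical segmentation annotations: {class_name: binary mask} per image.
# A class counts once per image as long as its mask layer is not completely empty,
# even if that layer contains several objects (e.g. two birds in one layer).
def count_mask_labels(dataset):
    counts = Counter()
    for masks_by_class in dataset:
        for cls, mask in masks_by_class.items():
            if np.any(mask):            # skip "zero-masks"
                counts[cls] += 1
    return counts

image = {
    "dog":  np.zeros((4, 4), dtype=np.uint8),
    "bird": np.array([[0, 1, 1, 0]] * 4, dtype=np.uint8),
}
image["dog"][1, 1] = 1
print(count_mask_labels([image]))       # dog and bird each counted once
```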
The next step in diagnosing our model is to check the Training and Validation Losses in the Training Curves section. Below are several scenarios you might encounter during training, followed by a rough sketch of simple checks you can run on your own recorded loss values.
Curves A: Both training and validation losses are decreasing. You are doing fine!
Curves B: You should try reducing the learning rate of your model (or lowering the learning rate range for PBT).
Curves C: You should try adding some data augmentations to your training flow. You should also check whether your dataset is imbalanced (it can cause an imbalanced training/validation split). If so, try adding more data, using the Insight Summary report as a guide.
Curves D: You can try adjusting your learning rate (try increasing it first, then decreasing it). You can also try training on a small subset of your data first, to see whether the training loss can decrease at all.
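If you log the loss values yourself, you can run some rough checks on them. The sketch below is a set of generic heuristics with arbitrary thresholds; the mapping between these checks and the specific Curves figures is our assumption, not logic taken from the report.

```python
# Rough heuristics over recorded loss values (one value per epoch); the thresholds
# below are arbitrary placeholders, not values used by the Training Insight Report.
def quick_loss_check(train_loss, val_loss):
    if train_loss[-1] > 2 * train_loss[0]:
        return "Training loss is growing; consider lowering the learning rate."
    if val_loss[-1] > 1.1 * min(val_loss) and train_loss[-1] < train_loss[0]:
        return ("Validation loss rebounds while training loss keeps falling; consider "
                "adding data augmentations and checking for an imbalanced split.")
    if abs(train_loss[-1] - train_loss[0]) < 0.01 * train_loss[0]:
        return ("Training loss is flat; try adjusting the learning rate, or first check "
                "that the model can fit a small subset of the data.")
    return "Both losses are trending down; keep going."

print(quick_loss_check([1.0, 0.7, 0.5, 0.4], [1.1, 0.8, 0.9, 1.0]))
```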
For non-loss metrics (e.g. accuracy, mean average precision, Dice), the general rule is that higher values are better. The definitions of these metrics are given at https://console.deepq.ai/docs/console/account-management/deep-learning-metrics-explained.
The information provided by the Insight Summary and the Training Curves should be enough to resolve most model training issues. If your model still fails to improve, the underlying issue could be related to your dataset distribution (not just label imbalance).
Data Statistics provides a side-by-side comparison between the training and validation set distributions (the training and validation sets are split automatically prior to AIP v2.2). An accompanying table detailing the values in each Data Statistics plot is given in the Tables section. The following sections explain how to read these plots and what adjustments to make.
Multi-Class Classification:
In multi-class classification, each image has exactly one label. Similar to the example above, Data Statistics for multi-class classification shows a bar graph of the percentage of the dataset each class occupies. Hence, the only thing to check here is whether the labels are evenly distributed.
Multi-Label Classification:
In the case of multi-label classification, each image may contain more than one label. We will first explain how we generate the Data Statistics plot using an example.
Consider the following example:
For class X, we count the number of images that have a label for class X, and we divide that number by the total number of images in the dataset to obtain the value of each bar.
To increase the number of labels for class X, add more images that contain class X (while trying not to add too many images of the other classes at the same time).
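Here is a minimal sketch of that computation, reusing the set-of-labels-per-image layout assumed earlier. Running it separately on your training and validation splits reproduces the side-by-side comparison described above.

```python
# Fraction of images containing each class, as in the multi-label Data Statistics bars.
def label_fractions(annotations):
    total = len(annotations)
    classes = sorted({cls for labels in annotations for cls in labels})
    return {cls: sum(cls in labels for labels in annotations) / total for cls in classes}

train = [{"dog", "cat"}, {"dog"}, {"cat", "bird"}, {"fish"}]
print(label_fractions(train))   # e.g. "dog" appears in 2 of 4 images -> 0.5
```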
Object Detection:
In object detection, each image may contain one or more bounding boxes. We can analyze the distribution of these bounding boxes either as multi-label labels of the image or as crops of the image. Consider the following example:
You can treat the bounding boxes as multi-label labels by marking each class that has at least one bounding box in the image as positive. You can also treat the crop of each bounding box as a separate image labeled with that box’s class. Using the same Data Statistics analysis developed for multi-class and multi-label classification, we can then generate the label distribution and bounding box labels bar charts.
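Both views can be sketched as follows, using the same hypothetical (class, box) annotation layout as before; the function names are ours, not part of the platform.

```python
import numpy as np

# View 1: a multi-label vector per image (a class is positive if it has any box).
def to_multilabel(boxes, classes):
    present = {cls for cls, _ in boxes}
    return [1 if cls in present else 0 for cls in classes]

# View 2: one cropped image per bounding box, labeled with that box's class.
def to_crops(image, boxes):
    # image is an H x W x C array; boxes hold (x_min, y_min, x_max, y_max).
    return [(cls, image[y0:y1, x0:x1]) for cls, (x0, y0, x1, y1) in boxes]

classes = ["bird", "cat", "dog", "fish"]
boxes = [("dog", (10, 10, 80, 90)), ("bird", (30, 5, 45, 25)), ("bird", (60, 5, 75, 25))]
print(to_multilabel(boxes, classes))             # [1, 0, 1, 0]

image = np.zeros((100, 200, 3), dtype=np.uint8)  # dummy 100 x 200 image
print([(cls, crop.shape) for cls, crop in to_crops(image, boxes)])
```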
In addition to label distribution plots, Data Statistics for object detection also includes analysis plots that show bounding box related statistics. For each bounding box, we compute its area relative to the image it belongs to (a percentage). We also compute the bounding box ratio by dividing the bounding box’s width by its height (see Fig-9).
We also compute the average number of bounding boxes in each image (see Fig-10). These statistics are then organized in the Bounding Box Area, Bounding Box Ratio, and Bounding Box Per Img bar charts.
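These per-box statistics are simple to reproduce. The sketch below assumes pixel-coordinate boxes and known image sizes; both are placeholders for the example.

```python
# Per-box statistics: area relative to the image and width-to-height ratio,
# plus the average number of bounding boxes per image.
def box_stats(image_size, boxes):
    img_w, img_h = image_size
    stats = []
    for cls, (x0, y0, x1, y1) in boxes:
        w, h = x1 - x0, y1 - y0
        area_pct = 100.0 * (w * h) / (img_w * img_h)   # area relative to the image
        ratio = w / h                                   # width divided by height
        stats.append((cls, area_pct, ratio))
    return stats

images = [
    ((200, 100), [("dog", (10, 10, 80, 90)), ("bird", (30, 5, 45, 25))]),
    ((200, 100), [("cat", (0, 0, 100, 50))]),
]
for size, boxes in images:
    for cls, area_pct, ratio in box_stats(size, boxes):
        print(f"{cls}: area {area_pct:.1f}% of image, ratio {ratio:.2f}")

print("average boxes per image:", sum(len(b) for _, b in images) / len(images))   # 1.5
```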
You can use these bounding box statistics plots to understand your dataset’s bounding box distribution. These plots, however, do not have to be balanced for training to progress well. For object detection, the main metric (and associated plot) we suggest balancing is the bounding box labels (the label distribution for object detection is already given in the Insight Summary). For a class X that accounts for only a small share of all bounding boxes, we suggest adding images that contain more bounding boxes of class X.
Semantic Segmentation:
In semantic segmentation, each image contains multiple masks, each belonging to a different class. If an image doesn’t have a mask for a certain class, you can think of that class’s mask as a “zero-mask” (a completely transparent mask). This is similar to how labels are stored for multi-label classification, so we analyze the mask label distribution in the same way (see Fig-11). Using the same technique shown for Object Detection, we generate the multi-label label distribution for the segmentation masks.
In addition to the label distribution, we compute the area each class’s mask takes up relative to the original image (see Fig-12), and we organize these mask area statistics by class (see Fig-13).
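Here is a minimal sketch of the mask area computation, using the same hypothetical per-class binary mask layout as in the earlier segmentation example.

```python
import numpy as np

# Area each class's mask occupies, as a percentage of the whole image.
def mask_area_percentages(masks_by_class):
    return {cls: 100.0 * mask.sum() / mask.size for cls, mask in masks_by_class.items()}

image = {
    "dog":  np.zeros((10, 10), dtype=np.uint8),
    "bird": np.ones((10, 10), dtype=np.uint8),
}
image["dog"][:5, :4] = 1                 # the dog mask covers 20 of 100 pixels
print(mask_area_percentages(image))      # {'dog': 20.0, 'bird': 100.0}
```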
The mask area statistics let you understand how much area the segmentation masks of each class typically occupy; the information is provided class-wise. The mask area distribution does not have to be balanced for training to progress well. Instead, aim for balance in the label distribution (provided in the Insight Report). For classes that don’t have enough masks, we suggest adding more images that contain non-zero masks for those classes.