The Train New Model page lets you train new computer vision models from annotated datasets. You configure the training run through a short form, and the training itself runs automatically. Follow the steps below to train a new model.

Training Process Overview

  1. Review Annotations: Navigate to the dataset table to review the annotations of your dataset. Ensure that the annotations accurately represent the objects or classes of interest in your images.

  2. Create New Model: Once you are satisfied with the annotations, click on “Create New Model” to initiate the model training process.

  3. Dataset Overview: In the first step of the configuration form, you will be provided with an overview of the dataset to be used for training. Review this information to ensure it corresponds to your expectations.

  4. Label Selection: In the next step, all annotation labels from the dataset will be populated for you to choose which ones to include in the model training. Select the relevant labels based on your project requirements.

  5. Dataset Train/Validation Split: Specify the train/validation split, which sets the proportion of the dataset held out for validation instead of being used for training. The held-out portion is used to assess model performance on unseen data.

  6. Initiate Model Training: Once you have configured the training parameters to your satisfaction, click on “Train Model” to start the model training process. The training will be executed as a background task, and everything is automated, so you don’t need to monitor it actively.

  7. Training Duration: The duration of model training can vary depending on the size of the dataset and the complexity of the model architecture. Training may take anywhere from 1 hour to 12 hours to complete.

  8. Email Notification: You will receive an email notification once the model training is completed, informing you that the model is ready for use.
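
As an illustration of what the label selection in step 4 amounts to, the sketch below filters a list of annotations down to the chosen labels and counts what remains. The record layout (`image` and `label` keys) and the helper names `select_labels` and `label_counts` are hypothetical; the platform's actual data format is not described here and may differ.

```python
from collections import Counter

# Hypothetical annotation records; the platform's real format may differ.
annotations = [
    {"image": "img_001.jpg", "label": "car"},
    {"image": "img_001.jpg", "label": "person"},
    {"image": "img_002.jpg", "label": "bicycle"},
    {"image": "img_003.jpg", "label": "car"},
]


def select_labels(records, selected):
    """Keep only the annotations whose label was selected for training."""
    selected = set(selected)
    return [r for r in records if r["label"] in selected]


def label_counts(records):
    """Count annotations per label, e.g. to sanity-check the selection."""
    return Counter(r["label"] for r in records)


kept = select_labels(annotations, ["car", "person"])
print(label_counts(kept))
```

Checking the per-label counts before training is a quick way to spot labels with too few examples to be worth including.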

Dataset Split: Training and Validation Split

In the model training process, dividing the dataset into training and validation sets is essential for assessing the model’s performance and generalization ability. The training set is used to train the model parameters, while the validation set is used to evaluate the model’s performance on unseen data. Here’s a closer look at the training and validation split:

Training Set

  • The training set comprises a portion of the annotated dataset used to train the model’s parameters.
  • During training, the model learns from the images and their corresponding annotations in the training set to optimize its performance.
  • A larger training set generally leads to better model performance, as it provides more diverse examples for the model to learn from.
  • However, more training data means longer training times, and allocating too large a share of the dataset to training leaves too few images in the validation set to reveal overfitting, where the model memorizes the training data without generalizing well to unseen data.

Validation Set

  • The validation set is a separate portion of the dataset reserved for evaluating the model’s performance during training.
  • After each training epoch or iteration, the model’s performance is evaluated using the validation set to assess its ability to generalize to new, unseen data.
  • The validation set helps detect overfitting by providing an independent evaluation of the model’s performance on data that it hasn’t been trained on.
  • It is crucial to ensure that the validation set is representative of the real-world data distribution to obtain reliable performance metrics.
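
The role of the validation set in spotting overfitting can be sketched with per-epoch loss values. The example below is a general pattern, not a description of how the platform trains models: the functions `best_epoch` and `should_stop`, the `patience` parameter, and the loss numbers are all illustrative assumptions.

```python
def best_epoch(val_losses):
    """Return the 0-based index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])


def should_stop(val_losses, patience=2):
    """Return True if validation loss has not improved for `patience` epochs."""
    epochs_since_best = len(val_losses) - 1 - best_epoch(val_losses)
    return epochs_since_best >= patience


# Illustrative numbers only, not results from a real training run.
train_losses = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22]
val_losses = [0.95, 0.70, 0.58, 0.55, 0.59, 0.64]

print("best epoch:", best_epoch(val_losses))
print("stop now:", should_stop(val_losses))
```

In these made-up numbers the training loss keeps falling while the validation loss starts rising after the fourth epoch, which is the typical signature of overfitting.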

Dataset Split Configuration

  • When configuring the training parameters for model training, you specify the dataset split between training and validation sets.
  • The dataset split percentage determines the proportion of the dataset allocated to the training and validation sets. Common split ratios include 80/20 (80% training, 20% validation) or 70/30 (70% training, 30% validation).
  • The choice of split ratio depends on factors such as the size of the dataset, the complexity of the model, and the desired trade-off between the amount of data available for training and the reliability of the evaluation on the validation set.
  • It is essential to strike a balance between having enough data for training and ensuring sufficient data for reliable model evaluation on the validation set.
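
To make the split percentage concrete, here is a minimal sketch of a random train/validation split. It assumes a simple shuffled split; the platform's actual splitting strategy is not documented here, and the function name `train_val_split`, its parameters, and the file names are hypothetical.

```python
import random


def train_val_split(items, train_fraction=0.8, seed=42):
    """Shuffle items and split them into (train, validation) lists.

    train_fraction=0.8 corresponds to an 80/20 split.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1 (exclusive)")
    shuffled = list(items)
    # A fixed seed makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical image file names standing in for a dataset of 100 images.
images = [f"img_{i:03d}.jpg" for i in range(100)]
train, val = train_val_split(images, train_fraction=0.8)
print(len(train), len(val))
```

With 100 images and an 80/20 split, 80 images go to training and 20 to validation, and no image appears in both sets.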

By choosing the dataset split carefully and using both the training and validation sets, you can assess how well your computer vision models generalize to unseen data and tune them accordingly.