Dataset Quality Checks

The Dataset Quality Checks page gives you insight into the quality of your annotated dataset so you can judge whether it is suitable for training high-performance computer vision models. Use its metrics and analysis to find areas for improvement and refine your dataset to improve model performance.

Metrics Overview

As you annotate your dataset, we compute several metrics to evaluate its quality and effectiveness. These metrics include:

  1. Labels Distribution: Distribution of annotations across different labels or categories within the dataset, indicating the diversity and coverage of annotated objects.

  2. Class Imbalance: Assessment of class distribution to identify overrepresented, balanced, or underrepresented classes within the dataset. Class imbalance affects model training and may lead to biased predictions.

  3. Annotation Bounding Box Area: Distribution of bounding box areas across annotated objects, providing insight into object sizes and spatial characteristics.

  4. Image Counts: Total number of images included in the dataset, indicating the dataset’s scale and scope.

  5. Annotation Counts: Total number of annotations in the dataset. Read alongside the image count, it shows how densely images are annotated.

  6. Missing Annotations: Identification of images or objects within images that lack annotations, highlighting potential gaps in the dataset coverage.

  7. Image Pixel Density: Analysis of image resolution and pixel density, which can affect both model performance and inference speed.
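This page doesn't document how these metrics are computed or which annotation format the platform uses internally. As a rough illustration only, the sketch below assumes annotations in a COCO-style dictionary (`images`, `annotations`, and `categories` lists, with boxes stored as `[x, y, width, height]` in pixels) and computes simple stand-ins for several of the metrics above. The function name `dataset_metrics` is hypothetical. It can only flag images with no annotations at all; finding unlabeled objects inside otherwise annotated images needs visual review or a model. Pixel count per image is used as a simple proxy for resolution.

```python
from collections import Counter


def dataset_metrics(coco: dict) -> dict:
    """Compute basic quality metrics from a COCO-style annotation dict (assumed format)."""
    images = coco.get("images", [])
    annotations = coco.get("annotations", [])
    categories = {c["id"]: c["name"] for c in coco.get("categories", [])}

    # Labels distribution: number of annotations per category name.
    label_counts = Counter(
        categories.get(a["category_id"], "unknown") for a in annotations
    )

    # Bounding box areas: COCO boxes are [x, y, width, height], so area = w * h.
    bbox_areas = [a["bbox"][2] * a["bbox"][3] for a in annotations]

    # Missing annotations: images that no annotation refers to.
    annotated_ids = {a["image_id"] for a in annotations}
    missing = sorted(img["id"] for img in images if img["id"] not in annotated_ids)

    # Resolution proxy: megapixels per image, from the stored width and height.
    megapixels = {img["id"]: img["width"] * img["height"] / 1e6 for img in images}

    return {
        "image_count": len(images),
        "annotation_count": len(annotations),
        "label_counts": dict(label_counts),
        "bbox_areas": bbox_areas,
        "images_missing_annotations": missing,
        "megapixels": megapixels,
    }


# Example with two images, one of which has no annotations.
coco = {
    "images": [
        {"id": 1, "width": 640, "height": 480},
        {"id": 2, "width": 1920, "height": 1080},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 50]},
        {"id": 11, "image_id": 1, "category_id": 2, "bbox": [0, 0, 30, 30]},
    ],
    "categories": [{"id": 1, "name": "car"}, {"id": 2, "name": "person"}],
}
m = dataset_metrics(coco)
print(m["image_count"], m["annotation_count"])  # 2 2
print(m["images_missing_annotations"])  # [2]
```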

Interpreting Class Imbalance

  • Overrepresented: Classes with a disproportionately large number of annotations compared to others. Overrepresented classes may lead to model bias and reduced generalization performance.

  • Balanced: Classes with a relatively even distribution of annotations, facilitating fair model training and prediction across all classes.

  • Underrepresented: Classes with a disproportionately small number of annotations compared to others. Underrepresented classes may result in insufficient training data and lower model accuracy for those classes.
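The page doesn't state the thresholds it uses to place a class in each group. The sketch below assumes one simple rule: compare each class's annotation count with the mean count per class, treating classes below 0.5× the mean as underrepresented and above 2× the mean as overrepresented. The function name `classify_imbalance` and both cutoffs are hypothetical; tune them to your own data.

```python
def classify_imbalance(
    label_counts: dict[str, int], low: float = 0.5, high: float = 2.0
) -> dict[str, str]:
    """Label each class by its annotation count relative to the per-class mean.

    Hypothetical rule: ratio < low -> underrepresented,
    ratio > high -> overrepresented, otherwise balanced.
    """
    if not label_counts:
        return {}
    mean = sum(label_counts.values()) / len(label_counts)
    result = {}
    for name, count in label_counts.items():
        ratio = count / mean
        if ratio < low:
            result[name] = "underrepresented"
        elif ratio > high:
            result[name] = "overrepresented"
        else:
            result[name] = "balanced"
    return result


counts = {"car": 900, "person": 300, "bicycle": 60}
print(classify_imbalance(counts))
# {'car': 'overrepresented', 'person': 'balanced', 'bicycle': 'underrepresented'}
```

Comparing against the mean is only one choice; comparing against the median or the largest class gives different groupings when a single class dominates the dataset.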

Optimizing Dataset Quality

Use the insights gained from dataset quality checks to optimize your dataset for improved model performance:

  • Address Class Imbalance: Mitigate class imbalance by collecting additional data for underrepresented classes or applying data augmentation techniques.

  • Enhance Annotation Detail: Ensure comprehensive annotation coverage across all relevant object classes and image regions.

  • Augment Dataset: Add diverse, representative images to the dataset to improve model robustness and generalization.

  • Review Missing Annotations: Identify and address missing annotations to ensure dataset completeness and effectiveness for model training.
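This page doesn't specify which augmentation techniques are available. As one common example, the sketch below shows horizontal flipping, which could be applied to images containing underrepresented classes to add extra training samples. It assumes an image stored as a list of pixel rows and COCO-style `[x, y, width, height]` boxes; the function name `hflip_sample` is hypothetical. When the image is mirrored, each box's new left edge is `image_width - x - width`.

```python
def hflip_sample(image: list[list[int]], boxes: list[list[float]]):
    """Mirror an image (list of pixel rows) and its [x, y, w, h] boxes left to right."""
    width = len(image[0]) if image else 0
    # Reverse each row so column c moves to column width - 1 - c.
    flipped_image = [row[::-1] for row in image]
    # A box spanning [x, x + w) moves to [width - x - w, width - x).
    flipped_boxes = [[width - x - w, y, w, h] for x, y, w, h in boxes]
    return flipped_image, flipped_boxes


# A 2x4 "image" with one box covering the leftmost column.
img = [[1, 2, 3, 4], [5, 6, 7, 8]]
new_img, new_boxes = hflip_sample(img, [[0, 0, 1, 2]])
print(new_img)    # [[4, 3, 2, 1], [8, 7, 6, 5]]
print(new_boxes)  # [[3, 0, 1, 2]]
```

Flipping isn't appropriate for every class: objects whose meaning depends on orientation, such as text or left/right-specific signs, can become mislabeled when mirrored.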

Get Started

Review the Dataset Quality Checks page regularly to monitor and improve the quality of your annotated dataset. A more complete, better-balanced dataset generally leads to better model performance in your computer vision tasks. If you have questions or need help interpreting dataset metrics, our support team is available to guide you. We’re committed to helping you build high-performance computer vision models through effective dataset management and optimization.