Training a Yolo object segmentation model for your needs

Ultralytics YoloV8 is one of the easiest paths, but there is still a lot of ground to cover!

Rafael
Jan 4, 2024

TL;DR: all source code is available on this GitHub gist.

Introduction

The Ultralytics team put in an incredible effort to make creating custom YOLO models really easy. However, dealing with large datasets is still painful. Training a YOLO segmentation model requires the dataset to be in their specific format, which might not be exactly what you get from big datasets. That’s the case if you want to use the huge OpenImagesV7 as your source of images and labels.

In this tutorial we are going to cover how to fetch data (images and segmentation masks) from OpenImagesV7; how to convert it to the YOLO format (the most complex part of this tutorial); and finally a sneak peek at how to train a yolov8-seg model using our dataset.

Environment

To be crystal clear: this tutorial requires Python 3 (tested under 3.10). As the base image I am using the AWS Sagemaker conda_pytorch_p310, which contains PyTorch and a lot of the usual tools like Numpy and OpenCV. Still, we need some extra packages:

sudo yum install -y openssl-devel openssl11-libs libcurl
pip install --upgrade pip setuptools wheel
pip install fiftyone
pip install fiftyone-db-rhel7 --force-reinstall
pip install shapely polars
pip install ultralytics

Dataset

We are going to use the Google Open Images Dataset V7 for training our model. This dataset is huge, with millions of images, and targets a range of Computer Vision tasks such as Object Detection, Classification and Instance Segmentation. Each image is paired with labels for those tasks, covering a multitude of the most common objects: people, faces, dogs, cats, cars, trees, etc. Notice that not all labels are available for all tasks, so it pays off to do some exploration on the dataset website to understand to what extent it covers the model you are trying to build. It is likely to be enough for the majority of Computer Vision problems.

So, to actually fetch the data there is this awesome tool named FiftyOne that is capable of downloading just the data you need. This avoids the burden of downloading and handling the entire dataset. Did I already mention it is huge? It would be really hard to handle without FiftyOne.

The first step is selecting a good location to download the dataset to. I am assuming the defaults for a Sagemaker environment, but pick whatever suits you better. Just keep in mind that a folder named open-images-v7 will be created automatically inside this path to hold the data.

# choose your preferred path to download the dataset
# a folder named open-images-v7 will be created automatically inside of it
dataset_path = '/home/ec2-user/SageMaker/dataset'

import os
import torch
import torchvision

import fiftyone as fo
fo.config.default_ml_backend = "torch"
fo.config.dataset_zoo_dir = dataset_path

The next function is just to make our script cleaner and yet more flexible. We could simply put all the labels we want into the fiftyone downloader function and let it decide how many samples of each label to fetch. However, here I need more control over how many samples of each class to download. Naturally, fiftyone will only download the numbers we ask for if they are actually available; otherwise it will fetch as much as possible.

def download_dataset(split, classes, max_samples=None):
    print(f'>> Split: {split}, classes: {classes}, max_samples: {max_samples}')
    return fo.zoo.load_zoo_dataset(
        "open-images-v7",
        label_types=["segmentations"],
        drop_existing_dataset=False,
        split=split,
        classes=classes,
        max_samples=max_samples,
    )

Before calling the downloader function, let’s first describe the labels we need and how many of each. Notice that if we pass None we get the “as much as possible” behavior without needing to guess a big random number. To keep it simple we are going to fetch samples containing people or cars: a maximum of a thousand person samples and as many car samples as possible. And we want a 70% training, 20% validation and 10% test split; these percentages are not guaranteed, given there might not be enough samples available in every subset.

target_split = {'train': 0.7, 'validation': 0.2, 'test': 0.1}
target_classes = {
    "Person": 1_000,
    "Car": None,
}

Now we can run the downloader function. To this end we are going to iterate through the target classes we defined above.

for cls_name, total in target_classes.items():
    for split_name, split_pct in target_split.items():
        max_samples = int(total * split_pct) if total is not None else None
        download_dataset(split=split_name, classes=[cls_name], max_samples=max_samples)

The script might take a while to complete, even inside the robust AWS Sagemaker environment. Naturally, the time is proportional to the number of samples requested. After downloading the dataset we are still not ready to use it with YOLO. Data comes as color images paired with a set of segmentation masks (grayscale images), one mask for each object in the image. However, YOLO requires each color image to be paired with a text file describing, on each line, the class of an object and the coordinates of its polygon. Please take a look at the YOLO documentation. This leads to our next chapter on how to convert the dataset to a format friendly to YoloV8.
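To make the target format concrete, a YOLO segmentation label file holds one line per object: the integer class id followed by the polygon vertices as x y pairs normalized to [0, 1]. A made-up two-object example (the numbers are purely illustrative):

0 0.5312 0.1204 0.5489 0.1677 0.5021 0.1893
1 0.1021 0.7345 0.1544 0.7788 0.1102 0.8012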

Convert OpenImagesV7 to Yolo Segmentation

The raw segmentation labels are provided as grayscale images. As mentioned before, YOLO requires segmentation labels to be in a text file containing a line for each object in the image, following the pattern: the object class id, then an XY list describing the object polygon.

This conversion requires some tools for loading images, transforming masks into polygons, reducing the polygon complexity and writing the polygons down to text files. So let’s do this step by step, in a new and clean notebook or python script. First, a basic configuration:

# base path must be the same as our previous dataset_path
base_path = '/home/ec2-user/SageMaker/dataset'

# destination of the converted dataset
target_path = '/home/ec2-user/SageMaker/dataset-yolo'

# a list with same keys as on fetch.py
target_classes = [
    "Person",
    "Car",
]

import os
import cv2
import yaml
import shutil
import pandas as pd
import polars as pl
import multiprocessing
import numpy as np
from tqdm import tqdm
from joblib import Parallel, delayed
from shapely.geometry import Polygon
from matplotlib import pyplot as plt

Now we define some functions to do the actual conversion from a grayscale mask image to a simplified polygon, and then to an XY list:

### Mask to Poly ###

def mask_to_polygon(mask_path):
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    contours, _ = cv2.findContours(
        mask,
        cv2.RETR_EXTERNAL,
        cv2.CHAIN_APPROX_SIMPLE
    )

    polygons = []
    for contour in contours:
        polygon = contour.reshape(-1, 2)
        if len(polygon) < 3:
            # degenerate contour (a point or a line) cannot form a polygon
            continue
        # normalize coordinates to [0, 1]
        polygon_norm = polygon.astype(float)
        polygon_norm[:, 0] /= mask.shape[1]  # X
        polygon_norm[:, 1] /= mask.shape[0]  # Y
        polygon_norm = np.round(polygon_norm, 4)

        polygon_shapely = Polygon(polygon_norm)
        polygon_simplified = polygon_shapely.simplify(0.002, preserve_topology=True)
        polygons.append(polygon_simplified)

    return polygons

def polygon_to_yolo(polygon):
    x, y = polygon.exterior.coords.xy
    xy = []
    for xx, yy in zip(x, y):
        xy.append(xx)
        xy.append(yy)
    return xy

def polygon_to_mask(polygon, shape):
    mk = np.zeros(shape, dtype=np.uint8)
    x, y = polygon.exterior.coords.xy
    xy = [
        [int(xx * shape[1]), int(yy * shape[0])]
        for xx, yy in zip(x, y)
    ]
    # fillPoly handles concave polygons; fillConvexPoly would corrupt them
    cv2.fillPoly(mk, [np.array(xy, dtype='int32')], color=255)
    return mk
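Before converting thousands of masks, it is worth sanity-checking the round trip on a single file. A minimal sketch, assuming you point mask_file (a placeholder name) at any mask that was downloaded under labels/masks/:

# round-trip check: mask -> polygon -> mask
# NOTE: mask_file is a placeholder; point it at a real mask from your download
mask_file = os.path.join(base_path, 'train/labels/masks/0', 'replace-with-a-real-mask.png')

polys = mask_to_polygon(mask_file)
print(f'{len(polys)} polygon(s), first one has {len(polys[0].exterior.coords)} points')

original = cv2.imread(mask_file, cv2.IMREAD_GRAYSCALE)
rebuilt = polygon_to_mask(polys[0], original.shape)

plt.subplot(1, 2, 1); plt.imshow(original, cmap='gray'); plt.title('original')
plt.subplot(1, 2, 2); plt.imshow(rebuilt, cmap='gray'); plt.title('rebuilt')
plt.show()

The rebuilt mask will look slightly coarser than the original because of the simplify(0.002) step, which is exactly the complexity reduction we want.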

Having the image processing tools in place, the next step is to load the label locations. This just iterates over the dataset filesystem looking for image files, and for each image it looks into the segmentation index and fetches the corresponding mask image location. No image is actually opened at this point. Fun fact: polars was employed here instead of pandas due to the huge difference in processing speed for this simple task.

### loading openimagesv7 labels ###

class_list_filepath = os.path.join(base_path, 'train/metadata/classes.csv')
class_df = pd.read_csv(class_list_filepath, header=None, names=['URI', 'ClassName'])
class_map_r = dict(zip(class_df.URI, class_df.ClassName))
class_map_r = {k: v for k, v in class_map_r.items() if v in target_classes}

# convert from openimagev7 label hash to an integer
class_map = { k: i for i, k in enumerate(list(class_map_r.keys()))}
# class_map = {
# '/m/01g317': 0, # 'Person'
# '/m/0k4j': 1, # 'Car'
# }
print('class_map:')
print(class_map)

def get_image_file_names(directory):
    image_extensions = ['.jpg', '.jpeg', '.png', '.gif', '.bmp']  # add more extensions if needed
    image_file_names = set()

    for filename in os.listdir(directory):
        nm, ext = os.path.splitext(filename)
        if ext in image_extensions:
            image_file_names.add(nm)

    return image_file_names

def load_labels(split_name):
    df = pl.read_csv(os.path.join(base_path, split_name, 'labels/segmentations.csv'))
    df = df[['MaskPath', 'ImageID', 'LabelName']]

    # keep only the rows whose image was actually downloaded
    image_ids = get_image_file_names(os.path.join(base_path, split_name, 'data'))
    df = df.filter(pl.col('ImageID').is_in(image_ids))

    # keep only the classes we care about
    target_ids = set(class_map.keys())
    df = df.filter(pl.col('LabelName').is_in(target_ids))

    # masks live in subfolders named after the first character of the file name
    df = df.with_columns(pl.col('MaskPath').map_elements(lambda x: x[0].upper()).alias('Subdir'))
    df = df.with_columns((base_path + f'/{split_name}/labels/masks/' + pl.col('Subdir') + '/' + pl.col('MaskPath')).alias('MaskFullPath'))

    # translate the label hash into our integer class id
    df = df.with_columns(pl.col(['LabelName']).map_dict(class_map).alias('LabelID'))

    return df


train_df = load_labels('train')
valid_df = load_labels('validation')
test_df = load_labels('test')
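At this point it is worth peeking at what was loaded. A quick sketch (group_by(...).count() is standard polars, though the exact API may vary with your version):

# how many object rows per class in each split
for name, df in [('train', train_df), ('validation', valid_df), ('test', test_df)]:
    print(name, df.shape)
    print(df.group_by('LabelID').count())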

Finally, it is time to iterate over all the mask images, convert each one to an XY polygon list and write the results to text files following the YOLO standard.

def macro_mask2yolopoly(p):
    # on any failure (unreadable mask, degenerate polygon), return an empty list
    try:
        poly = mask_to_polygon(p)
        xy = polygon_to_yolo(poly[0])
        return xy
    except Exception:
        return []


def conv_mask_xy(df):
    return df.with_columns(
        pl.col('MaskFullPath').map_elements(
            lambda p: macro_mask2yolopoly(p)
        ).alias('XY')
    )

train_df = conv_mask_xy(train_df)
valid_df = conv_mask_xy(valid_df)
test_df = conv_mask_xy(test_df)
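The map_elements call above runs single-threaded, and each mask is independent of the others, so this step parallelizes trivially. A sketch using the joblib and multiprocessing imports from earlier; conv_mask_xy_parallel is our own helper, not a polars feature:

# parallel alternative to conv_mask_xy: one joblib worker per CPU core
def conv_mask_xy_parallel(df, n_jobs=multiprocessing.cpu_count()):
    paths = df.get_column('MaskFullPath').to_list()
    polys = Parallel(n_jobs=n_jobs)(
        delayed(macro_mask2yolopoly)(p) for p in tqdm(paths)
    )
    return df.with_columns(pl.Series('XY', polys))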

def write_yolo_labels(df, subset, persistence=True):
    # drop the rows whose mask failed to convert
    df = df.filter(pl.col('XY').map_elements(len) > 0)

    # stringify the coordinate list, then prepend the class id
    df = df.with_columns(
        pl.col('XY').map_elements(lambda xy: xy.map_elements(lambda e: str(e))).list.join(' ').alias('TXY'))
    df = df.with_columns(
        (pl.col('LabelID').cast(pl.Utf8) + ' ' + pl.col('TXY')).alias('Sample'))

    # one text file per image, one line per object
    g = df.group_by('ImageID').agg(['Sample'])
    g = g.with_columns(pl.col('Sample').list.join('\n').alias('StrSamples'))
    g = g.with_columns((target_path + '/' + subset + '/' + pl.col('ImageID') + '.txt').alias('Path'))

    if persistence:
        os.makedirs(os.path.join(target_path, subset), exist_ok=True)
        for row in g.iter_rows(named=True):
            with open(row['Path'], 'w') as f:
                f.write(row['StrSamples'])

    return g

train_df = write_yolo_labels(train_df, 'train')
valid_df = write_yolo_labels(valid_df, 'validation')
test_df = write_yolo_labels(test_df, 'test')
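A quick way to confirm everything worked is reading one of the generated label files back:

# peek at the first generated label file
sample_path = train_df['Path'][0]
print(sample_path)
with open(sample_path) as f:
    print(f.read()[:300])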

Well, our new dataset ended up with images and YOLO labels in different locations. To unify them we need to copy the downloaded images to the target location, next to our labels. There are a few possibilities here, like creating the labels inside the openimagesv7 folder, moving images around, etc. I prefer to copy and keep the original in case the script fails; later I can simply remove the entire openimages cache to free up some space.

def copy_data(df, subset):
    for iid in df.select(pl.col('ImageID')).get_columns()[0].to_list():
        try:
            fnm = f"{iid}.jpg"
            src = os.path.join(base_path, subset, "data", fnm)
            dst = os.path.join(target_path, subset)
            # print(f'{src} -> {dst}')
            shutil.copy2(src, dst)
        except OSError:
            # skip images missing from the download
            continue

copy_data(valid_df, 'validation')
copy_data(test_df, 'test')
copy_data(train_df, 'train')
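As a final check on the copy step, the number of images and label files in each split should closely match:

for subset in ['train', 'validation', 'test']:
    files = os.listdir(os.path.join(target_path, subset))
    n_img = sum(f.endswith('.jpg') for f in files)
    n_lbl = sum(f.endswith('.txt') for f in files)
    print(f'{subset}: {n_img} images, {n_lbl} labels')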

A Yolo dataset is only complete with a YAML file describing it. It is easy to create one, just like this:

from pathlib import Path

yaml_content = f'''
path: /home/ec2-user/SageMaker/dataset-yolo
train: train
val: validation
test: test

# Classes - use the class_map as guide
names:
  0: person
  1: car
'''

with Path(os.path.join(target_path, 'seg_dataset.yaml')).open('w') as f:
f.write(yaml_content)
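Since we already imported yaml at the top of the script, a one-liner confirms the file parses cleanly:

# quick check that the YAML actually parses
with open(os.path.join(target_path, 'seg_dataset.yaml')) as f:
    print(yaml.safe_load(f))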

Training the YoloV8-Seg model

The hardest part of this tutorial was getting the dataset into good shape for YOLO. I will keep this section short, given there are better tutorials out there on how to train a YOLO model. Please also take a look at the Ultralytics documentation. But just so you cannot complain this is a half-baked tutorial, here we go:

# on the command line (CLI)
yolo segment train data=/home/ec2-user/SageMaker/dataset-yolo/seg_dataset.yaml model=yolov8n-seg.pt epochs=100 imgsz=640

This will use the pre-trained yolov8n-seg.pt as the base for our custom segmentation model and fine-tune it for 100 epochs. The training outputs are likely to land under ./runs/segment/ . Take a look into the docs if that is not the case.
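If you prefer to stay in Python, the equivalent via the Ultralytics API looks like this:

from ultralytics import YOLO

model = YOLO('yolov8n-seg.pt')  # pre-trained segmentation weights as the base
model.train(
    data='/home/ec2-user/SageMaker/dataset-yolo/seg_dataset.yaml',
    epochs=100,
    imgsz=640,
)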
