
Mosaic

Targets:
image
mask
bboxes
keypoints
Image Types: uint8, float32

Combine multiple images and annotations into one image via a mosaic grid. Uses metadata for additional images; common in object detection training.

Mosaic creates a grid of images by placing the primary image and additional images from metadata into cells of a larger canvas, then crops a region to produce the final output. This is commonly used in object detection training to increase data diversity and help models learn to detect objects at different scales and contexts.

The transform takes a primary input image (and its annotations) and combines it with additional images/annotations provided via metadata. It calculates the geometry for a mosaic grid, selects additional items, preprocesses annotations consistently (handling label encoding updates), applies geometric transformations, and assembles the final output.
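The grid geometry described above can be sketched in plain NumPy. This is an illustrative toy, not Albumentations' internal implementation: cell fitting, annotation handling, and crop-center sampling are all simplified, and the function name is hypothetical.

```python
import numpy as np

def mosaic_sketch(images, grid_yx=(2, 2), cell_shape=(512, 512),
                  target_size=(512, 512), center_frac=(0.5, 0.5)):
    """Toy mosaic: tile images onto a large canvas, then crop a window.

    Simplification: no "cover"/"contain" scaling -- images are just
    clipped to their cell. The real transform resizes per fit_mode.
    """
    gy, gx = grid_yx
    ch, cw = cell_shape
    canvas = np.zeros((gy * ch, gx * cw, 3), dtype=images[0].dtype)
    for idx, img in enumerate(images[: gy * gx]):
        r, c = divmod(idx, gx)
        h, w = min(ch, img.shape[0]), min(cw, img.shape[1])
        canvas[r * ch : r * ch + h, c * cw : c * cw + w] = img[:h, :w]
    th, tw = target_size
    # Map the fractional center to canvas pixels, then clamp so the
    # target window stays fully inside the canvas.
    cy = int(center_frac[0] * gy * ch)
    cx = int(center_frac[1] * gx * cw)
    y0 = np.clip(cy - th // 2, 0, gy * ch - th)
    x0 = np.clip(cx - tw // 2, 0, gx * cw - tw)
    return canvas[y0 : y0 + th, x0 : x0 + tw]

imgs = [np.full((100, 100, 3), v, dtype=np.uint8) for v in (50, 100, 150, 200)]
out = mosaic_sketch(imgs, grid_yx=(2, 2), cell_shape=(120, 120),
                    target_size=(200, 200))
print(out.shape)  # (200, 200, 3)
```

With a centered crop over a 2x2 grid of 120x120 cells, the 200x200 output contains a piece of every cell, which is exactly why mosaic exposes objects at varied scales and positions.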

Arguments
grid_yx
tuple[int, int]
[2,2]

The number of rows (y) and columns (x) in the mosaic grid. Determines the maximum number of images involved (grid_yx[0] * grid_yx[1]). Default: (2, 2).

target_size
tuple[int, int]
[512,512]

The desired output (height, width) for the final mosaic image after cropping the mosaic grid. Default: (512, 512).

cell_shape
tuple[int, int]
[512,512]

The (height, width) of each cell in the mosaic grid. Default: (512, 512).

fit_mode
cover | contain
cover

How to fit images into mosaic cells.

  • "cover": Scale the image to fill the entire cell, potentially cropping parts.
  • "contain": Scale the image to fit entirely within the cell, potentially adding padding.

Default: "cover".
metadata_key
str
mosaic_metadata

Key in the input dictionary specifying the list of additional data dictionaries for the mosaic. Each dictionary in the list should represent one potential additional item. Expected keys: 'image' (required, np.ndarray), and optionally 'mask' (np.ndarray), 'masks' (np.ndarray, stacked instance masks), 'bboxes' (np.ndarray), 'keypoints' (np.ndarray), and label fields supplied via the bbox_labels and keypoint_labels wrapper dicts (see Metadata Format below). Default: "mosaic_metadata".
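A minimal metadata list following the format above might look like this. Only 'image' is required per item; the other keys and the label field names ('bbox_classes') are placeholders for illustration.

```python
import numpy as np

# Each dict describes one potential additional mosaic item.
mosaic_metadata = [
    # Bare minimum: just an image.
    {'image': np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)},
    # Fuller item: mask, bboxes, and bbox labels wrapped in a dict
    # mapping label-field name -> list of values.
    {
        'image': np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8),
        'mask': np.zeros((64, 64), dtype=np.uint8),
        'bboxes': np.array([[4, 4, 32, 32]], dtype=np.float32),
        'bbox_labels': {'bbox_classes': [0]},
    },
]

for item in mosaic_metadata:
    assert 'image' in item  # 'image' is the only required key
```

The list is then passed to the transform call under the configured metadata key (by default, `mosaic_metadata=...`), as shown in the Examples section.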

center_range
tuple[float, float]
[0.3,0.7]

Range [0.0-1.0] to sample the center point of the mosaic view relative to the valid central region of the conceptual large grid. This affects which parts of the assembled grid are visible in the final crop. Default: (0.3, 0.7).

interpolation
0 | 6 | 1 | 2 | 3 | 4 | 5
1

OpenCV interpolation flag used for resizing images during geometric processing. Default: cv2.INTER_LINEAR.

mask_interpolation
0 | 6 | 1 | 2 | 3 | 4 | 5
0

OpenCV interpolation flag used for resizing masks during geometric processing. Default: cv2.INTER_NEAREST.

fill
tuple[float, ...] | float
0

Value used for padding images if needed during geometric processing. Default: 0.

fill_mask
tuple[float, ...] | float
0

Value used for padding masks if needed during geometric processing. Default: 0.

p
float
0.5

Probability of applying the transform. Default: 0.5.

Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare primary data
>>> primary_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> primary_mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> primary_bboxes = np.array([[10, 10, 40, 40], [50, 50, 90, 90]], dtype=np.float32)
>>> primary_bbox_classes = [1, 2]
>>> primary_keypoints = np.array([[25, 25], [75, 75]], dtype=np.float32)
>>> primary_keypoint_classes = ['eye', 'nose']
>>>
>>> # Prepare additional images for mosaic.
>>> # bbox_labels and keypoint_labels are dicts mapping field name -> list of values.
>>> mosaic_metadata = [
...     {
...         'image': np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8),
...         'mask': np.random.randint(0, 2, (100, 100), dtype=np.uint8),
...         'bboxes': np.array([[20, 20, 60, 60]], dtype=np.float32),
...         'bbox_labels': {'bbox_classes': [3]},
...         'keypoints': np.array([[40, 40]], dtype=np.float32),
...         'keypoint_labels': {'keypoint_classes': ['mouth']},
...     },
...     {
...         'image': np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8),
...         'mask': np.random.randint(0, 2, (100, 100), dtype=np.uint8),
...         'bboxes': np.array([[30, 30, 70, 70]], dtype=np.float32),
...         'bbox_labels': {'bbox_classes': [4]},
...         'keypoints': np.array([[50, 50], [65, 65]], dtype=np.float32),
...         'keypoint_labels': {'keypoint_classes': ['eye', 'eye']},
...     },
... ]
>>>
>>> transform = A.Compose([
...     A.Mosaic(
...         grid_yx=(2, 2),
...         target_size=(200, 200),
...         cell_shape=(120, 120),
...         center_range=(0.4, 0.6),
...         fit_mode="cover",
...         p=1.0
...     ),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_classes']),
...    keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_classes']))
>>>
>>> transformed = transform(
...     image=primary_image,
...     mask=primary_mask,
...     bboxes=primary_bboxes,
...     bbox_classes=primary_bbox_classes,
...     keypoints=primary_keypoints,
...     keypoint_classes=primary_keypoint_classes,
...     mosaic_metadata=mosaic_metadata,
... )
>>>
>>> mosaic_image = transformed['image']
>>> mosaic_mask = transformed['mask']
>>> mosaic_bboxes = transformed['bboxes']
>>> mosaic_bbox_classes = transformed['bbox_classes']
>>> mosaic_keypoints = transformed['keypoints']
>>> mosaic_keypoint_classes = transformed['keypoint_classes']
Notes

If fewer additional images are provided than needed to fill the grid, the primary image is replicated to fill the remaining cells. For example, with a 2x2 grid and only one additional image, the mosaic will contain the primary image in two cells and the additional image in one cell, with one visible cell selected from these three. Stacked instance masks passed via the masks key (shape (N, H, W)) are transformed through apply_to_masks like other DualTransform targets; the _targets attribute lists only Targets enum values and has no separate Targets.MASKS entry.
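The replication rule in this note can be sketched as follows. This illustrates only the padding behavior; the library's actual item selection and visible-cell sampling differ, and the function name is hypothetical.

```python
import numpy as np

def fill_grid_items(primary, extras, grid_yx=(2, 2)):
    """Pad the item list for the grid: cells beyond the primary's are
    filled with extras first, then with copies of the primary item."""
    needed = grid_yx[0] * grid_yx[1] - 1   # cells besides the primary's
    extras = list(extras)[:needed]
    return [primary] + extras + [primary] * (needed - len(extras))

primary = {'image': np.zeros((32, 32, 3), dtype=np.uint8)}
extra = {'image': np.ones((32, 32, 3), dtype=np.uint8)}

# One extra item for a 2x2 grid: the primary fills the two leftover cells.
items = fill_grid_items(primary, [extra])
print(len(items))  # 4
```

Here the primary item ends up in three of the four conceptual cells, matching the 2x2 example in the note above.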