RandomSizedBBoxSafeCrop

Targets: image, mask, bboxes, keypoints, volume, mask3d

Image Types: uint8, float32

Crop a random part of the input and rescale it to a specific size without loss of bounding boxes.

This transform first attempts to crop a random portion of the input image while ensuring that all bounding boxes remain within the cropped area. It then resizes the crop to the specified size. This is particularly useful for object detection tasks where preserving all objects in the image is crucial while also standardizing the image size.
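The crop-selection idea described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name `sample_bbox_safe_crop` is hypothetical, boxes are assumed to be in pascal_voc pixel format, and both the no-box fallback and the final resize step are left out.

```python
import math
import random


def sample_bbox_safe_crop(image_h, image_w, bboxes, rng):
    """Sample a crop (x_min, y_min, x_max, y_max) that contains every box.

    Hypothetical helper for illustration only. Each box is
    (x_min, y_min, x_max, y_max) in pixels and the list must be non-empty.
    """
    # Smallest rectangle that encloses all boxes.
    ux1 = min(b[0] for b in bboxes)
    uy1 = min(b[1] for b in bboxes)
    ux2 = max(b[2] for b in bboxes)
    uy2 = max(b[3] for b in bboxes)

    # The crop may start anywhere between the image border and the union,
    # and end anywhere between the union and the opposite border.
    x_min = math.floor(rng.uniform(0, ux1))
    y_min = math.floor(rng.uniform(0, uy1))
    x_max = math.ceil(rng.uniform(ux2, image_w))
    y_max = math.ceil(rng.uniform(uy2, image_h))
    return x_min, y_min, x_max, y_max


boxes = [(10, 10, 80, 80), (100, 100, 200, 200), (210, 210, 290, 290)]
crop = sample_bbox_safe_crop(300, 300, boxes, random.Random(0))
print(crop)  # always encloses the region from (10, 10) to (290, 290)
```

When the boxes are spread across the whole image, as in this sample data, the union leaves little room, so the sampled crop is close to the full image.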

Arguments

height

int

Height of the output image after resizing.

width

int

Width of the output image after resizing.

erosion_rate

float

A value between 0.0 and 1.0 that determines the minimum allowable size of the crop as a fraction of the original image size. For example, an erosion_rate of 0.2 means the crop will be at least 80% of the original image height and width. Default: 0.0 (no minimum size).

interpolation

0 | 6 | 1 | 2 | 3 | 4 | 5

Flag specifying the interpolation algorithm for the image. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
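The integers in the type signature are the numeric values of OpenCV's interpolation flags. The lookup below is a small convenience sketch for reading them; the dictionary and the `describe_interpolation` helper are hypothetical and not part of Albumentations or OpenCV. In real code, pass the `cv2` constants themselves.

```python
# Numeric values of OpenCV's interpolation flags, written out by hand so the
# snippet runs without importing cv2.
INTERPOLATION_FLAGS = {
    0: "cv2.INTER_NEAREST",
    1: "cv2.INTER_LINEAR",
    2: "cv2.INTER_CUBIC",
    3: "cv2.INTER_AREA",
    4: "cv2.INTER_LANCZOS4",
    5: "cv2.INTER_LINEAR_EXACT",
    6: "cv2.INTER_NEAREST_EXACT",
}


def describe_interpolation(flag):
    """Return the cv2 constant name for an integer interpolation flag."""
    try:
        return INTERPOLATION_FLAGS[flag]
    except KeyError:
        raise ValueError(f"unsupported interpolation flag: {flag}") from None


print(describe_interpolation(1))  # cv2.INTER_LINEAR
```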

mask_interpolation

0 | 6 | 1 | 2 | 3 | 4 | 5

Flag specifying the interpolation algorithm for the mask. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST.
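A likely reason for the nearest-neighbor default is that masks hold class ids, and averaging neighboring ids produces values that are not valid labels. The one-dimensional sketch below illustrates this with two simplified resamplers written for this example; they are assumptions for illustration and do not reproduce OpenCV's exact sampling.

```python
def resize_nearest(labels, out_len):
    """Upsample a 1-D label row by copying the nearest source pixel."""
    n = len(labels)
    return [labels[min(n - 1, int((i + 0.5) * n / out_len))] for i in range(out_len)]


def resize_linear(labels, out_len):
    """Upsample a 1-D label row by blending the two nearest source pixels."""
    n = len(labels)
    out = []
    for i in range(out_len):
        # Map the output pixel centre to a source position, clamped to the row.
        pos = min(max((i + 0.5) * n / out_len - 0.5, 0.0), n - 1.0)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        t = pos - lo
        out.append((1 - t) * labels[lo] + t * labels[hi])
    return out


row = [0, 0, 2, 2]  # class ids: 0 = background, 2 = an object class
nearest = resize_nearest(row, 8)
linear = resize_linear(row, 8)

print(nearest)  # only the original ids 0 and 2 appear
print(linear)   # contains in-between values near the class boundary
```

The blended row contains values such as 0.5 and 1.5 at the boundary, which correspond to no class, while the nearest-neighbor row keeps only the original ids.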

p

float

Probability of applying the transform. Default: 1.0.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare sample data
>>> image = np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (300, 300), dtype=np.uint8)
>>>
>>> # Create bounding boxes with some overlap and separation
>>> bboxes = np.array([
...     [10, 10, 80, 80],    # top-left box
...     [100, 100, 200, 200], # center box
...     [210, 210, 290, 290]  # bottom-right box
... ], dtype=np.float32)
>>> bbox_labels = ['cat', 'dog', 'bird']
>>>
>>> # Create keypoints inside the bounding boxes
>>> keypoints = np.array([
...     [45, 45],    # inside first box
...     [150, 150],  # inside second box
...     [250, 250]   # inside third box
... ], dtype=np.float32)
>>> keypoint_labels = ['nose', 'eye', 'tail']
>>>
>>> # Example 1: Basic usage with default parameters
>>> transform_basic = A.Compose([
...     A.RandomSizedBBoxSafeCrop(height=224, width=224, p=1.0),
... ], bbox_params=A.BboxParams(
...     format='pascal_voc',
...     label_fields=['bbox_labels']
... ), keypoint_params=A.KeypointParams(
...     format='xy',
...     label_fields=['keypoint_labels']
... ))
>>>
>>> # Apply the transform
>>> result_basic = transform_basic(
...     image=image,
...     mask=mask,
...     bboxes=bboxes,
...     bbox_labels=bbox_labels,
...     keypoints=keypoints,
...     keypoint_labels=keypoint_labels
... )
>>>
>>> # Access the transformed data
>>> transformed_image = result_basic['image']  # Shape will be (224, 224, 3)
>>> transformed_mask = result_basic['mask']    # Shape will be (224, 224)
>>> transformed_bboxes = result_basic['bboxes']  # All original bounding boxes preserved
>>> transformed_bbox_labels = result_basic['bbox_labels']  # Original labels preserved
>>> transformed_keypoints = result_basic['keypoints']  # Keypoints adjusted to new coordinates
>>> transformed_keypoint_labels = result_basic['keypoint_labels']  # Original labels preserved
>>>
>>> # Example 2: With erosion_rate for more flexibility in crop placement
>>> transform_erosion = A.Compose([
...     A.RandomSizedBBoxSafeCrop(
...         height=256,
...         width=256,
...         erosion_rate=0.2,  # Allows 20% flexibility in crop placement
...         interpolation=cv2.INTER_CUBIC,  # Higher quality interpolation
...         mask_interpolation=cv2.INTER_NEAREST,  # Preserve mask edges
...         p=1.0
...     ),
... ], bbox_params=A.BboxParams(
...     format='pascal_voc',
...     label_fields=['bbox_labels'],
...     min_visibility=0.3  # Only keep bboxes with at least 30% visibility
... ), keypoint_params=A.KeypointParams(
...     format='xy',
...     label_fields=['keypoint_labels'],
...     remove_invisible=True  # Remove keypoints outside the crop
... ))
>>>
>>> # Apply the transform with erosion
>>> result_erosion = transform_erosion(
...     image=image,
...     mask=mask,
...     bboxes=bboxes,
...     bbox_labels=bbox_labels,
...     keypoints=keypoints,
...     keypoint_labels=keypoint_labels
... )
>>>
>>> # With erosion_rate=0.2, the crop has more flexibility in placement
>>> # while still ensuring all bounding boxes are included

Notes

This transform ensures that all bounding boxes in the original image are fully contained within the cropped area. If it's not possible to find such a crop (e.g., when bounding boxes are too spread out), it will default to cropping the entire image.
After cropping, the result is resized to the specified (height, width) size.
Bounding box coordinates are adjusted to match the new image size.
Keypoints are moved along with the crop and scaled to the new image size.
If there are no bounding boxes in the image, it will fall back to a random crop.
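The coordinate adjustment described in these notes amounts to a shift by the crop origin followed by a scale to the output size. The sketch below shows that arithmetic for pascal_voc boxes and xy keypoints; the function names are hypothetical and the crop rectangle is chosen by hand, so this illustrates the geometry rather than the library's code.

```python
def crop_and_resize_bbox(bbox, crop, out_h, out_w):
    """Map a pascal_voc box from original pixels to crop-and-resize pixels."""
    cx1, cy1, cx2, cy2 = crop
    crop_w, crop_h = cx2 - cx1, cy2 - cy1
    x1, y1, x2, y2 = bbox
    return (
        (x1 - cx1) * out_w / crop_w,
        (y1 - cy1) * out_h / crop_h,
        (x2 - cx1) * out_w / crop_w,
        (y2 - cy1) * out_h / crop_h,
    )


def crop_and_resize_keypoint(point, crop, out_h, out_w):
    """Map an (x, y) keypoint the same way: shift by the crop origin, then scale."""
    cx1, cy1, cx2, cy2 = crop
    x, y = point
    return (
        (x - cx1) * out_w / (cx2 - cx1),
        (y - cy1) * out_h / (cy2 - cy1),
    )


# Hand-picked crop that encloses all three sample boxes: 280 x 280 pixels.
crop = (10, 10, 290, 290)
center_box = crop_and_resize_bbox((100, 100, 200, 200), crop, 224, 224)
center_point = crop_and_resize_keypoint((150, 150), crop, 224, 224)

print(center_box)    # scale factor is 224 / 280 = 0.8 on both axes
print(center_point)
```

With this crop the center box maps to roughly (72, 72, 152, 152) and its keypoint to roughly (112, 112), so the keypoint stays at the center of its box after the transform.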