← Back to all transforms

RandomSizedBBoxSafeCrop

Description

Crop a random part of the input and rescale it to a specific size without loss of bounding boxes.

    This transform first attempts to crop a random portion of the input image while ensuring that all bounding boxes
    remain within the cropped area. It then resizes the crop to the specified size. This is particularly useful for
    object detection tasks where preserving all objects in the image is crucial while also standardizing the image size.

    Args:
        height (int): Height of the output image after resizing.
        width (int): Width of the output image after resizing.
        erosion_rate (float): A value between 0.0 and 1.0 that determines the minimum allowable size of the crop
            as a fraction of the original image size. For example, an erosion_rate of 0.2 means the crop will be
            at least 80% of the original image height and width. Default: 0.0 (no minimum size).
        interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        mask_interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm for mask.
            Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_NEAREST.
        p (float): Probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        - This transform ensures that all bounding boxes in the original image are fully contained within the
          cropped area. If it's not possible to find such a crop (e.g., when bounding boxes are too spread out),
          it will default to cropping the entire image.
        - After cropping, the result is resized to the specified (height, width) size.
        - Bounding box coordinates are adjusted to match the new image size.
        - Keypoints are moved along with the crop and scaled to the new image size.
        - If there are no bounding boxes in the image, it will fall back to a random crop.

    Mathematical Details:
        1. A crop region is selected that includes all bounding boxes.
        2. The crop size is determined by the erosion_rate:
           min_crop_size = (1 - erosion_rate) * original_size
        3. If the selected crop is smaller than min_crop_size, it's expanded to meet this requirement.
        4. The crop is then resized to the specified (height, width) size.
        5. Bounding box coordinates are transformed to match the new image size:
           new_coord = (old_coord - crop_start) * (new_size / crop_size)

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8)
        >>> bboxes = [(10, 10, 50, 50), (100, 100, 150, 150)]
        >>> transform = A.Compose([
        ...     A.RandomSizedBBoxSafeCrop(height=224, width=224, erosion_rate=0.2, p=1.0),
        ... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))
        >>> transformed = transform(image=image, bboxes=bboxes, labels=['cat', 'dog'])
        >>> transformed_image = transformed['image']
        >>> transformed_bboxes = transformed['bboxes']
        # transformed_image will be a 224x224 image containing all original bounding boxes,
        # with their coordinates adjusted to the new image size.
    

Parameters

  • height: int (default: null)
  • width: int (default: null)
  • erosion_rate: float (default: 0)
  • interpolation: Literal['cv2.INTER_NEAREST', 'cv2.INTER_LINEAR', 'cv2.INTER_CUBIC', 'cv2.INTER_AREA', 'cv2.INTER_LANCZOS4', 'cv2.INTER_BITS', 'cv2.INTER_NEAREST_EXACT', 'cv2.INTER_MAX'] (default: 1)
  • mask_interpolation: Literal['cv2.INTER_NEAREST', 'cv2.INTER_LINEAR', 'cv2.INTER_CUBIC', 'cv2.INTER_AREA', 'cv2.INTER_LANCZOS4', 'cv2.INTER_BITS', 'cv2.INTER_NEAREST_EXACT', 'cv2.INTER_MAX'] (default: 0)
  • p: float (default: 1)

Targets

  • Image
  • Mask
  • BBoxes
  • Keypoints

Try it out

Original Image:

Original Image: (733, 484, 3)

Original Image

Bbox Params

Keypoint Params

Mask: (733, 484, 3)

Mask

BBoxes: (733, 484, 3)

BBoxes

Keypoints: (733, 484, 3)

Keypoints