Crop a random portion of the input and resize it to a given size (torchvision-style RandomResizedCrop). The scale and ratio parameters control the crop's relative area and aspect ratio.
This transform first crops a random portion of the input image (or mask, bounding boxes, keypoints) and then resizes the crop to a specified size. It's particularly useful for training neural networks on images of varying sizes and aspect ratios.
size: Target size of the output image as (height, width) after crop and resize.
scale: Range of the crop area relative to the input area. For example, (0.08, 1.0) means the crop will cover between 8% and 100% of the input area. Default: (0.08, 1.0)
ratio: Range of aspect ratios of the random crop. For example, (0.75, 1.3333) allows crop aspect ratios from 3:4 to 4:3. Default: (0.75, 1.3333333333333333)
interpolation: Flag specifying the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR
mask_interpolation: Flag specifying the interpolation algorithm for the mask. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST
area_for_downscale: Controls automatic use of cv2.INTER_AREA interpolation when downscaling. Options: None (never substitute INTER_AREA), "image" (use INTER_AREA when downscaling the image), "image_mask" (use INTER_AREA when downscaling both image and mask). Default: None
p: Probability of applying the transform. Default: 1.0
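To make the interplay of scale and ratio concrete, here is a minimal sketch of the torchvision-style sampling procedure: draw a target area from the scale range, an aspect ratio log-uniformly from the ratio range, and retry until the resulting crop fits inside the input. This is a simplified illustration, not albumentations' actual implementation, and `sample_crop_params` is a hypothetical helper name.

```python
import math
import random

def sample_crop_params(height, width, scale=(0.08, 1.0), ratio=(0.75, 4 / 3),
                       max_attempts=10):
    """Sketch of torchvision-style RandomResizedCrop sampling.

    Returns (top, left, crop_h, crop_w) for a crop that is later
    resized to the target output size.
    """
    area = height * width
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(max_attempts):
        # Crop area is a fraction of the input area, drawn from `scale`
        target_area = area * random.uniform(*scale)
        # Aspect ratio is drawn log-uniformly from `ratio`
        aspect = math.exp(random.uniform(*log_ratio))  # width / height
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            top = random.randint(0, height - h)
            left = random.randint(0, width - w)
            return top, left, h, w
    # Fallback: largest crop at the closest valid aspect ratio, centered
    in_ratio = width / height
    if in_ratio < ratio[0]:
        w, h = width, int(round(width / ratio[0]))
    elif in_ratio > ratio[1]:
        h, w = height, int(round(height * ratio[1]))
    else:
        w, h = width, height
    return (height - h) // 2, (width - w) // 2, h, w
```

The log-uniform draw makes an aspect ratio of 3:4 as likely as 4:3, which a plain uniform draw over (0.75, 1.3333) would not.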
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare sample data
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> bboxes = np.array([[10, 10, 50, 50], [40, 40, 80, 80]], dtype=np.float32)
>>> bbox_labels = [1, 2]
>>> keypoints = np.array([[20, 30], [60, 70]], dtype=np.float32)
>>> keypoint_labels = [0, 1]
>>>
>>> # Define transform with parameters as tuples
>>> transform = A.Compose([
... A.RandomResizedCrop(
... size=(64, 64),
... scale=(0.5, 0.9), # Crop area will be 50-90% of the original image
... ratio=(0.75, 1.33), # Aspect ratio will vary from 3:4 to 4:3
... interpolation=cv2.INTER_LINEAR,
... mask_interpolation=cv2.INTER_NEAREST,
... area_for_downscale="image", # Use INTER_AREA for image downscaling
... p=1.0
... ),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_labels']),
... keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_labels']))
>>>
>>> # Apply the transform
>>> transformed = transform(
... image=image,
... mask=mask,
... bboxes=bboxes,
... bbox_labels=bbox_labels,
... keypoints=keypoints,
... keypoint_labels=keypoint_labels
... )
>>>
>>> # Get the transformed data
>>> transformed_image = transformed['image'] # Shape: (64, 64, 3)
>>> transformed_mask = transformed['mask'] # Shape: (64, 64)
>>> transformed_bboxes = transformed['bboxes'] # Bounding boxes adjusted to new crop and size
>>> transformed_bbox_labels = transformed['bbox_labels'] # Labels for the preserved bboxes
>>> transformed_keypoints = transformed['keypoints'] # Keypoints adjusted to new crop and size
>>> transformed_keypoint_labels = transformed['keypoint_labels'] # Labels for the preserved keypoints
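The adjustment of bounding boxes under crop-and-resize can be sketched as follows: shift boxes into crop coordinates, clip them to the crop window, drop boxes with no remaining area, and scale the survivors to the output size. This is an illustrative sketch in pascal_voc (x_min, y_min, x_max, y_max) coordinates, not albumentations' internal code, and `remap_bboxes` is a hypothetical helper name.

```python
import numpy as np

def remap_bboxes(bboxes, top, left, crop_h, crop_w, out_h, out_w):
    """Remap pascal_voc bboxes under a crop at (top, left) of size
    (crop_h, crop_w), followed by a resize to (out_h, out_w).

    Returns the remapped boxes and a boolean mask of which input
    boxes survived the crop.
    """
    b = np.asarray(bboxes, dtype=np.float32).copy()
    # Shift into crop coordinates
    b[:, [0, 2]] -= left
    b[:, [1, 3]] -= top
    # Clip to the crop window
    b[:, [0, 2]] = b[:, [0, 2]].clip(0, crop_w)
    b[:, [1, 3]] = b[:, [1, 3]].clip(0, crop_h)
    # Drop boxes that were cropped away entirely
    keep = (b[:, 2] > b[:, 0]) & (b[:, 3] > b[:, 1])
    b = b[keep]
    # Scale to the output size
    b[:, [0, 2]] *= out_w / crop_w
    b[:, [1, 3]] *= out_h / crop_h
    return b, keep
```

The same shift-and-scale applies to keypoints, except that a keypoint outside the crop window is dropped rather than clipped. The `keep` mask is what keeps `bbox_labels` aligned with the surviving boxes.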