Perspective

Targets:
image
mask
bboxes
keypoints
volume
mask3d
Image Types: uint8, float32

Apply a random four-point perspective transformation to the input.

Arguments
scale
tuple[float, float] | float
(0.05, 0.1)

Standard deviation of the normal distributions used to sample the random distances of the subimage's corners from the full image's corners. If scale is a single float value, the range will be (0, scale). Default: (0.05, 0.1).
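As a rough sketch of what this sampling looks like (hypothetical names; the actual internal implementation may differ), one standard deviation is drawn from the scale range, then each of the four corners gets a normally distributed relative offset:

```python
import numpy as np

rng = np.random.default_rng(0)
scale = (0.05, 0.1)  # std-dev range, matching the default

# Pick a standard deviation from the range, then sample each corner's
# relative (x, y) offset from a zero-mean normal distribution.
sigma = rng.uniform(*scale)
offsets = rng.normal(loc=0.0, scale=sigma, size=(4, 2))

# Apply the offsets (scaled to pixels) to the four image corners.
h, w = 100, 100
corners = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float32)
shifted = corners + offsets * np.array([w, h])
```

The shifted corners then define the perspective warp from the original corner positions.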

keep_size
bool
true

Whether to resize image back to its original size after applying the perspective transform. If set to False, the resulting images may end up having different shapes. Default: True.

border_mode
0 | 1 | 2 | 3 | 4
0

OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT.

fill
tuple[float, ...] | float
0

Padding value if border_mode is cv2.BORDER_CONSTANT. Default: 0.

fill_mask
tuple[float, ...] | float
0

Padding value for mask if border_mode is cv2.BORDER_CONSTANT. Default: 0.

fit_output
bool
false

If True, the image plane size and position will be adjusted to still capture the whole image after perspective transformation. This is followed by image resizing if keep_size is set to True. If False, parts of the transformed image may be outside of the image plane. This setting should not be set to True when using large scale values as it could lead to very large images. Default: False.

interpolation
0 | 1 | 2 | 3 | 4
1

Interpolation method to be used for image transformation. Should be one of the OpenCV interpolation types. Default: cv2.INTER_LINEAR.

mask_interpolation
0 | 1 | 2 | 3 | 4
0

Interpolation algorithm used for the mask. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST.

p
float
0.5

Probability of applying the transform. Default: 0.5.

Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare sample data
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> bboxes = np.array([[10, 10, 50, 50], [40, 40, 80, 80]], dtype=np.float32)
>>> bbox_labels = [1, 2]
>>> keypoints = np.array([[20, 30], [60, 70]], dtype=np.float32)
>>> keypoint_labels = [0, 1]
>>>
>>> # Define transform with parameters as tuples when possible
>>> transform = A.Compose([
...     A.Perspective(
...         scale=(0.05, 0.1),
...         keep_size=True,
...         fit_output=False,
...         border_mode=cv2.BORDER_CONSTANT,
...         p=1.0
...     ),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_labels']),
...    keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_labels']))
>>>
>>> # Apply the transform
>>> transformed = transform(
...     image=image,
...     mask=mask,
...     bboxes=bboxes,
...     bbox_labels=bbox_labels,
...     keypoints=keypoints,
...     keypoint_labels=keypoint_labels
... )
>>>
>>> # Get the transformed data
>>> transformed_image = transformed['image']      # Perspective-transformed image
>>> transformed_mask = transformed['mask']        # Perspective-transformed mask
>>> transformed_bboxes = transformed['bboxes']    # Perspective-transformed bounding boxes
>>> transformed_bbox_labels = transformed['bbox_labels']  # Labels for transformed bboxes
>>> transformed_keypoints = transformed['keypoints']  # Perspective-transformed keypoints
>>> transformed_keypoint_labels = transformed['keypoint_labels']  # Labels for transformed keypoints
Notes

This transformation creates a perspective effect by randomly moving the four corners of the image. The amount of movement is controlled by the 'scale' parameter.

When 'keep_size' is True, the output image will have the same size as the input image, which may cause some parts of the transformed image to be cut off or padded.

When 'fit_output' is True, the transformation ensures that the entire transformed image is visible, which may result in a larger output image if keep_size is False.