Perspective

Targets:
image
mask
bboxes
keypoints
volume
mask3d
Image Types: uint8, float32

Apply a random four-point perspective transformation to the input.

Arguments
scale
tuple[float, float] | float
(0.05, 0.1)

Standard deviation of the normal distributions used to sample the random distances of the subimage's corners from the full image's corners. If scale is a single float value, the range will be (0, scale). Default: (0.05, 0.1).
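As a rough sketch of what this sampling looks like (hypothetical names; the actual internal implementation may differ), one standard deviation is drawn from the scale range, then each of the four corners gets a normally distributed relative offset:

```python
import numpy as np

rng = np.random.default_rng(0)
scale = (0.05, 0.1)  # std-dev range, matching the default

# Pick a standard deviation from the range, then sample each corner's
# relative (x, y) offset from a zero-mean normal distribution.
sigma = rng.uniform(*scale)
offsets = rng.normal(loc=0.0, scale=sigma, size=(4, 2))

# Apply the offsets (scaled to pixels) to the four image corners.
h, w = 100, 100
corners = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float32)
shifted = corners + offsets * np.array([w, h])
```

The shifted corners then define the perspective warp from the original corner positions.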

keep_size
bool
true

Whether to resize image back to its original size after applying the perspective transform. If set to False, the resulting images may end up having different shapes. Default: True.

border_mode
0 | 1 | 2 | 3 | 4
0

OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT.

fill
tuple[float, ...] | float
0

Padding value if border_mode is cv2.BORDER_CONSTANT. Default: 0.

fill_mask
tuple[float, ...] | float
0

Padding value for mask if border_mode is cv2.BORDER_CONSTANT. Default: 0.

fit_output
bool
false

If True, the image plane size and position will be adjusted to still capture the whole image after perspective transformation. This is followed by image resizing if keep_size is set to True. If False, parts of the transformed image may be outside of the image plane. This setting should not be set to True when using large scale values as it could lead to very large images. Default: False.

interpolation
0 | 1 | 2 | 3 | 4
1

Interpolation method to be used for image transformation. Should be one of the OpenCV interpolation types. Default: cv2.INTER_LINEAR.

mask_interpolation
0 | 1 | 2 | 3 | 4
0

Interpolation algorithm used for the mask. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST.

p
float
0.5

Probability of applying the transform. Default: 0.5.

Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare sample data
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> bboxes = np.array([[10, 10, 50, 50], [40, 40, 80, 80]], dtype=np.float32)
>>> bbox_labels = [1, 2]
>>> keypoints = np.array([[20, 30], [60, 70]], dtype=np.float32)
>>> keypoint_labels = [0, 1]
>>>
>>> # Define transform with parameters as tuples when possible
>>> transform = A.Compose([
...     A.Perspective(
...         scale=(0.05, 0.1),
...         keep_size=True,
...         fit_output=False,
...         border_mode=cv2.BORDER_CONSTANT,
...         p=1.0
...     ),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_labels']),
...    keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_labels']))
>>>
>>> # Apply the transform
>>> transformed = transform(
...     image=image,
...     mask=mask,
...     bboxes=bboxes,
...     bbox_labels=bbox_labels,
...     keypoints=keypoints,
...     keypoint_labels=keypoint_labels
... )
>>>
>>> # Get the transformed data
>>> transformed_image = transformed['image']      # Perspective-transformed image
>>> transformed_mask = transformed['mask']        # Perspective-transformed mask
>>> transformed_bboxes = transformed['bboxes']    # Perspective-transformed bounding boxes
>>> transformed_bbox_labels = transformed['bbox_labels']  # Labels for transformed bboxes
>>> transformed_keypoints = transformed['keypoints']  # Perspective-transformed keypoints
>>> transformed_keypoint_labels = transformed['keypoint_labels']  # Labels for transformed keypoints
Notes

This transformation creates a perspective effect by randomly moving the four corners of the image. The amount of movement is controlled by the 'scale' parameter.

When 'keep_size' is True, the output image will have the same size as the input image, which may cause some parts of the transformed image to be cut off or padded.

When 'fit_output' is True, the transformation ensures that the entire transformed image is visible, which may result in a larger output image if keep_size is False.