Affine
Augmentation to apply affine transformations to images.
Affine transformations involve:
- Translation ("move" image on the x-/y-axis)
- Rotation
- Scaling ("zoom" in/out)
- Shear (move one side of the image, turning a square into a trapezoid)
All such transformations can create "new" pixels in the image without defined content, e.g.
if the image is translated to the left, pixels are created on the right.
A method has to be defined to deal with these pixel values.
The parameters fill and fill_mask of this class deal with this.
Some transformations involve interpolations between several pixels
of the input image to generate output pixel values. The parameters interpolation and
mask_interpolation control the interpolation method used for this.
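For instance, a minimal sketch (parameter values are illustrative, not recommendations):

>>> import cv2
>>> import albumentations as A
>>> t = A.Affine(
...     translate_px={"x": 10},  # shift right, creating a 10px-wide column on the left
...     border_mode=cv2.BORDER_CONSTANT,
...     fill=0,  # new image pixels become black
...     fill_mask=0,  # new mask pixels become background
...     interpolation=cv2.INTER_LINEAR,  # image resampling
...     mask_interpolation=cv2.INTER_NEAREST,  # keeps mask labels discrete
...     p=1.0,
... )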
scale: Scaling factor to use, where 1.0 denotes "no change" and
0.5 is zoomed out to 50 percent of the original size.
- If a single number, then that value will be used for all images.
- If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b].
The same range will be used for both x- and y-axis. To keep the aspect ratio, set
keep_ratio=True; the same sampled value will then be used for both axes.
- If a dictionary, then it is expected to have the keys x and/or y.
Each of these keys can have the same values as described above.
Using a dictionary allows setting different values for the two axes, and sampling will then
happen independently per axis, resulting in samples that differ between the axes.
Note that when keep_ratio=True, the x- and y-axis ranges should be the same.
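For example, the three accepted forms (a sketch; ranges are illustrative):

>>> import albumentations as A
>>> t1 = A.Affine(scale=0.5, p=1.0)  # always zoom out to 50%
>>> t2 = A.Affine(scale=(0.8, 1.2), keep_ratio=True, p=1.0)  # one factor sampled, used for both axes
>>> t3 = A.Affine(scale={"x": (0.8, 1.2), "y": (0.9, 1.1)}, keep_ratio=False, p=1.0)  # per-axis sampling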
translate_percent: Translation as a fraction of the image height/width
(x-translation, y-translation), where 0 denotes "no change"
and 0.5 denotes "half of the axis size".
- If None, then equivalent to 0.0 unless translate_px has a value other than None.
- If a single number, then that value will be used for all images.
- If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b].
That sampled fraction value will be used identically for both x- and y-axis.
- If a dictionary, then it is expected to have the keys x and/or y.
Each of these keys can have the same values as described above.
Using a dictionary allows setting different values for the two axes, and sampling will then
happen independently per axis, resulting in samples that differ between the axes.
translate_px: Translation in pixels.
- If None, then equivalent to 0 unless translate_percent has a value other than None.
- If a single int, then that value will be used for all images.
- If a tuple (a, b), then a value will be uniformly sampled per image from the discrete interval [a..b].
That number will be used identically for both x- and y-axis.
- If a dictionary, then it is expected to have the keys x and/or y.
Each of these keys can have the same values as described above.
Using a dictionary allows setting different values for the two axes, and sampling will then
happen independently per axis, resulting in samples that differ between the axes.
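Both translation forms side by side (a sketch; values are illustrative):

>>> import albumentations as A
>>> t1 = A.Affine(translate_percent=(-0.1, 0.1), p=1.0)  # same sampled fraction on both axes
>>> t2 = A.Affine(translate_px={"x": (-20, 20), "y": 0}, p=1.0)  # horizontal pixel jitter only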
rotate: Rotation in degrees (NOT radians), i.e. the expected value range is
around [-360, 360]. Rotation happens around the center of the image,
not the top left corner as in some other frameworks.
- If a number, then that value will be used for all images.
- If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b]
and used as the rotation value.
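For example (a sketch; the angle range is illustrative):

>>> import albumentations as A
>>> t = A.Affine(rotate=(-30, 30), p=1.0)  # degrees, sampled per image, rotation about the image center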
shear: Shear in degrees (NOT radians), i.e. the expected value range is
around [-360, 360], with reasonable values being in the range of [-45, 45].
- If a number, then that value will be used for all images as the shear on the x-axis
(no shear on the y-axis will be done).
- If a tuple (a, b), then two values will be uniformly sampled per image from the interval [a, b]
and be used as the x- and y-shear values.
- If a dictionary, then it is expected to have the keys x and/or y.
Each of these keys can have the same values as described above.
Using a dictionary allows setting different values for the two axes, and sampling will then
happen independently per axis, resulting in samples that differ between the axes.
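For example (a sketch; ranges are illustrative):

>>> import albumentations as A
>>> t1 = A.Affine(shear=10, p=1.0)  # 10 degrees of x-shear, no y-shear
>>> t2 = A.Affine(shear={"x": (-10, 10), "y": (-5, 5)}, p=1.0)  # independent x- and y-shear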
interpolation: OpenCV interpolation flag, used for the image (e.g. cv2.INTER_LINEAR).
mask_interpolation: OpenCV interpolation flag, used for the mask (e.g. cv2.INTER_NEAREST).
fill: The constant value to use when filling in newly created pixels.
(E.g. translating by 1px to the right will create a new 1px-wide column of pixels
on the left of the image.)
The value is only used when border_mode=cv2.BORDER_CONSTANT. The expected value range is [0, 255] for uint8 images.
fill_mask: Same as fill but only for masks.
border_mode: OpenCV border flag.
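For example, contrasting two border strategies (a sketch; cv2.BORDER_REFLECT_101 is one of several OpenCV border flags):

>>> import cv2
>>> import albumentations as A
>>> t1 = A.Affine(rotate=(-15, 15), border_mode=cv2.BORDER_CONSTANT, fill=127, p=1.0)  # gray borders
>>> t2 = A.Affine(rotate=(-15, 15), border_mode=cv2.BORDER_REFLECT_101, p=1.0)  # mirror edge content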
fit_output: If True, the image plane size and position will be adjusted to tightly capture
the whole image after the affine transformation (translate_percent and translate_px are ignored).
Otherwise (False), parts of the transformed image may end up outside the image plane.
Fitting the output shape can be useful to avoid corners of the image being outside the image plane
after applying rotations. Default: False.
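For example (a sketch; with fit_output=True the output canvas grows so rotated corners are kept):

>>> import albumentations as A
>>> t = A.Affine(rotate=45, fit_output=True, fill=0, p=1.0)  # no corners cut off; translations are ignored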
keep_ratio: When True, the original aspect ratio will be kept when the random scale is applied. Default: True.
rotate_method: Rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse" [1]. Default: "largest_box".
balanced_scale: When True, scaling factors are chosen to be either entirely below or above 1, ensuring balanced scaling. Default: False.
This is important because without it, scaling tends to lean towards upscaling. For example, if we want
the image to zoom in and out by 2x, we may pick an interval [0.5, 2]. Since the interval [0.5, 1] is
three times smaller than [1, 2], values above 1 are picked three times more often if sampled directly
from [0.5, 2]. With balanced_scale, the function ensures that half the time, the scaling
factor is picked from below 1 (zooming out), and the other half from above 1 (zooming in).
This makes the zooming in and out process more balanced.
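The sampling idea can be sketched as follows; sample_balanced_scale is a hypothetical helper for illustration, not the library's internal code:

>>> import numpy as np
>>> def sample_balanced_scale(low, high, rng=np.random):
...     # Hypothetical helper: choose the zoom-out side [low, 1] or the
...     # zoom-in side [1, high] with equal probability, then sample
...     # uniformly within the chosen side.
...     if rng.random() < 0.5:
...         return rng.uniform(low, 1.0)
...     return rng.uniform(1.0, high)
>>> s = sample_balanced_scale(0.5, 2.0)
>>> 0.5 <= s <= 2.0
True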
p: Probability of applying the transform. Default: 0.5.
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare sample data
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> bboxes = np.array([[10, 10, 50, 50], [40, 40, 80, 80]], dtype=np.float32)
>>> bbox_labels = [1, 2]
>>> keypoints = np.array([[20, 30], [60, 70]], dtype=np.float32)
>>> keypoint_labels = [0, 1]
>>>
>>> # Define transform with different parameter types
>>> transform = A.Compose([
... A.Affine(
... # Tuple for scale (will be used for both x and y)
... scale=(0.8, 1.2),
... # Dictionary with tuples for different x/y translations
... translate_percent={"x": (-0.2, 0.2), "y": (-0.1, 0.1)},
... # Tuple for rotation range
... rotate=(-30, 30),
... # Dictionary with tuples for different x/y shearing
... shear={"x": (-10, 10), "y": (-5, 5)},
... # Interpolation methods
... interpolation=cv2.INTER_LINEAR,
... mask_interpolation=cv2.INTER_NEAREST,
... # Other parameters
... fit_output=False,
... keep_ratio=True,
... rotate_method="largest_box",
... balanced_scale=True,
... border_mode=cv2.BORDER_CONSTANT,
... fill=0,
... fill_mask=0,
... p=1.0
... ),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_labels']),
... keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_labels']))
>>>
>>> # Apply the transform
>>> transformed = transform(
... image=image,
... mask=mask,
... bboxes=bboxes,
... bbox_labels=bbox_labels,
... keypoints=keypoints,
... keypoint_labels=keypoint_labels
... )
>>>
>>> # Get the transformed data
>>> transformed_image = transformed['image'] # Image with affine transforms applied
>>> transformed_mask = transformed['mask'] # Mask with affine transforms applied
>>> transformed_bboxes = transformed['bboxes'] # Bounding boxes with affine transforms applied
>>> transformed_bbox_labels = transformed['bbox_labels'] # Labels for transformed bboxes
>>> transformed_keypoints = transformed['keypoints'] # Keypoints with affine transforms applied
>>> transformed_keypoint_labels = transformed['keypoint_labels'] # Labels for transformed keypoints
>>>
>>> # Simpler example with only essential parameters
>>> simple_transform = A.Compose([
... A.Affine(
... scale=1.1, # Single scalar value for scale
... rotate=15, # Single scalar value for rotation (degrees)
... translate_px=30, # Single scalar value for translation (pixels)
... p=1.0
... ),
... ])
>>> simple_result = simple_transform(image=image)
>>> simple_transformed = simple_result['image']

References:
[1] Towards Rotation Invariance in Object Detection: https://arxiv.org/abs/2109.13488