← Back to all transforms
Affine
Description
Augmentation to apply affine transformations to images. Affine transformations involve: - Translation ("move" image on the x-/y-axis) - Rotation - Scaling ("zoom" in/out) - Shear (move one side of the image, turning a square into a trapezoid) All such transformations can create "new" pixels in the image without a defined content, e.g. if the image is translated to the left, pixels are created on the right. A method has to be defined to deal with these pixel values. The parameters `cval` and `mode` of this class deal with this. Some transformations involve interpolations between several pixels of the input image to generate output pixel values. The parameters `interpolation` and `mask_interpolation` deals with the method of interpolation used for this. Args: scale (number, tuple of number or dict): Scaling factor to use, where ``1.0`` denotes "no change" and ``0.5`` is zoomed out to ``50`` percent of the original size. * If a single number, then that value will be used for all images. * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``. That the same range will be used for both x- and y-axis. To keep the aspect ratio, set ``keep_ratio=True``, then the same value will be used for both x- and y-axis. * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``. Each of these keys can have the same values as described above. Using a dictionary allows to set different values for the two axis and sampling will then happen *independently* per axis, resulting in samples that differ between the axes. Note that when the ``keep_ratio=True``, the x- and y-axis ranges should be the same. translate_percent (None, number, tuple of number or dict): Translation as a fraction of the image height/width (x-translation, y-translation), where ``0`` denotes "no change" and ``0.5`` denotes "half of the axis size". * If ``None`` then equivalent to ``0.0`` unless `translate_px` has a value other than ``None``. * If a single number, then that value will be used for all images. * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``. That sampled fraction value will be used identically for both x- and y-axis. * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``. Each of these keys can have the same values as described above. Using a dictionary allows to set different values for the two axis and sampling will then happen *independently* per axis, resulting in samples that differ between the axes. translate_px (None, int, tuple of int or dict): Translation in pixels. * If ``None`` then equivalent to ``0`` unless `translate_percent` has a value other than ``None``. * If a single int, then that value will be used for all images. * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the discrete interval ``[a..b]``. That number will be used identically for both x- and y-axis. * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``. Each of these keys can have the same values as described above. Using a dictionary allows to set different values for the two axis and sampling will then happen *independently* per axis, resulting in samples that differ between the axes. rotate (number or tuple of number): Rotation in degrees (**NOT** radians), i.e. expected value range is around ``[-360, 360]``. Rotation happens around the *center* of the image, not the top left corner as in some other frameworks. * If a number, then that value will be used for all images. * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]`` and used as the rotation value. shear (number, tuple of number or dict): Shear in degrees (**NOT** radians), i.e. expected value range is around ``[-360, 360]``, with reasonable values being in the range of ``[-45, 45]``. * If a number, then that value will be used for all images as the shear on the x-axis (no shear on the y-axis will be done). * If a tuple ``(a, b)``, then two value will be uniformly sampled per image from the interval ``[a, b]`` and be used as the x- and y-shear value. * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``. Each of these keys can have the same values as described above. Using a dictionary allows to set different values for the two axis and sampling will then happen *independently* per axis, resulting in samples that differ between the axes. interpolation (int): OpenCV interpolation flag. mask_interpolation (int): OpenCV interpolation flag. cval (number or sequence of number): The constant value to use when filling in newly created pixels. (E.g. translating by 1px to the right will create a new 1px-wide column of pixels on the left of the image). The value is only used when `mode=constant`. The expected value range is ``[0, 255]`` for ``uint8`` images. cval_mask (number or tuple of number): Same as cval but only for masks. mode (int): OpenCV border flag. fit_output (bool): If True, the image plane size and position will be adjusted to tightly capture the whole image after affine transformation (`translate_percent` and `translate_px` are ignored). Otherwise (``False``), parts of the transformed image may end up outside the image plane. Fitting the output shape can be useful to avoid corners of the image being outside the image plane after applying rotations. Default: False keep_ratio (bool): When True, the original aspect ratio will be kept when the random scale is applied. Default: False. rotate_method (Literal["largest_box", "ellipse"]): rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse"[1]. Default: "largest_box" balanced_scale (bool): When True, scaling factors are chosen to be either entirely below or above 1, ensuring balanced scaling. Default: False. This is important because without it, scaling tends to lean towards upscaling. For example, if we want the image to zoom in and out by 2x, we may pick an interval [0.5, 2]. Since the interval [0.5, 1] is three times smaller than [1, 2], values above 1 are picked three times more often if sampled directly from [0.5, 2]. With `balanced_scale`, the function ensures that half the time, the scaling factor is picked from below 1 (zooming out), and the other half from above 1 (zooming in). This makes the zooming in and out process more balanced. p (float): probability of applying the transform. Default: 0.5. Targets: image, mask, keypoints, bboxes Image types: uint8, float32 Reference: [1] https://arxiv.org/abs/2109.13488
Parameters
- scale: float | tuple[float, float] | dict[str, Any] | None (default: null)
- translate_percent: float | tuple[float, float] | dict[str, Any] | None (default: null)
- translate_px: int | tuple[int, int] | dict[str, Any] | None (default: null)
- rotate: float | tuple[float, float] | None (default: null)
- shear: float | tuple[float, float] | dict[str, Any] | None (default: null)
- interpolation: Literal['cv2.INTER_NEAREST', 'cv2.INTER_LINEAR', 'cv2.INTER_CUBIC', 'cv2.INTER_AREA', 'cv2.INTER_LANCZOS4', 'cv2.INTER_BITS', 'cv2.INTER_NEAREST_EXACT', 'cv2.INTER_MAX'] (default: 1)
- mask_interpolation: Literal['cv2.INTER_NEAREST', 'cv2.INTER_LINEAR', 'cv2.INTER_CUBIC', 'cv2.INTER_AREA', 'cv2.INTER_LANCZOS4', 'cv2.INTER_BITS', 'cv2.INTER_NEAREST_EXACT', 'cv2.INTER_MAX'] (default: 0)
- cval: float | Sequence[float] (default: 0)
- cval_mask: float | Sequence[float] (default: 0)
- mode: Literal['cv2.BORDER_CONSTANT', 'cv2.BORDER_REPLICATE', 'cv2.BORDER_REFLECT', 'cv2.BORDER_WRAP', 'cv2.BORDER_DEFAULT', 'cv2.BORDER_TRANSPARENT'] (default: 0)
- fit_output: bool (default: false)
- keep_ratio: bool (default: false)
- rotate_method: Literal['largest_box', 'ellipse'] (default: 'largest_box')
- balanced_scale: bool (default: false)
- p: float (default: 0.5)
Targets
- Image
- Mask
- BBoxes
- Keypoints
Try it out
ⓘ