← Back to all transforms

Morphological

Description

Apply a morphological operation (dilation or erosion) to an image,
    with particular value for enhancing document scans.

    Morphological operations modify the structure of the image.
    Dilation expands the white (foreground) regions in a binary or grayscale image, while erosion shrinks them.
    These operations are beneficial in document processing, for example:
    - Dilation helps in closing up gaps within text or making thin lines thicker,
        enhancing legibility for OCR (Optical Character Recognition).
    - Erosion can remove small white noise and detach connected objects,
        making the structure of larger objects more pronounced.

    Args:
        scale (int or tuple/list of int): Specifies the size of the structuring element (kernel) used for the operation.
            - If an integer is provided, a square kernel of that size will be used.
            - If a tuple or list is provided, it should contain two integers representing the minimum
                and maximum sizes for the dilation kernel.
        operation (str, optional): The morphological operation to apply. Options are 'dilation' or 'erosion'.
            Default is 'dilation'.
        p (float, optional): The probability of applying this transformation. Default is 0.5.

    Targets:
        image, mask, keypoints, bboxes

    Image types:
        uint8, float32

    Reference:
        https://github.com/facebookresearch/nougat

    Example:
        >>> import albumentations as A
        >>> transform = A.Compose([
        >>>     A.Morphological(scale=(2, 3), operation='dilation', p=0.5)
        >>> ])
        >>> image = transform(image=image)["image"]
    

Parameters

  • scale: int | tuple[int, int] | float | tuple[float, float] (default: (2, 3))
  • operation: Literal['erosion', 'dilation'] (default: 'dilation')
  • p: float (default: 0.5)

Targets

  • Image
  • Mask
  • Keypoints
  • BBoxes

Try it out

Original Image:

Original Image: (733, 484, 3)

Original Image

Bbox Params

Keypoint Params

Mask: (733, 484, 3)

Mask

BBoxes: (733, 484, 3)

BBoxes

Keypoints: (733, 484, 3)

Keypoints