Morphological

Targets:
image
mask
bboxes
keypoints
volume
mask3d
Image Types: uint8, float32

Apply a morphological operation (dilation or erosion) to an image. This is particularly useful for enhancing document scans.

Morphological operations modify the structure of the image. Dilation expands the white (foreground) regions in a binary or grayscale image, while erosion shrinks them. These operations are beneficial in document processing, for example:

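  • Dilation closes gaps within text and thickens thin lines when the text is the bright (foreground) region, which can improve legibility for OCR (Optical Character Recognition).
  • Erosion removes small white noise and detaches connected objects, making the structure of larger objects more pronounced.
  • For dark text on a light background the effects are reversed: dilation thins the strokes and erosion thickens them.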
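
The effect of the two operations can be checked on a tiny binary image. The following is a minimal NumPy-only sketch, not the library's implementation: `morph` is a hypothetical helper that takes the maximum (dilation) or minimum (erosion) over a square window, and the edge padding is an assumption made to keep the output the same shape as the input.

```python
import numpy as np


def morph(image: np.ndarray, size: int, operation: str) -> np.ndarray:
    """Square-kernel dilation or erosion (hypothetical helper, illustration only)."""
    reduce = np.max if operation == "dilation" else np.min
    before = size // 2
    after = size - 1 - before
    # Repeat border pixels so every output pixel has a full window (assumption).
    padded = np.pad(image, ((before, after), (before, after)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return reduce(windows, axis=(-2, -1))


binary = np.zeros((8, 8), dtype=np.uint8)
binary[2:5, 2:5] = 255   # a 3x3 white "object"
binary[0, 7] = 255       # a single white noise pixel

dilated = morph(binary, 3, "dilation")
eroded = morph(binary, 3, "erosion")

print((binary == 255).sum())   # 10 white pixels in the input
print((dilated == 255).sum())  # 29: both the object and the noise pixel grew
print((eroded == 255).sum())   # 1: only the object's centre survives, noise is gone

# Dark text on a white page behaves the opposite way: eroding the inverted
# image is the same as inverting the dilated image.
inverted = 255 - binary
assert np.array_equal(morph(inverted, 3, "erosion"), 255 - dilated)
```

The last assertion is why the choice of operation depends on polarity: for dark text on a white background, erosion is the operation that thickens strokes.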
Arguments
scale
tuple[int, int] | int
[2,3]

Specifies the size of the structuring element (kernel) used for the operation.

  • If an integer is provided, a square kernel of that size will be used.
  • If a tuple or list is provided, it should contain two integers giving the minimum and maximum kernel sizes; the size is chosen at random within that range. This applies to both dilation and erosion.
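
The two accepted forms of `scale` can be pictured with a small sketch. This only mirrors the description above and is not the library's internal code: `resolve_kernel_size` is a hypothetical helper, and sampling a single size uniformly with both endpoints included is an assumption.

```python
import random


def resolve_kernel_size(scale, rng=random):
    """Hypothetical helper: turn a `scale` argument into one square kernel size."""
    if isinstance(scale, int):
        # A single integer is used directly as the kernel size.
        return scale
    low, high = scale
    # Assumption: uniform sampling, inclusive on both ends.
    return rng.randint(low, high)


print(resolve_kernel_size(3))        # always 3
print(resolve_kernel_size((2, 3)))   # either 2 or 3
print(resolve_kernel_size([2, 3]))   # lists are handled like tuples
```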
operation
erosion | dilation
dilation

The morphological operation to apply. Default is 'dilation'.

p
float
0.5

The probability of applying this transformation. Default is 0.5.

Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Create a document-like binary image with text
>>> image = np.ones((200, 500), dtype=np.uint8) * 255  # White background
>>> # Add some "text" (black pixels)
>>> cv2.putText(image, "Document Text", (50, 100), cv2.FONT_HERSHEY_SIMPLEX, 1, 0, 2)
>>> # Add some "noise" (small black dots)
>>> for _ in range(50):
...     x, y = np.random.randint(0, image.shape[1]), np.random.randint(0, image.shape[0])
...     cv2.circle(image, (x, y), 1, 0, -1)
>>>
>>> # Create a mask representing text regions
>>> mask = np.zeros_like(image)
>>> mask[image < 128] = 1  # Binary mask where text exists
>>>
>>> # Example 1: Apply dilation (expands bright regions)
>>> dilation_transform = A.Morphological(
...     scale=3,               # Size of the structuring element
...     operation="dilation",  # Expand bright regions; dark regions shrink
...     p=1.0                  # Always apply
... )
>>> result = dilation_transform(image=image, mask=mask)
>>> dilated_image = result['image']    # Dark text on white gets thinner; small dark noise may vanish
>>> dilated_mask = result['mask']      # Mask (text = 1) is expanded around text regions
>>>
>>> # Example 2: Apply erosion (shrinks bright regions)
>>> erosion_transform = A.Morphological(
...     scale=(2, 3),          # Random kernel size between 2 and 3
...     operation="erosion",   # Shrink bright regions; dark regions grow
...     p=1.0                  # Always apply
... )
>>> result = erosion_transform(image=image, mask=mask)
>>> eroded_image = result['image']     # Dark text on white gets thicker; dark noise dots grow
>>> eroded_mask = result['mask']       # Mask (text = 1) is contracted around text regions
>>>
>>> # Note: When text is the bright foreground, dilation often helps enhance readability
>>> # for OCR while erosion can help remove noise or separate connected components.
>>> # For dark text on a white background, as in this image, the roles are swapped.