Morphological
Targets: image, mask, bboxes, keypoints, volume, mask3d
Image Types: uint8, float32
Apply a morphological operation (dilation or erosion) to an image. This is particularly useful for enhancing document scans.
Morphological operations modify the structure of the image. Dilation expands the white (foreground) regions in a binary or grayscale image, while erosion shrinks them. These operations are beneficial in document processing, for example:
- Dilation helps in closing up gaps within text or making thin lines thicker, enhancing legibility for OCR (Optical Character Recognition).
- Erosion can remove small white noise and detach connected objects, making the structure of larger objects more pronounced.
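The two effects described above can be reproduced on tiny arrays. The following is a minimal NumPy-only sketch, not the library's implementation: `morph` is a hypothetical helper that treats dilation as a local maximum and erosion as a local minimum over a square window, and the edge padding is an assumption made for illustration.

```python
import numpy as np


def morph(image: np.ndarray, size: int, operation: str = "dilation") -> np.ndarray:
    """Hypothetical helper: square-kernel dilation (local max) or erosion (local min)."""
    if operation not in ("dilation", "erosion"):
        raise ValueError(f"unknown operation: {operation!r}")
    reduce_fn = np.max if operation == "dilation" else np.min
    # Split the padding so the output keeps the input shape, also for even sizes.
    before = size // 2
    after = size - 1 - before
    padded = np.pad(image, ((before, after), (before, after)), mode="edge")
    # One (size x size) window per output pixel.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return reduce_fn(windows, axis=(-2, -1))


# A white horizontal stroke with a one-pixel gap at column 3.
line = np.zeros((5, 7), dtype=np.uint8)
line[2, [0, 1, 2, 4, 5, 6]] = 255
dilated = morph(line, 3, "dilation")  # the gap is bridged and the stroke is thicker

# A single white speck on a black background.
speck = np.zeros((5, 5), dtype=np.uint8)
speck[2, 2] = 255
eroded = morph(speck, 3, "erosion")  # the speck is smaller than the kernel and vanishes
```

Both examples use white as the foreground, which matches the description above; for dark text on a light page the visible effect is reversed.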
Arguments
scale: tuple[int, int] | int
Default: (2, 3)
Specifies the size of the structuring element (kernel) used for the operation.
- If an integer is provided, a square kernel of that size will be used.
- If a tuple or list is provided, it should contain two integers giving the minimum and maximum kernel size; this applies to both dilation and erosion.
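One way to read the two cases above is sketched below. `resolve_kernel_size` is a hypothetical helper written only to mirror the wording of this argument description; treating both bounds as inclusive and drawing one size per call are assumptions, and the library's internal handling of a pair may differ.

```python
import random
from typing import Sequence, Union


def resolve_kernel_size(scale: Union[int, Sequence[int]], rng: random.Random) -> int:
    """Hypothetical helper: turn a `scale` argument into one concrete kernel size."""
    if isinstance(scale, int):
        # A single integer means a fixed square kernel of that size.
        return scale
    low, high = scale
    if low > high:
        raise ValueError("scale must be given as (minimum, maximum)")
    # Assumption: both bounds are inclusive.
    return rng.randint(low, high)


rng = random.Random(0)
fixed = resolve_kernel_size(3, rng)
sampled = [resolve_kernel_size((2, 3), rng) for _ in range(20)]
```

With `scale=(2, 3)` every sampled size is either 2 or 3, so the variation between calls is small; a wider range gives a more varied augmentation.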
operation: "erosion" | "dilation"
Default: "dilation"
The morphological operation to apply. Default is 'dilation'.
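Which operation to choose depends on image polarity, because dilation always grows the brighter regions. The sketch below uses plain NumPy rather than the transform itself, and `dilate3x3` is a hypothetical helper. It shows that dilating dark text on a white page thins the stroke, while inverting, dilating, and inverting back thickens it, which is equivalent to eroding the original.

```python
import numpy as np


def dilate3x3(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: 3x3 dilation, each pixel becomes the max of its neighbourhood."""
    padded = np.pad(image, 1, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return windows.max(axis=(-2, -1))


def dark_rows(image: np.ndarray) -> int:
    """Count rows whose first pixel is black (the stroke spans the full width)."""
    return int((image[:, 0] == 0).sum())


# A dark horizontal stroke, 3 pixels tall, on a white page.
page = np.full((7, 9), 255, dtype=np.uint8)
page[2:5, :] = 0

thinned = dilate3x3(page)                # white grows, so the dark stroke shrinks
thickened = 255 - dilate3x3(255 - page)  # invert, dilate, invert back

print(dark_rows(page), dark_rows(thinned), dark_rows(thickened))  # 3 1 5
```

For scans with dark ink on light paper, this means "erosion" is the setting that thickens strokes, unless the image is inverted first.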
p: float
Default: 0.5
The probability of applying this transformation. Default is 0.5.
Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Create a document-like binary image with text
>>> image = np.ones((200, 500), dtype=np.uint8) * 255 # White background
>>> # Add some "text" (black pixels)
>>> _ = cv2.putText(image, "Document Text", (50, 100), cv2.FONT_HERSHEY_SIMPLEX, 1, 0, 2)  # draws in place
>>> # Add some "noise" (small black dots)
>>> for _ in range(50):
...     x, y = np.random.randint(0, image.shape[1]), np.random.randint(0, image.shape[0])
...     _ = cv2.circle(image, (x, y), 1, 0, -1)
>>>
>>> # Create a mask representing text regions
>>> mask = np.zeros_like(image)
>>> mask[image < 128] = 1 # Binary mask where text exists
>>>
>>> # Example 1: Apply dilation, which expands bright regions
>>> dilation_transform = A.Morphological(
... scale=3, # Size of the structuring element
... operation="dilation", # Expand white regions (or black if inverted)
... p=1.0 # Always apply
... )
>>> result = dilation_transform(image=image, mask=mask)
>>> dilated_image = result['image'] # White background grows: dark text gets thinner, small dark specks shrink
>>> dilated_mask = result['mask'] # Mask is expanded around text regions
>>>
>>> # Example 2: Apply erosion, which shrinks bright regions
>>> erosion_transform = A.Morphological(
... scale=(2, 3), # Random kernel size between 2 and 3
... operation="erosion", # Shrink white regions (or expand black if inverted)
... p=1.0 # Always apply
... )
>>> result = erosion_transform(image=image, mask=mask)
>>> eroded_image = result['image'] # White background shrinks: dark text gets thicker, dark specks grow
>>> eroded_mask = result['mask'] # Mask is contracted around text regions
>>>
>>> # Note: For document processing, dilation often helps enhance readability for OCR
>>> # while erosion can help remove noise or separate connected components