XYMasking
Description
Applies masking strips to an image, either horizontally (X axis) or vertically (Y axis), simulating occlusions. This transform is useful for training models to recognize images under varied visibility conditions. It is particularly effective for spectrogram images, where spectral and frequency masking can improve model robustness.

At least one of `mask_x_length` or `mask_y_length` must be specified; each sets the maximum mask size along its axis.

Args:
- num_masks_x (int | tuple[int, int]): Number or range of horizontal regions to mask. Defaults to 0.
- num_masks_y (int | tuple[int, int]): Number or range of vertical regions to mask. Defaults to 0.
- mask_x_length (int | tuple[int, int]): Length of the masks along the X (horizontal) axis. An integer sets a fixed mask length. A tuple of two integers (min, max) makes the length of each mask a random value within that range.
- mask_y_length (int | tuple[int, int]): Height of the masks along the Y (vertical) axis. As with `mask_x_length`, an integer sets a fixed mask height, while a tuple (min, max) makes the height of each mask a random value within that range.
- fill_value (int | float | list[int] | list[float] | str): Value used to fill the masked regions of the image. Defaults to 0.
- mask_fill_value (int | float | list[int] | list[float] | None): Value used to fill the masked regions of the mask target. If `None`, the mask is not affected. Default: `None`.
- p (float): Probability of applying the transform. Defaults to 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

Note: Either `mask_x_length` or `mask_y_length`, or both, must be defined.
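To make the description concrete, here is a minimal NumPy sketch of the strip-masking idea. It is not the library's implementation. The names `sample_int` and `xy_mask_sketch` are hypothetical, and the sketch assumes that an X mask spans `mask_x_length` columns over the full image height, that a Y mask spans `mask_y_length` rows over the full width, and that tuple ranges are inclusive.

```python
import numpy as np


def sample_int(value, rng):
    """Return a fixed int, or draw one uniformly from an inclusive (min, max) tuple."""
    if isinstance(value, tuple):
        low, high = value
        return int(rng.integers(low, high + 1))
    return int(value)


def xy_mask_sketch(image, num_masks_x=0, num_masks_y=0,
                   mask_x_length=0, mask_y_length=0, fill_value=0, seed=None):
    """Hypothetical helper: fill random column bands (X) and row bands (Y)."""
    rng = np.random.default_rng(seed)
    out = image.copy()  # leave the input array untouched
    height, width = out.shape[:2]

    # X masks: bands measured along the horizontal axis (assumed full height).
    for _ in range(sample_int(num_masks_x, rng)):
        length = min(sample_int(mask_x_length, rng), width)
        start = int(rng.integers(0, width - length + 1))
        out[:, start:start + length] = fill_value

    # Y masks: bands measured along the vertical axis (assumed full width).
    for _ in range(sample_int(num_masks_y, rng)):
        length = min(sample_int(mask_y_length, rng), height)
        start = int(rng.integers(0, height - length + 1))
        out[start:start + length, :] = fill_value

    return out


# Toy "spectrogram" filled with ones so the masked cells are easy to spot.
spectrogram = np.ones((64, 128), dtype=np.float32)
masked = xy_mask_sketch(spectrogram, num_masks_x=(1, 3), mask_x_length=(5, 10),
                        num_masks_y=1, mask_y_length=4, seed=0)
```

Bands may overlap in this sketch, so the number of masked columns can be smaller than the sum of the sampled lengths.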
Parameters
- num_masks_x: int | tuple[int, int] | float | tuple[float, float] (default: (1, 3))
- num_masks_y: int | tuple[int, int] | float | tuple[float, float] (default: (1, 3))
- mask_x_length: int | tuple[int, int] | float | tuple[float, float] (default: (10, 100))
- mask_y_length: int | tuple[int, int] | float | tuple[float, float] (default: (10, 100))
- fill_value: float | Sequence[float] (default: 0)
- mask_fill_value: float | Sequence[float] (default: 0)
- p: float (default: 0.5)
Targets
- Image
- Mask
- Keypoints
- BBoxes
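The image and mask targets receive the same masked regions but can use different fill values. The sketch below illustrates that idea under stated assumptions: `apply_stripes` is a hypothetical helper, the stripe rectangles are hand-picked rather than sampled, and a fill of `None` is taken to mean "leave the array unchanged", following the `mask_fill_value` description. Keypoint and bounding-box handling is not covered.

```python
import numpy as np


def apply_stripes(array, stripes, fill):
    """Fill each (x_min, y_min, x_max, y_max) rectangle; fill=None returns the input as-is."""
    if fill is None:
        return array
    out = array.copy()
    for x_min, y_min, x_max, y_max in stripes:
        out[y_min:y_max, x_min:x_max] = fill
    return out


image = np.full((32, 48, 3), 200, dtype=np.uint8)   # H x W x C image
seg_mask = np.ones((32, 48), dtype=np.uint8)        # H x W segmentation mask

# One column band (X mask) and one row band (Y mask), chosen by hand.
stripes = [(10, 0, 18, 32), (0, 5, 48, 9)]

masked_image = apply_stripes(image, stripes, fill=0)        # plays the role of fill_value
masked_seg = apply_stripes(seg_mask, stripes, fill=255)     # plays the role of mask_fill_value
untouched_seg = apply_stripes(seg_mask, stripes, fill=None)
```

Using a distinct value such as 255 for the mask target is one way to mark occluded pixels so that a loss function can ignore them; whether that suits a given pipeline depends on how the labels are encoded.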