FrequencyMasking

Targets: image, mask, bboxes, keypoints

Image Types: uint8, float32

Masks a spectrogram in the frequency domain, SpecAugment-style. freq_mask_param sets the maximum mask length. Applies a single vertical mask; use XYMasking for more flexibility.

This transform masks a random segment along the frequency axis of a spectrogram, implementing the frequency masking technique proposed in the SpecAugment paper. Frequency masking helps train models that are robust to frequency variations and missing spectral information in audio signals.

This is a specialized version of XYMasking configured for frequency masking only. For more advanced use cases (e.g., multiple masks, time masking, or custom fill values), consider using XYMasking directly.

Arguments

freq_mask_param

int

Maximum possible length of the mask in the frequency domain. Must be a positive integer. Length of the mask is uniformly sampled from (0, freq_mask_param).
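The sampling described above can be sketched in plain NumPy. This is an illustration, not the library's implementation: the helper name `frequency_mask` is hypothetical, and it assumes that frequency bins lie along the rows, that the masked band is filled with zeros, and that both endpoints of the length range are included.

```python
import numpy as np

def frequency_mask(spec, freq_mask_param, rng=None):
    """Zero out one random band of rows (frequency bins) in a 2D spectrogram."""
    if freq_mask_param <= 0:
        raise ValueError("freq_mask_param must be a positive integer")
    rng = np.random.default_rng() if rng is None else rng
    n_freq = spec.shape[0]
    # Band length is drawn uniformly from 0..freq_mask_param (inclusive here,
    # which is an assumption), capped at the number of frequency bins.
    max_len = min(freq_mask_param, n_freq)
    length = int(rng.integers(0, max_len + 1))
    # Band start is drawn so the whole band fits inside the spectrogram.
    start = int(rng.integers(0, n_freq - length + 1))
    out = spec.copy()
    out[start:start + length, :] = 0  # assumed fill value
    return out

spec = np.random.default_rng(0).uniform(0.1, 1.0, size=(128, 256)).astype(np.float32)
masked = frequency_mask(spec, freq_mask_param=30, rng=np.random.default_rng(1))
```

The band spans every time frame, so all information in the selected frequency bins is removed for the whole clip.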

p

float

0.5

Probability of applying the transform. Default: 0.5.

References

SpecAugment paper: https://arxiv.org/abs/1904.08779
Original implementation: https://pytorch.org/audio/stable/transforms.html#freqmask