Speech Processing

Wav2Vec2 on SUPERB

SUPERB
- Speech processing Universal PERformance Benchmark
- Audio classification dataset with multiple tasks
  - We used the Keyword Spotting subset
Wav2Vec2, from Meta AI research
- Transformer-based method

Diagram illustrating a deep learning model processing audio data, including raw waveforms, latent speech representations, quantized representations, transformer modules, and context representations, with arrows indicating data flow and loss functions.

Graph shows validation accuracy on the SUPERB benchmark's keyword spotting dataset
Accuracy increased by 0.2 percentage points
- 12% relative error reduction
- Average gap between the top 5 papers on the leaderboard for this dataset is 0.27 percentage points
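
The link between the absolute gain and the relative error reduction can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the baseline error it prints is merely implied by the two rounded figures quoted above (0.2 and 12%), not a number reported in the slides.

```python
def relative_error_reduction(base_acc_pct, new_acc_pct):
    """Fraction of the baseline error removed by the new model."""
    base_err = 100.0 - base_acc_pct
    new_err = 100.0 - new_acc_pct
    return (base_err - new_err) / base_err


def implied_baseline_error(gain_pp, rel_reduction):
    """Baseline error (in %) implied by an absolute gain and a relative reduction."""
    return gain_pp / rel_reduction


# Figures quoted above; all are rounded, so the results are approximate.
gain_pp = 0.2          # absolute accuracy gain, percentage points
rel_reduction = 0.12   # relative error reduction
avg_top5_gap_pp = 0.27 # average gap between top 5 leaderboard entries

# Derived (assumed, not reported) baseline figures.
base_err = implied_baseline_error(gain_pp, rel_reduction)
base_acc = 100.0 - base_err
new_acc = base_acc + gain_pp

print(f"implied baseline error:    {base_err:.2f}%")
print(f"implied baseline accuracy: {base_acc:.2f}%")
print(f"round-trip reduction:      {relative_error_reduction(base_acc, new_acc):.2%}")
print(f"gain vs. average top-5 gap: {gain_pp / avg_top5_gap_pp:.2f}x")
```

On a benchmark where accuracy is already very high, a small absolute gain removes a comparatively large share of the remaining errors, which is why the two figures differ so much.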

Bar graph showing percent accuracy of Wav2Vec2 on the SUPERB dataset, with two bars labeled 'Original' and 'PAI', indicating that PAI has higher accuracy than the original model.