Speech Processing

Wav2Vec2 on SUPERB

  • SUPERB

    • Speech processing Universal PERformance Benchmark

    • Audio classification dataset with multiple tasks

      • We used the Keyword Spotting subset

  • Wav2Vec2, developed by Meta AI research

    • Transformer-based method

Diagram illustrating a deep learning model processing audio data, including raw waveforms, latent speech representations, quantized representations, transformer modules, and context representations, with arrows indicating data flow and loss functions.
  • Graph shows validation accuracy on the SUPERB benchmark's Keyword Spotting dataset

  • Accuracy increases by 0.2 percentage points

    • 12% relative error reduction

    • Average gap between the top 5 papers on the leaderboard for this dataset is 0.27 percentage points
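
The relationship between the accuracy gain and the error reduction can be checked with a few lines of arithmetic. The sketch below assumes the 0.2 figure is an absolute gain in percentage points and the 12% figure is relative to the baseline error; the baseline accuracy it prints is only implied by those two rounded numbers, not reported on the slide, and the function names are illustrative.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """Fraction of the baseline error removed (accuracies given in percent)."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return (baseline_err - improved_err) / baseline_err


def implied_baseline_error(gain_points, reduction):
    """Baseline error (percent) implied by an absolute gain and a relative reduction."""
    return gain_points / reduction


gain = 0.2             # absolute accuracy gain, percentage points (from the slide)
reduction = 0.12       # relative error reduction (from the slide)
leaderboard_gap = 0.27 # average gap between top 5 papers (from the slide)

# Implied by the two rounded slide figures; an approximation, not a reported result.
baseline_err = implied_baseline_error(gain, reduction)
baseline_acc = 100.0 - baseline_err
improved_acc = baseline_acc + gain

print(f"implied baseline error:    {baseline_err:.2f}%")
print(f"implied baseline accuracy: {baseline_acc:.2f}%")
print(f"check, relative reduction: {relative_error_reduction(baseline_acc, improved_acc):.2%}")
print(f"gain as share of leaderboard gap: {gain / leaderboard_gap:.2f}")
```

Under these assumptions the baseline error is under 2%, which is why a gain of a fraction of a percentage point still corresponds to a double-digit relative error reduction.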

Bar graph showing percent accuracy of Wav2Vec2 on the SUPERB Keyword Spotting dataset, with two bars labeled 'Original' and 'PAI', indicating that PAI has higher accuracy than the original model.