How to use speech-seq2seq/wav2vec2-2-bart-large-no-adapter-frozen-enc with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="speech-seq2seq/wav2vec2-2-bart-large-no-adapter-frozen-enc")
# Load model directly
from transformers import AutoTokenizer, AutoModelForSpeechSeq2Seq
tokenizer = AutoTokenizer.from_pretrained("speech-seq2seq/wav2vec2-2-bart-large-no-adapter-frozen-enc")
model = AutoModelForSpeechSeq2Seq.from_pretrained("speech-seq2seq/wav2vec2-2-bart-large-no-adapter-frozen-enc")
This model was trained from scratch on the librispeech_asr dataset. At the last logged evaluation (step 5000), it reaches a validation loss of 18.7898 and a WER of 1.0 on the evaluation set.
More information needed
The training hyperparameters are not listed. Training results, logged every 500 steps:
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 6.5396 | 0.28 | 500 | 9.0401 | 1.0120 |
| 5.8980 | 0.56 | 1000 | 9.3199 | 1.0000 |
| 4.9595 | 0.84 | 1500 | 8.4434 | 1.4563 |
| 5.7082 | 1.12 | 2000 | 15.1805 | 1.0000 |
| 5.4377 | 1.40 | 2500 | 15.7984 | 1.0021 |
| 5.5941 | 1.68 | 3000 | 18.4928 | 1.0000 |
| 5.0662 | 1.96 | 3500 | 17.4886 | 1.0000 |
| 4.8363 | 2.24 | 4000 | 18.9458 | 1.0000 |
| 4.7908 | 2.52 | 4500 | 18.2794 | 1.0006 |
| 4.6790 | 2.80 | 5000 | 18.7898 | 1.0000 |
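
Several rows in the table report a word error rate at or above 1.0 (for example 1.4563 at step 1500). Assuming the column follows the conventional definition, WER = (substitutions + deletions + insertions) / number of reference words, this is possible because insertions are not bounded by the reference length. The sketch below illustrates that definition with a word-level edit distance. `word_error_rate` is a hypothetical helper written for illustration, not the evaluation code used to produce this table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (an assumption, not this model's evaluation script).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat", "the cat sat"))  # 0.0
print(word_error_rate("the cat sat", ""))             # 1.0 (every word deleted)
print(word_error_rate("hello world", "a b c d"))      # 2.0 (2 substitutions + 2 insertions)
```

Under this definition, a WER of exactly 1.0 is what an empty or entirely wrong transcript of the same length would score, and a value above 1.0 means the hypothesis contains more errors than the reference has words.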