The demo has been updated for the new version of the W2V-BERT model for Ukrainian speech recognition.
This is a classic automatic speech recognition (ASR), or speech-to-text, task.
What's new in version three:
• more data: 1200 hours
• new SentencePiece tokenizer with 512 tokens
• feature extraction is done via a Rust extension
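To give a feel for how a small SentencePiece vocabulary turns model output back into text, here is a minimal sketch. It assumes the model is decoded CTC-style (greedy: collapse repeated frames, drop blanks), which the post does not state. The toy vocabulary, the blank id, and the function name `ctc_greedy_decode` are hypothetical and only for illustration; the real tokenizer has 512 pieces. The one thing taken from SentencePiece itself is the "▁" (U+2581) marker that stands for a word boundary.

```python
# Toy stand-in for the 512-piece SentencePiece vocabulary (hypothetical ids and pieces).
BLANK_ID = 0  # assumption: id 0 is the CTC blank symbol
VOCAB = {
    0: "<blank>",
    1: "▁при",   # "▁" marks the start of a word in SentencePiece
    2: "віт",
    3: "▁світ",
    4: "е",
}


def ctc_greedy_decode(frame_ids, vocab, blank_id=BLANK_ID):
    """Collapse repeated frame predictions, drop blanks, and join the pieces."""
    pieces = []
    prev = None
    for idx in frame_ids:
        # Emit a piece only when the prediction changes and is not the blank.
        if idx != prev and idx != blank_id:
            pieces.append(vocab[idx])
        prev = idx
    # Turn SentencePiece word-boundary markers back into spaces.
    return "".join(pieces).replace("▁", " ").strip()


# Per-frame argmax ids, as an acoustic model might produce them.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3, 4, 0]
text = ctc_greedy_decode(frames, VOCAB)
print(text)  # привіт світе
```

A blank between two identical ids keeps them as separate pieces, which is how repeated tokens survive the collapsing step.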
Facts:
• Training was started from the previous model to speed up learning.
• Training runs on two 3090 GPUs with 24 GB of memory each.
• The model is well suited for fine-tuning because the training data is very diverse and mostly noisy.