Amharic ASR using fine-tuned Wav2vec2 XLSR-53
This is a fine-tuned version of facebook/wav2vec2-large-xlsr-53, trained on the Amharic Speech Corpus produced by Abate et al. (2005) (DOI: 10.21437/Interspeech.2005-467).
The model achieves a word error rate (WER) of 26% and a character error rate (CER) of 7% on the validation set of the Amharic Readspeech data.
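For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are commonly computed: the Levenshtein (edit) distance between the reference and the hypothesis, divided by the length of the reference, counted in words for WER and in characters for CER. This is an illustrative assumption, not necessarily the exact scoring script used for the figures reported for this model. In particular, it counts spaces as characters in CER. Because Ethiopic syllables such as ሰ are single Unicode code points, Python string indexing counts them one per character.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance; substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("the cat sat", "the cat sit")` is 1/3 and `cer("ሰላም", "ሰላ")` is 1/3. When scoring a whole validation set, a common convention is to sum the edits over all utterances and divide by the total reference length, rather than averaging per-utterance rates.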
Usage
The model can be used as follows:
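import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Load the fine-tuned model and its processor (feature extractor + tokenizer)
model = Wav2Vec2ForCTC.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")
processor = Wav2Vec2Processor.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")

# The model expects 16 kHz mono audio; librosa resamples on load
audio, _ = librosa.load("/path/to/audio.wav", sr=16000)
input_values = processor(
    audio.squeeze(),
    sampling_rate=16000,
    return_tensors="pt"
).input_values

model.eval()
with torch.no_grad():
    logits = model(input_values).logits

# Greedy CTC decoding: pick the most likely token per frame, then let the
# tokenizer collapse repeated tokens and remove blank (pad) tokens
preds = logits.argmax(-1)
texts = processor.batch_decode(preds)
print(texts[0])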
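The last lines of the usage example perform greedy (best-path) CTC decoding. `argmax` selects one token ID per audio frame, and `batch_decode` turns those IDs into text by merging consecutive repeats, dropping the CTC blank, and mapping the word delimiter to a space. The pure-Python sketch below only illustrates that collapse step. Its vocabulary is hypothetical: it assumes ID 0 is the blank/pad token and uses `|` as the word delimiter, which is the default for Wav2Vec2 CTC tokenizers. The real mapping comes from this model's tokenizer vocabulary, and the library's own implementation differs in detail.

```python
from itertools import groupby
from typing import Dict, List

# Hypothetical toy vocabulary; the real one is the model tokenizer's vocab.
BLANK_ID = 0
ID_TO_TOKEN: Dict[int, str] = {0: "<pad>", 1: "|", 2: "ሰ", 3: "ላ", 4: "ም"}


def greedy_ctc_decode(frame_ids: List[int]) -> str:
    """Collapse consecutive repeats, drop blanks, and map '|' to a space."""
    # Merge runs of identical IDs (e.g. [2, 2, 0, 3] -> [2, 0, 3])
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Remove the blank token, then look up the remaining characters
    tokens = [ID_TO_TOKEN[i] for i in collapsed if i != BLANK_ID]
    return "".join(tokens).replace("|", " ").strip()
```

For instance, the frame sequence `[2, 2, 0, 3, 4, 4]` decodes to `"ሰላም"`. A blank between two identical IDs (`[2, 0, 2]`) is what lets CTC emit a doubled character (`"ሰሰ"`), whereas `[2, 2]` collapses to a single `"ሰ"`.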
Training
The code to train this model is available at https://github.com/agkphysics/amharic-asr.