---
title: Vosk Arabic Speech-to-Text API
emoji: πŸ—£οΈ
colorFrom: gray
colorTo: green
sdk: docker
app_file: app.py
pinned: false
---

πŸ§β€β™‚οΈ Arabic Tunisian Speech-to-Text API

This Space hosts a lightweight speech recognition API built on `vosk-model-small-ar-tn-0.1-linto`, a model tailored for the Tunisian dialect. Upload an audio file to the FastAPI endpoint and get back a transcription in real time.
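The actual implementation lives in `app.py` in this repository; the snippet below is only a minimal sketch of how such an endpoint can be wired together with FastAPI and Vosk. The model path, the `audio` field name, and the chunk size are assumptions, not the Space's exact code:

```python
import io
import json
import wave

from fastapi import FastAPI, File, UploadFile
from vosk import KaldiRecognizer, Model

app = FastAPI()
# Load the Tunisian Arabic model once at startup (path is an assumption).
model = Model("model/vosk-model-small-ar-tn-0.1-linto")


@app.post("/transcribe/tunisian")
async def transcribe(audio: UploadFile = File(...)):
    # Read the uploaded 16 kHz mono WAV file into memory.
    wav_bytes = await audio.read()
    wf = wave.open(io.BytesIO(wav_bytes), "rb")
    rec = KaldiRecognizer(model, wf.getframerate())

    # Feed the audio to the recognizer in chunks and collect partial results.
    parts = []
    while True:
        chunk = wf.readframes(4000)
        if not chunk:
            break
        if rec.AcceptWaveform(chunk):
            parts.append(json.loads(rec.Result()).get("text", ""))
    parts.append(json.loads(rec.FinalResult()).get("text", ""))

    return {"transcript": " ".join(p for p in parts if p)}
```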


## πŸ“¦ Features

- πŸ—£οΈ Supports Tunisian dialect (not just MSA)
- ⚑ Fast, offline, and CPU-friendly
- 🧠 Uses `vosk-model-small-ar-tn-0.1-linto` (~40 MB)
- πŸ”Œ REST API endpoint for audio transcription
- πŸ§ͺ Easy to test locally or remotely

## 🧠 Model Details

| Model | Description |
|-------|-------------|
| `vosk-model-small-ar-tn-0.1-linto` | Lightweight Tunisian Arabic model by LinTO |
| Size | ~40 MB |
| Type | Kaldi-based, optimized for small CPUs |
| Accuracy | Good for clear speech in Tunisian dialect |
| Input | 16 kHz mono `.wav` files |
| Output | Plain Arabic text (Tunisian dialect) |

βœ… Ideal for offline applications and edge devices.
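Because the model expects 16 kHz mono WAV input, recordings in other formats or sample rates should be converted first. A typical ffmpeg invocation looks like the following (ffmpeg is installed in the Dockerfile below; the filenames here are placeholders):

```bash
# Resample to 16 kHz, downmix to mono, and encode as 16-bit PCM WAV
ffmpeg -i recording.mp3 -ar 16000 -ac 1 -c:a pcm_s16le sample.wav
```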


## πŸš€ Quick Start (API)

### πŸ”§ Endpoint: `POST /transcribe/tunisian`

Send a `.wav` audio file and receive a transcription in Arabic.

βœ… Example cURL:

```bash
curl -X POST http://localhost:7860/transcribe/tunisian \
  -F "audio=@sample.wav"
```

πŸ“€ Example Response:

```json
{
  "transcript": "Ψ΄Ω†ΩŠ Ψ­ΩˆΨ§Ω„Ωƒ Ψ§Ω„ΩŠΩˆΩ…ΨŸ"
}
```

## πŸ§ͺ Local Testing

1. Clone this repository.
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Make sure the model is extracted under `model/` like this:
   ```
   model/
   └── vosk-model-small-ar-tn-0.1-linto
       β”œβ”€β”€ am
       β”œβ”€β”€ conf
       └── etc.
   ```
4. Run locally:
   ```bash
   python app.py
   ```
5. Test the `/transcribe/tunisian` endpoint with a `.wav` file.
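Put together, a typical local session looks roughly like this. The model download URL is assumed from the public Vosk model listing; adjust it if you ship the zip with the repository instead:

```bash
# Install dependencies
pip install -r requirements.txt

# Fetch and extract the model (URL assumed from the Vosk model listing)
wget https://alphacephei.com/vosk/models/vosk-model-small-ar-tn-0.1-linto.zip
unzip vosk-model-small-ar-tn-0.1-linto.zip -d model

# Start the API, then test it from another terminal
python app.py
curl -X POST http://localhost:7860/transcribe/tunisian -F "audio=@sample.wav"
```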

## 🐳 Docker for Hugging Face Spaces

If you use a Docker-based Space, here is a sample `Dockerfile`:

```dockerfile
# Use a minimal base image
FROM python:3.9-slim

# Install unzip and ffmpeg
RUN apt-get update && apt-get install -y unzip ffmpeg && rm -rf /var/lib/apt/lists/*

# Create a non-root user for security
RUN useradd -m user
USER user

# Set environment variables
ENV HOME=/home/user \
    PATH=/home/user/.local/bin:$PATH \
    PORT=7860

# Set the working directory
WORKDIR $HOME/app

# Copy requirements and install dependencies
COPY --chown=user requirements.txt ./
RUN pip install --upgrade pip && \
    pip install -r requirements.txt

# Copy application files and the model zip
COPY --chown=user ./ $HOME/app

# Unzip the model file
RUN unzip vosk-model-small-ar-tn-0.1-linto.zip -d model && rm vosk-model-small-ar-tn-0.1-linto.zip

# Expose the correct port for Hugging Face Spaces
EXPOSE 7860

# Run the FastAPI app with uvicorn directly
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
```
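To try the container locally before pushing it to a Space, a build-and-run sequence along these lines should work (the image name `tunisian-stt` is just an example):

```bash
docker build -t tunisian-stt .
docker run --rm -p 7860:7860 tunisian-stt

# Then, from the host:
curl -X POST http://localhost:7860/transcribe/tunisian -F "audio=@sample.wav"
```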

## 🧾 Example Python Client

```python
import requests

with open("sample.wav", "rb") as audio_file:
    response = requests.post(
        "http://localhost:7860/transcribe/tunisian",
        files={"audio": audio_file},
    )
    print(response.json())
```

πŸ“ File Structure

.
β”œβ”€β”€ app.py                 # FastAPI app with transcription endpoint
β”œβ”€β”€ model/                 # Contains the Vosk model
β”œβ”€β”€ requirements.txt       # Dependencies (FastAPI, Vosk, etc.)
β”œβ”€β”€ sample.wav             # Example audio file
└── Dockerfile             # For deployment

## πŸ›  Dependencies

```
fastapi
uvicorn
vosk
soundfile
numpy
```

πŸ‘©β€πŸ’» Maintainer

Inherited Games Studio πŸ“§ [email protected] πŸ”— github.com/inheritedgames


## πŸ“„ License

MIT License


## 🧠 Credits