Space: Inherited/Tunisian-Speech-rec

Rania Mani committed on May 30

Commit 623d37e (1 parent: 3e5413c)

initial commit


Files changed (7)

.gitignore +1 -0
Dockerfile +34 -0
README.md +193 -5
app.py +43 -0
recognizer_tunisian_vosk.py +28 -0
requirements.txt +7 -0
vosk-model-small-ar-tn-0.1-linto.zip +3 -0

.gitignore ADDED

	@@ -0,0 +1 @@


+.DS_Store

Dockerfile ADDED

	@@ -0,0 +1,34 @@

+# Use a minimal base image
+FROM python:3.9-slim
+# Install unzip (for the model archive) and ffmpeg (used by pydub to decode uploads)
+RUN apt-get update && apt-get install -y unzip ffmpeg && rm -rf /var/lib/apt/lists/*
+# Create a non-root user for security
+RUN useradd -m user
+USER user
+# Set environment variables
+ENV HOME=/home/user \
+    PATH=/home/user/.local/bin:$PATH \
+    PORT=7860
+# Set the working directory
+WORKDIR $HOME/app
+# Copy requirements and install dependencies
+COPY --chown=user requirements.txt ./
+RUN pip install --upgrade pip && \
+    pip install -r requirements.txt
+# Copy application files and the model zip
+COPY --chown=user ./ $HOME/app
+# Unzip the model file
+RUN unzip vosk-model-small-ar-tn-0.1-linto.zip -d model && rm vosk-model-small-ar-tn-0.1-linto.zip
+# Expose the correct port for Hugging Face Spaces
+EXPOSE 7860
+# Run the FastAPI app with uvicorn directly
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
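For running the app outside Docker, the `unzip ... -d model` step above can be reproduced with Python's standard `zipfile` module. This is a minimal sketch, not part of the commit: the helper name `extract_model` and its defaults are hypothetical, and it simply reports the top-level folders it created so you can confirm that `model/vosk-model-small-ar-tn-0.1-linto/` (the default path `RecognizerTunisianVosk` looks for) exists afterwards.

```python
import zipfile
from pathlib import Path
from typing import List

# Assumed defaults mirroring the Dockerfile's `unzip ... -d model` step
MODEL_ZIP = Path("vosk-model-small-ar-tn-0.1-linto.zip")
MODEL_ROOT = Path("model")


def extract_model(zip_path: Path = MODEL_ZIP, dest: Path = MODEL_ROOT) -> List[Path]:
    """Extract the model archive into `dest` and return the top-level entries it created."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # extractall trusts member paths: fine for this repo's own archive,
        # not for untrusted uploads
        zf.extractall(dest)
        # First path component of each member, e.g. "vosk-model-small-ar-tn-0.1-linto"
        top_level = {Path(name).parts[0] for name in zf.namelist() if name.strip("/")}
    return sorted(dest / name for name in top_level)
```

Calling `extract_model()` from the repository root should leave the model folder under `model/`. Unlike the Dockerfile step, it keeps the zip in place.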

README.md CHANGED

@@ -1,10 +1,198 @@
 ---
-title: Tunisian Speech Rec
-emoji: ⚡
-colorFrom: yellow
-colorTo: red
 sdk: docker
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: Vosk Arabic Speech-to-Text API
+emoji: 🗣️
+colorFrom: gray
+colorTo: green
 sdk: docker
+app_file: app.py
 pinned: false
 ---
+# 🧏‍♂️ Arabic Tunisian Speech-to-Text API
+This Space hosts a lightweight speech recognition API built on the `vosk-model-small-ar-tn-0.1-linto` model, which is tailored for the Tunisian dialect. Upload an audio file to the FastAPI endpoint and receive its transcription.
+---
+## 📦 Features
+- 🗣️ Supports **Tunisian dialect** (not just MSA)
+- ⚡ Fast, offline, and CPU-friendly
+- 🧠 Uses `vosk-model-small-ar-tn-0.1-linto` (~166 MB zipped)
+- 🔌 REST API endpoint for audio transcription
+- 🧪 Easy to test locally or remotely
+---
+## 🧠 Model Details
+| Property | Details                                                                  |
+|----------|--------------------------------------------------------------------------|
+| Model    | `vosk-model-small-ar-tn-0.1-linto`, a lightweight Tunisian Arabic model by Linto |
+| Size     | ~166 MB (zipped)                                                         |
+| Type     | Kaldi-based (Vosk), runs on CPU                                          |
+| Accuracy | Good for clear speech in Tunisian dialect                                |
+| Input    | 16 kHz mono `.wav`; the API converts other formats with pydub/ffmpeg     |
+| Output   | Plain Arabic text (Tunisian dialect)                                     |
+> ✅ Ideal for offline applications and edge devices.
+---
+## 🚀 Quick Start (API)
+### 🔧 Endpoint: `POST /transcribe/tunisian`
+Send an audio file (e.g. `.wav`) in the multipart form field `file` and receive its transcription in Arabic.
+#### ✅ Example CURL:
+```bash
+curl -X POST http://localhost:7860/transcribe/tunisian \
+  -F "file=@sample.wav"
+```
+#### 📤 Example Response (plain text, not JSON):
+```text
+شني حوالك اليوم؟
+```
+---
+## 🧪 Local Testing
+1. Clone this repository.
+2. Install dependencies (pydub decodes uploads with `ffmpeg`, so install it on your system as well):
+```bash
+pip install -r requirements.txt
+```
+3. Extract the model archive under `model/` (for example, `unzip vosk-model-small-ar-tn-0.1-linto.zip -d model`) so that it looks like this:
+```
+model/
+└── vosk-model-small-ar-tn-0.1-linto
+    ├── am
+    ├── conf
+    └── ... (remaining model files)
+```
+4. Run locally:
+```bash
+uvicorn app:app --host 0.0.0.0 --port 7860
+```
+5. Test the `/transcribe/tunisian` endpoint with a `.wav` file.
+---
+## 🐳 Docker for Hugging Face Spaces
+If you use a Docker-based Space, here’s the sample Dockerfile:
+```dockerfile
+# Use a minimal base image
+FROM python:3.9-slim
+# Install unzip (for the model archive) and ffmpeg (used by pydub to decode uploads)
+RUN apt-get update && apt-get install -y unzip ffmpeg && rm -rf /var/lib/apt/lists/*
+# Create a non-root user for security
+RUN useradd -m user
+USER user
+# Set environment variables
+ENV HOME=/home/user \
+    PATH=/home/user/.local/bin:$PATH \
+    PORT=7860
+# Set the working directory
+WORKDIR $HOME/app
+# Copy requirements and install dependencies
+COPY --chown=user requirements.txt ./
+RUN pip install --upgrade pip && \
+    pip install -r requirements.txt
+# Copy application files and the model zip
+COPY --chown=user ./ $HOME/app
+# Unzip the model file
+RUN unzip vosk-model-small-ar-tn-0.1-linto.zip -d model && rm vosk-model-small-ar-tn-0.1-linto.zip
+# Expose the correct port for Hugging Face Spaces
+EXPOSE 7860
+# Run the FastAPI app with uvicorn directly
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
+```
+---
+## 🧾 Example Python Client
+```python
+import requests
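+with open("sample.wav", "rb") as audio_file:
+    response = requests.post(
+        "http://localhost:7860/transcribe/tunisian",
+        # The form field must be named "file" to match the endpoint's parameter
+        files={"file": ("sample.wav", audio_file, "audio/wav")},
+    )
+response.raise_for_status()
+# The endpoint returns plain text (PlainTextResponse), not JSON
+print(response.text)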
+```
+---
+## 📁 File Structure
+```
+.
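+├── app.py                                # FastAPI app with the transcription endpoint
+├── recognizer_tunisian_vosk.py           # Vosk recognizer wrapper used by app.py
+├── model/                                # Vosk model, extracted from the zip at build time
+├── requirements.txt                      # Dependencies (FastAPI, Vosk, pydub, etc.)
+├── vosk-model-small-ar-tn-0.1-linto.zip  # Model archive (stored with Git LFS)
+├── sample.wav                            # Example audio file (not included; supply your own)
+└── Dockerfile                            # For deployment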
+```
+---
+## 🛠 Dependencies
+```txt
+fastapi
+uvicorn
+pydub
+python-multipart
+vosk
+```
+---
+## 👩‍💻 Maintainer
+**Inherited Games Studio**
+📧 [[email protected]](mailto:[email protected])
+🔗 [github.com/inheritedgames](https://github.com/inheritedgames)
+🔗 [github.com/RAMA012001](https://github.com/RAMA012001)
+---
+## 📄 License
+MIT License
+---
+## 🧠 Credits
+* Model: [`vosk-model-small-ar-tn-0.1-linto`](https://alphacephei.com/vosk/models)
+* Framework: [FastAPI](https://fastapi.tiangolo.com/)
+* Hosting: [Hugging Face Spaces](https://huggingface.co/spaces)
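The README's examples use a `sample.wav` that is not part of this commit. A minimal sketch for producing a stand-in file with only the standard library: a sine tone written as mono, 16-bit, uncompressed PCM at 16 kHz, which satisfies the format check in `RecognizerTunisianVosk.transcribe`. The function name `write_test_wav` and its defaults are illustrative assumptions.

```python
import math
import struct
import wave


def write_test_wav(path: str, seconds: float = 1.0, freq_hz: float = 440.0,
                   sample_rate: int = 16000, amplitude: float = 0.3) -> None:
    """Write a mono, 16-bit PCM WAV file containing a sine tone."""
    n_frames = int(seconds * sample_rate)
    peak = int(amplitude * 32767)
    # Little-endian signed 16-bit samples ("<h"), as PCM WAV requires
    samples = (
        struct.pack("<h", int(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(b"".join(samples))
```

A pure tone contains no speech, so expect an empty or meaningless transcript; the file only exercises the upload, conversion, and recognition path (e.g. `write_test_wav("sample.wav")`, then run the curl command above).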

app.py ADDED

	@@ -0,0 +1,43 @@

+import os
+import shutil
+import tempfile
+from fastapi import FastAPI, UploadFile, File, HTTPException
+from fastapi.responses import PlainTextResponse
+from fastapi.middleware.gzip import GZipMiddleware
+from pydub import AudioSegment
+from recognizer_tunisian_vosk import RecognizerTunisianVosk
+app = FastAPI()
+app.add_middleware(GZipMiddleware, minimum_size=1000)
+# Load the Vosk model once at startup; it is shared by all requests
+vosk_recognizer = RecognizerTunisianVosk()
+@app.get("/")
+def read_root():
+    return {"message": "Audio Transcription API is running."}
+# A plain `def` endpoint runs in FastAPI's threadpool, so the blocking
+# decoding and recognition below do not stall the event loop
+@app.post("/transcribe/tunisian", response_class=PlainTextResponse)
+def transcribe_tunisian(file: UploadFile = File(...)):
+    # One temporary directory per request: concurrent uploads cannot overwrite
+    # each other's files, and the directory is deleted when the block exits
+    with tempfile.TemporaryDirectory() as tmp_dir:
+        raw_path = os.path.join(tmp_dir, "input")
+        wav_path = os.path.join(tmp_dir, "input.wav")
+        try:
+            # Save the uploaded file as-is
+            with open(raw_path, "wb") as buffer:
+                shutil.copyfileobj(file.file, buffer)
+            # Convert to what the recognizer accepts: mono, 16 kHz, 16-bit PCM WAV
+            # (pydub decodes the upload with ffmpeg)
+            audio = AudioSegment.from_file(raw_path)
+            audio = audio.set_channels(1).set_frame_rate(16000).set_sample_width(2)
+            audio.export(wav_path, format="wav")
+            # Transcribe
+            return vosk_recognizer.transcribe(wav_path)
+        except Exception as e:
+            raise HTTPException(status_code=400, detail=f"Audio processing error: {e}") from e
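A sketch of the request this endpoint expects, using only the standard library: the upload must be a multipart form field named `file` (the name of the `UploadFile` parameter), and the reply is plain text because of `PlainTextResponse`. The helpers `build_multipart` and `transcribe_file` are hypothetical, and the default URL assumes the local port 7860 used in the Dockerfile.

```python
import os
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "audio/wav", boundary: str = "") -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; returns (body, Content-Type header)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path: str, url: str = "http://localhost:7860/transcribe/tunisian") -> str:
    """POST an audio file to the endpoint and return the plain-text transcript."""
    with open(path, "rb") as f:
        # Field name "file" matches `file: UploadFile = File(...)` in app.py
        body, ctype = build_multipart("file", os.path.basename(path), f.read())
    req = urllib.request.Request(url, data=body, headers={"Content-Type": ctype}, method="POST")
    with urllib.request.urlopen(req) as resp:
        # PlainTextResponse: the body is the transcript itself, not JSON
        return resp.read().decode("utf-8")
```

`transcribe_file("sample.wav")` returns the transcript string; a 400 from the endpoint surfaces as `urllib.error.HTTPError`.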

recognizer_tunisian_vosk.py ADDED

	@@ -0,0 +1,28 @@

+import wave
+import json
+from vosk import Model as VoskModel, KaldiRecognizer as VoskRecognizer
+import os
+class RecognizerTunisianVosk:
+    def __init__(self, recognizer_name: str = "vosk", vosk_model_dir: str = "model/vosk-model-small-ar-tn-0.1-linto"):
+        self.recognizer_name = recognizer_name
+        self.vosk_model_dir = vosk_model_dir
+        if not os.path.exists(self.vosk_model_dir):
+            raise ValueError(f"Vosk model directory '{self.vosk_model_dir}' does not exist.")
+        self.vosk_model = VoskModel(self.vosk_model_dir)
+    def transcribe(self, audio_path: str) -> str:
+        """
+        Transcribe speech from an audio file.
+        :param audio_path: Path to the WAV file.
+        :return: Transcribed text.
+        """
+        with wave.open(audio_path, "rb") as wf:
+            if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
+                raise ValueError("Audio file must be a mono, 16-bit, uncompressed PCM WAV file.")
+            recognizer = VoskRecognizer(self.vosk_model, wf.getframerate())
+            recognizer.AcceptWaveform(wf.readframes(wf.getnframes()))
+            result = recognizer.FinalResult()
+            return json.loads(result)["text"]
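`transcribe` reads the whole file into memory and calls `AcceptWaveform` once. For longer recordings, a common alternative, and the pattern Vosk's own examples follow, is to feed fixed-size chunks, collect `Result()` whenever `AcceptWaveform` returns `True` (an utterance ended), and append `FinalResult()` at the end. A minimal sketch, assuming any object exposing those three `KaldiRecognizer` methods; `transcribe_in_chunks` and the 4000-frame chunk size are illustrative choices, not part of the commit.

```python
import json
import wave
from typing import List, Protocol


class WaveformRecognizer(Protocol):
    """The KaldiRecognizer methods this sketch relies on."""
    def AcceptWaveform(self, data: bytes) -> bool: ...
    def Result(self) -> str: ...
    def FinalResult(self) -> str: ...


def transcribe_in_chunks(wf: wave.Wave_read, recognizer: WaveformRecognizer,
                         chunk_frames: int = 4000) -> str:
    """Feed audio in fixed-size chunks and join the recognized segments."""
    parts: List[str] = []
    while True:
        data = wf.readframes(chunk_frames)
        if not data:  # readframes returns b"" at end of file
            break
        if recognizer.AcceptWaveform(data):
            # An utterance ended: read its text now, before feeding more audio
            parts.append(json.loads(recognizer.Result()).get("text", ""))
    # Whatever is left after the last endpoint
    parts.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in parts if p)
```

Inside `transcribe`, this could replace the single `readframes(wf.getnframes())` call with `return transcribe_in_chunks(wf, recognizer)`, keeping memory use bounded by the chunk size.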

requirements.txt ADDED

	@@ -0,0 +1,7 @@

+fastapi
+uvicorn
+# torch
+# torchaudio
+pydub
+python-multipart
+vosk

vosk-model-small-ar-tn-0.1-linto.zip ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e8213c0a2d281b3075108ad3ad98786263ba8ce6f5fd9552f7372de9431b071f
+size 165683177