Rania Mani committed on
Commit 623d37e · 1 Parent(s): 3e5413c

initial commit

.gitignore ADDED
@@ -0,0 +1 @@
+ .DS_Store
Dockerfile ADDED
@@ -0,0 +1,34 @@
+ # Use a minimal base image
+ FROM python:3.9-slim
+
+ # Install unzip and ffmpeg
+ RUN apt-get update && apt-get install -y unzip ffmpeg && rm -rf /var/lib/apt/lists/*
+
+ # Create a non-root user for security
+ RUN useradd -m user
+ USER user
+
+ # Set environment variables
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH \
+     PORT=7860
+
+ # Set the working directory
+ WORKDIR $HOME/app
+
+ # Copy requirements and install dependencies
+ COPY --chown=user requirements.txt ./
+ RUN pip install --upgrade pip && \
+     pip install -r requirements.txt
+
+ # Copy application files and the model zip
+ COPY --chown=user ./ $HOME/app
+
+ # Unzip the model file
+ RUN unzip vosk-model-small-ar-tn-0.1-linto.zip -d model && rm vosk-model-small-ar-tn-0.1-linto.zip
+
+ # Expose the port used by Hugging Face Spaces
+ EXPOSE 7860
+
+ # Run the FastAPI app with uvicorn
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,10 +1,198 @@
  ---
- title: Tunisian Speech Rec
- emoji:
- colorFrom: yellow
- colorTo: red
  sdk: docker
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: Vosk Arabic Speech-to-Text API
+ emoji: 🗣️
+ colorFrom: gray
+ colorTo: green
  sdk: docker
+ app_file: app.py
  pinned: false
  ---

+ # 🧏‍♂️ Arabic Tunisian Speech-to-Text API
+
+ This Space hosts a lightweight speech recognition API built on `vosk-model-small-ar-tn-0.1-linto`, a model tailored for the Tunisian dialect. Upload an audio file to the FastAPI endpoint and receive a transcription.
+
+ ---
+
+ ## 📦 Features
+
+ - 🗣️ Supports **Tunisian dialect** (not just MSA)
+ - ⚡ Fast, offline, and CPU-friendly
+ - 🧠 Uses `vosk-model-small-ar-tn-0.1-linto` (~40MB)
+ - 🔌 REST API endpoint for audio transcription
+ - 🧪 Easy to test locally or remotely
+
+ ---
+
+ ## 🧠 Model Details
+
+ | Model | Description |
+ |--------------------------|-----------------------------------------------|
+ | `vosk-model-small-ar-tn` | Lightweight Tunisian Arabic model by Linto |
+ | Size | ~40MB |
+ | Type | Kaldi-based, optimized for small CPUs |
+ | Accuracy | Good for clear speech in Tunisian dialect |
+ | Input | 16 kHz mono `.wav` files |
+ | Output | Plain Arabic text (Tunisian dialect) |
+
+ > ✅ Ideal for offline applications and edge devices.
+
+ ---
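The input format in the table above (16 kHz, mono, 16-bit uncompressed PCM) is exactly what the recognizer checks before transcribing. A minimal sketch for verifying a file locally with Python's standard `wave` module — the helper names here are illustrative, not part of this repo:

```python
import wave

def wav_properties(path):
    """Read the WAV header fields that the recognizer validates."""
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_width": wf.getsampwidth(),   # bytes per sample; 2 = 16-bit
            "frame_rate": wf.getframerate(),
            "compression": wf.getcomptype(),     # "NONE" = uncompressed PCM
        }

def is_model_ready(path):
    """True if the file already matches the model's expected input format."""
    p = wav_properties(path)
    return (p["channels"] == 1 and p["sample_width"] == 2
            and p["frame_rate"] == 16000 and p["compression"] == "NONE")
```

Files that fail this check can be converted first (e.g. with ffmpeg or pydub), which is what the API itself does server-side.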
+
+ ## 🚀 Quick Start (API)
+
+ ### 🔧 Endpoint: `POST /transcribe/tunisian`
+
+ Send a `.wav` audio file and receive a transcription in Arabic.
+
+ #### ✅ Example CURL:
+
+ ```bash
+ curl -X POST http://localhost:7860/transcribe/tunisian \
+      -F "file=@sample.wav"
+ ```
+
+ #### 📤 Example Response (plain text):
+
+ ```text
+ شني حوالك اليوم؟
+ ```
+
+ ---
+
+ ## 🧪 Local Testing
+
+ 1. Clone this repository.
+ 2. Install dependencies:
+
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 3. Make sure the model is extracted under `model/` like this:
+
+    ```
+    model/
+    └── vosk-model-small-ar-tn-0.1-linto
+        ├── am
+        ├── conf
+        └── etc.
+    ```
+
+ 4. Run the server locally:
+
+    ```bash
+    uvicorn app:app --host 0.0.0.0 --port 7860
+    ```
+
+ 5. Test the `/transcribe/tunisian` endpoint with a `.wav` file.
+
+ ---
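A test clip for step 5 can be synthesized without any recordings, using only the standard library. A sketch (the helper and file name are my own choices; a pure sine tone will usually transcribe to an empty string, but it exercises the full upload/convert/transcribe path):

```python
import math
import struct
import wave

def write_test_tone(path, seconds=1.0, freq=440.0):
    """Write a 16 kHz mono 16-bit PCM sine tone -- a valid input for the API."""
    rate = 16000
    n = int(rate * seconds)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 16-bit samples
        wf.setframerate(rate)   # 16 kHz
        frames = bytearray()
        for i in range(n):
            sample = int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / rate))
            frames += struct.pack("<h", sample)  # little-endian signed 16-bit
        wf.writeframes(bytes(frames))

write_test_tone("test_tone.wav")
```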
+
+ ## 🐳 Docker for Hugging Face Spaces
+
+ If you use a Docker-based Space, here’s the sample Dockerfile:
+
+ ```dockerfile
+ # Use a minimal base image
+ FROM python:3.9-slim
+
+ # Install unzip and ffmpeg
+ RUN apt-get update && apt-get install -y unzip ffmpeg && rm -rf /var/lib/apt/lists/*
+
+ # Create a non-root user for security
+ RUN useradd -m user
+ USER user
+
+ # Set environment variables
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH \
+     PORT=7860
+
+ # Set the working directory
+ WORKDIR $HOME/app
+
+ # Copy requirements and install dependencies
+ COPY --chown=user requirements.txt ./
+ RUN pip install --upgrade pip && \
+     pip install -r requirements.txt
+
+ # Copy application files and the model zip
+ COPY --chown=user ./ $HOME/app
+
+ # Unzip the model file
+ RUN unzip vosk-model-small-ar-tn-0.1-linto.zip -d model && rm vosk-model-small-ar-tn-0.1-linto.zip
+
+ # Expose the port used by Hugging Face Spaces
+ EXPOSE 7860
+
+ # Run the FastAPI app with uvicorn
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
+ ```
+
+ ---
+
+ ## 🧾 Example Python Client
+
+ ```python
+ import requests
+
+ with open("sample.wav", "rb") as audio_file:
+     response = requests.post(
+         "http://localhost:7860/transcribe/tunisian",
+         files={"file": audio_file}
+     )
+ print(response.text)
+ ```
+
+ ---
+
+ ## 📁 File Structure
+
+ ```
+ .
+ ├── app.py                                  # FastAPI app with transcription endpoint
+ ├── recognizer_tunisian_vosk.py             # Vosk recognizer wrapper
+ ├── model/                                  # Contains the Vosk model
+ ├── requirements.txt                        # Dependencies (FastAPI, Vosk, etc.)
+ ├── sample.wav                              # Example audio file
+ ├── vosk-model-small-ar-tn-0.1-linto.zip    # Model archive (Git LFS)
+ └── Dockerfile                              # For deployment
+ ```
+
+ ---
+
+ ## 🛠 Dependencies
+
+ ```txt
+ fastapi
+ uvicorn
+ pydub
+ python-multipart
+ vosk
+ ```
+
+ ---
+
+ ## 👩‍💻 Maintainer
+
+ **Inherited Games Studio**
+ 🔗 [github.com/inheritedgames](https://github.com/inheritedgames)
+ 🔗 [github.com/RAMA012001](https://github.com/RAMA012001)
+
+ ---
+
+ ## 📄 License
+
+ MIT License
+
+ ---
+
+ ## 🧠 Credits
+
+ * Model: [`vosk-model-small-ar-tn-0.1-linto`](https://alphacephei.com/vosk/models)
+ * Framework: [FastAPI](https://fastapi.tiangolo.com/)
+ * Hosting: [Hugging Face Spaces](https://huggingface.co/spaces)
app.py ADDED
@@ -0,0 +1,43 @@
+ from fastapi import FastAPI, UploadFile, File, HTTPException
+ from fastapi.responses import PlainTextResponse
+ from fastapi.middleware.gzip import GZipMiddleware
+ from recognizer_tunisian_vosk import RecognizerTunisianVosk
+ from pydub import AudioSegment
+ import shutil
+ import os
+
+ app = FastAPI()
+ app.add_middleware(GZipMiddleware, minimum_size=1000)
+
+ vosk_recognizer = RecognizerTunisianVosk()
+
+ TEMP_RAW = "temp_input"
+ TEMP_WAV = "temp.wav"
+
+ @app.get("/")
+ def read_root():
+     return {"message": "Audio Transcription API is running."}
+
+ @app.post("/transcribe/tunisian", response_class=PlainTextResponse)
+ async def transcribe_tunisian(file: UploadFile = File(...)):
+     try:
+         # Save the uploaded file to disk
+         with open(TEMP_RAW, "wb") as buffer:
+             shutil.copyfileobj(file.file, buffer)
+
+         # Convert to 16 kHz mono WAV using pydub
+         audio = AudioSegment.from_file(TEMP_RAW)
+         audio = audio.set_channels(1).set_frame_rate(16000)
+         audio.export(TEMP_WAV, format="wav")
+
+         # Transcribe
+         text = vosk_recognizer.transcribe(TEMP_WAV)
+         return text
+
+     except Exception as e:
+         raise HTTPException(status_code=400, detail=f"Audio processing error: {str(e)}")
+
+     finally:
+         for path in [TEMP_RAW, TEMP_WAV]:
+             if os.path.exists(path):
+                 os.remove(path)
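Note that the handler above writes every upload to the shared `TEMP_RAW`/`TEMP_WAV` paths, so two concurrent requests could overwrite each other's files. One way to avoid that is a unique per-request temporary file via the standard `tempfile` module; a sketch (the helper `save_upload_to_temp` is illustrative, not part of this commit):

```python
import os
import shutil
import tempfile

def save_upload_to_temp(fileobj):
    """Copy a file-like object to a unique temp file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".upload")
    try:
        with os.fdopen(fd, "wb") as out:
            shutil.copyfileobj(fileobj, out)
        return path
    except Exception:
        # Clean up the temp file if the copy fails
        os.remove(path)
        raise
```

The endpoint could call this instead of opening `TEMP_RAW`, and remove the returned path in its `finally` block.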
recognizer_tunisian_vosk.py ADDED
@@ -0,0 +1,28 @@
+ import wave
+ import json
+ from vosk import Model as VoskModel, KaldiRecognizer as VoskRecognizer
+ import os
+
+ class RecognizerTunisianVosk:
+     def __init__(self, recognizer_name: str = "vosk", vosk_model_dir: str = "model/vosk-model-small-ar-tn-0.1-linto"):
+         self.recognizer_name = recognizer_name
+         self.vosk_model_dir = vosk_model_dir
+         if not os.path.exists(self.vosk_model_dir):
+             raise ValueError(f"Vosk model directory '{self.vosk_model_dir}' does not exist.")
+         self.vosk_model = VoskModel(self.vosk_model_dir)
+
+     def transcribe(self, audio_path: str) -> str:
+         """
+         Transcribe speech from an audio file.
+
+         :param audio_path: Path to the WAV file.
+         :return: Transcribed text.
+         """
+         with wave.open(audio_path, "rb") as wf:
+             if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
+                 raise ValueError("Audio file must be WAV format mono PCM (16-bit, mono, uncompressed).")
+
+             recognizer = VoskRecognizer(self.vosk_model, wf.getframerate())
+             recognizer.AcceptWaveform(wf.readframes(wf.getnframes()))
+             result = recognizer.FinalResult()
+             return json.loads(result)["text"]
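`FinalResult()` returns a JSON string, which is why the last line above goes through `json.loads`. The parsing step in isolation, as a standalone sketch (the helper `extract_text` is illustrative, not part of this commit):

```python
import json

def extract_text(final_result):
    """Vosk's FinalResult() yields a JSON string such as '{"text": "..."}'."""
    # .get with a default is slightly more defensive than indexing with ["text"]
    return json.loads(final_result).get("text", "")
```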
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ fastapi
+ uvicorn
+ # torch
+ # torchaudio
+ pydub
+ python-multipart
+ vosk
vosk-model-small-ar-tn-0.1-linto.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8213c0a2d281b3075108ad3ad98786263ba8ce6f5fd9552f7372de9431b071f
+ size 165683177