---
title: SmartScribe
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Transcription, Summarization & Translation
---

<div align="center">

# SmartScribe

[](https://www.python.org/downloads/)
[](https://openai.com/research/whisper)
[](https://github.com/guillaumekln/faster-whisper)

**AI-Powered Audio Transcription, Meeting Minutes Generation, and Multi-Language Translation**

</div>

<div align="center">
<h2>📋 Table of Contents</h2>
<table>
<tr>
<td><a href="#features">✨ Features</a></td>
<td><a href="#supported-models">🤖 Supported Models</a></td>
<td><a href="#requirements">📦 Requirements</a></td>
<td><a href="#installation">🔧 Installation</a></td>
</tr>
<tr>
<td><a href="#configuration">⚙️ Configuration</a></td>
<td><a href="#usage">🎮 Usage</a></td>
<td><a href="#architecture">🏗️ Architecture</a></td>
<td><a href="#troubleshooting">🐛 Troubleshooting</a></td>
</tr>
</table>
</div>

---

## ✨ Features

<div style="display: grid; grid-template-columns: 1fr 1fr; gap: 20px;">

<div>

### 🎙️ Audio/Video Transcription
- Convert YouTube links or local audio/video files to text
- Support for multiple audio formats (MP3, WAV, M4A, etc.)
- GPU-accelerated transcription using Faster-Whisper
- Timestamped transcription output

### 🌍 Multi-Language Translation
- Translate transcriptions into any supported language
- Language validation using pycountry
- Clean, paragraph-formatted output
- Preserves original meaning and tone

### 🖥️ Interactive Web UI
- Beautiful Gradio interface
- Drag-and-drop file upload
- YouTube link support
- Side-by-side input and output panels
- Model selection dropdown
- Real-time streaming responses

</div>

<div>

### 📝 Minutes of Meeting Generation
- Automatically generate structured MOM documents
- Professional summary with participants and date
- Actionable items with clear ownership and deadlines
- Markdown-formatted output

### 🤖 Multi-Model Support
- LLAMA 3.2 3B Instruct
- PHI 4 Mini Instruct
- Qwen 3 4B Instruct
- DeepSeek R1 Distill Qwen 1.5B
- Google Gemma 3 4B IT

### ⚡ Performance Optimization
- 4-bit quantization for efficient inference
- GPU acceleration support
- Memory-efficient model loading
- Garbage collection and cache clearing

</div>

</div>

---
## 🤖 Supported Models

## 📦 Requirements

- **FFmpeg** for audio processing

### Python Dependencies

```
gradio>=4.0.0
torch>=2.0.0
huggingface-hub>=0.16.0
```

## 🔧 Local Installation

### 1. Create Virtual Environment

```bash
python -m venv venv
source venv/bin/activate   # On macOS/Linux
venv\Scripts\activate      # On Windows
```

### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

### 3. Setup HuggingFace Token

Create a `.env` file in the project root:

```env
HF_TOKEN=your_huggingface_token_here
```
Get your token from [HuggingFace Settings](https://huggingface.co/settings/tokens)

### 4. Setup YouTube Cookies (Optional)

For YouTube link support, set environment variable or create `cookies.txt`:

```bash
export YOUTUBE_COOKIES="your_cookies_content"
```

Or create `cookies.txt` with Netscape HTTP Cookie format.

---

## ⚙️ Configuration

### Model Selection

Edit model paths in `app.py`:

```python
LLAMA = "meta-llama/Llama-3.2-3B-Instruct"
QWEN = "Qwen/Qwen3-4B-Instruct-2507"
Gemma = 'google/gemma-3-4b-it'
```

### Quantization Configuration

```python
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    ...
)
```
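
The arguments elided above typically include the compute dtype and quantization type. A representative 4-bit setup might look like the following config fragment; the parameter values here are illustrative, not necessarily the app's exact settings:

```python
import torch
from transformers import BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit precision
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls at inference
)
```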

### Server Configuration

```python
ui.launch(server_name="0.0.0.0", server_port=7860)
```

---

## 🚀 Deployment

SmartScribe is deployed and available at: [https://huggingface.co/spaces/itsasutosha/SmartScribe](https://huggingface.co/spaces/itsasutosha/SmartScribe)

5. Add secrets in Space settings:
   - `HF_TOKEN`: Your HuggingFace token
   - `YOUTUBE_COOKIES`: (Optional) YouTube authentication cookies

6. Space will automatically build and deploy

---

## 🎮 Usage

### Quick Start - Live Demo

#### 🌐 Try Online

Visit the live application at: **[SmartScribe on HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)**

No installation required! Just upload your audio/video or paste a YouTube link.

#### 1. Launch Application (Local Setup)

```bash
python app.py
```

The application will start at `http://0.0.0.0:7860`

### Programmatic Usage

#### Transcribe Audio

```python
from app import transcription_whisper

segments = ...
for seg in segments:
    ...
```
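
The segments can then be rendered into timestamped lines. A minimal sketch, assuming each segment exposes `start`, `end` (seconds), and `text` as faster-whisper's `Segment` objects do; the `Segment` dataclass below is a stand-in for demonstration, and `transcription_whisper`'s exact return type may differ:

```python
# Sketch: format Whisper-style segments into timestamped lines.
from dataclasses import dataclass

@dataclass
class Segment:          # stand-in for faster-whisper's Segment
    start: float
    end: float
    text: str

def format_timestamp(seconds: float) -> str:
    """Render a duration in seconds as HH:MM:SS."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

def format_segments(segments) -> str:
    return "\n".join(
        f"[{format_timestamp(seg.start)} -> {format_timestamp(seg.end)}] {seg.text.strip()}"
        for seg in segments
    )

demo = [Segment(0.0, 4.5, " Welcome to the meeting."),
        Segment(4.5, 9.0, " First item: the roadmap.")]
print(format_segments(demo))
```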

#### Generate Minutes of Meeting

```python
from app import optimize

for chunk in optimize("LLAMA", "audio.mp3"):
    ...
```

#### Translate Transcription

```python
from app import optimize_translate

for chunk in optimize_translate("LLAMA", "audio.mp3", "Spanish"):
    ...
```

---

## 🏗️ Architecture

### Component Overview

```
┌────────────────────────────────────────────────────────┐
│            Gradio Web Interface (UI Layer)             │
├────────────────────────────────────────────────────────┤
│                                                        │
│  ┌────────────────────┐        ┌────────────────┐      │
│  │ Audio/Video Input  │        │  Model Select  │      │
│  └────────────────────┘        └────────────────┘      │
│                                                        │
│  ┌────────────────────────────────────────────────┐    │
│  │  Transcription | MOM | Translation Output      │    │
│  └────────────────────────────────────────────────┘    │
├────────────────────────────────────────────────────────┤
│             Multi-Module Processing Layer              │
├──────────────────┬──────────────────┬──────────────────┤
│  Transcription   │  MOM Generation  │   Translation    │
│     Module       │     Module       │     Module       │
│  ────────────    │  ──────────────  │  ────────────    │
│  • Download      │  • System Prompt │  • Language      │
│  • Convert       │  • User Prompt   │    Validation    │
│  • Transcribe    │  • Generation    │  • Extraction    │
│                  │                  │  • Translation   │
├──────────────────┴──────────────────┴──────────────────┤
│                 LLM Integration Layer                  │
├────────────────────────────────────────────────────────┤
│                                                        │
│          LLAMA | PHI | QWEN | DEEPSEEK | Gemma         │
│     (with 4-bit Quantization & GPU Acceleration)       │
│                                                        │
└────────────────────────────────────────────────────────┘
```
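
The layering above can be sketched as a streaming pipeline: input flows through a processing module, which prompts the LLM layer, and chunks stream back to the UI. Everything here is illustrative; the real modules live in `app.py`:

```python
# Toy sketch of the layered flow: input -> processing module -> LLM layer,
# streaming chunks back to the UI. All names are illustrative.
from typing import Iterator

def fake_llm(prompt: str) -> Iterator[str]:
    """Stand-in for the quantized LLM layer: stream the reply in chunks."""
    reply = f"MOM for: {prompt}"
    for i in range(0, len(reply), 8):
        yield reply[i:i + 8]

def pipeline(audio_path: str) -> Iterator[str]:
    transcript = f"<transcript of {audio_path}>"   # transcription module
    prompt = f"Summarise: {transcript}"            # MOM generation module
    yield from fake_llm(prompt)                    # LLM integration layer

print("".join(pipeline("meeting.mp3")))
```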
### Key Functions
---

## 🐛 Troubleshooting

### Issue: YouTube download fails

**Solution**: Update YouTube cookies or use direct file upload

```bash
export YOUTUBE_COOKIES="your_updated_cookies"
# or use direct file upload instead
```

### Issue: CUDA out of memory

**Solution**: Reduce model size or use CPU inference

```python
device = "cpu"  # Force CPU usage
```

### Issue: HuggingFace authentication failed

**Solution**: Verify `HF_TOKEN` in the `.env` file, or log in interactively:

```bash
huggingface-cli login  # Interactive login
```

### Issue: Transcription is slow

**Solution**: Ensure CUDA is properly configured

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
```

### Issue: Language validation fails

**Solution**: Use the full language name or an ISO code

```python
# Valid formats:
valid_language("English")  # Full name
valid_language("eng")      # ISO 639-3 code
```

### Issue: Memory issues with large files

**Solution**: Reduce chunk size or break audio into segments

```python
# Process smaller chunks
segment_duration = 300  # 5 minutes per segment
```
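
Computing the segment boundaries from that duration is simple arithmetic. A sketch follows; the chunking strategy is an assumption, and the app's own splitting logic may differ:

```python
# Sketch: split a long recording into fixed-length windows so each
# chunk can be transcribed independently.
def split_into_segments(total_seconds: float, segment_duration: int = 300):
    """Return (start, end) pairs covering the full duration."""
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + segment_duration, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds

print(split_into_segments(750))  # → [(0.0, 300.0), (300.0, 600.0), (600.0, 750.0)]
```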

### Issue: Generated MOM missing action items

**Solution**: Try a different model or update the system prompt
- Larger models typically produce better structured output
- QWEN is faster and generally reliable

---

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file.

---

## 🎓 Citation

If you use SmartScribe in your project, please cite:

```bibtex
@software{smartscribe2025,
  author = {Asutosha Nanda},
  title  = {SmartScribe},
  year   = {2025},
}
```

---

<div align="center">

**[⬆ Back to Top](#smartscribe)**

**Intelligent Audio Transcription & Meeting Documentation**
Powered by Advanced LLMs and Faster-Whisper

Deployed on [HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)

</div>