Update README.md
README.md (changed)

---
title: SmartScribe
emoji: 🎙️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---

# SmartScribe

[Python](https://www.python.org/downloads/)
[OpenAI Whisper](https://openai.com/research/whisper)
[Faster-Whisper](https://github.com/guillaumekln/faster-whisper)

**AI-Powered Audio Transcription, Meeting Minutes Generation, and Multi-Language Translation**

---

## 📑 Table of Contents

- [Features](#-features)
- [Supported Models](#-supported-models)
- [Requirements](#-requirements)
- [Installation](#-local-installation)
- [Configuration](#-configuration)
- [Deployment](#-deployment)
- [Usage](#-usage)
- [Architecture](#-architecture)
- [Troubleshooting](#-troubleshooting)
- [File Structure](#-file-structure)
- [License](#-license)

---

## ✨ Features

### 🎙️ Audio/Video Transcription
- Convert YouTube links or local audio/video files to text
- Support for multiple audio formats (MP3, WAV, M4A, etc.)
- GPU-accelerated transcription using Faster-Whisper
- Timestamped transcription output

### 📝 Minutes of Meeting Generation
- Automatically generate structured MOM documents
- Professional summary with participants and date
- Actionable items with clear ownership and deadlines
- Markdown-formatted output

### 🌍 Multi-Language Translation
- Translate transcriptions into any supported language
- Language validation using pycountry
- Clean, paragraph-formatted output
- Preserves original meaning and tone

### 🤖 Multi-Model Support
- LLAMA 3.2 3B Instruct
- PHI 4 Mini Instruct
- DeepSeek R1 Distill Qwen 1.5B
- Google Gemma 3 4B IT

### 🖥️ Interactive Web UI
- Beautiful Gradio interface
- Drag-and-drop file upload
- YouTube link support
- Side-by-side input and output panels
- Model selection dropdown
- Real-time streaming responses

### ⚡ Performance Optimization
- 4-bit quantization for efficient inference
- GPU acceleration support
- Memory-efficient model loading
- Garbage collection and cache clearing

---

## 🤖 Supported Models

## 📦 Requirements

- **FFmpeg** for audio processing

### Python Dependencies

```
gradio>=4.0.0
torch>=2.0.0
huggingface-hub>=0.16.0
```

## 🔧 Local Installation

### 1. Create Virtual Environment

```bash
python -m venv venv
source venv/bin/activate   # On macOS/Linux
venv\Scripts\activate      # On Windows
```

### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

### 3. Setup HuggingFace Token

Create a `.env` file in the project root:

```env
HF_TOKEN=your_huggingface_token_here
```

Get your token from [HuggingFace Settings](https://huggingface.co/settings/tokens).
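
If you want to confirm the token is actually being picked up from `.env`, a quick check works well. This is a minimal sketch assuming the `python-dotenv` package (the usual way a `.env` file is loaded; the project may do this differently):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads HF_TOKEN from the .env file in the project root
print("HF_TOKEN loaded:", bool(os.getenv("HF_TOKEN")))
```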

### 4. Setup YouTube Cookies (Optional)

For YouTube link support, set the environment variable or create `cookies.txt`:

```bash
export YOUTUBE_COOKIES="your_cookies_content"
```

Or create `cookies.txt` with Netscape HTTP Cookie format.
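
If you go the environment-variable route, the cookie text still has to end up in a `cookies.txt` file that the downloader can read. A hedged sketch of that step (how `app.py` actually consumes `YOUTUBE_COOKIES` is not shown in this excerpt):

```python
import os
from pathlib import Path

cookies = os.getenv("YOUTUBE_COOKIES")
if cookies:
    # Write the Netscape-format cookie text to cookies.txt for the YouTube downloader.
    Path("cookies.txt").write_text(cookies, encoding="utf-8")
```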

---

## ⚙️ Configuration

### Model Selection

Edit model paths in `app.py`:

```python
LLAMA = "meta-llama/Llama-3.2-3B-Instruct"
QWEN = "Qwen/Qwen3-4B-Instruct-2507"
Gemma = 'google/gemma-3-4b-it'
```

### Quantization Configuration

```python
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    # (additional 4-bit options elided in this excerpt)
)
```
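
The remaining fields of the config are elided above. For reference, a typical 4-bit setup with `transformers` looks like the sketch below; the specific choices (NF4, bfloat16 compute, double quantization) are common defaults and an assumption, not necessarily what `app.py` uses:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

LLAMA = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(LLAMA)
model = AutoModelForCausalLM.from_pretrained(
    LLAMA,
    quantization_config=quant_config,  # load the weights in 4-bit
    device_map="auto",                 # place layers on GPU/CPU automatically
)
```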

### Server Configuration

```python
ui.launch(server_name="0.0.0.0", server_port=7860)
```
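
If you need a different port locally, one option is to have the launch line read it from an environment variable. A sketch of how the call in `app.py` could look (the `SMARTSCRIBE_PORT` name is illustrative, not part of the project):

```python
import os

# In app.py: fall back to Gradio's default 7860 when the variable is unset.
port = int(os.environ.get("SMARTSCRIBE_PORT", "7860"))
ui.launch(server_name="0.0.0.0", server_port=port)
```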

---

## Deployment

5. Add secrets in Space settings:
   - `HF_TOKEN`: Your HuggingFace token
   - `YOUTUBE_COOKIES`: (Optional) YouTube authentication cookies
6. Space will automatically build and deploy

---

## 🎮 Usage

### Quick Start - Live Demo

#### 🌐 Try Online

Visit the live application at: **[SmartScribe on HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)**

No installation required! Just upload your audio/video or paste a YouTube link.

#### 1. Launch Application (Local Setup)

```bash
python app.py
```

The application will start at `http://0.0.0.0:7860`

### Programmatic Usage

#### Transcribe Audio

```python
from app import transcription_whisper
```
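
The exact call into `transcription_whisper` is elided in this excerpt. As a rough sketch of what it wraps, this is how Faster-Whisper itself produces timestamped segments; the model size, device, and file name below are placeholder assumptions, not values taken from `app.py`:

```python
from faster_whisper import WhisperModel

# Load a Whisper model; "small" on CPU with int8 compute is a conservative placeholder.
model = WhisperModel("small", device="cpu", compute_type="int8")

# transcribe() returns a lazy generator of segments plus metadata about the audio.
segments, info = model.transcribe("audio.mp3")

for seg in segments:
    # Each segment carries start/end timestamps (in seconds) and the recognized text.
    print(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}")
```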

#### Generate Minutes of Meeting

```python
from app import optimize

for chunk in optimize("LLAMA", "audio.mp3"):
    print(chunk, end="")  # stream the generated minutes as they arrive (loop body assumed)
```
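
Because `optimize` streams its output, collecting the chunks into a Markdown file is straightforward. A small sketch (the output filename is arbitrary, and it assumes each chunk is a new piece of text rather than the full text so far):

```python
from app import optimize

chunks = []
for chunk in optimize("LLAMA", "audio.mp3"):
    chunks.append(chunk)

# Join the streamed pieces and save the minutes as a Markdown document.
with open("meeting_minutes.md", "w", encoding="utf-8") as f:
    f.write("".join(chunks))
```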

#### Translate Transcription

```python
from app import optimize_translate

for chunk in optimize_translate("LLAMA", "audio.mp3", "Spanish"):
    print(chunk, end="")  # stream the translation as it is generated (loop body assumed)
```

---

## 🗺️ Architecture

### Component Overview

```
┌──────────────────────────────────────────────────────────────┐
│               Gradio Web Interface (UI Layer)                 │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────────────┐     ┌─────────────────────┐        │
│  │  Audio/Video Input  │     │    Model Select     │        │
│  └─────────────────────┘     └─────────────────────┘        │
│                                                              │
│  ┌──────────────────────────────────────────────────┐       │
│  │     Transcription | MOM | Translation Output     │       │
│  └──────────────────────────────────────────────────┘       │
├──────────────────────────────────────────────────────────────┤
│                Multi-Module Processing Layer                  │
├───────────────────┬──────────────────────┬───────────────────┤
│   Transcription   │    MOM Generation    │    Translation    │
│      Module       │        Module        │      Module       │
│  • Download       │  • System Prompt     │  • Language       │
│  • Convert        │  • User Prompt       │    Validation     │
│  • Transcribe     │  • Generation        │  • Extraction     │
│                   │                      │  • Translation    │
├───────────────────┴──────────────────────┴───────────────────┤
│                    LLM Integration Layer                      │
│                                                              │
│            LLAMA | PHI | QWEN | DEEPSEEK | Gemma             │
│         (with 4-bit Quantization & GPU Acceleration)         │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

### Key Functions

---

## 🐛 Troubleshooting

### Issue: YouTube download fails

**Solution**: Update YouTube cookies or use direct file upload

```bash
export YOUTUBE_COOKIES="your_updated_cookies"
# or use direct file upload instead
```

### Issue: CUDA out of memory

**Solution**: Reduce model size or use CPU inference

```python
device = "cpu"  # Force CPU usage
```
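
The feature list also mentions garbage collection and cache clearing; releasing GPU memory between runs often resolves these errors without switching models. A minimal sketch (assumes `model` is a previously loaded model you are finished with):

```python
import gc

import torch

del model            # drop the Python reference to the finished model
gc.collect()         # let Python reclaim host-side memory
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # return cached GPU memory to the driver
```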

### Issue: HuggingFace authentication failed

**Solution**: Verify `HF_TOKEN` in `.env` file

```bash
huggingface-cli login  # Interactive login
```
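
You can also authenticate from Python, which doubles as a check that the token in `.env` is being read at all (assumes the `huggingface_hub` and `python-dotenv` packages):

```python
import os

from dotenv import load_dotenv
from huggingface_hub import login, whoami

load_dotenv()                        # pick up HF_TOKEN from .env
login(token=os.environ["HF_TOKEN"])  # authenticate this process

# whoami() raises an error if the token is missing or invalid.
print(whoami()["name"])
```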

### Issue: Transcription is slow

**Solution**: Ensure CUDA is properly configured

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
```
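
Compute type matters as much as the device for Faster-Whisper: float16 is usually fastest on GPU, while int8 keeps CPU inference manageable. A hedged sketch (the model size is a placeholder; the project's own loading code may differ):

```python
import torch
from faster_whisper import WhisperModel

if torch.cuda.is_available():
    model = WhisperModel("small", device="cuda", compute_type="float16")
else:
    model = WhisperModel("small", device="cpu", compute_type="int8")
```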

### Issue: Language validation fails

**Solution**: Use full language name or ISO code

```python
# Valid formats:
valid_language("English")  # Full name
valid_language("eng")      # ISO 639-3 code
```
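
Validation is backed by pycountry, so you can check up front whether a name or code will resolve. A small standalone sketch, independent of the app's own `valid_language` helper:

```python
import pycountry

def resolves(name_or_code: str) -> bool:
    """Return True if pycountry can match the language name or ISO code."""
    try:
        pycountry.languages.lookup(name_or_code)
        return True
    except LookupError:
        return False

print(resolves("English"), resolves("eng"), resolves("Klingon"))  # True True False
```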

### Issue: Memory issues with large files

**Solution**: Reduce chunk size or break audio into segments

```python
# Process smaller chunks
segment_duration = 300  # 5 minutes per segment
```
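
One way to apply that `segment_duration` is to split the audio before transcription. A sketch with pydub (pydub is an assumption here, not a stated project dependency; it relies on FFmpeg, which is already required):

```python
from pydub import AudioSegment

segment_duration = 300  # seconds per segment
audio = AudioSegment.from_file("long_meeting.mp3")

chunk_ms = segment_duration * 1000
for i, start in enumerate(range(0, len(audio), chunk_ms)):
    piece = audio[start:start + chunk_ms]
    piece.export(f"chunk_{i:03d}.mp3", format="mp3")  # transcribe each piece separately
```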

### Issue: Generated MOM missing action items

**Solution**: Try a different model or update the system prompt
- Claude models typically produce better structured output
- QWEN is faster and generally reliable

---

## 📄 Citation

If you use SmartScribe in your project, please cite:

```bibtex
@software{smartscribe2025,
  author = {Asutosha Nanda},
  title  = {SmartScribe},
  year   = {2025}
}
```

---

**[⬆️ Back to Top](#smartscribe)**

**Intelligent Audio Transcription & Meeting Documentation**
Powered by Advanced LLMs and Faster-Whisper

Deployed on [HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)