itsasutosha committed
Commit df75cd6 · verified · 1 Parent(s): 463e0e4

Update README.md

Files changed (1):
  1. README.md +79 -78
README.md CHANGED
@@ -2,17 +2,15 @@
  title: SmartScribe
  emoji: 🎙️
  colorFrom: blue
- colorTo: purple
  sdk: gradio
  sdk_version: 5.49.1
  app_file: app.py
  pinned: false
  ---
- <div align="center">
 
  # SmartScribe
 
-
  [![Python](https://img.shields.io/badge/Python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
  [![Whisper](https://img.shields.io/badge/OpenAI-Whisper-green.svg)](https://openai.com/research/whisper)
  [![Faster-Whisper](https://img.shields.io/badge/Faster--Whisper-Audio-orange.svg)](https://github.com/guillaumekln/faster-whisper)
@@ -22,63 +20,32 @@ pinned: false
 
  **AI-Powered Audio Transcription, Meeting Minutes Generation, and Multi-Language Translation**
 
 
- </div>
-
- <div align="center">
- <h2>📋 Table of Contents</h2>
- <table>
- <tr>
- <td><a href="#features">✨ Features</a></td>
- <td><a href="#supported-models">🤖 Supported Models</a></td>
- <td><a href="#requirements">📦 Requirements</a></td>
- <td><a href="#installation">🔧 Installation</a></td>
- </tr>
- <tr>
- <td><a href="#configuration">⚙️ Configuration</a></td>
- <td><a href="#usage">🎮 Usage</a></td>
- <td><a href="#architecture">🏗️ Architecture</a></td>
- <td><a href="#troubleshooting">🐛 Troubleshooting</a></td>
- </tr>
- </tr>
- </table>
- </div>
 
  ---
 
-
-
  ## ✨ Features
 
- <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 20px;">
-
- <div>
-
  ### 🎙️ Audio/Video Transcription
  - Convert YouTube links or local audio/video files to text
  - Support for multiple audio formats (MP3, WAV, M4A, etc.)
  - GPU-accelerated transcription using Faster-Whisper
  - Timestamped transcription output
 
- ### 🌐 Multi-Language Translation
- - Translate transcriptions into any supported language
- - Language validation using pycountry
- - Clean, paragraph-formatted output
- - Preserves original meaning and tone
-
- ### 🖥️ Interactive Web UI
- - Beautiful Gradio interface
- - Drag-and-drop file upload
- - YouTube link support
- - Side-by-side input and output panels
- - Model selection dropdown
- - Real-time streaming responses
-
- </div>
-
- <div>
-
  ### 📝 Minutes of Meeting Generation
  - Automatically generate structured MOM documents
  - Professional summary with participants and date
@@ -87,6 +54,12 @@ pinned: false
  - Actionable items with clear ownership and deadlines
  - Markdown-formatted output
 
  ### 🤖 Multi-Model Support
  - LLAMA 3.2 3B Instruct
  - PHI 4 Mini Instruct
@@ -94,17 +67,20 @@ pinned: false
  - DeepSeek R1 Distill Qwen 1.5B
  - Google Gemma 3 4B IT
 
  ### ⚡ Performance Optimization
  - 4-bit quantization for efficient inference
  - GPU acceleration support
  - Memory-efficient model loading
  - Garbage collection and cache clearing
 
- </div>
-
- </div>
-
-
  ---
 
  ## 🤖 Supported Models
@@ -128,6 +104,7 @@ pinned: false
  - **FFmpeg** for audio processing
 
  ### Python Dependencies
  ```
  gradio>=4.0.0
  torch>=2.0.0
@@ -146,6 +123,7 @@ huggingface-hub>=0.16.0
  ## 🔧 Local Installation
 
  ### 1. Create Virtual Environment
  ```bash
  python -m venv venv
  source venv/bin/activate  # On macOS/Linux
@@ -154,12 +132,15 @@ venv\Scripts\activate # On Windows
  ```
 
  ### 2. Install Dependencies
  ```bash
  pip install -r requirements.txt
  ```
 
  ### 3. Setup HuggingFace Token
  Create a `.env` file in the project root:
  ```env
  HF_TOKEN=your_huggingface_token_here
  ```
@@ -167,7 +148,9 @@ HF_TOKEN=your_huggingface_token_here
  Get your token from [HuggingFace Settings](https://huggingface.co/settings/tokens)
 
  ### 4. Setup YouTube Cookies (Optional)
  For YouTube link support, set an environment variable or create `cookies.txt`:
  ```bash
  export YOUTUBE_COOKIES="your_cookies_content"
  ```
@@ -179,7 +162,9 @@ Or create `cookies.txt` with Netscape HTTP Cookie format.
  ## ⚙️ Configuration
 
  ### Model Selection
  Edit model paths in `app.py`:
  ```python
  LLAMA = "meta-llama/Llama-3.2-3B-Instruct"
  QWEN = "Qwen/Qwen3-4B-Instruct-2507"
@@ -189,6 +174,7 @@ Gemma = 'google/gemma-3-4b-it'
  ```
 
  ### Quantization Configuration
  ```python
  quant_config = BitsAndBytesConfig(
      load_in_4bit=True,
@@ -199,6 +185,7 @@ quant_config = BitsAndBytesConfig(
  ```
 
  ### Server Configuration
  ```python
  ui.launch(server_name="0.0.0.0", server_port=7860)
  ```
@@ -227,7 +214,6 @@ SmartScribe is deployed and available at: [https://huggingface.co/spaces/itsasut
  5. Add secrets in Space settings:
     - `HF_TOKEN`: Your HuggingFace token
     - `YOUTUBE_COOKIES`: (Optional) YouTube authentication cookies
-
  6. Space will automatically build and deploy
 
  ---
@@ -237,11 +223,13 @@ SmartScribe is deployed and available at: [https://huggingface.co/spaces/itsasut
  ### Quick Start - Live Demo
 
  #### 🌐 Try Online
  Visit the live application at: **[SmartScribe on HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)**
 
  No installation required! Just upload your audio/video or paste a YouTube link.
 
  #### 1. Launch Application (Local Setup)
  ```bash
  python app.py
  ```
@@ -266,6 +254,7 @@ The application will start at `http://0.0.0.0:7860`
  ### Programmatic Usage
 
  #### Transcribe Audio
  ```python
  from app import transcription_whisper
 
@@ -278,6 +267,7 @@ for seg in segments:
  ```
 
  #### Generate Minutes of Meeting
  ```python
  from app import optimize
 
@@ -286,6 +276,7 @@ for chunk in optimize("LLAMA", "audio.mp3"):
  ```
 
  #### Translate Transcription
  ```python
  from app import optimize_translate
 
@@ -295,41 +286,41 @@ for chunk in optimize_translate("LLAMA", "audio.mp3", "Spanish"):
 
  ---
 
- ## 🏗️ Architecture
 
  ### Component Overview
 
  ```
  ┌──────────────────────────────────────────────────────────────┐
  │               Gradio Web Interface (UI Layer)                │
  ├──────────────────────────────────────────────────────────────┤
  │  ┌─────────────────────┐    ┌─────────────────────┐          │
  │  │  Audio/Video Input  │    │    Model Select     │          │
  │  └─────────────────────┘    └─────────────────────┘          │
  │  ┌──────────────────────────────────────────────────┐        │
  │  │    Transcription | MOM | Translation Output      │        │
  │  └──────────────────────────────────────────────────┘        │
  ├──────────────────────────────────────────────────────────────┤
  │                Multi-Module Processing Layer                 │
  ├────────────────────┬────────────────────┬────────────────────┤
  │  Transcription     │  MOM Generation    │  Translation       │
  │  Module            │  Module            │  Module            │
  │  ─────────────     │  ──────────────    │  ────────────      │
  │  • Download        │  • System Prompt   │  • Language        │
  │  • Convert         │  • User Prompt     │    Validation      │
  │  • Transcribe      │  • Generation      │  • Extraction      │
  │                    │                    │  • Translation     │
  ├────────────────────┴────────────────────┴────────────────────┤
  │                     LLM Integration Layer                    │
  ├──────────────────────────────────────────────────────────────┤
  │                                                              │
  │            LLAMA | PHI | QWEN | DEEPSEEK | Gemma             │
  │         (with 4-bit Quantization & GPU Acceleration)         │
  │                                                              │
  └──────────────────────────────────────────────────────────────┘
  ```
 
  ### Key Functions
@@ -349,36 +340,46 @@ for chunk in optimize_translate("LLAMA", "audio.mp3", "Spanish"):
 
  ---
 
- ## 🐛 Troubleshooting
 
  ### Issue: YouTube download fails
  **Solution**: Update YouTube cookies or use direct file upload
  ```bash
  export YOUTUBE_COOKIES="your_updated_cookies"
  # or use direct file upload instead
  ```
 
  ### Issue: CUDA out of memory
  **Solution**: Reduce model size or use CPU inference
  ```python
  device = "cpu"  # Force CPU usage
  ```
 
  ### Issue: HuggingFace authentication failed
  **Solution**: Verify HF_TOKEN in .env file
  ```bash
  huggingface-cli login  # Interactive login
  ```
 
  ### Issue: Transcription is slow
  **Solution**: Ensure CUDA is properly configured
  ```python
  device = "cuda" if torch.cuda.is_available() else "cpu"
  print(f"Using device: {device}")
  ```
 
  ### Issue: Language validation fails
  **Solution**: Use full language name or ISO code
  ```python
  # Valid formats:
  valid_language("English")  # Full name
@@ -387,13 +388,16 @@ valid_language("eng") # ISO 639-3 code
  ```
 
  ### Issue: Memory issues with large files
  **Solution**: Reduce chunk size or break audio into segments
  ```python
  # Process smaller chunks
  segment_duration = 300  # 5 minutes per segment
  ```
 
  ### Issue: Generated MOM missing action items
  **Solution**: Try a different model or update the system prompt
  - Claude models typically produce better structured output
  - QWEN is faster and generally reliable
@@ -423,6 +427,7 @@ This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENS
  ## 🎓 Citation
 
  If you use SmartScribe in your project, please cite:
  ```bibtex
  @software{smartscribe2025,
    author = {Asutosha Nanda},
@@ -434,13 +439,9 @@ If you use SmartScribe in your project, please cite:
 
  ---
 
- <div align="center">
-
- **[⬆ Back to Top](#-smartscribe)**
 
  **Intelligent Audio Transcription & Meeting Documentation**
  Powered by Advanced LLMs and Faster-Whisper
 
- Deployed on [HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)
-
- </div>
 
  title: SmartScribe
  emoji: 🎙️
  colorFrom: blue
+ colorTo: indigo
  sdk: gradio
  sdk_version: 5.49.1
  app_file: app.py
  pinned: false
  ---
 
  # SmartScribe
 
  [![Python](https://img.shields.io/badge/Python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
  [![Whisper](https://img.shields.io/badge/OpenAI-Whisper-green.svg)](https://openai.com/research/whisper)
  [![Faster-Whisper](https://img.shields.io/badge/Faster--Whisper-Audio-orange.svg)](https://github.com/guillaumekln/faster-whisper)
 
  **AI-Powered Audio Transcription, Meeting Minutes Generation, and Multi-Language Translation**
 
+ ---
 
+ ## 📋 Table of Contents
 
+ - [Features](#-features)
+ - [Supported Models](#-supported-models)
+ - [Requirements](#-requirements)
+ - [Installation](#-local-installation)
+ - [Configuration](#-configuration)
+ - [Deployment](#-deployment)
+ - [Usage](#-usage)
+ - [Architecture](#-architecture)
+ - [Troubleshooting](#-troubleshooting)
+ - [File Structure](#-file-structure)
+ - [License](#-license)
 
  ---
 
  ## ✨ Features
 
  ### 🎙️ Audio/Video Transcription
  - Convert YouTube links or local audio/video files to text
  - Support for multiple audio formats (MP3, WAV, M4A, etc.)
  - GPU-accelerated transcription using Faster-Whisper (see the sketch below)
  - Timestamped transcription output
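
Under the hood, transcription relies on Faster-Whisper, which the app exposes through its `transcription_whisper` helper (see Programmatic Usage). A minimal standalone sketch, with placeholder model size and file name:

```python
# Illustrative Faster-Whisper usage; model size, device, and file name are placeholders
from faster_whisper import WhisperModel

model = WhisperModel("base", device="cuda", compute_type="float16")

# transcribe() returns a generator of segments plus metadata about the audio
segments, info = model.transcribe("audio.mp3")
for seg in segments:
    # Each segment carries start/end timestamps and the recognized text
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```
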
  ### 📝 Minutes of Meeting Generation
  - Automatically generate structured MOM documents
  - Professional summary with participants and date
 
  - Actionable items with clear ownership and deadlines
  - Markdown-formatted output
 
+ ### 🌐 Multi-Language Translation
+ - Translate transcriptions into any supported language
+ - Language validation using pycountry
+ - Clean, paragraph-formatted output
+ - Preserves original meaning and tone
+
  ### 🤖 Multi-Model Support
  - LLAMA 3.2 3B Instruct
  - PHI 4 Mini Instruct
  - DeepSeek R1 Distill Qwen 1.5B
  - Google Gemma 3 4B IT
 
+ ### 🖥️ Interactive Web UI
+ - Beautiful Gradio interface
+ - Drag-and-drop file upload
+ - YouTube link support
+ - Side-by-side input and output panels
+ - Model selection dropdown
+ - Real-time streaming responses
+
  ### ⚡ Performance Optimization
  - 4-bit quantization for efficient inference
  - GPU acceleration support
  - Memory-efficient model loading
  - Garbage collection and cache clearing (a cleanup sketch follows below)
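
The cleanup mentioned above typically amounts to dropping model references and clearing the CUDA cache between runs. A minimal sketch of that pattern; the exact variable names in `app.py` may differ:

```python
# Illustrative memory cleanup between model runs; names are placeholders
import gc
import torch

def release_model(model, tokenizer):
    # Drop references so Python can reclaim the objects
    del model, tokenizer
    gc.collect()
    # Return cached GPU memory to the driver
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```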
 
  ---
 
  ## 🤖 Supported Models
 
  - **FFmpeg** for audio processing
 
  ### Python Dependencies
+
  ```
  gradio>=4.0.0
  torch>=2.0.0
 
  ## 🔧 Local Installation
 
  ### 1. Create Virtual Environment
+
  ```bash
  python -m venv venv
  source venv/bin/activate  # On macOS/Linux
 
  ```
 
  ### 2. Install Dependencies
+
  ```bash
  pip install -r requirements.txt
  ```
 
  ### 3. Setup HuggingFace Token
+
  Create a `.env` file in the project root:
+
  ```env
  HF_TOKEN=your_huggingface_token_here
  ```
 
  Get your token from [HuggingFace Settings](https://huggingface.co/settings/tokens)
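
The token in `.env` has to be loaded and handed to Hugging Face at startup. A minimal sketch of that step using python-dotenv; the exact loading code in `app.py` may differ:

```python
# Illustrative token loading; app.py may wire this up differently
import os
from dotenv import load_dotenv
from huggingface_hub import login

load_dotenv()  # reads HF_TOKEN from the .env file into the environment
hf_token = os.getenv("HF_TOKEN")
if hf_token:
    login(token=hf_token)  # authenticates downloads of gated models
```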
 
  ### 4. Setup YouTube Cookies (Optional)
+
  For YouTube link support, set an environment variable or create `cookies.txt`:
+
  ```bash
  export YOUTUBE_COOKIES="your_cookies_content"
  ```
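
How the app consumes these cookies internally is not shown here; as an assumption, a downloader such as yt-dlp can read a Netscape-format `cookies.txt` when fetching audio from a YouTube link:

```python
# Hypothetical download step using yt-dlp with a cookies file
# (the project's actual downloader and option names may differ)
import yt_dlp

ydl_opts = {
    "format": "bestaudio/best",
    "outtmpl": "downloaded_audio.%(ext)s",
    "cookiefile": "cookies.txt",  # Netscape HTTP Cookie format
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download(["https://www.youtube.com/watch?v=VIDEO_ID"])
```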
 
  ## ⚙️ Configuration
 
  ### Model Selection
+
  Edit model paths in `app.py`:
+
  ```python
  LLAMA = "meta-llama/Llama-3.2-3B-Instruct"
  QWEN = "Qwen/Qwen3-4B-Instruct-2507"
 
  ```
 
  ### Quantization Configuration
+
  ```python
  quant_config = BitsAndBytesConfig(
      load_in_4bit=True,
 
  ```
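
For reference, a typical 4-bit `BitsAndBytesConfig` used with Transformers looks like the sketch below; the compute dtype and quantization type here are illustrative and may differ from the values in `app.py`:

```python
# Typical 4-bit quantization setup for Transformers (illustrative values)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",  # one of the model IDs listed above
    quantization_config=quant_config,
    device_map="auto",
)
```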
 
  ### Server Configuration
+
  ```python
  ui.launch(server_name="0.0.0.0", server_port=7860)
  ```
 
  5. Add secrets in Space settings:
     - `HF_TOKEN`: Your HuggingFace token
     - `YOUTUBE_COOKIES`: (Optional) YouTube authentication cookies
 
  6. Space will automatically build and deploy
 
  ---
 
  ### Quick Start - Live Demo
 
  #### 🌐 Try Online
+
  Visit the live application at: **[SmartScribe on HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)**
 
  No installation required! Just upload your audio/video or paste a YouTube link.
 
  #### 1. Launch Application (Local Setup)
+
  ```bash
  python app.py
  ```
 
  ### Programmatic Usage
 
  #### Transcribe Audio
+
  ```python
  from app import transcription_whisper
 
  ```
 
  #### Generate Minutes of Meeting
+
  ```python
  from app import optimize
 
  ```
 
  #### Translate Transcription
+
  ```python
  from app import optimize_translate
 
 
  ---
 
+ ## 🗺️ Architecture
 
  ### Component Overview
 
  ```
  ┌──────────────────────────────────────────────────────────────┐
  │               Gradio Web Interface (UI Layer)                │
  ├──────────────────────────────────────────────────────────────┤
  │  ┌─────────────────────┐    ┌─────────────────────┐          │
  │  │  Audio/Video Input  │    │    Model Select     │          │
  │  └─────────────────────┘    └─────────────────────┘          │
  │  ┌──────────────────────────────────────────────────┐        │
  │  │    Transcription | MOM | Translation Output      │        │
  │  └──────────────────────────────────────────────────┘        │
  ├──────────────────────────────────────────────────────────────┤
  │                Multi-Module Processing Layer                 │
  ├────────────────────┬────────────────────┬────────────────────┤
  │  Transcription     │  MOM Generation    │  Translation       │
  │  Module            │  Module            │  Module            │
  │  ─────────────     │  ──────────────    │  ────────────      │
  │  • Download        │  • System Prompt   │  • Language        │
  │  • Convert         │  • User Prompt     │    Validation      │
  │  • Transcribe      │  • Generation      │  • Extraction      │
  │                    │                    │  • Translation     │
  ├────────────────────┴────────────────────┴────────────────────┤
  │                     LLM Integration Layer                    │
  ├──────────────────────────────────────────────────────────────┤
  │                                                              │
  │            LLAMA | PHI | QWEN | DEEPSEEK | Gemma             │
  │         (with 4-bit Quantization & GPU Acceleration)         │
  │                                                              │
  └──────────────────────────────────────────────────────────────┘
  ```
 
  ### Key Functions
 
 
  ---
 
+ ## 🛠 Troubleshooting
 
  ### Issue: YouTube download fails
+
  **Solution**: Update YouTube cookies or use direct file upload
+
  ```bash
  export YOUTUBE_COOKIES="your_updated_cookies"
  # or use direct file upload instead
  ```
 
  ### Issue: CUDA out of memory
+
  **Solution**: Reduce model size or use CPU inference
+
  ```python
  device = "cpu"  # Force CPU usage
  ```
 
  ### Issue: HuggingFace authentication failed
+
  **Solution**: Verify HF_TOKEN in .env file
+
  ```bash
  huggingface-cli login  # Interactive login
  ```
 
  ### Issue: Transcription is slow
+
  **Solution**: Ensure CUDA is properly configured
+
  ```python
  device = "cuda" if torch.cuda.is_available() else "cpu"
  print(f"Using device: {device}")
  ```
 
  ### Issue: Language validation fails
+
  **Solution**: Use full language name or ISO code
+
  ```python
  # Valid formats:
  valid_language("English")  # Full name
 
  ```
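
The README notes that language validation is built on pycountry. A minimal sketch of what a `valid_language` helper along those lines can look like; the actual implementation in `app.py` may differ:

```python
# Illustrative pycountry-based validation; app.py's actual helper may differ
import pycountry

def valid_language(name_or_code: str) -> bool:
    """Accepts full names ("English"), ISO 639-1 ("en"), or ISO 639-3 ("eng")."""
    try:
        pycountry.languages.lookup(name_or_code)
        return True
    except LookupError:
        return False
```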
 
  ### Issue: Memory issues with large files
+
  **Solution**: Reduce chunk size or break audio into segments
+
  ```python
  # Process smaller chunks
  segment_duration = 300  # 5 minutes per segment
  ```
 
  ### Issue: Generated MOM missing action items
+
  **Solution**: Try a different model or update the system prompt (see the sketch below)
  - Claude models typically produce better structured output
  - QWEN is faster and generally reliable
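
As an example of tightening the system prompt, the instruction below makes action items an explicit requirement; this is an illustrative prompt, not the exact one used in `app.py`:

```python
# Hypothetical stricter system prompt emphasizing action items
SYSTEM_PROMPT = (
    "You are an assistant that writes minutes of meeting in Markdown. "
    "Always include a summary with participants and date, followed by an "
    "Action Items section. Every action item must name an owner and a deadline; "
    "if either is missing from the transcript, write 'TBD'."
)
```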
 
  ## 🎓 Citation
 
  If you use SmartScribe in your project, please cite:
+
  ```bibtex
  @software{smartscribe2025,
    author = {Asutosha Nanda},
 
  ---
 
+ **[↑ Back to Top](#smartscribe)**
 
  **Intelligent Audio Transcription & Meeting Documentation**
  Powered by Advanced LLMs and Faster-Whisper
 
+ Deployed on [HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)