itsasutosha committed
Commit ec90699 · verified · 1 Parent(s): df75cd6

Update README.md

Files changed (1)
  1. README.md +89 -88

README.md CHANGED
@@ -1,16 +1,20 @@
- ---
- title: SmartScribe
- emoji: 🎙️
- colorFrom: blue
- colorTo: indigo
- sdk: gradio
- sdk_version: 5.49.1
- app_file: app.py
- pinned: false
- ---

  # SmartScribe

  [![Python](https://img.shields.io/badge/Python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
  [![Whisper](https://img.shields.io/badge/OpenAI-Whisper-green.svg)](https://openai.com/research/whisper)
  [![Faster-Whisper](https://img.shields.io/badge/Faster--Whisper-Audio-orange.svg)](https://github.com/guillaumekln/faster-whisper)
@@ -20,32 +24,63 @@ pinned: false

  **AI-Powered Audio Transcription, Meeting Minutes Generation, and Multi-Language Translation**

- ---

- ## 📋 Table of Contents

- - [Features](#-features)
- - [Supported Models](#-supported-models)
- - [Requirements](#-requirements)
- - [Installation](#-local-installation)
- - [Configuration](#-configuration)
- - [Deployment](#-deployment)
- - [Usage](#-usage)
- - [Architecture](#-architecture)
- - [Troubleshooting](#-troubleshooting)
- - [File Structure](#-file-structure)
- - [License](#-license)

  ---

  ## ✨ Features

  ### 🎙️ Audio/Video Transcription
  - Convert YouTube links or local audio/video files to text
  - Support for multiple audio formats (MP3, WAV, M4A, etc.)
  - GPU-accelerated transcription using Faster-Whisper
  - Timestamped transcription output

  ### 📝 Minutes of Meeting Generation
  - Automatically generate structured MOM documents
  - Professional summary with participants and date
@@ -54,12 +89,6 @@ pinned: false
  - Actionable items with clear ownership and deadlines
  - Markdown-formatted output

- ### 🌍 Multi-Language Translation
- - Translate transcriptions into any supported language
- - Language validation using pycountry
- - Clean, paragraph-formatted output
- - Preserves original meaning and tone
-
  ### 🤖 Multi-Model Support
  - LLAMA 3.2 3B Instruct
  - PHI 4 Mini Instruct
@@ -67,20 +96,17 @@ pinned: false
  - DeepSeek R1 Distill Qwen 1.5B
  - Google Gemma 3 4B IT

- ### 🖥️ Interactive Web UI
- - Beautiful Gradio interface
- - Drag-and-drop file upload
- - YouTube link support
- - Side-by-side input and output panels
- - Model selection dropdown
- - Real-time streaming responses
-
  ### ⚡ Performance Optimization
  - 4-bit quantization for efficient inference
  - GPU acceleration support
  - Memory-efficient model loading
  - Garbage collection and cache clearing

  ---

  ## 🤖 Supported Models
@@ -104,7 +130,6 @@ pinned: false
  - **FFmpeg** for audio processing

  ### Python Dependencies
-
  ```
  gradio>=4.0.0
  torch>=2.0.0
@@ -123,7 +148,6 @@ huggingface-hub>=0.16.0
  ## 🔧 Local Installation

  ### 1. Create Virtual Environment
-
  ```bash
  python -m venv venv
  source venv/bin/activate  # On macOS/Linux
@@ -132,15 +156,12 @@ venv\Scripts\activate  # On Windows
  ```

  ### 2. Install Dependencies
-
  ```bash
  pip install -r requirements.txt
  ```

  ### 3. Setup HuggingFace Token
-
  Create a `.env` file in the project root:
-
  ```env
  HF_TOKEN=your_huggingface_token_here
  ```
@@ -148,9 +169,7 @@ HF_TOKEN=your_huggingface_token_here
  Get your token from [HuggingFace Settings](https://huggingface.co/settings/tokens)

  ### 4. Setup YouTube Cookies (Optional)
-
  For YouTube link support, set environment variable or create `cookies.txt`:
-
  ```bash
  export YOUTUBE_COOKIES="your_cookies_content"
  ```
@@ -162,9 +181,7 @@ Or create `cookies.txt` with Netscape HTTP Cookie format.
  ## ⚙️ Configuration

  ### Model Selection
-
  Edit model paths in `app.py`:
-
  ```python
  LLAMA = "meta-llama/Llama-3.2-3B-Instruct"
  QWEN = "Qwen/Qwen3-4B-Instruct-2507"
@@ -174,7 +191,6 @@ Gemma = 'google/gemma-3-4b-it'
  ```

  ### Quantization Configuration
-
  ```python
  quant_config = BitsAndBytesConfig(
      load_in_4bit=True,
@@ -185,7 +201,6 @@ quant_config = BitsAndBytesConfig(
  ```

  ### Server Configuration
-
  ```python
  ui.launch(server_name="0.0.0.0", server_port=7860)
  ```
@@ -214,6 +229,7 @@ SmartScribe is deployed and available at: [https://huggingface.co/spaces/itsasut
  5. Add secrets in Space settings:
     - `HF_TOKEN`: Your HuggingFace token
     - `YOUTUBE_COOKIES`: (Optional) YouTube authentication cookies
  6. Space will automatically build and deploy

  ---
@@ -223,13 +239,11 @@ SmartScribe is deployed and available at: [https://huggingface.co/spaces/itsasut
  ### Quick Start - Live Demo

  #### 🌐 Try Online
-
  Visit the live application at: **[SmartScribe on HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)**

  No installation required! Just upload your audio/video or paste a YouTube link.

  #### 1. Launch Application (Local Setup)
-
  ```bash
  python app.py
  ```
@@ -254,7 +268,6 @@ The application will start at `http://0.0.0.0:7860`
  ### Programmatic Usage

  #### Transcribe Audio
-
  ```python
  from app import transcription_whisper

@@ -267,7 +280,6 @@ for seg in segments:
  ```
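
The transcription example above iterates timestamped segments (`for seg in segments:`). As a side illustration, this small standalone helper (hypothetical, not part of app.py) formats a segment's start offset the way timestamped transcripts usually print it:

```python
def format_timestamp(seconds: float) -> str:
    # Render a segment offset in seconds as [HH:MM:SS].
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"[{h:02d}:{m:02d}:{s:02d}]"

print(format_timestamp(75.3))    # [00:01:15]
print(format_timestamp(3661.0))  # [01:01:01]
```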

  #### Generate Minutes of Meeting
-
  ```python
  from app import optimize

@@ -276,7 +288,6 @@ for chunk in optimize("LLAMA", "audio.mp3"):
  ```

  #### Translate Transcription
-
  ```python
  from app import optimize_translate

@@ -286,41 +297,41 @@ for chunk in optimize_translate("LLAMA", "audio.mp3", "Spanish"):

  ---

- ## 🗺️ Architecture

  ### Component Overview

  ```
- ┌──────────────────────────────────────────────────────────┐
- │            Gradio Web Interface (UI Layer)               │
- ├──────────────────────────────────────────────────────────┤
- │  ┌────────────────────┐        ┌────────────────┐        │
  │  │ Audio/Video Input  │        │ Model Select   │        │
- │  └────────────────────┘        └────────────────┘        │
- │  ┌────────────────────────────────────────────────┐      │
  │  │ Transcription | MOM | Translation Output       │      │
- │  └────────────────────────────────────────────────┘      │
- ├──────────────────────────────────────────────────────────┤
  │            Multi-Module Processing Layer                 │
- ├──────────────────┬──────────────────┬────────────────────┤
- │                  │                  │                    │
  │  Transcription   │  MOM Generation  │  Translation       │
  │  Module          │  Module          │  Module            │
- │  ─────────────   │  ──────────────  │  ────────────      │
  │  • Download      │  • System Prompt │  • Language        │
  │  • Convert       │  • User Prompt   │    Validation      │
  │  • Transcribe    │  • Generation    │  • Extraction      │
- │                  │  • Translation   │                    │
- ├──────────────────┴──────────────────┴────────────────────┤
  │            LLM Integration Layer                         │
- ├──────────────────────────────────────────────────────────┤
  │                                                          │
  │     LLAMA  |  PHI  |  QWEN  |  DEEPSEEK  |  Gemma        │
  │     (with 4-bit Quantization & GPU Acceleration)         │
  │                                                          │
- └──────────────────────────────────────────────────────────┘
  ```

  ### Key Functions
@@ -340,46 +351,36 @@ for chunk in optimize_translate("LLAMA", "audio.mp3", "Spanish"):

  ---

- ## 🛠 Troubleshooting

  ### Issue: YouTube download fails
-
  **Solution**: Update YouTube cookies or use direct file upload
-
  ```bash
  export YOUTUBE_COOKIES="your_updated_cookies"
  # or use direct file upload instead
  ```

  ### Issue: CUDA out of memory
-
  **Solution**: Reduce model size or use CPU inference
-
  ```python
  device = "cpu"  # Force CPU usage
  ```

  ### Issue: HuggingFace authentication failed
-
  **Solution**: Verify HF_TOKEN in .env file
-
  ```bash
  huggingface-cli login  # Interactive login
  ```

  ### Issue: Transcription is slow
-
  **Solution**: Ensure CUDA is properly configured
-
  ```python
  device = "cuda" if torch.cuda.is_available() else "cpu"
  print(f"Using device: {device}")
  ```

  ### Issue: Language validation fails
-
  **Solution**: Use full language name or ISO code
-
  ```python
  # Valid formats:
  valid_language("English")  # Full name
@@ -388,16 +389,13 @@ valid_language("eng")  # ISO 639-3 code
  ```

  ### Issue: Memory issues with large files
-
  **Solution**: Reduce chunk size or break audio into segments
-
  ```python
  # Process smaller chunks
  segment_duration = 300  # 5 minutes per segment
  ```
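
The `segment_duration` idea above can be sketched end to end. This helper (hypothetical, not from app.py) computes the (start, end) spans a chunked transcription loop would process:

```python
def chunk_spans(total_seconds: float, segment_duration: int = 300):
    # Split an audio duration into consecutive (start, end) spans,
    # each at most segment_duration seconds long.
    spans, start = [], 0.0
    while start < total_seconds:
        end = min(start + segment_duration, total_seconds)
        spans.append((start, end))
        start = end
    return spans

print(chunk_spans(700.0))  # [(0.0, 300.0), (300.0, 600.0), (600.0, 700.0)]
```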

  ### Issue: Generated MOM missing action items
-
  **Solution**: Try different model or update system prompt
  - Claude models typically produce better structured output
  - QWEN is faster and generally reliable
@@ -427,7 +425,6 @@ This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENS
  ## 🎓 Citation

  If you use SmartScribe in your project, please cite:
-
  ```bibtex
  @software{smartscribe2025,
      author = {Asutosha Nanda},
@@ -439,9 +436,13 @@ If you use SmartScribe in your project, please cite:

  ---

- **[↑ Back to Top](#smartscribe)**

  **Intelligent Audio Transcription & Meeting Documentation**
  Powered by Advanced LLMs and Faster-Whisper

- Deployed on [HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)
+ ---
+ title: SmartScribe
+ emoji: 🎙️
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 5.49.1
+ app_file: app.py
+ pinned: false
+ license: apache-2.0
+ short_description: Transcription, Summarization & Translation
+ ---
+ <div align="center">

  # SmartScribe

+
  [![Python](https://img.shields.io/badge/Python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
  [![Whisper](https://img.shields.io/badge/OpenAI-Whisper-green.svg)](https://openai.com/research/whisper)
  [![Faster-Whisper](https://img.shields.io/badge/Faster--Whisper-Audio-orange.svg)](https://github.com/guillaumekln/faster-whisper)
 

  **AI-Powered Audio Transcription, Meeting Minutes Generation, and Multi-Language Translation**

+ </div>
+
+ <div align="center">
+ <h2>📋 Table of Contents</h2>
+ <table>
+ <tr>
+ <td><a href="#features">✨ Features</a></td>
+ <td><a href="#supported-models">🤖 Supported Models</a></td>
+ <td><a href="#requirements">📦 Requirements</a></td>
+ <td><a href="#installation">🔧 Installation</a></td>
+ </tr>
+ <tr>
+ <td><a href="#configuration">⚙️ Configuration</a></td>
+ <td><a href="#usage">🎮 Usage</a></td>
+ <td><a href="#architecture">🏗️ Architecture</a></td>
+ <td><a href="#troubleshooting">🐛 Troubleshooting</a></td>
+ </tr>
+ </table>
+ </div>

  ---

+
+
  ## ✨ Features

+ <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 20px;">
+
+ <div>
+
  ### 🎙️ Audio/Video Transcription
  - Convert YouTube links or local audio/video files to text
  - Support for multiple audio formats (MP3, WAV, M4A, etc.)
  - GPU-accelerated transcription using Faster-Whisper
  - Timestamped transcription output

+ ### 🌍 Multi-Language Translation
+ - Translate transcriptions into any supported language
+ - Language validation using pycountry
+ - Clean, paragraph-formatted output
+ - Preserves original meaning and tone
+
+ ### 🖥️ Interactive Web UI
+ - Beautiful Gradio interface
+ - Drag-and-drop file upload
+ - YouTube link support
+ - Side-by-side input and output panels
+ - Model selection dropdown
+ - Real-time streaming responses
+
+ </div>
+
+ <div>
+
  ### 📝 Minutes of Meeting Generation
  - Automatically generate structured MOM documents
  - Professional summary with participants and date

  - Actionable items with clear ownership and deadlines
  - Markdown-formatted output

  ### 🤖 Multi-Model Support
  - LLAMA 3.2 3B Instruct
  - PHI 4 Mini Instruct

  - DeepSeek R1 Distill Qwen 1.5B
  - Google Gemma 3 4B IT

  ### ⚡ Performance Optimization
  - 4-bit quantization for efficient inference
  - GPU acceleration support
  - Memory-efficient model loading
  - Garbage collection and cache clearing
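
The garbage-collection item in this list can be sketched as follows. `free_model_memory` is a hypothetical helper (not from app.py) that mirrors the usual pattern: force a GC pass, then clear the CUDA allocator cache when torch is available:

```python
import gc

def free_model_memory() -> None:
    # Caller drops its model references first (e.g. `del model`); this then
    # forces a collection and clears the CUDA cache if torch is present.
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass  # CPU-only environment: nothing further to clear
```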

+ </div>
+
+ </div>
+
+
  ---

  ## 🤖 Supported Models
 
  - **FFmpeg** for audio processing

  ### Python Dependencies
  ```
  gradio>=4.0.0
  torch>=2.0.0

  ## 🔧 Local Installation

  ### 1. Create Virtual Environment
  ```bash
  python -m venv venv
  source venv/bin/activate  # On macOS/Linux

  ```

  ### 2. Install Dependencies
  ```bash
  pip install -r requirements.txt
  ```

  ### 3. Setup HuggingFace Token
  Create a `.env` file in the project root:
  ```env
  HF_TOKEN=your_huggingface_token_here
  ```

  Get your token from [HuggingFace Settings](https://huggingface.co/settings/tokens)

  ### 4. Setup YouTube Cookies (Optional)
  For YouTube link support, set environment variable or create `cookies.txt`:
  ```bash
  export YOUTUBE_COOKIES="your_cookies_content"
  ```

  ## ⚙️ Configuration

  ### Model Selection
  Edit model paths in `app.py`:
  ```python
  LLAMA = "meta-llama/Llama-3.2-3B-Instruct"
  QWEN = "Qwen/Qwen3-4B-Instruct-2507"

  ```

  ### Quantization Configuration
  ```python
  quant_config = BitsAndBytesConfig(
      load_in_4bit=True,

  ```
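
The diff shows only the first field of the quantization config. For context, a typical complete 4-bit `BitsAndBytesConfig` looks like the sketch below; every field beyond `load_in_4bit` is an illustrative default, not a confirmed value from app.py:

```python
import torch
from transformers import BitsAndBytesConfig

# Illustrative 4-bit setup; only load_in_4bit=True is visible in the diff.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,    # nested quantization of quant constants
    bnb_4bit_quant_type="nf4",         # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```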

  ### Server Configuration
  ```python
  ui.launch(server_name="0.0.0.0", server_port=7860)
  ```
 
  5. Add secrets in Space settings:
     - `HF_TOKEN`: Your HuggingFace token
     - `YOUTUBE_COOKIES`: (Optional) YouTube authentication cookies
+
  6. Space will automatically build and deploy

  ---
 
  ### Quick Start - Live Demo

  #### 🌐 Try Online
  Visit the live application at: **[SmartScribe on HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)**

  No installation required! Just upload your audio/video or paste a YouTube link.

  #### 1. Launch Application (Local Setup)
  ```bash
  python app.py
  ```

  ### Programmatic Usage

  #### Transcribe Audio
  ```python
  from app import transcription_whisper

  ```

  #### Generate Minutes of Meeting
  ```python
  from app import optimize

  ```

  #### Translate Transcription
  ```python
  from app import optimize_translate

  ---
+ ## 🏗️ Architecture

  ### Component Overview

  ```
+ ┌──────────────────────────────────────────────────────────┐
+ │            Gradio Web Interface (UI Layer)               │
+ ├──────────────────────────────────────────────────────────┤
+ │  ┌────────────────────┐        ┌────────────────┐        │
  │  │ Audio/Video Input  │        │ Model Select   │        │
+ │  └────────────────────┘        └────────────────┘        │
+ │  ┌────────────────────────────────────────────────┐      │
  │  │ Transcription | MOM | Translation Output       │      │
+ │  └────────────────────────────────────────────────┘      │
+ ├──────────────────────────────────────────────────────────┤
  │            Multi-Module Processing Layer                 │
+ ├──────────────────┬──────────────────┬────────────────────┤
+ │                  │                  │                    │
  │  Transcription   │  MOM Generation  │  Translation       │
  │  Module          │  Module          │  Module            │
+ │  ─────────────   │  ──────────────  │  ────────────      │
  │  • Download      │  • System Prompt │  • Language        │
  │  • Convert       │  • User Prompt   │    Validation      │
  │  • Transcribe    │  • Generation    │  • Extraction      │
+ │                  │  • Translation   │                    │
+ ├──────────────────┴──────────────────┴────────────────────┤
  │            LLM Integration Layer                         │
+ ├──────────────────────────────────────────────────────────┤
  │                                                          │
  │     LLAMA  |  PHI  |  QWEN  |  DEEPSEEK  |  Gemma        │
  │     (with 4-bit Quantization & GPU Acceleration)         │
  │                                                          │
+ └──────────────────────────────────────────────────────────┘
  ```

  ### Key Functions

  ---

+ ## 🐛 Troubleshooting

  ### Issue: YouTube download fails
  **Solution**: Update YouTube cookies or use direct file upload
  ```bash
  export YOUTUBE_COOKIES="your_updated_cookies"
  # or use direct file upload instead
  ```

  ### Issue: CUDA out of memory
  **Solution**: Reduce model size or use CPU inference
  ```python
  device = "cpu"  # Force CPU usage
  ```

  ### Issue: HuggingFace authentication failed
  **Solution**: Verify HF_TOKEN in .env file
  ```bash
  huggingface-cli login  # Interactive login
  ```

  ### Issue: Transcription is slow
  **Solution**: Ensure CUDA is properly configured
  ```python
  device = "cuda" if torch.cuda.is_available() else "cpu"
  print(f"Using device: {device}")
  ```

  ### Issue: Language validation fails
  **Solution**: Use full language name or ISO code
  ```python
  # Valid formats:
  valid_language("English")  # Full name

  ```
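
The README attributes language validation to pycountry. The self-contained sketch below mimics that behavior with a tiny inline table so it runs without the dependency; the table and this `valid_language` body are illustrative stand-ins, not the app.py implementation, which would consult `pycountry.languages`:

```python
# Illustrative stand-in for a pycountry-backed lookup.
_LANGUAGES = {
    "english": ("en", "eng"),
    "spanish": ("es", "spa"),
    "french": ("fr", "fra"),
}

def valid_language(query: str) -> bool:
    # Accept a full language name, ISO 639-1 code, or ISO 639-3 code.
    q = query.strip().lower()
    return any(q == name or q in codes for name, codes in _LANGUAGES.items())

print(valid_language("English"))  # True
print(valid_language("eng"))      # True
print(valid_language("Klingon"))  # False
```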

  ### Issue: Memory issues with large files
  **Solution**: Reduce chunk size or break audio into segments
  ```python
  # Process smaller chunks
  segment_duration = 300  # 5 minutes per segment
  ```

  ### Issue: Generated MOM missing action items
  **Solution**: Try different model or update system prompt
  - Claude models typically produce better structured output
  - QWEN is faster and generally reliable
 
  ## 🎓 Citation

  If you use SmartScribe in your project, please cite:
  ```bibtex
  @software{smartscribe2025,
      author = {Asutosha Nanda},
436
 
437
  ---
438
 
439
+ <div align="center">
440
+
441
+ **[⬆ Back to Top](#-smartscribe)**
442
 
443
  **Intelligent Audio Transcription & Meeting Documentation**
444
  Powered by Advanced LLMs and Faster-Whisper
445
 
446
+ Deployed on [HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)
447
+
448
+ </div>