Update README.md
README.md (changed)

---
title: SmartScribe
emoji: 🎙️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---

# SmartScribe

[Python](https://www.python.org/downloads/)
[OpenAI Whisper](https://openai.com/research/whisper)
[Faster-Whisper](https://github.com/guillaumekln/faster-whisper)

**AI-Powered Audio Transcription, Meeting Minutes Generation, and Multi-Language Translation**

---

## 📑 Table of Contents

- [Features](#-features)
- [Supported Models](#-supported-models)
- [Requirements](#-requirements)
- [Installation](#-local-installation)
- [Configuration](#-configuration)
- [Deployment](#-deployment)
- [Usage](#-usage)
- [Architecture](#-architecture)
- [Troubleshooting](#-troubleshooting)
- [File Structure](#-file-structure)
- [License](#-license)

---

## ✨ Features

### 🎙️ Audio/Video Transcription
- Convert YouTube links or local audio/video files to text
- Support for multiple audio formats (MP3, WAV, M4A, etc.)
- GPU-accelerated transcription using Faster-Whisper
- Timestamped transcription output

### 📝 Minutes of Meeting Generation
- Automatically generate structured MOM documents
- Professional summary with participants and date
- Actionable items with clear ownership and deadlines
- Markdown-formatted output

### 🌍 Multi-Language Translation
- Translate transcriptions into any supported language
- Language validation using pycountry
- Clean, paragraph-formatted output
- Preserves original meaning and tone

### 🤖 Multi-Model Support
- LLAMA 3.2 3B Instruct
- PHI 4 Mini Instruct
- DeepSeek R1 Distill Qwen 1.5B
- Google Gemma 3 4B IT

### 🖥️ Interactive Web UI
- Beautiful Gradio interface
- Drag-and-drop file upload
- YouTube link support
- Side-by-side input and output panels
- Model selection dropdown
- Real-time streaming responses

### ⚡ Performance Optimization
- 4-bit quantization for efficient inference
- GPU acceleration support
- Memory-efficient model loading
- Garbage collection and cache clearing

---

## 🤖 Supported Models

## 📦 Requirements

- **FFmpeg** for audio processing

### Python Dependencies

```
gradio>=4.0.0
torch>=2.0.0
huggingface-hub>=0.16.0
```

## 🔧 Local Installation

### 1. Create Virtual Environment

```bash
python -m venv venv
source venv/bin/activate   # On macOS/Linux
venv\Scripts\activate      # On Windows
```

### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

### 3. Setup HuggingFace Token

Create a `.env` file in the project root:

```env
HF_TOKEN=your_huggingface_token_here
```

Get your token from [HuggingFace Settings](https://huggingface.co/settings/tokens).
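
If you want to confirm the token is actually being picked up from `.env`, a quick check works well. This is a minimal sketch assuming the `python-dotenv` package (the usual way a `.env` file is loaded; the project may do this differently):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads HF_TOKEN from the .env file in the project root
print("HF_TOKEN loaded:", bool(os.getenv("HF_TOKEN")))
```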

### 4. Setup YouTube Cookies (Optional)

For YouTube link support, set the environment variable or create `cookies.txt`:

```bash
export YOUTUBE_COOKIES="your_cookies_content"
```

Or create `cookies.txt` with Netscape HTTP Cookie format.
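
If you go the environment-variable route, the cookie text still has to end up in a `cookies.txt` file that the downloader can read. A hedged sketch of that step (how `app.py` actually consumes `YOUTUBE_COOKIES` is not shown in this excerpt):

```python
import os
from pathlib import Path

cookies = os.getenv("YOUTUBE_COOKIES")
if cookies:
    # Write the Netscape-format cookie text to cookies.txt for the YouTube downloader.
    Path("cookies.txt").write_text(cookies, encoding="utf-8")
```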

---

## ⚙️ Configuration

### Model Selection

Edit model paths in `app.py`:

```python
LLAMA = "meta-llama/Llama-3.2-3B-Instruct"
QWEN = "Qwen/Qwen3-4B-Instruct-2507"
Gemma = 'google/gemma-3-4b-it'
```

### Quantization Configuration

```python
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    # (additional 4-bit options elided in this excerpt)
)
```
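
The remaining fields of the config are elided above. For reference, a typical 4-bit setup with `transformers` looks like the sketch below; the specific choices (NF4, bfloat16 compute, double quantization) are common defaults and an assumption, not necessarily what `app.py` uses:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

LLAMA = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(LLAMA)
model = AutoModelForCausalLM.from_pretrained(
    LLAMA,
    quantization_config=quant_config,  # load the weights in 4-bit
    device_map="auto",                 # place layers on GPU/CPU automatically
)
```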

### Server Configuration

```python
ui.launch(server_name="0.0.0.0", server_port=7860)
```
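
If you need a different port locally, one option is to have the launch line read it from an environment variable. A sketch of how the call in `app.py` could look (the `SMARTSCRIBE_PORT` name is illustrative, not part of the project):

```python
import os

# In app.py: fall back to Gradio's default 7860 when the variable is unset.
port = int(os.environ.get("SMARTSCRIBE_PORT", "7860"))
ui.launch(server_name="0.0.0.0", server_port=port)
```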

---

## Deployment

5. Add secrets in Space settings:
   - `HF_TOKEN`: Your HuggingFace token
   - `YOUTUBE_COOKIES`: (Optional) YouTube authentication cookies
6. Space will automatically build and deploy

---

## 🎮 Usage

### Quick Start - Live Demo

#### 🌐 Try Online

Visit the live application at: **[SmartScribe on HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)**

No installation required! Just upload your audio/video or paste a YouTube link.

#### 1. Launch Application (Local Setup)

```bash
python app.py
```

The application will start at `http://0.0.0.0:7860`

### Programmatic Usage

#### Transcribe Audio

```python
from app import transcription_whisper
```
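
The exact call into `transcription_whisper` is elided in this excerpt. As a rough sketch of what it wraps, this is how Faster-Whisper itself produces timestamped segments; the model size, device, and file name below are placeholder assumptions, not values taken from `app.py`:

```python
from faster_whisper import WhisperModel

# Load a Whisper model; "small" on CPU with int8 compute is a conservative placeholder.
model = WhisperModel("small", device="cpu", compute_type="int8")

# transcribe() returns a lazy generator of segments plus metadata about the audio.
segments, info = model.transcribe("audio.mp3")

for seg in segments:
    # Each segment carries start/end timestamps (in seconds) and the recognized text.
    print(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}")
```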

#### Generate Minutes of Meeting

```python
from app import optimize

for chunk in optimize("LLAMA", "audio.mp3"):
    print(chunk, end="")  # stream the generated minutes as they arrive (loop body assumed)
```
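
Because `optimize` streams its output, collecting the chunks into a Markdown file is straightforward. A small sketch (the output filename is arbitrary, and it assumes each chunk is a new piece of text rather than the full text so far):

```python
from app import optimize

chunks = []
for chunk in optimize("LLAMA", "audio.mp3"):
    chunks.append(chunk)

# Join the streamed pieces and save the minutes as a Markdown document.
with open("meeting_minutes.md", "w", encoding="utf-8") as f:
    f.write("".join(chunks))
```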

#### Translate Transcription

```python
from app import optimize_translate

for chunk in optimize_translate("LLAMA", "audio.mp3", "Spanish"):
    print(chunk, end="")  # stream the translation as it is generated (loop body assumed)
```

---

## 🗺️ Architecture

### Component Overview

```
┌──────────────────────────────────────────────────────────────┐
│               Gradio Web Interface (UI Layer)                 │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────────────┐     ┌─────────────────────┐        │
│  │  Audio/Video Input  │     │    Model Select     │        │
│  └─────────────────────┘     └─────────────────────┘        │
│                                                              │
│  ┌──────────────────────────────────────────────────┐       │
│  │     Transcription | MOM | Translation Output     │       │
│  └──────────────────────────────────────────────────┘       │
├──────────────────────────────────────────────────────────────┤
│                Multi-Module Processing Layer                  │
├───────────────────┬──────────────────────┬───────────────────┤
│   Transcription   │    MOM Generation    │    Translation    │
│      Module       │        Module        │      Module       │
│  • Download       │  • System Prompt     │  • Language       │
│  • Convert        │  • User Prompt       │    Validation     │
│  • Transcribe     │  • Generation        │  • Extraction     │
│                   │                      │  • Translation    │
├───────────────────┴──────────────────────┴───────────────────┤
│                    LLM Integration Layer                      │
│                                                              │
│            LLAMA | PHI | QWEN | DEEPSEEK | Gemma             │
│         (with 4-bit Quantization & GPU Acceleration)         │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

### Key Functions

---

## 🐛 Troubleshooting

### Issue: YouTube download fails

**Solution**: Update YouTube cookies or use direct file upload

```bash
export YOUTUBE_COOKIES="your_updated_cookies"
# or use direct file upload instead
```

### Issue: CUDA out of memory

**Solution**: Reduce model size or use CPU inference

```python
device = "cpu"  # Force CPU usage
```
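
The feature list also mentions garbage collection and cache clearing; releasing GPU memory between runs often resolves these errors without switching models. A minimal sketch (assumes `model` is a previously loaded model you are finished with):

```python
import gc

import torch

del model            # drop the Python reference to the finished model
gc.collect()         # let Python reclaim host-side memory
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # return cached GPU memory to the driver
```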

### Issue: HuggingFace authentication failed

**Solution**: Verify `HF_TOKEN` in `.env` file

```bash
huggingface-cli login  # Interactive login
```
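
You can also authenticate from Python, which doubles as a check that the token in `.env` is being read at all (assumes the `huggingface_hub` and `python-dotenv` packages):

```python
import os

from dotenv import load_dotenv
from huggingface_hub import login, whoami

load_dotenv()                        # pick up HF_TOKEN from .env
login(token=os.environ["HF_TOKEN"])  # authenticate this process

# whoami() raises an error if the token is missing or invalid.
print(whoami()["name"])
```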

### Issue: Transcription is slow

**Solution**: Ensure CUDA is properly configured

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
```
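
Compute type matters as much as the device for Faster-Whisper: float16 is usually fastest on GPU, while int8 keeps CPU inference manageable. A hedged sketch (the model size is a placeholder; the project's own loading code may differ):

```python
import torch
from faster_whisper import WhisperModel

if torch.cuda.is_available():
    model = WhisperModel("small", device="cuda", compute_type="float16")
else:
    model = WhisperModel("small", device="cpu", compute_type="int8")
```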

### Issue: Language validation fails

**Solution**: Use full language name or ISO code

```python
# Valid formats:
valid_language("English")  # Full name
valid_language("eng")      # ISO 639-3 code
```
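
Validation is backed by pycountry, so you can check up front whether a name or code will resolve. A small standalone sketch, independent of the app's own `valid_language` helper:

```python
import pycountry

def resolves(name_or_code: str) -> bool:
    """Return True if pycountry can match the language name or ISO code."""
    try:
        pycountry.languages.lookup(name_or_code)
        return True
    except LookupError:
        return False

print(resolves("English"), resolves("eng"), resolves("Klingon"))  # True True False
```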

### Issue: Memory issues with large files

**Solution**: Reduce chunk size or break audio into segments

```python
# Process smaller chunks
segment_duration = 300  # 5 minutes per segment
```
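
One way to apply that `segment_duration` is to split the audio before transcription. A sketch with pydub (pydub is an assumption here, not a stated project dependency; it relies on FFmpeg, which is already required):

```python
from pydub import AudioSegment

segment_duration = 300  # seconds per segment
audio = AudioSegment.from_file("long_meeting.mp3")

chunk_ms = segment_duration * 1000
for i, start in enumerate(range(0, len(audio), chunk_ms)):
    piece = audio[start:start + chunk_ms]
    piece.export(f"chunk_{i:03d}.mp3", format="mp3")  # transcribe each piece separately
```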

### Issue: Generated MOM missing action items

**Solution**: Try a different model or update the system prompt
- Claude models typically produce better structured output
- QWEN is faster and generally reliable

---

## 📄 Citation

If you use SmartScribe in your project, please cite:

```bibtex
@software{smartscribe2025,
  author = {Asutosha Nanda},
  title  = {SmartScribe},
  year   = {2025}
}
```

---

**[⬆️ Back to Top](#smartscribe)**

**Intelligent Audio Transcription & Meeting Documentation**
Powered by Advanced LLMs and Faster-Whisper

Deployed on [HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)