itsasutosha committed
Commit ec90699 · verified · 1 Parent(s): df75cd6

Update README.md

Files changed (1)
  1. README.md +89 -88

README.md CHANGED
@@ -1,16 +1,20 @@
- ---
- title: SmartScribe
- emoji: 🎙️
- colorFrom: blue
- colorTo: indigo
- sdk: gradio
- sdk_version: 5.49.1
- app_file: app.py
- pinned: false
- ---

  # SmartScribe

  [![Python](https://img.shields.io/badge/Python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
  [![Whisper](https://img.shields.io/badge/OpenAI-Whisper-green.svg)](https://openai.com/research/whisper)
  [![Faster-Whisper](https://img.shields.io/badge/Faster--Whisper-Audio-orange.svg)](https://github.com/guillaumekln/faster-whisper)
@@ -20,32 +24,63 @@ pinned: false

  **AI-Powered Audio Transcription, Meeting Minutes Generation, and Multi-Language Translation**

- ---

- ## 📋 Table of Contents

- - [Features](#-features)
- - [Supported Models](#-supported-models)
- - [Requirements](#-requirements)
- - [Installation](#-local-installation)
- - [Configuration](#-configuration)
- - [Deployment](#-deployment)
- - [Usage](#-usage)
- - [Architecture](#-architecture)
- - [Troubleshooting](#-troubleshooting)
- - [File Structure](#-file-structure)
- - [License](#-license)

  ---

  ## ✨ Features

  ### 🎙️ Audio/Video Transcription
  - Convert YouTube links or local audio/video files to text
  - Support for multiple audio formats (MP3, WAV, M4A, etc.)
  - GPU-accelerated transcription using Faster-Whisper
  - Timestamped transcription output

  ### 📝 Minutes of Meeting Generation
  - Automatically generate structured MOM documents
  - Professional summary with participants and date
@@ -54,12 +89,6 @@ pinned: false
  - Actionable items with clear ownership and deadlines
  - Markdown-formatted output

- ### 🌍 Multi-Language Translation
- - Translate transcriptions into any supported language
- - Language validation using pycountry
- - Clean, paragraph-formatted output
- - Preserves original meaning and tone
-
  ### 🤖 Multi-Model Support
  - LLAMA 3.2 3B Instruct
  - PHI 4 Mini Instruct
@@ -67,20 +96,17 @@ pinned: false
  - DeepSeek R1 Distill Qwen 1.5B
  - Google Gemma 3 4B IT

- ### 🖥️ Interactive Web UI
- - Beautiful Gradio interface
- - Drag-and-drop file upload
- - YouTube link support
- - Side-by-side input and output panels
- - Model selection dropdown
- - Real-time streaming responses
-
  ### ⚡ Performance Optimization
  - 4-bit quantization for efficient inference
  - GPU acceleration support
  - Memory-efficient model loading
  - Garbage collection and cache clearing

  ---

  ## 🤖 Supported Models
@@ -104,7 +130,6 @@ pinned: false
  - **FFmpeg** for audio processing

  ### Python Dependencies
-
  ```
  gradio>=4.0.0
  torch>=2.0.0
@@ -123,7 +148,6 @@ huggingface-hub>=0.16.0
  ## 🔧 Local Installation

  ### 1. Create Virtual Environment
-
  ```bash
  python -m venv venv
  source venv/bin/activate  # On macOS/Linux
@@ -132,15 +156,12 @@ venv\Scripts\activate  # On Windows
  ```

  ### 2. Install Dependencies
-
  ```bash
  pip install -r requirements.txt
  ```

  ### 3. Setup HuggingFace Token
-
  Create a `.env` file in the project root:
-
  ```env
  HF_TOKEN=your_huggingface_token_here
  ```
@@ -148,9 +169,7 @@ HF_TOKEN=your_huggingface_token_here
  Get your token from [HuggingFace Settings](https://huggingface.co/settings/tokens)

  ### 4. Setup YouTube Cookies (Optional)
-
  For YouTube link support, set environment variable or create `cookies.txt`:
-
  ```bash
  export YOUTUBE_COOKIES="your_cookies_content"
  ```
@@ -162,9 +181,7 @@ Or create `cookies.txt` with Netscape HTTP Cookie format.
  ## ⚙️ Configuration

  ### Model Selection
-
  Edit model paths in `app.py`:
-
  ```python
  LLAMA = "meta-llama/Llama-3.2-3B-Instruct"
  QWEN = "Qwen/Qwen3-4B-Instruct-2507"
@@ -174,7 +191,6 @@ Gemma = 'google/gemma-3-4b-it'
  ```

  ### Quantization Configuration
-
  ```python
  quant_config = BitsAndBytesConfig(
      load_in_4bit=True,
@@ -185,7 +201,6 @@ quant_config = BitsAndBytesConfig(
  ```

  ### Server Configuration
-
  ```python
  ui.launch(server_name="0.0.0.0", server_port=7860)
  ```
@@ -214,6 +229,7 @@ SmartScribe is deployed and available at: [https://huggingface.co/spaces/itsasut
  5. Add secrets in Space settings:
     - `HF_TOKEN`: Your HuggingFace token
     - `YOUTUBE_COOKIES`: (Optional) YouTube authentication cookies
  6. Space will automatically build and deploy

  ---
@@ -223,13 +239,11 @@ SmartScribe is deployed and available at: [https://huggingface.co/spaces/itsasut
  ### Quick Start - Live Demo

  #### 🌐 Try Online
-
  Visit the live application at: **[SmartScribe on HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)**

  No installation required! Just upload your audio/video or paste a YouTube link.

  #### 1. Launch Application (Local Setup)
-
  ```bash
  python app.py
  ```
@@ -254,7 +268,6 @@ The application will start at `http://0.0.0.0:7860`
  ### Programmatic Usage

  #### Transcribe Audio
-
  ```python
  from app import transcription_whisper

@@ -267,7 +280,6 @@ for seg in segments:
  ```
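
The transcription example above iterates timestamped segments (`for seg in segments:`). As a side illustration, this small standalone helper (hypothetical, not part of app.py) formats a segment's start offset the way timestamped transcripts usually print it:

```python
def format_timestamp(seconds: float) -> str:
    # Render a segment offset in seconds as [HH:MM:SS].
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"[{h:02d}:{m:02d}:{s:02d}]"

print(format_timestamp(75.3))    # [00:01:15]
print(format_timestamp(3661.0))  # [01:01:01]
```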

  #### Generate Minutes of Meeting
-
  ```python
  from app import optimize

@@ -276,7 +288,6 @@ for chunk in optimize("LLAMA", "audio.mp3"):
  ```

  #### Translate Transcription
-
  ```python
  from app import optimize_translate

@@ -286,41 +297,41 @@ for chunk in optimize_translate("LLAMA", "audio.mp3", "Spanish"):

  ---

- ## 🗺️ Architecture

  ### Component Overview

  ```
- ┌──────────────────────────────────────────────────────────┐
- │            Gradio Web Interface (UI Layer)               │
- ├──────────────────────────────────────────────────────────┤
- │  ┌────────────────────┐        ┌────────────────┐        │
  │  │ Audio/Video Input  │        │ Model Select   │        │
- │  └────────────────────┘        └────────────────┘        │
- │  ┌────────────────────────────────────────────────┐      │
  │  │ Transcription | MOM | Translation Output       │      │
- │  └────────────────────────────────────────────────┘      │
- ├──────────────────────────────────────────────────────────┤
  │            Multi-Module Processing Layer                 │
- ├──────────────────┬──────────────────┬────────────────────┤
- │                  │                  │                    │
  │  Transcription   │  MOM Generation  │  Translation       │
  │  Module          │  Module          │  Module            │
- │  ─────────────   │  ──────────────  │  ────────────      │
  │  • Download      │  • System Prompt │  • Language        │
  │  • Convert       │  • User Prompt   │    Validation      │
  │  • Transcribe    │  • Generation    │  • Extraction      │
- │                  │  • Translation   │                    │
- ├──────────────────┴──────────────────┴────────────────────┤
  │            LLM Integration Layer                         │
- ├──────────────────────────────────────────────────────────┤
  │                                                          │
  │     LLAMA  |  PHI  |  QWEN  |  DEEPSEEK  |  Gemma        │
  │     (with 4-bit Quantization & GPU Acceleration)         │
  │                                                          │
- └──────────────────────────────────────────────────────────┘
  ```

  ### Key Functions
@@ -340,46 +351,36 @@ for chunk in optimize_translate("LLAMA", "audio.mp3", "Spanish"):

  ---

- ## 🛠 Troubleshooting

  ### Issue: YouTube download fails
-
  **Solution**: Update YouTube cookies or use direct file upload
-
  ```bash
  export YOUTUBE_COOKIES="your_updated_cookies"
  # or use direct file upload instead
  ```

  ### Issue: CUDA out of memory
-
  **Solution**: Reduce model size or use CPU inference
-
  ```python
  device = "cpu"  # Force CPU usage
  ```

  ### Issue: HuggingFace authentication failed
-
  **Solution**: Verify HF_TOKEN in .env file
-
  ```bash
  huggingface-cli login  # Interactive login
  ```

  ### Issue: Transcription is slow
-
  **Solution**: Ensure CUDA is properly configured
-
  ```python
  device = "cuda" if torch.cuda.is_available() else "cpu"
  print(f"Using device: {device}")
  ```

  ### Issue: Language validation fails
-
  **Solution**: Use full language name or ISO code
-
  ```python
  # Valid formats:
  valid_language("English")  # Full name
@@ -388,16 +389,13 @@ valid_language("eng")  # ISO 639-3 code
  ```

  ### Issue: Memory issues with large files
-
  **Solution**: Reduce chunk size or break audio into segments
-
  ```python
  # Process smaller chunks
  segment_duration = 300  # 5 minutes per segment
  ```
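
The `segment_duration` idea above can be sketched end to end. This helper (hypothetical, not from app.py) computes the (start, end) spans a chunked transcription loop would process:

```python
def chunk_spans(total_seconds: float, segment_duration: int = 300):
    # Split an audio duration into consecutive (start, end) spans,
    # each at most segment_duration seconds long.
    spans, start = [], 0.0
    while start < total_seconds:
        end = min(start + segment_duration, total_seconds)
        spans.append((start, end))
        start = end
    return spans

print(chunk_spans(700.0))  # [(0.0, 300.0), (300.0, 600.0), (600.0, 700.0)]
```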

  ### Issue: Generated MOM missing action items
-
  **Solution**: Try different model or update system prompt
  - Claude models typically produce better structured output
  - QWEN is faster and generally reliable
@@ -427,7 +425,6 @@ This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENS
  ## 🎓 Citation

  If you use SmartScribe in your project, please cite:
-
  ```bibtex
  @software{smartscribe2025,
      author = {Asutosha Nanda},
@@ -439,9 +436,13 @@ If you use SmartScribe in your project, please cite:

  ---

- **[↑ Back to Top](#smartscribe)**

  **Intelligent Audio Transcription & Meeting Documentation**
  Powered by Advanced LLMs and Faster-Whisper

- Deployed on [HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)
+ ---
+ title: SmartScribe
+ emoji: 🎙️
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 5.49.1
+ app_file: app.py
+ pinned: false
+ license: apache-2.0
+ short_description: Transcription, Summarization & Translation
+ ---
+ <div align="center">

  # SmartScribe

+
  [![Python](https://img.shields.io/badge/Python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
  [![Whisper](https://img.shields.io/badge/OpenAI-Whisper-green.svg)](https://openai.com/research/whisper)
  [![Faster-Whisper](https://img.shields.io/badge/Faster--Whisper-Audio-orange.svg)](https://github.com/guillaumekln/faster-whisper)
 

  **AI-Powered Audio Transcription, Meeting Minutes Generation, and Multi-Language Translation**

+ </div>
+
+ <div align="center">
+ <h2>📋 Table of Contents</h2>
+ <table>
+ <tr>
+ <td><a href="#features">✨ Features</a></td>
+ <td><a href="#supported-models">🤖 Supported Models</a></td>
+ <td><a href="#requirements">📦 Requirements</a></td>
+ <td><a href="#installation">🔧 Installation</a></td>
+ </tr>
+ <tr>
+ <td><a href="#configuration">⚙️ Configuration</a></td>
+ <td><a href="#usage">🎮 Usage</a></td>
+ <td><a href="#architecture">🏗️ Architecture</a></td>
+ <td><a href="#troubleshooting">🐛 Troubleshooting</a></td>
+ </tr>
+ </table>
+ </div>

  ---

+
+
  ## ✨ Features

+ <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 20px;">
+
+ <div>
+
  ### 🎙️ Audio/Video Transcription
  - Convert YouTube links or local audio/video files to text
  - Support for multiple audio formats (MP3, WAV, M4A, etc.)
  - GPU-accelerated transcription using Faster-Whisper
  - Timestamped transcription output

+ ### 🌍 Multi-Language Translation
+ - Translate transcriptions into any supported language
+ - Language validation using pycountry
+ - Clean, paragraph-formatted output
+ - Preserves original meaning and tone
+
+ ### 🖥️ Interactive Web UI
+ - Beautiful Gradio interface
+ - Drag-and-drop file upload
+ - YouTube link support
+ - Side-by-side input and output panels
+ - Model selection dropdown
+ - Real-time streaming responses
+
+ </div>
+
+ <div>
+
  ### 📝 Minutes of Meeting Generation
  - Automatically generate structured MOM documents
  - Professional summary with participants and date

  - Actionable items with clear ownership and deadlines
  - Markdown-formatted output

  ### 🤖 Multi-Model Support
  - LLAMA 3.2 3B Instruct
  - PHI 4 Mini Instruct

  - DeepSeek R1 Distill Qwen 1.5B
  - Google Gemma 3 4B IT

  ### ⚡ Performance Optimization
  - 4-bit quantization for efficient inference
  - GPU acceleration support
  - Memory-efficient model loading
  - Garbage collection and cache clearing
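
The garbage-collection item in this list can be sketched as follows. `free_model_memory` is a hypothetical helper (not from app.py) that mirrors the usual pattern: force a GC pass, then clear the CUDA allocator cache when torch is available:

```python
import gc

def free_model_memory() -> None:
    # Caller drops its model references first (e.g. `del model`); this then
    # forces a collection and clears the CUDA cache if torch is present.
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass  # CPU-only environment: nothing further to clear
```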

+ </div>
+
+ </div>
+
+
  ---

  ## 🤖 Supported Models
 
  - **FFmpeg** for audio processing

  ### Python Dependencies
  ```
  gradio>=4.0.0
  torch>=2.0.0

  ## 🔧 Local Installation

  ### 1. Create Virtual Environment
  ```bash
  python -m venv venv
  source venv/bin/activate  # On macOS/Linux

  ```

  ### 2. Install Dependencies
  ```bash
  pip install -r requirements.txt
  ```

  ### 3. Setup HuggingFace Token
  Create a `.env` file in the project root:
  ```env
  HF_TOKEN=your_huggingface_token_here
  ```

  Get your token from [HuggingFace Settings](https://huggingface.co/settings/tokens)

  ### 4. Setup YouTube Cookies (Optional)
  For YouTube link support, set environment variable or create `cookies.txt`:
  ```bash
  export YOUTUBE_COOKIES="your_cookies_content"
  ```

  ## ⚙️ Configuration

  ### Model Selection
  Edit model paths in `app.py`:
  ```python
  LLAMA = "meta-llama/Llama-3.2-3B-Instruct"
  QWEN = "Qwen/Qwen3-4B-Instruct-2507"

  ```

  ### Quantization Configuration
  ```python
  quant_config = BitsAndBytesConfig(
      load_in_4bit=True,

  ```
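
The diff shows only the first field of the quantization config. For context, a typical complete 4-bit `BitsAndBytesConfig` looks like the sketch below; every field beyond `load_in_4bit` is an illustrative default, not a confirmed value from app.py:

```python
import torch
from transformers import BitsAndBytesConfig

# Illustrative 4-bit setup; only load_in_4bit=True is visible in the diff.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,    # nested quantization of quant constants
    bnb_4bit_quant_type="nf4",         # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```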

  ### Server Configuration
  ```python
  ui.launch(server_name="0.0.0.0", server_port=7860)
  ```
 
  5. Add secrets in Space settings:
     - `HF_TOKEN`: Your HuggingFace token
     - `YOUTUBE_COOKIES`: (Optional) YouTube authentication cookies
+
  6. Space will automatically build and deploy

  ---
 
  ### Quick Start - Live Demo

  #### 🌐 Try Online
  Visit the live application at: **[SmartScribe on HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)**

  No installation required! Just upload your audio/video or paste a YouTube link.

  #### 1. Launch Application (Local Setup)
  ```bash
  python app.py
  ```

  ### Programmatic Usage

  #### Transcribe Audio
  ```python
  from app import transcription_whisper

  ```

  #### Generate Minutes of Meeting
  ```python
  from app import optimize

  ```

  #### Translate Transcription
  ```python
  from app import optimize_translate

  ---
+ ## 🏗️ Architecture

  ### Component Overview

  ```
+ ┌──────────────────────────────────────────────────────────┐
+ │            Gradio Web Interface (UI Layer)               │
+ ├──────────────────────────────────────────────────────────┤
+ │  ┌────────────────────┐        ┌────────────────┐        │
  │  │ Audio/Video Input  │        │ Model Select   │        │
+ │  └────────────────────┘        └────────────────┘        │
+ │  ┌────────────────────────────────────────────────┐      │
  │  │ Transcription | MOM | Translation Output       │      │
+ │  └────────────────────────────────────────────────┘      │
+ ├──────────────────────────────────────────────────────────┤
  │            Multi-Module Processing Layer                 │
+ ├──────────────────┬──────────────────┬────────────────────┤
+ │                  │                  │                    │
  │  Transcription   │  MOM Generation  │  Translation       │
  │  Module          │  Module          │  Module            │
+ │  ─────────────   │  ──────────────  │  ────────────      │
  │  • Download      │  • System Prompt │  • Language        │
  │  • Convert       │  • User Prompt   │    Validation      │
  │  • Transcribe    │  • Generation    │  • Extraction      │
+ │                  │  • Translation   │                    │
+ ├──────────────────┴──────────────────┴────────────────────┤
  │            LLM Integration Layer                         │
+ ├──────────────────────────────────────────────────────────┤
  │                                                          │
  │     LLAMA  |  PHI  |  QWEN  |  DEEPSEEK  |  Gemma        │
  │     (with 4-bit Quantization & GPU Acceleration)         │
  │                                                          │
+ └──────────────────────────────────────────────────────────┘
  ```

  ### Key Functions

  ---

+ ## 🐛 Troubleshooting

  ### Issue: YouTube download fails
  **Solution**: Update YouTube cookies or use direct file upload
  ```bash
  export YOUTUBE_COOKIES="your_updated_cookies"
  # or use direct file upload instead
  ```

  ### Issue: CUDA out of memory
  **Solution**: Reduce model size or use CPU inference
  ```python
  device = "cpu"  # Force CPU usage
  ```

  ### Issue: HuggingFace authentication failed
  **Solution**: Verify HF_TOKEN in .env file
  ```bash
  huggingface-cli login  # Interactive login
  ```

  ### Issue: Transcription is slow
  **Solution**: Ensure CUDA is properly configured
  ```python
  device = "cuda" if torch.cuda.is_available() else "cpu"
  print(f"Using device: {device}")
  ```

  ### Issue: Language validation fails
  **Solution**: Use full language name or ISO code
  ```python
  # Valid formats:
  valid_language("English")  # Full name

  ```
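
The README attributes language validation to pycountry. The self-contained sketch below mimics that behavior with a tiny inline table so it runs without the dependency; the table and this `valid_language` body are illustrative stand-ins, not the app.py implementation, which would consult `pycountry.languages`:

```python
# Illustrative stand-in for a pycountry-backed lookup.
_LANGUAGES = {
    "english": ("en", "eng"),
    "spanish": ("es", "spa"),
    "french": ("fr", "fra"),
}

def valid_language(query: str) -> bool:
    # Accept a full language name, ISO 639-1 code, or ISO 639-3 code.
    q = query.strip().lower()
    return any(q == name or q in codes for name, codes in _LANGUAGES.items())

print(valid_language("English"))  # True
print(valid_language("eng"))      # True
print(valid_language("Klingon"))  # False
```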

  ### Issue: Memory issues with large files
  **Solution**: Reduce chunk size or break audio into segments
  ```python
  # Process smaller chunks
  segment_duration = 300  # 5 minutes per segment
  ```

  ### Issue: Generated MOM missing action items
  **Solution**: Try different model or update system prompt
  - Claude models typically produce better structured output
  - QWEN is faster and generally reliable
 
  ## 🎓 Citation

  If you use SmartScribe in your project, please cite:
  ```bibtex
  @software{smartscribe2025,
      author = {Asutosha Nanda},
436
 
437
  ---
438
 
439
+ <div align="center">
440
+
441
+ **[⬆ Back to Top](#-smartscribe)**
442
 
443
  **Intelligent Audio Transcription & Meeting Documentation**
444
  Powered by Advanced LLMs and Faster-Whisper
445
 
446
+ Deployed on [HuggingFace Spaces](https://huggingface.co/spaces/itsasutosha/SmartScribe)
447
+
448
+ </div>