metadata
title: Pravaah - Ocean Hazard Detection System
emoji: 🌊
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
short_description: AI-powered system to detect ocean hazards

🌊 Ocean Hazard Detection System

An AI-powered system that analyzes social media posts to detect ocean-related hazards in real time. It translates tweets to English, classifies them as hazardous or safe, analyzes sentiment, and extracts hazard types and location information.

🚀 Features

  • Multilingual Support: Analyzes tweets in 20+ Indian languages including Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, and English
  • Hazard Classification: Uses DeBERTa-v3 zero-shot classification to identify ocean hazards
  • Sentiment Analysis: Analyzes emotional context using GoEmotions model
  • Named Entity Recognition: Extracts hazard types and locations from text
  • Real-time Processing: Processes tweets from Indian coastal regions
  • Database Storage: Stores hazardous tweets for tracking and analysis

🔍 What It Detects

Hazard Types

  • Floods and tsunamis
  • Cyclones and storm surges
  • High tides and waves
  • Coastal flooding and erosion
  • Rip currents and marine debris
  • Water discoloration and algal blooms
  • Marine pollution

Geographic Coverage

  • Major Cities: Mumbai, Chennai, Kolkata, Vizag, Puri
  • States: Odisha, Kerala, Gujarat, Goa, Andhra Pradesh, West Bengal
  • Water Bodies: Bay of Bengal, Arabian Sea
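
The hazard types and locations listed above lend themselves to simple keyword matching. The sketch below is a minimal illustration under that assumption, not the project's actual extractor: the keyword lists are copied from this README, while `HAZARD_KEYWORDS`, `LOCATION_KEYWORDS`, and `find_keywords` are hypothetical names.

```python
import re

# Keyword lists copied from the hazard types and locations in this README.
# The variable and function names are illustrative assumptions.
HAZARD_KEYWORDS = [
    "flood", "tsunami", "cyclone", "storm surge", "high tide", "high wave",
    "coastal flooding", "erosion", "rip current", "marine debris",
    "water discoloration", "algal bloom", "marine pollution",
]
LOCATION_KEYWORDS = [
    "Mumbai", "Chennai", "Kolkata", "Vizag", "Puri",
    "Odisha", "Kerala", "Gujarat", "Goa", "Andhra Pradesh", "West Bengal",
    "Bay of Bengal", "Arabian Sea",
]


def find_keywords(text, keywords):
    """Return the keywords that occur in text as whole words, ignoring case."""
    found = []
    for keyword in keywords:
        # \b keeps "Goa" from matching inside "goal"; s? accepts simple plurals.
        pattern = r"\b" + re.escape(keyword) + r"s?\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(keyword)
    return found


tweet = "Heavy floods reported near Mumbai as high tide hits the Arabian Sea coast"
print(find_keywords(tweet, HAZARD_KEYWORDS))    # ['flood', 'high tide']
print(find_keywords(tweet, LOCATION_KEYWORDS))  # ['Mumbai', 'Arabian Sea']
```

Whole-word matching is a deliberate choice here: plain substring search would report false hits for short names such as Goa or Puri.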

🛠️ Technical Stack

  • AI Models:
    • DeBERTa-v3 for hazard classification
    • Helsinki-NLP for translation
    • GoEmotions for sentiment analysis
    • DistilBERT NER for location extraction
  • Backend: FastAPI + Gradio
  • Database: PostgreSQL
  • Languages: Python 3.9+

📊 How It Works

  1. Tweet Collection: Collects tweets through the Twitter API using hazard and location keywords
  2. Translation: Translates all tweets to English so that every later step processes a single language, which is more efficient
  3. Hazard Classification: Applies zero-shot classification to the translated text to label each tweet as hazardous or safe
  4. Sentiment Analysis: Analyzes emotional context (panic, calm, confusion, neutral) for hazardous tweets
  5. Entity Extraction: Identifies specific hazard types and locations from translated text
  6. Database Storage: Stores hazardous tweets with metadata for tracking
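
The six steps above can be sketched as a small pipeline. This is only an outline under stated assumptions: `TweetResult` and `analyze_tweets` are hypothetical names, and each stage is passed in as a plain callable, so the stand-in lambdas at the bottom replace the real models, the Twitter API, and the database.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class TweetResult:
    original: str
    translated: str
    hazardous: bool
    sentiment: Optional[str] = None
    hazard_types: List[str] = field(default_factory=list)
    locations: List[str] = field(default_factory=list)


def analyze_tweets(
    tweets: Iterable[str],
    translate: Callable[[str], str],
    classify: Callable[[str], bool],
    sentiment: Callable[[str], str],
    extract: Callable[[str], Tuple[List[str], List[str]]],
    store: Callable[[TweetResult], None],
) -> List[TweetResult]:
    """Run steps 2-6 on already collected tweets (step 1)."""
    results = []
    for text in tweets:
        english = translate(text)                                  # step 2
        result = TweetResult(text, english, classify(english))     # step 3
        if result.hazardous:
            # Steps 4-6 run only for tweets classified as hazardous.
            result.sentiment = sentiment(english)                     # step 4
            result.hazard_types, result.locations = extract(english)  # step 5
            store(result)                                             # step 6
        results.append(result)
    return results


# Stand-in stages so the sketch runs without models, API keys, or a database.
saved: List[TweetResult] = []
results = analyze_tweets(
    ["Tsunami warning issued for Chennai", "Lovely sunset at the beach"],
    translate=lambda t: t,
    classify=lambda t: "tsunami" in t.lower(),
    sentiment=lambda t: "panic",
    extract=lambda t: (["tsunami"], ["Chennai"]),
    store=saved.append,
)
print([r.hazardous for r in results])  # [True, False]
print(len(saved))                      # 1
```

Passing the stages in as callables keeps the control flow testable on its own; the real translation, classification, and storage code can be swapped in without changing the loop.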

🚀 Usage

Web Interface (Gradio)

  1. Set Tweet Limit: Choose how many tweets to analyze (1-50)
  2. Click Analyze: The system will process tweets and show results
  3. View Results: See hazardous tweets with sentiment, location, and hazard type
  4. Export Data: Download complete analysis as JSON

API Endpoints (FastAPI)

POST /analyze

Analyze tweets for ocean hazards

# Basic analysis
curl -X POST "http://localhost:8000/analyze" \
  -H "Content-Type: application/json" \
  -d '{"limit": 20}'

# Keyword-based search
curl -X POST "http://localhost:8000/analyze" \
  -H "Content-Type: application/json" \
  -d '{"limit": 20, "hazard_type": "tsunami", "location": "Mumbai", "days_back": 2}'

# Custom query
curl -X POST "http://localhost:8000/analyze" \
  -H "Content-Type: application/json" \
  -d '{"limit": 20, "query": "flood OR tsunami"}'
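
The same requests can be issued from Python. The sketch below uses only the standard library and the JSON fields shown in the curl examples; `build_analyze_request` and `analyze` are hypothetical helper names, and the assumption that the endpoint returns JSON is not confirmed by this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_analyze_request(limit=20, hazard_type=None, location=None,
                          days_back=None, query=None):
    """Build a POST /analyze request, sending only the fields that were set."""
    payload = {"limit": limit}
    optional = {"hazard_type": hazard_type, "location": location,
                "days_back": days_back, "query": query}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        BASE_URL + "/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(**fields):
    """Send the request; needs a running server. Assumes a JSON response body."""
    request = build_analyze_request(**fields)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


# Inspect the request without sending it.
request = build_analyze_request(limit=20, hazard_type="tsunami",
                                location="Mumbai", days_back=2)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With the server running, `analyze(limit=20, query="flood OR tsunami")` would correspond to the custom-query curl call.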

GET /hazardous-tweets

Get stored hazardous tweets

curl "http://localhost:8000/hazardous-tweets?limit=50&offset=0"
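
The `limit` and `offset` parameters suggest offset-based paging. As a rough sketch, assuming `offset` counts records to skip, the helper below generates one URL per page; `page_urls` is a hypothetical name, and the total record count has to come from elsewhere (for example the stats endpoint, if it reports one).

```python
from urllib.parse import urlencode


def page_urls(total, page_size=50, base_url="http://localhost:8000"):
    """Yield one /hazardous-tweets URL per page covering `total` records."""
    for offset in range(0, total, page_size):
        # The last page may be shorter than page_size.
        limit = min(page_size, total - offset)
        query = urlencode({"limit": limit, "offset": offset})
        yield f"{base_url}/hazardous-tweets?{query}"


for url in page_urls(120):
    print(url)
# http://localhost:8000/hazardous-tweets?limit=50&offset=0
# http://localhost:8000/hazardous-tweets?limit=50&offset=50
# http://localhost:8000/hazardous-tweets?limit=20&offset=100
```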

GET /keywords/hazards

Get available hazard types for keyword search

curl "http://localhost:8000/keywords/hazards"

GET /keywords/locations

Get available locations for keyword search

curl "http://localhost:8000/keywords/locations"

GET /stats

Get analysis statistics

curl "http://localhost:8000/stats"

GET /health

Health check endpoint

curl "http://localhost:8000/health"

API Documentation

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

🔧 Environment Variables

The system reads the following environment variables. The Twitter API key is required; the PostgreSQL settings are optional for the demo:

# Twitter API (required)
TWITTER_API_KEY=your_twitter_api_key

# PostgreSQL Database (optional for demo)
PGHOST=localhost
PGPORT=5432
PGDATABASE=postgres
PGUSER=postgres
PGPASSWORD=your_password
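
One way to read these variables in Python is sketched below. It is an illustration only: `Settings` and `load_settings` are hypothetical names, the defaults mirror the example values above, and treating a missing `PGPASSWORD` as "no database configured" is an assumption, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass
class Settings:
    twitter_api_key: str
    database: Optional[Dict[str, object]]  # None means "run without a database"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from `env`, failing fast on the required key."""
    api_key = env.get("TWITTER_API_KEY")
    if not api_key:
        raise RuntimeError("TWITTER_API_KEY is required")

    database = None
    if env.get("PGPASSWORD"):  # assumption: no password means no database
        # Keys follow libpq connection parameter names.
        database = {
            "host": env.get("PGHOST", "localhost"),
            "port": int(env.get("PGPORT", "5432")),
            "dbname": env.get("PGDATABASE", "postgres"),
            "user": env.get("PGUSER", "postgres"),
            "password": env["PGPASSWORD"],
        }
    return Settings(twitter_api_key=api_key, database=database)


# Passing an explicit mapping keeps the example independent of the real environment.
demo = load_settings({"TWITTER_API_KEY": "your_twitter_api_key"})
print(demo.database)  # None
```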

📈 Use Cases

  • Emergency Response: Early detection of ocean hazards for rapid response
  • Environmental Monitoring: Track marine pollution and coastal issues
  • Research: Analyze public sentiment about ocean-related events
  • Policy Making: Data-driven insights for coastal management policies

🔬 Model Details

  • Classification Model: cross-encoder/nli-deberta-v3-base
  • Translation Model: Helsinki-NLP OPUS-MT models
  • Sentiment Model: Google GoEmotions
  • NER: DistilBERT NER with keyword-based fallback
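
GoEmotions predicts fine-grained labels such as fear, relief, and confusion, whereas this system reports panic, calm, confusion, or neutral. How the project maps one onto the other is not documented here; the sketch below shows one plausible grouping. `EMOTION_GROUPS`, `coarse_sentiment`, and the 0.3 threshold are all assumptions.

```python
# Hypothetical grouping of GoEmotions labels into the coarse categories
# this README mentions. The label names are real GoEmotions labels; the
# grouping itself is an assumption.
EMOTION_GROUPS = {
    "panic": {"fear", "nervousness", "grief", "sadness"},
    "calm": {"relief", "optimism", "gratitude", "approval", "caring"},
    "confusion": {"confusion", "surprise", "curiosity", "realization"},
}


def coarse_sentiment(scores, threshold=0.3):
    """Map {label: probability} to panic, calm, confusion, or neutral."""
    # Score each group by its strongest member label.
    group_scores = {
        group: max((scores.get(label, 0.0) for label in labels), default=0.0)
        for group, labels in EMOTION_GROUPS.items()
    }
    best = max(group_scores, key=group_scores.get)
    # Fall back to neutral when no group is confident enough.
    return best if group_scores[best] >= threshold else "neutral"


print(coarse_sentiment({"fear": 0.62, "nervousness": 0.20, "neutral": 0.10}))  # panic
print(coarse_sentiment({"neutral": 0.90, "curiosity": 0.05}))                  # neutral
```

Taking the maximum per group rather than the sum avoids favoring groups that simply contain more labels.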

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📞 Support

For support, please open an issue in the GitHub repository.


Note: This is a demonstration system. In production, it would process real-time tweets and integrate with emergency response systems.