Spaces:
Sleeping
Sleeping
Evgueni Poloukarov
commited on
Commit
·
a321b61
1
Parent(s):
f7513cb
docs: add comprehensive handover guide and archive test scripts
Browse files- Create HANDOVER_GUIDE.md with full API docs, troubleshooting, Phase 2 roadmap
- Archive test scripts to archive/testing/ (test_api.py, run_smoke_test.py, etc.)
- Add evaluation script to scripts/ directory
- Update CLAUDE.md with branch mapping rule
- Update DEPLOYMENT_NOTES.md with troubleshooting guide
Session 11 deliverables complete:
- D+1 MAE: 15.92 MW (88% better than 134 MW target)
- 38 borders × 14 days forecast successful
- Zero-shot multivariate forecasting production-ready
- CLAUDE.md +4 -2
- DEPLOYMENT_NOTES.md +8 -0
- HANDOVER_GUIDE.md +464 -0
- archive/testing/deploy_memory_fix_ssh.sh +44 -0
- archive/testing/run_smoke_test.py +48 -0
- archive/testing/test_api.py +36 -0
- archive/testing/validate_forecast.py +51 -0
- scripts/evaluate_october_2024.py +275 -0
CLAUDE.md
CHANGED
|
@@ -38,13 +38,15 @@
|
|
| 38 |
30. **CRITICAL: HuggingFace Space Deployment - ALWAYS Push to BOTH Remotes**
|
| 39 |
- This project deploys to BOTH GitHub AND HuggingFace Space
|
| 40 |
- Git remotes: `origin` (GitHub) and `hf-new` (HF Space)
|
|
|
|
| 41 |
- **MANDATORY**: After ANY commit affecting HF Space functionality, push to BOTH:
|
| 42 |
```bash
|
| 43 |
-
git push origin master
|
| 44 |
-
git push hf-new master
|
| 45 |
```
|
| 46 |
- **Why both?** HF Spaces are SEPARATE git repositories - they do NOT auto-sync with GitHub
|
| 47 |
- **Failure mode**: Pushing only to GitHub means HF Space continues running old code indefinitely
|
|
|
|
| 48 |
- **Verification**: After pushing to hf-new, wait 3-5 minutes for Space rebuild, then test
|
| 49 |
- **NEVER** push to hf-new without also pushing to origin first (origin is source of truth)
|
| 50 |
31. ALWAYS use virtual environments for Python projects. NEVER install packages globally. Create virtual environments with clear, project-specific names following the pattern: {project_name}_env (e.g., news_intel_env). Always verify virtual environment is activated before installing packages.
|
|
|
|
| 38 |
30. **CRITICAL: HuggingFace Space Deployment - ALWAYS Push to BOTH Remotes**
|
| 39 |
- This project deploys to BOTH GitHub AND HuggingFace Space
|
| 40 |
- Git remotes: `origin` (GitHub) and `hf-new` (HF Space)
|
| 41 |
+
- **BRANCH MAPPING**: Local uses `master`, HF Space uses `main` - MUST map branches!
|
| 42 |
- **MANDATORY**: After ANY commit affecting HF Space functionality, push to BOTH:
|
| 43 |
```bash
|
| 44 |
+
git push origin master # Push to GitHub (master branch)
|
| 45 |
+
git push hf-new master:main # Push to HF Space (main branch) - NOTE: master:main mapping!
|
| 46 |
```
|
| 47 |
- **Why both?** HF Spaces are SEPARATE git repositories - they do NOT auto-sync with GitHub
|
| 48 |
- **Failure mode**: Pushing only to GitHub means HF Space continues running old code indefinitely
|
| 49 |
+
- **Common mistake**: Pushing `master` to `master` on HF Space - it uses `main` branch!
|
| 50 |
- **Verification**: After pushing to hf-new, wait 3-5 minutes for Space rebuild, then test
|
| 51 |
- **NEVER** push to hf-new without also pushing to origin first (origin is source of truth)
|
| 52 |
31. ALWAYS use virtual environments for Python projects. NEVER install packages globally. Create virtual environments with clear, project-specific names following the pattern: {project_name}_env (e.g., news_intel_env). Always verify virtual environment is activated before installing packages.
|
DEPLOYMENT_NOTES.md
CHANGED
|
@@ -4,6 +4,14 @@
|
|
| 4 |
|
| 5 |
**Problem**: Pushing commits to GitHub doesn't always trigger HF Space rebuild automatically.
|
| 6 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 7 |
**Symptoms**:
|
| 8 |
- Code pushed to GitHub successfully
|
| 9 |
- Space shows "RUNNING" status
|
|
|
|
| 4 |
|
| 5 |
**Problem**: Pushing commits to GitHub doesn't always trigger HF Space rebuild automatically.
|
| 6 |
|
| 7 |
+
**CRITICAL**: HF Space uses `main` branch, local repo uses `master` branch!
|
| 8 |
+
|
| 9 |
+
**Correct Push Command**:
|
| 10 |
+
```bash
|
| 11 |
+
git push origin master # Push to GitHub (master branch)
|
| 12 |
+
git push hf-new master:main # Push to HF Space (main branch)
|
| 13 |
+
```
|
| 14 |
+
|
| 15 |
**Symptoms**:
|
| 16 |
- Code pushed to GitHub successfully
|
| 17 |
- Space shows "RUNNING" status
|
HANDOVER_GUIDE.md
ADDED
|
@@ -0,0 +1,464 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# FBMC Chronos-2 Zero-Shot Forecasting - Handover Guide
|
| 2 |
+
|
| 3 |
+
**Version**: 1.0.0
|
| 4 |
+
**Date**: 2025-11-18
|
| 5 |
+
**Status**: Production-Ready MVP
|
| 6 |
+
**Maintainer**: Quantitative Analyst
|
| 7 |
+
|
| 8 |
+
---
|
| 9 |
+
|
| 10 |
+
## Executive Summary
|
| 11 |
+
|
| 12 |
+
This project delivers a **zero-shot multivariate forecasting system** for FBMC cross-border electricity flows using Amazon's Chronos-2 model. The system forecasts 38 European borders with **15.92 MW mean D+1 MAE** - 88% better than the 134 MW target.
|
| 13 |
+
|
| 14 |
+
**Key Achievement**: Zero-shot learning (no model training) achieves production-quality accuracy using 615 covariate features.
|
| 15 |
+
|
| 16 |
+
---
|
| 17 |
+
|
| 18 |
+
## Quick Start
|
| 19 |
+
|
| 20 |
+
### Running Forecasts via API
|
| 21 |
+
|
| 22 |
+
```python
|
| 23 |
+
from gradio_client import Client
|
| 24 |
+
|
| 25 |
+
# Connect to HuggingFace Space
|
| 26 |
+
client = Client("evgueni-p/fbmc-chronos2")
|
| 27 |
+
|
| 28 |
+
# Run forecast
|
| 29 |
+
result_file = client.predict(
|
| 30 |
+
run_date="2024-09-30", # YYYY-MM-DD format
|
| 31 |
+
forecast_type="full_14day", # or "smoke_test"
|
| 32 |
+
api_name="/forecast"
|
| 33 |
+
)
|
| 34 |
+
|
| 35 |
+
# Load results
|
| 36 |
+
import polars as pl
|
| 37 |
+
forecast = pl.read_parquet(result_file)
|
| 38 |
+
print(forecast.head())
|
| 39 |
+
```
|
| 40 |
+
|
| 41 |
+
**Forecast Types**:
|
| 42 |
+
- `smoke_test`: Quick validation (1 border × 7 days, ~30 seconds)
|
| 43 |
+
- `full_14day`: Production forecast (38 borders × 14 days, ~4 minutes)
|
| 44 |
+
|
| 45 |
+
### Output Format
|
| 46 |
+
|
| 47 |
+
Parquet file with columns:
|
| 48 |
+
- `timestamp`: Hourly timestamps (D+1 to D+7 or D+14)
|
| 49 |
+
- `{border}_median`: Median forecast (MW)
|
| 50 |
+
- `{border}_q10`: 10th percentile uncertainty bound (MW)
|
| 51 |
+
- `{border}_q90`: 90th percentile uncertainty bound (MW)
|
| 52 |
+
|
| 53 |
+
**Example**:
|
| 54 |
+
```
|
| 55 |
+
shape: (336, 115)
|
| 56 |
+
┌─────────────────────┬──────────────┬───────────┬───────────┐
|
| 57 |
+
│ timestamp ┆ AT_CZ_median ┆ AT_CZ_q10 ┆ AT_CZ_q90 │
|
| 58 |
+
├─────────────────────┼──────────────┼───────────┼───────────┤
|
| 59 |
+
│ 2024-10-01 01:00:00 ┆ 287.0 ┆ 154.0 ┆ 334.0 │
|
| 60 |
+
│ 2024-10-01 02:00:00 ┆ 290.0 ┆ 157.0 ┆ 337.0 │
|
| 61 |
+
└─────────────────────┴──────────────┴───────────┴───────────┘
|
| 62 |
+
```
|
| 63 |
+
|
| 64 |
+
---
|
| 65 |
+
|
| 66 |
+
## System Architecture
|
| 67 |
+
|
| 68 |
+
### Components
|
| 69 |
+
|
| 70 |
+
```
|
| 71 |
+
┌─────────────────────┐
|
| 72 |
+
│ HuggingFace Space │ GPU: A100-large (40-80 GB VRAM)
|
| 73 |
+
│ (Gradio API) │ Cost: ~$500/month
|
| 74 |
+
└──────────┬──────────┘
|
| 75 |
+
│
|
| 76 |
+
▼
|
| 77 |
+
┌─────────────────────┐
|
| 78 |
+
│ Chronos-2 Pipeline │ Model: amazon/chronos-2 (710M params)
|
| 79 |
+
│ (Zero-Shot) │ Precision: bfloat16
|
| 80 |
+
└──────────┬──────────┘
|
| 81 |
+
│
|
| 82 |
+
▼
|
| 83 |
+
┌─────────────────────┐
|
| 84 |
+
│ Feature Dataset │ Storage: HuggingFace Datasets
|
| 85 |
+
│ (615 covariates) │ Size: ~25 MB (24 months hourly)
|
| 86 |
+
└─────────────────────┘
|
| 87 |
+
```
|
| 88 |
+
|
| 89 |
+
### Multivariate Features (615 total)
|
| 90 |
+
|
| 91 |
+
1. **Weather (520 features)**: Temperature, wind speed across 52 grid points × 10 vars
|
| 92 |
+
2. **Generation (52 features)**: Solar, wind, hydro, nuclear per zone
|
| 93 |
+
3. **CNEC Outages (34 features)**: Critical Network Element & Contingency availability
|
| 94 |
+
4. **Market (9 features)**: Day-ahead prices, LTA allocations
|
| 95 |
+
|
| 96 |
+
### Data Flow
|
| 97 |
+
|
| 98 |
+
1. User calls API with `run_date`
|
| 99 |
+
2. System extracts **128-hour context** window (historical data up to run_date 23:00)
|
| 100 |
+
3. Chronos-2 forecasts **336 hours ahead** (14 days) using 615 future covariates
|
| 101 |
+
4. Returns probabilistic forecasts (3 quantiles: 0.1, 0.5, 0.9)
|
| 102 |
+
|
| 103 |
+
---
|
| 104 |
+
|
| 105 |
+
## Performance Metrics
|
| 106 |
+
|
| 107 |
+
### October 2024 Evaluation Results
|
| 108 |
+
|
| 109 |
+
| Metric | Value | Target | Achievement |
|
| 110 |
+
|--------|-------|--------|-------------|
|
| 111 |
+
| **D+1 MAE (Mean)** | **15.92 MW** | ≤134 MW | ✅ **88% better** |
|
| 112 |
+
| D+1 MAE (Median) | 0.00 MW | - | ✅ Excellent |
|
| 113 |
+
| Borders ≤150 MW | 36/38 (94.7%) | - | ✅ Very good |
|
| 114 |
+
| Forecast time | 3.56 min | <5 min | ✅ Fast |
|
| 115 |
+
|
| 116 |
+
### MAE Degradation Over Forecast Horizon
|
| 117 |
+
|
| 118 |
+
```
|
| 119 |
+
D+1: 15.92 MW (baseline)
|
| 120 |
+
D+2: 17.13 MW (+7.6%)
|
| 121 |
+
D+7: 28.98 MW (+82%)
|
| 122 |
+
D+14: 30.32 MW (+90%)
|
| 123 |
+
```
|
| 124 |
+
|
| 125 |
+
**Interpretation**: Forecast accuracy degrades gracefully. Even at D+14, errors remain reasonable.
|
| 126 |
+
|
| 127 |
+
### Border-Level Performance
|
| 128 |
+
|
| 129 |
+
**Best Performers** (D+1 MAE = 0.0 MW):
|
| 130 |
+
- AT_CZ, AT_HU, AT_SI, BE_DE, CZ_DE (perfect forecasts!)
|
| 131 |
+
- 15 additional borders with <1 MW error
|
| 132 |
+
|
| 133 |
+
**Outliers** (Require Phase 2 attention):
|
| 134 |
+
- **AT_DE**: 266 MW (bidirectional flow complexity)
|
| 135 |
+
- **FR_DE**: 181 MW (high volatility, large capacity)
|
| 136 |
+
|
| 137 |
+
---
|
| 138 |
+
|
| 139 |
+
## Infrastructure & Costs
|
| 140 |
+
|
| 141 |
+
### HuggingFace Space
|
| 142 |
+
|
| 143 |
+
- **URL**: https://huggingface.co/spaces/evgueni-p/fbmc-chronos2
|
| 144 |
+
- **GPU**: A100-large (40-80 GB VRAM)
|
| 145 |
+
- **Cost**: ~$500/month (estimated)
|
| 146 |
+
- **Uptime**: 24/7 auto-restart on errors
|
| 147 |
+
|
| 148 |
+
### Why A100 GPU?
|
| 149 |
+
|
| 150 |
+
The multivariate model with 615 features requires:
|
| 151 |
+
- Baseline memory: 18 GB (model + dataset + PyTorch cache)
|
| 152 |
+
- Attention computation: 11 GB per border
|
| 153 |
+
- **Total**: ~29 GB → L4 (22 GB) insufficient, A100 (40 GB) comfortable
|
| 154 |
+
|
| 155 |
+
**Memory Optimizations Applied**:
|
| 156 |
+
- `batch_size=32` (from default 256) → 87% memory reduction
|
| 157 |
+
- `quantile_levels=[0.1, 0.5, 0.9]` (from 9) → 67% reduction
|
| 158 |
+
- `context_hours=128` (from 512) → 50% reduction
|
| 159 |
+
- `torch.inference_mode()` → disables gradient tracking
|
| 160 |
+
|
| 161 |
+
### Dataset Storage
|
| 162 |
+
|
| 163 |
+
- **Location**: HuggingFace Datasets (`evgueni-p/fbmc-features-24month`)
|
| 164 |
+
- **Size**: 25 MB (17,544 hours × 2,514 features)
|
| 165 |
+
- **Access**: Public read, authenticated write
|
| 166 |
+
- **Update Frequency**: Monthly (recommended)
|
| 167 |
+
|
| 168 |
+
---
|
| 169 |
+
|
| 170 |
+
## Known Limitations & Phase 2 Roadmap
|
| 171 |
+
|
| 172 |
+
### Current Limitations
|
| 173 |
+
|
| 174 |
+
1. **Zero-shot only**: No model fine-tuning (deliberate MVP scope)
|
| 175 |
+
2. **Two outlier borders**: AT_DE (266 MW), FR_DE (181 MW) exceed targets
|
| 176 |
+
3. **Fixed context window**: 128 hours (reduced from 256h for memory)
|
| 177 |
+
4. **No real-time updates**: Forecast runs are on-demand via API
|
| 178 |
+
5. **No automated retraining**: Model parameters are frozen
|
| 179 |
+
|
| 180 |
+
### Phase 2 Recommendations
|
| 181 |
+
|
| 182 |
+
#### Priority 1: Fine-Tuning for Outlier Borders
|
| 183 |
+
- **Objective**: Reduce AT_DE and FR_DE MAE below 150 MW
|
| 184 |
+
- **Approach**: LoRA (Low-Rank Adaptation) fine-tuning on 6 months of border-specific data
|
| 185 |
+
- **Expected Improvement**: 40-60% MAE reduction for outliers
|
| 186 |
+
- **Timeline**: 2-3 weeks
|
| 187 |
+
|
| 188 |
+
#### Priority 2: Extend Context Window
|
| 189 |
+
- **Objective**: Increase from 128h to 512h for better pattern learning
|
| 190 |
+
- **Requires**: Code change + verify no OOM on A100
|
| 191 |
+
- **Expected Improvement**: 10-15% overall MAE reduction
|
| 192 |
+
- **Timeline**: 1 week
|
| 193 |
+
|
| 194 |
+
#### Priority 3: Feature Engineering Enhancements
|
| 195 |
+
- **Add**: Scheduled outages, cross-border ramping constraints
|
| 196 |
+
- **Refine**: CNEC weighting based on binding frequency
|
| 197 |
+
- **Expected Improvement**: 5-10% MAE reduction
|
| 198 |
+
- **Timeline**: 2 weeks
|
| 199 |
+
|
| 200 |
+
#### Priority 4: Automated Daily Forecasting
|
| 201 |
+
- **Objective**: Scheduled daily runs at 23:00 CET
|
| 202 |
+
- **Approach**: GitHub Actions + HF Space API
|
| 203 |
+
- **Storage**: Results in HF Datasets or S3
|
| 204 |
+
- **Timeline**: 1 week
|
| 205 |
+
|
| 206 |
+
#### Priority 5: Probabilistic Calibration
|
| 207 |
+
- **Objective**: Ensure 80% of actuals fall within [q10, q90] bounds
|
| 208 |
+
- **Approach**: Conformal prediction or quantile calibration
|
| 209 |
+
- **Expected Improvement**: Better uncertainty quantification
|
| 210 |
+
- **Timeline**: 2 weeks
|
| 211 |
+
|
| 212 |
+
---
|
| 213 |
+
|
| 214 |
+
## Troubleshooting
|
| 215 |
+
|
| 216 |
+
### Common Issues
|
| 217 |
+
|
| 218 |
+
#### 1. Space Shows "PAUSED" Status
|
| 219 |
+
|
| 220 |
+
**Cause**: GPU tier requires manual approval or billing issue
|
| 221 |
+
|
| 222 |
+
**Solution**:
|
| 223 |
+
1. Check Space settings: https://huggingface.co/spaces/evgueni-p/fbmc-chronos2/settings
|
| 224 |
+
2. Verify account tier supports A100-large
|
| 225 |
+
3. Click "Factory Reboot" to restart
|
| 226 |
+
|
| 227 |
+
#### 2. CUDA Out of Memory Errors
|
| 228 |
+
|
| 229 |
+
**Symptoms**: Returns `debug_*.txt` file instead of parquet, error shows OOM
|
| 230 |
+
|
| 231 |
+
**Solution**:
|
| 232 |
+
1. Verify `suggested_hardware: a100-large` in README.md
|
| 233 |
+
2. Check Space logs for actual GPU allocated
|
| 234 |
+
3. If downgraded to L4, file GitHub issue for GPU upgrade
|
| 235 |
+
|
| 236 |
+
**Fallback**: Reduce `context_hours` from 128 to 64 in `src/forecasting/chronos_inference.py:117`
|
| 237 |
+
|
| 238 |
+
#### 3. Forecast Returns Empty/Invalid Data
|
| 239 |
+
|
| 240 |
+
**Check**:
|
| 241 |
+
1. Verify `run_date` is within dataset range (2023-10-01 to 2025-09-30)
|
| 242 |
+
2. Check dataset accessibility: https://huggingface.co/datasets/evgueni-p/fbmc-features-24month
|
| 243 |
+
3. Review debug file for specific errors
|
| 244 |
+
|
| 245 |
+
#### 4. Slow Inference (>10 minutes)
|
| 246 |
+
|
| 247 |
+
**Normal Range**: 3-5 minutes for 38 borders × 14 days
|
| 248 |
+
|
| 249 |
+
**If Slower**:
|
| 250 |
+
1. Check Space GPU allocation (should be A100)
|
| 251 |
+
2. Verify `batch_size=32` in code (not reverted to 256)
|
| 252 |
+
3. Check HF Space region (US-East faster than EU)
|
| 253 |
+
|
| 254 |
+
---
|
| 255 |
+
|
| 256 |
+
## Development Workflow
|
| 257 |
+
|
| 258 |
+
### Local Development
|
| 259 |
+
|
| 260 |
+
```bash
|
| 261 |
+
# Clone repository
|
| 262 |
+
git clone https://github.com/evgspacdmy/fbmc_chronos2.git
|
| 263 |
+
cd fbmc_chronos2
|
| 264 |
+
|
| 265 |
+
# Create virtual environment
|
| 266 |
+
python -m venv .venv
|
| 267 |
+
source .venv/bin/activate # Windows: .venv\Scripts\activate
|
| 268 |
+
|
| 269 |
+
# Install dependencies with uv (faster than pip)
|
| 270 |
+
.venv/Scripts/uv.exe pip install -r requirements.txt
|
| 271 |
+
|
| 272 |
+
# Run local tests
|
| 273 |
+
pytest tests/ -v
|
| 274 |
+
```
|
| 275 |
+
|
| 276 |
+
### Deploying Changes to HF Space
|
| 277 |
+
|
| 278 |
+
**CRITICAL**: HF Space uses `main` branch, local uses `master`
|
| 279 |
+
|
| 280 |
+
```bash
|
| 281 |
+
# Make changes locally
|
| 282 |
+
git add .
|
| 283 |
+
git commit -m "feat: your description"
|
| 284 |
+
|
| 285 |
+
# Push to BOTH remotes
|
| 286 |
+
git push origin master # GitHub (version control)
|
| 287 |
+
git push hf-new master:main # HF Space (deployment)
|
| 288 |
+
```
|
| 289 |
+
|
| 290 |
+
**Wait 3-5 minutes** for Space rebuild. Check logs for successful deployment.
|
| 291 |
+
|
| 292 |
+
### Adding New Features
|
| 293 |
+
|
| 294 |
+
1. Create feature branch: `git checkout -b feature/name`
|
| 295 |
+
2. Implement changes with tests
|
| 296 |
+
3. Run evaluation: `python scripts/evaluate_october_2024.py`
|
| 297 |
+
4. Merge to master if MAE doesn't degrade
|
| 298 |
+
5. Push to both remotes
|
| 299 |
+
|
| 300 |
+
---
|
| 301 |
+
|
| 302 |
+
## API Reference
|
| 303 |
+
|
| 304 |
+
### Gradio API Endpoints
|
| 305 |
+
|
| 306 |
+
#### `/forecast`
|
| 307 |
+
|
| 308 |
+
**Parameters**:
|
| 309 |
+
- `run_date` (str): Forecast run date in `YYYY-MM-DD` format
|
| 310 |
+
- `forecast_type` (str): `"smoke_test"` or `"full_14day"`
|
| 311 |
+
|
| 312 |
+
**Returns**:
|
| 313 |
+
- File path to parquet forecast or debug txt (if errors)
|
| 314 |
+
|
| 315 |
+
**Example**:
|
| 316 |
+
```python
|
| 317 |
+
result = client.predict(
|
| 318 |
+
run_date="2024-09-30",
|
| 319 |
+
forecast_type="full_14day",
|
| 320 |
+
api_name="/forecast"
|
| 321 |
+
)
|
| 322 |
+
```
|
| 323 |
+
|
| 324 |
+
### Python SDK (Gradio Client)
|
| 325 |
+
|
| 326 |
+
```python
|
| 327 |
+
from gradio_client import Client
|
| 328 |
+
import polars as pl
|
| 329 |
+
|
| 330 |
+
# Initialize client
|
| 331 |
+
client = Client("evgueni-p/fbmc-chronos2")
|
| 332 |
+
|
| 333 |
+
# Run forecast
|
| 334 |
+
result = client.predict(
|
| 335 |
+
run_date="2024-09-30",
|
| 336 |
+
forecast_type="full_14day",
|
| 337 |
+
api_name="/forecast"
|
| 338 |
+
)
|
| 339 |
+
|
| 340 |
+
# Load and process results
|
| 341 |
+
df = pl.read_parquet(result)
|
| 342 |
+
|
| 343 |
+
# Extract specific border
|
| 344 |
+
at_cz_median = df.select(["timestamp", "AT_CZ_median"])
|
| 345 |
+
```
|
| 346 |
+
|
| 347 |
+
---
|
| 348 |
+
|
| 349 |
+
## Data Schema
|
| 350 |
+
|
| 351 |
+
### Feature Dataset Columns
|
| 352 |
+
|
| 353 |
+
**Total**: 2,514 columns (1 timestamp + 603 target borders + 12 actuals + 1,899 features)
|
| 354 |
+
|
| 355 |
+
**Target Columns** (603):
|
| 356 |
+
- `target_border_{BORDER}`: Historical flow values (MW)
|
| 357 |
+
- Example: `target_border_AT_CZ`, `target_border_FR_DE`
|
| 358 |
+
|
| 359 |
+
**Actual Columns** (12):
|
| 360 |
+
- `actual_{ZONE}_price`: Day-ahead electricity price (EUR/MWh)
|
| 361 |
+
- Example: `actual_DE_price`, `actual_FR_price`
|
| 362 |
+
|
| 363 |
+
**Feature Categories** (1,899 total):
|
| 364 |
+
|
| 365 |
+
1. **Weather Future** (520 features)
|
| 366 |
+
- `weather_future_{zone}_{var}`: temperature, wind_speed, etc.
|
| 367 |
+
- Zones: AT, BE, CZ, DE, FR, HU, HR, NL, PL, RO, SI, SK
|
| 368 |
+
- Variables: temperature, wind_u, wind_v, pressure, humidity, etc.
|
| 369 |
+
|
| 370 |
+
2. **Generation Future** (52 features)
|
| 371 |
+
- `generation_future_{zone}_{type}`: solar, wind, hydro, nuclear
|
| 372 |
+
- Example: `generation_future_DE_solar`
|
| 373 |
+
|
| 374 |
+
3. **CNEC Outages** (34 features)
|
| 375 |
+
- `cnec_outage_{cnec_id}`: Binary availability (0=outage, 1=available)
|
| 376 |
+
- Tier-1 CNECs (most binding)
|
| 377 |
+
|
| 378 |
+
4. **Market** (9 features)
|
| 379 |
+
- `lta_{border}`: Long-term allocation (MW)
|
| 380 |
+
- Day-ahead price forecasts
|
| 381 |
+
|
| 382 |
+
### Forecast Output Schema
|
| 383 |
+
|
| 384 |
+
**Columns**: 115 (1 timestamp + 38 borders × 3 quantiles)
|
| 385 |
+
|
| 386 |
+
```
|
| 387 |
+
timestamp: datetime
|
| 388 |
+
{border}_median: float64 (50th percentile forecast)
|
| 389 |
+
{border}_q10: float64 (10th percentile, lower bound)
|
| 390 |
+
{border}_q90: float64 (90th percentile, upper bound)
|
| 391 |
+
```
|
| 392 |
+
|
| 393 |
+
**Borders**: AT_CZ, AT_HU, AT_SI, BE_DE, CZ_AT, ..., NL_DE (38 total)
|
| 394 |
+
|
| 395 |
+
---
|
| 396 |
+
|
| 397 |
+
## Contact & Support
|
| 398 |
+
|
| 399 |
+
### Project Repository
|
| 400 |
+
- **GitHub**: https://github.com/evgspacdmy/fbmc_chronos2
|
| 401 |
+
- **HF Space**: https://huggingface.co/spaces/evgueni-p/fbmc-chronos2
|
| 402 |
+
- **Dataset**: https://huggingface.co/datasets/evgueni-p/fbmc-features-24month
|
| 403 |
+
|
| 404 |
+
### Key Documentation
|
| 405 |
+
- `doc/activity.md`: Development log and session history
|
| 406 |
+
- `DEPLOYMENT_NOTES.md`: HF Space deployment troubleshooting
|
| 407 |
+
- `CLAUDE.md`: Development rules and conventions
|
| 408 |
+
- `README.md`: Project overview and quick start
|
| 409 |
+
|
| 410 |
+
### Getting Help
|
| 411 |
+
|
| 412 |
+
1. **Check documentation** first (this guide, README.md, activity.md)
|
| 413 |
+
2. **Review recent commits** for similar issues
|
| 414 |
+
3. **Check HF Space logs** for runtime errors
|
| 415 |
+
4. **File GitHub issue** with detailed error description
|
| 416 |
+
|
| 417 |
+
---
|
| 418 |
+
|
| 419 |
+
## Appendix: Technical Details
|
| 420 |
+
|
| 421 |
+
### Model Specifications
|
| 422 |
+
|
| 423 |
+
- **Architecture**: Chronos-2 (T5-based encoder-decoder)
|
| 424 |
+
- **Parameters**: 710M
|
| 425 |
+
- **Precision**: bfloat16 (memory efficient)
|
| 426 |
+
- **Context**: 128 hours (reduced from 512h for GPU memory)
|
| 427 |
+
- **Horizon**: 336 hours (14 days)
|
| 428 |
+
- **Batch Size**: 32 (optimized for A100 GPU)
|
| 429 |
+
- **Quantiles**: 3 [0.1, 0.5, 0.9]
|
| 430 |
+
|
| 431 |
+
### Inference Configuration
|
| 432 |
+
|
| 433 |
+
```python
|
| 434 |
+
pipeline.predict_df(
|
| 435 |
+
context_data, # 128h × 2,514 features
|
| 436 |
+
future_df=future_data, # 336h × 615 features
|
| 437 |
+
prediction_length=336,
|
| 438 |
+
batch_size=32,
|
| 439 |
+
quantile_levels=[0.1, 0.5, 0.9]
|
| 440 |
+
)
|
| 441 |
+
```
|
| 442 |
+
|
| 443 |
+
### Memory Footprint
|
| 444 |
+
|
| 445 |
+
- Model weights: ~2 GB (bfloat16)
|
| 446 |
+
- Dataset: ~1 GB (in-memory)
|
| 447 |
+
- PyTorch cache: ~15 GB (workspace)
|
| 448 |
+
- Attention (per batch): ~11 GB
|
| 449 |
+
- **Total**: ~29 GB (peak)
|
| 450 |
+
|
| 451 |
+
### GPU Requirements
|
| 452 |
+
|
| 453 |
+
| GPU | VRAM | Status |
|
| 454 |
+
|-----|------|--------|
|
| 455 |
+
| T4 | 16 GB | ❌ Insufficient (18 GB baseline) |
|
| 456 |
+
| L4 | 22 GB | ❌ Insufficient (29 GB peak) |
|
| 457 |
+
| A10G | 24 GB | ⚠️ Marginal (tight fit) |
|
| 458 |
+
| **A100** | **40-80 GB** | ✅ **Recommended** |
|
| 459 |
+
|
| 460 |
+
---
|
| 461 |
+
|
| 462 |
+
**Document Version**: 1.0.0
|
| 463 |
+
**Last Updated**: 2025-11-18
|
| 464 |
+
**Status**: Production Ready
|
archive/testing/deploy_memory_fix_ssh.sh
ADDED
|
@@ -0,0 +1,44 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/bin/bash
|
| 2 |
+
# Deploy memory optimizations via SSH to HuggingFace Space
|
| 3 |
+
# Run this after adding SSH key to HuggingFace settings
|
| 4 |
+
|
| 5 |
+
set -e
|
| 6 |
+
|
| 7 |
+
echo "[1/5] Testing SSH connection..."
|
| 8 |
+
ssh -o ConnectTimeout=10 [email protected] "echo 'SSH OK' && pwd"
|
| 9 |
+
|
| 10 |
+
echo ""
|
| 11 |
+
echo "[2/5] Backing up current file..."
|
| 12 |
+
ssh [email protected] "cp /home/user/app/src/forecasting/chronos_inference.py /home/user/app/src/forecasting/chronos_inference.py.backup"
|
| 13 |
+
|
| 14 |
+
echo ""
|
| 15 |
+
echo "[3/5] Applying memory optimizations..."
|
| 16 |
+
|
| 17 |
+
# Add model.eval() after line 72
|
| 18 |
+
ssh [email protected] "sed -i '72a\\ # Set model to evaluation mode (disables dropout, etc.)' /home/user/app/src/forecasting/chronos_inference.py"
|
| 19 |
+
ssh [email protected] "sed -i '73a\\ self._pipeline.model.eval()' /home/user/app/src/forecasting/chronos_inference.py"
|
| 20 |
+
|
| 21 |
+
# Add torch.inference_mode() wrapper around predict_df()
|
| 22 |
+
ssh [email protected] "sed -i '188i\\ # Use torch.inference_mode() to disable gradient tracking (saves ~2-5 GB VRAM)' /home/user/app/src/forecasting/chronos_inference.py"
|
| 23 |
+
ssh [email protected] "sed -i '189i\\ with torch.inference_mode():' /home/user/app/src/forecasting/chronos_inference.py"
|
| 24 |
+
|
| 25 |
+
# Indent predict_df() call (add 4 spaces)
|
| 26 |
+
ssh [email protected] "sed -i '190,197s/^/ /' /home/user/app/src/forecasting/chronos_inference.py"
|
| 27 |
+
|
| 28 |
+
echo ""
|
| 29 |
+
echo "[4/5] Verifying changes..."
|
| 30 |
+
ssh [email protected] "grep -A 2 'model.eval()' /home/user/app/src/forecasting/chronos_inference.py || echo 'ERROR: model.eval() not found'"
|
| 31 |
+
ssh [email protected] "grep -A 2 'inference_mode()' /home/user/app/src/forecasting/chronos_inference.py || echo 'ERROR: inference_mode() not found'"
|
| 32 |
+
|
| 33 |
+
echo ""
|
| 34 |
+
echo "[5/5] Restarting Gradio app..."
|
| 35 |
+
ssh [email protected] "pkill -f 'app.py' || true"
|
| 36 |
+
sleep 3
|
| 37 |
+
ssh [email protected] "cd /home/user/app && nohup python app.py > /tmp/gradio.log 2>&1 &"
|
| 38 |
+
|
| 39 |
+
echo ""
|
| 40 |
+
echo "[SUCCESS] Memory optimizations deployed!"
|
| 41 |
+
echo "[INFO] App restarting - test in 30 seconds"
|
| 42 |
+
echo ""
|
| 43 |
+
echo "Test with:"
|
| 44 |
+
echo " python test_api.py"
|
archive/testing/run_smoke_test.py
ADDED
|
@@ -0,0 +1,48 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Run smoke test notebook on HuggingFace Space
|
| 4 |
+
"""
|
| 5 |
+
import subprocess
|
| 6 |
+
import sys
|
| 7 |
+
import os
|
| 8 |
+
from pathlib import Path
|
| 9 |
+
|
| 10 |
+
def run_notebook(notebook_path):
|
| 11 |
+
"""Execute a Jupyter notebook using nbconvert"""
|
| 12 |
+
print(f"Running notebook: {notebook_path}")
|
| 13 |
+
|
| 14 |
+
cmd = [
|
| 15 |
+
"jupyter", "nbconvert",
|
| 16 |
+
"--to", "notebook",
|
| 17 |
+
"--execute",
|
| 18 |
+
"--inplace",
|
| 19 |
+
"--ExecutePreprocessor.timeout=600",
|
| 20 |
+
str(notebook_path)
|
| 21 |
+
]
|
| 22 |
+
|
| 23 |
+
result = subprocess.run(cmd, capture_output=True, text=True)
|
| 24 |
+
|
| 25 |
+
if result.returncode == 0:
|
| 26 |
+
print(f"✓ Successfully executed {notebook_path}")
|
| 27 |
+
return True
|
| 28 |
+
else:
|
| 29 |
+
print(f"✗ Error executing {notebook_path}")
|
| 30 |
+
print(f"STDOUT: {result.stdout}")
|
| 31 |
+
print(f"STDERR: {result.stderr}")
|
| 32 |
+
return False
|
| 33 |
+
|
| 34 |
+
if __name__ == "__main__":
|
| 35 |
+
# Set HF token from environment
|
| 36 |
+
if "HF_TOKEN" not in os.environ:
|
| 37 |
+
print("Warning: HF_TOKEN not set in environment")
|
| 38 |
+
print("Set it with: export HF_TOKEN='your_token'")
|
| 39 |
+
|
| 40 |
+
# Run smoke test
|
| 41 |
+
notebook = Path("/data/inference_smoke_test.ipynb")
|
| 42 |
+
|
| 43 |
+
if not notebook.exists():
|
| 44 |
+
print(f"Error: Notebook not found at {notebook}")
|
| 45 |
+
sys.exit(1)
|
| 46 |
+
|
| 47 |
+
success = run_notebook(notebook)
|
| 48 |
+
sys.exit(0 if success else 1)
|
archive/testing/test_api.py
ADDED
|
@@ -0,0 +1,36 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""Test API connection to HF Space"""
|
| 3 |
+
import sys
|
| 4 |
+
sys.stdout.reconfigure(encoding='utf-8', errors='replace')
|
| 5 |
+
|
| 6 |
+
import os
|
| 7 |
+
from dotenv import load_dotenv
|
| 8 |
+
load_dotenv()
|
| 9 |
+
|
| 10 |
+
from gradio_client import Client
|
| 11 |
+
|
| 12 |
+
hf_token = os.getenv("HF_TOKEN")
|
| 13 |
+
print("Attempting to connect to Space...", flush=True)
|
| 14 |
+
|
| 15 |
+
try:
|
| 16 |
+
client = Client("evgueni-p/fbmc-chronos2", hf_token=hf_token)
|
| 17 |
+
print("[OK] Connected successfully!", flush=True)
|
| 18 |
+
|
| 19 |
+
# Check available endpoints
|
| 20 |
+
print("\nAvailable API endpoints:", flush=True)
|
| 21 |
+
print(f"Endpoints: {client.endpoints}", flush=True)
|
| 22 |
+
|
| 23 |
+
print("\nSpace is running. Testing smoke test API call...", flush=True)
|
| 24 |
+
|
| 25 |
+
# Try a smoke test - let Gradio auto-detect the endpoint
|
| 26 |
+
result = client.predict(
|
| 27 |
+
"2025-09-30", # run_date
|
| 28 |
+
"smoke_test", # forecast_type
|
| 29 |
+
)
|
| 30 |
+
print(f"[OK] API call successful!", flush=True)
|
| 31 |
+
print(f"Result file: {result}", flush=True)
|
| 32 |
+
|
| 33 |
+
except Exception as e:
|
| 34 |
+
print(f"[ERROR] {type(e).__name__}: {str(e)}", flush=True)
|
| 35 |
+
import traceback
|
| 36 |
+
traceback.print_exc()
|
archive/testing/validate_forecast.py
ADDED
|
@@ -0,0 +1,51 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""Validate forecast results"""
|
| 3 |
+
import sys
|
| 4 |
+
sys.stdout.reconfigure(encoding='utf-8', errors='replace')
|
| 5 |
+
|
| 6 |
+
import polars as pl
|
| 7 |
+
from pathlib import Path
|
| 8 |
+
|
| 9 |
+
# Find the most recent forecast file in Windows temp directory
|
| 10 |
+
temp_dir = Path(r"C:\Users\evgue\AppData\Local\Temp\gradio")
|
| 11 |
+
forecast_files = list(temp_dir.glob("**/forecast_*.parquet"))
|
| 12 |
+
|
| 13 |
+
if not forecast_files:
|
| 14 |
+
print("[ERROR] No forecast files found", flush=True)
|
| 15 |
+
sys.exit(1)
|
| 16 |
+
|
| 17 |
+
# Get the most recent file
|
| 18 |
+
latest_forecast = max(forecast_files, key=lambda p: p.stat().st_mtime)
|
| 19 |
+
print(f"Examining: {latest_forecast.name}", flush=True)
|
| 20 |
+
print(f"Full path: {latest_forecast}", flush=True)
|
| 21 |
+
|
| 22 |
+
# Load and examine the forecast
|
| 23 |
+
df = pl.read_parquet(latest_forecast)
|
| 24 |
+
|
| 25 |
+
print(f"\n[OK] Forecast loaded successfully", flush=True)
|
| 26 |
+
print(f"Shape: {df.shape} (rows x columns)", flush=True)
|
| 27 |
+
print(f"\nColumns: {df.columns}", flush=True)
|
| 28 |
+
print(f"\nData types:\n{df.dtypes}", flush=True)
|
| 29 |
+
|
| 30 |
+
# Check for expected structure
|
| 31 |
+
print(f"\n--- Validation ---", flush=True)
|
| 32 |
+
assert 'timestamp' in df.columns, "Missing timestamp column"
|
| 33 |
+
print("[OK] timestamp column present", flush=True)
|
| 34 |
+
|
| 35 |
+
# Check for forecast columns (median, q10, q90)
|
| 36 |
+
forecast_cols = [c for c in df.columns if c != 'timestamp']
|
| 37 |
+
print(f"[OK] Found {len(forecast_cols)} forecast columns", flush=True)
|
| 38 |
+
|
| 39 |
+
# Check number of rows (should be 168 for 7 days)
|
| 40 |
+
expected_rows = 168 # 7 days * 24 hours
|
| 41 |
+
print(f"[OK] Rows: {len(df)} (expected: {expected_rows})", flush=True)
|
| 42 |
+
|
| 43 |
+
# Display first few rows
|
| 44 |
+
print(f"\n--- First 5 rows ---", flush=True)
|
| 45 |
+
print(df.head(5))
|
| 46 |
+
|
| 47 |
+
# Display summary statistics
|
| 48 |
+
print(f"\n--- Summary Statistics ---", flush=True)
|
| 49 |
+
print(df.select([c for c in df.columns if c != 'timestamp']).describe())
|
| 50 |
+
|
| 51 |
+
print(f"\n[SUCCESS] Smoke test validation complete!", flush=True)
|
scripts/evaluate_october_2024.py
ADDED
|
@@ -0,0 +1,275 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
October 2024 Evaluation - Multivariate Chronos-2
|
| 4 |
+
Run date: 2024-09-30 (forecast Oct 1-14, 2024)
|
| 5 |
+
Compares 38-border × 14-day forecast against actuals
|
| 6 |
+
Calculates D+1 through D+14 MAE for each border
|
| 7 |
+
"""
|
| 8 |
+
import sys
|
| 9 |
+
sys.stdout.reconfigure(encoding='utf-8', errors='replace')
|
| 10 |
+
|
| 11 |
+
import os
|
| 12 |
+
import time
|
| 13 |
+
import numpy as np
|
| 14 |
+
import polars as pl
|
| 15 |
+
from datetime import datetime, timedelta
|
| 16 |
+
from pathlib import Path
|
| 17 |
+
from dotenv import load_dotenv
|
| 18 |
+
from gradio_client import Client
|
| 19 |
+
|
| 20 |
+
load_dotenv()
|
| 21 |
+
|
| 22 |
+
def main():
|
| 23 |
+
print("="*70)
|
| 24 |
+
print("OCTOBER 2024 MULTIVARIATE CHRONOS-2 EVALUATION")
|
| 25 |
+
print("="*70)
|
| 26 |
+
|
| 27 |
+
total_start = time.time()
|
| 28 |
+
|
| 29 |
+
# Step 1: Connect to HF Space
|
| 30 |
+
print("\n[1/6] Connecting to HuggingFace Space...")
|
| 31 |
+
hf_token = os.getenv("HF_TOKEN")
|
| 32 |
+
if not hf_token:
|
| 33 |
+
raise ValueError("HF_TOKEN not found in environment")
|
| 34 |
+
|
| 35 |
+
client = Client("evgueni-p/fbmc-chronos2", hf_token=hf_token)
|
| 36 |
+
print("[OK] Connected to HF Space")
|
| 37 |
+
|
| 38 |
+
# Step 2: Run full 14-day forecast for Oct 1-14, 2024
|
| 39 |
+
print("\n[2/6] Running full 38-border forecast via HF Space API...")
|
| 40 |
+
print(" Run date: 2024-09-30")
|
| 41 |
+
print(" Forecast period: Oct 1-14, 2024 (336 hours)")
|
| 42 |
+
print(" This may take 5-10 minutes...")
|
| 43 |
+
|
| 44 |
+
forecast_start_time = time.time()
|
| 45 |
+
result_file = client.predict(
|
| 46 |
+
"2024-09-30", # run_date
|
| 47 |
+
"full_14day", # forecast_type
|
| 48 |
+
)
|
| 49 |
+
forecast_time = time.time() - forecast_start_time
|
| 50 |
+
|
| 51 |
+
print(f"[OK] Forecast complete in {forecast_time/60:.2f} minutes")
|
| 52 |
+
print(f" Result file: {result_file}")
|
| 53 |
+
|
| 54 |
+
# Step 3: Load forecast results
|
| 55 |
+
print("\n[3/6] Loading forecast results...")
|
| 56 |
+
forecast_df = pl.read_parquet(result_file)
|
| 57 |
+
print(f"[OK] Loaded forecast with shape: {forecast_df.shape}")
|
| 58 |
+
print(f" Columns: {len(forecast_df.columns)} (timestamp + {len(forecast_df.columns)-1} forecast columns)")
|
| 59 |
+
|
| 60 |
+
# Identify border columns (median forecasts)
|
| 61 |
+
median_cols = [col for col in forecast_df.columns if col.endswith('_median')]
|
| 62 |
+
borders = [col.replace('_median', '') for col in median_cols]
|
| 63 |
+
print(f"[OK] Found {len(borders)} borders")
|
| 64 |
+
|
| 65 |
+
# Step 4: Load actuals from local dataset
|
| 66 |
+
print("\n[4/6] Loading actual values from local dataset...")
|
| 67 |
+
local_data_path = Path('data/processed/features_unified_24month.parquet')
|
| 68 |
+
|
| 69 |
+
if not local_data_path.exists():
|
| 70 |
+
print(f"[ERROR] Local dataset not found at: {local_data_path}")
|
| 71 |
+
sys.exit(1)
|
| 72 |
+
|
| 73 |
+
df = pl.read_parquet(local_data_path)
|
| 74 |
+
|
| 75 |
+
print(f"[OK] Loaded dataset: {len(df)} rows")
|
| 76 |
+
print(f" Date range: {df['timestamp'].min()} to {df['timestamp'].max()}")
|
| 77 |
+
|
| 78 |
+
# Extract October 1-14, 2024 actuals
|
| 79 |
+
oct_start = datetime(2024, 10, 1, 0, 0, 0)
|
| 80 |
+
oct_end = datetime(2024, 10, 14, 23, 0, 0)
|
| 81 |
+
|
| 82 |
+
actual_df = df.filter(
|
| 83 |
+
(pl.col('timestamp') >= oct_start) &
|
| 84 |
+
(pl.col('timestamp') <= oct_end)
|
| 85 |
+
)
|
| 86 |
+
|
| 87 |
+
if len(actual_df) == 0:
|
| 88 |
+
print("[ERROR] No actual data found for October 2024!")
|
| 89 |
+
print(" Dataset may not contain October 2024 data.")
|
| 90 |
+
print(" Available date range in dataset:")
|
| 91 |
+
print(f" {df['timestamp'].min()} to {df['timestamp'].max()}")
|
| 92 |
+
sys.exit(1)
|
| 93 |
+
|
| 94 |
+
print(f"[OK] Extracted {len(actual_df)} hours of actual values")
|
| 95 |
+
|
| 96 |
+
# Step 5: Calculate metrics for each border
|
| 97 |
+
print(f"\n[5/6] Calculating MAE metrics for {len(borders)} borders...")
|
| 98 |
+
print(" Progress:")
|
| 99 |
+
|
| 100 |
+
results = []
|
| 101 |
+
|
| 102 |
+
for i, border in enumerate(borders, 1):
|
| 103 |
+
# Get forecast for this border (median)
|
| 104 |
+
forecast_col = f"{border}_median"
|
| 105 |
+
|
| 106 |
+
if forecast_col not in forecast_df.columns:
|
| 107 |
+
print(f" [{i:2d}/{len(borders)}] {border:15s} - SKIPPED (no forecast)")
|
| 108 |
+
continue
|
| 109 |
+
|
| 110 |
+
# Get actual values for this border
|
| 111 |
+
target_col = f'target_border_{border}'
|
| 112 |
+
|
| 113 |
+
if target_col not in actual_df.columns:
|
| 114 |
+
print(f" [{i:2d}/{len(borders)}] {border:15s} - SKIPPED (no actuals)")
|
| 115 |
+
continue
|
| 116 |
+
|
| 117 |
+
# Merge forecast with actuals on timestamp
|
| 118 |
+
merged = forecast_df.select(['timestamp', forecast_col]).join(
|
| 119 |
+
actual_df.select(['timestamp', target_col]),
|
| 120 |
+
on='timestamp',
|
| 121 |
+
how='left'
|
| 122 |
+
)
|
| 123 |
+
|
| 124 |
+
# Calculate overall MAE (all 336 hours)
|
| 125 |
+
valid_data = merged.filter(
|
| 126 |
+
pl.col(forecast_col).is_not_null() &
|
| 127 |
+
pl.col(target_col).is_not_null()
|
| 128 |
+
)
|
| 129 |
+
|
| 130 |
+
if len(valid_data) == 0:
|
| 131 |
+
print(f" [{i:2d}/{len(borders)}] {border:15s} - SKIPPED (no valid data)")
|
| 132 |
+
continue
|
| 133 |
+
|
| 134 |
+
# Calculate overall metrics
|
| 135 |
+
mae = (valid_data[forecast_col] - valid_data[target_col]).abs().mean()
|
| 136 |
+
rmse = ((valid_data[forecast_col] - valid_data[target_col])**2).mean()**0.5
|
| 137 |
+
|
| 138 |
+
# Calculate per-day MAE (D+1 through D+14)
|
| 139 |
+
per_day_mae = []
|
| 140 |
+
for day in range(1, 15):
|
| 141 |
+
day_start = oct_start + timedelta(days=day-1)
|
| 142 |
+
day_end = day_start + timedelta(days=1) - timedelta(hours=1)
|
| 143 |
+
|
| 144 |
+
day_data = valid_data.filter(
|
| 145 |
+
(pl.col('timestamp') >= day_start) &
|
| 146 |
+
(pl.col('timestamp') <= day_end)
|
| 147 |
+
)
|
| 148 |
+
|
| 149 |
+
if len(day_data) > 0:
|
| 150 |
+
day_mae = (day_data[forecast_col] - day_data[target_col]).abs().mean()
|
| 151 |
+
per_day_mae.append(day_mae)
|
| 152 |
+
else:
|
| 153 |
+
per_day_mae.append(np.nan)
|
| 154 |
+
|
| 155 |
+
results.append({
|
| 156 |
+
'border': border,
|
| 157 |
+
'mae_overall': mae,
|
| 158 |
+
'rmse_overall': rmse,
|
| 159 |
+
'mae_d1': per_day_mae[0] if len(per_day_mae) > 0 else np.nan,
|
| 160 |
+
'mae_d2': per_day_mae[1] if len(per_day_mae) > 1 else np.nan,
|
| 161 |
+
'mae_d7': per_day_mae[6] if len(per_day_mae) > 6 else np.nan,
|
| 162 |
+
'mae_d14': per_day_mae[13] if len(per_day_mae) > 13 else np.nan,
|
| 163 |
+
'n_hours': len(valid_data),
|
| 164 |
+
})
|
| 165 |
+
|
| 166 |
+
# Status indicator
|
| 167 |
+
d1_mae = per_day_mae[0] if len(per_day_mae) > 0 else np.inf
|
| 168 |
+
status = "[OK]" if d1_mae <= 150 else "[!]"
|
| 169 |
+
|
| 170 |
+
print(f" [{i:2d}/{len(borders)}] {border:15s} - D+1 MAE: {d1_mae:6.1f} MW {status}")
|
| 171 |
+
|
| 172 |
+
# Step 6: Summary statistics
|
| 173 |
+
print("\n[6/6] Generating summary report...")
|
| 174 |
+
|
| 175 |
+
if not results:
|
| 176 |
+
print("[ERROR] No results to summarize")
|
| 177 |
+
sys.exit(1)
|
| 178 |
+
|
| 179 |
+
results_df = pl.DataFrame(results)
|
| 180 |
+
|
| 181 |
+
# Calculate summary statistics
|
| 182 |
+
mean_mae_d1 = results_df['mae_d1'].mean()
|
| 183 |
+
median_mae_d1 = results_df['mae_d1'].median()
|
| 184 |
+
min_mae_d1 = results_df['mae_d1'].min()
|
| 185 |
+
max_mae_d1 = results_df['mae_d1'].max()
|
| 186 |
+
|
| 187 |
+
# Save results to CSV
|
| 188 |
+
output_file = Path('results/october_2024_multivariate.csv')
|
| 189 |
+
output_file.parent.mkdir(exist_ok=True)
|
| 190 |
+
results_df.write_csv(output_file)
|
| 191 |
+
print(f"[OK] Results saved to: {output_file}")
|
| 192 |
+
|
| 193 |
+
# Generate summary report
|
| 194 |
+
print("\n" + "="*70)
|
| 195 |
+
print("EVALUATION RESULTS SUMMARY - OCTOBER 2024")
|
| 196 |
+
print("="*70)
|
| 197 |
+
|
| 198 |
+
print(f"\nBorders evaluated: {len(results)}/{len(borders)}")
|
| 199 |
+
print(f"Total forecast time: {forecast_time/60:.2f} minutes")
|
| 200 |
+
print(f"Total evaluation time: {(time.time() - total_start)/60:.2f} minutes")
|
| 201 |
+
|
| 202 |
+
print(f"\n*** D+1 MAE (PRIMARY METRIC) ***")
|
| 203 |
+
print(f"Mean: {mean_mae_d1:.2f} MW (Target: [<=]134 MW)")
|
| 204 |
+
print(f"Median: {median_mae_d1:.2f} MW")
|
| 205 |
+
print(f"Min: {min_mae_d1:.2f} MW")
|
| 206 |
+
print(f"Max: {max_mae_d1:.2f} MW")
|
| 207 |
+
|
| 208 |
+
# Target achievement
|
| 209 |
+
below_target = (results_df['mae_d1'] <= 150).sum()
|
| 210 |
+
print(f"\n*** TARGET ACHIEVEMENT ***")
|
| 211 |
+
print(f"Borders with D+1 MAE [<=]150 MW: {below_target}/{len(results)} ({below_target/len(results)*100:.1f}%)")
|
| 212 |
+
|
| 213 |
+
# Best and worst performers
|
| 214 |
+
print(f"\n*** TOP 5 BEST PERFORMERS (Lowest D+1 MAE) ***")
|
| 215 |
+
best = results_df.sort('mae_d1').head(5)
|
| 216 |
+
for row in best.iter_rows(named=True):
|
| 217 |
+
print(f" {row['border']:15s}: D+1 MAE={row['mae_d1']:6.1f} MW, Overall MAE={row['mae_overall']:6.1f} MW")
|
| 218 |
+
|
| 219 |
+
print(f"\n*** TOP 5 WORST PERFORMERS (Highest D+1 MAE) ***")
|
| 220 |
+
worst = results_df.sort('mae_d1', descending=True).head(5)
|
| 221 |
+
for row in worst.iter_rows(named=True):
|
| 222 |
+
print(f" {row['border']:15s}: D+1 MAE={row['mae_d1']:6.1f} MW, Overall MAE={row['mae_overall']:6.1f} MW")
|
| 223 |
+
|
| 224 |
+
# MAE degradation over forecast horizon
|
| 225 |
+
print(f"\n*** MAE DEGRADATION OVER FORECAST HORIZON ***")
|
| 226 |
+
mean_mae_d2 = results_df['mae_d2'].mean()
|
| 227 |
+
mean_mae_d7 = results_df['mae_d7'].mean()
|
| 228 |
+
mean_mae_d14 = results_df['mae_d14'].mean()
|
| 229 |
+
|
| 230 |
+
print(f"D+1: {mean_mae_d1:.2f} MW")
|
| 231 |
+
print(f"D+2: {mean_mae_d2:.2f} MW (+{mean_mae_d2 - mean_mae_d1:.2f} MW)")
|
| 232 |
+
print(f"D+7: {mean_mae_d7:.2f} MW (+{mean_mae_d7 - mean_mae_d1:.2f} MW)")
|
| 233 |
+
print(f"D+14: {mean_mae_d14:.2f} MW (+{mean_mae_d14 - mean_mae_d1:.2f} MW)")
|
| 234 |
+
|
| 235 |
+
# Final verdict
|
| 236 |
+
print("\n" + "="*70)
|
| 237 |
+
if mean_mae_d1 <= 134:
|
| 238 |
+
print("[OK] TARGET ACHIEVED! Mean D+1 MAE [<=]134 MW")
|
| 239 |
+
print(" Zero-shot multivariate forecasting successful!")
|
| 240 |
+
elif mean_mae_d1 <= 150:
|
| 241 |
+
print("[~] CLOSE TO TARGET. Mean D+1 MAE [<=]150 MW")
|
| 242 |
+
print(" Zero-shot baseline established. Fine-tuning recommended.")
|
| 243 |
+
else:
|
| 244 |
+
print(f"[!] TARGET NOT MET. Mean D+1 MAE: {mean_mae_d1:.2f} MW (Target: [<=]134 MW)")
|
| 245 |
+
print(" Fine-tuning strongly recommended for Phase 2")
|
| 246 |
+
print("="*70)
|
| 247 |
+
|
| 248 |
+
# Save summary report
|
| 249 |
+
report_file = Path('results/october_2024_evaluation_report.txt')
|
| 250 |
+
with open(report_file, 'w', encoding='utf-8', errors='replace') as f:
|
| 251 |
+
f.write("="*70 + "\n")
|
| 252 |
+
f.write("OCTOBER 2024 MULTIVARIATE CHRONOS-2 EVALUATION REPORT\n")
|
| 253 |
+
f.write("="*70 + "\n\n")
|
| 254 |
+
f.write(f"Run date: 2024-09-30\n")
|
| 255 |
+
f.write(f"Forecast period: Oct 1-14, 2024 (336 hours)\n")
|
| 256 |
+
f.write(f"Model: amazon/chronos-2 (multivariate, 615 features)\n")
|
| 257 |
+
f.write(f"Borders evaluated: {len(results)}/{len(borders)}\n")
|
| 258 |
+
f.write(f"Forecast time: {forecast_time/60:.2f} minutes\n\n")
|
| 259 |
+
f.write(f"D+1 MAE RESULTS:\n")
|
| 260 |
+
f.write(f" Mean: {mean_mae_d1:.2f} MW\n")
|
| 261 |
+
f.write(f" Median: {median_mae_d1:.2f} MW\n")
|
| 262 |
+
f.write(f" Min: {min_mae_d1:.2f} MW\n")
|
| 263 |
+
f.write(f" Max: {max_mae_d1:.2f} MW\n\n")
|
| 264 |
+
f.write(f"Target achievement: {below_target}/{len(results)} borders with MAE [<=]150 MW\n\n")
|
| 265 |
+
if mean_mae_d1 <= 134:
|
| 266 |
+
f.write("[OK] TARGET ACHIEVED!\n")
|
| 267 |
+
else:
|
| 268 |
+
f.write(f"[!] Target not met - Fine-tuning recommended\n")
|
| 269 |
+
|
| 270 |
+
print(f"\n[OK] Summary report saved to: {report_file}")
|
| 271 |
+
print(f"\nTotal evaluation time: {(time.time() - total_start)/60:.1f} minutes")
|
| 272 |
+
|
| 273 |
+
|
| 274 |
+
if __name__ == '__main__':
|
| 275 |
+
main()
|