# HuggingFace Space Setup Guide - FBMC Chronos 2
**IMPORTANT**: This is Day 3, Hours 1-4 of the implementation plan. Complete all steps before proceeding to inference pipeline development.
---
## Prerequisites
- HuggingFace account: https://huggingface.co/join
- HuggingFace write token: https://huggingface.co/settings/tokens
- Git installed locally
- Project files ready at: `C:\Users\evgue\projects\fbmc_chronos2`
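Optionally, verify the write token before starting. A minimal sketch using `huggingface_hub` (the token value is a placeholder; `login` can also pick up the `HF_TOKEN` environment variable):

```python
# Optional sanity check: confirm the write token authenticates (token is a placeholder).
from huggingface_hub import login, whoami

login(token="hf_...")    # or rely on the HF_TOKEN environment variable
print(whoami()["name"])  # should print your HuggingFace username
```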
---
## STEP 1: Create HuggingFace Dataset Repository (10 min)
### 1.1 Create Dataset on HuggingFace Web UI
1. Go to: https://huggingface.co/new-dataset
2. Fill in:
   - **Owner**: YOUR_USERNAME
   - **Dataset name**: `fbmc-features-24month`
   - **License**: MIT
   - **Visibility**: **Private** (contains project data)
3. Click "Create dataset"
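If you prefer to skip the web UI, the same repository can be created from Python. A sketch using `huggingface_hub.create_repo` (replace `YOUR_USERNAME` as elsewhere in this guide):

```python
# Sketch: create the private dataset repository programmatically.
from huggingface_hub import create_repo

create_repo(
    repo_id="YOUR_USERNAME/fbmc-features-24month",
    repo_type="dataset",
    private=True,   # matches the "Private" visibility chosen above
    exist_ok=True,  # no error if the repo already exists
)
```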
### 1.2 Upload Data to Dataset
#### Option A: Using the upload script (Recommended)
```bash
# 1. Add your HF token to .env file
echo "HF_TOKEN=hf_..." >> .env

# 2. Edit the script to replace YOUR_USERNAME with your actual HF username
#    Edit: scripts/upload_to_hf_datasets.py
#    Replace all instances of "YOUR_USERNAME" with your HuggingFace username

# 3. Install required packages
.venv\Scripts\uv.exe pip install datasets huggingface-hub

# 4. Run the upload script
.venv\Scripts\python.exe scripts\upload_to_hf_datasets.py
```
The script will upload:
- `features_unified_24month.parquet` (~25 MB)
- `metadata.csv` (2,553 features)
- `target_borders.txt` (38 target borders)
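For reference, the core of such a script reduces to three `upload_file` calls. A hedged sketch of what `scripts/upload_to_hf_datasets.py` roughly does (the actual script may differ):

```python
# Sketch: roughly what the upload script does (the actual script may differ).
from huggingface_hub import HfApi

api = HfApi()  # reads HF_TOKEN from the environment / .env
repo_id = "YOUR_USERNAME/fbmc-features-24month"

for local_path, remote_name in [
    ("data/processed/features_unified_24month.parquet", "features_unified_24month.parquet"),
    ("data/processed/features_unified_metadata.csv", "metadata.csv"),
    ("data/processed/target_borders_list.txt", "target_borders.txt"),
]:
    api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=remote_name,   # renames happen here
        repo_id=repo_id,
        repo_type="dataset",
    )
```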
#### Option B: Manual upload via web UI
1. Go to: https://huggingface.co/datasets/YOUR_USERNAME/fbmc-features-24month
2. Click "Files" tab → "Add file" → "Upload files"
3. Upload:
   - `data/processed/features_unified_24month.parquet`
   - `data/processed/features_unified_metadata.csv` (rename to `metadata.csv`)
   - `data/processed/target_borders_list.txt` (rename to `target_borders.txt`)
### 1.3 Verify Dataset Uploaded
Visit: `https://huggingface.co/datasets/YOUR_USERNAME/fbmc-features-24month`
You should see:
- `features_unified_24month.parquet` (~25 MB)
- `metadata.csv` (~200 KB)
- `target_borders.txt` (~1 KB)
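You can also verify from Python instead of the web UI; a minimal sketch:

```python
# Sketch: confirm all three files landed in the dataset repo.
from huggingface_hub import HfApi

files = HfApi().list_repo_files(
    "YOUR_USERNAME/fbmc-features-24month", repo_type="dataset"
)
print(files)  # expect the parquet, metadata.csv, and target_borders.txt
```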
---
## STEP 2: Create HuggingFace Space (15 min)
### 2.1 Create Space on HuggingFace Web UI
1. Go to: https://huggingface.co/new-space
2. Fill in:
   - **Owner**: YOUR_USERNAME
   - **Space name**: `fbmc-chronos2-forecast`
   - **License**: MIT
   - **Select SDK**: **JupyterLab**
   - **Space hardware**: Click "Advanced" → Select **A10G GPU (24 GB)** ($30/month)
   - **Visibility**: **Private** (contains API keys)
3. Click "Create Space"
**IMPORTANT**: The Space will start building immediately. This takes ~10-15 minutes for the first build.
### 2.2 Configure Space Secrets
While the Space is building:
1. Go to Space → Settings → Variables and Secrets
2. Add these secrets (click "New secret"):

| Name | Value | Description |
|------|-------|-------------|
| `HF_TOKEN` | `hf_...` | Your HuggingFace write token |
| `ENTSOE_API_KEY` | `your_key` | ENTSO-E Transparency API key |

3. Click "Save"
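Inside the running Space, these secrets are exposed as environment variables, so code never needs hardcoded keys; a minimal sketch:

```python
# Sketch: Space secrets arrive as environment variables at runtime.
import os

hf_token = os.environ["HF_TOKEN"]          # set in Space secrets above
entsoe_key = os.environ["ENTSOE_API_KEY"]  # set in Space secrets above
print("Secrets present:", bool(hf_token) and bool(entsoe_key))
```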
### 2.3 Wait for Initial Build
- Monitor build logs: Space → Logs tab
- Wait for the message: "Your Space is up and running"
- This can take 10-15 minutes for the first build
---
## STEP 3: Clone Space Locally (5 min)
### 3.1 Clone the Space Repository
```bash
# Navigate to projects directory
cd C:\Users\evgue\projects

# Clone the Space (replace YOUR_USERNAME)
git clone https://huggingface.co/spaces/YOUR_USERNAME/fbmc-chronos2-forecast

# Navigate into Space directory
cd fbmc-chronos2-forecast
```
### 3.2 Copy Project Files to Space
```bash
# Copy source code
cp -r ../fbmc_chronos2/src ./

# Copy requirements (rename to requirements.txt)
cp ../fbmc_chronos2/hf_space_requirements.txt ./requirements.txt

# Copy .env.example (for documentation)
cp ../fbmc_chronos2/.env.example ./

# Create directories
mkdir -p data/evaluation
mkdir -p notebooks
mkdir -p tests
```
### 3.3 Create Space README.md
Create `README.md` in the Space directory with:
```yaml
---
title: FBMC Chronos 2 Forecast
emoji: ⚡
colorFrom: blue
colorTo: green
sdk: jupyterlab
sdk_version: "4.0.0"
app_file: app.py
pinned: false
license: mit
suggested_hardware: a10g-small
---

# FBMC Flow Forecasting - Zero-Shot Inference
Amazon Chronos 2 for cross-border capacity forecasting.

## Features
- 2,553 features (615 future covariates)
- 38 bidirectional border targets (19 physical borders)
- 8,192-hour context window
- Dynamic date-driven inference
- A10G GPU acceleration

## Quick Start
### Launch JupyterLab
1. Open this Space
2. Wait for build to complete (~10-15 min first time)
3. Click "Open in JupyterLab"

### Run Inference
See `notebooks/01_test_inference.ipynb` for examples.

## Data Source
- **Dataset**: [YOUR_USERNAME/fbmc-features-24month](https://huggingface.co/datasets/YOUR_USERNAME/fbmc-features-24month)
- **Size**: 25 MB (17,544 hours × 2,553 features)
- **Period**: Oct 2023 - Sept 2025

## Model
- **Chronos 2 Large** (710M parameters)
- **Pretrained**: amazon/chronos-t5-large
- **Zero-shot**: No fine-tuning in MVP

## Cost
- A10G GPU: $30/month
- Storage: <1 GB (free tier)
```
### 3.4 Push Initial Files to Space
```bash
# Stage files
git add README.md requirements.txt .env.example src/

# Commit
git commit -m "feat: initial Space setup with A10G GPU and source code"

# Push to HuggingFace
git push
```
**IMPORTANT**: After pushing, the Space will rebuild (~10-15 min). Monitor the build in the Logs tab.
---
## STEP 4: Test Space Environment (10 min)
### 4.1 Wait for Build to Complete
- Go to Space → Logs tab
- Wait for: "Your Space is up and running"
- If the build fails, check requirements.txt for dependency conflicts
### 4.2 Open JupyterLab
1. Go to your Space: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-chronos2-forecast
2. Click "Open in JupyterLab" (top right)
3. JupyterLab will open in a new tab
### 4.3 Create Test Notebook
In JupyterLab, create `notebooks/00_test_setup.ipynb`:

**Cell 1: Test GPU**
```python
import torch

print(f"GPU available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
```
Expected output:
```
GPU available: True
GPU device: NVIDIA A10G
GPU memory: 22.73 GB
```
**Cell 2: Load Dataset**
```python
from datasets import load_dataset
import polars as pl

# Load unified features from the HF Dataset
dataset = load_dataset("YOUR_USERNAME/fbmc-features-24month", split="train")
df = pl.from_pandas(dataset.to_pandas())

print(f"Shape: {df.shape[0]:,} rows × {df.shape[1]:,} columns")
print(f"Columns: {df.columns[:10]}")
print(f"Date range: {df['timestamp'].min()} to {df['timestamp'].max()}")
```
Expected output:
```
Shape: 17,544 rows × 2,553 columns
Columns: ['timestamp', 'cnec_t1_binding_10T-DE-FR-000068', ...]
Date range: 2023-10-01 00:00:00 to 2025-09-30 23:00:00
```
**Cell 3: Load Metadata**
```python
import pandas as pd

# Load metadata
metadata = pd.read_csv(
    "hf://datasets/YOUR_USERNAME/fbmc-features-24month/metadata.csv"
)

# Check future covariates
future_covs = metadata[metadata['is_future_covariate'] == 'true']['feature_name'].tolist()
print(f"Future covariates: {len(future_covs):,}")
print(f"Historical features: {len(metadata) - len(future_covs):,}")
print(f"\nCategories: {metadata['category'].unique()}")
```
Expected output:
```
Future covariates: 615
Historical features: 1,938
Categories: ['CNEC_Tier1', 'CNEC_Tier2', 'Weather', 'LTA', 'Temporal', ...]
```
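Optionally, add a cell that pulls the 38 target border names from the same dataset repo. A sketch using `hf_hub_download` (assumes the uploaded `target_borders.txt` has one border name per line):

```python
# Sketch: fetch the 38 target border names (assumes one name per line).
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="YOUR_USERNAME/fbmc-features-24month",
    filename="target_borders.txt",
    repo_type="dataset",
)
with open(path) as f:
    target_borders = [line.strip() for line in f if line.strip()]
print(f"Target borders: {len(target_borders)}")  # expect 38
```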
**Cell 4: Test Chronos 2 Loading**
```python
import torch
from chronos import ChronosPipeline

# Load Chronos 2 Large (this will download ~3 GB on first run)
print("Loading Chronos 2 Large...")
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-large",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)
print("[OK] Chronos 2 loaded successfully")
print(f"Model device: {pipeline.model.device}")
```
Expected output:
```
Loading Chronos 2 Large...
[OK] Chronos 2 loaded successfully
Model device: cuda:0
```
**IMPORTANT**: The first time you load Chronos 2, it downloads ~3 GB, which takes 5-10 minutes. Subsequent runs use the cached model.
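As a final sanity check, you can run a tiny zero-shot forecast on a single series. A hedged sketch (the column choice is illustrative, and the short 512-hour context keeps the test fast):

```python
# Sketch: tiny zero-shot forecast to confirm end-to-end inference works.
import torch

# Any numeric feature column works for this smoke test; the choice is illustrative.
series = df[df.columns[1]].to_numpy()
context = torch.tensor(series[-512:], dtype=torch.float32)

# predict() returns sample paths shaped [num_series, num_samples, prediction_length]
forecast = pipeline.predict(context, prediction_length=24)
print(forecast.shape)
print(forecast.quantile(0.5, dim=1)[0, :5])  # median forecast, first 5 hours
```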
### 4.4 Run All Cells
- Execute all cells in order
- Verify all outputs match expected results
- If any cell fails, check error messages and troubleshoot
---
## STEP 5: Commit Test Notebook to Space
```bash
# In JupyterLab terminal or locally
git add notebooks/00_test_setup.ipynb
git commit -m "test: verify GPU, data loading, and Chronos 2 model"
git push
```
---
## Troubleshooting
### Build Fails
**Error**: `Collecting chronos-forecasting>=2.0.0: Could not find a version...`
- **Fix**: Check that the chronos-forecasting version exists on PyPI
- Try: `chronos-forecasting==2.0.0` (pin the exact version)

**Error**: `torch 2.0.0 conflicts with transformers...`
- **Fix**: Pin compatible versions in requirements.txt
- Try: `torch==2.1.0` and `transformers==4.36.0`

### GPU Not Detected
**Issue**: `GPU available: False`
- **Check**: Space Settings → Hardware → should show "A10G"
- **Fix**: Restart the Space (Settings → Restart Space)
### Dataset Not Found
**Error**: `Repository Not Found for url: https://huggingface.co/datasets/...`
- **Check**: The dataset name in your code matches the repository you created
- **Fix**: Replace `YOUR_USERNAME` with your actual HuggingFace username
- **Verify**: The dataset is public, or HF_TOKEN is set in the Space secrets (see the auth check sketch below)
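To separate a naming problem from an authentication problem, check what the token can actually see; a minimal sketch:

```python
# Sketch: check that the token authenticates and can reach the dataset.
from huggingface_hub import HfApi

api = HfApi()  # picks up HF_TOKEN from the Space secrets
print(api.whoami()["name"])  # fails here → token problem
api.repo_info("YOUR_USERNAME/fbmc-features-24month", repo_type="dataset")  # fails here → name/visibility problem
```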
### Out of Memory
**Error**: `CUDA out of memory`
- **Cause**: The A10G has 24 GB VRAM, which may not be enough for an 8,192-hour context plus a large batch
- **Fix**: Reduce the context window to 512 hours temporarily
- **Fix**: Process borders in smaller batches (10 at a time), as sketched below
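A hedged sketch of the batching workaround, assuming the parquet holds one column per border target and reusing `pipeline` and `target_borders` from the test notebook (names are illustrative):

```python
# Sketch: forecast the 38 borders in batches of 10 to cap peak VRAM.
import torch

BATCH = 10
forecasts = {}
for i in range(0, len(target_borders), BATCH):
    batch = target_borders[i : i + BATCH]
    # One short context tensor per border; 512 hours keeps memory modest.
    contexts = [
        torch.tensor(df[b].to_numpy()[-512:], dtype=torch.float32) for b in batch
    ]
    preds = pipeline.predict(contexts, prediction_length=24)  # accepts a list of contexts
    for border, pred in zip(batch, preds):
        forecasts[border] = pred
    torch.cuda.empty_cache()  # release cached GPU blocks between batches
```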
---
## Next Steps (Day 3, Hours 5-8)
Once the test notebook runs successfully:
1. **Hours 5-6**: Create `src/inference/data_fetcher.py` (AsOfDateFetcher class)
2. **Hours 7-8**: Create `src/inference/chronos_pipeline.py` (ChronosForecaster class)
3. **Smoke test**: Run inference on 1 border × 7 days
See the main implementation plan for details.
---
## Success Criteria
At the end of STEP 5, you should have:
- [x] HF Dataset repository created and populated (3 files)
- [x] HF Space created with A10G GPU ($30/month)
- [x] Space secrets configured (HF_TOKEN, ENTSOE_API_KEY)
- [x] Source code pushed to Space
- [x] Space builds successfully (~10-15 min)
- [x] JupyterLab accessible
- [x] GPU detected (NVIDIA A10G, 22.73 GB)
- [x] Dataset loads (17,544 rows × 2,553 columns)
- [x] Metadata loads (2,553 features, 615 future covariates)
- [x] Chronos 2 loads successfully (~3 GB download first time)
- [x] Test notebook committed to Space
**Estimated time**: ~40 minutes of active work + ~25 minutes waiting for builds
---
**Questions?** Check the HuggingFace Spaces documentation: https://huggingface.co/docs/hub/spaces