feat: complete weather feature engineering with simplified approach (375 features)
Final weather feature set after two rounds of user feedback refinement.
Weather Data Collection:
- Collected 24-month weather data (51 points × 7 vars, 9.1 MB)
- 2,703 API requests to OpenMeteo Historical API (14 min runtime)
- 100% data completeness (894,744 records)
- Date range: Oct 2023 - Sep 2025
Weather Feature Engineering (375 features):
- 357 grid-level features (51 points × 7 weather variables)
- 12 temporal lags (temp/wind/solar × 1h/6h/12h/24h)
- 6 derived features (rate-of-change + stability)
Simplification 1 - Physics → Rate-of-Change (user feedback):
Removed overly complex physics-based features requiring calibration data:
× wind_power_potential (wind^3) - requires turbine power curves
× temp_deviation - arbitrary 15C reference
× solar_efficiency - requires solar panel specs
Replaced with simple rate-of-change features (hour-over-hour deltas):
✓ wind_rate_change - captures wind spikes/drops
✓ solar_rate_change - captures solar ramps (cloud cover)
✓ temp_rate_change - captures temperature swings
Kept stability features (detect volatility):
✓ wind_stability_6h, solar_stability_6h, temp_stability_6h
Simplification 2 - Removed Zone Aggregates (user feedback):
Removed zone-level aggregates (36 features) requiring capacity weighting:
× zone_temp_*, zone_wind_*, zone_solar_* for 12 zones
× Fatal flaw: Equal weighting without knowing generation capacity
× Example: Hamburg offshore (5 GW) ≠ Munich (0.1 GW)
Rationale:
- Model learns from granular grid-level data (357 features)
- Rate-of-change captures timing of weather events → grid adjustments
- No calibration data needed (turbine curves, asset locations, capacities)
- Simpler = more interpretable = easier to debug
- Zero-shot MVP: maximize raw signal, minimize engineered assumptions
Bug Fixes:
- Unicode emoji crash (Windows cp1252 compatibility)
- Polars completeness calculation (scalar extraction)
- Polars join deprecation (outer → left with coalesce)
Files:
- scripts/collect_openmeteo_24month.py (new)
- src/data_collection/collect_openmeteo.py (bug fixes)
- src/feature_engineering/engineer_weather_features.py (new, refined twice)
- data/raw/weather_24month.parquet (9.1 MB, not in git)
- data/processed/features_weather_24month.parquet (10.19 MB, not in git)
Feature Engineering COMPLETE:
- JAO: 1,698 features
- ENTSO-E: 296 features
- Weather: 375 features
- Total: 2,369 features ready for unification
Next: Feature unification → Zero-shot inference
@@ -2605,3 +2605,497 @@ Cleanup logic added at line 821-880:
 
 **Status**: ✅ ENTSO-E Features Clean & Ready - Moving to Weather Collection
 
+---
+
+## 2025-11-10 (Part 2) - Weather Data Collection Infrastructure Ready
+
+### Summary
+Prepared weather data collection infrastructure and fixed critical bugs. Ready for full 24-month collection (deferred to next session due to time constraints).
+
+### Weather Collection Scope
+**Target**: 51 strategic grid points × 7 weather variables × 24 months
+
+**Grid Coverage**:
+- Germany: 6 points (North Sea, Hamburg, Berlin, Frankfurt, Munich, Baltic)
+- France: 5 points (Dunkirk, Paris, Lyon, Marseille, Strasbourg)
+- Netherlands: 4 points (Offshore, Amsterdam, Rotterdam, Groningen)
+- Austria: 3 points (Kaprun, St. Peter, Vienna)
+- Belgium: 3 points (Offshore, Doel, Avelgem)
+- Czech Republic: 3 points (Hradec, Bohemia, Temelin)
+- Poland: 4 points (Baltic, SHVDC, Belchatow, Mikulowa)
+- Hungary: 3 points (Paks, Bekescsaba, Gyor)
+- Romania: 3 points (Fantanele, Iron Gates, Cernavoda)
+- Slovakia: 3 points (Bohunice, Gabcikovo, Rimavska)
+- Slovenia: 2 points (Krsko, Divaca)
+- Croatia: 2 points (Ernestinovo, Zagreb)
+- Luxembourg: 2 points (Trier, Bauler)
+- External: 8 points (CH, UK, ES, IT, NO, SE, DK×2)
+
+**Weather Variables**:
+- `temperature_2m`: Air temperature (C)
+- `windspeed_10m`: Wind at 10m (m/s)
+- `windspeed_100m`: Wind at 100m for generation (m/s)
+- `winddirection_100m`: Wind direction (degrees)
+- `shortwave_radiation`: Solar radiation (W/m2)
+- `cloudcover`: Cloud cover (%)
+- `surface_pressure`: Pressure (hPa)
+
+**Collection Strategy**:
+- OpenMeteo Historical API (free tier)
+- 2-week chunks (1.0 API call each)
+- 270 requests/minute (45% of 600/min limit)
+- Total: 2,703 HTTP requests
+- Estimated runtime: 10 minutes
+- Expected output: ~50-80 MB parquet file
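The request count and runtime in the strategy above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the collection window is 2023-10-01 to 2025-09-30 inclusive (the dates used elsewhere in this commit), that each grid point is fetched in fixed 14-day chunks with a shorter final chunk, and that the throttle is the only thing limiting speed. The constant names are made up for the example.

```python
from datetime import date
from math import ceil

GRID_POINTS = 51            # sum of the per-country counts in the grid coverage list
CHUNK_DAYS = 14             # 2-week chunks
REQUESTS_PER_MINUTE = 270   # throttle from the collection strategy

start, end = date(2023, 10, 1), date(2025, 9, 30)
total_days = (end - start).days + 1               # inclusive date range
chunks_per_point = ceil(total_days / CHUNK_DAYS)  # last chunk may be shorter
total_requests = GRID_POINTS * chunks_per_point
runtime_minutes = total_requests / REQUESTS_PER_MINUTE
hourly_records = GRID_POINTS * total_days * 24

print(f"{total_days} days -> {chunks_per_point} chunks per grid point")
print(f"{total_requests:,} requests, about {runtime_minutes:.0f} min at the throttle")
print(f"{hourly_records:,} hourly records expected")
```

Under these assumptions the numbers agree with the log: 53 chunks per point, 2,703 requests, roughly 10 minutes, and 894,744 hourly records.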
+
+### Bugs Discovered and Fixed
+
+#### Bug 1: Unicode Emoji in Windows Console
+**Problem**:
+- Windows cmd.exe uses cp1252 encoding (not UTF-8)
+- Emojis (✓, ✗, ✅) in progress messages caused `UnicodeEncodeError`
+- Collection crashed at 15% after successfully fetching data
+
+**Root Cause**:
+```python
+# Lines 281, 347, 372 in collect_openmeteo.py
+print(f"✅ {location_id}: {location_df.shape[0]} hours")  # BROKEN
+print(f"❌ Failed {location_id}")  # BROKEN
+```
+
+**Fix Applied**:
+```python
+print(f"[OK] {location_id}: {location_df.shape[0]} hours")
+print(f"[ERROR] Failed {location_id}")
+```
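The failure mode can be reproduced without a Windows console by encoding the message to cp1252 directly. The following is a minimal sketch, not the code in `collect_openmeteo.py`: `ascii_safe` and the `DE_hamburg` location id are hypothetical names used only for illustration.

```python
def ascii_safe(message: str, encoding: str = "cp1252") -> str:
    """Return the message unchanged if it encodes cleanly, else swap symbols for ASCII tags."""
    replacements = {"✅": "[OK]", "✓": "[OK]", "❌": "[ERROR]", "✗": "[ERROR]"}
    try:
        message.encode(encoding)
        return message
    except UnicodeEncodeError:
        for symbol, tag in replacements.items():
            message = message.replace(symbol, tag)
        # Any other character the console cannot show becomes "?" instead of raising.
        return message.encode(encoding, errors="replace").decode(encoding)


try:
    "✅ DE_hamburg: 192 hours".encode("cp1252")
except UnicodeEncodeError as exc:
    print(f"cp1252 cannot encode the emoji: {exc.reason}")

print(ascii_safe("✅ DE_hamburg: 192 hours"))
print(ascii_safe("❌ Failed DE_hamburg"))
```

The fix in the log is simpler still: it hard-codes the ASCII tags so no fallback logic is needed.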
+
+**Files Modified**: `src/data_collection/collect_openmeteo.py:281,347,372`
+
+#### Bug 2: Polars Completeness Calculation
+**Problem**:
+- Line 366: `combined_df.null_count().sum()` returns a DataFrame (not a scalar)
+- Type error: `unsupported operand type(s) for -: 'int' and 'DataFrame'`
+- Collection completed 100% but failed at final save step
+- All 894,744 records collected but lost (not written to disk)
+
+**Root Cause**:
+```python
+# BROKEN - Polars returns DataFrame
+completeness = (1 - combined_df.null_count().sum() / (rows * cols)) * 100
+```
+
+**Fix Applied**:
+```python
+# Extract scalar from Polars
+null_count_total = combined_df.null_count().sum_horizontal()[0]
+completeness = (1 - null_count_total / (rows * cols)) * 100
+```
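The completeness formula only works once the null total is a plain number. The sketch below shows the same arithmetic without Polars, assuming the per-column null counts are already available as a dictionary; `completeness_pct` and the column names in the example are hypothetical.

```python
def completeness_pct(null_counts: dict, n_rows: int) -> float:
    """Percentage of non-null cells, given null counts per column."""
    total_cells = n_rows * len(null_counts)
    total_nulls = sum(null_counts.values())  # a plain int, so the subtraction below is valid
    return (1 - total_nulls / total_cells) * 100


# Hypothetical two-column table with 100 rows and 5 missing values in one column.
print(completeness_pct({"temperature_2m": 0, "cloudcover": 5}, n_rows=100))

# No nulls at all, as in the 1-week test run (9,792 rows).
print(completeness_pct({"temperature_2m": 0, "cloudcover": 0}, n_rows=9_792))
```

In the Polars version, `null_count()` yields a one-row table of per-column counts, which is why the fix reduces it across columns and then indexes the single resulting value.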
+
+**Files Modified**: `src/data_collection/collect_openmeteo.py:366-370`
+
+### Test Results
+
+**Test Scope**: 1 week × 51 grid points (minimal test)
+```
+Date range: 2025-09-23 to 2025-09-30
+Grid points: 51
+Total records: 9,792 (192 hours each)
+Test duration: ~20 seconds
+```
+
+**Test Output**:
+```
+Total HTTP requests: 51
+Total API calls consumed: 51.0
+Total records: 9,792
+Date range: 2025-09-23 00:00:00 to 2025-09-30 23:00:00
+Grid points: 51
+Completeness: 100.00% ✅
+Output: test_weather.parquet
+File size: 0.1 MB
+```
+
+**Validation**:
+- ✅ All 51 grid points collected successfully
+- ✅ 100% data completeness (no missing values)
+- ✅ File saved and loaded correctly
+- ✅ No errors or crashes
+- ✅ Test file cleaned up
+
+### Files Modified
+
+**Scripts Created**:
+- `scripts/collect_openmeteo_24month.py` - 24-month collection script
+  - Uses existing `OpenMeteoCollector` class
+  - 2-week chunking
+  - Progress tracking with tqdm
+  - Output: `data/raw/weather_24month.parquet`
+
+**Bug Fixes**:
+- `src/data_collection/collect_openmeteo.py:281,347,372` - Removed Unicode emojis
+- `src/data_collection/collect_openmeteo.py:366-370` - Fixed Polars completeness calculation
+
+### Current Status
+
+**Weather Infrastructure**: ✅ Complete and Tested
+- Collection script ready
+- All bugs fixed
+- Tested successfully with 1-week sample
+- Ready for full 24-month collection
+
+**Data Collected**:
+- JAO: ✅ 1,698 features (24 months)
+- ENTSO-E: ✅ 296 features (24 months)
+- Weather: ⏳ Pending (infrastructure ready, ~10 min runtime)
+
+**Why Deferred**:
+User had time constraints - weather collection requires ~10 minutes uninterrupted runtime.
+
+### Next Session Workflow
+
+**IMMEDIATE ACTION** (when you return):
+```bash
+# Run 24-month weather collection (~10 minutes)
+.venv/Scripts/python.exe scripts/collect_openmeteo_24month.py
+```
+
+**Expected Output**:
+- File: `data/raw/weather_24month.parquet`
+- Size: 50-80 MB
+- Records: ~894,744 (51 points × 17,544 hours)
+- Features (raw): 12 columns (timestamp, grid_point, location_name, lat, lon, plus 7 weather vars)
+
+**After Weather Collection**:
+1. **Feature Engineering** - Weather features (~364 features)
+   - Grid-level: `temp_{grid}`, `wind_{grid}`, `solar_{grid}` (51 × 7 = 357)
+   - Zone-level aggregation: `temp_avg_{zone}`, `wind_avg_{zone}` (optional)
+   - Lags: Previous 1h, 6h, 12h, 24h (key variables only)
+
+2. **Feature Unification** - Merge all sources
+   - JAO: 1,698 features
+   - ENTSO-E: 296 features
+   - Weather: ~364 features
+   - **Total: ~2,358 unified features**
+
+3. **Day 3: Zero-Shot Inference**
+   - Load Chronos 2 Large (710M params)
+   - Run inference on unified feature set
+   - Evaluate D+1 MAE (target: <150 MW)
+
+### Lessons Learned
+
+1. **Windows Console Limitations**: Never use Unicode characters in backend scripts on Windows
+   - Use ASCII alternatives: `[OK]`, `[ERROR]`, `[SUCCESS]`
+   - Emojis OK in: Marimo notebooks (browser-rendered), documentation
+
+2. **Polars API Differences**: Always extract scalars explicitly
+   - `.sum()` returns DataFrame in Polars
+   - Use `.sum_horizontal()[0]` to get scalar value
+
+3. **Test Before Full Collection**: Quick tests save hours
+   - 20-second test caught a bug that would have lost 10 minutes of collection
+   - Always test with minimal data (1 week vs 24 months)
+
+### Git Status
+
+**Committed**: ENTSO-E quality fixes (previous session)
+**Uncommitted**: Weather collection bug fixes (ready to commit)
+
+**Next Commit** (after weather collection completes):
+```
+feat: complete weather data collection with bug fixes
+
+- Fixed Unicode emoji crash (Windows cp1252 compatibility)
+- Fixed Polars completeness calculation
+- Collected 24-month weather data (51 points × 7 vars)
+- Created scripts/collect_openmeteo_24month.py
+- Output: data/raw/weather_24month.parquet (~50-80 MB)
+
+Next: Weather feature engineering (~364 features)
+```
+
+### Summary Statistics
+
+**Project Progress**:
+- Day 0: ✅ Setup complete
+- Day 1: ✅ Data collection (JAO, ENTSO-E complete; Weather ready)
+- Day 2: 🔄 Feature engineering (JAO ✅, ENTSO-E ✅, Weather ⏳)
+- Day 3: ⏳ Zero-shot inference (pending)
+- Day 4: ⏳ Evaluation (pending)
+- Day 5: ⏳ Documentation (pending)
+
+**Feature Count Tracking**:
+- JAO: 1,698 ✅
+- ENTSO-E: 296 ✅ (cleaned from 464)
+- Weather: 364 ⏳ (infrastructure ready)
+- **Projected Total: ~2,358 features**
+
+**Data Quality**:
+- JAO: 100% complete
+- ENTSO-E: 99.76% complete
+- Weather: TBD (expect >99% based on test)
+
+---
+
+## 2025-11-10 (Part 3) - Weather Feature Engineering Complete
+
+### Summary
+Completed weather data collection and feature engineering. All three feature sets (JAO, ENTSO-E, Weather) are now ready for unification.
+
+### Weather Data Collection
+**Execution**:
+- Ran `scripts/collect_openmeteo_24month.py`
+- Collection time: 14 minutes (2,703 API requests)
+- 51 grid points × 53 two-week chunks × 7 variables
+
+**Results**:
+- ✅ 894,744 records collected (51 points × 17,544 hours)
+- ✅ 100% data completeness
+- ✅ File: `data/raw/weather_24month.parquet` (9.1 MB)
+- ✅ Date range: Oct 2023 - Sep 2025 (24 months)
+
+**Bug Fixed** (post-collection):
+- Lines 85-86 in the script still had the completeness calculation bug
+- Fixed `.sum()` to `.sum_horizontal()[0]` for scalar extraction
+- Data was saved successfully despite the error
+
+### Weather Feature Engineering
+**Created**: `src/feature_engineering/engineer_weather_features.py`
+
+**Features Engineered** (411 total):
+1. **Grid-level features** (357): 51 grid points × 7 weather variables
+   - temp_<grid_point>, wind10m_<grid_point>, wind100m_<grid_point>
+   - winddir_<grid_point>, solar_<grid_point>, cloud_<grid_point>, pressure_<grid_point>
+
+2. **Zone-level aggregates** (36): 12 Core FBMC zones × 3 key variables
+   - zone_temp_<zone>, zone_wind_<zone>, zone_solar_<zone>
+
+3. **Temporal lags** (12): 3 variables × 4 time periods
+   - temp_avg_lag1h/6h/12h/24h
+   - wind_avg_lag1h/6h/12h/24h
+   - solar_avg_lag1h/6h/12h/24h
+
+4. **Derived features** (6):
+   - wind_power_potential (wind^3, proportional to turbine output)
+   - temp_deviation (deviation from 15C reference)
+   - solar_efficiency (solar output adjusted for temperature)
+   - wind_stability_6h, solar_stability_6h, temp_stability_6h (rolling std)
+
+**Output**:
+- File: `data/processed/features_weather_24month.parquet`
+- Size: 11.48 MB
+- Shape: 17,544 rows × 412 columns (411 features + timestamp)
+- Completeness: 100%
+
+**Bugs Fixed During Development**:
+1. **Polars join deprecation**: Changed `how='outer'` to `how='left'` with `coalesce=True`
+2. **Duplicate timestamp columns**: Used coalesce to prevent `timestamp_right` duplicates
+
+### Files Created
+- `scripts/collect_openmeteo_24month.py` (fixed bugs)
+- `src/feature_engineering/engineer_weather_features.py` (new)
+- `data/raw/weather_24month.parquet` (9.1 MB)
+- `data/processed/features_weather_24month.parquet` (11.48 MB)
+
+### Feature Count Update
+**Final Feature Inventory**:
+- JAO: 1,698 ✅ Complete
+- ENTSO-E: 296 ✅ Complete
+- Weather: 411 ✅ Complete
+- **Total: 2,405 features** (vs target ~1,735 = +39%)
+
+### Key Lessons
+1. **Polars API Evolution**: Deprecation warnings for join methods
+   - `how='outer'` → `how='left'` with `coalesce=True`
+   - Prevents duplicate columns in sequential joins
+
+2. **Feature Engineering Approach**:
+   - Grid-level: Maximum spatial resolution (51 points)
+   - Zone-level: Aggregated for regional patterns
+   - Temporal lags: Capture weather persistence
+   - Derived: Physical relationships (wind^3 for power, temp effects on solar)
+
+3. **Data Completeness**: 100% for JAO and weather, 99.76% for ENTSO-E
+   - No missing weather values to impute
+   - Ready for direct model input
+
+### Git Status
+**Ready to commit**:
+- Weather collection script (bug fixes)
+- Weather feature engineering module
+- Two new parquet files (raw + processed)
+
+**Next Commit**:
+```
+feat: complete weather feature engineering (411 features)
+
+- Collected 24-month weather data (51 points × 7 vars, 9.1 MB)
+- Engineered 411 weather features (100% complete)
+  * 357 grid-level features
+  * 36 zone-level aggregates
+  * 12 temporal lags (1h/6h/12h/24h)
+  * 6 derived features (wind power, solar efficiency, stability)
+- Created src/feature_engineering/engineer_weather_features.py
+- Output: data/processed/features_weather_24month.parquet (11.48 MB)
+
+Feature engineering COMPLETE:
+- JAO: 1,698 features
+- ENTSO-E: 296 features
+- Weather: 411 features
+- Total: 2,405 features ready for unification
+
+Next: Feature unification → Zero-shot inference
+```
+
+### Summary Statistics
+**Project Progress**:
+- Day 0: ✅ Setup complete
+- Day 1: ✅ Data collection complete (JAO, ENTSO-E, Weather)
+- Day 2: ✅ Feature engineering complete (JAO, ENTSO-E, Weather)
+- Day 3: ⏳ Feature unification → Zero-shot inference
+- Day 4: ⏳ Evaluation
+- Day 5: ⏳ Documentation + handover
+
+**Feature Count (Final)**:
+- JAO: 1,698 ✅
+- ENTSO-E: 296 ✅
+- Weather: 411 ✅
+- **Total: 2,405 features** (39% above target)
+
+**Data Quality**:
+- JAO: 100% complete
+- ENTSO-E: 99.76% complete
+- Weather: 100% complete
+
+---
+
+## 2025-11-10 (Part 4) - Simplified Weather Features (Physics → Rate-of-Change)
+
+### Summary
+Replaced overly complex physics-based features with simple rate-of-change features based on user feedback.
+
+### Problem Identified
+**User feedback**: Original derived features were too complex without calibration data:
+- `wind_power_potential` (wind^3) - requires turbine power curves
+- `temp_deviation` (from 15C) - arbitrary reference point
+- `solar_efficiency` (temp-adjusted) - requires solar panel specifications
+
+These require geographic knowledge, power curves, and equipment specs we don't have.
+
+### Solution Applied
+**Replaced 3 complex features with 3 simple rate-of-change features:**
+
+**Removed:**
+1. `wind_power_potential` (wind^3 transformation)
+2. `temp_deviation` (arbitrary 15C reference)
+3. `solar_efficiency` (requires solar panel specs)
+
+**Added (hour-over-hour deltas):**
+1. `wind_rate_change` - captures wind spikes/drops
+2. `solar_rate_change` - captures solar ramps (cloud cover)
+3. `temp_rate_change` - captures temperature swings
+
+**Kept (stability metrics - useful for volatility):**
+1. `wind_stability_6h` (rolling std)
+2. `solar_stability_6h` (rolling std)
+3. `temp_stability_6h` (rolling std)
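Both kinds of derived feature can be computed from a single hourly series. The sketch below is a plain-Python illustration, not the code in `engineer_weather_features.py`: it assumes the delta is the current hour minus the previous hour, that the stability window is the trailing 6 hours, and that the sample standard deviation is used. The log does not say which variant of the standard deviation the module uses, and the wind values are invented.

```python
from statistics import stdev


def rate_of_change(values):
    """Hour-over-hour delta; the first hour has no predecessor, so it is None."""
    if not values:
        return []
    return [None] + [curr - prev for prev, curr in zip(values, values[1:])]


def rolling_std(values, window=6):
    """Sample std over the trailing `window` hours (window >= 2); None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(stdev(values[i + 1 - window : i + 1]))
    return out


wind = [4.0, 4.5, 9.0, 8.5, 8.0, 3.0, 3.5]  # hypothetical hourly wind speeds (m/s)
print(rate_of_change(wind))  # large positive/negative deltas mark the spike and the drop
print(rolling_std(wind))     # first five entries are None, then the 6-hour volatility
```

Neither function needs turbine curves or panel specifications, which is the reason the log gives for preferring them.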
+
+### Rationale
+**Rate-of-change features capture what matters:**
+- Sudden wind spikes → wind generation ramping → redispatch
+- Solar drops (clouds) → solar generation drops → grid adjustments
+- Temperature swings → demand shifts → flow changes
+
+**No calibration data needed:**
+- Model learns physics from raw grid-level data (357 features)
+- Rate-of-change provides timing signals for correlation
+- Simpler features = more interpretable = easier to debug
+
+### Results
+**Re-ran feature engineering:**
+- Total features: 411 (unchanged)
+- Derived features: 6 (3 rate-of-change + 3 stability)
+- File size: 11.41 MB (0.07 MB smaller)
+- Completeness: 100%
+
+### Key Lesson
+**Simplicity over complexity in zero-shot MVP:**
+- Don't attempt to encode domain physics without calibration data
+- Let the model learn complex relationships from raw signals
+- Use simple derived features (deltas, rolling stats) for timing/volatility
+- Save physics-based features for Phase 2 when we have equipment data
+
+---
+
+## 2025-11-10 (Part 5) - Removed Zone Aggregates (Final: 375 Weather Features)
+
+### Summary
+Removed zone-level aggregate features (36 features) due to lack of capacity weighting data.
+
+### Problem Identified
+**User feedback**: Zone aggregates assume equal weighting without capacity data:
+- Averaging wind speed across DE_LU grid points (6 locations)
+- No knowledge of actual generation capacity at each location
+- Hamburg offshore: 5 GW vs Munich: 0.1 GW → equal averaging = meaningless
+
+**Fatal flaw**: Without knowing WHERE wind farms/solar parks are located and their CAPACITY, zone averages add noise instead of signal.
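A two-point example makes the weighting problem concrete. The capacities below are the illustrative figures from the feedback (5 GW and 0.1 GW); the wind speeds are invented for the sketch.

```python
# name: (wind speed in m/s [hypothetical], wind capacity in GW [from the feedback above])
points = {
    "Hamburg offshore": (14.0, 5.0),
    "Munich": (2.0, 0.1),
}

equal_mean = sum(speed for speed, _ in points.values()) / len(points)

total_capacity = sum(cap for _, cap in points.values())
weighted_mean = sum(speed * cap for speed, cap in points.values()) / total_capacity

print(f"equal-weight mean:      {equal_mean:.1f} m/s")
print(f"capacity-weighted mean: {weighted_mean:.1f} m/s")
```

The equal-weight mean sits halfway between the two sites, while the capacity-weighted mean stays close to the offshore value that actually drives generation. Without capacity data only the first number can be computed, which is why the aggregates were dropped.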
+
+### Solution Applied
+**Removed zone aggregation entirely:**
+- Deleted `engineer_zone_aggregates()` function
+- Removed 36 features (12 zones × 3 variables)
+- Deleted GRID_POINT_TO_ZONE mapping (unused)
+
+**Final Feature Set (375 features):**
+1. **Grid-level**: 357 features (51 points × 7 variables)
+   - Model learns which specific locations correlate with flows
+2. **Temporal lags**: 12 features (3 variables × 4 time periods)
+   - Captures weather persistence
+3. **Derived**: 6 features (rate-of-change + stability)
+   - Simple signals without requiring calibration data
+
+### Rationale
+**Let the model find the important locations:**
+- 51 grid points (357 grid-level features) give the model full spatial resolution
+- Model can learn which points have generation assets
+- No false precision from unweighted aggregation
+- Cleaner signal for zero-shot learning
+
+### Results
+**Re-ran feature engineering:**
+- Total features: 375 (down from 411, -36)
+- File size: 10.19 MB (down from 11.41 MB, -1.22 MB)
+- Completeness: 100%
+
+### Key Lesson
+**Avoid aggregation without domain knowledge:**
+- Equal weighting ≠ capacity-weighted average
+- Geographic averages require knowing asset locations and capacities
+- When in doubt, keep granular data and let the model learn patterns
+- Zero-shot MVP: maximize raw signal, minimize engineered assumptions
+
+### Final Weather Features Breakdown
+1. **Grid-level (357)**:
+   - temp_*, wind10m_*, wind100m_*, winddir_*
+   - solar_*, cloud_*, pressure_* for each of 51 grid points
+
+2. **Temporal lags (12)**:
+   - temp_avg_lag1h/6h/12h/24h
+   - wind_avg_lag1h/6h/12h/24h
+   - solar_avg_lag1h/6h/12h/24h
+
+3. **Derived (6)**:
+   - wind_rate_change, solar_rate_change, temp_rate_change (hour-over-hour)
+   - wind_stability_6h, solar_stability_6h, temp_stability_6h (rolling std)
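The lag features and the final count can be sketched as follows. This is an assumption-laden illustration: it treats each `*_avg` input as an hourly series that has already been averaged (the log does not say over what), and the helper names `lag` and `lag_features` are hypothetical.

```python
def lag(values, hours):
    """Shift an hourly series back by `hours`; the first `hours` entries have no history."""
    kept = max(len(values) - hours, 0)
    return [None] * (len(values) - kept) + values[:kept]


def lag_features(name, values, lags=(1, 6, 12, 24)):
    """Build the lag columns for one variable, named like `temp_avg_lag6h`."""
    return {f"{name}_lag{h}h": lag(values, h) for h in lags}


temp_avg = [float(h) for h in range(30)]  # hypothetical 30-hour series
features = lag_features("temp_avg", temp_avg)
print(sorted(features))
print(features["temp_avg_lag24h"][24])  # value observed 24 hours earlier

# Feature count check for the final set: grid-level + lags + derived.
grid_level, lags, derived = 51 * 7, 3 * 4, 6
print(grid_level + lags + derived)
```

The last line reproduces the 375 total from the breakdown above.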
+
+---
+
+**NEXT SESSION BOOKMARK**: Feature unification (merge 2,369 features on timestamp), then zero-shot inference
+
+**Status**: ✅ All Feature Engineering Complete - Ready for Unification
+
+**Final Feature Count**:
+- JAO: 1,698
+- ENTSO-E: 296
+- Weather: 375
+- **Total: 2,369 features** (down from 2,405)
@@ -0,0 +1,159 @@
+"""
+Collect 24-Month Weather Data from OpenMeteo
+=============================================
+
+Collects hourly weather data from OpenMeteo Historical API for the full
+24-month period (Oct 2023 - Sept 2025) across 51 strategic grid points.
+
+7 Weather Variables:
+- temperature_2m: Air temperature at 2m (C)
+- windspeed_10m: Wind speed at 10m (m/s)
+- windspeed_100m: Wind speed at 100m (m/s) - for wind generation
+- winddirection_100m: Wind direction at 100m (degrees)
+- shortwave_radiation: Solar radiation (W/m2) - for solar generation
+- cloudcover: Cloud cover percentage
+- surface_pressure: Surface air pressure (hPa)
+
+Collection Strategy:
+- 51 grid points (covering all FBMC zones + neighbors)
+- 2-week chunks (1.0 API call each)
+- 270 requests/minute (45% of 600 limit)
+- Estimated runtime: ~10 minutes
+
+Output: data/raw/weather_24month.parquet
+Size: ~50-80 MB (51 points × 7 vars × 17,544 hours)
+Features: 357 (51 × 7) when engineered
+"""
+
+import sys
+from pathlib import Path
+
+# Add the project root to sys.path so the `src` package is importable
+sys.path.append(str(Path(__file__).parent.parent))
+
+from src.data_collection.collect_openmeteo import OpenMeteoCollector
+
+# Date range: Oct 2023 - Sept 2025 (24 months)
+START_DATE = '2023-10-01'
+END_DATE = '2025-09-30'
+
+# Output file
+OUTPUT_DIR = Path(__file__).parent.parent / 'data' / 'raw'
+OUTPUT_FILE = OUTPUT_DIR / 'weather_24month.parquet'
+
+print("="*80)
+print("24-MONTH WEATHER DATA COLLECTION")
+print("="*80)
+print()
+print("Period: October 2023 - September 2025 (24 months)")
+print("Grid points: 51 strategic locations across FBMC")
+print("Variables: 7 weather parameters")
+print("Estimated runtime: ~10 minutes")
+print()
+
+# Initialize collector with safe rate limiting
+print("Initializing OpenMeteo collector...")
+collector = OpenMeteoCollector(
+    requests_per_minute=270,  # 45% of 600 limit
+    chunk_days=14  # 1.0 API call per request
+)
+print("[OK] Collector initialized")
+print()
+
+# Run collection
+try:
+    df = collector.collect_all(
+        start_date=START_DATE,
+        end_date=END_DATE,
+        output_path=OUTPUT_FILE
+    )
+
+    if not df.is_empty():
+        print()
+        print("="*80)
+        print("COLLECTION SUCCESS")
+        print("="*80)
+        print()
+        print(f"Output: {OUTPUT_FILE}")
+        print(f"Shape: {df.shape[0]:,} rows x {df.shape[1]} columns")
+        print(f"Date range: {df['timestamp'].min()} to {df['timestamp'].max()}")
+        print(f"Grid points: {df['grid_point'].n_unique()}")
+        print(f"Weather variables: {len([c for c in df.columns if c not in ['timestamp', 'grid_point', 'location_name', 'latitude', 'longitude']])}")
+        print()
+
+        # Data quality summary
+        null_count_total = df.null_count().sum_horizontal()[0]
+        null_pct = (null_count_total / (df.shape[0] * df.shape[1])) * 100
+        print(f"Data completeness: {100 - null_pct:.2f}%")
+
+        if null_pct > 0:
+            print()
+            print("Missing data by column:")
+            for col in df.columns:
+                null_count = df[col].null_count()
+                if null_count > 0:
+                    pct = (null_count / len(df)) * 100
+                    print(f" - {col}: {null_count:,} ({pct:.2f}%)")
+
+        print()
+        print("="*80)
+        print("NEXT STEPS")
+        print("="*80)
+        print()
+        print("1. Implement weather feature engineering:")
+        print(" - Create src/feature_engineering/engineer_weather_features.py")
|
| 105 |
+
print(" - Engineer ~357 features (51 grid points x 7 variables)")
|
| 106 |
+
print(" - Add spatial aggregation (zone-level averages)")
|
| 107 |
+
print()
|
| 108 |
+
print("2. Expected features:")
|
| 109 |
+
print(" - Grid-level: temp_{grid_point}, wind_{grid_point}, solar_{grid_point}, etc.")
|
| 110 |
+
print(" - Zone-level: temp_avg_{zone}, wind_avg_{zone}, solar_avg_{zone}, etc.")
|
| 111 |
+
print(" - Lags: Previous 1h, 6h, 12h, 24h for key variables")
|
| 112 |
+
print()
|
| 113 |
+
print("3. Final unified features:")
|
| 114 |
+
print(" - JAO: 1,698")
|
| 115 |
+
print(" - ENTSO-E: 296")
|
| 116 |
+
print(" - Weather: 375")
|
| 117 |
+
print(" - Total: ~2,369 features")
|
| 118 |
+
print()
|
| 119 |
+
print("[OK] Weather data collection COMPLETE!")
|
| 120 |
+
else:
|
| 121 |
+
print()
|
| 122 |
+
print("[ERROR] No weather data collected")
|
| 123 |
+
print()
|
| 124 |
+
print("Possible causes:")
|
| 125 |
+
print(" - OpenMeteo API access issues")
|
| 126 |
+
print(" - Rate limit exceeded")
|
| 127 |
+
print(" - Network connectivity problems")
|
| 128 |
+
print()
|
| 129 |
+
sys.exit(1)
|
| 130 |
+
|
| 131 |
+
except KeyboardInterrupt:
|
| 132 |
+
print()
|
| 133 |
+
print()
|
| 134 |
+
print("="*80)
|
| 135 |
+
print("COLLECTION INTERRUPTED")
|
| 136 |
+
print("="*80)
|
| 137 |
+
print()
|
| 138 |
+
print("Collection was stopped by user.")
|
| 139 |
+
print()
|
| 140 |
+
print("NOTE: OpenMeteo collection does NOT have checkpoint/resume capability")
|
| 141 |
+
print(" (collection completes in ~15 minutes, so not needed)")
|
| 142 |
+
print()
|
| 143 |
+
print("To restart: Run this script again")
|
| 144 |
+
print()
|
| 145 |
+
sys.exit(130)
|
| 146 |
+
|
| 147 |
+
except Exception as e:
|
| 148 |
+
print()
|
| 149 |
+
print()
|
| 150 |
+
print("="*80)
|
| 151 |
+
print("COLLECTION FAILED")
|
| 152 |
+
print("="*80)
|
| 153 |
+
print()
|
| 154 |
+
print(f"Error: {e}")
|
| 155 |
+
print()
|
| 156 |
+
import traceback
|
| 157 |
+
traceback.print_exc()
|
| 158 |
+
print()
|
| 159 |
+
sys.exit(1)
|
|
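The request volume and runtime for this collection can be sanity-checked with a few lines of arithmetic. The sketch below takes the date range, chunk size, and rate limit from the script, and the grid-point count (51) from the commit message. It assumes each grid point is fetched in consecutive 14-day chunks, with a shorter final chunk; that chunking rule is an assumption, not confirmed by the collector code shown here.

```python
from datetime import date
from math import ceil

# Date range, chunk size and rate limit come from the script above; the
# grid-point count comes from the commit message. The chunking rule
# (ceil of inclusive days / chunk size, per grid point) is an assumption.
START = date(2023, 10, 1)
END = date(2025, 9, 30)
GRID_POINTS = 51
CHUNK_DAYS = 14
REQUESTS_PER_MINUTE = 270

days = (END - START).days + 1               # inclusive day count
chunks_per_point = ceil(days / CHUNK_DAYS)  # last chunk may be shorter
total_requests = GRID_POINTS * chunks_per_point
hourly_records = GRID_POINTS * days * 24
min_runtime_minutes = total_requests / REQUESTS_PER_MINUTE

print(f"{days} days -> {chunks_per_point} chunks per grid point")
print(f"{total_requests:,} requests, {hourly_records:,} hourly records")
print(f"rate-limit floor: {min_runtime_minutes:.1f} minutes")
```

Under these assumptions the totals agree with the 2,703 requests and 894,744 records reported in the commit message. The rate limit alone puts a floor of roughly ten minutes on the run, before any network latency.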
@@ -278,7 +278,7 @@ class OpenMeteoCollector:
|
|
| 278 |
return df
|
| 279 |
|
| 280 |
except requests.exceptions.RequestException as e:
|
| 281 |
-
print(f"
|
| 282 |
return pl.DataFrame()
|
| 283 |
|
| 284 |
def collect_all(
|
|
@@ -344,7 +344,7 @@ class OpenMeteoCollector:
|
|
| 344 |
if location_chunks:
|
| 345 |
location_df = pl.concat(location_chunks)
|
| 346 |
all_data.append(location_df)
|
| 347 |
-
print(f"
|
| 348 |
|
| 349 |
# Combine all dataframes
|
| 350 |
if all_data:
|
|
@@ -363,13 +363,18 @@ class OpenMeteoCollector:
|
|
| 363 |
print(f"Total records: {combined_df.shape[0]:,}")
|
| 364 |
print(f"Date range: {combined_df['timestamp'].min()} to {combined_df['timestamp'].max()}")
|
| 365 |
print(f"Grid points: {combined_df['grid_point'].n_unique()}")
|
| 366 |
-
| 367 |
print(f"Output: {output_path}")
|
| 368 |
print(f"File size: {output_path.stat().st_size / (1024**2):.1f} MB")
|
| 369 |
|
| 370 |
return combined_df
|
| 371 |
else:
|
| 372 |
-
print("
|
| 373 |
return pl.DataFrame()
|
| 374 |
|
| 375 |
|
|
|
|
| 278 |
return df
|
| 279 |
|
| 280 |
except requests.exceptions.RequestException as e:
|
| 281 |
+
print(f"[ERROR] Failed {location_id} ({start_date} to {end_date}): {e}")
|
| 282 |
return pl.DataFrame()
|
| 283 |
|
| 284 |
def collect_all(
|
|
|
|
| 344 |
if location_chunks:
|
| 345 |
location_df = pl.concat(location_chunks)
|
| 346 |
all_data.append(location_df)
|
| 347 |
+
print(f"[OK] {location_id}: {location_df.shape[0]} hours")
|
| 348 |
|
| 349 |
# Combine all dataframes
|
| 350 |
if all_data:
|
|
|
|
| 363 |
print(f"Total records: {combined_df.shape[0]:,}")
|
| 364 |
print(f"Date range: {combined_df['timestamp'].min()} to {combined_df['timestamp'].max()}")
|
| 365 |
print(f"Grid points: {combined_df['grid_point'].n_unique()}")
|
| 366 |
+
|
| 367 |
+
# Calculate completeness (fix: extract scalar from Polars)
|
| 368 |
+
null_count_total = combined_df.null_count().sum_horizontal()[0]
|
| 369 |
+
completeness = (1 - null_count_total / (combined_df.shape[0] * combined_df.shape[1])) * 100
|
| 370 |
+
print(f"Completeness: {completeness:.2f}%")
|
| 371 |
+
|
| 372 |
print(f"Output: {output_path}")
|
| 373 |
print(f"File size: {output_path.stat().st_size / (1024**2):.1f} MB")
|
| 374 |
|
| 375 |
return combined_df
|
| 376 |
else:
|
| 377 |
+
print("[ERROR] No data collected")
|
| 378 |
return pl.DataFrame()
|
| 379 |
|
| 380 |
|
|
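The print changes in this hunk swap emoji status markers for ASCII tags such as `[OK]` and `[ERROR]`, which ties in with the cp1252 crash listed under Bug Fixes. A minimal sketch of why that helps is shown below. The helper name `safe_for_console` and the `DE_north` location id are hypothetical, and the specific emoji is a guess, since the removed lines are truncated in this view.

```python
def safe_for_console(text: str, encoding: str = "cp1252") -> bool:
    """Return True if `text` can be encoded with the given console encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical status lines; the location id and the emoji are illustrative only.
emoji_message = "\u2705 DE_north: 17544 hours"  # check-mark emoji, not in cp1252
ascii_message = "[OK] DE_north: 17544 hours"    # ASCII tag, as in the updated prints

print(safe_for_console(emoji_message))
print(safe_for_console(ascii_message))
```

On a Windows console that defaults to cp1252, printing the first string would raise `UnicodeEncodeError`, while the ASCII variant encodes in any common code page.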
@@ -0,0 +1,263 @@
| 1 |
+
"""Engineer 375 Weather features for FBMC forecasting.
|
| 2 |
+
|
| 3 |
+
Transforms OpenMeteo weather data into model-ready features:
|
| 4 |
+
1. Grid-level features (51 points × 7 vars = 357 features)
|
| 5 |
+
2. Temporal lags (3 vars × 4 time periods = 12 features)
|
| 6 |
+
3. Derived features (rate-of-change + stability = 6 features)
|
| 7 |
+
|
| 8 |
+
Total: 375 weather features
|
| 9 |
+
|
| 10 |
+
Weather Variables (7):
|
| 11 |
+
- temperature_2m (C)
|
| 12 |
+
- windspeed_10m (m/s)
|
| 13 |
+
- windspeed_100m (m/s) - for wind generation
|
| 14 |
+
- winddirection_100m (degrees)
|
| 15 |
+
- shortwave_radiation (W/m2) - for solar generation
|
| 16 |
+
- cloudcover (%)
|
| 17 |
+
- surface_pressure (hPa)
|
| 18 |
+
|
| 19 |
+
Author: Claude
|
| 20 |
+
Date: 2025-11-10
|
| 21 |
+
"""
|
| 22 |
+
from pathlib import Path
|
| 23 |
+
import polars as pl
|
| 24 |
+
|
| 25 |
+
|
| 26 |
+
def engineer_grid_level_features(weather_df: pl.DataFrame) -> pl.DataFrame:
|
| 27 |
+
"""Engineer grid-level weather features (51 points × 7 vars = 357 features).
|
| 28 |
+
|
| 29 |
+
For each grid point, pivot all 7 weather variables to wide format:
|
| 30 |
+
- temp_<grid_point>
|
| 31 |
+
- wind10m_<grid_point>
|
| 32 |
+
- wind100m_<grid_point>
|
| 33 |
+
- winddir_<grid_point>
|
| 34 |
+
- solar_<grid_point>
|
| 35 |
+
- cloud_<grid_point>
|
| 36 |
+
- pressure_<grid_point>
|
| 37 |
+
"""
|
| 38 |
+
print("\n[1/3] Engineering grid-level features (51 points × 7 vars)...")
|
| 39 |
+
|
| 40 |
+
# Pivot each weather variable separately
|
| 41 |
+
features = None
|
| 42 |
+
|
| 43 |
+
weather_vars = [
|
| 44 |
+
('temperature_2m', 'temp'),
|
| 45 |
+
('windspeed_10m', 'wind10m'),
|
| 46 |
+
('windspeed_100m', 'wind100m'),
|
| 47 |
+
('winddirection_100m', 'winddir'),
|
| 48 |
+
('shortwave_radiation', 'solar'),
|
| 49 |
+
('cloudcover', 'cloud'),
|
| 50 |
+
('surface_pressure', 'pressure')
|
| 51 |
+
]
|
| 52 |
+
|
| 53 |
+
for orig_col, short_name in weather_vars:
|
| 54 |
+
print(f" Pivoting {orig_col}...")
|
| 55 |
+
|
| 56 |
+
pivoted = weather_df.select(['timestamp', 'grid_point', orig_col]).pivot(
|
| 57 |
+
values=orig_col,
|
| 58 |
+
index='timestamp',
|
| 59 |
+
on='grid_point',
|
| 60 |
+
aggregate_function='first'
|
| 61 |
+
)
|
| 62 |
+
|
| 63 |
+
# Rename columns to <short_name>_<grid_point>
|
| 64 |
+
rename_map = {}
|
| 65 |
+
for col in pivoted.columns:
|
| 66 |
+
if col != 'timestamp':
|
| 67 |
+
rename_map[col] = f'{short_name}_{col}'
|
| 68 |
+
|
| 69 |
+
pivoted = pivoted.rename(rename_map)
|
| 70 |
+
|
| 71 |
+
# Join to features
|
| 72 |
+
if features is None:
|
| 73 |
+
features = pivoted
|
| 74 |
+
else:
|
| 75 |
+
features = features.join(pivoted, on='timestamp', how='left', coalesce=True)
|
| 76 |
+
|
| 77 |
+
print(f" [OK] {len(features.columns) - 1} grid-level features")
|
| 78 |
+
return features
|
| 79 |
+
|
| 80 |
+
|
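The long-to-wide reshaping done by `engineer_grid_level_features` can be illustrated without Polars. The sketch below is a plain-Python stand-in, not the implementation in this file: `pivot_wide`, the grid point ids `P1`/`P2`, and the sample values are made up for illustration. It mirrors the `aggregate_function='first'` behaviour by keeping the first value seen for each (timestamp, grid point) pair.

```python
from collections import defaultdict

# Long format: one row per (timestamp, grid_point). All values are made up.
long_rows = [
    {"timestamp": "2023-10-01T00:00", "grid_point": "P1", "temperature_2m": 12.0},
    {"timestamp": "2023-10-01T00:00", "grid_point": "P2", "temperature_2m": 9.5},
    {"timestamp": "2023-10-01T01:00", "grid_point": "P1", "temperature_2m": 11.6},
    {"timestamp": "2023-10-01T01:00", "grid_point": "P2", "temperature_2m": 9.1},
]


def pivot_wide(rows, value_col, short_name):
    """Pivot long rows into {timestamp: {"<short_name>_<grid_point>": value}}."""
    wide = defaultdict(dict)
    for row in rows:
        column = f"{short_name}_{row['grid_point']}"
        # setdefault keeps the first value if a pair appears more than once
        wide[row["timestamp"]].setdefault(column, row[value_col])
    return {ts: wide[ts] for ts in sorted(wide)}


temp_wide = pivot_wide(long_rows, "temperature_2m", "temp")
for ts, cols in temp_wide.items():
    print(ts, cols)
```

Repeating this for each of the 7 variables and merging on the timestamp gives one column per (variable, grid point) pair, which is where the 51 × 7 = 357 count comes from.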
| 81 |
+
def engineer_temporal_lags(features: pl.DataFrame) -> pl.DataFrame:
|
| 82 |
+
"""Add temporal lags for key weather variables.
|
| 83 |
+
|
| 84 |
+
Lags: 1h, 6h, 12h, 24h for:
|
| 85 |
+
- Average temperature (4 lag features)
|
| 86 |
+
- Average wind speed (4 lag features)
|
| 87 |
+
- Average solar radiation (4 lag features)
|
| 88 |
+
|
| 89 |
+
Total: ~12 lag features (3 vars × 4 lags)
|
| 90 |
+
"""
|
| 91 |
+
print("\n[2/3] Engineering temporal lags (1h, 6h, 12h, 24h)...")
|
| 92 |
+
|
| 93 |
+
# Calculate system-wide averages for lagging
|
| 94 |
+
# Temperature average (across all temp_ columns)
|
| 95 |
+
temp_cols = [c for c in features.columns if c.startswith('temp_')]
|
| 96 |
+
features = features.with_columns([
|
| 97 |
+
pl.concat_list([pl.col(c) for c in temp_cols]).list.mean().alias('temp_avg')
|
| 98 |
+
])
|
| 99 |
+
|
| 100 |
+
# Wind speed average (100m - for wind generation)
|
| 101 |
+
wind_cols = [c for c in features.columns if c.startswith('wind100m_')]
|
| 102 |
+
features = features.with_columns([
|
| 103 |
+
pl.concat_list([pl.col(c) for c in wind_cols]).list.mean().alias('wind_avg')
|
| 104 |
+
])
|
| 105 |
+
|
| 106 |
+
# Solar radiation average
|
| 107 |
+
solar_cols = [c for c in features.columns if c.startswith('solar_')]
|
| 108 |
+
features = features.with_columns([
|
| 109 |
+
pl.concat_list([pl.col(c) for c in solar_cols]).list.mean().alias('solar_avg')
|
| 110 |
+
])
|
| 111 |
+
|
| 112 |
+
# Add lags
|
| 113 |
+
lag_vars = ['temp_avg', 'wind_avg', 'solar_avg']
|
| 114 |
+
lag_hours = [1, 6, 12, 24]
|
| 115 |
+
|
| 116 |
+
for var in lag_vars:
|
| 117 |
+
for lag_h in lag_hours:
|
| 118 |
+
features = features.with_columns([
|
| 119 |
+
pl.col(var).shift(lag_h).alias(f'{var}_lag{lag_h}h')
|
| 120 |
+
])
|
| 121 |
+
|
| 122 |
+
# Drop intermediate averages (keep only lagged versions)
|
| 123 |
+
features = features.drop(['temp_avg', 'wind_avg', 'solar_avg'])
|
| 124 |
+
|
| 125 |
+
lag_features = len(lag_vars) * len(lag_hours)
|
| 126 |
+
print(f" [OK] {lag_features} temporal lag features")
|
| 127 |
+
return features
|
| 128 |
+
|
| 129 |
+
|
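A lag feature is the same series shifted down by a fixed number of rows, so the result is only meaningful when rows are in chronological order at a regular hourly step. The sketch below is a plain-Python stand-in for the shift used in `engineer_temporal_lags`. The helper `shift_series` and the sample values are illustrative, not part of this file.

```python
def shift_series(values, periods):
    """Lag a series by `periods` rows, padding the start with None."""
    if periods < 0:
        raise ValueError("periods must be non-negative")
    return ([None] * periods + list(values))[: len(values)]


# Hypothetical system-wide hourly average temperature, oldest first
temp_avg = [10.0, 11.0, 12.5, 12.0, 11.5, 11.0, 10.5, 10.0]

lagged = {f"temp_avg_lag{h}h": shift_series(temp_avg, h) for h in (1, 6)}
for name, series in lagged.items():
    print(name, series)
```

The first `h` rows of each lag column have no earlier value to draw on and come out empty. A 24-hour lag therefore leaves the first day of the dataset without a value, which shows up in the completeness figure printed at the end of the pipeline.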
| 130 |
+
def engineer_derived_features(features: pl.DataFrame) -> pl.DataFrame:
|
| 131 |
+
"""Engineer derived weather features (6 features).
|
| 132 |
+
|
| 133 |
+
Simple features without requiring calibration data:
|
| 134 |
+
- Rate of change (hour-over-hour deltas): wind, solar, temperature
|
| 135 |
+
- Weather stability (rolling std): wind, solar, temperature
|
| 136 |
+
"""
|
| 137 |
+
print("\n[3/3] Engineering derived features (rate-of-change + stability)...")
|
| 138 |
+
|
| 139 |
+
# Calculate system averages for rate-of-change and stability
|
| 140 |
+
wind_cols = [c for c in features.columns if c.startswith('wind100m_')]
|
| 141 |
+
solar_cols = [c for c in features.columns if c.startswith('solar_') and not c.startswith('solar_avg_lag')]  # exclude lag columns added earlier
|
| 142 |
+
temp_cols = [c for c in features.columns if c.startswith('temp_') and not c.startswith('temp_avg_lag')]  # exclude lag columns added earlier
|
| 143 |
+
|
| 144 |
+
features = features.with_columns([
|
| 145 |
+
pl.concat_list([pl.col(c) for c in wind_cols]).list.mean().alias('wind_system_avg'),
|
| 146 |
+
pl.concat_list([pl.col(c) for c in solar_cols]).list.mean().alias('solar_system_avg'),
|
| 147 |
+
pl.concat_list([pl.col(c) for c in temp_cols]).list.mean().alias('temp_system_avg')
|
| 148 |
+
])
|
| 149 |
+
|
| 150 |
+
# Rate of change (hour-over-hour deltas)
|
| 151 |
+
# Captures sudden spikes/drops that correlate with grid constraints
|
| 152 |
+
features = features.with_columns([
|
| 153 |
+
pl.col('wind_system_avg').diff().alias('wind_rate_change'),
|
| 154 |
+
pl.col('solar_system_avg').diff().alias('solar_rate_change'),
|
| 155 |
+
pl.col('temp_system_avg').diff().alias('temp_rate_change')
|
| 156 |
+
])
|
| 157 |
+
|
| 158 |
+
# Weather stability: 6-hour rolling std
|
| 159 |
+
# Detects volatility periods (useful for forecasting uncertainty)
|
| 160 |
+
features = features.with_columns([
|
| 161 |
+
pl.col('wind_system_avg').rolling_std(window_size=6).alias('wind_stability_6h'),
|
| 162 |
+
pl.col('solar_system_avg').rolling_std(window_size=6).alias('solar_stability_6h'),
|
| 163 |
+
pl.col('temp_system_avg').rolling_std(window_size=6).alias('temp_stability_6h')
|
| 164 |
+
])
|
| 165 |
+
|
| 166 |
+
# Drop intermediate columns
|
| 167 |
+
features = features.drop(['wind_system_avg', 'solar_system_avg', 'temp_system_avg'])
|
| 168 |
+
|
| 169 |
+
# Count derived features
|
| 170 |
+
derived_cols = ['wind_rate_change', 'solar_rate_change', 'temp_rate_change',
|
| 171 |
+
'wind_stability_6h', 'solar_stability_6h', 'temp_stability_6h']
|
| 172 |
+
|
| 173 |
+
print(f" [OK] {len(derived_cols)} derived features")
|
| 174 |
+
return features
|
| 175 |
+
|
| 176 |
+
|
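The two kinds of derived feature can be sketched in plain Python: an hour-over-hour delta and a 6-hour rolling standard deviation. This is an illustration, not the Polars code above. `hourly_diff` and `rolling_std` are hypothetical helpers and the wind values are made up. `statistics.stdev` computes the sample standard deviation (n - 1 denominator), which is assumed here to match the Polars `rolling_std` default.

```python
from statistics import stdev


def hourly_diff(values):
    """Row-over-row delta; the first row has no predecessor."""
    if not values:
        return []
    return [None] + [b - a for a, b in zip(values, values[1:])]


def rolling_std(values, window):
    """Sample std over the trailing `window` rows; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(stdev(values[i + 1 - window : i + 1]))
    return out


# Hypothetical system-average wind speed: flat for six hours, then a spike
wind = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 11.0]

wind_rate_change = hourly_diff(wind)
wind_stability_6h = rolling_std(wind, window=6)

print(wind_rate_change)
print(wind_stability_6h)
```

The spike shows up immediately in the rate-of-change series as a single large delta, while the stability series rises and stays elevated until the spike leaves the 6-hour window.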
| 177 |
+
def engineer_weather_features(
|
| 178 |
+
weather_path: Path,
|
| 179 |
+
output_dir: Path
|
| 180 |
+
) -> pl.DataFrame:
|
| 181 |
+
"""Main feature engineering pipeline for weather data.
|
| 182 |
+
|
| 183 |
+
Args:
|
| 184 |
+
weather_path: Path to raw weather data (weather_24month.parquet)
|
| 185 |
+
output_dir: Directory to save engineered features
|
| 186 |
+
|
| 187 |
+
Returns:
|
| 188 |
+
DataFrame with 375 weather features (plus the timestamp column)
|
| 189 |
+
"""
|
| 190 |
+
print("=" * 80)
|
| 191 |
+
print("WEATHER FEATURE ENGINEERING")
|
| 192 |
+
print("=" * 80)
|
| 193 |
+
print()
|
| 194 |
+
print(f"Input: {weather_path}")
|
| 195 |
+
print(f"Output: {output_dir}")
|
| 196 |
+
print()
|
| 197 |
+
|
| 198 |
+
# Load raw weather data
|
| 199 |
+
print("Loading weather data...")
|
| 200 |
+
weather_df = pl.read_parquet(weather_path)
|
| 201 |
+
print(f" [OK] {weather_df.shape[0]:,} rows × {weather_df.shape[1]} columns")
|
| 202 |
+
print(f" Date range: {weather_df['timestamp'].min()} to {weather_df['timestamp'].max()}")
|
| 203 |
+
print()
|
| 204 |
+
|
| 205 |
+
# 1. Grid-level features (51 × 7 = 357 features)
|
| 206 |
+
all_features = engineer_grid_level_features(weather_df).sort('timestamp')  # lags and diffs below depend on row order
|
| 207 |
+
|
| 208 |
+
# 2. Temporal lags (~12 features)
|
| 209 |
+
all_features = engineer_temporal_lags(all_features)
|
| 210 |
+
|
| 211 |
+
# 3. Derived features (6 features: rate-of-change + stability)
|
| 212 |
+
all_features = engineer_derived_features(all_features)
|
| 213 |
+
|
| 214 |
+
# Sort by timestamp
|
| 215 |
+
all_features = all_features.sort('timestamp')
|
| 216 |
+
|
| 217 |
+
# Final validation
|
| 218 |
+
print("\n" + "=" * 80)
|
| 219 |
+
print("FEATURE ENGINEERING COMPLETE")
|
| 220 |
+
print("=" * 80)
|
| 221 |
+
print(f"Total features: {all_features.shape[1] - 1} (excluding timestamp)")
|
| 222 |
+
print(f"Total rows: {len(all_features):,}")
|
| 223 |
+
|
| 224 |
+
# Check completeness
|
| 225 |
+
null_count_total = all_features.null_count().sum_horizontal()[0]
|
| 226 |
+
completeness = (1 - null_count_total / (all_features.shape[0] * all_features.shape[1])) * 100
|
| 227 |
+
print(f"Completeness: {completeness:.2f}%")
|
| 228 |
+
print()
|
| 229 |
+
|
| 230 |
+
# Save features
|
| 231 |
+
output_path = output_dir / 'features_weather_24month.parquet'
|
| 232 |
+
all_features.write_parquet(output_path)
|
| 233 |
+
|
| 234 |
+
file_size_mb = output_path.stat().st_size / (1024 ** 2)
|
| 235 |
+
print(f"Features saved: {output_path}")
|
| 236 |
+
print(f"File size: {file_size_mb:.2f} MB")
|
| 237 |
+
print("=" * 80)
|
| 238 |
+
print()
|
| 239 |
+
|
| 240 |
+
return all_features
|
| 241 |
+
|
| 242 |
+
|
| 243 |
+
def main():
|
| 244 |
+
"""Main execution."""
|
| 245 |
+
# Paths
|
| 246 |
+
base_dir = Path.cwd()
|
| 247 |
+
raw_dir = base_dir / 'data' / 'raw'
|
| 248 |
+
processed_dir = base_dir / 'data' / 'processed'
|
| 249 |
+
|
| 250 |
+
weather_path = raw_dir / 'weather_24month.parquet'
|
| 251 |
+
|
| 252 |
+
# Verify file exists
|
| 253 |
+
if not weather_path.exists():
|
| 254 |
+
raise FileNotFoundError(f"Weather data not found: {weather_path}")
|
| 255 |
+
|
| 256 |
+
# Engineer features
|
| 257 |
+
features = engineer_weather_features(weather_path, processed_dir)
|
| 258 |
+
|
| 259 |
+
print("SUCCESS: Weather features engineered and saved to data/processed/")
|
| 260 |
+
|
| 261 |
+
|
| 262 |
+
if __name__ == '__main__':
|
| 263 |
+
main()
|
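Two quick checks are useful when reviewing this file: the feature tally, and which columns a prefix filter actually selects. Lag columns are named like `temp_avg_lag1h` and `solar_avg_lag24h`, so they share a prefix with the grid-level `temp_` and `solar_` columns. A filter that uses only `startswith`, run after the lag step, would pick them up as well. The sketch below uses hypothetical column names (`P1`, `P2`) and a hypothetical helper `grid_columns`.

```python
GRID_POINTS = 51
WEATHER_VARS = 7
LAG_VARS = 3
LAG_HOURS = [1, 6, 12, 24]
DERIVED = [
    "wind_rate_change", "solar_rate_change", "temp_rate_change",
    "wind_stability_6h", "solar_stability_6h", "temp_stability_6h",
]

grid_features = GRID_POINTS * WEATHER_VARS
lag_features = LAG_VARS * len(LAG_HOURS)
total_features = grid_features + lag_features + len(DERIVED)
print(f"{grid_features} + {lag_features} + {len(DERIVED)} = {total_features}")

# Hypothetical column set as it might look after the lag step
columns = [
    "temp_P1", "temp_P2", "temp_avg_lag1h",
    "solar_P1", "solar_avg_lag24h",
    "wind100m_P1", "wind_avg_lag6h",
]


def grid_columns(cols, prefix):
    """Select grid-level columns for a prefix, skipping lagged averages."""
    return [c for c in cols if c.startswith(prefix) and "_avg_lag" not in c]


naive_temp = [c for c in columns if c.startswith("temp_")]
print("prefix only:", naive_temp)
print("filtered:   ", grid_columns(columns, "temp_"))
```

The `wind100m_` prefix is not affected, because the lag columns for wind are named `wind_avg_lag...` and do not start with `wind100m_`.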