Final Domain Collection Research
Summary of Findings
Available Methods in jao-py
The JaoPublicationToolPandasClient class provides three domain query methods:
- `query_final_domain(mtu, presolved, cne, co, use_mirror)` (line 233): Final Computation, the final FB parameters following LTN
  - Published: 10:30 D-1
  - Most complete dataset (recommended for Phase 2)
- `query_prefinal_domain(mtu, presolved, cne, co, use_mirror)` (line 248): Pre-Final (EarlyPub), the pre-final FB parameters before LTN
  - Published: 08:00 D-1
  - Earlier publication time, but before LTN application
- `query_initial_domain(mtu, presolved, cne, co)` (line 264): Initial Computation (Virgin Domain), the initial flow-based parameters
  - Published: early in D-1
  - Before any adjustments
Method Parameters
```python
def query_final_domain(
    mtu: pd.Timestamp,        # Market Time Unit (1 hour, timezone-aware)
    presolved: bool = None,   # Filter: True=binding, False=non-binding, None=ALL
    cne: str = None,          # CNEC name keyword filter (NOT EIC-based!)
    co: str = None,           # Contingency keyword filter
    use_mirror: bool = False  # Use mirror.flowbased.eu for faster bulk download
) -> pd.DataFrame
```
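For illustration, a valid `mtu` is a single timezone-aware hour; the actual query is shown only as a comment since it needs a live client (`client` would be a `JaoPublicationToolPandasClient` instance):

```python
import pandas as pd

# One Market Time Unit: a single hour, timezone-aware
mtu = pd.Timestamp("2025-09-23 10:00", tz="Europe/Amsterdam")

# The real call needs a live client, so it is left commented out:
# df = client.query_final_domain(mtu=mtu, presolved=None)  # ALL CNECs for this hour

print(mtu.isoformat())  # 2025-09-23T10:00:00+02:00
```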
Key Findings
DENSE Data Acquisition:
- Set `presolved=None` to get ALL CNECs (binding + non-binding)
- This provides the DENSE format needed for Phase 2 feature engineering
Filtering Limitations:
- ❌ NO EIC-based filtering on the server side
- ✅ Only keyword-based filters (`cne`, `co`) available
- Solution: download all CNECs, then filter locally by EIC codes
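Since there is no server-side EIC filter, the local filtering step can be sketched on toy data (only the `cnec_eic` column name is taken from the parsed output; the values are illustrative):

```python
import pandas as pd

# Toy stand-in for one hour of parsed final-domain data
df = pd.DataFrame({
    "cnec_eic": ["EIC-A", "EIC-B", "EIC-C", "EIC-A"],
    "ram": [820.0, 455.0, 990.0, 760.0],
})

# EIC codes selected in Phase 1 (illustrative values)
target_cnec_eics = ["EIC-A", "EIC-C"]

# Client-side filtering: keep only the target CNECs
df_filtered = df[df["cnec_eic"].isin(target_cnec_eics)]
print(len(df_filtered))  # 3
```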
Query Granularity:
- Method queries 1 hour at a time (mtu = Market Time Unit)
- For 24 months: 17,520 API calls are needed (1 per hour)
- Alternative: use `use_mirror=True` for whole-day downloads
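The 17,520 figure can be checked directly from an hourly range (the dates here are an illustrative non-leap 24-month window, not the project's actual window):

```python
import pandas as pd

# Hourly MTUs over an illustrative 24-month, non-leap window
hours = pd.date_range("2022-01-01 00:00", "2023-12-31 23:00", freq="h")
print(len(hours))  # 17520 hourly API calls without the mirror
```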
Mirror Option (Recommended for bulk collection):
- URL: `https://mirror.flowbased.eu/dacc/final_domain/YYYY-MM-DD`
- Returns a full day (24 hours) as CSV in a ZIP file
- Much faster than hourly API calls
- Set `use_mirror=True` OR set the env var `JAO_USE_MIRROR=1`
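Building the daily mirror URL from a date is a one-liner, following the pattern listed above:

```python
from datetime import date

# Daily mirror URL for the final domain (pattern from the list above)
d = date(2025, 9, 23)
url = f"https://mirror.flowbased.eu/dacc/final_domain/{d:%Y-%m-%d}"
print(url)  # https://mirror.flowbased.eu/dacc/final_domain/2025-09-23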
Data Structure (from `parse_final_domain()`):
- Returns a pandas DataFrame with columns:
  - Identifiers: `mtu` (timestamp), `tso`, `cnec_name`, `cnec_eic`, `direction`
  - Contingency: `contingency_*` fields (nested structure flattened)
  - Presolved field: indicates whether the CNEC is binding (True) or redundant (False)
  - RAM breakdown: `ram`, `fmax`, `imax`, `frm`, `fuaf`, `amr`, `lta_margin`, etc.
  - PTDFs: `ptdf_AT`, `ptdf_BE`, ..., `ptdf_SK` (12 Core zones)
- Timestamps converted to Europe/Amsterdam timezone
- snake_case column names (except PTDFs)
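Given these naming conventions, the PTDF columns can be selected by prefix; a toy sketch (only three of the 12 zone columns shown):

```python
import pandas as pd

# Toy one-row frame following the column conventions above
df = pd.DataFrame({
    "cnec_eic": ["EIC-A"],
    "ram": [820.0],
    "ptdf_AT": [0.12],
    "ptdf_BE": [-0.05],
    "ptdf_SK": [0.03],
})

# PTDF columns keep the zone suffix; everything else is snake_case
ptdf_cols = [c for c in df.columns if c.startswith("ptdf_")]
print(ptdf_cols)  # ['ptdf_AT', 'ptdf_BE', 'ptdf_SK']
```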
Recommended Implementation for Phase 2
Option A: Mirror-based (FASTEST):
```python
from pathlib import Path

import pandas as pd
import polars as pl


def collect_final_domain_sample(
    start_date: str,
    end_date: str,
    target_cnec_eics: list[str],  # 200 EIC codes from Phase 1
    output_path: Path,
) -> pl.DataFrame:
    """Collect DENSE CNEC data for specific CNECs using the mirror."""
    client = JAOClient()  # project wrapper (in collect_jao.py) around jao-py's client
    all_data = []
    for date in pd.date_range(start_date, end_date):
        # Query the full day (all CNECs) via the mirror
        df_day = client.query_final_domain(
            mtu=pd.Timestamp(date, tz='Europe/Amsterdam'),
            presolved=None,   # ALL CNECs (DENSE!)
            use_mirror=True,  # fast bulk download
        )
        # Filter to the target CNECs only
        df_filtered = df_day[df_day['cnec_eic'].isin(target_cnec_eics)]
        all_data.append(df_filtered)
    # Combine and save as Parquet
    df_full = pd.concat(all_data, ignore_index=True)
    pl_df = pl.from_pandas(df_full)
    pl_df.write_parquet(output_path)
    return pl_df
```
Option B: Hourly API calls (SLOWER, but more granular):
```python
from pathlib import Path

import pandas as pd
import polars as pl


def collect_final_domain_hourly(
    start_date: str,
    end_date: str,
    target_cnec_eics: list[str],
    output_path: Path,
) -> pl.DataFrame:
    """Collect DENSE CNEC data hour-by-hour."""
    client = JAOClient()  # project wrapper around jao-py's client
    all_data = []
    for date in pd.date_range(start_date, end_date, freq='h'):
        try:
            df_hour = client.query_final_domain(
                mtu=pd.Timestamp(date, tz='Europe/Amsterdam'),
                presolved=None,  # ALL CNECs
            )
            df_filtered = df_hour[df_hour['cnec_eic'].isin(target_cnec_eics)]
            all_data.append(df_filtered)
        except NoMatchingDataError:  # raised by jao-py when an hour has no data
            continue
    df_full = pd.concat(all_data, ignore_index=True)
    pl_df = pl.from_pandas(df_full)
    pl_df.write_parquet(output_path)
    return pl_df
```
Data Volume Estimates
Full Download (all ~20K CNECs):
- 20,000 CNECs × 17,520 hours = 350M records
- 350M records × ~27 columns × 8 bytes/value ≈ 75 GB uncompressed
- Parquet compression: ~10-20 GB
Filtered (200 target CNECs):
- 200 CNECs × 17,520 hours = 3.5M records
- 3.5M records × ~27 columns × 8 bytes/value ≈ 750 MB uncompressed
- Parquet compression: ~100-150 MB
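These estimates follow from simple arithmetic (records × ~27 columns × 8 bytes per float64 value); a quick check:

```python
# Records x columns x bytes per float64 value
hours = 24 * 730                 # 24 months of hourly MTUs = 17,520
cols, bytes_per_value = 27, 8

full_gb = 20_000 * hours * cols * bytes_per_value / 1e9
filtered_mb = 200 * hours * cols * bytes_per_value / 1e6

print(f"{full_gb:.0f} GB uncompressed (all CNECs)")      # 76 GB
print(f"{filtered_mb:.0f} MB uncompressed (200 CNECs)")  # 757 MB
```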
Implementation Strategy
- Phase 1 complete: identify the top 200 CNECs from SPARSE data
- Extract EIC codes: save to `data/processed/critical_cnecs_eic_codes.csv`
- Test on 1 week: validate DENSE collection with the mirror
  - Test window: 2025-09-23 to 2025-09-30 (8 days)
  - Expected: 200 CNECs × 192 hours = 38,400 records
- Collect 24 months: using the mirror for speed
- Validate the DENSE structure:

```python
unique_cnecs = df['cnec_eic'].n_unique()
unique_hours = df['mtu'].n_unique()
expected = unique_cnecs * unique_hours
actual = len(df)
assert actual == expected, f"Not DENSE! {actual} != {expected}"
```
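The same DENSE invariant, demonstrated on a toy pandas frame (pandas spells the method `nunique`, corresponding to polars' `n_unique`):

```python
import pandas as pd

# Toy DENSE frame: 2 CNECs x 3 hours -> exactly one row per (CNEC, hour)
df = pd.DataFrame({
    "cnec_eic": ["EIC-A", "EIC-B"] * 3,
    "mtu": ["h1", "h1", "h2", "h2", "h3", "h3"],
})

# pandas: .nunique(); polars: .n_unique()
expected = df["cnec_eic"].nunique() * df["mtu"].nunique()
assert len(df) == expected, f"Not DENSE! {len(df)} != {expected}"
print(expected)  # 6
```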
Advantages of Mirror Method
- ✅ Faster: 1 request/day vs 24 requests/day
- ✅ Rate limit friendly: 730 requests vs 17,520 requests
- ✅ More reliable: Less chance of timeout/connection errors
- ✅ Complete days: Guarantees all 24 hours present
Next Steps
- Add a `collect_final_domain_dense()` method to `collect_jao.py`
- Test on a 1-week sample with the target EIC codes
- Validate DENSE structure and data quality
- Run 24-month collection after Phase 1 complete
- Use DENSE data for Tier 1 & Tier 2 feature engineering
Research completed: 2025-11-05
jao-py version: 0.6.2
Source: C:\Users\evgue\projects\fbmc_chronos2\.venv\Lib\site-packages\jao\jao.py