fbmc-chronos2 / doc /final_domain_research.md
Evgueni Poloukarov


Final Domain Collection Research

Summary of Findings

Available Methods in jao-py

The JaoPublicationToolPandasClient class provides three domain query methods:

  1. query_final_domain(mtu, presolved, cne, co, use_mirror) (Line 233)

    • Final Computation - Final FB parameters following LTN
    • Published: 10:30 D-1
    • Most complete dataset (recommended for Phase 2)
  2. query_prefinal_domain(mtu, presolved, cne, co, use_mirror) (Line 248)

    • Pre-Final (EarlyPub) - Pre-final FB parameters before LTN
    • Published: 08:00 D-1
    • Earlier publication time, but before LTN application
  3. query_initial_domain(mtu, presolved, cne, co) (Line 264)

    • Initial Computation (Virgin Domain) - Initial flow-based parameters
    • Published: Early in D-1
    • Before any adjustments

Method Parameters

def query_final_domain(
    mtu: pd.Timestamp,      # Market Time Unit (1 hour, timezone-aware)
    presolved: bool = None, # Filter: True=binding, False=non-binding, None=ALL
    cne: str = None,        # CNEC name keyword filter (NOT EIC-based!)
    co: str = None,         # Contingency keyword filter
    use_mirror: bool = False # Use mirror.flowbased.eu for faster bulk download
) -> pd.DataFrame

Key Findings

  1. DENSE Data Acquisition:

    • Set presolved=None to get ALL CNECs (binding + non-binding)
    • This provides the DENSE format needed for Phase 2 feature engineering
  2. Filtering Limitations:

    • ❌ NO EIC-based filtering on server side
    • ✅ Only keyword-based filters (cne, co) available
    • Solution: Download all CNECs, filter locally by EIC codes
  3. Query Granularity:

    • Method queries 1 hour at a time (mtu = Market Time Unit)
    • For 24 months: Need 17,520 API calls (730 days × 24 hours, 1 call per hour)
    • Alternative: Use use_mirror=True for whole-day downloads
  4. Mirror Option (Recommended for bulk collection):

    • URL: https://mirror.flowbased.eu/dacc/final_domain/YYYY-MM-DD
    • Returns full day (24 hours) as CSV in ZIP file
    • Much faster than hourly API calls
    • Set use_mirror=True OR set env var JAO_USE_MIRROR=1
  5. Data Structure (from parse_final_domain()):

    • Returns pandas DataFrame with columns:
      • Identifiers: mtu (timestamp), tso, cnec_name, cnec_eic, direction
      • Contingency: contingency_* fields (nested structure flattened)
      • Presolved field: Indicates if CNEC is binding (True) or redundant (False)
      • RAM breakdown: ram, fmax, imax, frm, fuaf, amr, lta_margin, etc.
      • PTDFs: ptdf_AT, ptdf_BE, ..., ptdf_SK (12 Core zones)
    • Timestamps converted to Europe/Amsterdam timezone
    • snake_case column names (except PTDFs)
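
The request arithmetic in findings 3 and 4 can be checked with a short sketch that uses only the standard library. It expands a date range into one mirror URL per day, following the URL pattern quoted above, and compares that count with the number of hourly calls. The helper names (`daily_mirror_urls`, `request_counts`) are hypothetical and not part of jao-py, the 730-day window is an assumption standing in for "24 months", and the hourly figure assumes 24 hours per day.

```python
from datetime import date, timedelta

# URL pattern as quoted in this document; YYYY-MM-DD is the delivery day.
MIRROR_PATTERN = "https://mirror.flowbased.eu/dacc/final_domain/{day}"


def daily_mirror_urls(start: date, end: date) -> list[str]:
    """Return one mirror URL per calendar day, both endpoints included."""
    n_days = (end - start).days + 1
    return [
        MIRROR_PATTERN.format(day=(start + timedelta(days=i)).isoformat())
        for i in range(n_days)
    ]


def request_counts(start: date, end: date) -> dict[str, int]:
    """Compare request counts: one per day (mirror) vs one per hour (API)."""
    n_days = len(daily_mirror_urls(start, end))
    return {"mirror": n_days, "hourly_api": n_days * 24}


if __name__ == "__main__":
    start = date(2023, 10, 1)            # assumed start of the 24-month window
    end = start + timedelta(days=729)    # 730 days in total
    print(request_counts(start, end))
    print(daily_mirror_urls(start, end)[0])
```

The sketch only builds URLs and counts requests; it does not download anything, so it says nothing about what the mirror actually serves for a given day.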

Recommended Implementation for Phase 2

Option A: Mirror-based (FASTEST):

from pathlib import Path

import pandas as pd
import polars as pl
from jao import JaoPublicationToolPandasClient


def collect_final_domain_sample(
    start_date: str,
    end_date: str,
    target_cnec_eics: list[str],  # 200 EIC codes from Phase 1
    output_path: Path
) -> pl.DataFrame:
    """Collect DENSE CNEC data for specific CNECs using mirror."""

    # jao-py client class described above; the mirror is requested per call below
    client = JaoPublicationToolPandasClient()

    all_data = []
    # One iteration per local calendar day; the range is already timezone-aware
    for day in pd.date_range(start_date, end_date, freq='D', tz='Europe/Amsterdam'):
        # Query full day (all CNECs) via mirror
        df_day = client.query_final_domain(
            mtu=day,
            presolved=None,  # ALL CNECs (DENSE!)
            use_mirror=True   # Fast bulk download
        )

        # Filter to target CNECs only (no server-side EIC filter exists)
        df_filtered = df_day[df_day['cnec_eic'].isin(target_cnec_eics)]
        all_data.append(df_filtered)

    # Combine and save
    df_full = pd.concat(all_data, ignore_index=True)
    pl_df = pl.from_pandas(df_full)
    pl_df.write_parquet(output_path)

    return pl_df

Option B: Hourly API calls (SLOWER, but more granular):

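from pathlib import Path

import pandas as pd
import polars as pl
from jao import JaoPublicationToolPandasClient
from jao.exceptions import NoMatchingDataError


def collect_final_domain_hourly(
    start_date: str,
    end_date: str,
    target_cnec_eics: list[str],
    output_path: Path
) -> pl.DataFrame:
    """Collect DENSE CNEC data hour-by-hour."""

    client = JaoPublicationToolPandasClient()

    all_data = []
    # 'h' replaces the deprecated 'H' alias (pandas 2.2+). Building the range
    # timezone-aware lets pandas handle the DST transitions.
    # Note: end_date is the last MTU queried; a bare date such as '2025-09-30'
    # stops at 00:00, so pass '2025-09-30 23:00' to cover that whole day.
    for mtu in pd.date_range(start_date, end_date, freq='h', tz='Europe/Amsterdam'):
        try:
            df_hour = client.query_final_domain(
                mtu=mtu,
                presolved=None  # ALL CNECs
            )
            df_filtered = df_hour[df_hour['cnec_eic'].isin(target_cnec_eics)]
            all_data.append(df_filtered)
        except NoMatchingDataError:
            continue  # Hour may have no data

    df_full = pd.concat(all_data, ignore_index=True)
    pl_df = pl.from_pandas(df_full)
    pl_df.write_parquet(output_path)

    return pl_df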

Data Volume Estimates

Full Download (all ~20K CNECs):

  • 20,000 CNECs × 17,520 hours = 350M records
  • ~27 columns × 8 bytes/value = ~75 GB uncompressed
  • Parquet compression: ~10-20 GB

Filtered (200 target CNECs):

  • 200 CNECs × 17,520 hours = 3.5M records
  • ~27 columns × 8 bytes/value = ~750 MB uncompressed
  • Parquet compression: ~100-150 MB
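
The uncompressed figures above follow from rows × columns × bytes per value. The sketch below reproduces that arithmetic. The function name `estimate_volume` is hypothetical, and the flat 8 bytes per value is the same simplifying assumption the estimates use: string columns such as `tso` or `cnec_name` will not fit it exactly, so the results are order-of-magnitude figures only.

```python
def estimate_volume(
    n_cnecs: int,
    n_hours: int,
    n_columns: int = 27,
    bytes_per_value: int = 8,
) -> tuple[int, int]:
    """Return (row count, uncompressed size in bytes) for a DENSE table."""
    rows = n_cnecs * n_hours                       # one row per CNEC per hour
    size_bytes = rows * n_columns * bytes_per_value
    return rows, size_bytes


if __name__ == "__main__":
    for label, n_cnecs in [("full", 20_000), ("filtered", 200)]:
        rows, size_bytes = estimate_volume(n_cnecs, n_hours=17_520)
        print(f"{label}: {rows:,} rows, {size_bytes / 1e9:.2f} GB uncompressed")
```

The Parquet figures are not derived here, since the compression ratio depends on the data itself.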

Implementation Strategy

  1. Phase 1 complete: Identify top 200 CNECs from SPARSE data
  2. Extract EIC codes: Save to data/processed/critical_cnecs_eic_codes.csv
  3. Test on about 1 week (8 days, both end dates included): Validate DENSE collection with mirror
    # Test: 2025-09-23 to 2025-09-30 (8 days)
    # Expected: 200 CNECs × 192 hours = 38,400 records
    
  4. Collect 24 months: Using mirror for speed
  5. Validate DENSE structure:
    unique_cnecs = df['cnec_eic'].n_unique()
    unique_hours = df['mtu'].n_unique()
    expected = unique_cnecs * unique_hours
    actual = len(df)
    assert actual == expected, f"Not DENSE! {actual} != {expected}"
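
The row-count assertion in step 5 can pass by coincidence: one missing (CNEC, hour) pair and one duplicated pair cancel out in `len(df)`. A slightly stricter check, sketched here with the standard library only, lists the missing and duplicated pairs explicitly. The function names and the placeholder identifiers in the example are hypothetical. In practice the pairs would come from the `cnec_eic` and `mtu` columns of the collected data.

```python
from collections import Counter
from itertools import product


def missing_pairs(rows: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (cnec_eic, mtu) combinations absent from an otherwise dense grid."""
    seen = set(rows)
    cnecs = {cnec for cnec, _ in seen}
    hours = {mtu for _, mtu in seen}
    return sorted(set(product(cnecs, hours)) - seen)


def duplicate_pairs(rows: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (cnec_eic, mtu) combinations that occur more than once."""
    counts = Counter(rows)
    return sorted(pair for pair, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Placeholder identifiers, not real EIC codes or timestamps.
    rows = [
        ("CNEC_A", "2025-09-23T00:00"),
        ("CNEC_A", "2025-09-23T01:00"),
        ("CNEC_B", "2025-09-23T00:00"),
        ("CNEC_B", "2025-09-23T00:00"),  # duplicate hides the missing 01:00 row
    ]
    print("rows:", len(rows), "expected:", 2 * 2)  # counts match anyway
    print("missing:", missing_pairs(rows))
    print("duplicates:", duplicate_pairs(rows))
```

Like the original assertion, this check cannot detect an hour that is missing for every CNEC, because the grid is built from the hours that are present. Comparing `unique_hours` against the expected number of hours for the date range covers that case.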
    

Advantages of Mirror Method

  • ✅ Faster: 1 request/day vs 24 requests/day
  • ✅ Rate limit friendly: 730 requests vs 17,520 requests
  • ✅ More reliable: Less chance of timeout/connection errors
  • ✅ Complete days: Each download covers the whole delivery day (24 hours, or 23/25 on DST-change days)
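
One caveat when checking for complete days: in Europe/Amsterdam local time a calendar day has 23 hours on the spring DST change and 25 on the autumn one, so the expected number of hourly MTUs is not always 24 × days. The sketch below computes the expected count with the standard library. The function names are hypothetical, and it assumes that MTUs are hourly and that days are delimited by local midnight.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

LOCAL_TZ = ZoneInfo("Europe/Amsterdam")


def hours_in_local_day(day: date, tz: ZoneInfo = LOCAL_TZ) -> int:
    """Number of hourly MTUs between local midnight and the next local midnight."""
    next_day = day + timedelta(days=1)
    start = datetime(day.year, day.month, day.day, tzinfo=tz)
    end = datetime(next_day.year, next_day.month, next_day.day, tzinfo=tz)
    # Convert to UTC before subtracting: datetimes that share a tzinfo are
    # subtracted as wall-clock times, which would always give 24 hours.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return int(elapsed.total_seconds() // 3600)


def expected_hours(start: date, end: date) -> int:
    """Total hourly MTUs over a date range, both endpoints included."""
    n_days = (end - start).days + 1
    return sum(hours_in_local_day(start + timedelta(days=i)) for i in range(n_days))


if __name__ == "__main__":
    print(hours_in_local_day(date(2025, 3, 30)))    # spring DST change
    print(hours_in_local_day(date(2025, 10, 26)))   # autumn DST change
    print(expected_hours(date(2025, 9, 23), date(2025, 9, 30)))  # test window
```

The test window used in the implementation strategy contains no DST change, so its 192-hour expectation is unaffected. A full 24-month window will contain several 23-hour and 25-hour days.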

Next Steps

  1. Add collect_final_domain_dense() method to collect_jao.py
  2. Test on 1-week sample with target EIC codes
  3. Validate DENSE structure and data quality
  4. Run 24-month collection after Phase 1 complete
  5. Use DENSE data for Tier 1 & Tier 2 feature engineering

Research completed: 2025-11-05
jao-py version: 0.6.2
Source: C:\Users\evgue\projects\fbmc_chronos2.venv\Lib\site-packages\jao\jao.py