Spaces: Zen0/auscyberbench-evaluator

Zen0 committed on Oct 24

Commit 62d78bf
1 Parent(s): 2338c46

Improve UX: Move evaluation settings to top of page

Better workflow:
1. Settings at top (no scrolling needed)
2. Big Run button immediately below
3. Model selection below that
4. Results on right side
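
The settings in step 1 are four controls whose defaults and bounds appear in the diff. As a rough illustration, they could be summarised as a small validated config object. This is only a sketch: the `EvalSettings` dataclass is hypothetical and is not defined in app.py, which reads the values directly from the Gradio widgets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalSettings:
    """Hypothetical container mirroring the four controls in app.py."""

    num_samples: int = 10     # gr.Slider(10, 500, value=10, step=10)
    use_4bit: bool = True     # gr.Checkbox, ticked by default
    temperature: float = 0.7  # gr.Slider(0.1, 1.0, value=0.7, step=0.1)
    max_tokens: int = 32      # gr.Slider(8, 256, value=32, step=8)

    def __post_init__(self):
        # Enforce the same ranges and step sizes the sliders allow.
        if not (10 <= self.num_samples <= 500 and self.num_samples % 10 == 0):
            raise ValueError("num_samples must be a multiple of 10 in [10, 500]")
        if not 0.1 <= self.temperature <= 1.0:
            raise ValueError("temperature must be in [0.1, 1.0]")
        if not (8 <= self.max_tokens <= 256 and self.max_tokens % 8 == 0):
            raise ValueError("max_tokens must be a multiple of 8 in [8, 256]")


if __name__ == "__main__":
    # Defaults match the slider defaults shown in the diff.
    print(EvalSettings())
```

Grouping the values this way would keep the bounds in one place if the evaluation function were ever called outside the UI.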

Benefits:
- Faster to configure and run evaluations
- Settings visible without scrolling
- Cleaner separation of concerns
- More intuitive workflow

Also:
- Condensed GPU warning (less verbose)
- Added persistent results note to model selection
- Better visual hierarchy with divider
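
The condensed GPU warning sorts a selection into three tiers (1-2, 3-5, and 6+ models), and the new note advises running 1-2 models at a time. A minimal sketch of both ideas follows. The helpers `timeout_risk` and `batches` are hypothetical names that do not exist in app.py, and the model names in the usage lines are placeholders.

```python
def timeout_risk(n_models: int) -> str:
    """Map a model count to the tier named in the condensed GPU warning."""
    if n_models < 1:
        raise ValueError("select at least one model")
    if n_models <= 2:
        return "safe"
    if n_models <= 5:
        return "may time out"
    return "will time out"


def batches(models, size=2):
    """Split a selection into runs of at most `size` models each."""
    return [models[i:i + size] for i in range(0, len(models), size)]


if __name__ == "__main__":
    selected = ["model-a", "model-b", "model-c"]  # placeholder names
    print(timeout_risk(len(selected)))
    for run in batches(selected):
        print(run, timeout_risk(len(run)))
```

Running each batch as a separate evaluation would keep every run in the "safe" tier, relying on the leaderboard merge described in the commit to combine the results.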

Files changed (1)

app.py +23 -17

@@ -637,10 +637,28 @@ with gr.Blocks(title="AusCyberBench Evaluation Dashboard", theme=gr.themes.Soft(
     ✅ **Recommended models** have been tested: Qwen2.5-3B (55.6%), DeepSeek (55%), TinyLlama (33%)
     """)
     with gr.Row():
         with gr.Column(scale=1):
             gr.Markdown("### 📋 Model Selection")
             # Quick selection buttons
             with gr.Row():
                 btn_recommended = gr.Button("✅ Recommended (6)", size="sm", variant="primary")
@@ -661,26 +679,14 @@ with gr.Blocks(title="AusCyberBench Evaluation Dashboard", theme=gr.themes.Soft(
                     cb = gr.Checkbox(label=f"{short_name}", value=False)
                     model_checkboxes.append((cb, model))
-            gr.Markdown("### ⚡ GPU Limits (Free Tier)")
             gr.Markdown("""
-            **⚠️ Important:** ZeroGPU free tier has a **60-second limit per session**.
-            **Recommendations:**
-            - ✅ **1-2 models** with 10-20 tasks: Safe, will complete
-            - ⚠️ **3-5 models** with 10 tasks: May timeout midway
-            - ❌ **6+ models** or 50+ tasks: Will likely timeout
-            For testing multiple models, run evaluations separately or use fewer tasks.
             """)
-            gr.Markdown("### ⚙️ Settings")
-            num_samples = gr.Slider(10, 500, value=10, step=10, label="Number of Tasks (10 recommended for multiple models)")
-            use_4bit = gr.Checkbox(label="Use 4-bit Quantisation", value=True)
-            temperature = gr.Slider(0.1, 1.0, value=0.7, step=0.1, label="Temperature")
-            max_tokens = gr.Slider(8, 256, value=32, step=8, label="Max New Tokens")
-            run_btn = gr.Button("🚀 Run Evaluation", variant="primary", size="lg")
         with gr.Column(scale=2):
             gr.Markdown("### 📊 Persistent Leaderboard")
             gr.Markdown("""

     ✅ **Recommended models** have been tested: Qwen2.5-3B (55.6%), DeepSeek (55%), TinyLlama (33%)
     """)
+    # Settings section at top for better UX
+    gr.Markdown("## ⚙️ Evaluation Settings")
+    with gr.Row():
+        num_samples = gr.Slider(10, 500, value=10, step=10, label="Number of Tasks (10 recommended)")
+        use_4bit = gr.Checkbox(label="Use 4-bit Quantisation", value=True)
+    with gr.Row():
+        temperature = gr.Slider(0.1, 1.0, value=0.7, step=0.1, label="Temperature")
+        max_tokens = gr.Slider(8, 256, value=32, step=8, label="Max New Tokens")
+    run_btn = gr.Button("🚀 Run Evaluation", variant="primary", size="lg")
+    gr.Markdown("---")
     with gr.Row():
         with gr.Column(scale=1):
             gr.Markdown("### 📋 Model Selection")
+            gr.Markdown("""
+            **💾 Persistent Results:** Run 1-2 models at a time to avoid GPU timeouts.
+            Results merge with the leaderboard automatically!
+            """)
             # Quick selection buttons
             with gr.Row():
                 btn_recommended = gr.Button("✅ Recommended (6)", size="sm", variant="primary")
                     cb = gr.Checkbox(label=f"{short_name}", value=False)
                     model_checkboxes.append((cb, model))
+            gr.Markdown("### ⚡ GPU Limits")
             gr.Markdown("""
+            **Free tier: 60-second limit**
+            - ✅ 1-2 models: Safe
+            - ⚠️ 3-5 models: May timeout
+            - ❌ 6+ models: Will timeout
             """)
         with gr.Column(scale=2):
             gr.Markdown("### 📊 Persistent Leaderboard")
             gr.Markdown("""