Zen0 committed
Commit · 62d78bf
Parent(s): 2338c46
Improve UX: Move evaluation settings to top of page
Better workflow:
1. Settings at top (no scrolling needed)
2. Big Run button immediately below
3. Model selection below that
4. Results on right side
Benefits:
- Faster to configure and run evaluations
- Settings visible without scrolling
- Cleaner separation of concerns
- More intuitive workflow
Also:
- Condensed GPU warning (less verbose)
- Added persistent results note to model selection
- Better visual hierarchy with divider
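
The persistent results note mentioned above asks users to run 1-2 models at a time and let each run merge into the leaderboard. The merge code itself is not part of this commit, so the following is only a minimal sketch of one way such a merge could behave. `merge_results` and `ranked` are hypothetical names, and the rule that a re-run model overwrites its earlier score is an assumption. The scores are the ones quoted in the app's recommended-models note.

```python
def merge_results(leaderboard, new_results):
    """Return a new dict holding the stored scores plus the latest run.

    Assumption: if a model appears in both, the latest score wins.
    """
    merged = dict(leaderboard)   # copy so the stored leaderboard is not mutated
    merged.update(new_results)   # a re-run model overwrites its old score
    return merged


def ranked(leaderboard):
    """List (model, accuracy) pairs from highest to lowest accuracy."""
    return sorted(leaderboard.items(), key=lambda item: item[1], reverse=True)


# First run evaluated one model, a later run evaluated two more.
stored = {"Qwen2.5-3B": 55.6}
latest = {"DeepSeek": 55.0, "TinyLlama": 33.0}
board = merge_results(stored, latest)
```

Returning a new dict instead of updating in place keeps the previous leaderboard available if a run fails partway through.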
app.py
CHANGED
@@ -637,10 +637,28 @@ with gr.Blocks(title="AusCyberBench Evaluation Dashboard", theme=gr.themes.Soft(
     ✅ **Recommended models** have been tested: Qwen2.5-3B (55.6%), DeepSeek (55%), TinyLlama (33%)
     """)
 
+    # Settings section at top for better UX
+    gr.Markdown("## ⚙️ Evaluation Settings")
+    with gr.Row():
+        num_samples = gr.Slider(10, 500, value=10, step=10, label="Number of Tasks (10 recommended)")
+        use_4bit = gr.Checkbox(label="Use 4-bit Quantisation", value=True)
+    with gr.Row():
+        temperature = gr.Slider(0.1, 1.0, value=0.7, step=0.1, label="Temperature")
+        max_tokens = gr.Slider(8, 256, value=32, step=8, label="Max New Tokens")
+
+    run_btn = gr.Button("Run Evaluation", variant="primary", size="lg")
+
+    gr.Markdown("---")
+
     with gr.Row():
         with gr.Column(scale=1):
             gr.Markdown("### Model Selection")
 
+            gr.Markdown("""
+            **💾 Persistent Results:** Run 1-2 models at a time to avoid GPU timeouts.
+            Results merge with the leaderboard automatically!
+            """)
+
             # Quick selection buttons
             with gr.Row():
                 btn_recommended = gr.Button("✅ Recommended (6)", size="sm", variant="primary")
@@ -661,26 +679,14 @@ with gr.Blocks(title="AusCyberBench Evaluation Dashboard", theme=gr.themes.Soft(
                     cb = gr.Checkbox(label=f"{short_name}", value=False)
                     model_checkboxes.append((cb, model))
 
-            gr.Markdown("### ⚡ GPU Limits
+            gr.Markdown("### ⚡ GPU Limits")
             gr.Markdown("""
-
-
-
-            -
-            - ⚠️ **3-5 models** with 10 tasks: May timeout midway
-            - ❌ **6+ models** or 50+ tasks: Will likely timeout
-
-            For testing multiple models, run evaluations separately or use fewer tasks.
+            **Free tier: 60-second limit**
+            - ✅ 1-2 models: Safe
+            - ⚠️ 3-5 models: May timeout
+            - ❌ 6+ models: Will timeout
             """)
 
-            gr.Markdown("### ⚙️ Settings")
-            num_samples = gr.Slider(10, 500, value=10, step=10, label="Number of Tasks (10 recommended for multiple models)")
-            use_4bit = gr.Checkbox(label="Use 4-bit Quantisation", value=True)
-            temperature = gr.Slider(0.1, 1.0, value=0.7, step=0.1, label="Temperature")
-            max_tokens = gr.Slider(8, 256, value=32, step=8, label="Max New Tokens")
-
-            run_btn = gr.Button("Run Evaluation", variant="primary", size="lg")
-
         with gr.Column(scale=2):
             gr.Markdown("### Persistent Leaderboard")
             gr.Markdown("""
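
The tiers in the condensed GPU Limits note (1-2 models safe, 3-5 may time out, 6+ will time out on the 60-second free tier) could also be checked in code, for example to warn before a run starts. This is a sketch only: `timeout_risk` is a hypothetical helper that does not appear in app.py, and the thresholds are copied from the note, not measured.

```python
def timeout_risk(num_models: int) -> str:
    """Map a selected-model count to the risk tier from the GPU Limits note."""
    if num_models < 1:
        raise ValueError("select at least one model")
    if num_models <= 2:
        return "safe"          # 1-2 models
    if num_models <= 5:
        return "may timeout"   # 3-5 models
    return "will timeout"      # 6+ models


# Example: risk tier for each selection size the note mentions.
tiers = {n: timeout_risk(n) for n in (1, 2, 3, 5, 6)}
```

A check like this could run when the Run Evaluation button is clicked, using the number of ticked model checkboxes as `num_models`.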