Space: IqraEval/IqraEval_Interspeech_26

01Yassine committed about 11 hours ago

Commit 9f15275 (verified) · 1 parent: 8ad54b2

Update index.html

Files changed (1)

index.html +60 -1

@@ -285,7 +285,7 @@
       <strong>Note:</strong> no extra spaces, single CSV, no archives.
     </p>
-    <h2>Evaluation Criteria</h2>
+    <!-- <h2>Evaluation Criteria</h2>
     <p>
       The Leaderboard is based on phoneme-level <strong>F1-score</strong>.
       We use a hierarchical evaluation (detection + diagnostic) per <a href="https://arxiv.org/pdf/2310.13974" target="_blank">MDD Overview</a>.
@@ -315,8 +315,67 @@
         <li>Recall = TR/(TR+FA)</li>
         <li>F1 = 2·P·R/(P+R)</li>
       </ul>
+    </p> -->
+    <h2>Evaluation Criteria</h2>
+<div style="background-color: #f0f8ff; border-left: 5px solid #007bff; padding: 15px; margin-bottom: 20px;">
+    <h3 style="margin-top: 0; color: #007bff;">🏆 Primary Metric</h3>
+    <p style="margin-bottom: 0;">
+        The Leaderboard is ranked primarily by the <strong>Phoneme-level F1-score</strong>.
+        While other metrics (FRR, FAR, DER) are computed for analysis, <strong>F1</strong> determines the final standing.
     </p>
+</div>
+<p>
+    We use a hierarchical evaluation strategy (detection + diagnostic) based on the
+    <a href="https://arxiv.org/pdf/2310.13974" target="_blank">MDD Overview</a> framework.
+</p>
+<h3>1. Input Definitions</h3>
+<ul>
+    <li><strong>What is said:</strong> The annotated phoneme sequence.</li>
+    <li><strong>What is predicted:</strong> The output from your model.</li>
+    <li><strong>What should have been said:</strong> The reference (target) sequence.</li>
+</ul>
+<h3>2. Confusion Matrix Components</h3>
+<p>From the inputs above, we compute the following counts:</p>
+<table style="width: 100%; border-collapse: collapse; margin-bottom: 20px;">
+    <tr style="background-color: #f9f9f9; border-bottom: 1px solid #ddd;">
+        <td style="padding: 8px;"><strong>TA (True Accept)</strong></td>
+        <td style="padding: 8px;">Correct phonemes properly accepted.</td>
+    </tr>
+    <tr style="border-bottom: 1px solid #ddd;">
+        <td style="padding: 8px;"><strong>TR (True Reject)</strong></td>
+        <td style="padding: 8px;">Mispronunciations correctly detected.</td>
+    </tr>
+    <tr style="background-color: #f9f9f9; border-bottom: 1px solid #ddd;">
+        <td style="padding: 8px;"><strong>FR (False Reject)</strong></td>
+        <td style="padding: 8px;">Correct phonemes incorrectly flagged as errors.</td>
+    </tr>
+    <tr>
+        <td style="padding: 8px;"><strong>FA (False Accept)</strong></td>
+        <td style="padding: 8px;">Mispronunciations missed (labeled as correct).</td>
+    </tr>
+</table>
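
Reviewer sketch (not part of the commit): the four counts in the table above can be illustrated in Python as follows. This is a minimal sketch under stated assumptions, not the challenge's scoring code. The function name `count_outcomes` is hypothetical, the three sequences are assumed to be already aligned position by position (equal length, no insertions or deletions), and the CD/DE split of TR follows the usual MDD convention of checking whether the predicted phoneme matches the annotated one.

```python
def count_outcomes(canonical, annotated, predicted):
    """Classify each aligned position as TA, TR, FR, or FA.

    canonical: what should have been said (reference/target phonemes)
    annotated: what was actually said (human annotation)
    predicted: what the model output
    Assumption: all three lists are pre-aligned and have equal length.
    """
    if not (len(canonical) == len(annotated) == len(predicted)):
        raise ValueError("sequences must be pre-aligned to equal length")

    counts = {"TA": 0, "TR": 0, "FR": 0, "FA": 0, "CD": 0, "DE": 0}
    for ref, said, pred in zip(canonical, annotated, predicted):
        if said == ref:
            # The speaker pronounced the phoneme correctly.
            if pred == ref:
                counts["TA"] += 1  # correctly accepted
            else:
                counts["FR"] += 1  # correct phoneme flagged as an error
        elif pred == ref:
            counts["FA"] += 1      # mispronunciation missed
        else:
            counts["TR"] += 1      # mispronunciation detected
            # Diagnosis: did the model identify the phoneme actually said?
            if pred == said:
                counts["CD"] += 1
            else:
                counts["DE"] += 1
    return counts
```

For example, with `canonical = ["b", "a", "t", "i"]`, `annotated = ["b", "a", "d", "i"]`, and `predicted = ["b", "e", "d", "i"]`, the sketch yields TA = 2, FR = 1, TR = 1 (with CD = 1), and FA = 0.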
+<h3>3. Calculated Metrics</h3>
+<h4>Detection Metrics (Leaderboard Ranking)</h4>
+<ul>
+    <li><strong>Precision:</strong> TR / (TR + FR)</li>
+    <li><strong>Recall:</strong> TR / (TR + FA)</li>
+    <li><strong>F1-Score:</strong> 2 · (Precision · Recall) / (Precision + Recall)</li>
+</ul>
+<h4>Diagnostic Rates (Auxiliary)</h4>
+<ul>
+    <li><strong>FRR (False Reject Rate):</strong> FR / (TA + FR)</li>
+    <li><strong>FAR (False Accept Rate):</strong> FA / (FA + TR)</li>
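+    <li><strong>DER (Diagnostic Error Rate):</strong> DE / (CD + DE), where CD (Correct Diagnosis) and DE (Diagnostic Error) split the TR cases by whether the predicted phoneme matches the annotated one.</li>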
+</ul>
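
Reviewer sketch (not part of the commit): the formulas listed above translate directly into a few lines of Python. The function names `safe_div` and `mdd_metrics` are hypothetical, and returning 0.0 when a denominator is zero is an assumption made here for illustration; the official scorer may handle that case differently.

```python
def safe_div(num, den):
    """Divide, returning 0.0 for a zero denominator (an assumption of this sketch)."""
    return num / den if den else 0.0


def mdd_metrics(ta, tr, fr, fa, cd=0, de=0):
    """Compute the detection and diagnostic metrics from raw counts."""
    precision = safe_div(tr, tr + fr)
    recall = safe_div(tr, tr + fa)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,                       # leaderboard ranking metric
        "frr": safe_div(fr, ta + fr),   # False Reject Rate
        "far": safe_div(fa, fa + tr),   # False Accept Rate
        "der": safe_div(de, cd + de),   # Diagnostic Error Rate
    }
```

With made-up counts TA = 80, TR = 15, FR = 5, FA = 5, CD = 12, DE = 3, this gives Precision = Recall = F1 = 0.75, FAR = 0.25, and DER = 0.2.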
     <h2>Suggested Research Directions</h2>
     <ol>
       <li>