Update index.html
- index.html +60 -1
index.html
CHANGED
|
@@ -285,7 +285,7 @@
|
|
| 285 |
<strong>Note:</strong> no extra spaces, single CSV, no archives.
|
| 286 |
</p>
|
| 287 |
|
| 288 |
-
<h2>Evaluation Criteria</h2>
|
| 289 |
<p>
|
| 290 |
The Leaderboard is based on phoneme-level <strong>F1-score</strong>.
|
| 291 |
We use a hierarchical evaluation (detection + diagnostic) per <a href="https://arxiv.org/pdf/2310.13974" target="_blank">MDD Overview</a>.
|
|
@@ -315,8 +315,67 @@
|
|
| 315 |
<li>Recall = TR/(TR+FA)</li>
|
| 316 |
<li>F1 = 2·P·R/(P+R)</li>
|
| 317 |
</ul>
|
| 318 |
</p>
|
| 319 |
|
| 320 |
<h2>Suggested Research Directions</h2>
|
| 321 |
<ol>
|
| 322 |
<li>
|
|
|
|
| 285 |
<strong>Note:</strong> no extra spaces, single CSV, no archives.
|
| 286 |
</p>
|
| 287 |
|
| 288 |
+
<!-- <h2>Evaluation Criteria</h2>
|
| 289 |
<p>
|
| 290 |
The Leaderboard is based on phoneme-level <strong>F1-score</strong>.
|
| 291 |
We use a hierarchical evaluation (detection + diagnostic) per <a href="https://arxiv.org/pdf/2310.13974" target="_blank">MDD Overview</a>.
|
|
|
|
| 315 |
<li>Recall = TR/(TR+FA)</li>
|
| 316 |
<li>F1 = 2·P·R/(P+R)</li>
|
| 317 |
</ul>
|
| 318 |
+
</p> -->
|
| 319 |
+
|
| 320 |
+
<h2>Evaluation Criteria</h2>
|
| 321 |
+
|
| 322 |
+
<div style="background-color: #f0f8ff; border-left: 5px solid #007bff; padding: 15px; margin-bottom: 20px;">
|
| 323 |
+
<h3 style="margin-top: 0; color: #007bff;">🏆 Primary Metric</h3>
|
| 324 |
+
<p style="margin-bottom: 0;">
|
| 325 |
+
The Leaderboard is ranked primarily by the <strong>Phoneme-level F1-score</strong>.
|
| 326 |
+
While other metrics (FRR, FAR, DER) are computed for analysis, <strong>F1</strong> determines the final standing.
|
| 327 |
</p>
|
| 328 |
+
</div>
|
| 329 |
|
| 330 |
+
<p>
|
| 331 |
+
We use a hierarchical evaluation strategy (detection + diagnostic) based on the
|
| 332 |
+
<a href="https://arxiv.org/pdf/2310.13974" target="_blank">MDD Overview</a> framework.
|
| 333 |
+
</p>
|
| 334 |
+
|
| 335 |
+
<h3>1. Input Definitions</h3>
|
| 336 |
+
<ul>
|
| 337 |
+
<li><strong>What is said:</strong> The annotated phoneme sequence.</li>
|
| 338 |
+
<li><strong>What is predicted:</strong> The output from your model.</li>
|
| 339 |
+
<li><strong>What should have been said:</strong> The reference (target) sequence.</li>
|
| 340 |
+
</ul>
|
| 341 |
+
|
| 342 |
+
<h3>2. Confusion Matrix Components</h3>
|
| 343 |
+
<p>From the inputs above, we compute the following counts:</p>
|
| 344 |
+
<table style="width: 100%; border-collapse: collapse; margin-bottom: 20px;">
|
| 345 |
+
<tr style="background-color: #f9f9f9; border-bottom: 1px solid #ddd;">
|
| 346 |
+
<td style="padding: 8px;"><strong>TA (True Accept)</strong></td>
|
| 347 |
+
<td style="padding: 8px;">Correct phonemes properly accepted.</td>
|
| 348 |
+
</tr>
|
| 349 |
+
<tr style="border-bottom: 1px solid #ddd;">
|
| 350 |
+
<td style="padding: 8px;"><strong>TR (True Reject)</strong></td>
|
| 351 |
+
<td style="padding: 8px;">Mispronunciations correctly detected.</td>
|
| 352 |
+
</tr>
|
| 353 |
+
<tr style="background-color: #f9f9f9; border-bottom: 1px solid #ddd;">
|
| 354 |
+
<td style="padding: 8px;"><strong>FR (False Reject)</strong></td>
|
| 355 |
+
<td style="padding: 8px;">Correct phonemes incorrectly flagged as errors.</td>
|
| 356 |
+
</tr>
|
| 357 |
+
<tr>
|
| 358 |
+
<td style="padding: 8px;"><strong>FA (False Accept)</strong></td>
|
| 359 |
+
<td style="padding: 8px;">Mispronunciations missed (labeled as correct).</td>
|
| 360 |
+
</tr>
|
| 361 |
+
</table>
|
| 362 |
+
|
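The four counts in the added table can be illustrated with a minimal Python sketch. It assumes the three sequences from "Input Definitions" have already been aligned position by position to the same length; actual scoring tools usually align with edit distance and handle insertions and deletions, which this sketch does not attempt. The function name `confusion_counts` and its parameter names are hypothetical and do not come from the commit.

```python
def confusion_counts(canonical, annotated, predicted):
    """Count TA/TR/FR/FA over position-aligned phoneme sequences.

    canonical: what should have been said (reference/target)
    annotated: what was actually said (human annotation)
    predicted: what the model output
    Assumption: all three are pre-aligned and of equal length.
    """
    if not (len(canonical) == len(annotated) == len(predicted)):
        raise ValueError("sequences must be aligned to the same length")

    counts = {"TA": 0, "TR": 0, "FR": 0, "FA": 0}
    for target, said, pred in zip(canonical, annotated, predicted):
        spoken_correctly = said == target   # speaker produced the target phoneme
        accepted = pred == target           # model output matches the target

        if spoken_correctly and accepted:
            counts["TA"] += 1   # correct phoneme, accepted
        elif spoken_correctly and not accepted:
            counts["FR"] += 1   # correct phoneme, flagged as an error
        elif not spoken_correctly and accepted:
            counts["FA"] += 1   # mispronunciation missed
        else:
            counts["TR"] += 1   # mispronunciation detected
    return counts


# Toy example with made-up phoneme labels: one instance of each outcome.
example = confusion_counts(
    canonical=["k", "ae", "t", "s"],
    annotated=["k", "eh", "t", "z"],
    predicted=["k", "eh", "d", "s"],
)
```

In the toy example, position 0 is a true accept, position 1 a true reject, position 2 a false reject, and position 3 a false accept.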
| 363 |
+
<h3>3. Calculated Metrics</h3>
|
| 364 |
+
|
| 365 |
+
<h4>Detection Metrics (Leaderboard Ranking)</h4>
|
| 366 |
+
<ul>
|
| 367 |
+
<li><strong>Precision:</strong> TR / (TR + FR)</li>
|
| 368 |
+
<li><strong>Recall:</strong> TR / (TR + FA)</li>
|
| 369 |
+
<li><strong>F1-Score:</strong> 2 · (Precision · Recall) / (Precision + Recall)</li>
|
| 370 |
+
</ul>
|
| 371 |
+
|
| 372 |
+
<h4>Diagnostic Rates (Auxiliary)</h4>
|
| 373 |
+
<ul>
|
| 374 |
+
<li><strong>FRR (False Reject Rate):</strong> FR / (TA + FR)</li>
|
| 375 |
+
<li><strong>FAR (False Accept Rate):</strong> FA / (FA + TR)</li>
|
| 376 |
+
<li><strong>DER (Diagnostic Error Rate):</strong> DE / (CD + DE)</li>
|
| 377 |
+
</ul>
|
| 378 |
+
|
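The detection and diagnostic formulas added in this hunk can be checked with a short Python sketch. The diff does not define `CD` and `DE`; the sketch assumes they are the correct-diagnosis and diagnostic-error counts among the true rejects, as is common in the MDD literature. Returning 0.0 on a zero denominator is also an assumption, not something the page specifies. The names `safe_div` and `mdd_metrics` are hypothetical.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (an assumption)."""
    return numerator / denominator if denominator else 0.0


def mdd_metrics(ta, tr, fr, fa, cd=None, de=None):
    """Compute the metrics listed under 'Calculated Metrics' from raw counts."""
    precision = safe_div(tr, tr + fr)
    recall = safe_div(tr, tr + fa)
    metrics = {
        "precision": precision,
        "recall": recall,
        # F1 is the ranking metric per the 'Primary Metric' box.
        "f1": safe_div(2 * precision * recall, precision + recall),
        "FRR": safe_div(fr, ta + fr),
        "FAR": safe_div(fa, fa + tr),
    }
    # DER needs the diagnostic split, which is assumed here rather than
    # defined in the diff, so it is only reported when both counts are given.
    if cd is not None and de is not None:
        metrics["DER"] = safe_div(de, cd + de)
    return metrics


# Made-up counts for illustration only.
result = mdd_metrics(ta=80, tr=12, fr=4, fa=4, cd=9, de=3)
```

With these made-up counts, precision and recall are both 12/16, FAR is 4/16, FRR is 4/84, and DER is 3/12.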
| 379 |
<h2>Suggested Research Directions</h2>
|
| 380 |
<ol>
|
| 381 |
<li>
|