<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width" />
  <title>IqraEval.2 Challenge Interspeech 2026</title>
  <style>
    :root {
      --navy-blue: #001f4d;
      --coral: #ff6f61;
      --light-gray: #f5f7fa;
      --text-dark: #222;
    }
    body {
      font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
      background-color: var(--light-gray);
      color: var(--text-dark);
      margin: 20px;
      line-height: 1.6;
    }
    h1, h2, h3 {
      color: var(--navy-blue);
      font-weight: 700;
      margin-top: 1.2em;
    }
    h1 {
      text-align: center;
      font-size: 2.8rem;
      margin-bottom: 0.3em;
    }
    h2 {
      border-bottom: 3px solid var(--coral);
      padding-bottom: 0.3em;
    }
    h3 {
      color: var(--coral);
      margin-top: 1em;
    }
    p, ul, pre, ol {
      max-width: 900px;
      margin: 0.8em auto;
    }
    ul, ol { padding-left: 1.2em; }
    ul li, ol li { margin: 0.4em 0; }
    code {
      background-color: #eef4f8;
      color: var(--navy-blue);
      padding: 2px 6px;
      border-radius: 4px;
      font-family: Consolas, monospace;
      font-size: 0.9em;
    }
    pre {
      background-color: #eef4f8;
      padding: 1em;
      border-radius: 8px;
      overflow-x: auto;
      font-size: 0.95em;
    }
    a {
      color: var(--coral);
      text-decoration: none;
    }
    a:hover { text-decoration: underline; }
    .card {
      max-width: 960px;
      background: white;
      margin: 0 auto 40px;
      padding: 2em 2.5em;
      box-shadow: 0 4px 14px rgba(0,0,0,0.1);
      border-radius: 12px;
    }
    img {
      display: block;
      margin: 20px auto;
      max-width: 100%;
      height: auto;
      border-radius: 8px;
      box-shadow: 0 4px 8px rgba(0,31,77,0.15);
    }
    .centered p {
      text-align: center;
      font-style: italic;
      color: var(--navy-blue);
      margin-top: 0.4em;
    }
    .highlight {
      color: var(--coral);
      font-weight: 700;
    }
    /* nested lists in paragraphs */
    p > ul { margin-top: 0.3em; }
  </style>
</head>
<body>
  <div class="card">
    <h1>IqraEval.2 Challenge Interspeech 2026</h1>
    <img src="IqraEval.png" alt="Interspeech 2026 Challenge Logo" />

    <h2>Overview</h2>
    <p>
      The <strong>IqraEval.2 Challenge at Interspeech 2026</strong> is a shared task aimed at advancing <strong>automatic assessment of Modern Standard Arabic (MSA) pronunciation</strong> by using computational methods to detect and diagnose pronunciation errors. The focus on MSA provides a standardized and well-defined context for evaluating Arabic pronunciation.
    </p>
    <p>
      Participants will develop systems capable of detecting mispronunciations (e.g., substitution, deletion, or insertion of phonemes).
    </p>

    <h2>Timeline</h2>
    <ul>
      <li><strong>1 December 2025</strong>: Registration opens</li>
      <li><strong>15 December 2025</strong>: Release of training data, evaluation set, Arabic phoneme set, and phonemiser</li>
      <li><strong>15 February 2026</strong>: Registration closes; leaderboard frozen</li>
      <li><strong>17 February 2026</strong>: Results announced</li>
      <li><strong>25 February 2026</strong>: Challenge paper submission deadline</li>
    </ul>

    <h2>Task Description: MSA Mispronunciation Detection System</h2>
    <p>
      Design a model to detect and provide detailed feedback on mispronunciations in MSA speech. Users read vowelized sentences; the model predicts the spoken phoneme sequence and flags deviations. Evaluation is on the <strong>MSA-Test</strong> dataset with human‐annotated errors.
    </p>
    <div class="centered">
      <img src="task.png" alt="System Overview" />
      <p>Figure: Overview of the Mispronunciation Detection Workflow</p>
    </div>

    <h3>1. Read the Sentence</h3>
    <p>
      The system shows a <strong>Reference Sentence</strong> together with its <strong>Reference Phoneme Sequence</strong>.
    </p>
    <p><strong>Example:</strong></p>
    <ul>
      <li><strong>Arabic:</strong> يَتَحَدَّثُ النَّاسُ اللُّغَةَ الْعَرَبِيَّةَ</li>
      <li>
        <strong>Phoneme:</strong>
        <code>&lt; y a t a H a d d a v u n n aa s u l l u g h a t a l E a r a b i y y a t a</code>
      </li>
    </ul>

    <h3>2. Save Recording</h3>
    <p>
      The user reads the sentence aloud; the system captures and stores the audio waveform.
    </p>

    <h3>3. Mispronunciation Detection</h3>
    <p>
      The model predicts the spoken phoneme sequence; deviations from the reference indicate mispronunciations.
    </p>
    <p><strong>Example of Mispronunciation:</strong></p>
    <ul>
      <li><strong>Reference:</strong> <code>&lt; y a t a H a d d a v u n n aa s u l l u g h a t a l E a r a b i y y a t a</code></li>
      <li><strong>Predicted:</strong> <code>&lt; y a t a H a d d a <span class="highlight">s</span> u n n aa s u l l u g h a t u l E a r a b i y y a t a</code></li>
    </ul>
    <p>
      Here, <code>v</code> → <code>s</code> (substitution) represents a common pronunciation error.
    </p>
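    <p>
      As a rough illustration (not the official scoring script), the short Python sketch below aligns the reference and predicted phoneme sequences with <code>difflib</code> and lists the spans where they differ:
    </p>
    <pre>from difflib import SequenceMatcher

# Reference and predicted phoneme sequences from the example above.
ref = "&lt; y a t a H a d d a v u n n aa s u l l u g h a t a l E a r a b i y y a t a".split()
hyp = "&lt; y a t a H a d d a s u n n aa s u l l u g h a t u l E a r a b i y y a t a".split()

# Report every span where the two sequences disagree.
for op, i1, i2, j1, j2 in SequenceMatcher(None, ref, hyp).get_opcodes():
    if op != "equal":
        print(op, " ".join(ref[i1:i2]), "->", " ".join(hyp[j1:j2]))
# For the example above this surfaces the v -> s substitution
# (and the later a -> u difference in the predicted sequence).</pre>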

    <h2>Phoneme Set Description</h2>
      <p>
        The phoneme set employed in this work derives from a specialized phonetizer developed specifically for vowelized Modern Standard Arabic (MSA). It encompasses a comprehensive inventory of phonemes designed to capture essential phonetic and prosodic features, including stress, pausing, intonation, emphaticness, and gemination. Notably, gemination—the lengthening of consonant sounds—is explicitly represented by duplicating the consonant symbol (e.g., <code>/b/</code> becomes <code>/bb/</code>). This approach ensures a detailed yet practical representation of speech sounds, which is critical for accurate mispronunciation detection.
      </p>
      <p>
        To phonemize additional datasets or custom text using this standard, we provide the open-source tool at the <a href="https://github.com/Iqra-Eval/MSA_phonetiser">MSA Phonetizer Repository</a>. <strong>Important:</strong> This phonetizer requires the input Arabic text to be <strong>fully diacritized</strong> to ensure accurate phonetic transcription. For further details on the symbols used, please refer to the <a href="https://huggingface.co/spaces/IqraEval/ArabicPhoneme">Phoneme Inventory</a>.
      </p>

    <h2>Training Data Overview</h2>
    <p>
  To ensure robustness, our training strategy utilizes a mix of native speech (pseudo-labeled), synthetic mispronunciations, and real recorded errors.
</p>

<h3>1. Native Speech (Pseudo-Labeled)</h3>
<p>
  <strong>Dataset:</strong> <code>IqraEval/Iqra_train</code><br>
  <strong>Volume:</strong> ~79 hours (Train) + 3.4 hours (Dev)<br>
  This dataset consists of recordings from native MSA speakers. As these speakers are assumed to pronounce the text correctly, this subset is treated as "Golden" data using pseudo-labels.
</p>
<p><strong>Columns:</strong></p>
<ul>
  <li><code>audio</code>: The speech waveform.</li>
  <li><code>sentence</code>: The original raw text.</li>
  <li><code>tashkeel_sentence</code>: Fully diacritized text, generated using an internal SOTA diacritizer (assumed correct).</li>
  <li><code>phoneme_ref</code>: The reference canonical phoneme sequence.</li>
  <li><code>phoneme_mis</code>: The realized phoneme sequence.
    <br><em>Note: Since no errors are present, this is identical to <code>phoneme_ref</code>.</em>
  </li>
</ul>

<h3>2. Synthetic Mispronunciations (TTS)</h3>
<p>
  <strong>Dataset:</strong> <code>IqraEval/Iqra_TTS</code><br>
  <strong>Volume:</strong> ~80 hours<br>
  To compensate for the lack of errors in the native set, we generated a synthetic dataset using various trained TTS systems. Mispronunciations were deliberately introduced into the input text before audio generation.
</p>
<p><strong>Columns:</strong></p>
<ul>
  <li><code>audio</code>: The synthesized waveform.</li>
  <li><code>sentence_ref</code>: The original correct text.</li>
  <li><code>sentence_mis</code>: The text containing deliberate errors.</li>
  <li><code>phoneme_ref</code>: The canonical phoneme sequence of the correct text.</li>
  <li><code>phoneme_aug</code>: The phoneme sequence corresponding to the synthesized mispronunciation.</li>
  <li><code>tashkeel_sentence</code>: The fully diacritized version of the reference text.</li>
</ul>

<h3>3. Real Mispronunciations (Interspeech 2026)</h3>
<p>
  <strong>Dataset:</strong> <code>IqraEval/Iqra_Extra_IS26</code><br>
  <strong>Volume:</strong> ~2 hours<br>
  Moving beyond synthetic data, this subset contains real recordings of human mispronunciations collected specifically for Interspeech 2026.
</p>
<p><strong>Columns:</strong></p>
<ul>
  <li><code>audio</code>: The speech waveform.</li>
  <li><code>sentence</code>: The original text.</li>
  <li><code>phoneme_ref</code>: The target canonical phoneme sequence.</li>
  <li><code>phoneme_mis</code>: The actual realized phonemes containing human errors.</li>
</ul>
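<p>
  For convenience, the three training subsets above can be loaded with the Hugging Face <code>datasets</code> library. The sketch below is minimal, and the split names are assumptions; please check the dataset cards on the Hub for the exact layout.
</p>
<pre>from datasets import load_dataset

# Training subsets (names as listed above; splits are assumed, check the Hub).
native = load_dataset("IqraEval/Iqra_train")       # native speech, pseudo-labeled
tts    = load_dataset("IqraEval/Iqra_TTS")         # synthetic mispronunciations
extra  = load_dataset("IqraEval/Iqra_Extra_IS26")  # real human mispronunciations

print(native)                       # inspect available splits and columns
sample = native["train"][0]         # assumes a "train" split
print(sample["sentence"])
print(sample["phoneme_ref"])
print(sample["phoneme_mis"])        # identical to phoneme_ref for native data</pre>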

<hr>

<h2>Evaluation Dataset</h2>
<p>
  <strong>Dataset:</strong> <code>IqraEval/QuranMB.v2</code><br>
  Currently, only the audio files are released for this evaluation set. It serves as a benchmark for detecting mispronunciations in a distinct domain.
</p>
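<p>
  The evaluation audio can be fetched the same way (a minimal sketch; only the audio is available at this stage, and the split layout is an assumption):
</p>
<pre>from datasets import load_dataset

eval_set = load_dataset("IqraEval/QuranMB.v2")
print(eval_set)   # audio only; reference annotations are not released</pre>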

<div style="background-color: #f0f4f8; padding: 15px; border-left: 5px solid #0056b3; margin-top: 20px;">
  <strong>Important Note on Data Leakage:</strong><br>
  Strict measures were taken to ensure experimental integrity. We have verified that there is <strong>no overlap in speakers or content</strong> (sentences) between the training datasets (<code>Iqra_train</code>, <code>Iqra_TTS</code>, <code>Iqra_Extra_IS26</code>) and the evaluation datasets.
</div>
    
    <h2>Submission Details (Draft)</h2>
    <p>
      Submit a UTF-8 CSV named <code>teamID_submission.csv</code> with two columns:
    </p>
    <ul>
      <li><strong>ID:</strong> audio filename (no extension)</li>
      <li><strong>Labels:</strong> predicted phoneme sequence (space-separated)</li>
    </ul>
    <pre>ID,Labels
0000_0001, y a t a H a d d a ...
0000_0002, m a a n a n s a ...
...
</pre>
    <p>
      <strong>Note:</strong> no extra spaces, single CSV, no archives.
    </p>
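    <p>
      A minimal sketch of producing the file in this format (the <code>predictions</code> mapping below is purely illustrative):
    </p>
    <pre>import csv

# Hypothetical predictions: audio ID (filename without extension) -> phoneme string.
predictions = {
    "0000_0001": "y a t a H a d d a v u n n aa s u",
    "0000_0002": "m a a n a n s a",
}

with open("teamID_submission.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "Labels"])
    for audio_id, phonemes in predictions.items():
        writer.writerow([audio_id, phonemes])</pre>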


    <h2>Evaluation Criteria</h2>

<div style="background-color: #f0f8ff; border-left: 5px solid #007bff; padding: 15px; margin-bottom: 20px;">
    <h3 style="margin-top: 0; color: #007bff;">🏆 Primary Metric</h3>
    <p style="margin-bottom: 0;">
        The Leaderboard is ranked primarily by the <strong>Phoneme-level F1-score</strong>. 
        While other metrics (FRR, FAR, DER) are computed for analysis, <strong>F1</strong> determines the final standing.
    </p>
</div>

<p>
    We use a hierarchical evaluation strategy (detection + diagnostic) based on the 
    <a href="https://arxiv.org/pdf/2310.13974" target="_blank">MDD Overview</a> framework.
</p>

<h3>1. Input Definitions</h3>
<ul>
    <li><strong>What is said:</strong> The annotated phoneme sequence.</li>
    <li><strong>What is predicted:</strong> The output from your model.</li>
    <li><strong>What should have been said:</strong> The reference (target) sequence.</li>
</ul>

<h3>2. Confusion Matrix Components</h3>
<p>From the inputs above, we compute the following counts:</p>
<table style="width: 100%; border-collapse: collapse; margin-bottom: 20px;">
    <tr style="background-color: #f9f9f9; border-bottom: 1px solid #ddd;">
        <td style="padding: 8px;"><strong>TA (True Accept)</strong></td>
        <td style="padding: 8px;">Correct phonemes properly accepted.</td>
    </tr>
    <tr style="border-bottom: 1px solid #ddd;">
        <td style="padding: 8px;"><strong>TR (True Reject)</strong></td>
        <td style="padding: 8px;">Mispronunciations correctly detected.</td>
    </tr>
    <tr style="background-color: #f9f9f9; border-bottom: 1px solid #ddd;">
        <td style="padding: 8px;"><strong>FR (False Reject)</strong></td>
        <td style="padding: 8px;">Correct phonemes incorrectly flagged as errors.</td>
    </tr>
    <tr>
        <td style="padding: 8px;"><strong>FA (False Accept)</strong></td>
        <td style="padding: 8px;">Mispronunciations missed (labeled as correct).</td>
    </tr>
</table>

<h3>3. Calculated Metrics</h3>

<h4>Detection Metrics (Leaderboard Ranking)</h4>
<ul>
    <li><strong>Precision:</strong> TR / (TR + FR)</li>
    <li><strong>Recall:</strong> TR / (TR + FA)</li>
    <li><strong>F1-Score:</strong> 2 · (Precision · Recall) / (Precision + Recall)</li>
</ul>

<h4>Diagnostic Rates (Auxiliary)</h4>
<ul>
    <li><strong>FRR (False Reject Rate):</strong> FR / (TA + FR)</li>
    <li><strong>FAR (False Accept Rate):</strong> FA / (FA + TR)</li>
    <li><strong>DER (Diagnostic Error Rate):</strong> DE / (CD + DE), where the correctly detected mispronunciations (TR) are further split into correct diagnoses (CD) and diagnosis errors (DE)</li>
</ul>
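<p>
    Given the four counts above, the detection metrics and auxiliary rates follow directly; the sketch below uses placeholder counts for illustration:
</p>
<pre># Placeholder counts, not real results.
TA, TR, FR, FA = 900, 80, 30, 20

precision = TR / (TR + FR)
recall    = TR / (TR + FA)
f1        = 2 * precision * recall / (precision + recall)

frr = FR / (TA + FR)   # False Reject Rate
far = FA / (FA + TR)   # False Accept Rate

# DER splits the correctly detected mispronunciations (TR) into
# correct diagnoses (CD) and diagnosis errors (DE), i.e. TR = CD + DE.
CD, DE = 60, 20
der = DE / (CD + DE)

print(f"Precision={precision:.3f}  Recall={recall:.3f}  F1={f1:.3f}")
print(f"FRR={frr:.3f}  FAR={far:.3f}  DER={der:.3f}")</pre>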
    
    <h2>Suggested Research Directions</h2>
    <ol>
      <li>
        <strong>Advanced Mispronunciation Detection Models</strong><br>
        Apply state-of-the-art self-supervised models (e.g., wav2vec 2.0, HuBERT), preferably variants pre-trained on Arabic speech, and fine-tune them on the MSA datasets to improve phoneme-level accuracy.
      </li>
      <li>
        <strong>Data Augmentation Strategies</strong><br>
        Create synthetic mispronunciation examples using pipelines like
        <a href="https://arxiv.org/abs/2211.00923" target="_blank">SpeechBlender</a>.
        Augmenting limited Arabic speech data helps mitigate data scarcity and improves model robustness.
      </li>
      <li>
        <strong>Analysis of Common Mispronunciation Patterns</strong><br>
        Perform statistical analysis on the MSA-Test dataset to identify prevalent errors (e.g., substituting similar phonemes, swapping vowels).
        These insights can drive targeted training and tailored feedback rules.
      </li>
    </ol>

    <h2>Registration</h2>
    <p>
      Teams and individual participants must register to gain access to the test set. Please complete the registration form using the link below:
    </p>
    <p>
      <a href="https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.google.com%2Fforms%2Fd%2Fe%2F1FAIpQLSdDyEP7vzJnpvthiEK6WPws2vpuI_yqbzOzEVqHKs0wdDY_Lg%2Fviewform%3Fusp%3Dheader&data=05%7C02%7C%7C828e4c0463a24cca40de08de2e808b16%7C13a8d02d59f3416a8231b3080e639cad%7C0%7C0%7C638999326802565605%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=CUdgz9Az%2FrFF%2FThZgSkvaXYZneSeVNTfv5drPhbKK44%3D&reserved=0" target="_blank">Registration Form</a>
    </p>
    <p>
      Registration opens on December 1, 2025.
    </p>

    <h2>Future Updates</h2>
    <p>
      Further details on the open-set leaderboard submission will be posted on the shared task website (December 15, 2025). Stay tuned!
    </p>

    <h2>Contact and Support</h2>
    <p>
      For inquiries and support, reach out to the task coordinators.
    </p>

    <h2>References</h2>
    <ul>
      <li>El Kheir Y. et al., “SpeechBlender: Speech Augmentation Framework for Mispronunciation Data Generation,” arXiv:2211.00923, 2022.</li>
      <li>Aly S. A. et al., “ASMDD: Arabic Speech Mispronunciation Detection Dataset,” arXiv:2111.01136, 2021.</li>
      <li>Moustafa A. & Aly S. A., “Efficient Voice Identification Using Wav2Vec2.0 and HuBERT…,” arXiv:2111.06331, 2021.</li>
      <li>El Kheir Y. et al., “Automatic Pronunciation Assessment – A Review,” arXiv:2310.13974, 2023.</li>
    </ul>
  </div>
</body>
</html>