normundsg committed on
Commit b17fa13 · verified · 1 Parent(s): e9e806d

Update README.md

Files changed (1)
  1. README.md +90 -113
README.md CHANGED
@@ -1,113 +1,90 @@
1
- ---
2
- license: mit
3
- datasets:
4
- - SkyWater21/lv_emotions
5
- language:
6
- - lv
7
- base_model:
8
- - google-bert/bert-base-multilingual-cased
9
- ---
10
- Fine-tuned [Multilingual BERT](https://huggingface.co/google-bert/bert-base-multilingual-cased) for multi-label emotion classification task.
11
-
12
- Model was trained on [lv_emotions](https://huggingface.co/datasets/SkyWater21/lv_emotions) dataset. This dataset is Latvian translation of [GoEmotions](https://huggingface.co/datasets/go_emotions) and [Twitter Emotions](https://huggingface.co/datasets/SkyWater21/lv_twitter_emotions) dataset. Google Translate was used to generate the machine translation.
13
-
14
- Original 26 emotions were mapped to 6 base emotions as per Dr. Ekman theory.
15
-
16
- Labels predicted by classifier:
17
- ```yaml
18
- 0: anger
19
- 1: disgust
20
- 2: fear
21
- 3: joy
22
- 4: sadness
23
- 5: surprise
24
- 6: neutral
25
- ```
26
-
27
- Label mapping from 27 emotions from GoEmotion to 6 base emotions as per Dr. Ekman theory:
28
- |GoEmotion|Ekman|
29
- |---|---|
30
- | admiration | joy|
31
- | amusement | joy|
32
- | anger | anger|
33
- | annoyance | anger|
34
- | approval | joy|
35
- | caring | joy|
36
- | confusion | surprise|
37
- | curiosity | surprise|
38
- | desire | joy|
39
- | disappointment | sadness|
40
- | disapproval | anger|
41
- | disgust | disgust|
42
- | embarrassment | sadness|
43
- | excitement | joy|
44
- | fear | fear|
45
- | gratitude | joy|
46
- | grief | sadness|
47
- | joy | joy|
48
- | love | joy|
49
- | nervousness | fear|
50
- | optimism | joy|
51
- | pride | joy|
52
- | realization | surprise|
53
- | relief | joy|
54
- | remorse | sadness|
55
- | sadness | sadness|
56
- | surprise | surprise|
57
- | neutral | neutral|
58
-
59
- Seed used for random number generator is 42:
60
- ```python
61
- def set_seed(seed=42):
62
-     random.seed(seed)
63
-     np.random.seed(seed)
64
-     torch.manual_seed(seed)
65
-     if torch.cuda.is_available():
66
-         torch.cuda.manual_seed_all(seed)
67
- ```
68
-
69
- Training parameters:
70
- ```yaml
71
- max_length: null
72
- batch_size: 32
73
- shuffle: True
74
- num_workers: 4
75
- pin_memory: False
76
- drop_last: False
77
- optimizer: adam
78
- lr: 0.00001
79
- weight_decay: 0
80
- problem_type: multi_label_classification
81
- num_epochs: 4
82
- ```
83
-
84
-
85
- Evaluation results on test split of [lv_go_emotions](https://huggingface.co/datasets/SkyWater21/lv_emotions/viewer/combined/lv_go_emotions_test)
86
- | |Precision|Recall|F1-Score|Support|
87
- |--------------|---------|------|--------|-------|
88
- |anger | 0.50| 0.35| 0.41| 726|
89
- |disgust | 0.44| 0.28| 0.35| 123|
90
- |fear | 0.58| 0.47| 0.52| 98|
91
- |joy | 0.80| 0.76| 0.78| 2104|
92
- |sadness | 0.66| 0.41| 0.51| 379|
93
- |surprise | 0.59| 0.55| 0.57| 677|
94
- |neutral | 0.71| 0.43| 0.54| 1787|
95
- |micro avg | 0.70| 0.55| 0.62| 5894|
96
- |macro avg | 0.61| 0.46| 0.52| 5894|
97
- |weighted avg | 0.69| 0.55| 0.61| 5894|
98
- |samples avg | 0.58| 0.56| 0.57| 5894|
99
-
100
- Evaluation results on test split of [lv_twitter_emotions](https://huggingface.co/datasets/SkyWater21/lv_emotions/viewer/combined/lv_twitter_emotions_test)
101
- | |Precision|Recall|F1-Score|Support|
102
- |--------------|---------|------|--------|-------|
103
- |anger | 0.92| 0.88| 0.90| 12013|
104
- |disgust | 0.90| 0.94| 0.92| 14117|
105
- |fear | 0.82| 0.67| 0.74| 3342|
106
- |joy | 0.88| 0.84| 0.86| 5913|
107
- |sadness | 0.86| 0.75| 0.80| 4786|
108
- |surprise | 0.94| 0.56| 0.70| 1510|
109
- |neutral | 0.00| 0.00| 0.00| 0|
110
- |micro avg | 0.90| 0.85| 0.87| 41681|
111
- |macro avg | 0.76| 0.66| 0.70| 41681|
112
- |weighted avg | 0.90| 0.85| 0.87| 41681|
113
- |samples avg | 0.85| 0.85| 0.85| 41681|
 
1
+ ---
2
+ license: apache-2.0
3
+ datasets:
4
+ - AiLab-IMCS-UL/go_emotions-lv
5
+ - AiLab-IMCS-UL/twitter_emotions-lv
6
+ language:
7
+ - lv
8
+ base_model:
9
+ - google-bert/bert-base-multilingual-cased
10
+ ---
11
+ # Latvian Basic Emotion Classifier
12
+
13
+ A fine-tuned version of [Multilingual BERT](https://huggingface.co/google-bert/bert-base-multilingual-cased) for multi-label text classification of six basic emotions (+neutral) in Latvian, as defined by Ekman’s theory.
14
+
15
+ The model is trained on a combined dataset of [go_emotions-lv](https://huggingface.co/datasets/AiLab-IMCS-UL/go_emotions-lv) and [twitter_emotions-lv](https://huggingface.co/datasets/AiLab-IMCS-UL/twitter_emotions-lv).
16
+
17
+ Predicted labels:
18
+ ```yaml
19
+ 0: anger
20
+ 1: disgust
21
+ 2: fear
22
+ 3: joy
23
+ 4: sadness
24
+ 5: surprise
25
+ 6: neutral
26
+ ```
27
+
28
+ The random seed used for initialization was 42:
29
+ ```python
30
+ def set_seed(seed=42):
31
+     random.seed(seed)
32
+     np.random.seed(seed)
33
+     torch.manual_seed(seed)
34
+     if torch.cuda.is_available():
35
+         torch.cuda.manual_seed_all(seed)
36
+ ```
37
+
38
+ Training parameters:
39
+ ```yaml
40
+ max_length: null
41
+ batch_size: 32
42
+ shuffle: True
43
+ num_workers: 4
44
+ pin_memory: False
45
+ drop_last: False
46
+ optimizer: adam
47
+ lr: 0.00001
48
+ weight_decay: 0
49
+ problem_type: multi_label_classification
50
+ num_epochs: 4
51
+ ```
52
+
53
+ ## Evaluation
54
+
55
+ Evaluation results on the test split of [go_emotions-lv](https://huggingface.co/datasets/AiLab-IMCS-UL/go_emotions-lv/viewer/simplified_ekman/test):
56
+ | |Precision|Recall|F1-Score|Support|
57
+ |--------------|---------|------|--------|-------|
58
+ |anger | 0.50| 0.35| 0.41| 726|
59
+ |disgust | 0.44| 0.28| 0.35| 123|
60
+ |fear | 0.58| 0.47| 0.52| 98|
61
+ |joy | 0.80| 0.76| 0.78| 2104|
62
+ |sadness | 0.66| 0.41| 0.51| 379|
63
+ |surprise | 0.59| 0.55| 0.57| 677|
64
+ |neutral | 0.71| 0.43| 0.54| 1787|
65
+ |micro avg | 0.70| 0.55| 0.62| 5894|
66
+ |macro avg | 0.61| 0.46| 0.52| 5894|
67
+ |weighted avg | 0.69| 0.55| 0.61| 5894|
68
+ |samples avg | 0.58| 0.56| 0.57| 5894|
69
+
70
+ Evaluation results on the test split of [twitter_emotions-lv](https://huggingface.co/datasets/AiLab-IMCS-UL/twitter_emotions-lv/viewer/simplified_ekman/test):
71
+ | |Precision|Recall|F1-Score|Support|
72
+ |--------------|---------|------|--------|-------|
73
+ |anger | 0.92| 0.88| 0.90| 12013|
74
+ |disgust | 0.90| 0.94| 0.92| 14117|
75
+ |fear | 0.82| 0.67| 0.74| 3342|
76
+ |joy | 0.88| 0.84| 0.86| 5913|
77
+ |sadness | 0.86| 0.75| 0.80| 4786|
78
+ |surprise | 0.94| 0.56| 0.70| 1510|
79
+ |micro avg | 0.90| 0.85| 0.87| 41681|
80
+ |macro avg | 0.76| 0.66| 0.70| 41681|
81
+ |weighted avg | 0.90| 0.85| 0.87| 41681|
82
+ |samples avg | 0.85| 0.85| 0.85| 41681|
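
Note on the macro averages in the twitter_emotions-lv table: the sketch below recomputes them from the six per-class rows. It assumes the macro average is a plain unweighted mean over labels, and the helper name `macro_average` is hypothetical. Dividing by seven labels rather than six reproduces the reported 0.76 / 0.66 / 0.70, which suggests the reported figures still count the `neutral` label (zero support in this test set, scored 0.00 in the removed version of the table).

```python
# Per-class (precision, recall, F1) copied from the twitter_emotions-lv table.
# Values are already rounded to two decimals, so results are approximate.
scores = {
    "anger":    (0.92, 0.88, 0.90),
    "disgust":  (0.90, 0.94, 0.92),
    "fear":     (0.82, 0.67, 0.74),
    "joy":      (0.88, 0.84, 0.86),
    "sadness":  (0.86, 0.75, 0.80),
    "surprise": (0.94, 0.56, 0.70),
}


def macro_average(rows, n_labels):
    """Unweighted mean of each metric column over n_labels labels.

    Labels missing from `rows` are treated as contributing 0.0
    (assumption: this mirrors a zero-support class scored as 0.00).
    """
    column_sums = [sum(column) for column in zip(*rows)]
    return tuple(round(total / n_labels, 2) for total in column_sums)


six = macro_average(scores.values(), 6)    # only the six listed emotions
seven = macro_average(scores.values(), 7)  # six emotions plus a zero-scored neutral

print("macro avg over 6 labels:", six)
print("macro avg over 7 labels:", seven)
```

If the six-label figure is the one of interest, it has to be recomputed from unrounded per-class scores; the values printed here inherit the rounding of the table.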
83
+
84
+ ## See also
85
+
86
+ https://huggingface.co/AiLab-IMCS-UL/lvbert-emotions-ekman
87
+
88
+ ## Acknowledgements
89
+
90
+ This work was supported by the EU Recovery and Resilience Facility project [Language Technology Initiative](https://www.vti.lu.lv) (2.3.1.1.i.0/1/22/I/CFLA/002).
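
Because the model card lists `problem_type: multi_label_classification`, each of the seven labels is scored independently, so a text can receive several labels or none. The following is a minimal sketch of how raw per-label logits could be turned into label names. The function name `decode_labels`, the example logits, and the 0.5 threshold are assumptions for illustration; the README does not state which threshold was used for the reported metrics.

```python
import math

# Label ids as listed under "Predicted labels" in the README.
ID2LABEL = {
    0: "anger",
    1: "disgust",
    2: "fear",
    3: "joy",
    4: "sadness",
    5: "surprise",
    6: "neutral",
}


def decode_labels(logits, threshold=0.5):
    """Apply an independent sigmoid to each logit and keep labels above threshold.

    `threshold=0.5` is an assumed default, not a value taken from the README.
    """
    probabilities = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [
        (ID2LABEL[index], round(p, 3))
        for index, p in enumerate(probabilities)
        if p >= threshold
    ]


# Hypothetical logits for one input text (one value per label id).
example_logits = [-2.0, -3.0, -4.0, 1.5, -1.0, 0.2, -0.5]
predicted = decode_labels(example_logits)
print(predicted)  # [('joy', 0.818), ('surprise', 0.55)]
```

Unlike a softmax over the labels, the per-label sigmoid does not force the probabilities to sum to one, which is what allows zero or multiple emotions per text.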