Update README.md
README.md
CHANGED
@@ -117,6 +117,19 @@ This release is part of WangchanX, a Large Language Model (LLM) research and dev
 
 [Link to WangchanX FLAN-like Dataset Creation Github repository](https://github.com/vistec-AI/WangchanX/tree/datasets)
 
+## Citation
+```
+@misc{phatthiyaphaibun2025mangosteenopenthaicorpus,
+      title={Mangosteen: An Open Thai Corpus for Language Model Pretraining},
+      author={Wannaphong Phatthiyaphaibun and Can Udomcharoenchaikit and Pakpoom Singkorapoom and Kunat Pipatanakul and Ekapol Chuangsuwanich and Peerat Limkonchotiwat and Sarana Nutanong},
+      year={2025},
+      eprint={2507.14664},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2507.14664},
+}
+```
+
 ## Disclaimer
 This is the repository for the commercial instruction-tuned model.
 The model has _not_ been aligned for safety.