mrpeerat committed
Commit 92cf270 · verified · 1 parent: a05340d

Update README.md

Files changed (1)
  1. README.md +13 -0
README.md CHANGED
@@ -117,6 +117,19 @@ This release is part of WangchanX, a Large Language Model (LLM) research and dev
 
 [Link to WangchanX FLAN-like Dataset Creation Github repository](https://github.com/vistec-AI/WangchanX/tree/datasets)
 
+## Citation
+```
+@misc{phatthiyaphaibun2025mangosteenopenthaicorpus,
+title={Mangosteen: An Open Thai Corpus for Language Model Pretraining},
+author={Wannaphong Phatthiyaphaibun and Can Udomcharoenchaikit and Pakpoom Singkorapoom and Kunat Pipatanakul and Ekapol Chuangsuwanich and Peerat Limkonchotiwat and Sarana Nutanong},
+year={2025},
+eprint={2507.14664},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2507.14664},
+}
+```
+
 ## Disclaimer
 This is the repository for the commercial instruction-tuned model.
 The model has _not_ been aligned for safety.