Merge branch 'main' of https://huggingface.co/THUDM/visualglm-6b

Browse files
- MODEL_LICENSE +3 -3
- README.md +5 -4
MODEL_LICENSE CHANGED
@@ -1,10 +1,10 @@
-The
+The VisualGLM-6B License
 
 1. Definitions
 
-“Licensor” means the
+“Licensor” means the VisualGLM-6B Model Team that distributes its Software.
 
-“Software” means the
+“Software” means the VisualGLM-6B model parameters made available under this license.
 
 2. License Grant
README.md CHANGED
@@ -4,6 +4,7 @@ language:
 - en
 tags:
 - glm
+- visualglm
 - chatglm
 - thudm
 ---
@@ -17,7 +18,7 @@ tags:
 </p>
 
 ## Introduction
-
+VisualGLM-6B is an open-source multimodal dialogue language model supporting **images, Chinese, and English**. Its language model is based on [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B), with 6.2 billion parameters; the image side bridges the visual model and the language model by training [BLIP2-Qformer](https://arxiv.org/abs/2301.12597), for 7.8 billion parameters in total.
 
 VisualGLM-6B is pre-trained on 30M high-quality Chinese image-text pairs from the [CogView](https://arxiv.org/abs/2105.13290) dataset and 300M filtered English image-text pairs, with Chinese and English weighted equally. This training regime aligns the visual information well with ChatGLM's semantic space; in the subsequent fine-tuning stage, the model is trained on long visual question answering data to generate answers that match human preferences.
 
@@ -33,12 +34,12 @@ pip install SwissArmyTransformer>=0.3.6 torch>1.10.0 torchvision transformers>=4
 
 ```ipython
 >>> from transformers import AutoTokenizer, AutoModel
->>> tokenizer = AutoTokenizer.from_pretrained("THUDM/
->>> model = AutoModel.from_pretrained("THUDM/
+>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True)
+>>> model = AutoModel.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True).half().cuda()
 >>> image_path = "your image path"
 >>> response, history = model.chat(tokenizer, image_path, "描述这张图片。", history=[])
 >>> print(response)
->>> response, history = model.chat(tokenizer, "这张图片可能是在什么场所拍摄的?", history=history)
+>>> response, history = model.chat(tokenizer, image_path, "这张图片可能是在什么场所拍摄的?", history=history)
 >>> print(response)
 ```
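The README fix above does three things: it completes the truncated `from_pretrained` calls with the `THUDM/visualglm-6b` model ID and `trust_remote_code=True`, moves the model to half precision on GPU, and restores the `image_path` argument in the follow-up `model.chat` call. For reference, here is a minimal end-to-end sketch assembled from the post-change snippet; the placeholder image path and the assumption of a CUDA-capable GPU (implied by `.half().cuda()`) are mine, not part of the diff.

```python
# Minimal sketch assembled from the updated README snippet.
# Assumptions: a CUDA-capable GPU (required by .half().cuda()) and a local
# image file at IMAGE_PATH; both are placeholders, not taken from the diff.
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "THUDM/visualglm-6b"
IMAGE_PATH = "example.jpg"  # replace with a real image path

# trust_remote_code=True is required because the repo ships custom model code.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True).half().cuda()
model = model.eval()

# First turn: pass the image path and an empty history.
response, history = model.chat(tokenizer, IMAGE_PATH, "描述这张图片。", history=[])
print(response)

# Follow-up turn: per the fixed diff line, image_path is passed again,
# together with the accumulated history.
response, history = model.chat(
    tokenizer, IMAGE_PATH, "这张图片可能是在什么场所拍摄的?", history=history
)
print(response)
```

Note that `model.chat` threads `history` through successive turns, so follow-up questions about the same image keep their conversational context.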