Update README.md
README.md (CHANGED)
@@ -9,7 +9,7 @@ tags:
 - multimodal
 pipeline_tag: video-text-to-text
 model-index:
-- name: VideoChat-Flash-Qwen2_5-
+- name: VideoChat-Flash-Qwen2_5-1_5B_res448
   results:
   - task:
       type: multimodal
@@ -78,7 +78,7 @@ model-index:
 # 🦜VideoChat-Flash-Qwen2_5-2B_res448⚡
 [\[📰 Blog\]](https://internvideo.github.io/blog/2024-12-31-VideoChat-Flash) [\[📂 GitHub\]](https://github.com/OpenGVLab/VideoChat-Flash) [\[📜 Tech Report\]](https://www.arxiv.org/abs/2501.00574) [\[🗨️ Chat Demo\]](https://huggingface.co/spaces/OpenGVLab/VideoChat-Flash)

-VideoChat-Flash-2B is constructed upon UMT-L (300M) and
+VideoChat-Flash-2B is constructed upon UMT-L (300M) and Qwen2.5-1.5B, employing only **16 tokens per frame**. By leveraging Yarn to extend the context window to 128k (Qwen2's native context window is 32k), our model supports input sequences of up to approximately **10,000 frames**.

 > Note: Due to a predominantly English training corpus, the model only exhibits basic Chinese comprehension, to ensure optimal performance, using English for interaction is recommended.
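The updated card describes the architecture only in prose (UMT-L vision encoder + Qwen2.5-1.5B LLM, 16 tokens per frame, YaRN-extended 128k context), so here is a minimal loading sketch for illustration. The repo id `OpenGVLab/VideoChat-Flash-Qwen2_5-2B_res448` is inferred from the card heading rather than stated in the diff, and the exact inference API is defined by the repository's own remote code; treat this as an assumption, not the card's documented usage.

```python
# Hypothetical loading sketch; repo id is inferred from the card heading, not from the diff.
from transformers import AutoModel, AutoTokenizer

model_id = "OpenGVLab/VideoChat-Flash-Qwen2_5-2B_res448"  # assumed Hub repo id

# trust_remote_code=True lets transformers import the custom model and
# video-processing code shipped inside the repository (common for video LLMs).
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval()
```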