lixinhao committed · verified
Commit 1df3392 · Parent: ee222dc

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED
@@ -9,7 +9,7 @@ tags:
 - multimodal
 pipeline_tag: video-text-to-text
 model-index:
-- name: VideoChat-Flash-Qwen2_5-2B_res448
+- name: VideoChat-Flash-Qwen2_5-1_5B_res448
   results:
   - task:
       type: multimodal
@@ -78,7 +78,7 @@ model-index:
 # 🦜VideoChat-Flash-Qwen2_5-2B_res448⚡
 [\[📰 Blog\]](https://internvideo.github.io/blog/2024-12-31-VideoChat-Flash) [\[📂 GitHub\]](https://github.com/OpenGVLab/VideoChat-Flash) [\[📜 Tech Report\]](https://www.arxiv.org/abs/2501.00574) [\[🗨️ Chat Demo\]](https://huggingface.co/spaces/OpenGVLab/VideoChat-Flash)
 
-VideoChat-Flash-2B is constructed upon UMT-L (300M) and Qwen2_5-2B, employing only **16 tokens per frame**. By leveraging Yarn to extend the context window to 128k (Qwen2's native context window is 32k), our model supports input sequences of up to approximately **10,000 frames**.
+VideoChat-Flash-2B is constructed upon UMT-L (300M) and Qwen2.5-1.5B, employing only **16 tokens per frame**. By leveraging Yarn to extend the context window to 128k (Qwen2's native context window is 32k), our model supports input sequences of up to approximately **10,000 frames**.
 
 > Note: Due to a predominantly English training corpus, the model only exhibits basic Chinese comprehension, to ensure optimal performance, using English for interaction is recommended.
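The figures in the changed paragraph (16 tokens per frame, 32k native context extended to 128k via Yarn) admit a quick back-of-envelope check. The sketch below is illustrative only, not the model's inference logic: it assumes every context token can be spent on visual tokens, whereas the real pipeline also budgets for text and applies its own token compression, which is presumably how the README's ~10,000-frame figure is reached.

```python
# Back-of-envelope token budget using the numbers quoted in the README diff.
# Hypothetical helper, not part of the VideoChat-Flash codebase.

TOKENS_PER_FRAME = 16      # visual tokens per frame (from the README)
NATIVE_WINDOW = 32_000     # Qwen2's native context window
YARN_WINDOW = 128_000      # Yarn-extended context window

def max_frames(context_tokens: int, text_budget: int = 0) -> int:
    """Frames that fit if every remaining context token is a visual token."""
    return (context_tokens - text_budget) // TOKENS_PER_FRAME

# Yarn extension quadruples the usable context.
assert YARN_WINDOW // NATIVE_WINDOW == 4

# Under this naive accounting, 128k tokens hold 8,000 frames at 16 tokens
# each; reserving tokens for the text prompt lowers that further.
print(max_frames(YARN_WINDOW))                    # 8000
print(max_frames(YARN_WINDOW, text_budget=1_600)) # 7900
```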