Buy me a Ko-Fi • Support my work using Patreon
OpenChat-3.5-0106_32K-PoSE
Description
This model is OpenChat-3.5-0106 with its context length extended from 8192 tokens to 32768 tokens using PoSE (Positional Skip-wise Training).
The model was fine-tuned using Rank-Stabilized LoRA (rsLoRA) on the LongAlpaca-12K dataset. I hope to keep extending the context in future versions and then apply the same methods to my upscaled versions of OpenChat-3.5, which were created using Block Expansion instead of Depth Up-Scaling.
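Rank-Stabilized LoRA differs from standard LoRA mainly in how it scales the low-rank update. Standard LoRA multiplies the adapter output by alpha / r. rsLoRA uses alpha / sqrt(r) instead, which keeps the update from shrinking as the rank grows. The sketch below only computes that scaling factor. The alpha and rank values in it are hypothetical, because the card does not list the hyperparameters used. In the Hugging Face `peft` library, this behaviour is enabled with `use_rslora=True` on `LoraConfig`.

```python
import math

def lora_scaling(alpha: float, r: int, rank_stabilized: bool = False) -> float:
    """Scale applied to the LoRA update B @ A before it is added to the frozen weight."""
    if rank_stabilized:
        # rsLoRA: alpha / sqrt(r) keeps the update magnitude stable as r grows.
        return alpha / math.sqrt(r)
    # Standard LoRA: alpha / r, which shrinks the update at higher ranks.
    return alpha / r

# Hypothetical alpha = 16, compared across a few ranks.
for r in (8, 16, 64, 256):
    print(f"r={r:>3}  LoRA={lora_scaling(16, r):.3f}  rsLoRA={lora_scaling(16, r, True):.3f}")
```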
After fine-tuning, the model was tested on passkey retrieval and scored 100%. The Open LLM Leaderboard results are listed below, and I am a bit disappointed with them: compared to the original model, performance dropped significantly in all but one test (MuSR). I expected the model to do better than the original on MuSR, since that test benefits from long-context understanding, but I did not expect such a negative impact on the other tasks.
Anyway, I will address this in a future version, probably by doing continued pre-training on a pre-training dataset instead of using a fine-tuning dataset, so that the other tasks are less affected. I used the LongAlpaca-12K dataset because it is small and my computational resources are limited, but I might also have to try a larger dataset for the next attempt. If you would like to help me, there are links to my Patreon and Ko-Fi at the top of this model card.
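Passkey retrieval hides a short random number inside a long stretch of filler text and asks the model to repeat it. The score is the fraction of prompts where that number appears in the answer. Below is a minimal Python sketch of this setup, modelled on a commonly used template. The exact prompts, lengths, and insertion depths used to evaluate this model are not stated, so treat all of them, as well as the helper names, as assumptions. The model call itself is left out.

```python
import random

# Filler and instruction text modelled on a common passkey-retrieval template (assumed).
FILLER = "The grass is green. The sky is blue. The sun is yellow. Here we go. There and back again. "
HEADER = ("There is an important info hidden inside a lot of irrelevant text. "
          "Find it and memorize it. I will quiz you about the important information there. ")
QUESTION = "What is the pass key? The pass key is"

def make_passkey_prompt(n_filler: int, depth: float, passkey: int) -> str:
    """Build a prompt with `passkey` buried at relative position `depth` (0.0 to 1.0)."""
    needle = f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
    parts = [FILLER] * n_filler
    parts.insert(int(n_filler * depth), needle)
    return HEADER + "".join(parts) + QUESTION

def passkey_accuracy(outputs: list[str], passkeys: list[int]) -> float:
    """Fraction of model outputs that contain the correct passkey."""
    hits = sum(str(key) in out for out, key in zip(outputs, passkeys))
    return hits / len(passkeys)

rng = random.Random(0)
keys = [rng.randint(10000, 99999) for _ in range(5)]
prompts = [make_passkey_prompt(n_filler=200, depth=d, passkey=k)
           for d, k in zip((0.0, 0.25, 0.5, 0.75, 1.0), keys)]
# Each prompt would be sent to the model; its completions would then go to passkey_accuracy.
```

To stress the full 32K window, `n_filler` would be increased until the tokenized prompt approaches 32768 tokens. The exact count depends on the tokenizer.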
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 12.70 |
IFEval (0-Shot) | 39.69 |
BBH (3-Shot) | 8.83 |
MATH Lvl 5 (4-Shot) | 1.44 |
GPQA (0-shot) | 3.47 |
MuSR (0-shot) | 11.33 |
MMLU-PRO (5-shot) | 11.46 |
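The Avg. row appears to be the unweighted mean of the six benchmark scores. This short Python check reproduces it from the table values.

```python
# Scores copied from the table above.
scores = {"IFEval": 39.69, "BBH": 8.83, "MATH Lvl 5": 1.44,
          "GPQA": 3.47, "MuSR": 11.33, "MMLU-PRO": 11.46}
avg = sum(scores.values()) / len(scores)
print(f"Avg. = {avg:.2f}")  # prints "Avg. = 12.70"
```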
Citation
@misc{zhu2024poseefficientcontextwindow,
title={PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training},
author={Dawei Zhu and Nan Yang and Liang Wang and Yifan Song and Wenhao Wu and Furu Wei and Sujian Li},
year={2024},
eprint={2309.10400},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2309.10400},
}