yulan-team/YuLan-Mini
Text Generation
A highly capable 2.4B-parameter lightweight LLM pre-trained on only 1T tokens of data, with all training details released (a loading sketch follows the notes below).
Note: The model & optimizer states of the last curriculum phase before learning rate annealing.
Note: The model & optimizer states of the 20th curriculum phase.
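The checkpoint listed above is tagged for text generation; below is a minimal sketch of how it could be loaded and queried with the Hugging Face transformers library. The repository id yulan-team/YuLan-Mini comes from the listing itself, while the dtype, device placement, and generation settings are assumptions rather than the team's documented usage.

```python
# Minimal sketch: load the checkpoint with Hugging Face transformers.
# Assumes the model at "yulan-team/YuLan-Mini" works with the standard
# AutoModelForCausalLM / AutoTokenizer classes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yulan-team/YuLan-Mini"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: the 2.4B model runs comfortably in bf16
    device_map="auto",           # requires the accelerate package
)

# Simple usage matching the "Text Generation" pipeline tag.
prompt = "YuLan-Mini is a lightweight language model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The intermediate checkpoints described in the notes (the pre-annealing and 20th curriculum-phase states) would be loaded the same way from their own repositories or revisions, which are not named in this listing.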