2️⃣ versions: 70B and 8B
🧠 Trained by distilling logits from Llama-3.1-405B (see the sketch below)
🔥 Used a clever compression method to shrink the logit dataset from 2.9 petabytes down to 50GB, which they may share in a paper (illustrated after the list)
⚠️ Not all benchmarks improve: GPQA and MUSR drop slightly
🤗 The 8B weights are available on HF (the 70B is not)
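
The post doesn't include training code, so here is a minimal sketch of what logit distillation typically looks like in PyTorch: the standard Hinton-style KL-divergence loss between temperature-softened teacher (405B) and student distributions. All names, the temperature, and the toy shapes are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal logit-distillation sketch; names and hyperparameters are
# illustrative assumptions, not the authors' actual training code.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student
    distributions (the standard knowledge-distillation objective)."""
    # Soften both distributions with the temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean + T^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy usage: 4 token positions over Llama-3.1's ~128k-entry vocabulary.
student_logits = torch.randn(4, 128_256, requires_grad=True)
teacher_logits = torch.randn(4, 128_256)  # precomputed from the 405B teacher
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```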
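
The post doesn't say what the compression method actually is, so purely as an illustration of why raw logits are so heavy and how such datasets are often shrunk, here is a generic top-k sparsification sketch: storing only the k largest logits per token instead of the full vocabulary row. This is a common trick in distillation pipelines, not necessarily what was done here.

```python
# Generic top-k sparsification of teacher logits; purely illustrative,
# NOT the (unpublished) compression method referenced in the post.
import torch

def compress_top_k(teacher_logits: torch.Tensor, k: int = 64):
    """Keep only the k largest logits per position, as (values, indices)
    pairs instead of full vocabulary rows."""
    values, indices = teacher_logits.topk(k, dim=-1)
    return values.half(), indices.to(torch.int32)

def decompress(values, indices, vocab_size: int, fill: float = -1e4):
    """Rebuild dense rows, filling pruned entries with a large negative
    logit so they get ~zero probability after softmax."""
    dense = torch.full((*values.shape[:-1], vocab_size), fill)
    return dense.scatter(-1, indices.long(), values.float())

# Toy numbers: one row over a 128k vocab drops from ~512 KB of fp32 logits
# to ~384 bytes of fp16 values + int32 indices (~1,300x smaller).
row = torch.randn(1, 128_256)
vals, idx = compress_top_k(row)
restored = decompress(vals, idx, vocab_size=128_256)
```

Note that top-k alone wouldn't account for the full reduction: 2.9 PB down to 50 GB is roughly a 58,000x shrink, so whatever they did presumably goes further than this.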