Panic

Kernel

AI & ML interests

None yet

Recent Activity

liked a model 1 day ago
franciszzj/Leffa
new activity 2 days ago
hexgrad/Kokoro-82M:How to fine-tune?
liked a model 2 days ago
hexgrad/Kokoro-82M

Organizations

None yet

Kernel's activity

New activity in hexgrad/Kokoro-82M 2 days ago

How to fine-tune?

3
#15 opened 4 days ago by Meshwa
reacted to SivilTaram's post with 🔥 9 months ago
⚓️ Sailor: A New Multilingual Open LLM for South-East Asia 🌏

Last month we released a new family of multilingual language models called **Sailor**, ranging from 0.5B to 7B parameters, continually pre-trained from the Qwen1.5 models. Based on our extensive benchmarking, the Sailor models demonstrate exceptional performance on South-East Asian languages, taking us one step closer to multilingual LLMs that can serve the diverse needs of the region and beyond.

Today, we're more than excited to share the key technical details behind the Sailor models! 💪

**Key highlights**:
🔍 Data curation: Merging short examples, document-level code-switching, aggressive data cleaning and deduplication.
🤖 Tokenization Robustness: We find that BPE dropout is very effective at handling prompt variations.
🔍 Optimizing Data Mixture: We propose a new approach to automatically balance capabilities across different languages!
🌟 Recipe in Continual Pre-training: We discover a powerful metric that can help predict how well the Sailor models will perform on the original domain (e.g., English) after continual pre-training.

We are thrilled to share these technical details with the community and invite you to explore the Sailor models. We hope the Sailor models take us one step closer to multilingual LLMs for the world! 🌍✨

To learn more, please access our research paper or reach out to our team.
🔗 Paper: Sailor: Open Language Models for South-East Asia (2404.03608)
🧩 Model: sail/sailor-language-models-65e19a749f978976f1959825
💻 Code: https://github.com/sail-sg/sailor-llm
reacted to akhaliq's post with ❤️ 10 months ago
Simple and Scalable Strategies to Continually Pre-train Large Language Models

Simple and Scalable Strategies to Continually Pre-train Large Language Models (2403.08763)

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptation to the new data. In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by final loss and language model (LM) evaluation benchmarks. Specifically, we show this for a weak but realistic distribution shift between two commonly used LLM pre-training datasets (English→English) and a stronger distribution shift (English→German) at the 405M parameter model scale with large dataset sizes (hundreds of billions of tokens). Selecting the weak but realistic shift for larger-scale experiments, we also find that our continual learning strategies match the re-training baseline for a 10B parameter LLM. Our results demonstrate that LLMs can be successfully updated via simple and scalable continual learning strategies, matching the re-training baseline using only a fraction of the compute. Finally, inspired by previous work, we propose alternatives to the cosine learning rate schedule that help circumvent forgetting induced by LR re-warming and that are not bound to a fixed token budget.
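
For readers unfamiliar with the three ingredients named in the abstract, here is a minimal Python sketch of how LR re-warming, LR re-decaying, and replay might fit together in one continual pre-training stage. It is an illustration under assumptions, not the paper's implementation: the function names (`rewarmed_cosine_lr`, `sample_batch`), the linear warmup shape, and every numeric value are hypothetical and chosen only for readability.

```python
import math
import random


def rewarmed_cosine_lr(step, total_steps, warmup_steps, max_lr, min_lr):
    """LR for one continual pre-training stage: linear re-warm, then cosine re-decay."""
    if step < warmup_steps:
        # Re-warming: climb linearly from min_lr back up to max_lr.
        return min_lr + (max_lr - min_lr) * step / warmup_steps
    # Re-decaying: cosine anneal from max_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def sample_batch(new_data, old_data, batch_size, replay_fraction, rng):
    """Draw a batch in which about `replay_fraction` of the examples are replayed old data."""
    n_replay = round(batch_size * replay_fraction)
    batch = [rng.choice(old_data) for _ in range(n_replay)]
    batch += [rng.choice(new_data) for _ in range(batch_size - n_replay)]
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the paper).
    rng = random.Random(0)
    old_docs = [f"old-{i}" for i in range(100)]
    new_docs = [f"new-{i}" for i in range(100)]
    for step in (0, 50, 100, 550, 1000):
        lr = rewarmed_cosine_lr(step, total_steps=1000, warmup_steps=100,
                                max_lr=3e-4, min_lr=3e-5)
        batch = sample_batch(new_docs, old_docs, batch_size=8,
                             replay_fraction=0.25, rng=rng)
        print(step, lr, batch)
```

Each new dataset would start a fresh call sequence from `step = 0`, so the learning rate rises again before decaying, while the replay fraction controls how much previous data is mixed into every batch.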
reacted to vishesh-t27's post with 🔥 10 months ago
Komodo-7B is here!! Today we are releasing the base version of Komodo-7B along with the technical report.

Komodo-7B is a family of LLMs that consists of Komodo-7B-Base and Komodo-7B-Instruct.

Komodo-7B performs really well in multiple Indonesian languages, including Indonesian, Acehnese, Balinese, Banjarese, Buginese, Dayak Ngaju, Javanese, Lampungnese, Madurese, Minangkabau, Sundanese, and Toba Batak.

Our model outperforms various existing large language models including some multilingual models.

Technical Report: https://arxiv.org/abs/2403.09362

Base Model HuggingFace: Yellow-AI-NLP/komodo-7b-base

Kudos to the team @louisowen6, @akanyaani & @biddwan

Komodo: A Linguistic Expedition into Indonesia's Regional Languages (2403.09362)
New activity in Yellow-AI-NLP/komodo-7b-base 10 months ago