henern's Collections
Data
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
Paper • 2401.12474 • Published • 35
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 24
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
Paper • 2310.00746 • Published • 1
LESS: Selecting Influential Data for Targeted Instruction Tuning
Paper • 2402.04333 • Published • 3
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
Paper • 2306.01116 • Published • 32
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Paper • 2406.17557 • Published • 89
Scaling Synthetic Data Creation with 1,000,000,000 Personas
Paper • 2406.20094 • Published • 97
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Paper • 2406.08464 • Published • 66
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
Paper • 2407.18129 • Published • 12
Meltemi: The first open Large Language Model for Greek
Paper • 2407.20743 • Published • 68
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Paper • 2407.19594 • Published • 20
Paper • 2408.05366 • Published • 12
Synth-Empathy: Towards High-Quality Synthetic Empathy Data
Paper • 2407.21669 • Published
DiaSynth -- Synthetic Dialogue Generation Framework
Paper • 2409.19020 • Published • 20
CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models
Paper • 2410.18505 • Published • 10
Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models
Paper • 2404.02823 • Published • 2