L-Hongbin's Collections
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Paper • 2411.11504 • Published • 19
Top-nσ: Not All Logits Are You Need
Paper • 2411.07641 • Published • 19
Adaptive Decoding via Latent Preference Optimization
Paper • 2411.09661 • Published • 10
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Paper • 2411.13476 • Published • 15
Hymba: A Hybrid-head Architecture for Small Language Models
Paper • 2411.13676 • Published • 40
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
Paper • 2411.15124 • Published • 58
Star Attention: Efficient LLM Inference over Long Sequences
Paper • 2411.17116 • Published • 48
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Paper • 2411.16489 • Published • 41
MH-MoE: Multi-Head Mixture-of-Experts
Paper • 2411.16205 • Published • 24
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
Paper • 2410.01131 • Published • 9
allenai/tulu-3-sft-mixture
Viewer • Updated • 939k • 4.46k • 94
CASIA-LM/ChineseWebText2.0
Viewer • Updated • 2k • 6.29k • 19
Yi-Lightning Technical Report
Paper • 2412.01253 • Published • 25
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 68
Weighted-Reward Preference Optimization for Implicit Model Fusion
Paper • 2412.03187 • Published • 9
Phi-4 Technical Report
Paper • 2412.08905 • Published • 97
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Paper • 2412.11605 • Published • 16
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Paper • 2412.13795 • Published • 18
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 337
A Post-Training Enhanced Optimization Approach for Small Language Models
Paper • 2411.02939 • Published
How to Synthesize Text Data without Model Collapse?
Paper • 2412.14689 • Published • 48
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922 • Published • 84
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought
Paper • 2412.17498 • Published • 21
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 44
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
Paper • 2412.16849 • Published • 7