- Qualitatively characterizing neural network optimization problems
  Paper • 1412.6544 • Published • 4
- Convergent Learning: Do different neural networks learn the same representations?
  Paper • 1511.07543 • Published • 2
- Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
  Paper • 1909.11299 • Published • 1
- Model Fusion via Optimal Transport
  Paper • 1910.05653 • Published • 1
Collections
Collections including paper arxiv:2203.05482
- Meta-Learning a Dynamical Language Model
  Paper • 1803.10631 • Published
- TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation
  Paper • 2003.11963 • Published
- BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model
  Paper • 2212.04960 • Published • 1
- Continuous Learning in a Hierarchical Multiscale Neural Network
  Paper • 1805.05758 • Published • 1
- Qualitatively characterizing neural network optimization problems
  Paper • 1412.6544 • Published • 4
- Averaging Weights Leads to Wider Optima and Better Generalization
  Paper • 1803.05407 • Published • 2
- Merging Models with Fisher-Weighted Averaging
  Paper • 2111.09832 • Published • 1
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Paper • 2203.05482 • Published • 6
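The collection above centers on weight averaging: the "model soup" recipe averages the parameters of several models fine-tuned from the same initialization, element-wise. A minimal sketch of a uniform soup is below; the plain-dict "state dict" format and the helper name `uniform_soup` are illustrative assumptions, not code from any of the listed papers.

```python
# Minimal sketch of a uniform "model soup": element-wise average of the
# weights of several fine-tuned checkpoints that share one architecture.
# Plain dicts of float lists stand in for real framework state dicts.

def uniform_soup(state_dicts):
    """Average a list of state dicts parameter-by-parameter."""
    n = len(state_dicts)
    return {
        key: [sum(vals) / n for vals in zip(*(sd[key] for sd in state_dicts))]
        for key in state_dicts[0]
    }

# Toy example: three "models" with a single two-element weight vector each.
soup = uniform_soup([
    {"w": [1.0, 2.0]},
    {"w": [3.0, 4.0]},
    {"w": [5.0, 6.0]},
])
print(soup["w"])  # [3.0, 4.0]
```

Because averaging happens purely in weight space, the merged model is a single network of the original size, so inference cost does not grow with the number of ingredients (the property the soup paper's title highlights).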
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Paper • 2203.05482 • Published • 6
- Diverse Weight Averaging for Out-of-Distribution Generalization
  Paper • 2205.09739 • Published • 1
- Fusing finetuned models for better pretraining
  Paper • 2204.03044 • Published • 5
- Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
  Paper • 2309.07311 • Published • 3
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Paper • 2203.05482 • Published • 6
- Editing Models with Task Arithmetic
  Paper • 2212.04089 • Published • 6
- Resolving Interference When Merging Models
  Paper • 2306.01708 • Published • 13
- Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
  Paper • 2311.03099 • Published • 28
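The last collection revolves around merging by editing weights directly. Task arithmetic, as named in "Editing Models with Task Arithmetic", defines a task vector as the fine-tuned weights minus the pretrained weights; adding or subtracting scaled task vectors composes or removes tasks. The sketch below uses flat lists of floats in place of real model parameters, and the helper names are illustrative assumptions.

```python
# Sketch of task arithmetic: a task vector is (fine-tuned - pretrained),
# and adding scaled task vectors to the pretrained weights merges tasks.
# Flat float lists stand in for real model parameters (an assumption).

def task_vector(pretrained, finetuned):
    """Difference between fine-tuned and pretrained weights."""
    return [f - p for p, f in zip(pretrained, finetuned)]

def apply_task_vectors(pretrained, vectors, scale=1.0):
    """Add each scaled task vector to the pretrained weights."""
    out = list(pretrained)
    for vec in vectors:
        out = [w + scale * v for w, v in zip(out, vec)]
    return out

base = [0.0, 0.0]
task_a = task_vector(base, [1.0, 0.0])   # model fine-tuned on task A
task_b = task_vector(base, [0.0, 2.0])   # model fine-tuned on task B
merged = apply_task_vectors(base, [task_a, task_b])
print(merged)  # [1.0, 2.0]
```

Negating a task vector (`scale=-1.0`) is the "forgetting" direction of the same arithmetic, while the interference-resolution and Super Mario papers in this collection study how to combine such vectors when they conflict.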