ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing • Paper 2412.14711 • Published Dec 2024
Aria: An Open Multimodal Native Mixture-of-Experts Model • Paper 2410.05993 • Published Oct 8, 2024
Introducing RWKV - An RNN with the advantages of a transformer • Article • Published May 15, 2023
TransformerFAM: Feedback attention is working memory • Paper 2404.09173 • Published Apr 14, 2024
A Time Series is Worth 64 Words: Long-term Forecasting with Transformers • Paper 2211.14730 • Published Nov 27, 2022
Priority Sampling of Large Language Models for Compilers • Paper 2402.18734 • Published Feb 28, 2024