Collections including paper arxiv:2406.07522

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 605
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 96
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 104
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 43

- StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization
  Paper • 2311.14495 • Published • 1
- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
  Paper • 2401.09417 • Published • 59
- SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation
  Paper • 2401.13560 • Published • 1
- Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces
  Paper • 2402.00789 • Published • 2

- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
  Paper • 2401.09417 • Published • 59
- VMamba: Visual State Space Model
  Paper • 2401.10166 • Published • 38
- SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation
  Paper • 2401.13560 • Published • 1
- Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces
  Paper • 2402.00789 • Published • 2

- Trellis Networks for Sequence Modeling
  Paper • 1810.06682 • Published • 1
- Pruning Very Deep Neural Network Channels for Efficient Inference
  Paper • 2211.08339 • Published • 1
- LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch
  Paper • 2309.14157 • Published • 1
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 138

- Scaling MLPs: A Tale of Inductive Bias
  Paper • 2306.13575 • Published • 14
- Trap of Feature Diversity in the Learning of MLPs
  Paper • 2112.00980 • Published • 1
- Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics
  Paper • 2301.05816 • Published • 1
- RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?
  Paper • 2108.04384 • Published • 1

- Efficient Memory Management for Large Language Model Serving with PagedAttention
  Paper • 2309.06180 • Published • 25
- LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
  Paper • 2308.16137 • Published • 39
- Scaling Transformer to 1M tokens and beyond with RMT
  Paper • 2304.11062 • Published • 2
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 17

- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 87
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
  Paper • 2405.21060 • Published • 64
- Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
  Paper • 2405.20541 • Published • 22
- MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
  Paper • 2406.01574 • Published • 44

- mDPO: Conditional Preference Optimization for Multimodal Large Language Models
  Paper • 2406.11839 • Published • 37
- Pandora: Towards General World Model with Natural Language Actions and Video States
  Paper • 2406.09455 • Published • 15
- WPO: Enhancing RLHF with Weighted Preference Optimization
  Paper • 2406.11827 • Published • 14
- In-Context Editing: Learning Knowledge from Self-Induced Distributions
  Paper • 2406.11194 • Published • 15