Collections
Community collections that include the paper arXiv:2406.04823 (BERTs are Generative In-Context Learners):

Collection 1
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings (Paper • 2305.13571 • 2 upvotes)
- BERTs are Generative In-Context Learners (Paper • 2406.04823 • 1 upvote)
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference (Paper • 2412.13663 • 120 upvotes)
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Paper • 2006.11477 • 5 upvotes)

Collection 2
- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models (Paper • 2311.00871 • 2 upvotes)
- Can large language models explore in-context? (Paper • 2403.15371 • 32 upvotes)
- Data Distributional Properties Drive Emergent In-Context Learning in Transformers (Paper • 2205.05055 • 2 upvotes)
- Long-context LLMs Struggle with Long In-context Learning (Paper • 2404.02060 • 36 upvotes)

Collection 3
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining (Paper • 1901.08746 • 3 upvotes)
- Pretraining-Based Natural Language Generation for Text Summarization (Paper • 1902.09243 • 2 upvotes)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (Paper • 1907.11692 • 7 upvotes)
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention (Paper • 2006.03654 • 3 upvotes)

Collection 4
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Paper • 1810.04805 • 16 upvotes)
- Transformers Can Achieve Length Generalization But Not Robustly (Paper • 2402.09371 • 13 upvotes)
- Triple-Encoders: Representations That Fire Together, Wire Together (Paper • 2402.12332 • 2 upvotes)
- BERTs are Generative In-Context Learners (Paper • 2406.04823 • 1 upvote)
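
A listing like the one above can also be retrieved programmatically. The following is a minimal sketch using the huggingface_hub client's list_collections helper; the "papers/2406.04823" item string format is an assumption, not something stated on this page.

```python
# Minimal sketch: list Hugging Face collections that contain a given paper.
# Assumes a recent huggingface_hub release; the "papers/<arXiv id>" item
# filter format is an assumption.
from huggingface_hub import list_collections

for collection in list_collections(item="papers/2406.04823", limit=10):
    # Each Collection object exposes a title, a slug, and an upvote count.
    print(f"{collection.title} ({collection.slug}) • {collection.upvotes} upvotes")
```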