- Length Generalization of Causal Transformers without Position Encoding
  Paper • 2404.12224 • Published • 1
- Transformer Language Models without Positional Encodings Still Learn Positional Information
  Paper • 2203.16634 • Published • 5
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2
- The Impact of Positional Encoding on Length Generalization in Transformers
  Paper • 2305.19466 • Published • 2