- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
  Paper • 2310.13961 • Published • 4
- Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
  Paper • 2309.09582 • Published • 4
- Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
  Paper • 2310.13127 • Published • 11
- Evaluating the Robustness to Instructions of Large Language Models
  Paper • 2308.14306 • Published • 1

Collections including paper arxiv:2309.03450
- TRAMS: Training-free Memory Selection for Long-range Language Modeling
  Paper • 2310.15494 • Published • 1
- A Long Way to Go: Investigating Length Correlations in RLHF
  Paper • 2310.03716 • Published • 9
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 65
- Giraffe: Adventures in Expanding Context Lengths in LLMs
  Paper • 2308.10882 • Published • 1

- Attention Is All You Need
  Paper • 1706.03762 • Published • 50
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 16
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14

- MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
  Paper • 2310.09478 • Published • 19
- Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams
  Paper • 2310.08678 • Published • 12
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 243
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 13

- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
  Paper • 2309.04662 • Published • 22
- Neurons in Large Language Models: Dead, N-gram, Positional
  Paper • 2309.04827 • Published • 16
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
  Paper • 2309.05516 • Published • 9
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
  Paper • 2309.03907 • Published • 10

- XGen-7B Technical Report
  Paper • 2309.03450 • Published • 8
- FLM-101B: An Open LLM and How to Train It with $100K Budget
  Paper • 2309.03852 • Published • 44
- Robotic Table Tennis: A Case Study into a High Speed Learning System
  Paper • 2309.03315 • Published • 6
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 79