Peter Szemraj PRO

pszemraj

https://pszemraj.carrd.co/

AI & ML interests

metallic intuition

Recent Activity

updated a dataset 4 days ago

BEE-spoke-data/govdocs1-by-extension

updated a dataset 4 days ago

BEE-spoke-data/cosmopedia-v2-mincols

pszemraj's activity

upvoted a paper 2 days ago

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Paper • 2412.18525 • Published 13 days ago • 63

upvoted a collection 12 days ago

Embedding Model Datasets

A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 67 items • Updated Jul 3, 2024 • 91

upvoted 4 papers 18 days ago

How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published 19 days ago • 48

LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks

Paper • 2412.15204 • Published 18 days ago • 33

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 18 days ago • 337

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published 20 days ago • 119

upvoted a collection 18 days ago

ModernBERT

Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 18 days ago • 116

upvoted a paper 18 days ago

Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published 25 days ago • 83

upvoted 2 papers 19 days ago

OmniPred: Language Models as Universal Regressors

Paper • 2402.14547 • Published Feb 22, 2024 • 12

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

Paper • 2404.07544 • Published Apr 11, 2024 • 19

upvoted 5 papers 26 days ago

Open-Sora Plan: Open-Source Large Video Generation Model

Paper • 2412.00131 • Published Nov 28, 2024 • 33

Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability

Paper • 2411.19943 • Published Nov 29, 2024 • 56

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

Paper • 2412.02592 • Published Dec 3, 2024 • 20

Evaluating Language Models as Synthetic Data Generators

Paper • 2412.03679 • Published Dec 4, 2024 • 46

Structured 3D Latents for Scalable and Versatile 3D Generation

Paper • 2412.01506 • Published Dec 2, 2024 • 50

upvoted 4 papers about 2 months ago

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Paper • 2411.10640 • Published Nov 16, 2024 • 44

Cut Your Losses in Large-Vocabulary Language Models

Paper • 2411.09009 • Published Nov 13, 2024 • 43

Personalization of Large Language Models: A Survey

Paper • 2411.00027 • Published Oct 29, 2024 • 31

BitNet a4.8: 4-bit Activations for 1-bit LLMs

Paper • 2411.04965 • Published Nov 7, 2024 • 64

upvoted a collection 2 months ago

SmolLM2

State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 15 items • Updated 15 days ago • 197