SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning Paper • 2409.05556 • Published Sep 9, 2024 • 2
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published Sep 6, 2024 • 44
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Paper • 2409.15277 • Published Sep 23, 2024 • 35
Learning Task Decomposition to Assist Humans in Competitive Programming Paper • 2406.04604 • Published Jun 7, 2024 • 4
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding Paper • 2408.15545 • Published Aug 28, 2024 • 35
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28, 2024 • 97
BERTopic: Neural topic modeling with a class-based TF-IDF procedure Paper • 2203.05794 • Published Mar 11, 2022
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction Paper • 2410.21169 • Published Oct 28, 2024 • 30
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization Paper • 2410.08815 • Published Oct 11, 2024 • 44
JudgeBench: A Benchmark for Evaluating LLM-based Judges Paper • 2410.12784 • Published Oct 16, 2024 • 43
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Paper • 2410.10814 • Published Oct 14, 2024 • 49
MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making Paper • 2409.16686 • Published Sep 25, 2024 • 10
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines Paper • 2310.03714 • Published Oct 5, 2023 • 33
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery Paper • 2410.05080 • Published Oct 7, 2024 • 20
The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends Paper • 2409.14195 • Published Sep 21, 2024 • 12
Enhancing Structured-Data Retrieval with GraphRAG: Soccer Data Case Study Paper • 2409.17580 • Published Sep 26, 2024 • 8
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking Paper • 2409.15268 • Published Sep 23, 2024 • 13
LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench Paper • 2409.13373 • Published Sep 20, 2024 • 3
Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs Paper • 2408.00114 • Published Jul 31, 2024
LLMs Will Always Hallucinate, and We Need to Live With This Paper • 2409.05746 • Published Sep 9, 2024 • 3
Planning In Natural Language Improves LLM Search For Code Generation Paper • 2409.03733 • Published Sep 5, 2024
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation Paper • 2409.06820 • Published Sep 10, 2024 • 64
Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation Paper • 2409.03271 • Published Sep 5, 2024 • 2
Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges Paper • 2408.08946 • Published Aug 16, 2024 • 11
Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks Paper • 2410.24032 • Published Oct 31, 2024 • 9
AAAR-1.0: Assessing AI's Potential to Assist Research Paper • 2410.22394 • Published Oct 29, 2024 • 14