kaizuberbuehler's Collections
Agents
updated
More Agents Is All You Need
Paper
•
2402.05120
•
Published
•
51
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper
•
2402.07456
•
Published
•
42
Generative Agents: Interactive Simulacra of Human Behavior
Paper
•
2304.03442
•
Published
•
12
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper
•
2310.04406
•
Published
•
8
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation
Paper
•
2312.13010
•
Published
•
4
GAIA: a benchmark for General AI Assistants
Paper
•
2311.12983
•
Published
•
187
LLM Agent Operating System
Paper
•
2403.16971
•
Published
•
65
Octopus v2: On-device language model for super agent
Paper
•
2404.01744
•
Published
•
56
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
Paper
•
2404.12753
•
Published
•
41
Scaling Instructable Agents Across Many Simulated Worlds
Paper
•
2404.10179
•
Published
•
27
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper
•
2404.07972
•
Published
•
46
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents
Paper
•
2404.05902
•
Published
•
20
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper
•
2404.05719
•
Published
•
83
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Paper
•
2404.03648
•
Published
•
24
Voyager: An Open-Ended Embodied Agent with Large Language Models
Paper
•
2305.16291
•
Published
•
9
LASER: LLM Agent with State-Space Exploration for Web Navigation
Paper
•
2309.08172
•
Published
•
11
The Rise and Potential of Large Language Model Based Agents: A Survey
Paper
•
2309.07864
•
Published
•
7
Reflexion: Language Agents with Verbal Reinforcement Learning
Paper
•
2303.11366
•
Published
•
4
LEGENT: Open Platform for Embodied Agents
Paper
•
2404.18243
•
Published
•
21
Diffusion for World Modeling: Visual Details Matter in Atari
Paper
•
2405.12399
•
Published
•
28
OpenVLA: An Open-Source Vision-Language-Action Model
Paper
•
2406.09246
•
Published
•
36
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
Paper
•
2305.17390
•
Published
•
2
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains
Paper
•
2407.18961
•
Published
•
40
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Paper
•
2407.18901
•
Published
•
33
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Paper
•
2407.21787
•
Published
•
12
OmniParser for Pure Vision Based GUI Agent
Paper
•
2408.00203
•
Published
•
24
WebArena: A Realistic Web Environment for Building Autonomous Agents
Paper
•
2307.13854
•
Published
•
24
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper
•
2407.20798
•
Published
•
24
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Paper
•
2408.00764
•
Published
•
1
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Paper
•
2408.07060
•
Published
•
41
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Paper
•
2408.06292
•
Published
•
118
SWE-bench-java: A GitHub Issue Resolving Benchmark for Java
Paper
•
2408.14354
•
Published
•
41
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments
Paper
•
2405.07960
•
Published
•
1
On the limits of agency in agent-based models
Paper
•
2409.10568
•
Published
•
13
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?
Paper
•
2409.07703
•
Published
•
67
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
Paper
•
2409.16299
•
Published
•
11
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
Paper
•
2411.10323
•
Published
•
31
Generative World Explorer
Paper
•
2411.11844
•
Published
•
75
Paper
•
2412.13501
•
Published
•
23
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Paper
•
2412.14161
•
Published
•
49
Large Action Models: From Inception to Implementation
Paper
•
2412.10047
•
Published
•
31
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Paper
•
2412.19723
•
Published
•
70
A3: Android Agent Arena for Mobile GUI Agents
Paper
•
2501.01149
•
Published
•
20
Training Software Engineering Agents and Verifiers with SWE-Gym
Paper
•
2412.21139
•
Published
•
16
ResearchTown: Simulator of Human Research Community
Paper
•
2412.17767
•
Published
•
12
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World
Paper
•
2412.17589
•
Published
•
12
Agent-SafetyBench: Evaluating the Safety of LLM Agents
Paper
•
2412.14470
•
Published
•
11
GenEx: Generating an Explorable World
Paper
•
2412.09624
•
Published
•
87
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
Paper
•
2412.09605
•
Published
•
26
The BrowserGym Ecosystem for Web Agent Research
Paper
•
2412.05467
•
Published
•
19
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper
•
2412.04454
•
Published
•
57
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
Paper
•
2412.04455
•
Published
•
37
MALT: Improving Reasoning with Multi-Agent LLM Training
Paper
•
2412.01928
•
Published
•
40
Mars-PO: Multi-Agent Reasoning System Preference Optimization
Paper
•
2411.19039
•
Published
•
1
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
Paper
•
2410.22304
•
Published
•
17
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation
Paper
•
2411.17636
•
Published
•
2
Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models
Paper
•
2410.20007
•
Published
•
1
Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay
Paper
•
2410.12236
•
Published
•
1
Large Language Model-Brained GUI Agents: A Survey
Paper
•
2411.18279
•
Published
•
29
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper
•
2411.17465
•
Published
•
77
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Paper
•
2411.06559
•
Published
•
12
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation
Paper
•
2411.04999
•
Published
•
17
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper
•
2411.03562
•
Published
•
64