MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
Abstract
Recent advancements in foundation models have enhanced AI systems' capabilities in autonomous tool usage and reasoning. However, their ability to perform location- or map-based reasoning - which improves daily life by optimizing navigation, facilitating resource discovery, and streamlining logistics - has not been systematically studied. To bridge this gap, we introduce MapEval, a benchmark designed to assess diverse and complex map-based user queries with geo-spatial reasoning. MapEval features three task types (textual, API-based, and visual) that require collecting world information via map tools, processing heterogeneous geo-spatial contexts (e.g., named entities, travel distances, user reviews or ratings, images), and compositional reasoning, all of which state-of-the-art foundation models find challenging. Comprising 700 unique multiple-choice questions about locations across 180 cities and 54 countries, MapEval evaluates foundation models' ability to handle spatial relationships, map infographics, travel planning, and navigation challenges. Using MapEval, we conducted a comprehensive evaluation of 28 prominent foundation models. While no single model excelled across all tasks, Claude-3.5-Sonnet, GPT-4o, and Gemini-1.5-Pro achieved competitive performance overall. However, substantial performance gaps emerged in the agentic setting, where agents built on Claude-3.5-Sonnet outperformed those built on GPT-4o and Gemini-1.5-Pro by 16% and 21%, respectively, and the gaps widened further in comparison to open-source LLMs. Our detailed analyses provide insights into the strengths and weaknesses of current models, though all models still fall short of human performance by more than 20% on average, struggling with complex map images and rigorous geo-spatial reasoning. This gap highlights MapEval's critical role in advancing general-purpose foundation models with stronger geo-spatial understanding.
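Because MapEval is framed as multiple-choice questions, model evaluation reduces to exact-match accuracy over predicted option indices. The sketch below is a minimal, hypothetical scoring harness; the field names (`options`, `answer_idx`) are illustrative assumptions, not the actual dataset schema.

```python
# Hypothetical MCQ scoring sketch for a MapEval-style benchmark.
# The example schema ("question", "options", "answer_idx") is an assumption
# made for illustration and may differ from the released dataset.

def score_mcq(predictions, examples):
    """Return exact-match accuracy of predicted option indices."""
    if not examples:
        return 0.0
    correct = sum(
        1 for pred, ex in zip(predictions, examples)
        if pred == ex["answer_idx"]
    )
    return correct / len(examples)

# Toy data: two geo-spatial multiple-choice questions.
examples = [
    {"question": "Which of these landmarks is nearest to the Eiffel Tower?",
     "options": ["Louvre", "Musee d'Orsay", "Quai Branly", "Pompidou"],
     "answer_idx": 2},
    {"question": "Approximate driving time from the airport to downtown?",
     "options": ["10 min", "25 min", "40 min", "1 hr"],
     "answer_idx": 1},
]
predictions = [2, 0]  # first answer correct, second wrong
print(score_mcq(predictions, examples))  # -> 0.5
```

Reporting per-task accuracy (textual, API-based, visual) with the same metric is what makes the cross-model gaps in the abstract directly comparable.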
Community
We are excited to share our recent work titled "MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models".
Paper: https://arxiv.org/abs/2501.00316
Code: https://github.com/MapEval
Data: https://huggingface.co/MapEval
Homepage: https://mapeval.github.io/
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- MapQaTor: A System for Efficient Annotation of Map Query Datasets (2024)
- An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models (2024)
- Dspy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs (2024)
- EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios (2024)
- GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks (2024)
- Do Multimodal Language Models Really Understand Direction? A Benchmark for Compass Direction Reasoning (2024)
- RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics (2024)