Creating OSM-compatible mappings directly from multiple sequences of street-level imagery, particularly 360-degree images, would be a groundbreaking step toward simplifying geospatial data generation and improving mapping accuracy. This project is a benchmark for evaluating the performance of large language models (LLMs) on mapping tasks: it tests their ability to generate structured mappings and automate mapping workflows efficiently from this kind of visual input.
- Photos: This directory contains street-level imagery. Each image follows the naming convention `{sequence_id}_{sequence_index}.png`, where `sequence_id` is the unique identifier of a car trip and `sequence_index` is the numerical position of the photo within that sequence. Together they form the unique identifier of a photo (a small filename-parsing sketch follows this list).
- Metadata: This directory includes metadata related to the photos, the sequences of photos per way, and ground-truth annotations at the way level. We also provide several LLM predictions, following the naming convention `predictions_*.csv`.
- Demo Utilities: This directory contains a demo notebook showcasing a possible approach for creating predictions from sequences of street-level imagery.
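Based on that naming convention, a filename can be split back into its sequence id and index. A minimal sketch (the helper and the example filename are hypothetical, not part of the repository), assuming the index is everything after the last underscore:

```python
from pathlib import Path

def parse_photo_filename(path: str) -> tuple[str, int]:
    """Split '{sequence_id}_{sequence_index}.png' into (sequence_id, sequence_index)."""
    stem = Path(path).stem                              # e.g. "a1b2c3_0042"
    sequence_id, sequence_index = stem.rsplit("_", 1)   # split on the last underscore
    return sequence_id, int(sequence_index)

print(parse_photo_filename("photos/a1b2c3_0042.png"))   # ('a1b2c3', 42)
```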
The evaluation follows these rules for classifying predictions. These rules apply consistently to all OSM tags (name, oneway, lanes, maxspeed, turn:lanes, etc.):
- True Positive (TP): Both ground truth and prediction have the same non-null value
  - Example: GT="Main Street", Pred="Main Street" → TP
  - Example: GT="yes", Pred="yes" (for oneway) → TP
- False Positive (FP):
  - Prediction has a value but ground truth is null/empty
    - Example: GT=empty, Pred="Extra Street" → FP
  - Mismatch: Both have values but they don't match (also counts as FN)
    - Example: GT="Elm Street", Pred="Wrong Street" → FP (and FN)
- False Negative (FN):
  - Ground truth has a value but prediction is null/empty
    - Example: GT="Park Road", Pred=empty → FN
  - Mismatch: Both have values but they don't match (also counts as FP)
    - Example: GT="Elm Street", Pred="Wrong Street" → FN (and FP)
  - Missing OSMID: The OSMID exists in ground truth but not in predictions → FN
- True Negative (TN): Both ground truth and prediction are null/empty
  - These are filtered out and not counted in the metrics
- Outer Merge: The evaluation uses an outer merge (illustrated in the sketch after this list) to ensure that:
  - OSMIDs in ground truth but not in predictions are captured (FN)
  - OSMIDs in predictions but not in ground truth are captured (FP)
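The rules above can be summarized in a short sketch. This is only an illustration of the classification logic and of how the outer merge exposes missing OSMIDs, not the code used by `eval.py`; the column names (`osmid`, `name`) and the toy data are assumptions:

```python
import pandas as pd

def classify_pair(gt_value, pred_value):
    """Classify one (ground truth, prediction) pair for a single OSM tag.

    Returns a set of labels because a mismatch counts as both FP and FN.
    Null values are represented here as None. Illustrative sketch only.
    """
    if gt_value is None and pred_value is None:
        return {"TN"}               # filtered out, not counted in the metrics
    if gt_value is None:
        return {"FP"}               # predicted a value the ground truth lacks
    if pred_value is None:
        return {"FN"}               # ground truth value that was never predicted
    if gt_value == pred_value:
        return {"TP"}
    return {"FP", "FN"}             # mismatch penalizes both precision and recall

print(classify_pair("Elm Street", "Wrong Street"))   # -> both FP and FN

# Toy data: osmid 3 exists only in the ground truth, osmid 4 only in the predictions.
gt = pd.DataFrame({"osmid": [1, 2, 3], "name": ["Main Street", "Elm Street", "Park Road"]})
pred = pd.DataFrame({"osmid": [1, 2, 4], "name": ["Main Street", "Wrong Street", "Extra Street"]})

# The outer merge keeps both sides, so missing OSMIDs show up as NaN values:
# osmid 3 -> name_pred is NaN => FN; osmid 4 -> name_gt is NaN => FP.
merged = gt.merge(pred, on="osmid", how="outer", suffixes=("_gt", "_pred"))
print(merged)
```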
Key Points:
- These rules apply to ALL OSM tags: name, oneway, lanes, lanes:forward, lanes:backward, maxspeed, turn:lanes, etc.
- Mismatches count as BOTH FP and FN: a wrong prediction (e.g., GT="Elm Street", Pred="Wrong Street", or GT="yes", Pred="no" for oneway) is penalized in both precision (FP) and recall (FN)
- Special handling for the 'name' field: street names are compared after preprocessing (lowercasing, removing punctuation) to handle variations like "Main St." vs "Main Street" (see the sketch after this list)
- All other fields: direct value comparison (exact match required)
- Precision = TP / (TP + FP): measures how many of the predictions are correct
- Recall = TP / (TP + FN): measures how many of the ground truth values were found
- F1 = 2 × (Precision × Recall) / (Precision + Recall)
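As a concrete illustration of the name preprocessing and the metric formulas, here is a minimal sketch; the helper names (`normalize_name`, `compute_metrics`) are assumptions and not necessarily what `eval.py` uses:

```python
import string

def normalize_name(value: str) -> str:
    """Lowercase and strip punctuation so that "Main St." compares equal to "main st"."""
    return value.lower().translate(str.maketrans("", "", string.punctuation)).strip()

def compute_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from the TP/FP/FN counts defined above."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

assert normalize_name("Main St.") == normalize_name("main st")
print(compute_metrics(tp=8, fp=2, fn=2))   # ≈ (0.8, 0.8, 0.8)
```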
Tools for evaluating predictions and generating metrics:
- `eval.py`

  Provides a streamlined command-line evaluation method. It generates `.csv` and `.md` files in the `evaluation_results` directory containing feature-specific and general metrics.

  Usage: `python eval.py path/to/predictions.csv [id_suffix] [--gt-path path/to/ground_truth.csv] [--test]`

  - `path/to/predictions.csv`: Path to the predictions file in `.csv` format.
  - `id_suffix` (optional): A custom identifier for the evaluation. If not provided, a default identifier is used.
  - `--gt-path` (optional): Path to the ground truth CSV. Defaults to `metadata/ground_truth.csv`.
  - `--test` (optional): Enables test mode with assertions for validation.

  Output: The script generates metrics files with the following columns:

  - `osm_tag`: The OSM tag/attribute name
  - `occurrences`: Number of non-null values in ground truth
  - `tp`: True Positives count
  - `fp`: False Positives count
  - `fn`: False Negatives count
  - `precision`: Precision score
  - `recall`: Recall score
  - `f1`: F1 score
- `interactive_eval_notebook.ipynb`

  Contains interactive evaluation and visualization utilities at the feature level.
The `evaluation_results` directory contains the metrics and reports generated after running the evaluation scripts.
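For example, one of the generated metrics files can be loaded and inspected like this (the filename below is a placeholder; use whatever file `eval.py` actually wrote to `evaluation_results/` for your run):

```python
import pandas as pd

# Placeholder filename: the real name depends on the id_suffix passed to eval.py.
metrics = pd.read_csv("evaluation_results/metrics_example.csv")

# Most frequent tags first, with their precision/recall/F1.
cols = ["osm_tag", "occurrences", "precision", "recall", "f1"]
print(metrics.sort_values("occurrences", ascending=False)[cols])
```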
- Download `photos.zip` from here and extract its contents to `./photos`.
- Details about the metadata files, feature-specific map-making, and other related information can be accessed in this documentation.
To set up the environment:

    conda create -n "automapper"
    conda activate automapper
    pip install -r requirements.txt

Feel free to contribute by improving the benchmarks.
To validate that the evaluation script works correctly, run the comprehensive test suite:
    python eval.py ../metadata/test_predictions_comprehensive.csv test_comprehensive --gt-path ../metadata/test_ground_truth.csv --test

This command:

- Evaluates predictions against a carefully designed test ground truth with 14 test cases
- Covers all evaluation scenarios: true positives, false positives, false negatives, mismatches, and missing/extra OSMIDs
- Enables test mode (the `--test` flag), which runs assertion-based validation to ensure the metrics are calculated correctly
- Verifies that mismatches count as both FP and FN (as per the evaluation rules)
- Validates the outer merge functionality for handling missing OSMIDs
- See `metadata/TEST_CASES.md` for a detailed breakdown of all test scenarios
If the evaluation script is working correctly, all assertions will pass silently. Any errors indicate issues with the evaluation logic.
- We updated the repository with more data. Download `extra_photos.zip` from here.