tianyi-lab/ChartAlignBench

📊 ChartAlignBench

📖 Paper | 📚 HuggingFace Dataset

This repo contains the official evaluation code and dataset for the paper "ChartAB: A Benchmark for Chart Grounding & Dense Alignment".

Highlights

  • 🔥 9,000+ instances for VLM evaluation on Dense Chart Grounding and Multi-Chart Alignment.
  • 🔥 Evaluation via a novel two-stage pipeline that decomposes each task into intermediate grounding followed by reasoning, yielding significant accuracy improvements.
  • 🔥 Evaluates both data and attribute understanding across diverse chart types and complexities.
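The two-stage "ground-then-reason" idea above can be sketched as follows. This is a minimal illustration, not the repo's actual evaluation code: `query_vlm` is a hypothetical placeholder for whatever VLM API you use, and the prompts are assumptions.

```python
def query_vlm(prompt, images):
    """Hypothetical VLM call; replace with a real API client."""
    raise NotImplementedError

def ground_then_reason(chart_a, chart_b, question, vlm=query_vlm):
    # Stage 1 (grounding): extract a structured representation of each chart,
    # so the model commits to the data before comparing anything.
    table_a = vlm("Extract the underlying data table as JSON.", [chart_a])
    table_b = vlm("Extract the underlying data table as JSON.", [chart_b])
    # Stage 2 (reasoning): answer over the grounded text rather than the
    # raw pixels, which is what reduces hallucination in the findings below.
    prompt = (f"Chart A data: {table_a}\n"
              f"Chart B data: {table_b}\n"
              f"Question: {question}")
    return vlm(prompt, [])
```

Direct inference would be a single `vlm(question, [chart_a, chart_b])` call; the pipeline trades two extra grounding calls for the accuracy gains reported below.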

Findings

  • 🔎 Performance degradation on complex charts: VLMs demonstrate strong data understanding on simple charts (e.g., bar, line, or numbered bar/line), but their performance drops substantially on complex types (e.g., 3D, box, radar, rose, or multi-axis charts) due to intricate layouts and component interactions.
  • 🔎 Weak attribute understanding: VLMs exhibit poor recognition of text styles (<20% accuracy for size/font), limited color perception (median RGB error >50), and strong spatial biases in legend positioning.
  • 🔎 Two-stage pipeline proves superior: The ground-then-reason approach consistently outperforms direct inference, reducing hallucinations through intermediate grounding steps.
  • 🔎 Poor grounding/alignment degrades downstream QA: Precise data grounding and alignment correlate positively with downstream QA accuracy, establishing dense chart understanding as essential for reliable reasoning performance.
  • 🔎 Scaling law holds for most alignment tasks: Larger models consistently outperform smaller ones on all but text-style alignment, where JSON-generation complexity leads to a high number of irregular failures.

Dataset

ChartAB is the first benchmark designed to comprehensively evaluate the dense-level understanding of VLMs on charts, focusing on two core aspects of chart content: data (the underlying values visualized by the chart) and attributes (visual properties shaping chart design, such as color, legend position, and text style). The benchmark consists of 9,000+ instances spanning 9 diverse chart types (bar, numbered bar, line, numbered line, 3D bar, box, radar, rose, and multi-axes charts) organized into three evaluation subsets. The Data Grounding & Alignment subset contains chart pairs that differ in data. The Attribute Grounding & Alignment subset contains chart pairs differing in attributes. The Robustness subset contains a collection of 5 chart pairs per instance, where each pair maintains an identical data difference but varies an attribute value (color, legend, or text style) across the pairs.
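To make the Robustness subset's structure concrete, the sketch below shows one plausible record layout and a check of the invariant described above (5 pairs per instance, same data difference, one varied attribute). The field names are hypothetical; the actual schema on HuggingFace may differ.

```python
# Hypothetical record layout for one Robustness-subset instance;
# field names are illustrative, not the dataset's actual schema.
robustness_instance = {
    "chart_type": "line",
    "varied_attribute": "legend",   # one of: color, legend, text style
    "pairs": [
        # 5 pairs with an identical data difference; only the
        # varied attribute changes from pair to pair.
        {"chart_a": f"inst0_pair{i}_a.png", "chart_b": f"inst0_pair{i}_b.png"}
        for i in range(5)
    ],
}

def is_valid_robustness_instance(inst):
    """Check the invariant stated above: exactly 5 pairs per instance,
    all varying the same single attribute."""
    return (len(inst["pairs"]) == 5
            and inst["varied_attribute"] in {"color", "legend", "text style"})
```

A robustness score can then compare a model's predicted data difference across the 5 pairs: a robust model should give the same answer regardless of the attribute change.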

VLMs' Results

Evaluation Demo
