Closed
Description
Summary
Add methods to FlowGraph and FlowFrame to print human-readable dependency trees and pipeline summaries, making it easier to inspect and debug data pipelines without opening the visual editor.
Problem
Currently, inspecting a FlowGraph's structure requires either:
- Opening the visual editor with ff.open_graph_in_editor()
- Reading the default string representation, which only lists node IDs and node types
# Current output is not very useful
print(large_pipeline)
# FlowGraph(Nodes: {13: Node id: 13 (manual_input), 14: Node id: 14 (polars_code), 15: Node id: 15 (select), 16: Node id: 16 (group_by)})
This makes it difficult to quickly understand pipeline structure during development and debugging, or in environments where the visual editor isn't practical (such as remote servers, CI/CD jobs, or shared code examples).
Proposed Solution
Add several methods to provide text-based pipeline visualization:
1. FlowGraph.print_tree()
- Dependency Tree View
pipeline.flow_graph.print_tree()
# Manual Input (id=13)
# └── Filter (id=14): quality_score > 0.9
# └── Select (id=15): ["id", "value", "category"]
# └── Group By (id=16): group_by=["category"], agg=[mean(value)]
# With descriptions
pipeline.flow_graph.print_tree(show_descriptions=True)
# Manual Input (id=13): "Load simulated data"
# └── Filter (id=14): "Reduces data early"
# └── Select (id=15): "Reduces columns early"
# └── Group By (id=16): "Aggregate by category"
# With schema information
pipeline.flow_graph.print_tree(show_schema=True)
# Manual Input (id=13) → [id: i64, quality_score: f64, value: i64, category: str]
# └── Filter (id=14) → [id: i64, quality_score: f64, value: i64, category: str]
# └── Select (id=15) → [id: i64, value: i64, category: str]
# └── Group By (id=16) → [category: str, value: f64]
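A minimal sketch of how such a tree view could be rendered, for discussion only. It does not use FlowGraph internals: the `labels` and `children` mappings and the `render_tree` helper are hypothetical stand-ins for whatever the graph exposes.

```python
from typing import Dict, List


def render_tree(labels: Dict[int, str], children: Dict[int, List[int]], root: int) -> str:
    """Render the nodes reachable from `root` as an indented text tree."""
    lines = [f"{labels[root]} (id={root})"]

    def walk(node: int, prefix: str) -> None:
        kids = children.get(node, [])
        for i, kid in enumerate(kids):
            last = i == len(kids) - 1
            connector = "└── " if last else "├── "
            lines.append(f"{prefix}{connector}{labels[kid]} (id={kid})")
            # Keep a vertical bar under non-final siblings so branches stay visually connected.
            walk(kid, prefix + ("    " if last else "│   "))

    walk(root, "")
    return "\n".join(lines)


# Hypothetical data mirroring the example pipeline above.
labels = {13: "Manual Input", 14: "Filter", 15: "Select", 16: "Group By"}
children = {13: [14], 14: [15], 15: [16]}
tree_text = render_tree(labels, children, 13)
print(tree_text)
```

In this sketch a node with several upstream parents (for example a join) would be printed once under each parent, so a real implementation would need to decide how to mark repeated nodes.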
2. FlowFrame.print_lineage()
- Current Node's Path
pipeline.print_lineage()
# Manual Input → Filter → Select → Group By (current)
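One way the lineage string could be built is by walking upstream from the current node. This is a sketch under the assumption that each node has at most one parent; the `parent` mapping and `lineage` function are hypothetical names, not existing FlowFrame API.

```python
from typing import Dict, List, Optional


def lineage(labels: Dict[int, str], parent: Dict[int, int], current: int) -> str:
    """Follow parent links from `current` back to the source and format the path."""
    path: List[str] = []
    node: Optional[int] = current
    while node is not None:
        path.append(labels[node])
        node = parent.get(node)  # None once we reach a node with no upstream parent
    path.reverse()
    path[-1] += " (current)"
    return " → ".join(path)


# Hypothetical data mirroring the example pipeline above.
labels = {13: "Manual Input", 14: "Filter", 15: "Select", 16: "Group By"}
parent = {14: 13, 15: 14, 16: 15}
lineage_text = lineage(labels, parent, 16)
print(lineage_text)
```

Nodes with multiple inputs would need a different output format, since a single arrow chain cannot represent a branching history.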
3. FlowGraph.print_execution_order()
- Execution Sequence
pipeline.flow_graph.print_execution_order()
# Execution order:
# 1. Manual Input (id=13)
# 2. Filter (id=14)
# 3. Select (id=15)
# 4. Group By (id=16)
# Compact format
pipeline.flow_graph.print_execution_order(compact=True)
# Execution order: [13→14→15→16]
# With estimated timing (if available)
pipeline.flow_graph.print_execution_order(show_timing=True)
# Execution order:
# 1. Manual Input (id=13) - ~0.1s
# 2. Filter (id=14) - ~0.3s
# 3. Select (id=15) - ~0.1s
# 4. Group By (id=16) - ~0.5s
# Total estimated time: ~1.0s
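The execution sequence amounts to a topological ordering of the graph. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) on a hypothetical `predecessors` mapping; how FlowGraph actually determines its order is not stated here, so treat this as an illustration of the idea only.

```python
from graphlib import TopologicalSorter

# Hypothetical mapping: node id -> set of node ids it depends on.
predecessors = {13: set(), 14: {13}, 15: {14}, 16: {15}}

# static_order() yields each node only after all of its predecessors.
order = list(TopologicalSorter(predecessors).static_order())

# Compact format, as in the compact=True example above.
compact = "Execution order: [" + "→".join(str(n) for n in order) + "]"
print(compact)

# Numbered format.
print("Execution order:")
for position, node_id in enumerate(order, start=1):
    print(f"{position}. id={node_id}")
```

For graphs with independent branches, several valid orders exist, so the printed order would only match the real execution order if both use the same tie-breaking rule.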
4. FlowGraph.print_summary()
- Node Details Summary
pipeline.flow_graph.print_summary()
# Pipeline Summary (4 nodes):
# ┌─────────────────────────────────────────────────────────────────┐
# │ Node 13 (manual_input): Manual Input                            │
# │ Description: Load simulated data                                │
# │ Output: 4 columns, 10000 rows                                   │
# │ Schema: [id: i64, quality_score: f64, value: i64, ...]          │
# ├─────────────────────────────────────────────────────────────────┤
# │ Node 14 (filter): Filter                                        │
# │ Description: Reduces data early                                 │
# │ Condition: quality_score > 0.9                                  │
# │ Output: 4 columns, ~2500 rows (estimated)                       │
# ├─────────────────────────────────────────────────────────────────┤
# │ Node 15 (select): Select                                        │
# │ Description: Reduces columns early                              │
# │ Columns: ["id", "value", "category"]                            │
# │ Output: 3 columns, ~2500 rows                                   │
# ├─────────────────────────────────────────────────────────────────┤
# │ Node 16 (group_by): Group By                                    │
# │ Description: Aggregate by category                              │
# │ Group by: ["category"], Agg: [mean(value)]                      │
# │ Output: 2 columns, 4 rows                                       │
# └─────────────────────────────────────────────────────────────────┘
# Compact summary
pipeline.flow_graph.print_summary(compact=True)
# Node 13: Manual Input → 4 cols, 10000 rows
# Node 14: Filter (quality_score > 0.9) → 4 cols, ~2500 rows
# Node 15: Select (3 columns) → 3 cols, ~2500 rows
# Node 16: Group By (category) → 2 cols, 4 rows
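To make the compact format concrete, here is a small sketch of how one summary line could be assembled. `NodeSummary` and its fields are hypothetical names introduced for illustration; the values come from the example output above.

```python
from dataclasses import dataclass


@dataclass
class NodeSummary:
    """Hypothetical per-node record; not an existing FlowGraph type."""
    node_id: int
    name: str
    n_cols: int
    n_rows: int
    rows_estimated: bool = False
    detail: str = ""

    def compact_line(self) -> str:
        detail = f" ({self.detail})" if self.detail else ""
        # Prefix estimated row counts with "~", as in the proposed output.
        rows = f"~{self.n_rows}" if self.rows_estimated else str(self.n_rows)
        return f"Node {self.node_id}: {self.name}{detail} → {self.n_cols} cols, {rows} rows"


summaries = [
    NodeSummary(13, "Manual Input", 4, 10000),
    NodeSummary(14, "Filter", 4, 2500, rows_estimated=True, detail="quality_score > 0.9"),
]
summary_lines = [s.compact_line() for s in summaries]
print("\n".join(summary_lines))
```

Keeping the estimated flag separate from the row count lets the same record feed both the boxed and the compact layouts.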