bukzor commented Oct 22, 2025

Summary

  • Escapes &, <, and > in HTML text nodes, which were previously written to the output unescaped
  • Prevents data corruption when round-tripping HTML through xq | xq -j

Changes

  • Added escapeTextContent() function for minimal entity escaping (&amp;, &lt;, &gt;)
  • Modified FormatHtml() to escape text nodes properly
  • Added tests verifying that the output is well-formed XML
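
The diff itself is not reproduced in this description, so the following is only a minimal sketch of what minimal text-node escaping could look like. The function name matches the one listed above, but the body and the `textEscaper` variable are assumptions, not the PR's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// textEscaper is a hypothetical helper. strings.NewReplacer scans the input
// once and never rescans its own output, so the "&" inside a freshly written
// "&amp;" is not escaped a second time.
var textEscaper = strings.NewReplacer(
	"&", "&amp;",
	"<", "&lt;",
	">", "&gt;",
)

// escapeTextContent is an illustrative stand-in for the function named in
// this PR. It escapes only the three characters the description mentions;
// quotes are left alone because they are legal inside text nodes.
func escapeTextContent(s string) string {
	return textEscaper.Replace(s)
}

func main() {
	fmt.Println(escapeTextContent("1 & 2 < 3 > 0"))
	// prints: 1 &amp; 2 &lt; 3 &gt; 0
}
```

Escaping only these three characters keeps the output close to the input while still being parseable as XML; attribute values would need additional handling for quotes, which this sketch does not cover.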

Test plan

  • Unit tests pass: go test ./...
  • New tests verify proper escaping of &, <, > in HTML text nodes
  • Tests confirm xq output can be parsed as XML (required for -j flag)
  • Verified tests fail without the fix (showing they detect the bug)
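
One way to express the "output can be parsed as XML" check is to run the output through Go's standard `encoding/xml` decoder until EOF. This is a sketch under that assumption; `isWellFormedXML` is a hypothetical helper name, and the PR's real tests may be structured differently.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// isWellFormedXML (hypothetical name) tokenizes s with encoding/xml and
// returns nil if the decoder reaches EOF, or the first syntax error it hits.
// The decoder is strict by default, so a bare "&" is rejected.
func isWellFormedXML(s string) error {
	dec := xml.NewDecoder(strings.NewReader(s))
	for {
		_, err := dec.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	// Escaped output, as produced after the fix.
	fmt.Println(isWellFormedXML("<html>1 &amp; 2</html>") == nil)
	// Unescaped output, as produced before the fix.
	fmt.Println(isWellFormedXML("<html>1 & 2</html>") == nil)
}
```

A test built this way fails on the pre-fix output and passes on the post-fix output, which is the property the last bullet of the test plan describes.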

Example

Before this fix:

echo '<html>1 &amp; 2</html>' | xq
# Output: <html>1 & 2</html>  (bare & causes parse error)
echo '<html>1 &amp; 2</html>' | xq | xq -j
# Error: invalid character entity & (no semicolon)

After this fix:

echo '<html>1 &amp; 2</html>' | xq
# Output: <html>1 &amp; 2</html>  (properly escaped)
echo '<html>1 &amp; 2</html>' | xq | xq -j
# Success: {"html": "1 & 2"}

🤖 Generated with Claude Code

HTML text nodes containing &, <, > were output without escaping,
causing xq's output to be unparseable when piped back through xq -j.

This commit adds:
- New escapeTextContent() function for minimal entity escaping
- Modified FormatHtml to escape text nodes with &amp;, &lt;, &gt;
- Tests verifying the output is valid XML

Example issue:
  echo '<html>1 &amp; 2</html>' | xq | xq -j
  # Before: Error - bare & in output
  # After: Success - properly escaped as &amp;

This is a critical fix preventing data corruption when round-tripping
HTML through xq.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>