
Python JSONPath Version 2 #98


Draft: wants to merge 21 commits into main


jg-rp (Owner) commented Aug 9, 2025

Looking ahead to Python JSONPath version 2, this PR includes breaking changes to the Python API and some subtle changes to the default JSONPath syntax. We have:

  • Changed the lexer so it emits more punctuation and whitespace tokens. Previously, the lexer broadly skipped some punctuation and whitespace. Now the parser can make better decisions about when to accept whitespace and can do a better job of enforcing dots.
  • Rewritten the parser and its token stream. It should now be more correct and easier to read.
  • Changed the internal representation of JSONPath segments and selectors. We now model segments explicitly.
  • Renamed "fake root" to "pseudo-root".
  • Dropped support for unquoted property names in bracketed segments.
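To illustrate why emitting whitespace and dot tokens helps, here is a toy sketch. It is not the library's lexer or parser: the token kinds ROOT, DOT, WHITESPACE and NAME and the functions `lex` and `parse_names` are hypothetical, and only `$` followed by dot-shorthand names is handled. Because whitespace reaches the parser as a token, the parser can accept it between segments, as RFC 9535 allows, while rejecting it between a dot and its member name or at the end of the query.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    value: str
    index: int


# Hypothetical token kinds for a tiny subset of JSONPath.
TOKEN_RULES = [
    ("ROOT", r"\$"),
    ("DOT", r"\."),
    ("WHITESPACE", r"[ \t\n\r]+"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
]
SCANNER = re.compile("|".join(f"(?P<{kind}>{rule})" for kind, rule in TOKEN_RULES))


def lex(path: str) -> List[Token]:
    """Emit every token, including dots and whitespace, instead of skipping them."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(path):
        match = SCANNER.match(path, pos)
        if match is None:
            raise SyntaxError(f"unexpected {path[pos]!r} at index {pos}")
        tokens.append(Token(match.lastgroup, match.group(), pos))
        pos = match.end()
    return tokens


def parse_names(path: str) -> List[str]:
    """Return dot-shorthand member names, enforcing where whitespace may appear."""
    tokens = lex(path)
    if not tokens or tokens[0].kind != "ROOT":
        raise SyntaxError("expected '$' at index 0")
    names: List[str] = []
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok.kind == "WHITESPACE":
            # Whitespace is only allowed between segments, so a segment must follow.
            if nxt is None or nxt.kind != "DOT":
                raise SyntaxError(f"unexpected whitespace at index {tok.index}")
            i += 1
        elif tok.kind == "DOT":
            # The name must follow the dot immediately: "$. foo" is an error.
            if nxt is None or nxt.kind != "NAME":
                raise SyntaxError(f"expected a name directly after '.' at index {tok.index}")
            names.append(nxt.value)
            i += 2
        else:
            raise SyntaxError(f"unexpected {tok.value!r} at index {tok.index}")
    return names
```

With this sketch, `parse_names("$ .foo.bar")` returns `["foo", "bar"]`, while `parse_names("$. foo")` and `parse_names("$.foo ")` both raise `SyntaxError`. A lexer that silently dropped whitespace could not tell these cases apart.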

More changes to follow before release:

  • Implement the singular path selector.
  • Implement the keys filter selector.
  • Remove shorthand arguments to some selector classes. We no longer need them.
  • Improve leading and trailing whitespace handling options so users can choose how strict to be.
  • If available, use the regex package instead of re for match and search function extensions.
  • Document the singular path selector and the keys filter selector.
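The "if available" fallback for the regex package could look something like this minimal sketch. The `match` and `search` helper names and their behavior (full match versus search, and `False` for an invalid pattern, loosely mirroring RFC 9535's `match()` and `search()` function extensions) are assumptions for illustration. The sketch also skips any translation from I-Regexp (RFC 9485) syntax to Python's regex dialect. One practical difference between the two modules: the third-party `regex` package supports Unicode property escapes such as `\p{L}`, which the standard `re` module does not.

```python
# Prefer the third-party `regex` package when it is installed,
# falling back to the standard library's `re` module otherwise.
try:
    import regex as _re
except ImportError:
    import re as _re


def match(value: str, pattern: str) -> bool:
    """Hypothetical helper: True if the whole of `value` matches `pattern`."""
    try:
        return _re.fullmatch(pattern, value) is not None
    except _re.error:
        # Treat an invalid pattern as a non-match rather than raising.
        return False


def search(value: str, pattern: str) -> bool:
    """Hypothetical helper: True if `pattern` matches anywhere in `value`."""
    try:
        return _re.search(pattern, value) is not None
    except _re.error:
        return False
```

Both modules expose `fullmatch`, `search` and an `error` exception class, so the rest of the code does not need to know which one was imported.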

jg-rp (Owner, Author) commented Aug 10, 2025

Some JSONPath performance notes, recorded before attempting any new optimizations.

This benchmark runs many small JSONPath queries against small data.
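For readers unfamiliar with the "best of 3 rounds" style of measurement, here is a minimal sketch of such a harness using the standard library's `timeit.repeat`. It is not the contents of `scripts/benchmark.py`; the `benchmark` function and its `cases` argument (a list of query/data pairs) are hypothetical. Each round runs every case `number` times, and the fastest round is kept, following the `timeit` documentation's advice that slower rounds usually reflect interference from other processes rather than variability in Python itself.

```python
import timeit
from typing import Any, Callable, List, Tuple


def benchmark(
    run_query: Callable[[str, Any], Any],
    cases: List[Tuple[str, Any]],
    number: int = 100,
    rounds: int = 3,
) -> float:
    """Return the best total time, in seconds, over `rounds` rounds."""

    def one_pass() -> None:
        # One pass evaluates every (query, data) case once.
        for query, data in cases:
            run_query(query, data)

    # timeit.repeat returns one total per round, each covering `number` passes.
    return min(timeit.repeat(one_pass, number=number, repeat=rounds))
```

With python-jsonpath installed, `run_query` could be a function such as `jsonpath.findall`, which takes a path and data. A "just find" variant would instead compile each query once up front and time only the find step.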

Main branch (89c0e7e)

(python-jsonpath) james@Jamess-Mac-mini python-jsonpath % python scripts/benchmark.py 
repeating 436 queries 100 times, best of 3 rounds
compile and find               1.392
compile and find (values)      1.400
just compile                   0.917
just find                      0.392
just find (values)             0.395

v2 branch (e41ec29)

(python-jsonpath) james@Jamess-Mac-mini python-jsonpath % python scripts/benchmark.py
repeating 436 queries 100 times, best of 3 rounds
compile and find               1.461
compile and find (values)      1.471
just compile                   0.949
just find                      0.413
just find (values)             0.418
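Putting the two runs side by side, the relative change for each row is (v2 - main) / main. A quick sketch using the numbers reported above:

```python
# Timings copied from the two benchmark runs above.
main = {
    "compile and find": 1.392,
    "compile and find (values)": 1.400,
    "just compile": 0.917,
    "just find": 0.392,
    "just find (values)": 0.395,
}
v2 = {
    "compile and find": 1.461,
    "compile and find (values)": 1.471,
    "just compile": 0.949,
    "just find": 0.413,
    "just find (values)": 0.418,
}


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100


changes = {name: percent_change(main[name], v2[name]) for name in main}
for name, change in changes.items():
    print(f"{name:<26} {change:+.1f}%")
```

On these numbers, the v2 branch (e41ec29) is about 3.5% to 5.8% slower than main (89c0e7e), with "just compile" showing the smallest regression and "just find (values)" the largest.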

rob-ross (Contributor)

I am testing my lexer against your test_lex.py code. It's still a work in progress, but I have converted your test data into a JSON file. You can get it here.

The only changes I made are:

  1. I changed "fake root" to "pseudo root".
  2. I wrapped each test case in a dict/object with a single member, "Token". I think this makes the JSON file a little clearer, although it introduces a slight wrinkle in your deserialization.

I'll probably be converting more of your tests like this as I proceed. It would mean a little more work on your end to use them, since you'd have to write a load() method to deserialize them. But it would help us both in the long run: we could each capture new bugs in the same file without modifying any Python code. It would also help me as you add new features, because I could do test-driven development against updated versions of the file.
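A `load()` for the wrapped format might look like this minimal sketch. The JSON layout (a top-level "tests" array whose cases have "description", "path" and "want" keys, with each expected token wrapped as `{"Token": {...}}`, following the later description of wrapping the tokens) is an assumption for illustration, as are the `Token` and `Case` classes and the token kind names. The real file may differ.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    kind: str
    value: str
    index: int


@dataclass(frozen=True)
class Case:
    description: str
    path: str
    want: List[Token]


def load(text: str) -> List[Case]:
    """Deserialize hypothetical wrapped-token test cases from a JSON document."""
    doc = json.loads(text)
    cases: List[Case] = []
    for raw in doc["tests"]:
        # Each expected token is wrapped as {"Token": {...}}, so unwrap it first.
        tokens = [Token(**item["Token"]) for item in raw["want"]]
        cases.append(Case(raw["description"], raw["path"], tokens))
    return cases
```

The returned cases could then feed `pytest.mark.parametrize`, so new golden-file entries become new tests without any Python changes.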

I hope this is useful!

Rob

jg-rp (Owner, Author) commented Aug 13, 2025

> I have converted your test data into a json file.

Looks good 👍 I do like "golden files", especially when they apply to multiple projects.

Note that this pull request (on the v2 branch) changes the tokens produced by the lexer quite a bit. Don't feel obliged to follow v2 instead of main, but it does fix some of the inconsistencies you pointed out in our previous discussions. And with these changes, we will be able to configure JSONPath to strictly follow RFC 9535 without exception.

rob-ross (Contributor)

Well, it didn't take me long to sour on the idea of wrapping the tokens in a map. It literally doubles the amount of code I have to write in Java to deserialize it, lol. It also adds extra characters, and thus file size, to the JSON file. So I'm redoing it in a simpler JSON format, which will also make it easier to load in Python. I can migrate test_lex.py to use the JSON file. I'll probably work on it tomorrow, my time.
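To see why dropping the wrapper helps, here is a sketch comparing a wrapped encoding with a flatter one in which each token is a `[kind, value, index]` array. Both layouts and the token kind names are hypothetical; the thread doesn't say which simpler format was chosen. The flat form needs no unwrapping step and serializes to fewer characters.

```python
import json
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    value: str
    index: int


# Hypothetical tokens for the path "$.foo".
tokens = [Token("ROOT", "$", 0), Token("DOT", ".", 1), Token("NAME", "foo", 2)]

# Wrapped: every token becomes {"Token": {"kind": ..., "value": ..., "index": ...}}.
wrapped_json = json.dumps([{"Token": t._asdict()} for t in tokens])
# Flat: every token becomes a plain [kind, value, index] array.
flat_json = json.dumps([list(t) for t in tokens])


def load_wrapped(text: str) -> List[Token]:
    # One extra level of lookup per token.
    return [Token(**item["Token"]) for item in json.loads(text)]


def load_flat(text: str) -> List[Token]:
    # Positional fields map straight onto the NamedTuple.
    return [Token(*item) for item in json.loads(text)]


print(len(wrapped_json), len(flat_json))
```

The trade-off is that positional arrays are less self-describing; a plain object per token, without the "Token" wrapper, sits somewhere in between.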
