Example to produce SQL without Entity Framework Core! #1361
Why
We've always claimed it's possible to use JsonApiDotNetCore without Entity Framework Core. Just implement your own resource service or repository, right?
There's an implementation for MongoDB using its LINQ provider, and there's an example that takes a LINQ expression, compiles it, and then executes it against an in-memory list.
But we never told you what it takes to translate complex JSON:API requests to SQL yourself. So let's put our money where our mouth is: this PR shows how it can be done!
What
This PR provides an implementation for most of the JsonApiDotNetCore features. It supports all JSON:API endpoints (including atomic operations) and query string parameters (both top-level and deeply nested), as well as custom resource definition callbacks.
`DapperRepository` implements `IResourceRepository` and uses Dapper to execute ADO.NET database queries and to materialize the returned result set into JSON:API resources. This example lets Entity Framework Core generate the database at startup (for convenience) but doesn't use it for serving requests.

Information about the underlying database model (tables, columns, and foreign keys) is needed to produce SQL. This is provided by `IDataModelService`. For convenience again, `FromEntitiesDataModelService` obtains that information from the Entity Framework Core model at startup, but feel free to plug in something else.
At a high level, `QueryLayer` is translated into a tree of `SqlTreeNode` objects representing the SQL query. `SqlQueryBuilder` takes that as input and produces SQL text from it. It's mostly SQL-92 compliant and supports PostgreSQL, MySQL, and SQL Server. Adapting it to your own flavor should be straightforward.
For example, a GET request such as `/todoItems?include=tags` (possibly combined with further query string parameters) gets translated by JsonApiDotNetCore (unchanged) into a `QueryLayer`, which `DapperRepository` (with the help of `SelectStatementBuilder`) then translates into SQL. For less involved requests, simpler SQL is produced where possible, without sub-queries.
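The PR itself embeds the concrete statements; as a rough stand-in (the column names and sort direction here are assumptions, not copied from the example project), a plain `GET /todoItems` might produce something along these lines:

```sql
-- Illustrative sketch only, not the generator's verbatim output.
SELECT t1."Id", t1."Description", t1."Priority", t1."LastModifiedAt"
FROM "TodoItems" AS t1
ORDER BY t1."Priority", t1."LastModifiedAt" DESC;
```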
In the SQL above, the ordering on `Priority` and `LastModifiedAt` originates from a resource definition.

Limitations
First of all, this is not a mature, battle-tested, and optimized implementation. If you can, please use Entity Framework Core instead.
That said, if you're not too concerned about performance or absolute correctness (there are likely bugs; please report them via issues or PRs), you're welcome to try it out or use it as an inspiration to implement your own data access.
The following limitations apply:
- Pagination of included to-many relationships isn't supported. It would require `JOIN LATERAL`/`OUTER APPLY` or `ROW_NUMBER() OVER (PARTITION BY ...)` (see the sketch after this list). I've spent a long time trying to pull it off but eventually gave up. I challenge you!
- No `[EagerLoad]` support. It could be done, but it's rarely used.
- `IResourceDefinition.OnRegisterQueryableHandlersForQueryStringParameters()` isn't supported. Because no `IQueryable` is involved, it doesn't apply.
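For context on the first bullet, the usual SQL idiom for paginating rows per parent looks roughly like this (a generic sketch, not code from this PR; table and column names are assumed):

```sql
-- Keep at most 10 tags per todo-item by numbering the rows within each parent.
SELECT "Id", "TodoItemId", "Name"
FROM (
    SELECT t1."Id", t1."TodoItemId", t1."Name",
           ROW_NUMBER() OVER (PARTITION BY t1."TodoItemId" ORDER BY t1."Id") AS "RowNumber"
    FROM "Tags" AS t1
) AS paged
WHERE "RowNumber" <= 10;
```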
Implementation

At a high level, there are many similarities with how Entity Framework Core performs the translation to SQL. I often struggled to grasp the patterns from its source code, so I inferred most of it through trial and error.
The tree of SQL nodes
In this example, all nodes derive from `SqlTreeNode`. Most of them are straightforward and don't require explanation.

All nodes are immutable, yet they expose members as read-only collections. This has two reasons, one of which is that the rendered SQL must be deterministic: entries in a `Dictionary<,>` with a string key enumerate in a stable order, which is not true with `ImmutableDictionary<,>`, because it relies on the non-deterministic `String.GetHashCode()`. We need to know the exact SQL in tests.
The abstract type `TableSourceNode` contains a list of `ColumnNode`s. Derived type `TableNode` represents a database table, while `SelectNode` represents a sub-query. `ColumnNode` is also abstract, with derived types `ColumnInTableNode` and `ColumnInSelectNode`. `SelectNode` contains a list of abstract `SelectorNode`s per table, with implementations `ColumnSelectorNode` (`SELECT t1.Name`), `CountSelectorNode` (`SELECT COUNT(*)`), and `OneSelectorNode` (`SELECT 1`). `ColumnSelectorNode` points to a `ColumnNode` (optionally aliased), so it can be a column in a table or a sub-query.
These abstract columns in `TableSourceNode` don't occur in the produced SQL. They are used to trace references back to an underlying database column. When a sub-query joins multiple tables, duplicate column names will be aliased to make them uniquely referenceable. In the example request above, `t7.Id00 DESC` points to the selector `t6."Id" AS Id00`, which points to the selector `t5."Id"`, which points to the `Id` column in the `Tags` table.

Another need for tracing references is that it's not always possible to remap in-place. A post-processor pulls stale references back into scope.
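Schematically, and heavily reduced (this is not the actual generated statement; the nesting is assumed), that alias chain looks like:

```sql
-- Reduced illustration: t5."Id" is re-exposed as Id00 one level up,
-- and the outermost query orders by that alias.
SELECT t7."Id", t7.Id00
FROM (
    SELECT t6."Id", t6."Id" AS Id00
    FROM (
        SELECT t5."Id"
        FROM "Tags" AS t5
    ) AS t6
) AS t7
ORDER BY t7.Id00 DESC;
```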
Joins and sub-queries
At a fundamental level, all tables are joined using `LEFT JOIN`. If the foreign key is defined at the left side of the JSON:API relationship and it's non-nullable, it gets optimized into an `INNER JOIN`, which is more efficient. This optimization is not applicable when joining with a nested `QueryLayer`. For example, todo-items without any tags must still be returned at `/todoItems?include=tags`.
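As a hedged illustration of that rule (the `Owner` relationship and all table/column names are assumptions, not taken from the PR): a required to-one relationship can be tightened to an inner join, while the included to-many relationship keeps its left join so that parents without children still appear.

```sql
-- Illustrative only. "OwnerId" is assumed to be a non-nullable foreign key,
-- so the join to People can become INNER JOIN. The join to Tags stays
-- LEFT JOIN; otherwise todo-items without tags would be filtered out.
SELECT t1."Id", t1."Description", t2."Id", t2."FirstName", t3."Id", t3."Name"
FROM "TodoItems" AS t1
INNER JOIN "People" AS t2 ON t1."OwnerId" = t2."Id"
LEFT JOIN "Tags" AS t3 ON t3."TodoItemId" = t1."Id";
```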
Initially, I thought another exception was needed for `has` and `count` in filters (see dotnet/efcore#32103). Ultimately, it comes down to interpreting what "null safe" means, so I chose to follow the Entity Framework Core behavior.

It is generally safest to join every include (or nested `QueryLayer`) as a sub-query. But that makes the SQL harder to read and slower to execute. Depending on the nested query layer shape, the use of a sub-query can be optimized into a simple join against a table. Determining whether that optimization can be applied is non-trivial when pagination is supported. Entity Framework Core is very flexible: it employs several techniques to push the current query down into a sub-query and pull it out again at various stages while processing the input.

In this example, we determine upfront whether a sub-query is needed. Orderings from sub-queries without pagination only need to appear in the top-level query. That just leaves filtering, which may constrain the set of related resources. So a sub-query is only produced if the nested query layer has a filter. Because of all the complexity in Entity Framework Core, it sometimes misses optimization opportunities, so we occasionally generate simpler SQL; there's an open issue for that.
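A sketch of that rule (names assumed; not the generator's verbatim output): an unfiltered include joins the related table directly, whereas a filtered include joins a sub-query that applies the filter first.

```sql
-- Illustrative only. Unfiltered include: plain join against the table.
SELECT t1."Id", t2."Id", t2."Name"
FROM "TodoItems" AS t1
LEFT JOIN "Tags" AS t2 ON t2."TodoItemId" = t1."Id";

-- Filtered include, e.g. ?include=tags&filter[tags]=startsWith(name,'A'):
-- the related table is wrapped in a sub-query that applies the filter.
SELECT t1."Id", t3."Id", t3."Name"
FROM "TodoItems" AS t1
LEFT JOIN (
    SELECT t2."Id", t2."TodoItemId", t2."Name"
    FROM "Tags" AS t2
    WHERE t2."Name" LIKE 'A%'
) AS t3 ON t3."TodoItemId" = t1."Id";
```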
Materialization
As mentioned earlier, Dapper is used to parse the result set into .NET objects. It scans the returned column names and starts a new object each time an `Id` property occurs. To make that work, we must feed the list of expected object types upfront. This is easy to determine from the requested includes.

We implement a `Map` method that Dapper calls with an array of all objects found in a single row. From there, we call `ResourceFieldAttribute.SetValue` repeatedly, while caching instances to preserve reference identity (which matches `NoTrackingWithIdentityResolution`). This is more flexible than the Entity Framework Core materializer, which requires everything to be ordered to support streaming. Therefore, you'll see that Entity Framework Core often adds `Id` to orderings to achieve total ordering (with the downside of potentially not fully using an index). We're not doing that, which reduces pressure on the database server.

A downside of Dapper usage is that column names in the result set must match property names exactly, so we use a post-processing step to compensate, mapping aliased column names such as `Id00` back to the property names they represent.
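To make the splitting concrete (a sketch with assumed names, not the actual projection): Dapper starts a new object at every column named after the `Id` property, so one row containing two `Id` columns materializes into a todo-item plus a related tag.

```sql
-- Illustrative only.
-- Columns up to (but excluding) the next "Id" bind to the first object (TodoItem);
-- from the second "Id" onward, they bind to the next object (Tag).
SELECT t1."Id", t1."Description", t1."Priority",
       t2."Id", t2."Name"
FROM "TodoItems" AS t1
LEFT JOIN "Tags" AS t2 ON t2."TodoItemId" = t1."Id";
```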
Resource/relationship updates
The tricky part is ensuring changes are sent in the right order so you won't hit a foreign key constraint violation. For example, updating a one-to-one relationship where the foreign key exists at the right side requires first updating a row in another table.
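For instance (a hedged sketch; the one-to-one relationship and all names are assumptions): when the foreign key lives in the related table, assigning the relationship on the left-side resource actually means updating rows in that other table, and the old row must be detached before the new one is attached.

```sql
-- Illustrative only: reassign a one-to-one relationship whose FK is on the right side.
-- Detach the currently related row first, so a unique constraint on the FK isn't violated.
UPDATE "LoginAccounts" SET "PersonId" = NULL WHERE "PersonId" = @personId;
UPDATE "LoginAccounts" SET "PersonId" = @personId WHERE "Id" = @newLoginAccountId;
```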
Some of our relationship updates are more efficient because we update/delete related records in one go using `WHERE "Id" IN (...)`, instead of issuing a SQL statement per match. On the other hand, the dynamic contents of `IN` reduce the effectiveness of execution plan caching, so I'm not sure it matters much.