Skip to content

DataLoader implementation #539

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Feb 1, 2018

Conversation

johnrutherford
Copy link
Member

Here is my DataLoader implementation. As mentioned in #537 and #264, this required changes to the DocumentExecuter to provide a hook for dispatching pending loaders.

Notes:

  1. I refactored the DocumentExecuter to use IExecutionStrategy. By default, queries will use the ParallelExecutionStrategy, mutations will use the SerialExecutionStrategy, and subscriptions will use the SubscriptionExecutionStrategy. The SelectExecutionStrategy() method can be overridden if necessary.
  2. This makes the SubscriptionExecuter obsolete. I haven't worked much with subscriptions yet, so this area may still need a little work.
  3. The ParallelExecutionStrategy still needs to be refactored to optimize "batchability."

@@ -3,9 +3,8 @@
# top-most EditorConfig file
root = true

# Unix-style newlines with a newline ending every file
# Editor default newlines with a newline ending every file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be kept at Unix-style.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it because it conflicts with the line ending settings in .gitattributes. On Windows, files are checked out with CRLF and normalized to LF in the repository. So all of the .cs files would have CRLF line endings for me except when adding new lines which would be LF. Kind of annoying. So I think removing this would be best for development on different OSes, but line endings still get normalized in the repository because of the setting in .gitattributes.

<TargetFrameworks>netstandard2.0;netstandard1.3;net45</TargetFrameworks>
</PropertyGroup>

</Project>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just merge this project into the core project so that there aren't multiple DLL's that have to get packaged.

Copy link
Member Author

@johnrutherford johnrutherford Jan 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I didn't realize we were talking about putting it all in to the main project. But I'm fine with that. I've always been a bit skeptical of the usefulness of a DataLoader outside of GraphQL, anyway. And it could always be separated out later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I put this in the main project, I'll need to target netstandard1.3 instead of netstandard1.1. Is that a problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its probably time to update to it. 🎉


namespace GraphQL.Execution
{
public class ParallelExecutionStrategy : ExecutionStrategy
Copy link
Member

@joemcbride joemcbride Jan 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is probably still a work in progress, though the way these are strategies are currently setup there is a lot of duplicate logic. That is the core of the GraphQL spec logic so I want to make sure that we can reduce that duplication as much as possible. Obviously my concern is that this would make it way too easy for a query or mutation to start behaving differently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. My plan was to make the parallel and serial logic completely separate for now. Then re-work the parallel logic, and then see what could be shared. But I could re-factor them to share more code in the meantime if you'd prefer.

<PackageReference Include="System.Reflection.TypeExtensions" Version="4.4.0" />
</ItemGroup>

</Project>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto on this one, merge it into the main project.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@johnrutherford
Copy link
Member Author

@joemcbride I just pushed some more changes for this.

I'm working on a separate branch for the ExecutionStrategy optimizations. It's almost done. I'll submit a separate pull request for it.

@joemcbride
Copy link
Member

Looking good! One problem I noticed is that the DataLoader tests aren't being ran on CI. This is because the build script only targets the GraphQL.Tests.

This script will need to be updated. I can work on getting that updated to easier support multiple test projects.

@johnrutherford
Copy link
Member Author

Ok. Or I can merge all of the tests into one project if that's easier. They're both testing the same library anyway.

@johnrutherford
Copy link
Member Author

My changes for optimizing the execution are ready. Once this pull request is merged, I'll submit another pull request since it obviously depends on this branch.

@joemcbride
Copy link
Member

joemcbride commented Feb 1, 2018

Got that updated. So you can add a line here for that project or merge it in. Either way, once the tests are running and green I can get this merged.

export default function testDotnet() {
return Promise.all([
test('./src/GraphQL.Tests')
])
}

@johnrutherford
Copy link
Member Author

Ok, all tests are included and passing now. :)

@joemcbride joemcbride merged commit 575bdf9 into graphql-dotnet:master Feb 1, 2018
@johnrutherford johnrutherford deleted the dataloader branch February 1, 2018 03:12
@TwitchBronBron
Copy link
Contributor

May I recommend adding DataLoaderContext as a property to ResolveFieldContext rather than having each app manage it themselves? This would reduce the boilerplate that each interested app would need to write. Each top-level field resolve would get their own instance, and that same instance would be passed along to all child fields.

It would be valuable for each top-level field resolver to have its own copy of the DataLoaderContext, because of the following scenario:

Here's the query.

{
    order1: order(orderId: 1) {
        orderId
        user {
            userId
        }
    }
    order2: order(orderId: 2) {
        orderId
        user {
            userId
            giantBlob
        }
    }
}

Notice that order1 is asking only for userId, while order2 is additionally asking for giantBlob. The way DataLoader is implemented right now, the fetchFunc will only be run ONCE, with both userIds in the same ID list, and there's no way to know what columns each order wanted. Ideally it would be run twice (once for order1, and once for order2) so that the fetchFunc can craft a query with only the desired columns.

Here is how I imagine this would look in code:

Field<UserType, User>()
    .Name("User")
    .ResolveAsync(ctx =>
        {
            ctx.DataLoader.GetOrAddBatchLoader<int, User>("GetUsersById",
                (IEnumerable<int> ids) =>
                {
                    var columnNames = ctx.SubFields.Keys;
                    //get a list of users, retrieving only the columns requested
                    return await users.GetUsersByIdAsync(ids, columnNames);
                });
            return loader.LoadAsync(ctx.Source.UserId);
        });

If we need to have the best of both worlds, there could be a setting on the schema indicating whether to create a new instance of DataLoader for each top-level field resolve, or if it should share the same DataLoader for the entire query.

@johnrutherford
Copy link
Member Author

@TwitchBronBron You should probably comment on issue #264 since this pull request has been merged already.

@TwitchBronBron
Copy link
Contributor

Will do. Thanks!

@jphenow
Copy link
Contributor

jphenow commented Feb 20, 2018

I've been looking around for a bit for this exact thing. Thanks y'all! Excited for 2.0 - nice work on this lib.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants