DataLoader implementation #539

johnrutherford · 2018-01-28T02:26:18Z

Here is my DataLoader implementation. As mentioned in #537 and #264, this required changes to the DocumentExecuter to provide a hook for dispatching pending loaders.

Notes:

I refactored the DocumentExecuter to use IExecutionStrategy. By default, queries will use the ParallelExecutionStrategy, mutations will use the SerialExecutionStrategy, and subscriptions will use the SubscriptionExecutionStrategy. The SelectExecutionStrategy() method can be overridden if necessary.
This makes the SubscriptionExecuter obsolete. I haven't worked much with subscriptions yet, so this area may still need a little work.
The ParallelExecutionStrategy still needs to be refactored to optimize "batchability."
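
To make the selection rule in these notes concrete, here is a minimal sketch. Every type in it (`OperationKind`, `IExecutionStrategySketch`, `ExecuterSketch`, and the strategy classes) is a hypothetical stand-in, not a type from this pull request, and the parameter taken by `SelectExecutionStrategy` is an assumption, since the notes do not show the real signature.

```csharp
using System;

// Hypothetical stand-ins for illustration; these are not the library's actual types.
public enum OperationKind { Query, Mutation, Subscription }

public interface IExecutionStrategySketch
{
    string Describe();
}

public sealed class ParallelSketch : IExecutionStrategySketch
{
    public string Describe() => "parallel";
}

public sealed class SerialSketch : IExecutionStrategySketch
{
    public string Describe() => "serial";
}

public sealed class SubscriptionSketch : IExecutionStrategySketch
{
    public string Describe() => "subscription";
}

public class ExecuterSketch
{
    // Virtual so that a subclass can substitute its own strategy, mirroring the
    // overridable SelectExecutionStrategy() described in the notes.
    // The OperationKind parameter is an assumption made for this sketch.
    protected virtual IExecutionStrategySketch SelectExecutionStrategy(OperationKind kind)
    {
        switch (kind)
        {
            case OperationKind.Query: return new ParallelSketch();
            case OperationKind.Mutation: return new SerialSketch();
            case OperationKind.Subscription: return new SubscriptionSketch();
            default: throw new ArgumentOutOfRangeException(nameof(kind));
        }
    }

    public string Execute(OperationKind kind) => SelectExecutionStrategy(kind).Describe();
}

// Example override: force serial execution for every operation type.
public sealed class AlwaysSerialExecuter : ExecuterSketch
{
    protected override IExecutionStrategySketch SelectExecutionStrategy(OperationKind kind)
        => new SerialSketch();
}
```

With this shape, the default mapping lives in one place and an application that needs different behavior only overrides the selection method.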

joemcbride · 2018-01-28T03:47:29Z

.editorconfig

@@ -3,9 +3,8 @@
 # top-most EditorConfig file
 root = true

-# Unix-style newlines with a newline ending every file
+# Editor default newlines with a newline ending every file


This should be kept at Unix-style.

I changed it because it conflicts with the line ending settings in .gitattributes. On Windows, files are checked out with CRLF and normalized to LF in the repository. So all of the .cs files would have CRLF line endings for me except when adding new lines which would be LF. Kind of annoying. So I think removing this would be best for development on different OSes, but line endings still get normalized in the repository because of the setting in .gitattributes.

joemcbride · 2018-01-28T03:48:41Z

src/DataLoader/DataLoader.csproj

+    <TargetFrameworks>netstandard2.0;netstandard1.3;net45</TargetFrameworks>
+  </PropertyGroup>
+
+</Project>


I would just merge this project into the core project so that there aren't multiple DLLs that have to get packaged.

Ok, I didn't realize we were talking about putting it all into the main project. But I'm fine with that. I've always been a bit skeptical of the usefulness of a DataLoader outside of GraphQL, anyway. And it could always be separated out later.

If I put this in the main project, I'll need to target netstandard1.3 instead of netstandard1.1. Is that a problem?

It's probably time to update to it. 🎉

joemcbride · 2018-01-28T03:55:30Z

src/GraphQL/Execution/ParallelExecutionStrategy.cs

+
+namespace GraphQL.Execution
+{
+    public class ParallelExecutionStrategy : ExecutionStrategy


I know this is probably still a work in progress, but the way these strategies are currently set up, there is a lot of duplicate logic. That is the core of the GraphQL spec logic, so I want to make sure that we can reduce that duplication as much as possible. My concern is that this would make it way too easy for a query or mutation to start behaving differently.

Yeah, I agree. My plan was to make the parallel and serial logic completely separate for now. Then re-work the parallel logic, and then see what could be shared. But I could re-factor them to share more code in the meantime if you'd prefer.

joemcbride · 2018-01-28T04:17:38Z

src/GraphQL.DataLoader/GraphQL.DataLoader.csproj

+      <PackageReference Include="System.Reflection.TypeExtensions" Version="4.4.0" />
+  </ItemGroup>
+
+</Project>


Ditto on this one, merge it into the main project.

johnrutherford · 2018-01-31T18:39:24Z

@joemcbride I just pushed some more changes for this.

I'm working on a separate branch for the ExecutionStrategy optimizations. It's almost done. I'll submit a separate pull request for it.

joemcbride · 2018-01-31T18:43:18Z

Looking good! One problem I noticed is that the DataLoader tests aren't being run on CI. This is because the build script only targets GraphQL.Tests.

This script will need to be updated. I can work on updating it to more easily support multiple test projects.

johnrutherford · 2018-01-31T22:34:39Z

Ok. Or I can merge all of the tests into one project if that's easier. They're both testing the same library anyway.

johnrutherford · 2018-01-31T23:47:23Z

My changes for optimizing the execution are ready. Once this pull request is merged, I'll submit another pull request since it obviously depends on this branch.

joemcbride · 2018-02-01T00:51:32Z

Got that updated. So you can add a line here for that project or merge it in. Either way, once the tests are running and green I can get this merged.

graphql-dotnet/tools/tasks/test.dotnet.js

Lines 13 to 17 in 8dc9227

    
    export default function testDotnet() {
      return Promise.all([
        test('./src/GraphQL.Tests')
      ])
    }

johnrutherford · 2018-02-01T02:00:37Z

Ok, all tests are included and passing now. :)

TwitchBronBron · 2018-02-02T14:15:05Z

May I recommend adding DataLoaderContext as a property to ResolveFieldContext rather than having each app manage it itself? This would reduce the boilerplate that each interested app would need to write. Each top-level field resolver would get its own instance, and that same instance would be passed along to all child fields.

It would be valuable for each top-level field resolver to have its own copy of the DataLoaderContext, because of the following scenario:

Here's the query.

{
    order1: order(orderId: 1) {
        orderId
        user {
            userId
        }
    }
    order2: order(orderId: 2) {
        orderId
        user {
            userId
            giantBlob
        }
    }
}

Notice that order1 is asking only for userId, while order2 is additionally asking for giantBlob. The way DataLoader is implemented right now, the fetchFunc will only be run ONCE, with both userIds in the same ID list, and there's no way to know what columns each order wanted. Ideally it would be run twice (once for order1, and once for order2) so that the fetchFunc can craft a query with only the desired columns.
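
The single-call behavior described above can be illustrated with a minimal sketch. `BatchLoaderSketch`, `DispatchAsync`, and `BatchDemo` are hypothetical names written for this illustration, not the implementation from this pull request; the sketch only assumes that keys queued before a dispatch are handed to the fetch function together.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical minimal batch loader; not the implementation from this pull request.
public sealed class BatchLoaderSketch<TKey, TValue>
{
    private readonly Func<IEnumerable<TKey>, Task<IDictionary<TKey, TValue>>> _fetch;
    private readonly Dictionary<TKey, TaskCompletionSource<TValue>> _pending
        = new Dictionary<TKey, TaskCompletionSource<TValue>>();

    public BatchLoaderSketch(Func<IEnumerable<TKey>, Task<IDictionary<TKey, TValue>>> fetch)
    {
        _fetch = fetch;
    }

    // Queue a key and return a task that completes when the batch is dispatched.
    public Task<TValue> LoadAsync(TKey key)
    {
        if (!_pending.TryGetValue(key, out var tcs))
        {
            tcs = new TaskCompletionSource<TValue>();
            _pending[key] = tcs;
        }
        return tcs.Task;
    }

    // Call the fetch function once with every key queued so far.
    public async Task DispatchAsync()
    {
        if (_pending.Count == 0) return;
        var batch = new Dictionary<TKey, TaskCompletionSource<TValue>>(_pending);
        _pending.Clear();
        var results = await _fetch(batch.Keys.ToList());
        foreach (var pair in batch)
        {
            // Keys missing from the result complete with the default value.
            if (results.TryGetValue(pair.Key, out var value)) pair.Value.SetResult(value);
            else pair.Value.SetResult(default(TValue));
        }
    }
}

public static class BatchDemo
{
    // Returns how many times the fetch function ran for two queued keys.
    public static async Task<int> CountFetchCallsAsync()
    {
        var calls = 0;
        var loader = new BatchLoaderSketch<int, string>(ids =>
        {
            calls++; // one call covers every queued id
            IDictionary<int, string> users = ids.ToDictionary(id => id, id => "user" + id);
            return Task.FromResult(users);
        });

        // Two sibling fields (order1.user and order2.user) each queue a key.
        var first = loader.LoadAsync(10);
        var second = loader.LoadAsync(20);

        await loader.DispatchAsync();
        await Task.WhenAll(first, second);
        return calls;
    }
}
```

Because both keys reach the fetch function in one list, the function has no record of which field requested which key, which is why it cannot tailor the selected columns per order.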

Here is how I imagine this would look in code:

Field<UserType, User>()
    .Name("User")
    .ResolveAsync(ctx =>
        {
            var loader = ctx.DataLoader.GetOrAddBatchLoader<int, User>("GetUsersById",
                async (IEnumerable<int> ids) =>
                {
                    var columnNames = ctx.SubFields.Keys;
                    // Get a list of users, retrieving only the columns requested
                    return await users.GetUsersByIdAsync(ids, columnNames);
                });
            return loader.LoadAsync(ctx.Source.UserId);
        });

If we need to have the best of both worlds, there could be a setting on the schema indicating whether to create a new instance of DataLoader for each top-level field resolver, or whether it should share the same DataLoader for the entire query.
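
One way such a setting could behave is sketched below. `LoaderScope`, `LoaderContextSketch`, and `LoaderContextProvider` are hypothetical names invented for this illustration; no such setting or types exist in the library as far as this thread shows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; the setting and classes are assumptions.
public enum LoaderScope { PerQuery, PerTopLevelField }

public sealed class LoaderContextSketch
{
    // Stands in for whatever loaders a real context would hold.
    public Guid Id { get; } = Guid.NewGuid();
}

public sealed class LoaderContextProvider
{
    private readonly LoaderScope _scope;
    private readonly LoaderContextSketch _shared = new LoaderContextSketch();
    private readonly Dictionary<string, LoaderContextSketch> _perField
        = new Dictionary<string, LoaderContextSketch>();

    public LoaderContextProvider(LoaderScope scope)
    {
        _scope = scope;
    }

    // rootFieldAlias is the alias of the top-level field (for example "order1")
    // under which the current field is being resolved.
    public LoaderContextSketch For(string rootFieldAlias)
    {
        if (_scope == LoaderScope.PerQuery) return _shared;

        if (!_perField.TryGetValue(rootFieldAlias, out var context))
        {
            context = new LoaderContextSketch();
            _perField[rootFieldAlias] = context;
        }
        return context;
    }
}
```

With `PerTopLevelField`, fields under `order1` and `order2` would receive different contexts and therefore batch separately; with `PerQuery`, they would share one context and batch together.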

johnrutherford · 2018-02-02T15:52:50Z

@TwitchBronBron You should probably comment on issue #264 since this pull request has been merged already.

TwitchBronBron · 2018-02-02T16:09:19Z

Will do. Thanks!

jphenow · 2018-02-20T16:17:17Z

I've been looking around for a bit for this exact thing. Thanks y'all! Excited for 2.0 - nice work on this lib.

johnrutherford added 8 commits January 25, 2018 19:14

Add DataLoader and GraphQL.DataLoader projects

2b2a1fd

Add DataLoader unit tests Fix line endings in .editorconfig

Add another test

89125f4

Add IsFetchNeeded() to DataLoaderBase

60ee2a7

Replace SlowFuncFieldResolver with AsyncFieldResolver

61331d4

Minor changes to test graph types

df987d0

Refactor DocumentExecuter to use "execution strategies"

cbb22a7

Fix issue with GraphQL.DataLoader dependencies

4e34661

Re-organize tests

6a8e5c1

Add more DataLoader unit tests

joemcbride reviewed Jan 28, 2018

johnrutherford added 2 commits January 31, 2018 12:12

Refactor execution strategies to share code

c8833ea

Merge DataLoader code into main project

5812c08

Change netstandard1.1 target to netstandard1.3

johnrutherford added 2 commits January 31, 2018 20:48

Merge branch 'master' into dataloader

3401a5a

Add GraphQL.DataLoader.Tests to CI

6d0624a

joemcbride merged commit 575bdf9 into graphql-dotnet:master Feb 1, 2018

johnrutherford mentioned this pull request Feb 1, 2018

Refactor and optimize execution strategies #540

Merged

johnrutherford deleted the dataloader branch February 1, 2018 03:12

PrimeHydra mentioned this pull request Feb 15, 2018

DataLoader documentation #560

Closed

edpark11 mentioned this pull request Sep 2, 2018

Breadth-first-traversal for PR #367 (allows for thunk-based dataloader batching) graphql-go/graphql#388

Merged
