Skip to content

parse comments beginning with ## as descriptions on schema definitions #427

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

charlieschwabacher
Copy link

@charlieschwabacher charlieschwabacher commented Jul 9, 2016

This pull adds a comment syntax for descriptions to the schema definition language as discussed in graphql/graphql-spec#90.

The details of the comment syntax have not been worked out in that thread - if there is something you prefer vs these 'double hash' comments, I'm happy to update.

As this pull stands, comments beginning with a second hash mark (##) will be parsed as description AST nodes and attached to the following type or field definition.

for example:

## description of type 'User'
type User {
  ## description of field 'firstName'
  firstName: String
}

changes to lexer:

  • when the lexer encounters a # character, it checks the next char - if that is also a #, it takes text from the current position to the end of the line and produces a DESCRIPTION token.

changes to parser:

  • when parsing top level definitions, before looking at the initial keyword, the parser first takes any number of DESCRIPTION tokens, joins their values, and creates a description AST node. Then it takes the keyword - if the definition is an OperationDefinition or FragmentDefinition the description is discarded. If the definition is a TypeSystemDefinition the description node is attached to the definition node.
  • when parsing field, input value, or enum value definitions, if there are any leading DESCRIPTION tokens, the parser creates a Description node and attaches it to the final FieldDefinition, InputValueDefinition, or EnumValueDefinition node.

changes to buildASTSchema.js:

  • descriptions on AST nodes are attached to Type objects.

@OlegIlyenko
Copy link
Contributor

You mentioned that you only attach comments to AST nodes only in case of ## comments. I would like to point one aspect of it which may be useful. Preserving comments in general can be a very useful thing to do. For example current query formatting functionality in GraphiQL relies on parse and print functions. If you can save most of the comments during the parsing and attach them to the AST nodes, then printer would be able to nicely print them, thus preserving the comments during formatting process.

This is what I did in sangria when I implemented this functionality. It is unable to preserve all of the comments (it will ignore the comments at the end of the file, for instance), but comments in the most common places are preserved. You can experiment with it here: http://toolbox.sangria-graphql.org/format

@sheerun
Copy link

sheerun commented Jul 10, 2016

This also bit me when I worked on plugin visualising graphql file. Please include all comments in AST (not only ##)

@charlieschwabacher
Copy link
Author

Interesting - I like the idea of the printer being able to restore comments - you are right, that would be really nice for things like 'prettify' in Graphiql, etc. It does seem weird for it to preserve most comments, but lose some in some cases though.

Maybe to capture all comments, the parser could build a flat array of them in order by start location outside the AST, and then the printer could insert them when the location of the next AST node is greater than the location of the next comment (and insert any remaining comments at the end of the document).

@charlieschwabacher
Copy link
Author

charlieschwabacher commented Jul 10, 2016

Realized the location based comment insertion I described in the previous comment will not work if the AST has been modified (for example definitions could be re-ordered, etc..) (thanks @nemanja-stanarevic for pointing this out).. modified ASTs are definitely something the printer should support.

Maybe comment insertion by location could be an optional feature of the printer only for use cases like GraphiQL's 'prettify'.. in that case I think it would be best to get the array of comments in a separate 'lex' pass that only looks for comments to avoid complicating the lexer / parser.

I think it makes sense to think about that separately from these changes though.

@OlegIlyenko
Copy link
Contributor

Indeed, I agree that it can be a bit surprising to user if not all comments survive the formatting. Actually once I tried to modify the parser in a way that preserves all of the comments. It turns out that it's pretty hard to do. Comments can appear anywhere, so it's very hard to unify them with the rest of AST in a single tree without making compromises.

I like this idea of storing comments in a separate data structure. This would mean that you need to parse query twice or have 2 outputs for a single parse. I think it would be a bit harder to archive the later.

On the other hand I think it would be much harder to work with 2 data structures and make precise location matching in cases where you need to semantically analyze the comments (like in case of description). In sangria I decided to do it in more pragmatic way by attaching comments to an AST node that follows it. This requires only a single parse iteration and makes it pretty trivial to extract semantic information out of them. That said, I also interested to look at an experiments with alternative solutions (like the one you suggested).

@charlieschwabacher
Copy link
Author

On the other hand I think it would be much harder to work with 2 data structures and make precise location matching in cases where you need to semantically analyze the comments

@OlegIlyenko - yeah, I agree - I was imagining descriptions being lexed independently and attached to AST nodes wether or not comments are stored in a separate structure.

If we do want to support other types of 'meaningful' comments, maybe your approach of attaching all comments to the AST is better. To avoid losing trailing comments we could keep the flat array AND store comments on nodes, or we could stick trailing comments somewhere else like the Document node.

@charlieschwabacher
Copy link
Author

@leebyron @dschafer, hoping I can get some 'official' feedback on this.

  • Is this something you would consider merging in the first place? It's intended to line up w/ the plans for the schema language spec additions in [RFC] GraphQL Schema Definition Language (SDL) graphql-spec#90, but I understand if you are not quite ready or if you already have something else in the pipeline.
  • If it is something that you will consider merging, what's your take on storing all comments on the AST vs parsing descriptions directly? Can you see any other types of meaningful comments being used in the future?

@leebyron
Copy link
Contributor

Thanks for investigating this! I agree with the other feedback that the ## token is too confusing and instead all comments should be represented.

This particular implementation is probably not the way this should be implemented, treating comments as a different type of AST node can potentially create problems and doesn't support any other more generic handling of comments as Oleg highlighted

@charlieschwabacher
Copy link
Author

@leebyron thanks for the quick response.

Would something like comment?: ?string on AST nodes capturing leading comments, and a trailingComment?: ?string on the Document work better?

In that case any comment would become 'description' on types and fields without a special syntax?

@leebyron
Copy link
Contributor

Well comments likely do need to be represented in AST form and there can be many of them for a larger docblock.

I'm curious to know more about how other parsers handle comment representation before making a decision

@leebyron
Copy link
Contributor

Also determining if a comment is the trailing comment for one node or the leading comment for another could end up being tricky

@charlieschwabacher
Copy link
Author

charlieschwabacher commented Jul 12, 2016

Well comments likely do need to be represented in AST form and there can be many of them for a larger docblock.

Since they can only be one line, what do you think about joining them w/ \n? You lose the information in loc, but I think nothing besides that.

Also determining if a comment is the trailing comment for one node or the leading comment for another could end up being tricky

I think it makes sense to always consider comments to be leading. The issue that raises is that if a comment is the last bit of text in the document, there will be no AST node to attach it to.

I'm curious to know more about how other parsers handle comment representation before making a decision

That makes a lot of sense to me 👍

@ghost ghost added the CLA Signed label Jul 12, 2016
@charlieschwabacher
Copy link
Author

charlieschwabacher commented Jul 16, 2016

@OlegIlyenko @leebyron I have been thinking a little more about parsing all comments and attaching to the AST.

Starting with this query

query {
  planet(id: "cGxhbmV0czox") {
    name
  }
}

which produces this AST:

Document
  OperationDefinition
    SelectionSet
      Field
        Name
        Argument
          Name
          StringValue
        SelectionSet
          Field
            Name

Inserting comments between every token is entirely valid (at least as of now)

# comment a
query
# comment b
{
  # comment c
  planet
  # comment d
  (
    # comment e
    id
    # comment f
    :
    # comment g
    "cGxhbmV0czox"
    # comment h
  )
  # comment i
  {
    # comment j
    name
    # comment k
  }
  # comment l
}
# comment m
type
# comment n

We end up with not just leading and trailing comments, but internal comments between tokens that make up a node. For example comment f is in the middle of an Argument node.

It seems like the best we can do to deal with internal comments is to define comments on a node as any comments starting after the end of preceding sibling and before the end of the node, and then stick them in an array on the node. We might want something like comments e, f, and g on the Argument node.

If we do this though, we have to somehow define at which depth in the AST comments should be attached. For example, if we were to just attach them to leaf nodes, comment e would be on the Name node, when it seems clear that it should be on the Argument node.

Then we have the issue of comments like h which would go on the Field node above e, f, and g, and comment n which would probably go all the way up on the document node and need special handling to get there.

Based on how much complexity this introduces, I still think a special syntax for 'doc' comments, lexing only those, and parsing them only in well defined locations is the way to go.

@charlieschwabacher
Copy link
Author

BTW, another parser that takes this approach is CoffeeScript's. It ignores # comments as whitespace, but has a ### block comment syntax that is parsed and preserved as comments in the compiled javascript.

# ignored as whitespace

###
parsed block comment
###

@leebyron
Copy link
Contributor

Rather than introduce a new token for a different kind of comment, #463 and #464 add in this functionality but leverage the existing comment syntax. This ended up being the preferred decision because of being simpler but also less likely to cause accidental confusion by different type comments being used in the wrong ways.

The approach taken in #463 also properly handles internal comments, addressing concerns @charlieschwabacher thoughtfully laid out.

@leebyron leebyron closed this Aug 24, 2016
@leebyron
Copy link
Contributor

Thanks again for your exploration on this! Despite this specific PR not being merged, your investigations, code, and thoughtful conversation helped lead to a great final decision.

@charlieschwabacher
Copy link
Author

#464 looks great, really happy to see this functionality added. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants