RFC: Block String #926
One question I'm considering is whether the contents of a multi-line string literal should always trim indentation, or only do so in the context of a docstring-style description in the schema language (should we expand the RFC to include that?). https://www.python.org/dev/peps/pep-0257/#handling-docstring-indentation The argument for doing so is that the primary value of multi-line strings is to write literate long-form text, even in the context of true values provided for runtime interpretation. The argument against doing so is that it restricts some values from being represented as multi-line strings, specifically values which intend to start with whitespace. Those values would only be representable with single-line comments and escape sequences. I'm interested in opinions on this, especially from those with experience in Scala, Python, Elixir, or other languages with multi-line literals. |
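For reference, the PEP 257 handling linked above is available in Python's standard library as `inspect.cleandoc`. The snippet below only illustrates that rule on a made-up description string (the text in `raw` is a hypothetical example, not taken from the RFC): indentation shared by every line after the first is removed, the first line is handled separately, and blank lines at either end are dropped.

```python
import inspect

# Hypothetical description text, written the way it might appear when
# indented to match surrounding code. The first line carries no indentation.
raw = (
    "Fetch a user.\n"
    "\n"
    "    Continuation lines share four spaces of indentation,\n"
    "      and this line has two more.\n"
    "    "
)

# cleandoc removes the indentation common to all lines after the first,
# keeps relative indentation, and drops leading/trailing blank lines.
cleaned = inspect.cleandoc(raw)
print(cleaned)
```

Relative indentation (the two extra spaces on the last text line) survives, which is the property that matters for Markdown content in descriptions.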
CoffeeScript would be prior art for doing this: http://coffeescript.org/#strings |
The Ceylon language also supports this triple-quote literal with unescaped characters. It also has similar behavior of removing indentation. https://ceylon-lang.org/documentation/1.3/tour/basics/#string_literals |
Why not allow "regular strings" to span multiple lines? If we allow that, we could still keep the """triple-quoted strings""" for the extra features: no need to escape " and \, and the trimming of indentation. (Or not, I'm not sure if these extra features warrant a new literal format in the language. I don't have a strong opinion.) This would also answer your question, I think: if you want a multi-line string that trims indentation, use """, and if you want a multi-line string that keeps indentation, use ". |
I am also quite unsure what would be the best way to handle this. For docstrings it's clearly necessary to remove indentation. At the moment, I feel that it would be a good idea to preserve whitespace in the normal multi-line strings. In Scala whitespace is preserved, but a common pattern is to use:
"""This is a
|triple-quoted string
|and it can contain
|multiple lines""".stripMargin
Maybe a special syntax can be introduced to express the intention? |
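For readers who don't use Scala, here is a rough approximation of the `stripMargin` pattern in Python. `strip_margin` is a hypothetical helper written for this illustration; Scala's real method differs in details (for example in which leading characters it treats as blank), so treat this as a sketch of the idea, where an explicit margin marker signals the start of content, and not as a port.

```python
import re


def strip_margin(text: str, margin: str = "|") -> str:
    """Drop leading spaces/tabs plus the margin marker from each line.

    Lines that do not start with (optional whitespace and) the marker
    are left untouched.
    """
    pattern = re.compile(r"^[ \t]*" + re.escape(margin), re.MULTILINE)
    return pattern.sub("", text)


sample = (
    "This is a\n"
    "    |triple-quoted string\n"
    "    |  and it can contain\n"
    "    |multiple lines"
)
print(strip_margin(sample))
```

Because the marker is explicit, whitespace after it (the two spaces on the third line) is always kept, which is how this pattern lets authors opt in to stripping per string.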
I would probably vote for stripping indentation NOT being a feature of the """ string literal itself. It introduces too many weird edge cases that I think are not worth handling (e.g. what if some lines are indented differently from others). That said, I agree we should strip indentation from descriptions. But that can be specified in the part of the specification that describes how string literals before field names are treated as field descriptions (the specification will recommend/require that indentation is removed from the string before treating it as a description for the field). |
Yeah I think documentation processing and display tools could do the white space stripping for you, it doesn't have to be a feature of the language. I think it would be nice if it worked just like template literals in JS. |
I think expanding " to allow literal newlines is interesting, but out of scope for this change - there are some pretty surprising implications I encountered when testing out this idea, mostly around printing values rather than parsing them. I also think changing the behavior of an existing literal is too dangerous with respect to preserving behavior expectations across server versions.
I hope we can make this well defined. If there are weird edge cases, we'll see exactly the same weird edge cases exhibited in documentation, so it's important to solve for them and be consistent about it. For example, different levels of indentation is expected in markdown (which descriptions use) and preserved. In this RFC I propose an algorithm that's nearly identical to the one python and coffeescript use.
I'd be curious to hear examples of when you would want this behavior - is it just to mirror JavaScript behavior? Are there specific cases in mind that you imagine wanting to support, or is it just a general feeling? I think it would be confusing to have these literals interpreted in different ways depending on where they're found. Especially since they're most likely to be used in other places where long-form content is being written (other directives, as an example). My thought was that rather than specifically making descriptions special, that these literals would carry this behavior wherever they're used. In my personal opinion, this is one of the more confusing aspects of learning the difference between multi-line strings and docstrings in Python. This is especially important since GraphQL is not a programming language where you might write a function to operate on the string to solve this problem (https://esdiscuss.org/topic/multiline-template-strings-that-don-t-break-indentation, or the widely-used stripMargin in Scala) - otherwise everywhere that might expect to receive a multi-line string would have to do similar post-processing. If multi-line strings are just another way of representing strings, that means that almost everywhere that expects a string would have to be aware of this post-processing expectation and be aware of what kind of literal was used. I don't think this is reasonable to expect of schema developers, and would much rather account for this at the language parsing level to ensure consistency. There is precedent for trimming common indentation like this for all literals. CoffeeScript championed this approach, which as I understand works well for those using CoffeeScript. The new Ceylon language also treats its multi-line literals in this way. I'm curious if the naming is part of the problem here. Really these new tokens don't need to be multiple lines. What is unique is that they are a "block" of content that's interpreted verbatim without escapes. |
Most of the discussion seems to be about the multi-line aspect, so it is probably the most contentious part. Basically stripping indentation violates the principle of least surprise for me, especially since the other languages that do it are somewhat niche. I really like the idea that Plus, if we're talking about post-processing, GraphQL already asks anything consuming documentation strings to include a Markdown renderer, so it's not unreasonable to ask for some other nice-to-haves if we think being able to indent doc strings is critical. |
I'm against having separate behavior for Here is an example of how
It looks really ugly to me. So if we are talking about IDL, without stripping indentation it quickly becomes unreadable. As for preserving spaces inside a query document, I think it's better to use query parameters for this purpose. |
I share this concern. The behavior strikes me as very magical, regardless of whether it is selectively applied (ie. in docstrings) or everywhere. @IvanGoncharov's right, however, that inline multiline strings will look ugly in some contexts. My inclination would be to solve the docstring problem using comment syntax, not multi-line strings, for uniformity with other tools, and solve the inline multiline string problem using document-local constants:
|
@wincent I like your suggestion! This idea of a I don't have a strong opinion on a docstring syntax, but I like |
Interesting. For me it's the opposite. Not stripping the indentation seems to violate the principle of least surprise. I would find it surprising that depending on the context of the use of the literal, the behavior of how it is interpreted changes. I agree with @IvanGoncharov that essentially all other places you would expect to find this literal care about the same legibility concern (otherwise, they would just use a standard double-quote string). Are there use cases we can think of for not stripping indentation? Other than the fact that not all programming languages with multi-line strings have this behavior?
I think descriptions are a primary use case for multi-line strings, and if we fall back to comments-as-description (which I've been convinced have serious drawbacks) then I would propose abandoning multi-line strings as not valuable enough, especially if they can't be used for the purpose of verbatim text without a work-around like needing to add yet another syntactical addition, and separating the definition and use of the long-form content just to work around the indentation issue. |
I think one of the motivations for introducing block strings for use with docstring style descriptions is that they can also be used for other places long-form text may appear. A clear second use case is deprecation reasons, as @IvanGoncharov pointed out. In the long-tail are potential cases for long-form text within queries themselves, but it's likely going to primarily live within the domain of directives and descriptions. |
I think indentation should be stripped. Here's an alternate indentation of the prior example
Constants may have other merits, but I dislike adding distance between declaration and use in the |
👋 Just adding my opinion from a Ruby background: multi-line strings are fine and I think indentation should be stripped. Ruby always had heredoc syntax for strings, but it didn't strip indentation. So, Ruby on Rails added So, I think we should skip a step, and strip indentation from the start! Here are Ruby's indentation-handling rules:
"Literals.rdoc", see "squiggly heredoc" Are there any cases where that is not desirable behavior? |
Very fascinating to hear about Ruby's evolution of heredoc. As I've been doing more research into other languages it's clear that multi-line literals have served two purposes: true heredoc-style embedded content in which leading whitespace should be considered part of the content, as well as long-form written text where leading whitespace is expected to be ignored. Most languages ended up with some standard library function to get the latter behavior from the former, but some have added a different literal for exactly the latter purpose, Ruby being the best example of that. I agree that since this attempts to support writing long-form text and not embedded content, we should avoid having additional syntax to specify the behavior people will expect. And since GraphQL is a data/query description language, not a programming language, we don't really have the option of adding functions for converting between the behaviors. It sounds like nearly all indentation-stripping methods use the same method of stripping the indentation of the least-indented non-whitespace line. Python's docstring rules include a special case for the first line to avoid a required leading newline (something Ruby heredoc grammatically assumes) - I'd suggest keeping that behavior. |
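The rule described above (remove the indentation of the least-indented non-blank line, with a special case for the first line) can be sketched as follows. This is an illustration in Python under stated assumptions: the function name `dedent_block_string`, treating only spaces and tabs as indentation, and dropping blank lines at both ends are choices made for this sketch and are not quoted from the RFC text.

```python
import re


def dedent_block_string(raw: str) -> str:
    """Remove common indentation from a block of text (illustrative sketch)."""
    lines = re.split(r"\r\n|[\n\r]", raw)

    # Find the smallest indentation among non-blank lines. The first line is
    # skipped so that text may begin right after the opening delimiter.
    common = None
    for line in lines[1:]:
        indent = len(line) - len(line.lstrip(" \t"))
        if indent < len(line) and (common is None or indent < common):
            common = indent

    # Remove that indentation from every line except the first.
    if common:
        lines[1:] = [line[common:] for line in lines[1:]]

    # Drop blank lines at the start and at the end.
    while lines and not lines[0].strip(" \t"):
        lines.pop(0)
    while lines and not lines[-1].strip(" \t"):
        lines.pop()

    return "\n".join(lines)


example = "\n    Hello,\n      World!\n\n    Yours,\n      GraphQL.\n  "
print(dedent_block_string(example))
```

Note that whitespace-only lines do not take part in computing the common indentation, so a short trailing line before the closing delimiter does not reduce how much is stripped.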
I'm interested in thoughts from @syrusakbary and @andimarek as well, since you both maintain fairly popular GraphQL libraries. Especially getting an understanding of the impact of a change like this on the libraries you maintain. |
@leebyron I don't see any problem with this change. Do you have anything specific in mind which might be problematic? |
@andimarek - specifically I'm curious about being able to build this into the lexer for the parsing functionality of the library, and ensuring that I have multiple sets of eyes and brains on considering any corner cases of using multi-line strings in GraphQL documents in general - though I'm happy to hear you don't see any issue. @OlegIlyenko, @jjergus, @wincent @stubailo - are you at all convinced by the arguments and opinions above? (Or if you still have concerns - are they strong or weak?) Also cc @dschafer for a closer look at the discussion on this thread (we had conversations about this PR offline, but I'm not sure if you have seen the broader feedback) |
With respect to the stripping or not stripping indentation, I think it's pretty clear that most people will value being able to indent strings for aesthetic reasons, but at the same time having the behavior be automatic, magical, and not possible to opt out of is potentially problematic. In the absence of a separate syntactical (or programmatic) means of signaling the intent to strip, I would (weakly) prefer it if we didn't do so. Of those two means (syntactic and programmatic), I think we should reject the idea of using a separate syntactic token (for simplicity) and forget about programmatic means because, as you say, GraphQL is not a programming language. If the code that is consuming the schema artifact cares about leading indentation, it could trivially strip it itself anyway, which would probably be my preferred solution (yes, duplication of effort over magic). The other concern I have is with overloading multiline-strings with two uses (ie. use as docstrings and use wherever the grammar currently allows strings). In Python, this works well only because strings are expressions, and the docstring position (first statement after function body etc) is a valid location for a statement containing an expression. In GraphQL, which is not a programming language, allowing strings (and only multiline strings) to appear at specific places where documentation is allowed feels arbitrary and subtle to me. (I am not a Python programmer, so please correct me if I am mistaken in my characterization.) The other thing I'll be interested to see is how this — being a breaking change — ends up playing out in practice. We could, for example, use comments for documentation (indicated with a leading |
I agree with @wincent on all points described in previous comments. I think an important aspect we also need to consider is copy/paste. I can imagine authoring the docstrings directly in the GraphQL document. In this case, I would prefer indentation to be stripped. For arguments and input objects, in most practical scenarios, I will copy/paste the contents of the string from some other place, especially if it's larger content (based on my previous experiences with Scala's multi-line strings). Multi-line strings are ideal in this scenario since I don't need to re-format the original multi-line content in any way. This would not work anymore if GraphQL multi-line strings manipulate the string in some way (a copy/pasted string might also contain indentation which is part of the content, and I don't want it to be stripped). If indentation is not stripped from normal strings, we can recommend that GraphQL implementation libraries provide a helper function to easily strip the whitespace in the way a docstring does it automatically. In general, the idea of providing documentation as a multi-line string in a specific position within the AST sounds quite foreign and a bit confusing to me. I feel that the most natural way to provide the documentation would be a directive (in this case the addition of a multi-line strings feature would feel natural to me as well). Based on the previous discussions, the use of directives for documentation strings was considered quite inconvenient (and I would agree with it). A javadoc-style comment is my second favorite:
##
# This is a docstring comment.
# Blah.
type Foo {
  bar: String
}
Comments have specific semantics: they provide more information about the schema for human beings. When I see something like this:
"""
This is a docstring comment.
Blah.
"""
type Foo {
  bar: String
}
I see a string literal positioned close to the type definition. String literals do not define any specific semantics (except that they normally provide a value for an input field or an argument, which then defines its meaning), so I need to infer the semantics based on the string's position within the AST. I think this is my main concern. This is also why I mentioned directives previously. When provided as an argument to a directive, the semantics become much clearer and easier to infer based on the context:
@description(text: """
This is a docstring comment.
Blah.
""")
type Foo {
  bar: String
}
As @wincent mentioned, overloading the string literal (can a non-multiline string literal be used as a docstring?) with two significantly different interpretations based on the AST position is quite confusing. |
I like the @description directive idea, but currently description comments go before the element while directives go after, right? |
@JeffRMoore yeah with the current syntax it would look like this:
type Foo @description(text: """
This is a docstring comment.
Blah.
""") {
  bar: String
}
|
My concern is that it's not the consuming code which determines if it cares about leading indentation or not, it's the client writing the input. Significant leading whitespace is going to be a property of each input, not a property of each position the input could be placed. Add to that if the default behavior is that leading indentation is not stripped and server-side engineers need to consider when to add it, that it will likely be behavior that's applied inconsistently, and even if we could come up with tools in every implementation that make the consistent application easier to get right, indentation significance still depends on the input. All this would lead to clients of GraphQL not being able to anticipate when they can and cannot expect indentation-stripping behavior. That means they won't trust it, and they'll write their documents in a way that is harder to read, which I would count as a significant loss. Legibility is paramount to this decision. Also, I don't think it's fair to characterize this as "magic" - what we're talking about here is a uniform behavior for this new kind of literal (strip-always) vs position-dependent behavior for this new kind of literal. In my experience, code is often considered "magic" when it tries to predict your intent amongst many options - sometimes leading to wrong and surprising results. In this case we're not trying to predict any intent and I'm proposing that we do not provide different interpretation options. Ultimately I would like to make this decision based on anticipated real world use cases. If the majority of those would expect leading indentation to be preserved, then we should do as you suggest and not strip leading indentation by default. I think what we've found is the opposite. The two clearest cases are writing descriptions and writing prose within directives, both of which expect leading indentation to be stripped. I would imagine that future use cases for the multi-line string would share this expectation. |
Could you clarify what you mean by breaking change in this context, or what scenarios you see that we should plan for? It's true that supporting a new literal is forward-breaking, such that older libraries that do not yet support them will not consume them. I'm not very worried about that, since this is going to be primarily used in a new addition to GraphQL (descriptions in SDL). Another way to think about this being a breaking change is that existing use of |
Could you help me understand why you see this as a problem? I'm not sure I would characterize this as an overloading since a docstring is not a logical operator. Typically overloading is problematic when the same method/operator can have different behavior depending on the context in which it is used - (i.e. What we have done in this case is added a new place where a string literal can appear in the Grammar. I think debate on this is welcome, but my opinion is that descriptions in GraphQL are important enough and talked about as first-class to warrant their own place in the AST for the schema language. In Python, docstrings started as an unofficial addition. Since Python is an imperative programming language, a string literal is an expression which is a statement but one that doesn't imperatively do anything. Then a doc generator can walk the AST and easily find them and process them. Same is true in JS, though docstrings never caught on in that community: function docstringedIdentity(x) {
"this docstring is a statement that has no impact";
return x;
} Since GraphQL isn't a programming language and doesn't have an equivalent to a sequence of statements, this isn't really open to the community building for themselves. Even if it was, I'd prefer we think about this up front since descriptions are so important to GraphQL. |
I agree that conceptually a directive could be a natural way to add descriptions, but since the primary goal is to make descriptions easy to read and write, it's my opinion that the creation of a new position in the AST for this data is worth the improvement to legibility and convenience. However I think we could also make an argument that there is a point at which a directive is no longer the best conceptual fit for some piece of data. For example we decided not to write |
I'm not 100% sure what you mean by "client" in these sentences. Do you mean a developer writing a GraphQL document?
I'm having trouble following the argument or even parsing the text here. In the paragraph I quote below you say we're not considering position-dependent behavior, but above you seem to be talking about "indentation significance depend[ing] on the input" and I don't know what you mean by that. Also, what does "clients" mean in this paragraph? Feel free not to answer these questions: you solicited concerns, "strong or weak", and I gave some weak ones. I'm mostly discussing this out of intellectual interest rather than firm conviction that things should be done in a certain way.
We're probably off in the weeds now trying to define the exact semantics of "magic", but I'll clarify anyway: I was using it in the sense of violating the "principle of least surprise" mentioned higher up in the thread. I know there are counter-examples — although I'd argue that they're pretty rare — but in most languages I've seen the material inside the opening and closing delimiters of a string literal is basically a sequence of bytes; if a line within the string begins with N bytes of whitespace then those N bytes are no less part of the string than N other bytes of whitespace (or non-whitespace) anywhere in the middle of a line. The fact that the leading whitespace may be automatically stripped would be unexpected for me.
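To make the "sequence of bytes" expectation concrete: Python is one language where a triple-quoted literal keeps every leading space exactly as typed, and stripping is an explicit opt-in through `textwrap.dedent`. The snippet only illustrates that behavior in Python; the sample text is made up.

```python
import textwrap

# A triple-quoted literal, indented to line up with surrounding code.
literal = """
    first line
      second line, indented two more
    """

# The literal itself is untouched: leading spaces are part of the value.
print(repr(literal))

# Removing the common indentation is a separate, explicit step.
dedented = textwrap.dedent(literal)
print(repr(dedented))
```

This is the trade-off under discussion: with opt-in stripping the literal never surprises, but every consumer that wants readable text has to remember to apply the extra step.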
I think that's likely too, which is why I said, "I think it's pretty clear that most people will value being able to indent strings for aesthetic reasons". In principle I agree with the idea of optimizing for the convenience of the common case, but I am reluctant to do something that would preclude people from doing anything outside of the common case. If I don't want stripping for example, my only choice is to not use multi-line strings at all.
I just mean this is the first major forward-breaking change that I am aware of, and given the size and number of actors in the GraphQL ecosystem I am curious to see how it goes. Obviously we can't commit to never making breaking changes — that way lies stagnation — but I wonder whether there'll be things that we can learn as time goes by to make them more "successful" (in the sense that we find ways to minimize pain in exchange for the upside). I specifically mentioned
I was using the term "overloading" not in the narrow sense of operator overloading but in the more general English-as-spoken-by-programmers sense, where we use "overloading" for various things ranging from specifics like operator overloading and function overloading to more abstract things like "overloading terminology with multiple meanings". In this context, I hoped that "overloading multiline-strings with two uses (ie. ...)" would have made that clear. I don't really have anything to add beyond what I said in the paragraph that you quoted. |
When discussed at a previous GraphQLWG meeting there was consensus around multi-line strings and SDL descriptions, so I'll be rebasing and merging these |
This RFC adds a new form of `StringValue`, the multi-line string, similar to that found in Python and Scala. A multi-line string starts and ends with a triple-quote:
```
"""This is a triple-quoted string and it can contain multiple lines"""
```
Multi-line strings are useful for typing literal bodies of text where new lines should be interpreted literally. In fact, the only escape sequence used is `\"""` and `\` is otherwise allowed unescaped. This is beneficial when writing documentation within strings which may reference the back-slash often:
```
""" In a multi-line string \n and C:\\ are unescaped. """
```
The primary value of multi-line strings is to write long-form input directly in query text, in tools like GraphiQL, and as a prerequisite to another pending RFC to allow docstring style documentation in the Schema Definition Language.
This RFC adds a new lexed token, the block string, similar to that found in CoffeeScript, Python, and Scala.
A block string starts and ends with a triple-quote:
Block strings are useful for typing literal bodies of text where new lines should be interpreted literally. In fact, the only escape sequence used is `\"""` and `\` is otherwise allowed unescaped. This is beneficial when writing documentation within strings which may reference the back-slash often:
The primary value of block strings is to write long-form input directly in query text, in tools like GraphiQL, and as a prerequisite to another pending RFC to allow docstring style documentation in the Schema Definition Language.
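A minimal sketch of the escaping rule described above, assuming the raw text between the opening and closing triple-quotes has already been extracted by a lexer. The helper name `unescape_block_string` is hypothetical, written in Python for illustration, and is not the reference implementation.

```python
def unescape_block_string(body: str) -> str:
    """Apply the single block-string escape: backslash + triple-quote.

    Every other backslash is left exactly as written.
    """
    return body.replace('\\"""', '"""')


# Raw strings are used so the backslashes below reach the function unchanged.
print(unescape_block_string(r'C:\\ and \n stay as typed'))
print(unescape_block_string(r'an escaped \""" delimiter'))
```

Since only one sequence is special, text containing Windows paths or regular expressions can be pasted into a block string without doubling any backslashes.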