RFC: Block String #926

Merged
merged 3 commits into from
Nov 30, 2017

Conversation

leebyron
Contributor

@leebyron leebyron commented Jun 22, 2017

This RFC adds a new lexed token, the block string, similar to that found in Coffeescript, Python, and Scala.

A block string starts and ends with a triple-quote:

"""This is a triple-quoted string
and it can contain multiple lines"""

Block strings are useful for typing literal bodies of text where new lines should be interpreted literally. In fact, the only escape sequence used is \""" and \ is otherwise allowed unescaped. This is beneficial when writing documentation within strings which may reference the back-slash often:

"""
In a block string \n and C:\\ are unescaped.
"""

The primary value of block strings is to write long-form input directly in query text, in tools like GraphiQL, and as a prerequisite to another pending RFC to allow docstring-style documentation in the Schema Definition Language.
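
To make the escaping rule described above concrete, here is a minimal Python sketch. The function name `block_string_raw_value` is hypothetical, and the sketch assumes the lexer has already isolated the text between the opening and closing triple quotes. It only illustrates that a backslash followed by three double quotes is the sole escape and that every other backslash is kept verbatim; it is not the reference implementation.

```python
def block_string_raw_value(body: str) -> str:
    """Return the value of a block string body (the text between the triple quotes).

    Assumption: the only escape sequence is a backslash followed by three
    double quotes; all other backslashes are literal characters.
    """
    return body.replace('\\"""', '"""')


# A backslash followed by "n" stays as two characters, it is not a newline.
raw = r'In a block string \n and C:\ are unescaped, but \""" is a quote.'
print(block_string_raw_value(raw))
```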

@leebyron leebyron changed the title RFC: Long String (Multi-line String) RFC: Multi-line String Jun 22, 2017
@leebyron
Contributor Author

One question I'm considering is whether the contents of a multi-line string literal should always trim indentation, or only do so in the context of a docstring style description in the schema language (should we expand the RFC to include that).

https://www.python.org/dev/peps/pep-0257/#handling-docstring-indentation

The argument for doing so is that the primary value of multi-line strings is to write literate long-form text, even in the context of true values provided for runtime interpretation.

The argument against doing so is that it restricts some values from being represented as multi-line strings - specifically values which intend to start with whitespace. Those values would only be representable with single-line strings and escape sequences.

I'm interested in opinions on this, especially from those with experience in Scala, Python, Elixir, or other languages with multi-line literals.

@leebyron
Contributor Author

Coffeescript would be a prior art for doing this: http://coffeescript.org/#strings

@leebyron
Contributor Author

The Ceylon language also supports this triple-quote literal with unescaped characters. It also has similar behavior of removing indentation. https://ceylon-lang.org/documentation/1.3/tour/basics/#string_literals

@leebyron leebyron force-pushed the rfc-longstring branch 4 times, most recently from f222fa5 to 8a229e3 Compare June 22, 2017 07:33
@jjergus
Contributor

jjergus commented Jun 22, 2017

Why not allow "regular strings" to span multiple lines?

If we allow that, we could still keep the """triplequoted strings""" for the extra features: no need to escape " and \, and the trimming of indentation. (Or not, I'm not sure if these extra features warrant a new literal format in the language. I don't have a strong opinion.)

This would also answer your question I think: If you want a multiline string that trims indentation, use """, and if you want a multiline string that keeps indentation, use ".

@OlegIlyenko
Contributor

OlegIlyenko commented Jun 22, 2017

I'm interested in opinions on this, especially from those with experience in Scala, Python, Elixir, or other languages with multi-line literals.

I'm also quite unsure what would be the best way to handle this. For docstrings it's clearly necessary to remove indentation. At the moment, I feel that it would be a good idea to preserve whitespace in normal multi-line strings. In Scala whitespace is preserved, but a common pattern is to use stripMargin if the string needs to be indented:

"""This is a 
  |triple-quoted string
  |and it can contain 
  |multiple lines""".stripMargin

Maybe a special syntax can be introduced to express the intention?
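
For readers unfamiliar with the Scala pattern mentioned above, here is a rough Python analogue. The helper `strip_margin` is hypothetical and written only for illustration; it assumes the default margin character `|` and mimics the documented behavior of removing leading blanks or control characters followed by that margin character from each line.

```python
import re

# Leading blanks/control characters followed by the margin character "|".
_MARGIN = re.compile(r"^[\x00-\x20]*\|")


def strip_margin(text: str) -> str:
    """Hypothetical Python analogue of Scala's stripMargin with the default '|'."""
    return "".join(_MARGIN.sub("", line) for line in text.splitlines(keepends=True))


example = """This is a
  |triple-quoted string
  |and it can contain
  |multiple lines"""

print(strip_margin(example))
```

The point of the pattern is that the author opts in explicitly: without the call, the leading whitespace stays part of the value.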

@jjergus
Contributor

jjergus commented Jun 22, 2017

I would probably vote for stripping indentation NOT being a feature of the """ string literal itself. It introduces too many weird edge cases that I think are not worth handling (e.g. what if some lines are indented differently from others).

That said, I agree we should strip indentation from descriptions. But that can be specified in the part of the specification that describes how string literals before field names are treated as field descriptions (the specification will recommend/require that indentation is removed from the string before treating it as a description for the field).

@stubailo

Yeah I think documentation processing and display tools could do the white space stripping for you, it doesn't have to be a feature of the language. I think it would be nice if it worked just like template literals in JS.

@leebyron
Contributor Author

leebyron commented Jun 23, 2017

This would also answer your question I think: If you want a multiline string that trims indentation, use """, and if you want a multiline string that keeps indentation, use ".

I think expanding " to allow literal newlines is interesting, but out of scope for this change - there are some pretty surprising implications I encountered when testing out this idea, mostly around printing values rather than parsing them.

I also think changing the behavior of an existing literal is too dangerous with respect to preserving behavior expectations across server versions.

I would probably vote for stripping indentation NOT being a feature of the """ string literal itself. It introduces too many weird edge cases that I think are not worth handling (e.g. what if some lines are indented differently from others).

I hope we can make this well defined. If there are weird edge cases, we'll see exactly the same weird edge cases exhibited in documentation, so it's important to solve for them and be consistent about it. For example, different levels of indentation are expected in markdown (which descriptions use) and preserved. In this RFC I propose an algorithm that's nearly identical to the one Python and Coffeescript use.

Yeah I think documentation processing and display tools could do the white space stripping for you, it doesn't have to be a feature of the language. I think it would be nice if it worked just like template literals in JS.

I'd be curious to hear examples of when you would want this behavior - is it just to mirror JavaScript behavior? Are there specific cases in mind that you imagine wanting to support, or is it just a general feeling?

I think it would be confusing to have these literals interpreted in different ways depending on where they're found. Especially since they're most likely to be used in other places where long-form content is being written (other directives, as an example). My thought was that rather than specifically making descriptions special, that these literals would carry this behavior wherever they're used. In my personal opinion, this is one of the more confusing aspects of learning the difference between multi-line strings and docstrings in Python.

This is especially important since GraphQL is not a programming language where you might write a function to operate on the string to solve this problem (https://esdiscuss.org/topic/multiline-template-strings-that-don-t-break-indentation, or the widely-used stripMargin in Scala) - otherwise everywhere that might expect to receive a multi-line string would have to do similar post-processing. If multi-line strings are just another way of representing strings, that means that almost everywhere that expects a string would have to be aware of this post-processing expectation and be aware of what kind of literal was used. I don't think this is reasonable to expect of schema developers, and would much rather account for this at the language parsing level to ensure consistency.

There is precedent for trimming common indentation like this for all literals. Coffeescript championed this approach, which as I understand works well for those using Coffeescript. The new Ceylon language also treats its multi-line literals in this way.

I'm curious if the naming is part of the problem here. Really these new tokens don't need to be multiple lines. What is unique is that they are a "block" of content that's interpreted verbatim without escapes.

@stubailo

I'm curious if the naming is part of the problem here. Really these new tokens don't need to be multiple lines. What is unique is that they are a "block" of content that's interpreted verbatim without escapes.

Most of the discussion seems to be about the multi-line aspect, so it is probably the most contentious part. Basically stripping indentation violates the principle of least surprise for me, especially since the other languages that do it are somewhat niche.

I really like the idea that """ is the same as " but doesn't need any escaping for stuff like newlines. To me that means it doesn't need to have any special behavior around whitespace.

Plus, if we're talking about post-processing, GraphQL already asks anything consuming documentation strings to include a Markdown renderer, so it's not unreasonable to ask for some other nice-to-haves if we think being able to indent doc strings is critical.

@IvanGoncharov
Member

IvanGoncharov commented Jun 23, 2017

Basically stripping indentation violates the principle of least surprise for me

I'm against having separate behavior for """ depending on where it's placed. I agree that it can cause initial surprise for people who haven't used Python or CoffeeScript. However, that is a lesser problem than an inconvenience that developers will experience constantly.

Here is an example of how the reason argument of the @deprecated directive would look in IDL if """ were added without stripping:

type Foo {
  bar: String @deprecated(reason: """We deprecated it because of bla bla bla.
Instead, you should use "baz" field. ...
 
IMPORTANT: Field will be removed on 11 Jan 2018.
If you can't make migration in time please contact us on ...
""")
  baz: Int
}

It looks really ugly to me. So if we are talking about IDL, without stripping indentation it quickly becomes unreadable.

As for preserving spaces inside Query document, I think it's better to use query parameters for this purpose.

@wincent
Contributor

wincent commented Jun 23, 2017

Basically stripping indentation violates the principle of least surprise for me

I share this concern. The behavior strikes me as very magical, regardless of whether it is selectively applied (ie. in docstrings) or everywhere.

@IvanGoncharov's right, however, that inline multiline strings will look ugly in some contexts. My inclination would be to solve the docstring problem using comment syntax, not multi-line strings, for uniformity with other tools, and solve the inline multiline string problem using document-local constants:

const BAR_DEPRECATION_REASON = """
Whatever I want.
Blah.
"""

##
# This is a docstring comment.
# Blah.
type Foo {
  bar: String @deprecated(reason: BAR_DEPRECATION_REASON)
}

@OlegIlyenko
Contributor

@wincent I like your suggestion! This idea of a const was on my proposal TODO list as well :) I wanted to suggest let or const syntax to define constants inside of a document. It can be also quite useful for several other scenarios as well (like complex input objects for mutation arguments).

I don't have a strong opinion on a docstring syntax, but I like ##-style descriptions as well.

@leebyron
Contributor Author

leebyron commented Jun 23, 2017

Basically stripping indentation violates the principle of least surprise for me

Interesting. For me it's the opposite. Not stripping the indentation seems to violate the principle of least surprise. I would find it surprising that depending on the context of the use of the literal, the behavior of how it is interpreted changes. I agree with @IvanGoncharov that essentially all other places you would expect to find this literal care about the same legibility concern (otherwise, they would just use a standard double-quote string).

Are there use cases we can think of for not stripping indentation? Other than the fact that not all programming languages with multi-line strings have this behavior?

My inclination would be to solve the docstring problem using comment syntax, not multi-line strings, for uniformity with other tools, and solve the inline multiline string problem using document-local constants:

I think descriptions are a primary use case for multi-line strings, and if we fall back to comments-as-description (which I've been convinced have serious drawbacks) then I would propose abandoning multi-line strings as not valuable enough, especially if they can't be used for the purpose of verbatim text without a work-around like needing to add yet another syntactical addition, and separating the definition and use of the long-form content just to work around the indentation issue.

@leebyron
Contributor Author

I think one of the motivations for introducing block strings for use with docstring style descriptions is that they can also be used for other places long-form text may appear. A clear second use case is deprecation reasons, as @IvanGoncharov pointed out. In the long-tail are potential cases for long-form text within queries themselves, but it's likely going to primarily live within the domain of directives and descriptions.

@JeffRMoore
Contributor

I think indentation should be stripped. Here's an alternate indentation of the prior example

type Foo {
  bar: String 
    @deprecated(reason: 
      """
      We deprecated it because of bla bla bla.
      Instead, you should use "baz" field. ...
 
      IMPORTANT: Field will be removed on 11 Jan 2018.
      If you can't make migration in time please contact us on ...
      """
    )
  baz: Int
}

Constants may have other merits, but I dislike adding distance between declaration and use in the @deprecated example.

@rmosolgo

👋 Just adding my opinion from a Ruby background: multi-line strings are fine and I think indentation should be stripped.

Ruby always had heredoc syntax for strings, but it didn't strip indentation. So, Ruby on Rails added .strip_heredoc, just like Scala's .stripMargin above. Recently, Ruby added a new spin on heredoc syntax (using <<~ instead of <<-) which strips indentation, since that's what people wanted all along.

So, I think we should skip a step, and strip indentation from the start!

Here are Ruby's indentation-handling rules:

The indentation of the least-indented line will be removed from each line of the content. Note that empty lines and lines consisting solely of literal tabs and spaces will be ignored for the purposes of determining indentation, but escaped tabs and spaces are considered non-indentation characters.

"Literals.rdoc", see "squiggly heredoc"

Are there any cases where that is not desirable behavior?
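
As a point of comparison for the rule quoted above, Python's standard `textwrap.dedent` behaves in a very similar way: it removes the whitespace common to all lines and ignores whitespace-only lines when computing it. The sample text below is made up for illustration, and this is only an analogue of the Ruby behavior, not the algorithm proposed in this RFC.

```python
import textwrap

text = (
    "    We deprecated it because of bla bla bla.\n"
    "\n"
    "      IMPORTANT: nested detail keeps its extra indent.\n"
    "    Use the other field instead.\n"
)

# The blank line does not count when the common indentation is computed,
# and the more deeply indented line keeps its relative indentation.
dedented = textwrap.dedent(text)
print(dedented)
```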

@leebyron
Contributor Author

leebyron commented Jul 5, 2017

Very fascinating to hear about Ruby's evolution of heredoc. As I've been doing more research into other languages it's clear that multi-line literals have served two purposes: true heredoc style embedded content in which leading whitespace should be considered part of the content, as well as long-form written text where leading whitespace is expected to be ignored. Most languages ended up with some standard library function to get the latter behavior from the former, but some have added a different literal for exactly the latter purpose, Ruby being the best example of that.

I agree that since this attempts to support writing long-form text and not embedded content, that we should avoid having additional syntax to specify the behavior people will expect. And since GraphQL is a data/query description language, not a programming language, we don't really have the option of adding functions for converting between the behaviors.

It sounds like nearly all indentation-stripping methods use the same method of stripping the least-indented non-whitespace line. Python's docstring rules include a special-case for the first line to avoid a required leading newline (something Ruby heredoc grammatically assumes) - I'd suggest keeping that behavior.
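
The approach described above (strip the indentation of the least-indented non-blank line, but treat the first line specially) can be sketched in Python as follows. The function name `strip_block_indentation` is hypothetical, and details such as treating only spaces and tabs as indentation and dropping blank leading and trailing lines are assumptions made for this sketch rather than the RFC's final wording.

```python
def strip_block_indentation(raw: str) -> str:
    """Remove common indentation from a block string value (illustrative sketch)."""
    lines = raw.splitlines()

    # Find the smallest indentation among non-blank lines, skipping the first
    # line, which sits right after the opening quotes and is usually unindented.
    common = None
    for line in lines[1:]:
        stripped = line.lstrip(" \t")
        if stripped:  # whitespace-only lines do not influence the result
            indent = len(line) - len(stripped)
            if common is None or indent < common:
                common = indent

    if common:
        lines = [lines[0]] + [line[common:] for line in lines[1:]]

    # Drop blank lines at both ends so the quotes may sit on their own lines.
    while lines and not lines[0].strip(" \t"):
        lines.pop(0)
    while lines and not lines[-1].strip(" \t"):
        lines.pop()

    return "\n".join(lines)


raw = (
    "\n"
    "      We deprecated it because of bla bla bla.\n"
    '      Instead, you should use "baz" field.\n'
    "\n"
    "      IMPORTANT: Field will be removed on 11 Jan 2018.\n"
    "      "
)
print(strip_block_indentation(raw))
```

With the first-line special case, text that starts immediately after the opening quotes and text that starts on the following line produce the same value, so no leading newline is required.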

@leebyron
Contributor Author

leebyron commented Jul 5, 2017

I'm interested in thoughts from @syrusakbary and @andimarek as well, since you both maintain fairly popular GraphQL libraries. Especially getting an understanding of the impact of a change like this on the libraries you maintain.

@andimarek
Contributor

@leebyron I don't see any problem with this change. Do you have anything specific in mind which might be problematic?

@leebyron
Contributor Author

leebyron commented Jul 6, 2017

@andimarek - specifically I'm curious about being able to build this into the lexer for the parsing functionality of the library, and ensuring that I have multiple sets of eyes and brains on considering any corner cases of using multi-line strings in GraphQL documents in general - though I'm happy to hear you don't see any issue.

@OlegIlyenko, @jjergus, @wincent @stubailo - are you at all convinced by the arguments and opinions above? (Or if you still have concerns - are they strong or weak?)

Also cc @dschafer for a closer look at the discussion on this thread (we had conversations about this PR offline, but I'm not sure if you have seen the broader feedback)

@wincent
Contributor

wincent commented Jul 6, 2017

With respect to the stripping or not stripping indentation, I think it's pretty clear that most people will value being able to indent strings for aesthetic reasons, but at the same time having the behavior be automatic, magical, and not possible to opt out of is potentially problematic. In the absence of a separate syntactical (or programmatic) means of signaling the intent to strip, I would (weakly) prefer it if we didn't do so. Of those two means (syntactic and programmatic), I think we should reject the idea of using a separate syntactic token (for simplicity) and forget about programmatic means because, as you say, GraphQL is not a programming language.

If the code that is consuming the schema artifact cares about leading indentation, it could trivially strip it itself anyway, which would probably be my preferred solution (yes, duplication of effort over magic).

The other concern I have is with overloading multiline-strings with two uses (ie. use as docstrings and use wherever the grammar currently allows strings). In Python, this works well only because strings are expressions, and the docstring position (first statement after function body etc) is a valid location for a statement containing an expression. In GraphQL, which is not a programming language, allowing strings (and only multiline strings) to appear at specific places where documentation is allowed feels arbitrary and subtle to me. (I am not a Python programmer, so please correct me if I am mistaken in my characterization.)

The other thing I'll be interested to see is how this — being a breaking change — ends up playing out in practice. We could, for example, use comments for documentation (indicated with a leading ## instead of #) without any breakage, and without having to worry about commented-out sections getting misidentified as documentation, but we'd still need to roll out the multi-line string change in a way that carefully manages the effects of the breakage.

@OlegIlyenko
Contributor

OlegIlyenko commented Jul 8, 2017

I agree with @wincent on all points described in previous comments. I think an important aspect we also need to consider is copy/paste.

I can imagine authoring the docstrings directly in the GraphQL document. In this case, I would prefer indentation to be stripped. For arguments and input objects, in most practical scenarios, I will copy/paste the contents of the string from some other place, especially if it's a larger piece of content (based on my previous experiences with Scala's multi-line strings). Multi-line strings are ideal in this scenario since I don't need to re-format the original multi-line content in any way. This would no longer work if GraphQL multi-line strings manipulated the string in some way (the copy/pasted string might also contain indentation which is part of the content, and I don't want it to be stripped).

If indentation is not stripped from normal strings, we can recommend that GraphQL implementation libraries provide a helper function to easily strip the whitespace in the same way docstrings do automatically.

In general, the idea of providing documentation as a multi-line string in a specific position within the AST sounds quite foreign and a bit confusing to me. I feel that the most natural way to provide the documentation would be the directive (in this case an addition of the multi-line strings feature would feel natural to me as well). Based on the previous discussions, the use of directives for documentation strings was considered quite inconvenient (and I would agree with it). A javadoc-style comment is my second favorite:

##
# This is a docstring comment.
# Blah.
type Foo {
  bar: String
}

Comments have specific semantics - they provide more information about the schema for human beings. ## comment syntax defines the additional semantics and makes it more permanent by making the comment contents part of type documentation.

When I see something like this:

"""
This is a docstring comment.
Blah.
"""
type Foo {
  bar: String
}

I see a string literal positioned close to the type definition. String literals do not define any specific semantics (except that they normally provide a value for an input field or an argument, which then defines its meaning), so I need to infer the semantics based on the string's position within the AST. I think this is my main concern. This is also why I mentioned directives previously. When provided as an argument to a directive, the semantics become much clearer and easier to infer based on the context:

@description(text: """
  This is a docstring comment.
  Blah.
  """)
type Foo {
  bar: String
}

As @wincent mentioned, overloading string literal (can non-multiline string literal be used as a docstring?) with two significantly different interpretations based on the AST position is quite confusing.

@JeffRMoore
Contributor

I like the @description directive idea, but currently description comments go before the element while directives go after, right?

@OlegIlyenko
Contributor

@JeffRMoore yeah with the current syntax it would look like this:

type Foo @description(text: """
  This is a docstring comment.
  Blah.
  """) {
  bar: String
}

@leebyron
Contributor Author

If the code that is consuming the schema artifact cares about leading indentation, it could trivially strip it itself anyway, which would probably be my preferred solution (yes, duplication of effort over magic).

My concern is that it's not the consuming code which determines if it cares about leading indentation or not, it's the client writing the input. Significant leading whitespace is going to be a property of each input, not a property of each position the input could be placed.

Add to that if the default behavior is that leading indentation is not stripped and server-side engineers need to consider when to add it, that it will likely be behavior that's applied inconsistently, and even if we could come up with tools in every implementation that makes the consistent application easier to get right, indentation significance still depends on the input.

All this would lead to clients of GraphQL not being able to anticipate when they can and cannot anticipate indentation-stripping behavior. That means they won't trust it, and they'll write their documents in a way that is harder to read, which I would count as a significant loss. Legibility is paramount to this decision.

Also, I don't think it's fair to characterize this as "magic" - what we're talking about here is a uniform behavior for this new kind of literal (strip-always) vs position-dependent behavior for this new kind of literal. In my experience, code is often considered "magic" when it tries to predict your intent amongst many options - sometimes leading to wrong and surprising results. In this case we're not trying to predict any intent and I'm proposing that we do not provide different interpretation options.

Ultimately I would like to make this decision based on anticipated real world use cases. If the majority of those would expect leading indentation to be preserved, then we should do as you suggest and not strip leading indentation by default. I think what we've found is the opposite. The two clearest cases are writing descriptions and writing prose within directives, both of which expect leading indentation to be stripped. I would imagine that future use cases for the multiline string would share this expectation.

@leebyron
Contributor Author

The other thing I'll be interested to see is how this — being a breaking change — ends up playing out in practice. We could, for example, use comments for documentation (indicated with a leading ## instead of #) without any breakage

Could you clarify what you mean by breaking change in this context, or what scenarios you see that we should plan for? It's true that supporting a new literal is forward-breaking, such that older libraries that do not yet support them will not consume them. I'm not very worried about that, since this is going to be primarily used in a new addition to GraphQL (descriptions in SDL).

Another way to think about this being a breaking change is that existing use of # comment descriptions would need to be migrated to use the docstring style. However if we decided to use a ## style comment then it would be equivalently breaking in this way.

@leebyron
Contributor Author

The other concern I have is with overloading multiline-strings with two uses (ie. use as docstrings and use wherever the grammar currently allows strings). In Python, this works well only because strings are expressions, and the docstring position (first statement after function body etc) is a valid location for a statement containing an expression. In GraphQL, which is not a programming language, allowing strings (and only multiline strings) to appear at specific places where documentation is allowed feels arbitrary and subtle to me. (I am not a Python programmer, so please correct me if I am mistaken in my characterization.)

Could you help me understand why you see this as a problem? I'm not sure I would characterize this as an overloading since a docstring is not a logical operator. Typically overloading is problematic when the same method/operator can have different behavior depending on the context in which it is used - (i.e. + being both numerical addition and string concatenation in JavaScript is the theme of many a wtfjs post) - however since we're talking about a value literal, not an operator, the interpretation of a string literal is going to be the same in all of the places we would expect to use it.

What we have done in this case is added a new place where a string literal can appear in the Grammar. I think debate on this is welcome, but my opinion is that descriptions in GraphQL are important enough and talked about as first-class to warrant their own place in the AST for the schema language.

In Python, docstrings started as an unofficial addition. Since Python is an imperative programming language, a string literal is an expression which is a statement but one that doesn't imperatively do anything. Then a doc generator can walk the AST and easily find them and process them. Same is true in JS, though docstrings never caught on in that community:

function docstringedIdentity(x) {
  "this docstring is a statement that has no impact";
  return x;
}
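
The same observation can be checked directly in Python with the standard `ast` module. This is only a small illustration of the point above (a docstring is an ordinary string expression statement that tooling finds by walking the AST); the function `docstringed_identity` is a made-up example.

```python
import ast

source = '''
def docstringed_identity(x):
    """This docstring is a statement that has no effect at runtime."""
    return x
'''

module = ast.parse(source)
func = module.body[0]

# The first statement of the body is an expression statement wrapping a string constant.
first_stmt = func.body[0]
is_string_expr = (
    isinstance(first_stmt, ast.Expr)
    and isinstance(first_stmt.value, ast.Constant)
    and isinstance(first_stmt.value.value, str)
)

# Documentation tools can retrieve it without executing the code.
docstring = ast.get_docstring(func)
print(is_string_expr, docstring)
```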

Since GraphQL isn't a programming language and doesn't have an equivalent to a sequence of statements, this isn't really open to the community building for themselves. Even if it was, I'd prefer we think about this up front since descriptions are so important to GraphQL.

@leebyron
Contributor Author

I feel that the most natural way to provide the documentation would be the directive (in this case an addition of multi-line strings feature would feel natural to me as well). Based on the previous discussions, the use of directives for documentation string was considered quite inconvenient (and I would agree with it).

I agree that conceptually a directive could be a natural way to add descriptions, but since the primary goal is to make descriptions easy to read and write, it's my opinion that the creation of a new position in the AST for this data is worth the improvement to legibility and convenience.

However I think we could also make an argument that there is a point at which a directive is no longer the best conceptual fit for some piece of data. For example we decided not to write type X @implements(type: "Y") even though interface implementation could certainly have been written that way - instead we added custom syntax for it since it's a first-level feature of the SDL. I'd like to make descriptions a first-level feature as well.

@wincent
Contributor

wincent commented Jul 14, 2017

If the code that is consuming the schema artifact cares about leading indentation, it could trivially strip it itself anyway, which would probably be my preferred solution (yes, duplication of effort over magic).

My concern is that it's not the consuming code which determines if it cares about leading indentation or not, it's the client writing the input. Significant leading whitespace is going to be a property of each input, not a property of each position the input could be placed.

I'm not 100% sure what you mean by "client" in these sentences. Do you mean a developer writing a GraphQL document?

Add to that if the default behavior is that leading indentation is not stripped and server-side engineers need to consider when to add it, that it will likely be behavior that's applied inconsistently, and even if we could come up with tools in every implementation that makes the consistent application easier to get right, indentation significance still depends on the input.

All this would lead to clients of GraphQL not being able to anticipate when they can and cannot anticipate indentation-stripping behavior. That means they won't trust it, and they'll write their documents in a way that is harder to read, which I would count as a significant loss. Legibility is paramount to this decision.

I'm having trouble following the argument or even parsing the text here. In the paragraph I quote below you say we're not considering position-dependent behavior, but above you seem to be talking about "indentation significance depend[ing] on the input" and I don't know what you mean by that. Also, what does "clients" mean in this paragraph?

Feel free not to answer these questions: you solicited concerns, "strong or weak", and I gave some weak ones. I'm mostly discussing this out of intellectual interest rather than firm conviction that things should be done in a certain way.

Also, I don't think it's fair to characterize this as "magic" - what we're talking about here is a uniform behavior for this new kind of literal (strip-always) vs position-dependent behavior for this new kind of literal. In my experience, code is often considered "magic" when it tries to predict your intent amongst many options - sometimes leading to wrong and surprising results. In this case we're not trying to predict any intent and I'm proposing that we do not provide different interpretation options.

We're probably off in the weeds now trying to define the exact semantics of "magic", but I'll clarify anyway: I was using it in the sense of violating the "principle of least surprise" mentioned higher up in the thread. I know there are counter-examples — although I'd argue that they're pretty rare — but in most languages I've seen the material inside the opening and closing delimiters of a string literal is basically a sequence of bytes; if a line within the string begins with N bytes of whitespace then those N bytes are no less part of the string than N other bytes of whitespace (or non-whitespace) anywhere in the middle of a line. The fact that the leading whitespace may be automatically stripped would be unexpected for me.

Ultimately I would like to make this decision based on anticipated real world use cases. If the majority of those would expect leading indentation to be preserved, then we should do as you suggest and not strip leading indentation by default. I think what we've found is the opposite. The two clearest cases are writing descriptions and writing prose within directives, both of which expect leading indentation to be stripped. I would imagine that future use cases for the multiline string would share this expectation.

I think that's likely too, which is why I said, "I think it's pretty clear that most people will value being able to indent strings for aesthetic reasons". In principle I agree with the idea of optimizing for the convenience of the common case, but I am reluctant to do something that would preclude people from doing anything outside of the common case. If I don't want stripping, for example, my only choice is to not use multi-line strings at all.
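To make the two behaviors under discussion concrete, here is a minimal Python sketch of always-strip handling, loosely modeled on the PEP 257 trimming approach linked earlier in the thread. The function name `strip_common_indent` and the sample text are illustrative assumptions, not part of this RFC or of any GraphQL implementation.

```python
def strip_common_indent(raw: str) -> str:
    """Remove the indentation shared by all non-blank lines after the first.

    Hypothetical helper for illustration only.
    """
    lines = raw.splitlines()
    # Indentation of each non-blank line, skipping the first line because it
    # sits directly after the opening delimiter.
    indents = [
        len(line) - len(line.lstrip(" \t"))
        for line in lines[1:]
        if line.strip(" \t")
    ]
    common = min(indents) if indents else 0
    result = lines[:1] + [line[common:] for line in lines[1:]]
    # Drop blank lines at the start and at the end.
    while result and not result[0].strip(" \t"):
        result.pop(0)
    while result and not result[-1].strip(" \t"):
        result.pop()
    return "\n".join(result)


# Raw contents between the delimiters, as written in an indented document.
raw = "\n    Hello,\n      World!\n\n    Yours,\n      GraphQL.\n  "

preserved = raw                       # "sequence of bytes" interpretation
stripped = strip_common_indent(raw)   # always-strip interpretation
```

Under the preserve interpretation every line keeps its four leading spaces. Under the strip interpretation only the relative indentation of `World!` and `GraphQL.` survives, which is the case where an author who wants the leading whitespace would have no way to keep it.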

The other thing I'll be interested to see is how this — being a breaking change — ends up playing out in practice. We could, for example, use comments for documentation (indicated with a leading ## instead of #) without any breakage.

Could you clarify what you mean by breaking change in this context, or what scenarios you see that we should plan for? It's true that supporting a new literal is forward-breaking, such that older libraries that do not yet support them will not consume them. I'm not very worried about that, since this is going to be primarily used in a new addition to GraphQL (descriptions in SDL).

I just mean this is the first major forward-breaking change that I am aware of, and given the size and number of actors in the GraphQL ecosystem I am curious to see how it goes. Obviously we can't commit to never making breaking changes — that way lies stagnation — but I wonder whether there'll be things that we can learn as time goes by to make them more "successful" (in the sense that we find ways to minimize pain in exchange for the upside). I specifically mentioned ## docstring comments as an example of a way to avoid one kind of breakage (older tools still being able to process the text, even if they don't "see" the comments as docstrings).

The other concern I have is with overloading multiline-strings with two uses (ie. use as docstrings and use wherever the grammar currently allows strings). In Python, this works well only because strings are expressions, and the docstring position (first statement after function body etc) is a valid location for a statement containing an expression. In GraphQL, which is not a programming language, allowing strings (and only multiline strings) to appear at specific places where documentation is allowed feels arbitrary and subtle to me. (I am not a Python programmer, so please correct me if I am mistaken in my characterization.)

Could you help me understand why you see this as a problem? I'm not sure I would characterize this as an overloading since a docstring is not a logical operator.

I was using the term "overloading" not in the narrow sense of operator overloading but in the more general English-as-spoken-by-programmers sense, where we use "overloading" for various things ranging from specifics like operator overloading and function overloading to more abstract things like "overloading terminology with multiple meanings". In this context, I hoped that "overloading multiline-strings with two uses (ie. ...)" would have made that clear. I don't really have anything to add beyond what I said in the paragraph that you quoted.

@leebyron
Contributor Author

When this was discussed at a previous GraphQLWG meeting, there was consensus around multi-line strings and SDL descriptions, so I'll be rebasing and merging these.

This RFC adds a new form of `StringValue`, the multi-line string, similar to that found in Python and Scala.

A multi-line string starts and ends with a triple-quote:

```
"""This is a triple-quoted string
and it can contain multiple lines"""
```

Multi-line strings are useful for typing literal bodies of text where new lines should be interpreted literally. In fact, the only escape sequence used is `\"""` and `\` is otherwise allowed unescaped. This is beneficial when writing documentation within strings which may reference the back-slash often:

```
"""
In a multi-line string \n and C:\\ are unescaped.
"""
```
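As a rough illustration of that escaping rule, the following Python sketch scans a multi-line string and treats `\"""` as the only escape sequence. The function name `read_multiline_string` and its return shape are assumptions made for this example; this is not the reference lexer.

```python
from typing import Tuple


def read_multiline_string(source: str, start: int = 0) -> Tuple[str, int]:
    """Return (value, index just past the closing delimiter).

    Hypothetical helper: only the sequence \\\"\"\" is treated as an escape;
    every other backslash is copied through unchanged.
    """
    if not source.startswith('"""', start):
        raise ValueError("expected opening triple-quote")
    pos = start + 3
    out = []
    while pos < len(source):
        if source.startswith('\\"""', pos):
            # Escaped triple-quote: emit the quotes, drop the backslash.
            out.append('"""')
            pos += 4
        elif source.startswith('"""', pos):
            # Unescaped triple-quote closes the string.
            return "".join(out), pos + 3
        else:
            out.append(source[pos])
            pos += 1
    raise ValueError("unterminated multi-line string")


value, end = read_multiline_string(r'"""In a multi-line string \n and C:\\ are unescaped."""')
quoted, _ = read_multiline_string(r'"""say \""" here"""')
```

Here `value` keeps the backslash in `\n` and both backslashes in `C:\\` exactly as typed, while `quoted` contains a literal triple-quote with the escaping backslash removed.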

The primary value of multi-line strings is to write long-form input directly in query text, in tools like GraphiQL, and as a prerequisite to another pending RFC to allow docstring style documentation in the Schema Definition Language.
@leebyron leebyron changed the title RFC: Multi-line String RFC: Block String Nov 30, 2017