Should we be using DOMString, USVString, or CSSOMString? #687
Comments
I highly recommend USVString for new APIs.
I'd say being consistent is the most important thing here. For example, I'd expect style properties to use the same type as other CSSOM strings. On the other hand, URLs are consistently USVString elsewhere in the web platform, particularly in the
The Working Group just discussed
The full IRC log of that discussion
<dael> Topic: schedule wrangling
<dael> smfr: We probably need AmeliaBR for the Opacity item.
<AmeliaBR> I can call in... just a sec
<dael> astearns: We can do Houdini and then do this?
<dael> smfr: Fine
<astearns> Should we be using DOMString, USVString, or CSSOMString? - https://github.com//issues/687
<dael> github: https://github.com//issues/687
<dael> TabAtkins: Should things be specced as DOMString, USVString, or CSSOMString? I need guidance. I got that URLs should be USV, but for the rest I'm not sure what to do. Any guidance appreciated.
<dael> plinss: In general I think we should be doing USV, and only DOM where we need it for backwards compat.
<dael> TabAtkins: And we did CSSOM to allow that, but for engines that are more efficient on DOM to do that.
<dael> plinss: That's why I'm thinking for new APIs let's do USV and not propagate more DOM.
<dael> florian: Reason for CSSOM was an impl difficulty, I thought.
<dael> gsnedders: But elsewhere in the platform, specs have changed dom to usv. I'm not sure why we have cssom at all rather than we expect will change.
<dael> florian: cssom is for "should do usv, but do dom if you have to".
<dael> plinss: Reality is they will do that, so why give them permission to do it wrong if they can do it right?
<dael> astearns: Sounds to me like general consensus is to use usv. I'm not hearing anyone cheer for cssom.
<dael> florian: Then we should move away from cssom everywhere.
<dael> astearns: Fair, but we can try it with this new API and any in the future, and see how much we can go back as we find out if this one works.
<dael> astearns: Objections to spec USVString for this Houdini API?
<dael> TabAtkins: Might as well be all.
<dael> astearns: Objections to spec USVString for all new Houdini APIs?
<dael> RESOLVED: specify USVString for all new Houdini APIs
Hmm. DOMString remains the default for strings that aren't URLs. It's unfortunate that CSS is choosing to be inconsistent with the rest of the platform due to what I understand to be a Servo implementation limitation.
Servo wasn't part of the discussion. This was based on my recollection of DOMString vs USVString discussions we had in the TAG a while back. I accept that I may have misremembered, was simply wrong, or that the consensus on which to use has moved on since those discussions and I was out of the loop. My understanding is that the primary difference between a DOMString and a USVString is that the former allows unpaired surrogates while the latter does not. My recollection was that allowing unpaired surrogates was a legacy mistake that we're stuck with in older APIs but didn't want to propagate to new APIs, and that USVString should be preferred for new work where compatibility with older APIs wasn't an issue, not just for URLs. Hence this decision. If I'm wrong, we can revisit/revert this decision (and should make the guidance in WebIDL clearer and/or add a section to the TAG's design principles document to prevent this mistake in the future). If there's still debate on this, then we should file an issue on the TAG and get better consensus on how to proceed.
Unpaired surrogates are not a legacy mistake; they're just how strings in JS work. I think Web IDL is fairly clear on this:
I don't want to get into a debate over the word "mistake", so let me rephrase: JS strings based on 16-bit values date from a time when it was believed this would be sufficient to represent all Unicode code points (I was there). If we were building JS from scratch today, we wouldn't allow unpaired surrogates in strings. Carrying this forward is a burden; if we have a chance to fix it, we should at least think about it very carefully and weigh the pros and cons. Again, if that's already been done, and the decision was "too late, we just have to live with this forever, suck it up", then fine, but I'd like to see the records of that discussion, and clear documentation of the outcome. I've read that section of Web IDL and I don't find it sufficiently clear (apparently the same is true for others, or we wouldn't have been discussing DOMString vs USVString here at all). Define "perform text processing": as opposed to what? Working with text in strings, or text from the DOM for that matter, isn't that all text processing? And what is the rationale behind "When in doubt"? If the current thinking is to use USVString only for URLs and other strings that have specific similar needs, I'd like to see that stated explicitly, along with a list of exactly what those needs are, so rational decisions can be made rather than defaulting out of doubt (which comes from not understanding the issue and then leads to bad decisions).
Please don't omit the second part of that clause: "and need a string of Unicode scalar values to operate on". On the platform, the only APIs I know of that need a string of Unicode scalar values are the URL APIs. CSS APIs work fine with any sequence of code units, as they do not need to interpret them. |
Stated another way: if we were to just use DOMString (i.e., the normal JS string type) throughout the platform, then the first step of all the URL APIs would be "Let thingToProcess be the result of converting arg to a sequence of scalar values." This is because URL APIs then feed into a number of algorithms that operate on the individual scalar values (e.g., percent encoding/decoding). USVString is just a convenient way to save typing out that step over and over. For APIs that don't internally call into algorithms that operate on scalar values, but instead just operate on uninterpreted strings, you would never write that preprocessing step, with all of its associated costs. So such APIs should just use DOMStrings, and not add the additional preprocessing that USVStrings bring along. If you believe USVStrings are a good idea, then I encourage you to start by writing your specs using DOMStrings + that initial conversion step, and see if that conversion step buys you anything. Only once you've shown that it gives benefits should you refactor to USVStrings. That's what we did for URLs.
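For readers unfamiliar with that conversion step, here is a minimal TypeScript sketch of it, assuming the Infra/Web IDL behavior as I understand it: each unpaired surrogate code unit is replaced with U+FFFD, and everything else, including well-formed surrogate pairs, passes through unchanged. The function names are hypothetical, not any spec's or engine's API.

```typescript
// Sketch: convert a JS string (what a DOMString argument receives) into a
// string of Unicode scalar values (what a USVString argument receives).
function isLeadSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

function isTrailSurrogate(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

function toScalarValueString(input: string): string {
  let result = "";
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    const next = i + 1 < input.length ? input.charCodeAt(i + 1) : -1;
    if (isLeadSurrogate(unit) && isTrailSurrogate(next)) {
      // A well-formed pair encodes one scalar value: keep both code units.
      result += input.charAt(i) + input.charAt(i + 1);
      i++;
    } else if (isLeadSurrogate(unit) || isTrailSurrogate(unit)) {
      // An unpaired surrogate is not a scalar value: replace it.
      result += "\uFFFD";
    } else {
      result += input.charAt(i);
    }
  }
  return result;
}

// An API typed DOMString would use its argument as-is; one typed USVString
// behaves as if it had applied toScalarValueString first.
const arg = "a\uD83Db"; // lone first half of U+1F600
console.log(arg === toScalarValueString(arg));        // false
console.log(toScalarValueString(arg) === "a\uFFFDb"); // true
```

Strings without unpaired surrogates come out unchanged, so the observable difference between the two types is confined to that case, although an implementation still has to scan the input (or already know it is well-formed).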
This isn't completely true. In at least Servo's architecture, USVStrings are the cheaper data structure to support (if I'm recalling correctly), which is the whole reason we added CSSOMString: so impls could choose one or the other and use it consistently throughout CSS. (I don't have a horse in this race otherwise; I just want a clear answer.)
Right, thus my original
Not so much Servo, but Rust's native
My understanding of Servo is that it can support DOMStrings (using WTF-8), but it also has support for UTF-8-based USVStrings without requiring additional processing. I believe Tab's point is that USVStrings are cheaper in Servo, not required, so this is not based on a limitation, but rather on a desire to "do it right" in the API (for some value of right). So if the argument is to use DOMStrings because they are cheaper to implement, we have an existence proof to the contrary. Like Tab, I don't have a horse in this race, and I also just want a clear answer. If this was previously litigated, I have no desire to re-litigate, but I also want to know that the answer was based on solid principles and good information. I also don't want CSS to be inconsistent with the rest of the platform, but I do want the platform to be able to evolve and not be constrained by legacy implementations where that isn't necessary. My understanding was that USVStrings were not simply shorthand to avoid a preprocessing step, but a step forward that lets us avoid the issues related to unpaired surrogates where they aren't needed. If I'm wrong, fine; I'm not trying to win an argument here, just to get a better understanding of the issue. While not the default, I do see USVStrings being used in other APIs for things other than URLs, so if nothing else, the guidance about when to use one vs the other needs to be improved.
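To make the WTF-8 vs UTF-8 point concrete, here is a minimal TypeScript sketch of WTF-8 encoding. It is an illustration only, not Servo's code, and `encodeWtf8` is a hypothetical name. Well-formed surrogate pairs become ordinary 4-byte UTF-8 sequences, while an unpaired surrogate is encoded as if it were a normal three-byte code point, a byte sequence strict UTF-8 forbids. For a string with no unpaired surrogates the output is valid UTF-8, so an engine with UTF-8 native strings can store USVString values directly, whereas DOMString values need this generalized encoding.

```typescript
// Sketch: encode a JS string (UTF-16 code units) as WTF-8 bytes.
function encodeWtf8(input: string): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < input.length; i++) {
    let cp = input.charCodeAt(i);
    const next = i + 1 < input.length ? input.charCodeAt(i + 1) : -1;
    if (cp >= 0xd800 && cp <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      // Combine a well-formed pair into one supplementary code point.
      cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
      i++;
    }
    // Any remaining lone surrogate (U+D800..U+DFFF) falls into the 3-byte branch.
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    }
  }
  return bytes;
}

function hex(bytes: number[]): string {
  return bytes.map((b) => b.toString(16)).join(" ");
}

console.log(hex(encodeWtf8("\uD83D\uDE00"))); // "f0 9f 98 80" (valid UTF-8 for U+1F600)
console.log(hex(encodeWtf8("\uD800")));       // "ed a0 80" (not valid UTF-8)
```

A strict UTF-8 string type, such as Rust's `String`, cannot hold the second result, which is why a DOMString needs a representation like WTF-8 in such an engine.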
@plinss The main argument for DOMString, I believe, is user expectation. After all, the concept of a Unicode scalar value is still pretty niche among web developers, and it would seem odd for someone who is not aware of the nuances between DOMString and USVString to have a string downloaded from some JSON API unexpectedly
cc @SimonSapin. Note that IIRC the only observable difference is what happens with unpaired surrogates.
Correct.
For what it’s worth, this is what the IETF says about unpaired surrogates in JSON: https://tools.ietf.org/html/rfc8259#section-8.2
In other words, this is something that happens accidentally, and some other systems already go wrong when it does. It is already an error case, not meaningful content. I think it is not useful to require new APIs to preserve these broken strings unchanged. (I’d argue that even doing so for existing
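As a small illustration of how such strings arise in practice, the following TypeScript sketch (the helper `loneSurrogateIndices` is hypothetical) shows that `JSON.parse` turns a `\uDEAD` escape into a one-code-unit string, and that truncating by code units can split a surrogate pair. Under a DOMString-typed API both values pass through unchanged; under a USVString-typed API each flagged code unit would become U+FFFD.

```typescript
// Returns the indices of surrogate code units that have no partner.
function loneSurrogateIndices(s: string): number[] {
  const found: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // well-formed pair: skip its trailing half
      } else {
        found.push(i); // leading half with nothing to pair with
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      found.push(i); // trailing half with no leading half before it
    }
  }
  return found;
}

// The JSON grammar permits this escape even though it is not a scalar value.
const fromJson: string = JSON.parse('"\\uDEAD"');
// Cutting U+1F600 (two code units) after one unit leaves half a pair.
const truncated = "\uD83D\uDE00".slice(0, 1);

console.log(fromJson.length, loneSurrogateIndices(fromJson));
console.log(truncated.length, loneSurrogateIndices(truncated));
console.log(loneSurrogateIndices("\uD83D\uDE00"));
```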
I don't think CSS should be mangling strings given to it by other parts of the platform, regardless of what any IETF RFCs say. As stated above, web APIs should only use USVString as a shortcut for normal DOMStrings plus an explicit conversion to scalar values.
The Working Group just discussed
The full IRC log of that discussion
<dael> Topic: Which string type? - DOMString, USVString, or CSSOMString?
<TabAtkins> Github: https://github.com//issues/687
<dael> TabAtkins: Intro: I need to know what type of string to use. Everyone familiar with these three?
<dael> TabAtkins: DOM is just JS strings. USV string is that, but you can't write unpaired surrogates, only actual scalar values. CSSOM string is one or the other, and the UA has to choose.
<dael> gsnedders: And it has to be consistent.
<dael> TabAtkins: The arguments: USV string is an actual string, and DOM allows nonsense. But DOM is exactly what JS uses. Some browsers can natively handle scalar values faster and cheaper under the hood. Servo does that.
<dael> TabAtkins: Earlier we as do USV and for browsers not doing that have a long. Domenic came in and said don't do that, we use DOM for everything except those that require scalar values.
<dael> TabAtkins: There was an argument on this on GitHub between Domenic and plinss. [scrolls through github]
<dbaron> https://github.com/whatwg/webidl/issues/84 is an open issue about whether the WebIDL spec is providing the right advice there
<dael> TabAtkins: So, do we wait for TAG?
<dael> plinss: We discussed in Tokyo. We don't have solid advice yet. There's a plan to get JS people and WebIDL people together, but no answer yet.
<dbaron> https://github.com/w3ctag/design-principles/issues/93 is the TAG issue on this
<dael> TabAtkins: For the spec, I can put an inline issue in the spec saying it's under contention.
<dael> Rossen_: You're saying it's one or the other.
<dael> Rossen_: Right now it's CSS string, and if the TAG narrows it down we'll align.
<dael> TabAtkins: I can go with that. Change everything to cssom string.
<dael> Rossen_: Yeah, it's pretty much what we do.
<dael> astearns: But with the inline note saying we're hoping it gets less vague.
<dael> TabAtkins: Prop: spec goes with CSSOM String with an issue saying this is discussed by TAG and should resolve in the future
<dael> Rossen_: Objections or opinions?
<dael> plinss: If anyone has good info on this issue to help the TAG, it would be good.
<dael> emilio: If we change to DOM String it makes the private style system build a non-standard string, which is annoying.
<dael> fremy: It would be a mess, switching to surrogate pairs.
<dael> RESOLVED: spec goes with CSSOM String with an issue saying this is discussed by TAG and should resolve in the future
By the way, Firefox has been shipping since 57 (last November) with effectively
We should switch from DOMString to CSSOMString, as the CSSWG resolved to use CSSOMString [1]. There is an IDL issue when a union type contains typedef'd members, so that case was not changed and will be followed up in https://crbug.com/838890.
[1] w3c/css-houdini-drafts#687 (comment)
Bug: 834164
Change-Id: Id4768a6edbcdc17bfa72cffd4a63c81d8124d256
Reviewed-on: https://chromium-review.googlesource.com/1034618
Commit-Queue: Hwanseung Lee <[email protected]>
Reviewed-by: Darren Shen <[email protected]>
Reviewed-by: Kentaro Hara <[email protected]>
Reviewed-by: Yuki Shiino <[email protected]>
Cr-Commit-Position: refs/heads/master@{#556760}
Typed OM is currently inconsistent in its string usage. Probably thru cargo-culting, some things use DOMString, others use USVString. That's almost certainly not what we want - they should all be one or the other. Should we choose one, or use the CSSOMString typedef that other CSS APIs rely on?
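To illustrate why mixing the two types within one API is a problem, here is a hypothetical TypeScript model. The class and method names are invented, and this is not Typed OM's actual interface. A setter takes a USVString argument and a lookup method takes a DOMString argument. The USVString conversion is emulated with a regex that keeps well-formed surrogate pairs and replaces any other surrogate code unit with U+FFFD, as I understand Web IDL to specify. A key containing an unpaired surrogate is stored in converted form, so the unconverted lookup misses it.

```typescript
// Emulates the binding-layer conversion a USVString argument undergoes:
// well-formed pairs are kept, any other surrogate code unit becomes U+FFFD.
function asUsvString(s: string): string {
  return s.replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g,
    (match) => (match.length === 2 ? match : "\uFFFD"));
}

// Hypothetical map-like API with inconsistent string types.
class InconsistentStyleMap {
  private readonly entries: { [name: string]: string } = {};

  // Modeled as set(USVString name, DOMString value).
  set(name: string, value: string): void {
    this.entries[asUsvString(name)] = value;
  }

  // Modeled as has(DOMString name): the argument arrives unconverted.
  has(name: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.entries, name);
  }
}

const styleMap = new InconsistentStyleMap();
const oddName = "--x\uD800"; // name ending in a lone surrogate
styleMap.set(oddName, "1");
console.log(styleMap.has(oddName));     // false: stored key is "--x\uFFFD"
console.log(styleMap.has("--x\uFFFD")); // true
styleMap.set("--y", "2");
console.log(styleMap.has("--y"));       // true: no surrogates, no difference
```

Using either type consistently, whether directly or through a single CSSOMString typedef, makes both methods see the same string.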