Skip to content

Suggestion: allow slice (length s) (length s) on Strings #143

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Quelklef opened this issue Feb 13, 2021 · 15 comments · Fixed by #145
Closed

Suggestion: allow slice (length s) (length s) on Strings #143

Quelklef opened this issue Feb 13, 2021 · 15 comments · Fixed by #145
Labels
purs-0.15 A reminder to address this issue or merge this PR before we release PureScript v0.15.0 type: breaking change A change that requires a major version bump.

Comments

@Quelklef
Copy link
Contributor

Quelklef commented Feb 13, 2021

Currently, Data.String.CodeUnits.slice returns Nothing on slices from the final index to the final index, e.g.

slice 4 4 "test" == Nothing

I find this behavior to be undesirable. It:

  • Means that "out of bounds" has different meanings for the two bounds, which I personally find to be unintuitive.
  • Nullifies the invarant that for each i from 0 through length s, we have s == slice 0 i s <> slice i (length s). Instead, this only holds up to length s - 1.

Motivating example: I was dealing with string containing two parts separated by a delimiter, like prefix:::suffix. I had a function to separate the two parts, getParts :: String -> Maybe (String /\ String), which was implemented via indexOf and split. With the current split implementation, this function has that getParts ":::suffix" == Just ("" /\ "suffix") but getParts "prefix:::" == Nothing, which is surprising.

Happy to make an MR if it's decided that this is reasonable. I suppose it's technically a reverse-incompatible change, though.

@JordanMartinez
Copy link
Contributor

One downside to accepting this PR is that the changes would be silent. What returned Nothing would now return Just a.

However, I don't think this design was a mistake. This is what I get when I run this in my browser console:

> "1".slice(1, 1);
""

@Quelklef
Copy link
Contributor Author

Quelklef commented Feb 18, 2021

I don't quite understand your comment.

By silent you mean that it could silently break existing code? That's a fair point; it would be a reverse-incompatible change, and I understand if it's rejected for that reason.

Yeah, "1".slice(1, 1) is "". But I don't understand the implication of that. Is your thinking that "" should become Nothing? In that case, slice 1 1 "abc" should be Nothing, but it's Just "".

@JordanMartinez
Copy link
Contributor

Silent in the sense that the compiler doesn't issue an error or otherwise notify you that this function no longer works the same as it did before.

Ah, I didn't know that slice 1 1 "abc" returns back Just "". I guess it only returns Nothing when an error would otherwise be thrown. Then, I do think there is weight behind the point you're making.

@Quelklef
Copy link
Contributor Author

Quelklef commented Feb 18, 2021

Yeah, that's totally a valid concern. A fair option would be to provided a modified slice' function and a big bold note in the docs about the (subtle) difference.

JS slice actually never throws (except for on type errors). Try, for instance, "".slice(20, 25). The PS slice function returns Nothing when the indices are considered invalid, such as being "out of bounds"; this ticket is me arguing that the definition of "out of bounds" is bad on a particular edge case.

@garyb
Copy link
Member

garyb commented Feb 18, 2021

I don't really have a good answer to deal with the potential breakage here, but I'd agree that this being "out of bounds" is a mistake. An unfortunate legacy I introduced from the early days of this library... I think there may be some older issues discussing it previously where I argued against it, but I think I was wrong to do so now. Especially since the advent of NonEmptyString, since the situations I had in mind would be better expressed in terms of that.

One way to make it "obviously breaking" would be to remove the Maybe and to no longer track bounds, just have empty strings come back for all those situations, like it does in JS. Similarly, the somewhat related take and drop don't do bounds checking. I don't know... there probably is an occasional use case for having an empty string result allowable but catching out of bounds slices, but I don't know how common that would be.

@MonoidMusician
Copy link
Contributor

I had a similar complaint with a different function back in #78

@garyb
Copy link
Member

garyb commented Feb 18, 2021

Thanks for the link! I guess I ducked that conversation 😉

@JordanMartinez
Copy link
Contributor

I think the easiest fix is to add slice' with the correct implementation and then do a deprecation cycle in future breaking changes.

Thoughts?

@hdgarrood
Copy link
Contributor

Very 👎 for adding a slice' alongside the current one. As I said in #78:

The way I see it, the most important thing is that the API exported by this package is coherent and consistent

This is probably one of the most important libraries in core, and so I think it's vital that we take a lot of care over the API surface we expose. We definitely shouldn't just do what's easiest; having a slice and a slice' alongside each other would offer a poor experience as you'd have to decide which one you want to use each time - that's a decision we should be making rather than offloading to users. As a user of this library, you'd have to understand what both variants do if you work in different projects where some prefer one variant and some prefer the other.

I think the conclusion we reached in #78 was the correct one (i.e. that not returning Nothing for out of bounds is best) but I have one regret in how we made that change, in that we didn't go over the whole API to ensure that our approach to out-of-bounds cases for functions that do slicing is consistent throughout. If we had, we probably would have made this change then as well. If/when we do this I think it's crucial that we go over the whole API to make sure that everything is consistent in how it handles out-of-bounds cases, or at least as consistent as we can reasonably make it.

@Quelklef
Copy link
Contributor Author

How about:

  1. Creating a new ticket for homogenizing all string-and-index operations to behave consistently, allowing out-of-bounds indices as per splitAt without Maybe? #78; and then
  2. Creating a new ticket to re-add sliceMaybe, takeMaybe, dropMaybe, so we're still supporting these use-cases?

@kl0tl kl0tl mentioned this issue Feb 18, 2021
6 tasks
@hdgarrood
Copy link
Contributor

I think we should consider this ticket as your point 1, and I think we shouldn't do your point 2; it's not compatible with what I was saying:

having a slice and a slice' alongside each other would offer a poor experience as you'd have to decide which one you want to use each time

The situation is marginally improved by having better names but it's still very much not ideal in my mind.

@Quelklef
Copy link
Contributor Author

If you're intent on having one true slice, then I think it actually makes more sense to use the strict version. The non-strict version can trivially be recovered with >>> fromMaybe "", but going the other way around is more difficult.

@hdgarrood
Copy link
Contributor

The reason I'm unconvinced by Maybes is that we've had a few discussions about out-of-bounds slicing over the years and, as far as I know, not once has anyone been able to give a persuasive concrete example of where the Maybe behaviour is useful.

@garyb
Copy link
Member

garyb commented Feb 18, 2021

The reason I'm unconvinced by Maybes is that we've had a few discussions about out-of-bounds slicing over the years and, as far as I know, not once has anyone been able to give a persuasive concrete example of where the Maybe behaviour is useful.

Yeah, that's why I changed my mind on it too. I'm pretty sure any case where it would have been useful will be done under NonEmptyString instead.

@Quelklef
Copy link
Contributor Author

Quelklef commented Feb 19, 2021

Makes sense. (I feel like I have in the past had use-cases for Maybe, but I cannot think of anything concrete)

On the topic of consistency and cohesion, should the normalization of negative indices even be performed? Similar functions (splitAt, charAt) don't seem to do such normalization. Besides, at least to my intuition, it's only useful for syntactic convenience and otherwise may cause subtle bugs.

Regardless, I've created a WIP PR for this ticket.

@JordanMartinez JordanMartinez added purs-0.15 A reminder to address this issue or merge this PR before we release PureScript v0.15.0 type: breaking change A change that requires a major version bump. labels Dec 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
purs-0.15 A reminder to address this issue or merge this PR before we release PureScript v0.15.0 type: breaking change A change that requires a major version bump.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants