Skip to content

Use coerce instead of unsafeCoerce where appropriate #130

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 26, 2020
Merged

Use coerce instead of unsafeCoerce where appropriate #130

merged 7 commits into from
Nov 26, 2020

Conversation

JordanMartinez
Copy link
Contributor

Implements the suggestions made here: replace unsafeCoerce with coerce where appropriate.

@JordanMartinez
Copy link
Contributor Author

I now understand why this is tricky to implement...

@rhendric
Copy link
Member

Just curious, with safe coercion, what's the advantage of keeping the specialized toNonEmptyString etc. instead of just using coerce inline? (Not that there's much disadvantage, other than code size.)

@MonoidMusician
Copy link
Contributor

I’m not sure Safe.Coerce should be used here, since String -> NonEmptyString is not a safe coercion. (In particular, coerce is assumed to be safe in both directions, which isn’t the case here – see one of the threads on the compiler repo.)

Also I thought that it required the constructor to be imported, or did I misunderstand?

@JordanMartinez
Copy link
Contributor Author

Let me explain the nuances here.

First, you'll notice that the three functions toNonEmptyString, fromNonEmptyString, and liftS are all functions that are not exported from those two modules. These functions exist only to make implementations of other functions a bit easier.

Second, Ci doesn't build. If you look at the error message, it's because the Newtype constructor for NonEmptyString isn't imported into the module. And that's why I said above "I now understand why this is tricky to implement."

The NonEmptyString constructor can't be exported because that would ruin the whole point of that type. Thus,we also can't use the safe version of coerce within these two modules. Since these three functions are used in the implementations of the two modules' functions, the only way to switch these three functions to coerce would be to move both modules' code into Data.NonEmptyString.Internal, which defeats the purpose. We have both a CodeUnits and CodePoints module to distinguish between the two. If we move the code into the Internal module, we would have to prefix all function calls with cu or cp to distinguish between which its ultimately operating on.

So, I'm not sure this is really worth it at the end of the day.

@hdgarrood
Copy link
Contributor

These functions aren’t affected by whether or not we happen to be thinking about code points or code units at the point they’re used, so I don’t think they need to be duplicated. Why not move them into Internal and re-export them as necessary so that they’re visible from all the places they’re currently visible now? There’s no need to add prefixes because they’re the same function.

@hdgarrood
Copy link
Contributor

To clarify: not every string function needs to choose whether to behave in terms of code units or code points, so not everything needs to be duplicated. For example, all of the functions in Data.String.Common fit into this category and so they are all re-exported from both Data.String.CodePoints and Data.String.CodeUnits.

@JordanMartinez
Copy link
Contributor Author

Why not move them into Internal and re-export them as necessary so that they’re visible from all the places they’re currently visible now? There’s no need to add prefixes because they’re the same function.

Which functions are you referring to? I'm assuming it's not toNonEmptyString, fromNonEmptyString, and liftS because moving those to Internal and reexporting them would break the guarantees of NonEmptyString.

If you are talking about functions like toCharArray, charAt, and uncons, then which can be deduplicated because they ultimately do the same thing whether one uses CodeUnits or CodePoints, and which cannot?

For example, uncons is implemented differently in both modules:

The CodeUnits version of uncons is:

uncons :: NonEmptyString -> { head :: Char, tail :: Maybe NonEmptyString }
uncons nes = 
  let s = fromNonEmptyString nes 
  in { head: U.charAt 0 s
     , tail: fromString (CU.drop 1 s) 
     }

The CodePoints version of uncons is:

uncons :: NonEmptyString -> { head :: Char, tail :: Maybe NonEmptyString }
uncons nes = 
  let s = fromNonEmptyString nes 
  in { head: unsafePartial fromJust (CP.codePointAt 0 s)
     , tail: fromString (CP.drop 1 s) 
     }

Will they produce the same content and so this function can be deduplicated? Or do they produce something different? I assume the latter. If it's moved into Internal because it uses fromNonEmptyString, then we'd need to distinguish one from the other somehow.

@hdgarrood
Copy link
Contributor

I am just talking about toNonEmptyString, fromNonEmptyString, and liftS. Exporting them from Data.String.NonEmpty.Internal doesn’t violate any guarantees, because importing the Internal module already gives you access to the constructor. The idea is that you can import the Internal module if you want but it’s now your responsibility to ensure that the invariants are upheld. You only get guarantees if you avoid importing any Internal modules. Note that the Internal module already exports the NonEmptyString constructor.

@hdgarrood
Copy link
Contributor

You’re right that we definitely don’t want to re-export these functions - their only use is in defining the other functions in Data.String.NonEmpty.{CodePoints,CodeUnits}, right?

@hdgarrood
Copy link
Contributor

Oh, looking back at what I said before I don’t know why I said to re-export them. We can just move these functions into the Internal module and only import them directly from that module where they’re needed.

@JordanMartinez
Copy link
Contributor Author

Note that the Internal module already exports the NonEmptyString constructor.

Oh! It does...

The idea is that you can import the Internal module if you want but it’s now your responsibility to ensure that the invariants are upheld. You only get guarantees if you avoid importing any Internal modules.

This seems like one of those Haskell "idioms" that never made sense to me.

If we refactored this code, so that the NonEmptyString constructor wasn't exported, would you be in agreement with that? Or do you think this escape hatch should continue to exist?

@hdgarrood
Copy link
Contributor

I’m not sure it makes a huge difference really, since if you’re determined you can always use unsafeCoerce. I would be tempted to leave things as they are though - using the actual constructor is definitely safer than using unsafeCoerce if people do want an escape hatch, and personally I don’t think the change is beneficial enough to justify rearranging things (or to justify holding 0.14 up any more).

I think the benefit of providing an Internal module is clear once you’ve been in a situation where you can’t implement something you need without access to it. We use Internal modules from the Haskell Esqueleto package all the time in production at Lumi, because Esqueleto’s safe public API isn’t yet flexible enough to cover all our needs.

@JordanMartinez
Copy link
Contributor Author

I've updated the code to use the constructor directly.

I think the benefit of providing an Internal module is clear once you’ve been in a situation where you can’t implement something you need without access to it. We use Internal modules from the Haskell Esqueleto package all the time in production at Lumi, because Esqueleto’s safe public API isn’t yet flexible enough to cover all our needs.

That makes sense in that situation. Is this repo's code mature enough to not fall into that category though? If so, then I think a breaking change fixing that should be done in the future, but not right now due to the v0.14.0 delay.

@hdgarrood
Copy link
Contributor

I don't think it's really a question of maturity: just because a library has been in use in production for a few years, it doesn't mean that there aren't still situations where it doesn't quite fill a gap that it should - it could just be that nobody has tried to use it in a particular context yet. I'm against removing the Internal module later: I don't think it presents that much of a risk (since you have to opt in to using it) and I think we need the Internal module here because both Data.String.NonEmpty.CodeUnits and Data.String.NonEmpty.CodePoints need to be able to "unsafely" construct NonEmptyString values.

If you're worried about things like IDE tooling importing things from Internal modules when they could have been imported from non-Internal modules, then I think that's something we should address in the tooling. Perhaps we can modify the relevant tooling/plugins so that they don't list Internal modules by default, or so that they give you a warning when they add an import from an Internal module, or something like that.

@thomashoneyman
Copy link
Member

(Not to bring the discussion off topic, but I would like tooling to demote internal modules. I import them by accident much more than I’d like.)

@JordanMartinez
Copy link
Contributor Author

think we need the Internal module here because both Data.String.NonEmpty.CodeUnits and Data.String.NonEmpty.CodePoints need to be able to "unsafely" construct NonEmptyString values.

Yeah, I get that. However, the Internal module seems like a workaround to a different problem: that one module cannot access "private" parts of another module whereas all other modules cannot access those same "private" parts.

I get that people want the strong guarantees of a newtype. I also get that such a newtype's guarantees can be broken via unsafeCoerce and that there are times where one should be able to do that. I think breaking the guarantees should be more explicit via unsafeCoerce than using the NonEmptyString constructor, something that can be overlooked or (as Thomas pointed out) unknowingly imported.

Here's my order of preference for solving this problem:

  1. Implement a better module system (hard and not an easy fix, especially when circular dependencies cause issues)
  2. Stop exporting the NonEmptyString constructor, inline both CodeUnits and CodePoints into Internal with prefixed names, and update CodeUnits and CodePoints to use those prefixed names' functions as their implementations to reduce breakage. (tedious but possible; possible unneeded code bloat)
  3. In tooling, demote all modules that end in Internal. When importing functions, these modules appear at the bottom of the list to reduce the chance of them being imported accidentally.
  4. Update documentation on the Internal module and NonEmptyString type.

I can be happy with the fourth option since v0.14.0 is more important than this and I don't think this is a big deal. I also realize that the second option isn't ideal.

@hdgarrood
Copy link
Contributor

FWIW I'm unlikely to be on board with changes to the module system which would enable us to expose parts of a module but only to certain other modules; I think it would complicate the module system unacceptably. I also expect it would have a pretty poor effort to payoff ratio. I think option 4 is the way to go for now.

@JordanMartinez
Copy link
Contributor Author

I think option 4 is the way to go for now.

Works for me. Due to previous experience, I'm not going to immediately push what I think the docs should say. Rather, I will present my rough draft with the understanding that we'll iterate on it here. Once we get a final version, I'll push that as a commit to this PR.

I don't think Internal modules should be explained in these docs. I've opened an issue in my learning repo to cover this in more detail as I think that's a better location for that (JordanMartinez/purescript-jordans-reference#527).

Data.String.NonEmpty.Internal Module docs:

While most of the code in this module is safe, this module does export a few partial functions and the NonEmptyString constructor. While the partial functions are obvious from the Partial constraint in their type signature, the NonEmptyString constructor can be overlooked when searching for issues in one's code. See the constructor's documentation for more information.

NonEmptyString constructor docs:

You can use this constructor to create a NonEmptyString that isn't non-empty, breaking the guarantee behind this newtype. It is provided as an escape hatch mainly for the Data.NonEmpty.CodeUnits and Data.NonEmpty.CodePoints modules. Usage of this constructor in your own code provides the same safety guarantees of unsafeCoerce, that is, none at all. Use this at your own risk when you know what you are doing.

@hdgarrood
Copy link
Contributor

That all sounds great to me 👍

@JordanMartinez
Copy link
Contributor Author

I added docs to the constructor, but it's not showing up on the generated docs. Did I do it wrong? Or is this a bug in the compiler?

data-string-nonempty-internal docs

-- | non-empty, breaking the guarantee behind this newtype. It is
-- | provided as an escape hatch mainly for the `Data.NonEmpty.CodeUnits`
-- | and `Data.NonEmpty.CodePoints` modules. Usage of this constructor
-- | in your own code provides the same safety guarantees of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite true; the NonEmptyString constructor at least requires that you provide a string, whereas unsafeCoerce doesn't. That is, you can do unsafeCoerce 3 :: NonEmptyString, whereas you can't do NonEmptyString 3 :: NonEmptyString.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed this before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol. No worries. What are you suggestions for how to fix it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can’t we just remove this sentence? The first one already mention that usage of this constructor breaks the newtype guarantees.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that sounds good to me.

@hdgarrood
Copy link
Contributor

I guess documenting individual constructors just hasn't been implemented yet.

@JordanMartinez
Copy link
Contributor Author

Should I move the docs to the type then? Not quite ideal as it seems to bloat the type's docs, but better than nothing...?

@kl0tl
Copy link
Member

kl0tl commented Nov 19, 2020

I did the following in purescript/purescript-arrays#184:

Capture d’écran 2020-11-19 à 23 52 42

@@ -13,7 +19,15 @@ import Prim.TypeError as TE
import Unsafe.Coerce (unsafeCoerce)

-- | A string that is known not to be empty.
newtype NonEmptyString = NonEmptyString String
newtype NonEmptyString =
-- | You can use this constructor to create a NonEmptyString that isn't
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*NonEmptyString (with backticks)?

@JordanMartinez
Copy link
Contributor Author

Thanks for the feedback. I've pushed all changes mentioned.

@JordanMartinez JordanMartinez merged commit e897bb7 into purescript:master Nov 26, 2020
@JordanMartinez JordanMartinez deleted the addKl0tliSuggestions branch November 26, 2020 20:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants