
Compatibility with tokio #316


Open · sunshowers opened this issue May 7, 2017 · 10 comments

sunshowers commented May 7, 2017

Hey :)

So serde_json's serializer and deserializer support Write and Read respectively, but unfortunately they aren't compatible with tokio.

The basic rule of I/O with tokio (I should contribute some docs somewhere) is that any downstream I/O operation can fail with WouldBlock, which means "come back later". So every operation must be retryable. That isn't the case with serde_json right now, though. For example:

  • In deserialize_enum, remaining_depth will not be incremented again if visit_enum fails. To avoid this whole class of issue, an RAII-style lock/guard pattern might help (see the sketch after this list).
  • The serializer heavily uses write_all, which is incompatible with tokio. Instead, you'd need to go through an internal buffer.
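
A minimal sketch of that guard idea (illustrative names only, not serde_json's actual internals): decrement in a constructor, restore in Drop, so the depth is rebalanced even when visit_enum returns early:

```rust
// Hedged sketch: an RAII guard that restores `remaining_depth` even if
// the recursive call bails out early (e.g. with WouldBlock).
struct DepthGuard<'a> {
    remaining_depth: &'a mut u8,
}

impl<'a> DepthGuard<'a> {
    // Decrement on entry; None means the recursion limit was hit.
    fn enter(remaining_depth: &'a mut u8) -> Option<DepthGuard<'a>> {
        *remaining_depth = remaining_depth.checked_sub(1)?;
        Some(DepthGuard { remaining_depth })
    }
}

impl Drop for DepthGuard<'_> {
    fn drop(&mut self) {
        // Runs on success *and* on early return, so a failed visit_enum
        // can't leave the counter skewed across retries.
        *self.remaining_depth += 1;
    }
}
```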

I'd personally say just documenting the incompatibility is fine for now.

I'll soon be open sourcing a library to help out with testing I/O operations, so hopefully you'll find that of use here :)

sunshowers (Author) commented

The library is now available as partial-io on crates.io (repo at https://github.com/facebookincubator/rust-partial-io). You could use quickcheck to do roundtrip tests and hopefully find bugs pretty easily.
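
A hedged sketch of the kind of test this enables, assuming partial-io's PartialRead/PartialOp API: deserialize through a reader that returns at most one byte per call and compare against a straight parse.

```rust
// Sketch only; assumes partial-io's PartialRead/PartialOp API.
use partial_io::{PartialOp, PartialRead};
use serde_json::Value;

fn roundtrip_with_short_reads(json: &str) {
    let expected: Value = serde_json::from_str(json).unwrap();

    // Every read call returns at most one byte.
    let ops = std::iter::repeat(PartialOp::Limited(1));
    let reader = PartialRead::new(json.as_bytes(), ops);

    let actual: Value = serde_json::from_reader(reader).unwrap();
    assert_eq!(expected, actual);
}
```

With the crate's quickcheck integration, the op sequences (including injected errors such as WouldBlock) can be generated randomly instead of written by hand.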


KodrAus commented Jul 26, 2017

This coupled with any improvements we can make to from_reader would make me very happy.

Do you have any thoughts on this @dtolnay? I wouldn't mind spending some time in the near future (vague GitHub commitment) experimenting with this.

dtolnay (Member) commented Jul 26, 2017

My impression was that for streaming serialization and deserialization use cases you typically want the data to be framed, as with tokio-serde-json. You mostly only care about JSON processing time if your server is CPU-bound, in which case you don't want to spend extra CPU time on incremental parsing if you can parse faster from a complete buffer all at once. If the server is not CPU-bound, JSON processing time would not meaningfully affect latency. Can you explain more about the use case where you care about this or #160?
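
For concreteness, the framed approach looks roughly like the sketch below. It uses tokio_util's length-delimited framing, a later API than tokio-serde-json, shown here only to illustrate the shape: the codec absorbs WouldBlock, and serde_json always parses a complete frame.

```rust
// Sketch of framed JSON over an async byte stream: read length-prefixed
// frames, then parse each complete frame from a full buffer in one shot.
use futures::StreamExt;
use serde_json::Value;
use tokio::net::TcpStream;
use tokio_util::codec::{FramedRead, LengthDelimitedCodec};

async fn read_values(socket: TcpStream) -> Result<(), Box<dyn std::error::Error>> {
    let mut frames = FramedRead::new(socket, LengthDelimitedCodec::new());
    while let Some(frame) = frames.next().await {
        // The codec already handled the nonblocking I/O; parsing never
        // sees a partial buffer.
        let value: Value = serde_json::from_slice(&frame?)?;
        println!("{value}");
    }
    Ok(())
}
```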

I might approach this by building a resumable serde_json::Value parser, resumable meaning it can make some progress, hit a WouldBlock, and resume later from the previous incomplete state. I don't see us solving the general case of a resumable parser for an arbitrary implementation of Deserialize. But going through serde_json::Value, most of the branching and processing (interpreting string escape sequences, decimal-to-binary conversion of numbers) can happen while waiting on I/O, and turning the finished Value into T afterward will be fast.
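
That second phase is already cheap today: once a complete Value exists, converting it into a concrete type is pure in-memory work (Event below is an illustrative type).

```rust
// Once a Value has been fully (however resumably) accumulated, turning
// it into T involves no I/O, so it can never hit WouldBlock.
use serde::Deserialize;
use serde_json::Value;

#[derive(Deserialize)]
struct Event {
    id: u64,
    name: String,
}

fn finish(value: Value) -> serde_json::Result<Event> {
    serde_json::from_value(value)
}
```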


KodrAus commented Jul 26, 2017

Sorry @dtolnay, I may have overstated my 'need' for this. It's more something I'd be interested to explore. I found my way here while playing with async response bodies in reqwest. Since the response can return WouldBlock if a chunk hasn't been received yet, we need to buffer it completely before handing it over to serde_json. Chunks don't necessarily correspond to a frame in a codec. There are strategies around that with the current status quo, like collecting the chunks first without buffering, or just accepting the copy, which is probably going to be faster anyway.
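
That buffer-first status quo, sketched with today's async reqwest API (the 2017 futures 0.1 version differed): the client absorbs WouldBlock internally, and serde_json only ever sees a complete slice.

```rust
// Sketch of buffer-then-parse; modern reqwest shown for brevity.
use serde::de::DeserializeOwned;

async fn fetch_json<T: DeserializeOwned>(url: &str) -> Result<T, Box<dyn std::error::Error>> {
    // .bytes() drives the nonblocking reads to completion internally.
    let body = reqwest::get(url).await?.bytes().await?;
    Ok(serde_json::from_slice(&body)?)
}
```

(reqwest's own Response::json does essentially this internally.)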

But if serde_json did support non-blocking reads in some way, I'd like to see what that would mean for general response deserialisation, since it probably wouldn't result in more code than buffering first on my end. A resumable parser just for Value sounds reasonable to me.

So that's a bit of a flimsy motivation, but I'd like to play with it.

dtolnay (Member) commented Jul 1, 2019

Triage: I still think a resumable/nonblocking way to deserialize specifically a serde_json::Value is the way to go; we don't need to solve the much bigger challenge of resuming an arbitrary implementation of Deserialize. I would be interested to have an implementation of this that we can benchmark.

Since I haven't worked with tokio myself: what's the most obvious API that would support nonblocking deserialization of a Value from a byte stream? If one of you could provide a sketch of the public API here, I'll file the idea in https://github.com/dtolnay/request-for-implementation (or you can file it directly).
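
One possible shape, entirely hypothetical (the names and types below are illustrative, not an existing serde_json interface): a push-based parser that the caller feeds bytes as they arrive.

```rust
// Hypothetical API sketch: a push-based, resumable Value parser.
use serde_json::Value;

enum Feed {
    // Input exhausted; call feed() again when more bytes arrive.
    NeedMore,
    // A complete Value was parsed.
    Done(Value),
}

struct ValueParser {
    // Incomplete parse state (stack of partially built arrays/objects,
    // a half-decoded scalar, ...) would live here between calls.
}

impl ValueParser {
    fn new() -> ValueParser {
        ValueParser {}
    }

    // Consumes as much of `input` as possible and records progress.
    // Never blocks: when the socket returns WouldBlock, the caller just
    // calls feed() again later with the next chunk.
    fn feed(&mut self, input: &[u8]) -> serde_json::Result<Feed> {
        let _ = input;
        unimplemented!("sketch only")
    }
}

// Intended call pattern, with the caller owning the nonblocking I/O:
//
//   let mut parser = ValueParser::new();
//   let value = loop {
//       match socket.read(&mut buf) {
//           Ok(n) => match parser.feed(&buf[..n])? {
//               Feed::Done(value) => break value,
//               Feed::NeedMore => continue,
//           },
//           Err(e) if e.kind() == io::ErrorKind::WouldBlock => { /* yield, retry later */ }
//           Err(e) => return Err(e.into()),
//       }
//   };
```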


KodrAus commented Jul 1, 2019

@dtolnay it’s been a while since I’ve thought about this 🙂 I’ll post on the tokio dev channels, give it some thought, and circle back here, unless somebody else beats me to it.


kazigk commented Dec 14, 2020

@KodrAus I don't want to beat you; I just want to say it's still a wanted feature 😉


jonhoo commented Feb 27, 2021

I feel like we might be able to leverage async/await to deal with the progress tracking/continuation state rather than make the deserializer support re-entry. That would also allow arbitrary Deserialize implementations. Here's what I'm thinking:

  • Copy de::Deserializer to a new type de::AsyncDeserializer.
  • Specialize it to de::IoRead by replacing any uses of de::Read with the concrete implementation in IoRead. The whole type should be generic over AsyncRead.
  • Replace any blocking I/O calls with the equivalent AsyncReadExt call, await it, and mark the containing function async (a sketch of this step follows the list).
  • Follow the compiler errors up the stack, adding .await and making methods async as you go.
  • Find any bits between de::Deserializer and de::AsyncDeserializer that are still the same and factor them out.
  • Profit?
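
A hedged sketch of that third step under these assumptions (AsyncIoRead and next are illustrative names, not serde_json's actual internals):

```rust
// One blocking byte-fetch rewritten against AsyncReadExt; the async fn's
// generated state machine carries the continuation state for free.
use tokio::io::{AsyncRead, AsyncReadExt};

struct AsyncIoRead<R> {
    reader: R,
    byte: [u8; 1],
}

impl<R: AsyncRead + Unpin> AsyncIoRead<R> {
    // Was: fn next(&mut self) -> io::Result<Option<u8>> using blocking Read.
    async fn next(&mut self) -> std::io::Result<Option<u8>> {
        match self.reader.read(&mut self.byte).await? {
            0 => Ok(None), // EOF
            _ => Ok(Some(self.byte[0])),
        }
    }
}
```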


xamgore commented Jul 19, 2021

> You mostly only care about JSON processing time if your server is CPU-bound, in which case you don't want to spend extra CPU time on incremental parsing if you can parse faster from a complete buffer all at once. If the server is not CPU-bound, JSON processing time would not meaningfully affect latency.

@dtolnay in situations where the JSON doesn't fit into RAM and you fetch it asynchronously via streams, what is the expected out-of-the-box solution?

