updated sql.wit.md, and README.md #2

danbugs · 2022-12-05T23:17:41Z

Signed-off-by: danbugs <[email protected]>

itowlson · 2022-12-07T23:34:17Z

@etehtsea I thought you might be interested in this - I was reminded of the comments you had in spinframework/spin#813 (comment) around the generic data types we used in Spin, and the approach for supporting different database backends. I know @danbugs is not wedded to this approach and is keen for all feedback!

itowlson · 2022-12-08T00:02:37Z

@danbugs Per our chat the other day, I have a few thoughts on this, some tactical, some more existential. Partly this may be down to scope - what does this interface aim to achieve? For example, is it a simple interface for the 80% case, that Wasm hosts can implement for simple, abstracted SQL stores? Or is it intended to cover data-intensive applications too? Is it envisioned as a target for languages and libraries to compile to (in the same way that WASI stands in for Posix)? That will affect what our answers look like.

For me, the existential question is why would anyone use a new interface from the Wasm designers rather than carrying on using their familiar database package in their preferred language? Why would a Rust developer throw away the postgres crate and learn this instead? postgres has all the specific connection knobs, uses highly optimised connections and pools, surfaces Postgres types with high fidelity... what can a generic Wasm interface offer?

I think the answer is "operators will be more willing to trust modules that say 'I want a database' than modules that say 'I want raw socket access.'" In component model terms, operators may offer a world that includes a SQL interface but not a socket interface. Perhaps the sense then is that the operator manages the database and you, as the developer, don't get to think about Postgres or MySQL or SQL Server - you just get what you're given. And if you want to party hard on a specific database engine, you use your language's provider and demand a world with sockets.

But even then... what if the operator wants to express data types that aren't in the common set? What if the operator is hosting a database with money types, or spatial types? Do we need an extension mechanism? If so, how do language-level libraries support that mechanism? Conversely, if we're creating an interface which is strictly "generic database as a service," should we also abstract concepts like transactions rather than requiring people to send engine-specific SQL text?

We talked when we met about a possible tiered approach, a low-level interface that was perhaps little more than the byte-level interface, on top of which language- and database-specific libraries could build idiomatic APIs, and a high-level one that could be common across the "I just want some SQL store" category. I'm not sure whether the low-level interface gives better guarantees than sockets but it's worth exploring.

So what I would really like to see is a conversation amongst a range of stakeholders - application developers, library developers, database vendors, operators, etc. - about the use cases we want this to hit and the use cases we don't.

In the meantime, I appreciate you getting this effort kicked off. I think this is one of the most nuanced pieces of the WASI standards jigsaw, and I'm excited to see where it takes us, and to test out the ideas through Spin. Thank you!

itowlson

Added a few more tactical comments about the specifics of the proposal.

sql.wit.md

itowlson · 2022-12-08T03:56:06Z

sql.wit.md

+interface "wasi:sql" {
+    // iterator item type
+    resource row {
+        field-name: list<string>


Interfaces like .NET IDataRecord allow runtime discovery of field types. Should we make provision for that here, rather than just the name?

Are you thinking of using that in addition to the data-type in values — perhaps, the actual field-type?
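To make the question concrete, here is a minimal Rust sketch of what per-field type discovery could look like if it lived alongside the value's own data-type. All names here (`FieldType`, `FieldInfo`, `Row::field_type`) are hypothetical and not part of the proposed WIT. The point it illustrates is that a declared field type stays available even when the value in that column is NULL, which the value's data-type alone cannot express.

```rust
/// Hypothetical coarse type tag for a column (not part of the proposal).
#[derive(Debug, Clone, PartialEq)]
pub enum FieldType {
    Int32,
    Float64,
    Str,
    Boolean,
    Date,
    /// Anything the common set does not cover, named by the backend.
    Other(String),
}

/// Hypothetical column descriptor: the field name plus its declared type.
#[derive(Debug, Clone, PartialEq)]
pub struct FieldInfo {
    pub name: String,
    pub field_type: FieldType,
}

/// A value as it appears in one cell of a row.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int32(i32),
    Float64(f64),
    Str(String),
    Boolean(bool),
    Date(String),
    Null,
}

/// A row carrying both field metadata and values, in the same order.
pub struct Row {
    pub fields: Vec<FieldInfo>,
    pub values: Vec<Value>,
}

impl Row {
    /// Look up the declared type of a field by name.
    pub fn field_type(&self, name: &str) -> Option<&FieldType> {
        self.fields
            .iter()
            .find(|f| f.name == name)
            .map(|f| &f.field_type)
    }

    /// Look up the value of a field by name.
    pub fn get(&self, name: &str) -> Option<&Value> {
        let idx = self.fields.iter().position(|f| f.name == name)?;
        self.values.get(idx)
    }
}
```

With this shape, a row whose `born` column is NULL still reports `FieldType::Date` from `field_type("born")`, while `get("born")` returns `Value::Null`.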

itowlson · 2022-12-08T03:58:42Z

sql.wit.md

+        float(float64),
+        str(string),
+        boolean(bool),
+        date(string),


This is interesting... are you thinking the standard would define a canonical string format and implementations would adapt however they saw the database's DATE type into that string format? (I don't have a better plan.)

That's a good point. Initially, I was thinking of just relying on the format used by the underlying database. On the other hand, it might be a good idea to define a standardized format for storing dates in the interface itself for consistency's sake (we could use ISO 8601, or something of the sort). That said, I'm not sure. How do you feel about just relying on the underlying format?

I'm not sure what "the underlying format" is. When I ask SQL Server for a datetimeoffset column, I assume it is stored in some opaque binary format, and surfaced as a System.DateTimeOffset or chrono::DateTime or whatever type by the driver. So we would probably need to format it anyway.

I could be wrong though - could you expand on your understanding of "underlying format"?

@itowlson ~ I tried out implementing this interface for postgres, and here's what I did w/ regard to this:

    "date" => {
        let v: String = row.get(i);
        let parsed = NaiveDate::parse_from_str(&v, "%Y-%m-%d").unwrap();
        DataType::Date(parsed.to_string())
    }
    "time" => {
        let v: String = row.get(i);
        let parsed = NaiveTime::parse_from_str(&v, "%H:%M:%S").unwrap();
        DataType::Time(parsed.to_string())
    }
    "timestamp" => {
        let v: String = row.get(i);
        let parsed = NaiveDateTime::parse_from_str(&v, "%Y-%m-%d %H:%M:%S").unwrap();
        DataType::Timestamp(parsed.to_string())
    }
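For comparison, the "canonical string format" idea discussed in this thread could be sketched without any driver or date crate at all. The following is a std-only Rust illustration, assuming the interface settled on the ISO 8601 calendar form `YYYY-MM-DD`; the function name `to_canonical_date` and the single-variant `DataType` are made up for this example and are not taken from the PR. A host would call something like this after its driver has already decoded the column into year, month, and day.

```rust
/// Cut-down stand-in for the interface's data-type variant (illustrative only).
#[derive(Debug, PartialEq)]
pub enum DataType {
    Date(String),
}

/// Number of days in a month of the proleptic Gregorian calendar,
/// or 0 if the month number is out of range.
fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => {
            let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
            if leap { 29 } else { 28 }
        }
        _ => 0,
    }
}

/// Format an already-decoded date as an ISO 8601 `YYYY-MM-DD` string.
/// Returns `None` for dates that do not exist or fall outside years 0..=9999.
pub fn to_canonical_date(year: i32, month: u32, day: u32) -> Option<DataType> {
    if !(0..=9999).contains(&year) || day == 0 || day > days_in_month(year, month) {
        return None;
    }
    Some(DataType::Date(format!("{:04}-{:02}-{:02}", year, month, day)))
}
```

Returning `None` instead of panicking is a deliberate difference from the `unwrap()` calls above: a host implementation would probably want to surface a malformed value as an interface error rather than trap.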

sql.wit.md

etehtsea · 2022-12-08T17:24:00Z

Partly this may be down to scope - what does this interface aim to achieve? For example, is it a simple interface for the 80% case, that Wasm hosts can implement for simple, abstracted SQL stores? Or is it intended to cover data-intensive applications too? Is it envisioned as a target for languages and libraries to compile to (in the same way that WASI stands in for Posix)? That will affect what our answers look like.

I'm also interested in the Goals of this proposal. Are sql/kv/pubsub/etc proposals targeted to some simplified use-cases where the user doesn't need sophisticated functionality and needs the "any kv/sql storage functionality"?

Otherwise, I'm not sure why it has to be the common interface if the actual storage implementation might have different functionality and might require different WIT interfaces.

For example, mysql provides column nullability info in the response and pg doesn't. So you might want to encode them differently in wit. Also, db-specific types and so on.

If this is some generic interface, all of this is probably out of scope.

esoterra · 2022-12-08T18:26:48Z

Not an answer to the above questions but some additional notes:

I had been working on a PR for this repo a while back but never got around to submitting it.
I've included my first pass of how I thought the introduction, goals, and non-goals would look below.

Additionally, I think this proposal should support / offer statically and dynamically defined queries and did some sketching to that effect.

Introduction

Structured Query Language (SQL) databases are a key part of many application architectures including popular Multitier Architectures. Wasm components in these architectures will need the ability to query SQL databases and standardizing a WASI SQL interface will allow greater interoperability between Wasm database users and database vendors.

This proposal defines a capability provider interface for SQL databases.

Goals

Enable a Wasm component to execute a query against a database.

Allow users to use common SQL dialects (e.g. MySQL and PostgreSQL).

Allow users to utilize non-standard database extensions and custom types.

Provide an interface that encourages SQL best-practices and helps users avoid SQL injection.

Non-goals

This proposal is not concerned with the configuration and initialization of arbitrary databases.
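The injection-avoidance goal in this draft is usually met by keeping query text and user-supplied data separate. As a rough Rust sketch of that idea, assuming a `?` placeholder syntax and a hypothetical `Statement::prepare` constructor (neither is specified in the PR), the statement below carries its parameters as data and rejects a mismatched placeholder count up front. Counting `?` characters is a simplification: it would miscount a `?` inside a string literal, which a real implementation would leave to the database's own prepare step.

```rust
/// Illustrative parameter values; a real interface would reuse its data-type variant.
#[derive(Debug, Clone, PartialEq)]
pub enum Param {
    Int(i64),
    Str(String),
}

/// A query whose text and parameters travel separately.
#[derive(Debug, PartialEq)]
pub struct Statement {
    sql: String,
    params: Vec<Param>,
}

#[derive(Debug, PartialEq)]
pub enum PrepareError {
    ParamCountMismatch { expected: usize, got: usize },
}

impl Statement {
    /// Build a statement from SQL text containing `?` placeholders.
    /// Parameters are never spliced into the text.
    pub fn prepare(sql: &str, params: Vec<Param>) -> Result<Statement, PrepareError> {
        // Naive placeholder count (assumption: no `?` inside literals).
        let expected = sql.matches('?').count();
        if expected != params.len() {
            return Err(PrepareError::ParamCountMismatch {
                expected,
                got: params.len(),
            });
        }
        Ok(Statement {
            sql: sql.to_string(),
            params,
        })
    }

    /// The SQL text, unchanged from what the caller wrote.
    pub fn sql(&self) -> &str {
        &self.sql
    }

    /// The parameters, to be bound by the host's database driver.
    pub fn params(&self) -> &[Param] {
        &self.params
    }
}
```

A hostile string such as `x'; DROP TABLE users; --` passed as a `Param::Str` stays in `params()` and never becomes part of `sql()`.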

sql.wit.md

Signed-off-by: danbugs <[email protected]>

danbugs · 2022-12-16T20:00:04Z

Overall, tactical comments aside, I see discussions in two major areas:

1. the goal of this proposal, and
2. the data-types.

About (1), I'll be finishing off the README for related sections right after writing this comment and I will update the PR for review. Hopefully, that will clarify things and help w/ delivering targeted feedback. That said, the overarching goal is to provide an interface for that 80% range of user applications. With this, we want to provide a way to query a database in a way that is generic and safe (i.e., no unknown sockets).

About (2), I think that there are four big types we are missing:

array types,
json types,
spatial types, and
large object types.

There are three ways to target them:

I can do some more research to see how widely supported these are across different flavours, and, if the answer is positive, they can just become new variant types, or
like @etehtsea suggested, we could create different variants for different flavours (although, with the upcoming goal definition, I think taking dependencies on specific dbs would be negative), and, lastly
like @itowlson suggested, we could create some sort of extension type. That said, the type would probably have to be a list<u8> under the hood, so we'd be passing the burden of deserialization to the user.
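To illustrate what "passing the burden of deserialization to the user" would mean for the third option, here is a small Rust sketch. The `Extension` variant, its `type_name` tag, and the 16-byte little-endian "point" encoding are all invented for this example; the PR does not define any of them. The host would hand over opaque bytes, and the guest (or a database-specific library) would decode them.

```rust
/// Cut-down data-type variant with a hypothetical escape hatch.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32(i32),
    Str(String),
    /// Opaque bytes (`list<u8>` in WIT terms) plus a backend-chosen type name.
    Extension { type_name: String, bytes: Vec<u8> },
}

/// Read one little-endian f64 from an 8-byte slice.
fn read_f64_le(slice: &[u8]) -> f64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(slice);
    f64::from_le_bytes(buf)
}

/// Guest-side decoding for a made-up "point" encoding:
/// two little-endian f64 values, x then y.
pub fn decode_point(value: &DataType) -> Option<(f64, f64)> {
    match value {
        DataType::Extension { type_name, bytes }
            if type_name.as_str() == "point" && bytes.len() == 16 =>
        {
            Some((read_f64_le(&bytes[0..8]), read_f64_le(&bytes[8..16])))
        }
        // Wrong variant, wrong type name, or wrong length.
        _ => None,
    }
}
```

The trade-off is visible in `decode_point`: the interface stays small, but every consumer needs out-of-band knowledge of each backend's byte layout.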

edit: formatting

Signed-off-by: danbugs <[email protected]>

sql.wit.md

chrisgacsal · 2022-12-20T10:23:13Z

sql.wit.md

+    // implementors can make use of that fact to optimize 
+    // the performance of query execution (e.g., using
+    // indexes).
+    query: func(q: statement) -> result<row, error>


I would expect the query() to be able to return multiple rows. Is there a plan to add a new resource as collection of rows?

It looks at the moment like query is returning the first row, and then you call next to get to the next row - this fits with the sample in the read-me. But then I can't see how to get to an item. @danbugs I think something might have got out of whack during the various revisions - looking at the read-me has row gotten mixed up with item in some places?

FWIW, the Spin prototype has query returning a rowset which contained a list of row, and a row contained a list of value (the field info was separate). However that doesn't handle very large datasets well (everything has to be realised into the list). This might be remedied by streams, I'm not sure; but in the meantime, a resource that fetches the next record on demand (potentially lazily) makes sense to me. Not sure if there are performance considerations relating to going across a marshalling boundary on every row though.

looking at the read-me has row gotten mixed up with item in some places?

It made me confused hence my comments.

This might be remedied by streams

I assume that stream would be the right solution here at some point, might be wrong though.

a resource that fetches the next record on demand (potentially lazily) makes sense to me

That would be my expectation here as well.

I'm wondering about how to get the item as well. I also wonder, does query() block until a row is ready, or does it return immediately (possibly with an error indicating the query couldn't be run) and calling row.next() blocks until the row is ready? I'm just wondering if it will be possible to fire queries on multiple connections in the case where the queries are long running and you can allow the server to process them simultaneously.
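The two shapes discussed in this thread, a resource that fetches the next record on demand versus a fully realised list, can be contrasted in a short Rust sketch. Everything here (`RowCursor`, `next_row`, `collect_all`, rows as `Vec<String>`) is hypothetical and deliberately ignores the host/guest marshalling boundary and any borrow of the underlying connection; the `'static` bound on the source sidesteps exactly the lifetime problems a real driver iterator may raise.

```rust
/// Simplified row: one string per column (illustrative only).
pub type Row = Vec<String>;

/// A cursor that pulls rows from some underlying source one at a time.
pub struct RowCursor {
    source: Box<dyn Iterator<Item = Result<Row, String>>>,
    fetched: usize,
}

impl RowCursor {
    /// Wrap any owned iterator of rows; nothing is fetched yet.
    pub fn new<I>(source: I) -> Self
    where
        I: Iterator<Item = Result<Row, String>> + 'static,
    {
        RowCursor {
            source: Box::new(source),
            fetched: 0,
        }
    }

    /// Lazy path: fetch exactly one more row, or `None` when exhausted.
    pub fn next_row(&mut self) -> Option<Result<Row, String>> {
        let item = self.source.next();
        if item.is_some() {
            self.fetched += 1;
        }
        item
    }

    /// How many rows have been pulled from the source so far.
    pub fn fetched(&self) -> usize {
        self.fetched
    }

    /// Eager path, comparable to returning a list of rows:
    /// realise every remaining row, stopping at the first error.
    pub fn collect_all(self) -> Result<Vec<Row>, String> {
        self.source.collect()
    }
}
```

In this sketch, constructing the cursor does no work and each `next_row` call may block on the source, which corresponds to the "return immediately, block on next" reading of the question above; whether the real interface should behave that way is still open in this thread.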

itowlson · 2022-12-21T01:06:03Z

Bringing some lessons back from the Spin implementation - we had some good feedback from @ThorstenHans, which it would be great to consider how to incorporate into wasi-sql:

I would appreciate having:

First-class transaction support

Being able to concat (or join) multiple statements in a single query/execute call without having to go down the road and write a stored proc in MySQL

https://www.thorsten-hans.com/crud-in-webassembly-with-fermyon-spin-and-mysql/

Signed-off-by: danbugs <[email protected]>

danbugs · 2023-01-13T00:46:54Z

@itowlson, @chrisgacsal, and @kesmit13 ~ Thank you for providing your feedback on this PR! I'm sorry that it took me this long to get back to you, I was on vacation and have just now been getting caught up w/ everything (:

That said, @itowlson is right~ With all the renames that have happened, things got a bit out of whack w/ the item record and row resource. I've fixed that now by actually just dropping the iterator idea altogether and using streams instead.

For the record, I'm doing this because, while attempting to actually implement this interface, I couldn't really find a way to convert between iterator types (i.e., say, a postgres iterator and whatever we want to return to the guest) while maintaining lazy loading, which is something I don't think we could get around.

If you are interested in seeing this in action, you can check: deislabs/spiderlightning#305

chrisgacsal · 2023-01-18T14:17:19Z

@danbugs Thanks for the updates!
Do you plan to add sql resource and the associated open function to the interface which allows clients to request connection to database(s)? The same way as it is done in Spiderlightning?

Another question: shall the query function return list<row> instead of stream<row> as an interim solution? My understanding is that stream is still in the works and won't be available in the near future, although I might be mistaken here.

danbugs · 2023-01-26T17:26:07Z

@chrisgacsal

Do you plan to add sql resource and the associated open function to the interface which allows clients to request connection to database(s)? The same way as it is done in Spiderlightning?

As of now, we are not planning for these interfaces to implement anything related to the control plane like connecting, and whatnot – just data plane operations.

Another question: shall the query function return list<row> instead of stream<row> as an interim solution? My understanding is that stream is still in the works and won't be available in the near future, although I might be mistaken here.

Yeah, that's how I got around it in the SpiderLightning implementation – just used list<row> over stream<row>. That said, I had in mind these interfaces would be more on the greenfield side of things w/o taking dependencies on bindings. For reference, we also use streams here:

https://github.com/WebAssembly/wasi-keyvalue/blob/main/README.md.

In any case, I'll circle back on this. If we decide to proceed w/ stream<row> over list<row>, I'll merge this PR!

Signed-off-by: danbugs <[email protected]>

danbugs · 2023-01-26T18:16:43Z

~update: I've changed from stream to list!

danbugs added 5 commits December 5, 2022 14:56

commenting out wit-abi 'cause of worlds

eb9f5a3

Signed-off-by: danbugs <[email protected]>

updated README up until before TOC

d76b956

Signed-off-by: danbugs <[email protected]>

updated sql.wit.md file according to discussions w/ @itowlson

ef919c2

Signed-off-by: danbugs <[email protected]>

forgot to add title to README

4eaf3a4

Signed-off-by: danbugs <[email protected]>

added wit label on worlds code block

752f298

Signed-off-by: danbugs <[email protected]>

itowlson reviewed Dec 8, 2022

dicej reviewed Dec 15, 2022

danbugs added 3 commits December 15, 2022 09:52

fixing quick typo

5196c3a

Signed-off-by: danbugs <[email protected]>

typo

57fefb5

Signed-off-by: danbugs <[email protected]>

added null type to data-types

88383c3

Signed-off-by: danbugs <[email protected]>

danbugs added 2 commits December 16, 2022 12:45

added unsigned/signed short and longs

716ccbe

Signed-off-by: danbugs <[email protected]>

typos in sql.wit.md, andmostly completed README

f224767

Signed-off-by: danbugs <[email protected]>

chrisgacsal reviewed Dec 20, 2022

fixed typo, some renames, and updated examples

df15d39

Signed-off-by: danbugs <[email protected]>

change stream to list

0e3a266

Signed-off-by: danbugs <[email protected]>

danbugs merged commit 6a844fb into WebAssembly:main Jan 30, 2023

danbugs deleted the danbugs/start-sql branch January 30, 2023 16:21
