Skip to content

updated sql.wit.md, and README.md #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jan 30, 2023

Conversation

danbugs
Copy link
Collaborator

@danbugs danbugs commented Dec 5, 2022

Signed-off-by: danbugs [email protected]

@itowlson
Copy link

itowlson commented Dec 7, 2022

@etehtsea I thought you might be interested in this - I was reminded of the comments you had in spinframework/spin#813 (comment) around the generic data types we used in Spin, and the approach for supporting different database backends. I know @danbugs is not wedded to this approach and is keen for all feedback!

@itowlson
Copy link

itowlson commented Dec 8, 2022

@danbugs Per our chat the other day, I have a few thoughts on this, some tactical, some more existential. Partly this may be down to scope - what does this interface aim to achieve? For example, is it a simple interface for the 80% case, that Wasm hosts can implement for simple, abstracted SQL stores? Or is it intended to cover data-intensive applications too? Is it envisioned as a target for languages and libraries to compile to (in the same way that WASI stands in for Posix)? That will affect what our answers look like.

For me, the existential question is why would anyone use a new interface from the Wasm designers rather than carrying on using their familiar database package in their preferred language? Why would a Rust developer throw away the postgres crate and learn this instead? postgres has all the specific connection knobs, uses highly optimised connections and pools, surfaces Postgres types with high fidelity... what can a generic Wasm interface offer?

I think the answer is "operators will be more willing to trust modules that say 'I want a database' than modules that say 'I want raw socket access.'" In component model terms, operators may offer a world that includes a SQL interface but not a socket interface. Perhaps the sense then is that the operator manages the database and you, as the developer, don't get to think about Postgres or MySQL or SQL Server - you just get what you're given. And if you want to party hard on a specific database engine, you use your language's provider and demand a world with sockets.

But even then... what if the operator wants to express data types that aren't in the common set? What if the operator is hosting a database with money types, or spatial types? Do we need an extension mechanism? If so, how do language-level libraries support that mechanism? Conversely, if we're creating an interface which is strictly "generic database as a service," should we also abstract concepts like transactions rather than requiring people to send engine-specific SQL text?

We talked when we met about a possible tiered approach, a low-level interface that was perhaps little more than the byte-level interface, on top of which language- and database-specific libraries could build idiomatic APIs, and a high-level one that could be common across the "I just want some SQL store" category. I'm not sure whether the low-level interface gives better guarantees than sockets but it's worth exploring.

So what I would really like to see is a conversation amongst a range of stakeholders - application developers, library developers, database vendors, operators, etc. - about the use cases we want this to hit and the use cases we don't.

In the meantime, I appreciate you getting this effort kicked off. I think this is one of the most nuanced pieces of the WASI standards jigsaw, and I'm excited to see where it takes up, and to test out the ideas through Spin. Thank you!

Copy link

@itowlson itowlson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few more tactical comments about the specifics of the proposal.

sql.wit.md Outdated
interface "wasi:sql" {
// iterator item type
resource row {
field-name: list<string>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interfaces like .NET IDataRecord allow runtime discovery of field types. Should we make provision for that here, rather than just the name?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you thinking of using that in addition to the data-type in values — perhaps, the actual field-type?

float(float64),
str(string),
boolean(bool),
date(string),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting... are you thinking the standard would define a canonical string format and implementations would adapt however they saw the database's DATE type into that string format? (I don't have a better plan.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Initially, I was thinking of just relying on the format used by the underlying database. On the other hand, it might be a good idea to define a standardized format for storing dates in the interface itself for consistency's sake — we could use ISO 8601, or smt of the sort. That said, I'm not sure. How do you feel about just relying on the underlying format?

Copy link

@itowlson itowlson Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what "the underlying format" is. When I ask SQL Server for a datetimeoffset column, I assume it is stored in some opaque binary format, and surfaced as a System.DateTimeOffset or chrono::DateTime or whatever type by the driver. So we would probably need to format it anyway.

I could be wrong though - could you expand on your understanding of "underlying format"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itowlson ~ I tried out implementing this interface for postgres, and here's what I did w/ regard to this:

                        "date" => {
                            let v: String = row.get(i);
                            let parsed = NaiveDate::parse_from_str(&v, "%Y-%m-%d").unwrap();
                            DataType::Date(parsed.to_string())
                        }
                        "time" => {
                            let v: String = row.get(i);
                            let parsed = NaiveTime::parse_from_str(&v, "%H:%M:%S").unwrap();
                            DataType::Time(parsed.to_string())
                        }
                        "timestamp" => {
                            let v: String = row.get(i);
                            let parsed =
                                NaiveDateTime::parse_from_str(&v, "%Y-%m-%d %H:%M:%S").unwrap();
                            DataType::Timestamp(parsed.to_string())
                        }

@etehtsea
Copy link

etehtsea commented Dec 8, 2022

Partly this may be down to scope - what does this interface aim to achieve? For example, is it a simple interface for the 80% case, that Wasm hosts can implement for simple, abstracted SQL stores? Or is it intended to cover data-intensive applications too? Is it envisioned as a target for languages and libraries to compile to (in the same way that WASI stands in for Posix)? That will affect what our answers look like.

I'm also interested in the Goals of this proposal. Are sql/kv/pubsub/etc proposals targeted to some simplified use-cases where the user doesn't need sophisticated functionality and needs the "any kv/sql storage functionality"?

Otherwise, I'm not sure why it has to be the common interface if the actual storage implementation might have different functionality and might require different WIT interfaces.

For example, mysql provides column nullability info in the response and pg doesn't. So you might want to encode them differently in wit. Also, db-specific types and so on.

If this is some generic interface, all of these is probably out of scope.

@esoterra
Copy link

esoterra commented Dec 8, 2022

Not an answer to the above questions but some additional notes:

I had been working on a PR for this repo a while back but never gotten around to submitting it.
I've included my first pass of how I thought the introduction, goals, and non-goals would look below.

Additionally, I think this proposal should support / offer statically and dynamically defined queries and did some sketching to that effect.

Introduction

Structured Query Language (SQL) databases are a key part of many application architectures including popular Multitier Architectures. Wasm components in these architectures will need the ability to query SQL databases and standardizing a WASI SQL interface will allow greater interoperability between Wasm database users and database vendors.

This proposal define a capability provider interface for SQL databases.

Goals

  1. Enable a Wasm component to execute a query against a database.
  2. Allow users to use common SQL dialects (e.g. MySQL and PostgreSQL).
  3. Allow users to utilize non-standard database extensions and custom types.
  4. Provide an interface that encourages SQL best-practices and helps users avoid SQL injection.

Non-goals

  1. This proposal is not concerned with the configuration and initialization of arbitrary databases.

Signed-off-by: danbugs <[email protected]>
Signed-off-by: danbugs <[email protected]>
@danbugs
Copy link
Collaborator Author

danbugs commented Dec 16, 2022

Overall, tactical comments aside, I see discussions in two major areas:

  1. the goal of this proposal, and
  2. the data-types.

About (1), I'll be finishing off the README for related sections right after writing this comment and I will update the PR for review. Hopefully, that will clarify things and help w/ delivering targeted feedback. That said, the overarching goal is that provide an interface for that 80% range of user applications — with this, we want to provide a way to query a database in a way that is generic, and safe (i.e., no unknown sockets).

About (2), I think that there are four big types we are missing:

  1. array types,
  2. json types,
  3. spatial types, and
  4. large object types.

There are three ways to target them:

  1. I can do some more research to see how widely supported these are across different flavours, and, if the answer is positive, they can just become new variant types, or
  2. like @etehtsea suggested, we could create different variants for different flavours (although, with the upcoming goal definition, I think taking dependencies on specific dbs would be negative), and, lastly
  3. like @itowlson suggested, we could create some sort of extension type. That said, the type would probably have to be a list<u8> under the hood, so we'd be passing the burden of deserialization to the user.

edit: formatting

sql.wit.md Outdated
// implementors can make use of that fact to optimize
// the performance of query execution (e.g., using
// indexes).
query: func(q: statement) -> result<row, error>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the query() to be able to return multiple rows. Is there a plan to add a new resource as collection of rows?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks at the moment like query is returning the first row, and then you call next to get to the next row - this fits with the sample in the read-me. But then I can't see how to get to an item. @danbugs I think something might have got out of whack during the various revisions - looking at the read-me has row gotten mixed up with item in some places?

FWIW, the Spin prototype has query returning a rowset which contained a list of row, and a row contained a list of value (the field info was separate). However that doesn't handle very large datasets well (everything has to be realised into the list). This might be remedies by streams, I'm not sure; but in the meantime, a resource that fetches the next record on demand (potentially lazily) makes sense to me. Not sure if there are performance considerations relating to going across a marshalling boundary on every row though.

Copy link

@chrisgacsal chrisgacsal Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the read-me has row gotten mixed up with item in some places?

It made me confused hence my comments.

This might be remedies by streams

I assume that stream would be the right solution here at some point, might be wrong though.

a resource that fetches the next record on demand (potentially lazily) makes sense to me

That would my expectation here as well.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering about how to get the item as well. I also wonder, does query() block until a row is ready, or does it return immediately (possibly with an error indicating the query couldn't be run) and calling row.next() blocks until the row is ready? I'm just wondering if it will be possible to fire queries on multiple connections in the case where the queries are long running and you can allow to server to process them simultaneously.

@itowlson
Copy link

Bringing some lessons back from the Spin implementation - we had some good feedback from @ThorstenHans, which it would be great to consider how to incorporate into wasi-sql:

I would appreciate having:

  • First-class transaction support
  • Being able to concat (or join) multiple statements in a single query /execute call without having to go down the road and write a stored proc in MySQL

https://www.thorsten-hans.com/crud-in-webassembly-with-fermyon-spin-and-mysql/

@danbugs
Copy link
Collaborator Author

danbugs commented Jan 13, 2023

@itowlson, @chrisgacsal, and @kesmit13 ~ Thank you for providing your feedback on this PR! I'm sorry that it took me this long to get back to you, I was on vacation and have just now been getting caught up w/ everything (:

That said, @itowlson is right~ With all the renames that have happened, things got a bit out of whack w/ the item record and row resource. I've fixed that now by actually just dropping the iterator idea altogether and using streams instead.

For the record, I'm doing this because, while attempting to actually implement this interface, I couldn't really find a way to convert between iterator types (i.e., say, a postgres iterator and whatever we want to return to the guest) while maintaining lazy loading, which is something I don't think we coulD get around.

If you are interested in seeing this in action, you can check: deislabs/spiderlightning#305

@chrisgacsal
Copy link

@danbugs Thanks for the updates!
Do you plan to add sql resource and the associated open function to the interface which allows clients to request connection to database(s)? The same way as it is done in Spiderlightning?

Another question is the shall the query function return list<row> instead of stream<row> as interim solution? My understanding is that stream is still in the working and won't be available in the near future. Although I might be mistaken here.

@danbugs
Copy link
Collaborator Author

danbugs commented Jan 26, 2023

@chrisgacsal

Do you plan to add sql resource and the associated open function to the interface which allows clients to request connection to database(s)? The same way as it is done in Spiderlightning?

As of now, we are not planning for these interfaces to implement anything related to the control plane like connecting, and whatnot – just data plane operations.

Another question is the shall the query function return list instead of stream as interim solution? My understanding is that stream is still in the working and won't be available in the near future. Although I might be mistaken here.

Yeah, that's how I got around it in the SpiderLightning implementation – just used list<row> over stream<row>. That said, I had in mind these interfaces would be more on the greenfield side of things w/o taking dependencies on bindings. For reference, we also use streams here:

In any case, I'll circle back on this. If we decide to proceed w/ stream<row> over list<row>, I'll merge this PR!

Signed-off-by: danbugs <[email protected]>
@danbugs
Copy link
Collaborator Author

danbugs commented Jan 26, 2023

~update: I've changed from stream to list!

@danbugs danbugs merged commit 6a844fb into WebAssembly:main Jan 30, 2023
@danbugs danbugs deleted the danbugs/start-sql branch January 30, 2023 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants