-
Notifications
You must be signed in to change notification settings - Fork 6
updated sql.wit.md, and README.md #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: danbugs <[email protected]>
Signed-off-by: danbugs <[email protected]>
Signed-off-by: danbugs <[email protected]>
Signed-off-by: danbugs <[email protected]>
Signed-off-by: danbugs <[email protected]>
@etehtsea I thought you might be interested in this - I was reminded of the comments you had in spinframework/spin#813 (comment) around the generic data types we used in Spin, and the approach for supporting different database backends. I know @danbugs is not wedded to this approach and is keen for all feedback! |
@danbugs Per our chat the other day, I have a few thoughts on this, some tactical, some more existential. Partly this may be down to scope - what does this interface aim to achieve? For example, is it a simple interface for the 80% case, that Wasm hosts can implement for simple, abstracted SQL stores? Or is it intended to cover data-intensive applications too? Is it envisioned as a target for languages and libraries to compile to (in the same way that WASI stands in for Posix)? That will affect what our answers look like. For me, the existential question is why would anyone use a new interface from the Wasm designers rather than carrying on using their familiar database package in their preferred language? Why would a Rust developer throw away the I think the answer is "operators will be more willing to trust modules that say 'I want a database' than modules that say 'I want raw socket access.'" In component model terms, operators may offer a world that includes a SQL interface but not a socket interface. Perhaps the sense then is that the operator manages the database and you, as the developer, don't get to think about Postgres or MySQL or SQL Server - you just get what you're given. And if you want to party hard on a specific database engine, you use your language's provider and demand a world with sockets. But even then... what if the operator wants to express data types that aren't in the common set? What if the operator is hosting a database with money types, or spatial types? Do we need an extension mechanism? If so, how do language-level libraries support that mechanism? Conversely, if we're creating an interface which is strictly "generic database as a service," should we also abstract concepts like transactions rather than requiring people to send engine-specific SQL text? We talked when we met about a possible tiered approach, a low-level interface that was perhaps little more than the byte-level interface, on top of which language- and database-specific libraries could build idiomatic APIs, and a high-level one that could be common across the "I just want some SQL store" category. I'm not sure whether the low-level interface gives better guarantees than sockets but it's worth exploring. So what I would really like to see is a conversation amongst a range of stakeholders - application developers, library developers, database vendors, operators, etc. - about the use cases we want this to hit and the use cases we don't. In the meantime, I appreciate you getting this effort kicked off. I think this is one of the most nuanced pieces of the WASI standards jigsaw, and I'm excited to see where it takes up, and to test out the ideas through Spin. Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few more tactical comments about the specifics of the proposal.
sql.wit.md
Outdated
interface "wasi:sql" { | ||
// iterator item type | ||
resource row { | ||
field-name: list<string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interfaces like .NET IDataRecord
allow runtime discovery of field types. Should we make provision for that here, rather than just the name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you thinking of using that in addition to the data-type
in values
— perhaps, the actual field-type?
float(float64), | ||
str(string), | ||
boolean(bool), | ||
date(string), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting... are you thinking the standard would define a canonical string format and implementations would adapt however they saw the database's DATE type into that string format? (I don't have a better plan.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. Initially, I was thinking of just relying on the format used by the underlying database. On the other hand, it might be a good idea to define a standardized format for storing dates in the interface itself for consistency's sake — we could use ISO 8601, or smt of the sort. That said, I'm not sure. How do you feel about just relying on the underlying format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what "the underlying format" is. When I ask SQL Server for a datetimeoffset
column, I assume it is stored in some opaque binary format, and surfaced as a System.DateTimeOffset
or chrono::DateTime
or whatever type by the driver. So we would probably need to format it anyway.
I could be wrong though - could you expand on your understanding of "underlying format"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@itowlson ~ I tried out implementing this interface for postgres, and here's what I did w/ regard to this:
"date" => {
let v: String = row.get(i);
let parsed = NaiveDate::parse_from_str(&v, "%Y-%m-%d").unwrap();
DataType::Date(parsed.to_string())
}
"time" => {
let v: String = row.get(i);
let parsed = NaiveTime::parse_from_str(&v, "%H:%M:%S").unwrap();
DataType::Time(parsed.to_string())
}
"timestamp" => {
let v: String = row.get(i);
let parsed =
NaiveDateTime::parse_from_str(&v, "%Y-%m-%d %H:%M:%S").unwrap();
DataType::Timestamp(parsed.to_string())
}
I'm also interested in the Otherwise, I'm not sure why it has to be the common interface if the actual storage implementation might have different functionality and might require different WIT interfaces. For example, mysql provides column nullability info in the response and pg doesn't. So you might want to encode them differently in wit. Also, db-specific types and so on. If this is some generic interface, all of these is probably out of scope. |
Not an answer to the above questions but some additional notes: I had been working on a PR for this repo a while back but never gotten around to submitting it. Additionally, I think this proposal should support / offer statically and dynamically defined queries and did some sketching to that effect.
|
Signed-off-by: danbugs <[email protected]>
Signed-off-by: danbugs <[email protected]>
Signed-off-by: danbugs <[email protected]>
Overall, tactical comments aside, I see discussions in two major areas:
About (1), I'll be finishing off the README for related sections right after writing this comment and I will update the PR for review. Hopefully, that will clarify things and help w/ delivering targeted feedback. That said, the overarching goal is that provide an interface for that 80% range of user applications — with this, we want to provide a way to query a database in a way that is generic, and safe (i.e., no unknown sockets). About (2), I think that there are four big types we are missing:
There are three ways to target them:
edit: formatting |
Signed-off-by: danbugs <[email protected]>
Signed-off-by: danbugs <[email protected]>
sql.wit.md
Outdated
// implementors can make use of that fact to optimize | ||
// the performance of query execution (e.g., using | ||
// indexes). | ||
query: func(q: statement) -> result<row, error> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the query()
to be able to return multiple rows. Is there a plan to add a new resource as collection of rows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks at the moment like query
is returning the first row, and then you call next
to get to the next row - this fits with the sample in the read-me. But then I can't see how to get to an item
. @danbugs I think something might have got out of whack during the various revisions - looking at the read-me has row
gotten mixed up with item
in some places?
FWIW, the Spin prototype has query
returning a rowset
which contained a list of row
, and a row
contained a list of value
(the field info was separate). However that doesn't handle very large datasets well (everything has to be realised into the list). This might be remedies by streams, I'm not sure; but in the meantime, a resource that fetches the next record on demand (potentially lazily) makes sense to me. Not sure if there are performance considerations relating to going across a marshalling boundary on every row though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at the read-me has row gotten mixed up with item in some places?
It made me confused hence my comments.
This might be remedies by streams
I assume that stream
would be the right solution here at some point, might be wrong though.
a resource that fetches the next record on demand (potentially lazily) makes sense to me
That would my expectation here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering about how to get the item as well. I also wonder, does query()
block until a row is ready, or does it return immediately (possibly with an error indicating the query couldn't be run) and calling row.next()
blocks until the row is ready? I'm just wondering if it will be possible to fire queries on multiple connections in the case where the queries are long running and you can allow to server to process them simultaneously.
Bringing some lessons back from the Spin implementation - we had some good feedback from @ThorstenHans, which it would be great to consider how to incorporate into wasi-sql:
https://www.thorsten-hans.com/crud-in-webassembly-with-fermyon-spin-and-mysql/ |
Signed-off-by: danbugs <[email protected]>
@itowlson, @chrisgacsal, and @kesmit13 ~ Thank you for providing your feedback on this PR! I'm sorry that it took me this long to get back to you, I was on vacation and have just now been getting caught up w/ everything (: That said, @itowlson is right~ With all the renames that have happened, things got a bit out of whack w/ the For the record, I'm doing this because, while attempting to actually implement this interface, I couldn't really find a way to convert between iterator types (i.e., say, a postgres iterator and whatever we want to return to the guest) while maintaining lazy loading, which is something I don't think we coulD get around. If you are interested in seeing this in action, you can check: deislabs/spiderlightning#305 |
@danbugs Thanks for the updates! Another question is the shall the |
As of now, we are not planning for these interfaces to implement anything related to the control plane like connecting, and whatnot – just data plane operations.
Yeah, that's how I got around it in the SpiderLightning implementation – just used In any case, I'll circle back on this. If we decide to proceed w/ |
Signed-off-by: danbugs <[email protected]>
~update: I've changed from |
Signed-off-by: danbugs [email protected]