Skip to content

How to best create this complicated query? #887

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bellebethcooper opened this issue Dec 10, 2020 · 8 comments
Closed

How to best create this complicated query? #887

bellebethcooper opened this issue Dec 10, 2020 · 8 comments
Labels

Comments

@bellebethcooper
Copy link
Collaborator

What did you do?

Hi Gwendal,

Thanks for all your hard work with GRDB 5! I love having all the GRDBCombine stuff built-in now :)

I've spent a long time working on a complicated query recently and I got it working but wanted to check if you have any advice about a better way to handle this situation. I hope that's okay! Feel free to close this issue if not, as it's not an actual bug report or anything.

The basic situation is this: I have three models, and I want to create a query using associations that will fill a local model (i.e. not in the database). I want to keep this in one query so I can observe it using Combine publishers to keep my SwiftUI views updated.

Here are basic versions of the models I'm working with:

// Goal
struct Goal: Codable, Identifiable {
    var id: Int64?
    let rule: Rule

    static let goalPeriods = hasMany(GoalPeriod.self)
        var goalPeriods: QueryInterfaceRequest<GoalPeriod> {
            request(for: Goal.goalPeriods)
        }

    static let goalDatas = hasMany(GoalData.self)
        var goalDatas: QueryInterfaceRequest<GoalData> {
            request(for: Goal.goalDatas)
        }
}

enum Rule: Int, Codable {
    case minPerDay, maxPerDay
    case minPerWeek, maxPerWeek
}

extension Rule: DatabaseValueConvertible { }

// GoalPeriod
struct GoalPeriod: Codable, Identifiable {
    var id: Int64?
    let goalId: Int64
    let created: Date
    
    static let goal = belongsTo(Goal.self)
        var goal: QueryInterfaceRequest<Goal> {
            request(for: GoalPeriod.goal)
        }
}

// GoalData
struct GoalData: Codable, Identifiable {
    var id: Int64?
    var date: Date
    var goalId: Int64

    static let goal = belongsTo(Goal.self)
        var goal: QueryInterfaceRequest<Goal> {
            request(for: GoalData.goal)
        }
}

And here's the intermediary model I want to fill from my request:

struct GoalPeriodInfo: FetchableRecord, Codable {
    var goal: Goal
    var goalPeriod: GoalPeriod
    var goalData: [GoalData]
}

So a Goal has many GoalPeriods and many GoalDatas. I want to request for a subset of goals based on their Rule type, and then for each of those goals I want to fill a GoalPeriodInfo model with that goal, only the most recent of its related GoalPeriods, and a subset of its GoalData models based on a filter.

Here's the request I came up with that seems to do what I want:

let rules: [Rule] = listType == .daily ? [.maxPerDay, .minPerDay] : [.maxPerWeek, .minPerWeek]

let request = GoalPeriod
    // limit to the most recent one
    .annotated(with: max(Column("created")))
    // group by goal so I get one per goal
    .group(Column("goalId"))
    // Also get the related Goal
    .including(required: GoalPeriod.goal
                // filter by goals matching rules
                .filter(rules.contains(Column("rule")))
                // Get all of the GoalDatas for this Goal for the current week
                .including(all: Goal.goalDatas.filter(Column("date") >= Current.mostRecentMonday)))

I got the idea to limit my GoalPeriod request to the most recent one using max from this issue.

Here are a couple of other ways I tried to structure this request that didn't work:

let request = Goal
    .filter(rules.contains(Column("rule")))
    .including(all: Goal.goalPeriods.annotated(with: max(Column("created"))))
    .including(all: Goal.goalDatas.filter(Column("date") >= Current.mostRecentMonday))

let request = GoalData
    .annotated(with: max(Column("date")))
    .filter(Column("date") >= Current.mostRecentMonday)
    .including(required: GoalData.goal
        .filter(rules.contains(Column("rule")))
        .including(all: Goal.goalPeriods.annotated(with: max(Column("created")))))

I'm using ValueObservation to observe the request pretty much exactly how you've done it in the demo app.

So my question is just, is there anything I'm doing obviously wrong here? And is there a different way you'd recommend handling this complicated request?

Thanks!

Environment

GRDB flavor(s): GRDB
GRDB version: 5.2.0
Installation method: SPM
Xcode version: 12.2
Swift version: 5
Platform(s) running GRDB: iOS
macOS version running Xcode: 11.0.1

@groue
Copy link
Owner

groue commented Dec 10, 2020

Hello @bellebethcooper,

The basic situation is this: I have three models, and I want to create a query using associations that will fill a local model (i.e. not in the database). I want to keep this in one query so I can observe it using Combine publishers to keep my SwiftUI views updated.

Allow me a preliminary comment about this paragraph.

Fetching values with a single request that packs a lot of associations has a few advantages, but it is not required for Combine support.

Yes, packed requests are often more efficient. They can also streamline code (look nicer - honestly, not always).

But Combine publishers accept as many fetches as you want:

// OK! Works as well with ValueObservation publisher
let cancellable = dbQueue
    .readPublisher { db in
        let value1 = try fetchValue1(db)
        let value2 = try fetchValue2(db)
        return (value1, value2)
    }
    .sink { (value1, value2) in 
        ...
    }

What is most often wrong, because it disregards database consistency, is merging several publishers together. The sample code below can't guarantee that values 1 and 2 are fetched from the same database state. Concurrent writes could sneak in between the independent publishers, and ruin some important invariant. Such case may be harmless. But it may also introduce bugs.

// DUBIOUS, TO SAY THE LEAST
let cancellable = Publishers
    .Zip(
        dbQueue.readPublisher(fetchValue1),
        dbQueue.readPublisher(fetchValue2))
    .sink { (value1, value2) in 
        ...
    }

From this point of view, "packed requests" foster data consistency: you can't split them, so you can't zip their components, or combineLatest, or merge, etc.

You can safely perform several independent requests, from a consistent database state, as long as values are fetched inside a single database access block, between { db in and }. This is true for DatabaseQueue.read, DatabaseQueue.readPublisher, ValueObservation.tracking, etc. The "unit of guaranteed data consistency" is the closure with a db argument. Inside, do whatever you want, perform as many fetches as you need, with simple or complex requests, just as you prefer.

End of the preliminary comment, which sheds some light on this sentence of the Good Practices for Designing Record Types, which may have inspired your own introduction:

This is why GRDB has removed lazy loading from the list of desirable features. Instead, it provides the tooling needed to fetch data, even complex ones, in a single and safe stroke.

I'll look at your precise setup in a further comment :-)

@groue groue added the support label Dec 10, 2020
@groue
Copy link
Owner

groue commented Dec 10, 2020

I want to request for a subset of goals based on their Rule type, and then for each of those goals I want to fill a GoalPeriodInfo model with that goal, only the most recent of its related GoalPeriods, and a subset of its GoalData models based on a filter.

All right, sounds good :-)

First of all, thanks for providing the definition of your records: it made it very easy to setup a support playground!


Here's the request I came up with that seems to do what I want:

Your request runs the following SQL queries. As seen with you in #888, the first request deals with everything but including(all:), and the second one with including(all:):

SELECT goalPeriod.*, MAX(goalPeriod.created), goal.*
FROM goalPeriod
JOIN goal ON (goal.id = goalPeriod.goalID) 
         AND (goal.rule IN (?, ...))
GROUP BY goalPeriod.goalId;

SELECT goalData.*, goal.id AS grdb_id
FROM goalData 
JOIN goal ON (goal.id = goalData.goalID)
         AND (goal.rule IN (?, ...)) 
         AND (goal.id IN (?, ...))
WHERE goalData.date >= ?

The first request indeed fetches "a subset of goals based on their Rule type", and "with that goal, only the most recent of its related GoalPeriod". This can be tested - I did and it works :-)

(Yes, complex queries should often be tested, regardless of your SQL and GRDB skills).

The second request fetches the filtered GoalData for the previously fetched goals.

Kudos, the goal is achieved :-)


Maybe you feel like your request is not in the correct order: you want to fetch goals with associated records, and you end up fetching goal periods with associated records.

Maybe you had a difficulty with the group method. You want to group on a GoalPeriod's column, and this is easier to do when GoalPeriod is at the root of the request, because the group method is only defined on query interface requests, not on associations.

You can still rewrite the request so that it looks like it fetches goals with associated records. But we need a way to reference GoalPeriod in the group method. This is the role of TableAlias:

let goalPeriods = TableAlias()

let request = Goal
    // filter by goals matching rules
    .filter(/* rules */)
    
    // Also get the latest GoalPeriod
    .including(required: Goal.goalPeriods.aliased(goalPeriods)
                .annotated(with: max(Column("created"))))
    .group(goalPeriods[Column("goalId")])
    
    // Also get all of the GoalDatas for the current week
    .including(all: Goal.goalDatas.filter(/* date */))

Your call.


I got the idea to limit my GoalPeriod request to the most recent one using max from this issue.

Good.

As stated in the sample code, this technique relies on a special processing of MAX() by SQLite. This technique would not work with other databases. This technique does not scale when you want to "sort" on several columns (date and another one in order to solve the ambiguity when two records share the same date). And most of all, it's better to know what is the executed SQL in order to evaluate the relevance of this technique.

The subject of selecting ONE record in a hasMany association is the topic of #767, a failed pull request. I expect #880 to help solving it for good. When it is, we will have a better solution than the "hack" of the special processing of MAX():

// FUTURE
Goal
    .filter(/* rules */)
    .including(required: Goal.goalPeriods.order(Column("date")).last)
    .including(all: Goal.goalDatas.filter(/* date */))

Here are a couple of other ways I tried to structure this request that didn't work:

This will be the topic of another answer :-)

@groue
Copy link
Owner

groue commented Dec 10, 2020

You can still rewrite the request so that it looks like it fetches goals with associated records. But we need a way to reference GoalPeriod in the group method. This is the role of TableAlias:

let goalPeriods = TableAlias()

let request = Goal
    // filter by goals matching rules
    .filter(/* rules */)
    
    // Also get the latest GoalPeriod
    .including(required: Goal.goalPeriods.aliased(goalPeriods)
                .annotated(with: max(Column("created"))))
    .group(goalPeriods[Column("goalId")])
    
    // Also get all of the GoalDatas for the current week
    .including(all: Goal.goalDatas.filter(/* date */))

I notice that goalPeriod.goalId and goal.id are the same! We can go without any table alias, eventually:

let request = Goal
    // filter by goals matching rules
    .filter(/* rules */)
    
    // Also get the latest GoalPeriod
    .including(required: Goal.goalPeriods.annotated(with: max(Column("created"))))
    .group(Column("id"))
    
    // Also get all of the GoalDatas for the current week
    .including(all: Goal.goalDatas.filter(/* date */))

@bellebethcooper
Copy link
Collaborator Author

Hi @groue, thanks so much for your quick and detailed response! This is really helpful.

Great point about being able to observe multiple requests. I hadn't thought of doing it that way at all, but it's really helpful to know that's an option.

This can be tested - I did and it works :-)

(Yes, complex queries should often be tested, regardless of your SQL and GRDB skills).

I'm on-board with this! What's your go-to approach to testing part of a query like this one?

Your final suggestion seems to work fine, thank you! I think part of my problem was that I mistakenly thought I had to use including(all:) when working with a to-many association, but I've realised I was incorrect about that now.

Thanks again for your thoughts on this, I really appreciate it!

@groue
Copy link
Owner

groue commented Dec 11, 2020

complex queries should often be tested

I'm on-board with this! What's your go-to approach to testing part of a query like this one?

Sure. I usually distinguish:

  • The creation of the database. For example, the AppDelegate instantiates a DatabasePool at some chosen location, while tests instantiate an in-memory DatabaseQueue. Give them access to a shared Configuration if needed.

  • The database setup: that's the migrations, and maybe making sure some singletons are created in the database. AppDelegate and tests have access to this setup method.

  • The low-level tests: a test instantiates an empty DatabaseQueue or DatabasePool, as said above, inserts interesting data, or loads a resource sqlite file, and tests migrations, records, requests, etc. Tests are faster when they run an in-memory DatabaseQueue, obviously.

    I've been playing with snapshot testing recently, but I'm not quite satisfied yet. Those tests break on each migration, because you can't scope exactly the database area you want to check. I'm sure this can be improved.

  • The high-level tests: Some apps that do not expose the raw DatabasePool or DatabaseQueue to the rest of the app, but prefer building "Managers". I have Managers accept both DatabasePool (for the app) and DatabaseQueue (for tests), with a property of type DatabaseWriter. Testing them is like low-level tests: setup an empty db, or load a db from a resource, and run tests of the manager methods.

GRDB tries really hard to provide apis that behave the same for both DatabasePool and DatabaseQueue. Testing against DatabaseWriter is thus generally OK. But some demanding apps will need DatabasePool-specific apis. Then just forget about in-memory DatabaseQueue, and run everything with DatabasePools.

More or less, that's how I do it those days!

@bellebethcooper
Copy link
Collaborator Author

Great, that helps a lot. Thanks very much!

@groue
Copy link
Owner

groue commented Dec 15, 2020

I think part of my problem was that I mistakenly thought I had to use including(all:) when working with a to-many association, but I've realised I was incorrect about that now.

I wanted to cheer you up for writing this sentence.

To-many associations are usually used with including(all:), yes!

You "unlock" new features when you make the link between joining Swift methods, and their SQL mapping:

// SELECT foo.* FROM foo JOIN bar ON ...
Foo.joining(required: Foo.bar)

// SELECT foo.* FROM foo LEFT JOIN bar ON ...
Foo.joining(optional: Foo.bar)

// SELECT foo.*, bar.* FROM foo JOIN bar ON ...
Foo.including(required: Foo.bar)

// SELECT foo.*, bar.* FROM foo LEFT JOIN bar ON ...
Foo.including(optional: Foo.bar)

For example, it's clear that you can load all authors with their books with including(all:)

struct AuthorInfo {
    var author: Author
    var books: [Book]
}
// Two SQL requests
let request = Author
    .including(all: Author.books)
    .asRequest(of: AuthorInfo.self)

But you may also load all (author, book) pairs, with including(required:) (and the same to-many Author.books association).

struct Authorship {
    var author: Author
    var book: Book
}
// SELECT author.*, book.*
// FROM author
// JOIN book ON book.authorID = author.id
let request = Author
    .including(required: Author.books)
    .asRequest(of: Authorship.self)

Author.including(required: Author.books) is very similar to Book.including(required: Book.author). However, when you use optional instead of required, you can choose which element of the (author, book) pair can be nil.

And from the SQL query above, you can derive more SQL queries, based on the same Author.including(required: Author.books) base. For example, let's group by author:

struct AuthorInfo {
    var author: Author
    var latestBook: Book
}
// SELECT author.*, book.*, MAX(book.date)
// FROM author
// JOIN book ON book.authorID = author.id
// GROUP BY author.id
let request = Author
    .including(required: Author.books
        .annotated(with: max(Column("date")))
        .forKey("latestBook"))
    .groupByPrimaryKey()
    .asRequest(of: AuthorInfo.self)

And voilà, we have the authors with their latest book again (it's even simpler this way than in the sample code in #856... a never-ending quest...)

SQL is rich 😅 There are multiple ways to write requests that fetch the desired data. The more one gets fluent with SQL, the more one knows what the database can do, the more one's will becomes sharp.

In order to help apply SQL skills, GRDB joining methods map directly onto SQL joins, as seen in the beginning of this comment.

The including(all:) method is an exception, because it implements a feature which is not available in SQL. Relational databases can only fetch grids of columns and rows. But fetching all authors along with all their books is fetching something which looks more like a tree.

@bellebethcooper
Copy link
Collaborator Author

Thanks @groue, this is really helpful. Figuring out how to build up requests in GRDB to get the data I want is one of my favourite things to do at the moment! But I'm also working on improving my SQL knowledge and I can see how this will make these problems easier for me to figure out. Thanks again for your pointers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants