Skip to content

Are self-joins on the same column possible with the Record-based API? #740

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mallman opened this issue Mar 26, 2020 · 8 comments
Closed

Comments

@mallman
Copy link
Collaborator

mallman commented Mar 26, 2020

I'm trying to write a QueryInterfaceRequest for a self-join on a table on a single column and coming up short. I would greatly appreciate your assistance.

Assume the following model:

struct PetSitter {
  let petId: Int
  let sitterId: Int
}

The SQL query I'm trying to replicate in the record API looks like this:

select ps.sitterId from petSitter ps
inner join petSitter ps1 on ps.petId = ps1.petId
where ps1.sitterId in (1, 2, 3)

In plain English, "Find all the sitters who pet-sit the pets that pet sitters 1, 2, 3 pet-sit." It's sort of a transitive closure over a set of pet sitters via the pets they sit.

Does my question make sense?

@groue
Copy link
Owner

groue commented Mar 26, 2020

Hello again @mallman,

You sure make sense 😅

The query interface request can generate SQL joins, through record associations. Self-joins are mentioned at https://github.com/groue/GRDB.swift/blob/master/Documentation/AssociationsBasics.md#self-joins.

But those don't quite match your need.

Indeed, the query interface request focuses on parent-children relationships based on foreign keys. But you need a "sibling" relationship that is not supported by any foreign key in the database.

Such relationships are not mentionned in the documentation at all. Actually, the query interface has not been particularly vetted for this use case yet. I'm happy to explore it with you.

So... let's do it!

First, let's make PetSitter able to generate SQL:

extension PetSitter: TableRecord { }

Now let's define an association. A pet-sitter has many "co-pet-sitters", which are the pet-sitters that pet-sit the same pet:

extension PetSitter {
    static let coPetSitters = hasMany(
        PetSitter.self,
        key: "coPetSitters",
        using: ForeignKey(["petId"], to: ["petId"]))
}

It's a proper "has many" association, from one row of a table to several rows in another. But we abuse the ForeignKey type here: there's no proper foreign key at the database level. We can hack it because this "foreign key" feeds the ON clause of joined SQL queries.

Now we can fetch all pet-sitters who can be joined with a co-pet-sitter of id 1, 2, or 3:

let sitterId = Column("sitterId")

let coPetSitters = PetSitter
    .coPetSitters
    .filter([1, 2, 3].contains(sitterId))

let request = PetSitter
    .joining(required: coPetSitters)
    .select(sitterId, as: Int.self)
    
let ids = try request.fetchAll(db) // [Int]

The generated SQL is:

SELECT "petSitter1"."sitterId"
FROM "petSitter" "petSitter1"
JOIN "petSitter" "petSitter2"
  ON ("petSitter2"."petId" = "petSitter1"."petId")
 AND ("petSitter2"."sitterId" IN (1, 2, 3))

@groue groue added the support label Mar 26, 2020
@mallman
Copy link
Collaborator Author

mallman commented Mar 26, 2020

I believe I've found a solution. We create the following associations:

extension PetSitter {
  static let sitter = belongsTo(Sitter.self)
  var sitter: QueryInterfaceRequest<Sitter> {
    return request(for: PetSitter.sitter)
  }

  static let petSitter = hasMany(PetSitter.self, using: ForeignKey(["petId"], to: ["petId"]))
  var petSitter: QueryInterfaceRequest<PetSitter> {
    return request(for: PetSitter.petSitter)
  }
}

Then the request looks like

let alias = TableAlias()
let request =
  Sitter.including(required:
      Sitter.pet.joining(required: PetSitter.petSitter.aliased(alias)))
    .filter([1, 2, 3].contains(alias[Column("sitterId")]))

@groue
Copy link
Owner

groue commented Mar 26, 2020

Great, @mallman, you did it as well 👍

The table alias is indeed necessary for a joined table to enter the WHERE clause. When filtering in the ON clause is equivalent, you can just move the filter to the association, and get rid of the table alias.

@groue groue closed this as completed Mar 26, 2020
@mallman
Copy link
Collaborator Author

mallman commented Mar 26, 2020

I just noticed you replied to my solution almost simultaneously! Let me compare.

Your coPetSitter variable name is definitely more descriptive. 👍

Your query request uses a single joining. Mine uses two (including and joining, actually). That's because in my actual use case (no, I'm not actually creating a neighborhood pet sitting app) the record I want to hydrate includes the columns from the joined table. (Or, I don't really want ps.sitterId, I want sitter.*).

I used a filter instead of a join clause. As you point out, these are equivalent for an inner join and works perfectly for the scenario I presented. The reason I use a filter on an alias here is so I can create a function that takes a table alias, creates the request using that alias, then build on the returned request with reference to the table alias.

Thank you very much for your assistance here, @groue. If you want to take anything from our conversation here, you might want to document this kind of self-join for future devs.

@groue
Copy link
Owner

groue commented Mar 26, 2020

If you want to take anything from our conversation here, you might want to document this kind of self-join for future devs.

I don't quite know how to derive documentation of a general pattern from our use case, I admit.

The topic of "joins in general" is insanely wide. Associations cover a great deal of it, but they are not a good conceptual fit for the whole problem space. They properly deal with parent-children relationships (and that's an great deal of the use cases for joins), but that's all.

In this issue, we had GRDB generate the desired sql, and that's cool. But we had to double-guess the library, rely on implementation details. We have defined a "foreign key" which does not exist, and this does not smell good.

The good news is that I don't plan to break our code at all. I'm actually quite aware of this "back-door": it will remain. I'll answer questions like yours, and I won't prevent users from hacking around. Maybe one day the query interface will have a proper chapter about other kinds of join, but I guess we'll need new apis.

@groue
Copy link
Owner

groue commented Mar 26, 2020

Actually, I'd be somewhat satisfied with something like:

extension PetSitter {
    static let coPetSitters = hasMany(PetSitter.self, joining: ["petId"], to: ["petId"])
}

But what about supporting any joining condition, as below?

extension PetSitter {
    // left and right below are TableAlias
    static let coPetSitters = hasMany(PetSitter.self, joiningOn: { left, right in
        left[Column("petId")] == right[Column("petId")]
    })
}

Can we use SQL interpolation?

extension PetSitter {
    // left and right below are TableAlias
    static let coPetSitters = hasMany(PetSitter.self, joiningOn: { left, right in
        "\(left).petId = \(right).petId"
    })
}

Using the "join" term in the API would lift my conceptual concerns :-)

This exploration has just started, and remains to be done, though.

warning

The joining condition ought to conform to Equatable. That's how we can join the same association several times. The closures in the sample codes above are not equatable, but maybe their result can be.

@mallman
Copy link
Collaborator Author

mallman commented Mar 28, 2020

Hi @groue.

These all look very promising. I'd like to suggest one more possible syntax:

extension PetSitter {
  static let coPetSitters =
     hasMany(PetSitter.self, joiningColumns: ["petId", "neighborhoodId"])
}

This would perform a self-join on the columns "petId", "neighborhoodId". Just a thought.

Thanks as always for your help and insight.

@groue
Copy link
Owner

groue commented Mar 28, 2020

Indeed there is a SQL syntax which is interesting:

SELECT FROM a JOIN b USING(col1, col2)

I wonder if we could inject this "using" word in your suggestion:

extension PetSitter {
  static let coPetSitters =
     hasMany(PetSitter.self, joinedUsing: ["petId", "neighborhoodId"])
}

I like it. It uses an established SQL idiom that some users will be happy to recognize.

Yet it's still pretty restricted. If the user now wants ON left.foo = right.bar, he'll be stuck. If the user wants a to-one relationship, he'll be stuck as well. And if he wants ON left.foo > right.bar + 1, or even an empty joining clause, he'll be completely clueless. He won't understand why such expectable variations on a theme are not supported. This creates frustration for everybody.

Now you're focused on your self-join. But you'll get over it eventually, and move to other joining adventures, don't you?

We need to identify and address other use cases before we can ship hasMany(_: joinedUsing:), one in a few other apis that tell a consistent story about joins, and don't block users into dead-ends.

groue added a commit that referenced this issue Jul 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants