Package collections: improve search performance #3213
Merged
This is an alternative to swiftlang#3090, but it is a complete solution.

Motivation:

Currently, to support search in the package collections API, we read and deserialize collection blobs from SQLite and then perform string matching on individual properties in memory (e.g., `package.summary.contains("foobar")`). This can be optimized.

Modifications:

Use SQLite FTS: define FTS virtual tables for packages and targets, and update the implementation of the `findPackage`, `searchPackages`, and `searchTargets` methods of `SQLitePackageCollectionsStorage`.

Without the optimization, `PackageCollectionsTests.testPackageSearchPerformance` and `testTargetsSearchPerformance` take ~400ms to run on my local machine. With FTS, `testPackageSearchPerformance` takes ~40ms and `testTargetsSearchPerformance` ~50ms.

The `testSearchTargetsPerformance` test in `InMemoryPackageCollectionsSearchTests` (swiftlang#3090) yields results in ~10ms, though it queries the trie directly without going through the PackageCollections API layer.

Since target search is either an exact or a prefix match and doesn't tokenize the query, and given the good results I saw in `InMemoryPackageCollectionsSearchTests`, this implementation adds a trie on top of SQLite FTS for target search. The trie is in memory and is loaded from SQLite FTS during initialization. `put`/`remove` update both the SQLite FTS tables and the trie. The improvement from using the trie varies: `testTargetsSearchPerformance` takes between ~15ms and ~50ms to complete.

`put` now takes longer to complete because of the search index updates.

Result:

Better search performance.
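To make the trade-off above concrete, here is a small Swift sketch contrasting a scan that calls `contains` on every package's properties with an index lookup. The inverted index is only a toy stand-in for what an FTS virtual table provides (a token-to-row mapping consulted at query time); it is not SQLite's FTS implementation, and `PackageRecord`, `naiveSearch`, and `InvertedIndex` are hypothetical names.

```swift
import Foundation

/// Hypothetical package record; field names are assumptions for illustration.
struct PackageRecord {
    let url: String
    let summary: String
}

/// Baseline: visit every record and substring-match in memory,
/// similar in spirit to `package.summary.contains("foobar")`.
func naiveSearch(_ packages: [PackageRecord], query: String) -> [String] {
    let q = query.lowercased()
    return packages.filter { $0.summary.lowercased().contains(q) }.map(\.url)
}

/// Toy inverted index: lowercased word token -> URLs of packages whose
/// summary contains that token. Mimics the idea behind FTS, not SQLite's code.
struct InvertedIndex {
    private var postings: [String: Set<String>] = [:]

    /// Splits on any non-alphanumeric character and lowercases.
    static func tokenize(_ text: String) -> [String] {
        text.lowercased()
            .components(separatedBy: CharacterSet.alphanumerics.inverted)
            .filter { !$0.isEmpty }
    }

    mutating func add(_ package: PackageRecord) {
        for token in Self.tokenize(package.summary) {
            postings[token, default: []].insert(package.url)
        }
    }

    /// Returns URLs of packages whose summary contains every query token.
    func search(_ query: String) -> Set<String> {
        let tokens = Self.tokenize(query)
        guard let first = tokens.first else { return [] }
        var result = postings[first] ?? []
        for token in tokens.dropFirst() {
            result.formIntersection(postings[token] ?? [])
        }
        return result
    }
}
```

Note the semantic difference: token lookup matches whole words, so a query like `"pars"` finds nothing in the index even though the substring scan matches `"parser"`. That is consistent with handling target search (exact or prefix match, no tokenization) with a separate structure.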
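The trie layer for target search could look roughly like the sketch below. It assumes a hypothetical `TargetFTSStore` protocol standing in for the SQLite FTS targets table, with an in-memory fake so it runs without SQLite; `TargetTrie` and `TargetSearchIndex` are illustrative names, not the actual SwiftPM types. It mirrors the behavior described: the trie is loaded from the persistent index at initialization, `put`/`remove` write to both, and exact or prefix queries are answered from the trie alone.

```swift
/// Minimal case-insensitive trie over target names. Each node records the
/// package URLs that have a target whose name ends exactly at that node.
final class TargetTrie {
    private final class Node {
        var children: [Character: Node] = [:]
        var packages: Set<String> = []
    }

    private let root = Node()

    func insert(target: String, packageURL: String) {
        var node = root
        for ch in target.lowercased() {
            if let next = node.children[ch] {
                node = next
            } else {
                let next = Node()
                node.children[ch] = next
                node = next
            }
        }
        node.packages.insert(packageURL)
    }

    /// Drops `packageURL` from every node. Empty nodes are not pruned, to keep the sketch short.
    func remove(packageURL: String) {
        var stack = [root]
        while let node = stack.popLast() {
            node.packages.remove(packageURL)
            stack.append(contentsOf: node.children.values)
        }
    }

    /// Exact match: packages with a target named exactly `name` (case-insensitive).
    func exactMatch(_ name: String) -> Set<String> {
        find(name)?.packages ?? []
    }

    /// Prefix match: packages with any target whose name starts with `prefix`.
    func prefixMatch(_ prefix: String) -> Set<String> {
        guard let start = find(prefix) else { return [] }
        var result = Set<String>()
        var stack = [start]
        while let node = stack.popLast() {
            result.formUnion(node.packages)
            stack.append(contentsOf: node.children.values)
        }
        return result
    }

    private func find(_ key: String) -> Node? {
        var node = root
        for ch in key.lowercased() {
            guard let next = node.children[ch] else { return nil }
            node = next
        }
        return node
    }
}

/// Hypothetical stand-in for the persistent SQLite FTS table of targets.
protocol TargetFTSStore {
    func allRows() -> [(target: String, packageURL: String)]
    func insert(target: String, packageURL: String)
    func delete(packageURL: String)
}

/// In-memory fake so the sketch runs without SQLite.
final class FakeTargetFTSStore: TargetFTSStore {
    private(set) var rows: [(target: String, packageURL: String)] = []
    func allRows() -> [(target: String, packageURL: String)] { rows }
    func insert(target: String, packageURL: String) { rows.append((target: target, packageURL: packageURL)) }
    func delete(packageURL: String) { rows.removeAll { $0.packageURL == packageURL } }
}

/// Keeps the persistent index and the in-memory trie in sync.
final class TargetSearchIndex {
    private let store: TargetFTSStore
    private let trie = TargetTrie()

    init(store: TargetFTSStore) {
        self.store = store
        // Load the trie once from the persisted rows at initialization.
        for row in store.allRows() {
            trie.insert(target: row.target, packageURL: row.packageURL)
        }
    }

    /// Replaces a package's targets; every write goes to both indexes.
    func put(packageURL: String, targets: [String]) {
        remove(packageURL: packageURL)
        for target in targets {
            store.insert(target: target, packageURL: packageURL)
            trie.insert(target: target, packageURL: packageURL)
        }
    }

    func remove(packageURL: String) {
        store.delete(packageURL: packageURL)
        trie.remove(packageURL: packageURL)
    }

    /// Reads are served from the trie only.
    func search(_ query: String, prefix: Bool) -> Set<String> {
        prefix ? trie.prefixMatch(query) : trie.exactMatch(query)
    }
}
```

Because each `put` writes every target to two structures, writes get slower while reads skip SQLite entirely, which matches the trade-off noted in the description.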
@swift-ci please smoke test
Verified locally on Amazon Linux 2 that we can't fall back to FTS3: while the DDL statements work with some modifications, query results differ when run on different platforms.
@swift-ci please smoke test
tomerd reviewed Jan 21, 2021
Sources/PackageCollections/Storage/SQLitePackageCollectionsStorage.swift
Outdated
tomerd approved these changes Jan 21, 2021
@swift-ci please smoke test