
Package collections: improve search performance #3213


Merged (6 commits) on Jan 21, 2021

Conversation

yim-lee
Contributor

@yim-lee yim-lee commented Jan 20, 2021

This is an alternative to #3090 but is a complete solution.

Motivation:
Currently, to support search in the package collections API, we read and deserialize collection blobs from SQLite, then perform string matching on individual properties in memory (e.g., `package.summary.contains("foobar")`). This can be optimized.

Modifications:
Use SQLite FTS: define FTS virtual tables for packages and targets, and update the implementations of the `findPackage`, `searchPackages`, and `searchTargets` methods of `SQLitePackageCollectionsStorage`.
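To illustrate why a tokenized index helps compared with substring scans, here is a minimal sketch in Swift. It is not SQLite FTS and not the code in this PR: `Package`, `scanSearch`, and `InvertedIndex` are hypothetical names, and the toy index only shows the general idea of looking up pre-tokenized terms instead of scanning every deserialized record.

```swift
import Foundation

// Hypothetical, simplified package record; the real model has more fields.
struct Package {
    let id: Int
    let summary: String
}

// Baseline: substring scan over every package, as described in the motivation.
func scanSearch(_ packages: [Package], for query: String) -> [Int] {
    let needle = query.lowercased()
    return packages.filter { $0.summary.lowercased().contains(needle) }.map { $0.id }
}

// Toy inverted index: token -> set of package ids. It stands in for what a
// full-text index does conceptually; it is NOT SQLite FTS.
struct InvertedIndex {
    private var postings: [String: Set<Int>] = [:]

    // Lowercases the text and splits it on anything that is not a letter or digit.
    static func tokenize(_ text: String) -> [String] {
        return text.lowercased()
            .split(whereSeparator: { !$0.isLetter && !$0.isNumber })
            .map(String.init)
    }

    // Indexing cost is paid once, at insertion time.
    mutating func add(_ package: Package) {
        for token in Self.tokenize(package.summary) {
            postings[token, default: []].insert(package.id)
        }
    }

    // Returns the sorted ids of packages whose summary contains every query token.
    func search(_ query: String) -> [Int] {
        let tokens = Self.tokenize(query)
        guard let first = tokens.first else { return [] }
        var result = postings[first] ?? []
        for token in tokens.dropFirst() {
            result.formIntersection(postings[token] ?? [])
        }
        return result.sorted()
    }
}
```

In this sketch, a query costs a few dictionary lookups and set intersections rather than a pass over every summary, at the price of extra work in `add`. This mirrors the trade-off noted in the description, where `put` becomes slower because of index updates.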

Without optimization, `PackageCollectionsTests.testPackageSearchPerformance` and `testTargetsSearchPerformance` each take ~400ms to run on my local machine.

With FTS, `testPackageSearchPerformance` takes ~40ms and `testTargetsSearchPerformance` ~50ms.

The `testSearchTargetsPerformance` in `InMemoryPackageCollectionsSearchTests` (#3090) yields results in ~10ms, though it queries the trie directly without going through the PackageCollections API layer.

Since target search is either an exact or a prefix match and doesn't tokenize the query, and given the good results I saw in `InMemoryPackageCollectionsSearchTests`, this implementation adds a trie on top of SQLite FTS for target search. The trie is in-memory and is loaded from SQLite FTS during initialization. `put`/`remove` update both the SQLite FTS tables and the trie. The improvement from the trie varies: `testTargetsSearchPerformance` takes between ~15ms and ~50ms to complete.
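As a rough illustration of the exact/prefix lookups a trie provides, here is a minimal Swift sketch. It is not the trie used in this PR: `TrieNode`, `TargetTrie`, and the method names are hypothetical, the case-insensitive matching is an assumption, and loading from SQLite FTS is left out.

```swift
// One node per character; names and shape are hypothetical.
final class TrieNode {
    var children: [Character: TrieNode] = [:]
    // Identifiers (e.g. package ids) of entries whose key ends at this node.
    var values: Set<String> = []
}

struct TargetTrie {
    private let root = TrieNode()

    // Walks the trie along `key`, optionally creating missing nodes.
    // Keys are lowercased, i.e. matching is case-insensitive (an assumption).
    private func node(for key: String, create: Bool) -> TrieNode? {
        var current = root
        for ch in key.lowercased() {
            if let next = current.children[ch] {
                current = next
            } else if create {
                let next = TrieNode()
                current.children[ch] = next
                current = next
            } else {
                return nil
            }
        }
        return current
    }

    // Called alongside the persistent index update on `put`.
    func insert(target: String, package: String) {
        node(for: target, create: true)?.values.insert(package)
    }

    // Called alongside the persistent index update on `remove`.
    // For brevity, emptied nodes are not pruned.
    func remove(target: String, package: String) {
        node(for: target, create: false)?.values.remove(package)
    }

    // Exact match: only values stored at the node for the full key.
    func exactMatch(_ target: String) -> Set<String> {
        return node(for: target, create: false)?.values ?? []
    }

    // Prefix match: values stored at the prefix node and all of its descendants.
    func prefixMatch(_ prefix: String) -> Set<String> {
        guard let start = node(for: prefix, create: false) else { return [] }
        var result = Set<String>()
        var stack = [start]
        while let current = stack.popLast() {
            result.formUnion(current.values)
            stack.append(contentsOf: current.children.values)
        }
        return result
    }
}
```

Lookup cost here depends on the length of the query and the size of the matching subtree, not on the total number of collections, which is consistent with the faster target search reported above. Because the structure lives only in memory, it has to be rebuilt from the persistent store at startup and kept in sync on every write.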

`put` now takes longer to complete because of the search index updates.

Result:
Better search performance.

@yim-lee changed the title from "[DNM] Package collections: improve search performance" to "Package collections: improve search performance" on Jan 21, 2021
@yim-lee
Contributor Author

yim-lee commented Jan 21, 2021

@swift-ci please smoke test

@yim-lee
Contributor Author

yim-lee commented Jan 21, 2021

Verified locally with Amazon Linux 2 that PackageCollectionsTests pass with 47c8986.

We can't fall back to FTS3: while the DDL statements work with some modifications, the query results differ from one platform to another.

@yim-lee
Contributor Author

yim-lee commented Jan 21, 2021

@swift-ci please smoke test

@yim-lee
Contributor Author

yim-lee commented Jan 21, 2021

@swift-ci please smoke test

@yim-lee yim-lee merged commit 95e5c3f into swiftlang:main Jan 21, 2021