Add Parallel Indexing #30

Merged
merged 23 commits into from
May 19, 2023

Conversation

Member

@PJColombo PJColombo commented May 7, 2023

It depends on the following API changes.

This PR introduces parallel indexing by dividing the slots to be processed into manageable chunks, each of which is handled by a dedicated indexing thread.

To make large-scale slot processing more resilient to errors, the slots are first divided into chunks, and the chunks are then processed one after another. This lets the latest processed slot be updated frequently and avoids extensive rollbacks if a failure occurs.
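As a rough illustration of the idea (not the code in this PR), the sketch below splits a slot range into batches that run one after another, and fans each batch out into per-thread chunks. It assumes one possible reading of the description: the initial division produces sequential batches, and the parallelism happens inside each batch. The names `split_into_chunks`, `process_slot`, `process_chunk`, `process_slots`, `batch_size`, and `threads` are hypothetical, and `std::thread::scope` stands in for whatever runtime the indexer actually uses.

```rust
use std::ops::RangeInclusive;
use std::thread;

/// Split an inclusive slot range into consecutive chunks of at most
/// `chunk_size` slots. Returns an empty list if the range is empty.
fn split_into_chunks(
    initial_slot: u64,
    final_slot: u64,
    chunk_size: u64,
) -> Vec<RangeInclusive<u64>> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut chunks = Vec::new();
    let mut start = initial_slot;
    while start <= final_slot {
        let end = start.saturating_add(chunk_size - 1).min(final_slot);
        chunks.push(start..=end);
        if end == final_slot {
            break;
        }
        start = end + 1;
    }
    chunks
}

/// Hypothetical stand-in for the real per-slot work (assumption: the
/// indexer fetches a slot and stores its blobs here).
fn process_slot(slot: u64) -> Result<(), String> {
    let _ = slot;
    Ok(())
}

/// Process one chunk on the current thread, stopping at the first failure.
fn process_chunk(chunk: RangeInclusive<u64>) -> Result<(), String> {
    for slot in chunk {
        process_slot(slot)?;
    }
    Ok(())
}

/// Process batches sequentially; inside a batch, each chunk gets its own
/// thread. `latest_slot` only advances after a whole batch succeeds, so a
/// failure costs at most one batch of work.
fn process_slots(
    initial_slot: u64,
    final_slot: u64,
    batch_size: u64,
    threads: u64,
    latest_slot: &mut Option<u64>,
) -> Result<(), String> {
    assert!(threads > 0, "threads must be positive");
    for batch in split_into_chunks(initial_slot, final_slot, batch_size) {
        let (start, end) = (*batch.start(), *batch.end());
        // Ceiling division, so a batch yields at most `threads` chunks.
        let chunk_size = (end - start + 1 + threads - 1) / threads;
        let chunks = split_into_chunks(start, end, chunk_size);

        let results: Vec<Result<(), String>> = thread::scope(|scope| {
            let handles: Vec<_> = chunks
                .into_iter()
                .map(|chunk| scope.spawn(move || process_chunk(chunk)))
                .collect();
            handles
                .into_iter()
                .map(|handle| {
                    handle
                        .join()
                        .unwrap_or_else(|_| Err("indexing thread panicked".to_string()))
                })
                .collect()
        });

        // Any failed chunk aborts before the checkpoint moves forward.
        for result in results {
            result?;
        }
        *latest_slot = Some(end);
    }
    Ok(())
}
```

With these hypothetical parameters, `process_slots(87001, 88000, 400, 4, &mut latest)` would run three batches in order and leave `latest` at the end of the last batch that completed.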

Additional modifications included in this PR:

  • Context refactoring.
  • Addition of local development tracing spans.
  • Minor adjustments to the client's types.

I've also attached the logs from a preliminary benchmark in which 1000 slots were indexed:

  • Sequential indexing (~4 minutes and 39 seconds):
[2023-05-17T18:28:10.597Z]  INFO: blobscan_indexer/82110 on tallahassee: [SLOTS_PROCESSOR - END] (elapsed_milliseconds=279484,file=src/main.rs,final_slot=88000,initial_slot=87001,line=47,target=blob_indexer)
  • Parallel indexing (~24 seconds):
[2023-05-17T18:35:45.750Z]  INFO: blobscan_indexer/85108 on tallahassee: [SLOTS_PROCESSOR - END] (elapsed_milliseconds=24255,file=src/main.rs,final_slot=88000,initial_slot=87001,line=47,target=blob_indexer)

Parallel indexing is substantially faster: 24255 ms against 279484 ms, which is approximately 11.52 times the speed of the sequential approach.

@PJColombo PJColombo linked an issue May 7, 2023 that may be closed by this pull request
@PJColombo PJColombo closed this May 9, 2023
@PJColombo PJColombo reopened this May 9, 2023
@PJColombo PJColombo force-pushed the feature/parallel-indexing branch from 124b9fc to 92867b4 Compare May 9, 2023 13:03
@PJColombo PJColombo force-pushed the feature/parallel-indexing branch from 92867b4 to f884509 Compare May 9, 2023 13:19
@PJColombo PJColombo force-pushed the feature/parallel-indexing branch from 8141b50 to 1ed0d31 Compare May 12, 2023 17:26
@PJColombo PJColombo force-pushed the feature/parallel-indexing branch from 1ed0d31 to d21a5f9 Compare May 13, 2023 18:21
@PJColombo PJColombo force-pushed the feature/parallel-indexing branch from a4d7e25 to d09e496 Compare May 15, 2023 01:51
@PJColombo PJColombo marked this pull request as ready for review May 15, 2023 02:37
@PJColombo PJColombo force-pushed the feature/parallel-indexing branch from d09e496 to 40d9cbd Compare May 15, 2023 02:49
PJColombo added 4 commits May 17, 2023 14:28
- Rollback slot retryer
- Rename slot processor manager for better legibility
- Move multiple slots processing logic out of the `SlotProcessor`
- Implement `Display` trait for slots processor error
@PJColombo PJColombo force-pushed the feature/parallel-indexing branch 2 times, most recently from c943978 to ba18f70 Compare May 17, 2023 23:18
@PJColombo PJColombo force-pushed the feature/parallel-indexing branch from ba18f70 to 0aa7e29 Compare May 17, 2023 23:18
@PJColombo PJColombo force-pushed the feature/parallel-indexing branch from f84bb0e to 87f24c1 Compare May 18, 2023 00:22
Member

@PabloCastellano PabloCastellano left a comment

Looks good! I'll try it today.

@PJColombo PJColombo force-pushed the feature/parallel-indexing branch from 7fe9ac6 to 6013b40 Compare May 18, 2023 11:08
@PJColombo PJColombo merged commit d08fc1c into master May 19, 2023
@PJColombo PJColombo deleted the feature/parallel-indexing branch May 19, 2023 15:12
Development

Successfully merging this pull request may close these issues.

Speed up indexer's blobs synchronization via parallelization
2 participants