Add chunking for documentation in HTML form #264

Open
wants to merge 3 commits into
base: main
from

Conversation

max-svistunov
Contributor

@syedriko Could you PTAL? At this point I'm mostly interested in whether you agree with the approach, as I'm still finishing work on the prepend_parent_section_text and keep_siblings_together options. Thank you.
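
For readers following the thread, here is a minimal sketch of what heading-based HTML chunking can look like. It is not the code in this PR. It uses only the standard-library `html.parser`, and the names `SectionChunker` and `chunk_html` are hypothetical. The option name `prepend_parent_section_text` comes from the comment above, but its behaviour here (prefixing each chunk with the titles of its ancestor sections) is only one plausible reading, assumed for illustration. `keep_siblings_together` is not modelled.

```python
from html.parser import HTMLParser

HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class SectionChunker(HTMLParser):
    """Hypothetical helper: one chunk per heading-delimited section."""

    def __init__(self, prepend_parent_section_text=False):
        super().__init__()
        self.prepend_parent = prepend_parent_section_text
        self.stack = []          # (level, title) for the current section path
        self.chunks = []         # finished chunks: {"path": [...], "text": str}
        self._in_heading = None  # heading level while inside <hN>, else None
        self._title_parts = []
        self._text_parts = []

    def flush(self):
        """Emit the text collected so far as a chunk of the current section."""
        text = " ".join(" ".join(self._text_parts).split())
        if text:
            path = [title for _, title in self.stack]
            # Assumed semantics: prefix the chunk with its ancestors' titles.
            if self.prepend_parent and len(path) > 1:
                text = " > ".join(path[:-1]) + "\n" + text
            self.chunks.append({"path": path, "text": text})
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.flush()  # a new heading closes the previous section's body
            self._in_heading = HEADINGS[tag]
            self._title_parts = []

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._in_heading == HEADINGS[tag]:
            level = self._in_heading
            title = " ".join("".join(self._title_parts).split())
            # Drop sections at the same or a deeper level before pushing.
            while self.stack and self.stack[-1][0] >= level:
                self.stack.pop()
            self.stack.append((level, title))
            self._in_heading = None

    def handle_data(self, data):
        if self._in_heading is not None:
            self._title_parts.append(data)
        else:
            self._text_parts.append(data)


def chunk_html(html, prepend_parent_section_text=False):
    parser = SectionChunker(prepend_parent_section_text)
    parser.feed(html)
    parser.close()
    parser.flush()  # emit the body of the last section
    return parser.chunks
```

With `<h1>Install</h1><p>Intro.</p><h2>Linux</h2><p>Use dnf.</p>`, the second chunk has the path `["Install", "Linux"]`, and with the option enabled its text starts with the parent title `Install` on its own line.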

@openshift-ci openshift-ci bot requested review from bparees and tisnik April 3, 2025 13:04

openshift-ci bot commented Apr 3, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign xrajesh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@syedriko
Contributor

syedriko commented Apr 3, 2025

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Apr 3, 2025
@danilo-gemoli

/cc

@openshift-ci openshift-ci bot requested a review from danilo-gemoli April 4, 2025 14:32
@max-svistunov max-svistunov force-pushed the ols-1498-chunking-html branch from 6b0f7dc to b69bdee Compare April 8, 2025 12:56
@max-svistunov max-svistunov force-pushed the ols-1498-chunking-html branch from a4448d3 to ddceb5e Compare June 3, 2025 12:35
Add HTML tag-to-text ratio detection

Improve section hierarchy handling with better fallback handling

Better parent context handling in section tree chunking

Fix parsing when soup.body is None

Fix code block handling
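
The commit titles above do not say how the tag-to-text ratio is computed or what it is used for. As a rough illustration only, one way to measure it with the standard library is to count start tags and visible text characters. The function name `tag_to_text_ratio` and the choice of "start tags per non-whitespace-trimmed text character" are assumptions for this sketch, not the PR's definition.

```python
from html.parser import HTMLParser


class _TagTextCounter(HTMLParser):
    """Counts start tags and text characters while parsing."""

    def __init__(self):
        super().__init__()
        self.tag_count = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        self.tag_count += 1

    def handle_data(self, data):
        # Ignore leading/trailing whitespace so indentation is not counted.
        self.text_chars += len(data.strip())


def tag_to_text_ratio(html):
    """Return start tags per text character (assumed definition)."""
    counter = _TagTextCounter()
    counter.feed(html)
    counter.close()
    if counter.text_chars == 0:
        # Markup with no text is treated as infinitely tag-heavy.
        return float("inf") if counter.tag_count else 0.0
    return counter.tag_count / counter.text_chars
```

A caller could compare the result against a threshold to decide whether a document is mostly markup; any such threshold would need tuning on real documentation and is not given in this thread.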
@max-svistunov max-svistunov force-pushed the ols-1498-chunking-html branch from ddceb5e to b2ab598 Compare June 3, 2025 12:36
Labels
ok-to-test Indicates a non-member PR verified by an org member that is safe to test.

3 participants