Document summarizer #952


Open
rglobisch-bpc opened this issue Nov 13, 2023 · 8 comments
Comments

@rglobisch-bpc

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Is it possible to use the current solution/indexer to summarize documents?

@pamelafox
Collaborator

You could potentially use this for summarizing documents, if you amend the prompt in the approaches files. How are you envisioning people interacting with it? Would they say "summarize " or would they select a document from a list and then ask for a summary? I'd wonder whether it'd be more efficient to generate the summaries ahead of time, for every document, instead of waiting for a user to ask.

@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.

@github-actions github-actions bot added the Stale label Jan 22, 2024
@gregorcvek

We would like to use it to copy/paste our internal company text data into the prompt and have the model produce a summary, instead of using the "public" OpenAI service. There is a 1000-character limit on the input prompt. Could we change this (and to what limit), and how would that impact performance?

@pamelafox
Collaborator

Hm, what do you mean by a limit of 1000 characters for the input prompt? I believe the limit should be for the entire request, and that usually varies between 4K and 128K tokens, depending on what model you're using. Generally, performance does depend on both input tokens and output tokens. If you know you always want to summarize, then you could run an offline script to generate the summaries and save them.

@gregorcvek

Well, if I try to paste more than 1000 characters into the question input (the prompt), I can't. I think this is connected to the limitation in QuestionInput.tsx (frontend/src/components/QuestionInput/QuestionInput.tsx), starting on line 44:

const onQuestionChange = (_ev: React.FormEvent<HTMLInputElement | HTMLTextAreaElement>, newValue?: string) => {
    if (!newValue) {
        setQuestion("");
    } else if (newValue.length <= 1000) {
        setQuestion(newValue);
    }
};

@pamelafox
Collaborator

Ah, good find! I didn't write that original code, so I didn't realize we had a limitation imposed in the frontend. I'm not sure we need it there, given that developers may be using GPT models with different context length limits. @mattgotteiner What do you think about removing that constraint? We could also ask @pablocastro about its original intent, as he wrote that line.

You can remove that for now, for your own needs.
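A minimal sketch of that change, with the check extracted as a pure function and the hard-coded 1000 made a parameter. The function name, the configurable `maxLength` parameter, and its default are illustrative additions, not code from QuestionInput.tsx:

```typescript
// Sketch: the length check from onQuestionChange, with the cap configurable.
// Returns the accepted value, "" for a cleared input, or null when the input
// exceeds the cap (the caller would then keep the previous question).
function clampQuestion(newValue: string | undefined, maxLength: number = 1000): string | null {
    if (!newValue) {
        return ""; // cleared or empty input resets the question
    }
    if (newValue.length <= maxLength) {
        return newValue; // within the configured limit: accept as-is
    }
    return null; // over the limit: reject, leave prior state untouched
}
```

Raising or removing the default here lifts the paste restriction without touching the rest of the handler.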

@mattgotteiner
Collaborator

We should remove this constraint - it is probably due to an oversight in the original implementation. Thanks for finding this problem.

@pablocastro
Collaborator

+1, feel free to change it. I suspect this was originally in place to make sure the input plus instructions didn't exceed the context window limit, but those limits have gone up, and there are probably better ways to restrict this if needed. As a general good practice, you still want a limit (either a higher one, or a more elaborate one that depends on the target LLM's max prompt length), since you should not allow inputs that haven't been tested; so you need a test that exercises whatever static/dynamic max-length input you decide to allow.
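One way the model-dependent limit could be sketched: derive a character budget from a model's context window. The model names, token counts, reserved-token default, and the ~4-characters-per-token heuristic below are all rough assumptions for illustration, not values from this repository:

```typescript
// Sketch: a per-model character cap for the question input.
// Context window sizes are illustrative; check your deployment's actual limits.
const MODEL_CONTEXT_TOKENS: Record<string, number> = {
    "gpt-35-turbo": 4096,
    "gpt-4": 8192,
    "gpt-4-turbo": 128000,
};

// reservedTokens leaves room for system instructions and the response.
function maxInputChars(model: string, reservedTokens: number = 1024): number {
    const contextTokens = MODEL_CONTEXT_TOKENS[model] ?? 4096; // fall back to the smallest window
    const inputTokens = Math.max(contextTokens - reservedTokens, 0);
    return inputTokens * 4; // crude heuristic: ~4 characters per token for English text
}
```

The computed value could then replace the hard-coded `1000` in the frontend check; a real implementation would count tokens with the model's tokenizer rather than estimate from characters.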
