feat: support uploading pdf, docx, txt #140
Merged

78 commits (the diff below shows changes from 16 of them):

8db5405 feat: upload pdf and send content to LLM (thucpn)
0d140f4 Merge branch 'main' into feat/upload-pdf (thucpn)
5a06cf9 fix: lint (thucpn)
97af04e refactor: move embed to chat api folder (thucpn)
312b6d3 refactor: file content preview component (thucpn)
d73ee43 refactor: use content file type for all text files (thucpn)
d458783 Merge branch 'main' into feat/upload-pdf (thucpn)
035e96e fix: lint (thucpn)
f452027 use pipeline transformation & upgrade llamaindex latest (thucpn)
f707c6f use svg image for pdf, docx, txt (thucpn)
2d19560 fix: body collapsed when open dialog (thucpn)
04a4dc9 return backend for useClientConfig only (thucpn)
4ba4c64 fix: lint (thucpn)
fa991f4 Create late-weeks-sneeze.md (thucpn)
9c8192d refactor: rename ContentFile to DocumentFile (thucpn)
7ff4843 refactor: rename annotation type (thucpn)
3ca2f65 refactor: document preview component (thucpn)
689ad9b fix: lint (thucpn)
ee8cb00 refactor: move upload logic to useFile (thucpn)
28696d5 refactor: use PDFReader and TextNode (marcusschiesser)
cca49c5 fix: next.config (marcusschiesser)
afb5405 feat: support uploading docx, pdf, txt (thucpn)
4b66d29 move embeddng to chat folder (thucpn)
d07ffe9 feat: add embed api for express (thucpn)
948b1b6 fix: lint (thucpn)
0a195f8 feat: use local index (marcusschiesser)
d22310d Merge branch 'main' into feat/upload-pdf (marcusschiesser)
aff87bb add todos for using doc ids (marcusschiesser)
38f231c add support for fastapi (leehuwuj)
2629c88 add filters (leehuwuj)
c21e843 fix chat filering python (leehuwuj)
34ab445 fix instantiate reader (leehuwuj)
321d77d add save file and fix issues (leehuwuj)
259c3ec change to /chat/upload route (leehuwuj)
d6afe28 refactor(frontend): support ref type for document content (thucpn)
6298e4b refactor(nextjs): rename route /embed to /upload (thucpn)
608a338 feat: save document when uploading document (thucpn)
a9fa5cd feat: add nodes to vectorstore and query (thucpn)
a00cb3d feat: update document metadata from uploaded file infor (thucpn)
425580d feat: get query filters from document ids (thucpn)
3b1c743 refactor(express): use new llamaindex for sharing chat logic between … (thucpn)
7a0ce3f docs: remove useless log (thucpn)
d5f4395 fix: wrong import embedding path (thucpn)
cc15059 fix: persist vectordb (thucpn)
efdd43f feat: don't open preview dialog for ref content (thucpn)
07e0821 use in filter operator (leehuwuj)
709ef1f use mimetypes lib and change private file folder (leehuwuj)
43e8035 change tool-output to tool/output (leehuwuj)
fdb32b7 add back csv handler (leehuwuj)
445b4cc add a prefix message if user uploaded a file (leehuwuj)
a2787ae add txt reader and fix typo (leehuwuj)
d48bcb8 improve code (leehuwuj)
93fde20 fix: construct file url from private and file_name (thucpn)
884bc6d refactor: rename embeddings to documents (thucpn)
2cc21ac refactor: split llamaindex folder (thucpn)
ce9ce5e fix: import path to llamaindex ts folder (thucpn)
27332e6 fix: wrong engine path (thucpn)
c3f70a1 remove redundant log (leehuwuj)
20a58c1 remove wrong log (leehuwuj)
ab279c6 update file upload to only send text content instead of list (leehuwuj)
b638eae add missing fe and use flatreader (leehuwuj)
6aa7d57 improve code (leehuwuj)
1f85358 remove adding message prefix (leehuwuj)
498723a update milvus package (leehuwuj)
be45b3f improve log (leehuwuj)
5c8e79c Merge remote-tracking branch 'origin/main' into feat/upload-pdf (leehuwuj)
695923c update code comments (leehuwuj)
0e8786b fix: use makeDir function with default recursive option (thucpn)
3404554 fix: lint (thucpn)
0ccb51e fix: set request body size (thucpn)
55684b2 fix: lint (thucpn)
197cc90 cleanup PR (marcusschiesser)
e9ad3ed fix: DocumentFileContent value can be string in backend ts (thucpn)
503141f Update templates/components/vectordbs/python/none/generate.py (marcusschiesser)
30ebbe3 improve code (leehuwuj)
f670b1a fix: while testing fastapi contextengine (marcusschiesser)
43149bf refactor: clean streaming (marcusschiesser)
8068ad5 fix: use all annotations in TS code (marcusschiesser)

Changeset (new file, 5 additions):

```md
---
"create-llama": patch
---

Support upload document files: pdf, docx, txt
```

templates/types/streaming/nextjs/app/api/chat/embed/embeddings.ts (new file, 36 additions, 0 deletions)

```ts
import {
  Document,
  IngestionPipeline,
  MetadataMode,
  Settings,
  SimpleNodeParser,
} from "llamaindex";
import pdf from "pdf-parse";

export async function splitAndEmbed(content: string) {
  const document = new Document({ text: content });
  const pipeline = new IngestionPipeline({
    transformations: [
      new SimpleNodeParser({
        chunkSize: Settings.chunkSize,
        chunkOverlap: Settings.chunkOverlap,
      }),
      Settings.embedModel,
    ],
  });
  const nodes = await pipeline.run({ documents: [document] });
  return nodes.map((node, i) => ({
    text: node.getContent(MetadataMode.NONE),
    embedding: node.embedding,
  }));
}

export async function getPdfDetail(rawPdf: string) {
  const pdfBuffer = Buffer.from(rawPdf.split(",")[1], "base64");
  const content = (await pdf(pdfBuffer)).text;
  const embeddings = await splitAndEmbed(content);
  return {
    content,
    embeddings,
  };
}
```

marcusschiesser marked two review conversations on this file as resolved: one on the `return` of `splitAndEmbed` and one on `getPdfDetail`.
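`getPdfDetail` above decodes the upload with `rawPdf.split(",")[1]`. That assumes the client sends a base64 data URL of the form `data:application/pdf;base64,...`. If the string has no comma, `Buffer.from` receives `undefined` and throws a `TypeError`, and the route turns that into a generic 500.

A slightly more defensive decoder is sketched below, assuming the same data-URL input. `decodeDataUrl` is a hypothetical helper, not part of this PR or of llamaindex. It only handles base64-encoded data URLs as described in RFC 2397.

```ts
import { Buffer } from "node:buffer";

// Hypothetical helper (not from the PR): parses a base64 data URL such as
// "data:application/pdf;base64,JVBERi0x..." into its MIME type and raw bytes.
export function decodeDataUrl(dataUrl: string): { mimeType: string; data: Buffer } {
  const commaIndex = dataUrl.indexOf(",");
  if (!dataUrl.startsWith("data:") || commaIndex === -1) {
    throw new Error("Expected a data URL of the form data:<mime>;base64,<payload>");
  }
  // Header between "data:" and the comma, e.g. "application/pdf;base64".
  const params = dataUrl.slice("data:".length, commaIndex).split(";");
  if (params[params.length - 1] !== "base64") {
    throw new Error("Only base64-encoded data URLs are supported");
  }
  // RFC 2397 defaults the media type to text/plain when it is omitted.
  const mimeType = params[0] || "text/plain";
  const data = Buffer.from(dataUrl.slice(commaIndex + 1), "base64");
  return { mimeType, data };
}
```

Inside `getPdfDetail`, `decodeDataUrl(rawPdf).data` could stand in for `pdfBuffer`. A malformed upload would then fail with a descriptive message instead of a `TypeError`. The returned `mimeType` could also be used to pick a parser for docx or txt uploads.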

templates/types/streaming/nextjs/app/api/chat/embed/route.ts (new file, 28 additions, 0 deletions)

```ts
import { NextRequest, NextResponse } from "next/server";
import { initSettings } from "../engine/settings";
import { getPdfDetail } from "./embeddings";

initSettings();

export async function POST(request: NextRequest) {
  try {
    const { pdf }: { pdf: string } = await request.json();
    if (!pdf) {
      return NextResponse.json(
        { error: "pdf is required in the request body" },
        { status: 400 },
      );
    }
    const pdfDetail = await getPdfDetail(pdf);
    return NextResponse.json(pdfDetail);
  } catch (error) {
    console.error("[Embed API]", error);
    return NextResponse.json(
      { error: (error as Error).message },
      { status: 500 },
    );
  }
}

export const runtime = "nodejs";
export const dynamic = "force-dynamic";
```

marcusschiesser marked the review conversation on the `POST` handler as resolved.
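The route reads `{ pdf }` from a JSON body. On success it returns `{ content, embeddings }`. On failure it returns `{ error }` with status 400 or 500. Under the App Router, this file is served at `/api/chat/embed`.

A minimal client-side sketch of calling the route follows. `uploadPdf`, `FetchLike`, `PdfDetail`, and `EmbeddedChunk` are hypothetical names, and this is not the PR's own frontend code. The endpoint is left as a parameter because a later commit (6298e4b) renamed the route from `/embed` to `/upload`.

```ts
// Shapes mirroring what getPdfDetail() returns through the route.
export interface EmbeddedChunk {
  text: string;
  embedding?: number[]; // node.embedding may be undefined, so JSON may omit it
}

export interface PdfDetail {
  content: string;
  embeddings: EmbeddedChunk[];
}

// Minimal fetch-like signature so the helper can be exercised without a browser;
// in the app, the global `fetch` satisfies it.
export type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical client helper: POSTs a base64 PDF data URL to the route and
// returns the extracted text plus per-chunk embeddings.
export async function uploadPdf(
  pdfDataUrl: string,
  fetchImpl: FetchLike,
  endpoint = "/api/chat/embed",
): Promise<PdfDetail> {
  const res = await fetchImpl(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ pdf: pdfDataUrl }),
  });
  // The route reports failures as { error: string }; tolerate non-JSON bodies too.
  const body = (await res.json().catch(() => ({}))) as Partial<PdfDetail> & {
    error?: string;
  };
  if (!res.ok) {
    throw new Error(body.error ?? `Upload failed with status ${res.status}`);
  }
  return { content: body.content ?? "", embeddings: body.embeddings ?? [] };
}
```

In the browser, the data URL would come from reading the selected `File` with `FileReader.readAsDataURL`, and the global `fetch` is passed as `fetchImpl`. Taking `fetchImpl` as an argument keeps the helper independent of DOM typings and makes the error path easy to test.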