feat: able to display file url from llamacloud #153
Merged: marcusschiesser merged 50 commits into main from feat/able-to-display-file-url-from-llamacloud on Jul 18, 2024
Commits
3befca4 feat: get llamacloud file url in fast api (thucpn)
46823c0 refactor: move llamacloud to service (thucpn)
0854c5b fix: long url should be truncated (thucpn)
444e3b2 pre check pdf url (thucpn)
bc514c0 fix: display link on hover card when pdf fail (thucpn)
873b8f3 feat(ts): get file url from LLamaCloudFileService (thucpn)
9aead8a Create happy-hairs-kick.md (thucpn)
41a544e refactor: rename variables and add try-catch (thucpn)
72beab6 Merge branch 'feat/able-to-display-file-url-from-llamacloud' of githu… (thucpn)
7073584 style: don't need to change UI (thucpn)
e728391 feat(ts): save llamacloud file to local (thucpn)
49987ca feat: download llamacloud file in python (thucpn)
9c34d6f improve python code (leehuwuj)
174755e feat: donot await download file function (thucpn)
ce65c0c Merge branch 'feat/able-to-display-file-url-from-llamacloud' of githu… (thucpn)
ac22120 refactor python (leehuwuj)
bf3a8f5 Merge branch 'main' into feat/able-to-display-file-url-from-llamacloud (thucpn)
db9396d refactor: move service file to llamaindex folder (thucpn)
43817f7 refactor: move private function to bottom (thucpn)
9490b28 download llamacloud file in output/llamacloud folder (thucpn)
7a90d6c remove old getNodeUrl (thucpn)
8c70ce3 resolve python conflict (leehuwuj)
809f888 Upgrade to llamaindex 0.4.14 (thucpn)
cf017fc make dir const (thucpn)
086f90f fix redundant python code (leehuwuj)
79a454a refactor code (leehuwuj)
ee28a6e refactor: rename documents file to upload.ts (thucpn)
9f1713f refactor: split upload file to helper and pipeline (thucpn)
09f6537 feat: add custom pipeline for llamacloud (thucpn)
c377439 feat: copy pipeline from llamacloud (thucpn)
22ee650 remove llamacloud folder after copying (thucpn)
112d033 fix(css): url should be truncate (thucpn)
163824a fix python filter with LlamaCloud (leehuwuj)
0088d17 refactor python code (leehuwuj)
35798b3 fix typo (leehuwuj)
c3b7621 remove copy typescript pipeline (leehuwuj)
f8062b4 fix TS to use local file (leehuwuj)
3d98549 add back showing log for llamacloud (leehuwuj)
4eb9339 refactor: Rename controllers to services for llama_cloud and file (leehuwuj)
4a069f1 feat: Add "is_local_file" metadata to documents in generate.py (leehuwuj)
fd57765 fix: index can be null when add nodes (thucpn)
c933ef4 Merge branch 'main' into feat/able-to-display-file-url-from-llamacloud (thucpn)
2bb7b67 fix filename (leehuwuj)
0272ca5 Merge remote-tracking branch 'origin/main' into feat/able-to-display-… (leehuwuj)
ffc221f refactor code (leehuwuj)
8a5f098 fix: LlamaCloud auth for fromDocuments (marcusschiesser)
56197d6 fix: merge error (marcusschiesser)
42a0cd1 fix: is_local_file (marcusschiesser)
782f39b fix: mount sub directories (marcusschiesser)
25b74a1 fix: always create output folders (marcusschiesser)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
--- | ||
"create-llama": patch | ||
--- | ||
|
||
Display files in sources using LlamaCloud indexes. |
104 changes: 0 additions & 104 deletions
templates/components/llamaindex/typescript/documents/documents.ts
This file was deleted.
44 changes: 44 additions & 0 deletions
templates/components/llamaindex/typescript/documents/helper.ts
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
import fs from "fs"; | ||
import crypto from "node:crypto"; | ||
import { getExtractors } from "../../engine/loader"; | ||
|
||
const MIME_TYPE_TO_EXT: Record<string, string> = { | ||
"application/pdf": "pdf", | ||
"text/plain": "txt", | ||
"application/vnd.openxmlformats-officedocument.wordprocessingml.document": | ||
"docx", | ||
}; | ||
|
||
const UPLOADED_FOLDER = "output/uploaded"; | ||
|
||
export async function loadDocuments(fileBuffer: Buffer, mimeType: string) { | ||
const extractors = getExtractors(); | ||
const reader = extractors[MIME_TYPE_TO_EXT[mimeType]]; | ||
|
||
if (!reader) { | ||
throw new Error(`Unsupported document type: ${mimeType}`); | ||
} | ||
console.log(`Processing uploaded document of type: ${mimeType}`); | ||
return await reader.loadDataAsContent(fileBuffer); | ||
} | ||
|
||
export async function saveDocument(fileBuffer: Buffer, mimeType: string) { | ||
const fileExt = MIME_TYPE_TO_EXT[mimeType]; | ||
if (!fileExt) throw new Error(`Unsupported document type: ${mimeType}`); | ||
|
||
const filename = `${crypto.randomUUID()}.${fileExt}`; | ||
const filepath = `${UPLOADED_FOLDER}/${filename}`; | ||
const fileurl = `${process.env.FILESERVER_URL_PREFIX}/${filepath}`; | ||
|
||
if (!fs.existsSync(UPLOADED_FOLDER)) { | ||
fs.mkdirSync(UPLOADED_FOLDER, { recursive: true }); | ||
} | ||
await fs.promises.writeFile(filepath, fileBuffer); | ||
|
||
console.log(`Saved document file to ${filepath}.\nURL: ${fileurl}`); | ||
return { | ||
filename, | ||
filepath, | ||
fileurl, | ||
}; | ||
} |
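The naming scheme in `saveDocument` can be isolated as a pure function, which makes it easier to see how the filename, the on-disk path, and the public URL relate. This is a minimal sketch and not part of the PR: `buildUploadTarget` is a hypothetical name, the id is passed in by the caller instead of being generated with `crypto.randomUUID()`, and the trailing-slash trimming of the prefix is an added assumption that the original code does not make.

```typescript
// Mapping copied from helper.ts in this PR.
const MIME_TYPE_TO_EXT: Record<string, string> = {
  "application/pdf": "pdf",
  "text/plain": "txt",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
    "docx",
};

const UPLOADED_FOLDER = "output/uploaded";

interface UploadTarget {
  filename: string;
  filepath: string;
  fileurl: string;
}

// Hypothetical helper: derives the three values saveDocument returns,
// without touching the file system.
function buildUploadTarget(
  id: string, // e.g. a UUID supplied by the caller
  mimeType: string,
  urlPrefix: string, // stands in for process.env.FILESERVER_URL_PREFIX
): UploadTarget {
  const fileExt = MIME_TYPE_TO_EXT[mimeType];
  if (!fileExt) throw new Error(`Unsupported document type: ${mimeType}`);

  const filename = `${id}.${fileExt}`;
  const filepath = `${UPLOADED_FOLDER}/${filename}`;
  // Assumption (not in the PR): drop trailing slashes so the URL has no "//".
  const fileurl = `${urlPrefix.replace(/\/+$/, "")}/${filepath}`;
  return { filename, filepath, fileurl };
}
```

Keeping this part free of I/O means the mapping from MIME type to URL can be checked without writing files.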
65 changes: 65 additions & 0 deletions
templates/components/llamaindex/typescript/documents/pipeline.ts
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
import { | ||
BaseNode, | ||
Document, | ||
IngestionPipeline, | ||
Metadata, | ||
Settings, | ||
SimpleNodeParser, | ||
storageContextFromDefaults, | ||
VectorStoreIndex, | ||
} from "llamaindex"; | ||
import { LlamaCloudIndex } from "llamaindex/cloud/LlamaCloudIndex"; | ||
import { getDataSource } from "../../engine"; | ||
|
||
export async function runPipeline(documents: Document[], filename: string) { | ||
const currentIndex = await getDataSource(); | ||
|
||
// Update documents with metadata | ||
for (const document of documents) { | ||
document.metadata = { | ||
...document.metadata, | ||
file_name: filename, | ||
private: "true", // to separate from other public documents | ||
}; | ||
} | ||
|
||
if (currentIndex instanceof LlamaCloudIndex) { | ||
// LlamaCloudIndex processes the documents automatically | ||
// so we don't need ingestion pipeline, just insert the documents directly | ||
for (const document of documents) { | ||
await currentIndex.insert(document); | ||
} | ||
} else { | ||
// Use ingestion pipeline to process the documents into nodes and add them to the vector store | ||
const pipeline = new IngestionPipeline({ | ||
transformations: [ | ||
new SimpleNodeParser({ | ||
chunkSize: Settings.chunkSize, | ||
chunkOverlap: Settings.chunkOverlap, | ||
}), | ||
Settings.embedModel, | ||
], | ||
}); | ||
const nodes = await pipeline.run({ documents }); | ||
await addNodesToVectorStore(nodes, currentIndex); | ||
} | ||
|
||
return documents.map((document) => document.id_); | ||
} | ||
|
||
async function addNodesToVectorStore( | ||
nodes: BaseNode<Metadata>[], | ||
currentIndex: VectorStoreIndex | null, | ||
) { | ||
if (currentIndex) { | ||
await currentIndex.insertNodes(nodes); | ||
} else { | ||
// Not using vectordb and haven't generated local index yet | ||
const storageContext = await storageContextFromDefaults({ | ||
persistDir: "./cache", | ||
}); | ||
currentIndex = await VectorStoreIndex.init({ nodes, storageContext }); | ||
} | ||
currentIndex.storageContext.docStore.persist(); | ||
console.log("Added nodes to the vector store."); | ||
} |
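The metadata step at the top of `runPipeline` is what later lets `getNodeUrl` tell uploaded files apart from the public ones in `data`. The sketch below reproduces only that step on plain objects so it runs without `llamaindex`; `DocLike` and `tagUploadedDocuments` are hypothetical names used for illustration.

```typescript
// Hypothetical stand-in for a llamaindex Document: only the field used here.
interface DocLike {
  metadata: Record<string, unknown>;
}

// Mirrors the loop in runPipeline: keep existing metadata, then record the
// stored filename and mark the document as private.
function tagUploadedDocuments<T extends DocLike>(
  documents: T[],
  filename: string,
): T[] {
  for (const document of documents) {
    document.metadata = {
      ...document.metadata,
      file_name: filename,
      // Stored as the string "true", which is what getNodeUrl compares against.
      private: "true",
    };
  }
  return documents;
}

// Usage: existing keys such as a page number survive the update.
const tagged = tagUploadedDocuments([{ metadata: { page: 1 } }], "1234.pdf");
console.log(tagged[0].metadata);
```

Note that `private` is a string, not a boolean, so a strict comparison with `"true"` is needed on the reading side.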
11 changes: 11 additions & 0 deletions
templates/components/llamaindex/typescript/documents/upload.ts
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
import { loadDocuments, saveDocument } from "./helper"; | ||
import { runPipeline } from "./pipeline"; | ||
|
||
export async function uploadDocument(raw: string): Promise<string[]> { | ||
const [header, content] = raw.split(","); | ||
const mimeType = header.replace("data:", "").replace(";base64", ""); | ||
const fileBuffer = Buffer.from(content, "base64"); | ||
const documents = await loadDocuments(fileBuffer, mimeType); | ||
const { filename } = await saveDocument(fileBuffer, mimeType); | ||
return await runPipeline(documents, filename); | ||
} | ||
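`uploadDocument` assumes `raw` is a well-formed base64 data URL and splits it on the first comma. A stricter variant could validate the shape before decoding. The following is a sketch under that assumption, not code from the PR; `parseDataUrl` and `ParsedDataUrl` are hypothetical names.

```typescript
interface ParsedDataUrl {
  mimeType: string;
  base64: string;
}

// Hypothetical helper: splits "data:<mime>;base64,<payload>" into its parts
// and fails loudly when the input does not have that shape.
function parseDataUrl(raw: string): ParsedDataUrl {
  const scheme = "data:";
  const suffix = ";base64";
  const commaIndex = raw.indexOf(",");

  if (raw.slice(0, scheme.length) !== scheme || commaIndex === -1) {
    throw new Error("Expected a data URL of the form data:<mime>;base64,<payload>");
  }
  const header = raw.slice(scheme.length, commaIndex);
  if (header.slice(-suffix.length) !== suffix) {
    throw new Error("Only base64-encoded data URLs are supported");
  }
  return {
    mimeType: header.slice(0, header.length - suffix.length),
    base64: raw.slice(commaIndex + 1),
  };
}

// Usage: the payload would then be passed to Buffer.from(base64, "base64").
const parsed = parseDataUrl("data:application/pdf;base64,JVBERi0=");
console.log(parsed.mimeType);
```

Compared with `raw.split(",")`, this rejects input without a `data:` prefix instead of passing an `undefined` payload to `Buffer.from`.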
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,23 +6,31 @@ import { | |
ToolCall, | ||
ToolOutput, | ||
} from "llamaindex"; | ||
+import { LLamaCloudFileService } from "./service"; | ||
|
||
-export function appendSourceData( | ||
+export async function appendSourceData( | ||
data: StreamData, | ||
sourceNodes?: NodeWithScore<Metadata>[], | ||
) { | ||
if (!sourceNodes?.length) return; | ||
-data.appendMessageAnnotation({ | ||
-type: "sources", | ||
-data: { | ||
-nodes: sourceNodes.map((node) => ({ | ||
+try { | ||
+const nodes = await Promise.all( | ||
+sourceNodes.map(async (node) => ({ | ||
...node.node.toMutableJSON(), | ||
id: node.node.id_, | ||
score: node.score ?? null, | ||
-url: getNodeUrl(node.node.metadata), | ||
+url: await getNodeUrl(node.node.metadata), | ||
})), | ||
-}, | ||
-}); | ||
+); | ||
+data.appendMessageAnnotation({ | ||
+type: "sources", | ||
+data: { | ||
+nodes, | ||
+}, | ||
+}); | ||
+} catch (error) { | ||
+console.error("Error appending source data:", error); | ||
+} | ||
} | ||
|
||
export function appendEventData(data: StreamData, title?: string) { | ||
|
@@ -68,9 +76,9 @@ export function createStreamTimeout(stream: StreamData) { | |
export function createCallbackManager(stream: StreamData) { | ||
const callbackManager = new CallbackManager(); | ||
|
||
-callbackManager.on("retrieve-end", (data) => { | ||
+callbackManager.on("retrieve-end", async (data) => { | ||
const { nodes, query } = data.detail.payload; | ||
-appendSourceData(stream, nodes); | ||
+await appendSourceData(stream, nodes); | ||
appendEventData(stream, `Retrieving context for query: '${query}'`); | ||
appendEventData( | ||
stream, | ||
|
@@ -97,19 +105,27 @@ export function createCallbackManager(stream: StreamData) { | |
return callbackManager; | ||
} | ||
|
||
-function getNodeUrl(metadata: Metadata) { | ||
-const url = metadata["URL"]; | ||
-if (url) return url; | ||
-const fileName = metadata["file_name"]; | ||
+async function getNodeUrl(metadata: Metadata) { | ||
marcusschiesser marked this conversation as resolved. | ||
if (!process.env.FILESERVER_URL_PREFIX) { | ||
console.warn( | ||
"FILESERVER_URL_PREFIX is not set. File URLs will not be generated.", | ||
); | ||
-return undefined; | ||
} | ||
-if (fileName) { | ||
-const folder = metadata["private"] ? "output/uploaded" : "data"; | ||
+const fileName = metadata["file_name"]; | ||
+if (fileName && process.env.FILESERVER_URL_PREFIX) { | ||
+// file_name exists and file server is configured | ||
+const isLocalFile = metadata["is_local_file"] === "true"; | ||
+const pipelineId = metadata["pipeline_id"]; | ||
+if (pipelineId && !isLocalFile) { | ||
+// file is from LlamaCloud and was not ingested locally | ||
+// TODO trigger but don't await file download and just use convention to generate the URL (see Python code) | ||
Review comment: @thucpn see TODO | ||
+// return `${process.env.FILESERVER_URL_PREFIX}/output/llamacloud/${pipelineId}\$${fileName}`; | ||
+return await LLamaCloudFileService.getFileUrl(fileName, pipelineId); | ||
+} | ||
+const isPrivate = metadata["private"] === "true"; | ||
+const folder = isPrivate ? "output/uploaded" : "data"; | ||
return `${process.env.FILESERVER_URL_PREFIX}/${folder}/${fileName}`; | ||
} | ||
-return undefined; | ||
+// fallback to URL in metadata (e.g. for websites) | ||
+return metadata["URL"]; | ||
} |
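The resolution order in the new `getNodeUrl` (LlamaCloud file, then private upload, then public `data` file, then the `URL` metadata fallback) can be sketched as a synchronous pure function. This is an illustration rather than the merged code: it uses the naming convention from the commented-out line in the diff instead of calling `LLamaCloudFileService.getFileUrl`, and it assumes the file has already been downloaded to `output/llamacloud`. `resolveNodeUrl` and `NodeMetadata` are hypothetical names.

```typescript
type NodeMetadata = Record<string, string | undefined>;

// Hypothetical, synchronous variant of getNodeUrl. urlPrefix stands in for
// process.env.FILESERVER_URL_PREFIX.
function resolveNodeUrl(
  metadata: NodeMetadata,
  urlPrefix: string | undefined,
): string | undefined {
  const fileName = metadata["file_name"];
  if (fileName && urlPrefix) {
    const isLocalFile = metadata["is_local_file"] === "true";
    const pipelineId = metadata["pipeline_id"];
    if (pipelineId && !isLocalFile) {
      // Assumption: the LlamaCloud file is already stored locally as
      // "<pipelineId>$<fileName>", per the convention noted in the TODO.
      return `${urlPrefix}/output/llamacloud/${pipelineId}\$${fileName}`;
    }
    const isPrivate = metadata["private"] === "true";
    const folder = isPrivate ? "output/uploaded" : "data";
    return `${urlPrefix}/${folder}/${fileName}`;
  }
  // Fallback to a URL stored in metadata (e.g. for websites).
  return metadata["URL"];
}

// Usage: a node that came from a LlamaCloud pipeline.
console.log(
  resolveNodeUrl(
    { file_name: "report.pdf", pipeline_id: "p1" },
    "http://localhost:8000/api/files",
  ),
);
```

Because the file-server branch is checked first, a node that carries both `file_name` and `URL` resolves to the file-server URL whenever the prefix is set, which reverses the precedence of the removed implementation.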