feat: able to display file url from llamacloud #153

Merged
merged 50 commits into from
Jul 18, 2024
3befca4
feat: get llamacloud file url in fast api
thucpn Jul 5, 2024
46823c0
refactor: move llamacloud to service
thucpn Jul 5, 2024
0854c5b
fix: long url should be truncated
thucpn Jul 5, 2024
444e3b2
pre check pdf url
thucpn Jul 5, 2024
bc514c0
fix: display link on hover card when pdf fail
thucpn Jul 5, 2024
873b8f3
feat(ts): get file url from LLamaCloudFileService
thucpn Jul 5, 2024
9aead8a
Create happy-hairs-kick.md
thucpn Jul 8, 2024
41a544e
refactor: rename variables and add try-catch
thucpn Jul 8, 2024
72beab6
Merge branch 'feat/able-to-display-file-url-from-llamacloud' of githu…
thucpn Jul 10, 2024
7073584
style: don't need to change UI
thucpn Jul 10, 2024
e728391
feat(ts): save llamacloud file to local
thucpn Jul 10, 2024
49987ca
feat: download llamacloud file in python
thucpn Jul 10, 2024
9c34d6f
improve python code
leehuwuj Jul 11, 2024
174755e
feat: donot await download file function
thucpn Jul 11, 2024
ce65c0c
Merge branch 'feat/able-to-display-file-url-from-llamacloud' of githu…
thucpn Jul 11, 2024
ac22120
refactor python
leehuwuj Jul 11, 2024
bf3a8f5
Merge branch 'main' into feat/able-to-display-file-url-from-llamacloud
thucpn Jul 12, 2024
db9396d
refactor: move service file to llamaindex folder
thucpn Jul 12, 2024
43817f7
refactor: move private function to bottom
thucpn Jul 12, 2024
9490b28
download llamacloud file in output/llamacloud folder
thucpn Jul 12, 2024
7a90d6c
remove old getNodeUrl
thucpn Jul 12, 2024
8c70ce3
resolve python conflict
leehuwuj Jul 12, 2024
809f888
Upgrade to llamaindex 0.4.14
thucpn Jul 12, 2024
cf017fc
make dir const
thucpn Jul 12, 2024
086f90f
fix redundant python code
leehuwuj Jul 13, 2024
79a454a
refactor code
leehuwuj Jul 15, 2024
ee28a6e
refactor: rename documents file to upload.ts
thucpn Jul 15, 2024
9f1713f
refactor: split upload file to helper and pipeline
thucpn Jul 15, 2024
09f6537
feat: add custom pipeline for llamacloud
thucpn Jul 15, 2024
c377439
feat: copy pipeline from llamacloud
thucpn Jul 15, 2024
22ee650
remove llamacloud folder after copying
thucpn Jul 15, 2024
112d033
fix(css): url should be truncate
thucpn Jul 15, 2024
163824a
fix python filter with LlamaCloud
leehuwuj Jul 15, 2024
0088d17
refactor python code
leehuwuj Jul 16, 2024
35798b3
fix typo
leehuwuj Jul 16, 2024
c3b7621
remove copy typescript pipeline
leehuwuj Jul 16, 2024
f8062b4
fix TS to use local file
leehuwuj Jul 16, 2024
3d98549
add back showing log for llamacloud
leehuwuj Jul 16, 2024
4eb9339
refactor: Rename controllers to services for llama_cloud and file
leehuwuj Jul 16, 2024
4a069f1
feat: Add "is_local_file" metadata to documents in generate.py
leehuwuj Jul 16, 2024
fd57765
fix: index can be null when add nodes
thucpn Jul 17, 2024
c933ef4
Merge branch 'main' into feat/able-to-display-file-url-from-llamacloud
thucpn Jul 17, 2024
2bb7b67
fix filename
leehuwuj Jul 18, 2024
0272ca5
Merge remote-tracking branch 'origin/main' into feat/able-to-display-…
leehuwuj Jul 18, 2024
ffc221f
refactor code
leehuwuj Jul 18, 2024
8a5f098
fix: LlamaCloud auth for fromDocuments
marcusschiesser Jul 18, 2024
56197d6
fix: merge error
marcusschiesser Jul 18, 2024
42a0cd1
fix: is_local_file
marcusschiesser Jul 18, 2024
782f39b
fix: mount sub directories
marcusschiesser Jul 18, 2024
25b74a1
fix: always create output folders
marcusschiesser Jul 18, 2024
5 changes: 5 additions & 0 deletions .changeset/happy-hairs-kick.md
@@ -0,0 +1,5 @@
---
"create-llama": patch
---

Display files in sources using LlamaCloud indexes.
7 changes: 3 additions & 4 deletions helpers/index.ts
@@ -177,10 +177,9 @@ export const installTemplate = async (
     }

     // Create outputs directory
-    if (props.tools && props.tools.length > 0) {
-      await makeDir(path.join(props.root, "output/tools"));
-      await makeDir(path.join(props.root, "output/uploaded"));
-    }
+    await makeDir(path.join(props.root, "output/tools"));
+    await makeDir(path.join(props.root, "output/uploaded"));
+    await makeDir(path.join(props.root, "output/llamacloud"));
   } else {
     // this is a frontend for a full-stack app, create .env file with model information
     await createFrontendEnvFile(props.root, {
104 changes: 0 additions & 104 deletions templates/components/llamaindex/typescript/documents/documents.ts

This file was deleted.

44 changes: 44 additions & 0 deletions templates/components/llamaindex/typescript/documents/helper.ts
@@ -0,0 +1,44 @@
+import fs from "fs";
+import crypto from "node:crypto";
+import { getExtractors } from "../../engine/loader";
+
+const MIME_TYPE_TO_EXT: Record<string, string> = {
+  "application/pdf": "pdf",
+  "text/plain": "txt",
+  "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
+    "docx",
+};
+
+const UPLOADED_FOLDER = "output/uploaded";
+
+export async function loadDocuments(fileBuffer: Buffer, mimeType: string) {
+  const extractors = getExtractors();
+  const reader = extractors[MIME_TYPE_TO_EXT[mimeType]];
+
+  if (!reader) {
+    throw new Error(`Unsupported document type: ${mimeType}`);
+  }
+  console.log(`Processing uploaded document of type: ${mimeType}`);
+  return await reader.loadDataAsContent(fileBuffer);
+}
+
+export async function saveDocument(fileBuffer: Buffer, mimeType: string) {
+  const fileExt = MIME_TYPE_TO_EXT[mimeType];
+  if (!fileExt) throw new Error(`Unsupported document type: ${mimeType}`);
+
+  const filename = `${crypto.randomUUID()}.${fileExt}`;
+  const filepath = `${UPLOADED_FOLDER}/${filename}`;
+  const fileurl = `${process.env.FILESERVER_URL_PREFIX}/${filepath}`;
+
+  if (!fs.existsSync(UPLOADED_FOLDER)) {
+    fs.mkdirSync(UPLOADED_FOLDER, { recursive: true });
+  }
+  await fs.promises.writeFile(filepath, fileBuffer);
+
+  console.log(`Saved document file to ${filepath}.\nURL: ${fileurl}`);
+  return {
+    filename,
+    filepath,
+    fileurl,
+  };
+}
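A note on the URL built in saveDocument: a template literal renders an unset environment variable as the text "undefined", so without FILESERVER_URL_PREFIX the returned fileurl would start with "undefined/". The sketch below is a minimal illustration of the same naming scheme with an explicit guard. The function name buildUploadTarget, its parameters, and the guard itself are assumptions for illustration and are not part of this PR.

```typescript
// Sketch only: `buildUploadTarget`, `UploadTarget`, and the prefix guard are
// hypothetical and do not appear in the PR.
const MIME_TYPE_TO_EXT: Record<string, string> = {
  "application/pdf": "pdf",
  "text/plain": "txt",
};

interface UploadTarget {
  filename: string;
  filepath: string;
  fileurl: string;
}

function buildUploadTarget(
  mimeType: string,
  id: string,
  urlPrefix: string | undefined,
): UploadTarget {
  const fileExt = MIME_TYPE_TO_EXT[mimeType];
  if (!fileExt) throw new Error(`Unsupported document type: ${mimeType}`);
  // Fail early instead of producing a URL that starts with "undefined/".
  if (!urlPrefix) throw new Error("A file server URL prefix is required");

  const filename = `${id}.${fileExt}`;
  const filepath = `output/uploaded/${filename}`;
  return { filename, filepath, fileurl: `${urlPrefix}/${filepath}` };
}

// The prefix value here is a made-up example.
const target = buildUploadTarget(
  "application/pdf",
  "1234",
  "http://localhost:8000/api/files",
);
console.log(target.fileurl);
// prints "http://localhost:8000/api/files/output/uploaded/1234.pdf"
```

Passing the identifier and prefix as arguments keeps the naming logic free of I/O, which makes it easy to check in isolation.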
65 changes: 65 additions & 0 deletions templates/components/llamaindex/typescript/documents/pipeline.ts
@@ -0,0 +1,65 @@
+import {
+  BaseNode,
+  Document,
+  IngestionPipeline,
+  Metadata,
+  Settings,
+  SimpleNodeParser,
+  storageContextFromDefaults,
+  VectorStoreIndex,
+} from "llamaindex";
+import { LlamaCloudIndex } from "llamaindex/cloud/LlamaCloudIndex";
+import { getDataSource } from "../../engine";
+
+export async function runPipeline(documents: Document[], filename: string) {
+  const currentIndex = await getDataSource();
+
+  // Update documents with metadata
+  for (const document of documents) {
+    document.metadata = {
+      ...document.metadata,
+      file_name: filename,
+      private: "true", // to separate from other public documents
+    };
+  }
+
+  if (currentIndex instanceof LlamaCloudIndex) {
+    // LlamaCloudIndex processes the documents automatically
+    // so we don't need ingestion pipeline, just insert the documents directly
+    for (const document of documents) {
+      await currentIndex.insert(document);
+    }
+  } else {
+    // Use ingestion pipeline to process the documents into nodes and add them to the vector store
+    const pipeline = new IngestionPipeline({
+      transformations: [
+        new SimpleNodeParser({
+          chunkSize: Settings.chunkSize,
+          chunkOverlap: Settings.chunkOverlap,
+        }),
+        Settings.embedModel,
+      ],
+    });
+    const nodes = await pipeline.run({ documents });
+    await addNodesToVectorStore(nodes, currentIndex);
+  }
+
+  return documents.map((document) => document.id_);
+}
+
+async function addNodesToVectorStore(
+  nodes: BaseNode<Metadata>[],
+  currentIndex: VectorStoreIndex | null,
+) {
+  if (currentIndex) {
+    await currentIndex.insertNodes(nodes);
+  } else {
+    // Not using vectordb and haven't generated local index yet
+    const storageContext = await storageContextFromDefaults({
+      persistDir: "./cache",
+    });
+    currentIndex = await VectorStoreIndex.init({ nodes, storageContext });
+  }
+  currentIndex.storageContext.docStore.persist();
+  console.log("Added nodes to the vector store.");
+}
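The metadata step at the top of runPipeline can be looked at on its own. The sketch below uses a hypothetical minimal document shape (UploadedDoc) instead of the llamaindex Document class, and the helper name tagAsPrivateUpload is also an assumption. It shows that the private flag is stored as the string "true", which is the value getNodeUrl in events.ts compares against.

```typescript
// Hypothetical minimal document shape; the PR itself uses llamaindex's Document.
interface UploadedDoc {
  id_: string;
  metadata: Record<string, string>;
}

// Tags every document with the stored file name and the private marker,
// then returns the document ids, as runPipeline does after ingestion.
function tagAsPrivateUpload(docs: UploadedDoc[], filename: string): string[] {
  for (const doc of docs) {
    doc.metadata = {
      ...doc.metadata, // keep metadata set by the reader
      file_name: filename,
      private: "true", // a string, not a boolean
    };
  }
  return docs.map((doc) => doc.id_);
}

const docs: UploadedDoc[] = [{ id_: "doc-1", metadata: { page: "1" } }];
console.log(tagAsPrivateUpload(docs, "1234.pdf")); // prints [ 'doc-1' ]
console.log(docs[0].metadata.private === "true"); // prints true
```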
11 changes: 11 additions & 0 deletions templates/components/llamaindex/typescript/documents/upload.ts
@@ -0,0 +1,11 @@
+import { loadDocuments, saveDocument } from "./helper";
+import { runPipeline } from "./pipeline";
+
+export async function uploadDocument(raw: string): Promise<string[]> {
+  const [header, content] = raw.split(",");
+  const mimeType = header.replace("data:", "").replace(";base64", "");
+  const fileBuffer = Buffer.from(content, "base64");
+  const documents = await loadDocuments(fileBuffer, mimeType);
+  const { filename } = await saveDocument(fileBuffer, mimeType);
+  return await runPipeline(documents, filename);
+}
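uploadDocument expects raw to be a base64 data URL of the form data:<mime>;base64,<payload>. The sketch below shows the same split with explicit validation so that malformed input fails with a clear message. The helper name parseDataUrl and the ParsedDataUrl type are hypothetical and are not part of this PR.

```typescript
// Hypothetical helper, not part of the PR: splits a base64 data URL into
// its MIME type and payload, rejecting input that does not match the format.
interface ParsedDataUrl {
  mimeType: string;
  base64: string;
}

function parseDataUrl(raw: string): ParsedDataUrl {
  const commaIndex = raw.indexOf(",");
  if (!raw.startsWith("data:") || commaIndex === -1) {
    throw new Error("Expected a data URL: data:<mime>;base64,<payload>");
  }
  // Everything between "data:" and the first comma, e.g. "text/plain;base64".
  const header = raw.slice("data:".length, commaIndex);
  if (!header.endsWith(";base64")) {
    throw new Error("Only base64-encoded data URLs are supported");
  }
  return {
    mimeType: header.slice(0, -";base64".length),
    base64: raw.slice(commaIndex + 1),
  };
}

const parsed = parseDataUrl("data:text/plain;base64,aGVsbG8=");
console.log(parsed.mimeType); // prints "text/plain"
console.log(parsed.base64); // prints "aGVsbG8="
```

The payload can then be decoded with Buffer.from(parsed.base64, "base64"), as uploadDocument does.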
52 changes: 34 additions & 18 deletions templates/components/llamaindex/typescript/streaming/events.ts
@@ -6,23 +6,31 @@ import {
   ToolCall,
   ToolOutput,
 } from "llamaindex";
+import { LLamaCloudFileService } from "./service";

-export function appendSourceData(
+export async function appendSourceData(
   data: StreamData,
   sourceNodes?: NodeWithScore<Metadata>[],
 ) {
   if (!sourceNodes?.length) return;
-  data.appendMessageAnnotation({
-    type: "sources",
-    data: {
-      nodes: sourceNodes.map((node) => ({
+  try {
+    const nodes = await Promise.all(
+      sourceNodes.map(async (node) => ({
         ...node.node.toMutableJSON(),
         id: node.node.id_,
         score: node.score ?? null,
-        url: getNodeUrl(node.node.metadata),
+        url: await getNodeUrl(node.node.metadata),
       })),
-    },
-  });
+    );
+    data.appendMessageAnnotation({
+      type: "sources",
+      data: {
+        nodes,
+      },
+    });
+  } catch (error) {
+    console.error("Error appending source data:", error);
+  }
 }

 export function appendEventData(data: StreamData, title?: string) {
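The hunk above makes appendSourceData asynchronous because resolving a node URL may now involve a remote lookup. The sketch below isolates that pattern: map each node to a promise, await them together with Promise.all, and catch any failure in one place. All names in it (SourceNode, lookupUrl, annotateNodes, the example host) are hypothetical stand-ins, not the PR's types or services.

```typescript
// Hypothetical stand-ins for the PR's types; only the control flow matters here.
interface SourceNode {
  id: string;
  fileName: string;
}

interface AnnotatedNode extends SourceNode {
  url: string | undefined;
}

// Simulates an asynchronous URL lookup, such as a call to a remote file service.
async function lookupUrl(fileName: string): Promise<string | undefined> {
  if (fileName === "") throw new Error("missing file name");
  return `https://files.example.test/${fileName}`;
}

// Resolves all URLs concurrently. If any lookup rejects, the error is logged
// and undefined is returned, so no partial result is produced.
async function annotateNodes(
  nodes: SourceNode[],
): Promise<AnnotatedNode[] | undefined> {
  try {
    return await Promise.all(
      nodes.map(async (node) => ({
        ...node,
        url: await lookupUrl(node.fileName),
      })),
    );
  } catch (error) {
    console.error("Error resolving source URLs:", error);
    return undefined;
  }
}

annotateNodes([
  { id: "1", fileName: "a.pdf" },
  { id: "2", fileName: "b.txt" },
]).then((result) => console.log(result?.map((node) => node.url)));
```

Promise.all keeps the input order and rejects as soon as one lookup fails, so a single failing node drops the sources for the whole message. Promise.allSettled would be one way to keep the nodes that did resolve, if that trade-off is preferred.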
@@ -68,9 +76,9 @@ export function createStreamTimeout(stream: StreamData) {
 export function createCallbackManager(stream: StreamData) {
   const callbackManager = new CallbackManager();

-  callbackManager.on("retrieve-end", (data) => {
+  callbackManager.on("retrieve-end", async (data) => {
     const { nodes, query } = data.detail.payload;
-    appendSourceData(stream, nodes);
+    await appendSourceData(stream, nodes);
     appendEventData(stream, `Retrieving context for query: '${query}'`);
     appendEventData(
       stream,
@@ -97,19 +105,27 @@ export function createCallbackManager(stream: StreamData) {
   return callbackManager;
 }

-function getNodeUrl(metadata: Metadata) {
-  const url = metadata["URL"];
-  if (url) return url;
-  const fileName = metadata["file_name"];
+async function getNodeUrl(metadata: Metadata) {
   if (!process.env.FILESERVER_URL_PREFIX) {
     console.warn(
       "FILESERVER_URL_PREFIX is not set. File URLs will not be generated.",
     );
-    return undefined;
   }
-  if (fileName) {
-    const folder = metadata["private"] ? "output/uploaded" : "data";
+  const fileName = metadata["file_name"];
+  if (fileName && process.env.FILESERVER_URL_PREFIX) {
+    // file_name exists and file server is configured
+    const isLocalFile = metadata["is_local_file"] === "true";
+    const pipelineId = metadata["pipeline_id"];
+    if (pipelineId && !isLocalFile) {
+      // file is from LlamaCloud and was not ingested locally
+      // TODO trigger but don't await file download and just use convention to generate the URL (see Python code)

[Review comment, Collaborator] @thucpn see TODO

+      // return `${process.env.FILESERVER_URL_PREFIX}/output/llamacloud/${pipelineId}\$${fileName}`;
+      return await LLamaCloudFileService.getFileUrl(fileName, pipelineId);
+    }
+    const isPrivate = metadata["private"] === "true";
+    const folder = isPrivate ? "output/uploaded" : "data";
     return `${process.env.FILESERVER_URL_PREFIX}/${folder}/${fileName}`;
   }
-  return undefined;
+  // fallback to URL in metadata (e.g. for websites)
+  return metadata["URL"];
 }
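The resolution order in the new getNodeUrl can be summarised as: LlamaCloud file, then private upload, then public data file, then the URL stored in the metadata. The sketch below restates that order as a synchronous function that takes the prefix as an argument. It is an illustration under assumptions: the name resolveNodeUrl and the NodeMetadata type are hypothetical, and the LlamaCloud branch uses the URL convention from the commented-out line in the diff, whereas the merged code calls LLamaCloudFileService.getFileUrl.

```typescript
// Illustrative sketch only; not the merged implementation.
type NodeMetadata = Record<string, string | undefined>;

function resolveNodeUrl(
  metadata: NodeMetadata,
  urlPrefix: string | undefined,
): string | undefined {
  const fileName = metadata["file_name"];
  if (fileName && urlPrefix) {
    const isLocalFile = metadata["is_local_file"] === "true";
    const pipelineId = metadata["pipeline_id"];
    if (pipelineId && !isLocalFile) {
      // Assumed convention, copied from the commented-out line in the diff.
      return `${urlPrefix}/output/llamacloud/${pipelineId}\$${fileName}`;
    }
    // Strict comparison: only the string "true" marks a private upload.
    const folder = metadata["private"] === "true" ? "output/uploaded" : "data";
    return `${urlPrefix}/${folder}/${fileName}`;
  }
  // No file name or no file server: fall back to a URL stored in the metadata.
  return metadata["URL"];
}

// The prefix value is a made-up example.
const FILES = "http://localhost:8000/api/files";
console.log(resolveNodeUrl({ file_name: "a.pdf", private: "true" }, FILES));
// prints "http://localhost:8000/api/files/output/uploaded/a.pdf"
console.log(resolveNodeUrl({ file_name: "a.pdf", pipeline_id: "p1" }, FILES));
// prints "http://localhost:8000/api/files/output/llamacloud/p1$a.pdf"
console.log(resolveNodeUrl({ URL: "https://example.com/page" }, FILES));
// prints "https://example.com/page"
```

Compared with the removed version, a metadata URL is now a fallback rather than the first choice, and a private value of "false" no longer selects the upload folder, because the check is a string comparison instead of a truthiness test.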