
Commit 57bfc44

Authored by praveshkumar1988, kartikpersistent, prakriti-solankey, vasanthasaikalluri, and abhishekkumar-27
DEV to STAGING (#1022)
* fixed the re-rendering of the table while file status is processing
* fix: Read Only User Fix
* Global search fulltext (#767): added global search + vector + fulltext mode; added community details in chunk entities; added node ids; updated vector graph query; added entities and modified chat response; added params; api response changes; added chunk entity query; modified query; payload changes; added node-details properties; new payload changes; communities check; communities selection check
* Communities bug solutions (#770): added local chat history; added write access check; added write access param; label change for nodes; added fulltext creation; disabled the write and delete actions for read-only user mode; modified query; test updates; test updated; enable communities; removed the selected prop; Read Only User Support (#766) — added local chat history, write access check and param, fulltext creation, disabled the write and delete actions for read-only user mode, modified query; storing the gds status and write access on refresh; enable communities label change (co-authored by vasanthasaikalluri, kartikpersistent, abhishekkumar-27)
* readonly fixed on refresh
* clear chat history
* selectedFiles check for Chatbot
* clear history
* Added elapsed time for extraction on each breakdown function
* lint and format fixes
* removed dev logs
* communities fix
* disabled the generate graph for read-only user
* format fixes
* graph labels change
* added the read-only check for already added waiting files
* Retriever evaluation using RAGAS
* deleted unused file
* code optimization using memo
* Added elapsed_time on each api and getting time per entity
* Added the post-processing alert showcasing the ongoing post-processing jobs
* fix: readonly user retry option disable
* update script to get details of extracted doc
* Issue fixed: latency count per entity
* Multiple chat modes selection (#780): added multi-mode selection; multi-mode state management; fix: state handling of chat details of the default mode; added the ChatModeSwitch component; mode-switch state management; added the chat-mode switch in both views; removed the copied text; handled the error scenario; fix: speech issue between modes; fix: handled active speech and other-message mode switches; used requestAnimationFrame instead of setTimeout; removed the commented code; fix: chat-mode deselection on file selection; fix: order of the chat modes according to selected chat modes
* Community optimization (#790): modified Leiden parameters; updated disconnected nodes query; excluded communities from dedup; added index creation; modified dedup query; added delete query for communities
* Async way to create entities from multiple chunks (#788): LLMs with latest langchain dev libraries; conflict resolved; all llm models with latest library changes; async way to get graph documents; indentation correction
* fixed graph mode error (#792)
* RAGAS Evaluation Metrics (#787): added multi-mode selection; ragas eval; added response; multi-mode state management; fix: state handling of chat details of the default mode; added the ChatModeSwitch component; mode-switch state management; added the chat-mode switch in both views; removed the copied text; handled the error scenario; fix: speech issue between modes; ragas evaluation metric show; output return type changed; fix: handled active speech and other-message mode switches; used requestAnimationFrame instead of setTimeout; removed the commented code; added ragas to requirements; integrated the metric api; ragas response updated, llm list updated; resolved syntax error in score; added the metrics table; fix: long-text UI issue; code optimization for evaluation; added the download button for downloading the info; key name change; optimized the downloadClickHandler (co-authored by vasanthasaikalluri, kaustubh-darekar, a-s-poorna)
* Openai gemini config (#794): openai and gemini models as config backend; updated dropdown llm values; updated docs
* Added the user action for metrics table
* Graph enhancements (#795): graph changes; graph properties changes; graph communities changes; graph type selection; checkbox check changes; format changes
* Communities Bug fixes (#775): added global search + vector + fulltext mode; added community details in chunk entities; added node ids; updated vector graph query; added entities and modified chat response; added params; api response changes; added chunk entity query; modified query; label change for nodes; payload changes; added node-details properties; new payload changes; communities check; communities selection check; enable communities; removed the selected prop; enable communities label change; communities name change; cred check; tooltip; fix: copy icon theme fix (co-authored by vasanthasaikalluri, kartikpersistent)
* llm name changes
* build fix
* default mode fix
* ragas model names update
* lint fixes
* Chunk Entities API condition
* added the tooltip for unsupported llms for ragas metric loading
* removed unused imports
* multimode fix when we get error response
* mode changes for score display
* fix: fixed the details state handling between multiple chats; feature: added a warning banner if the selected llm model is not supported for RAGAS evaluation
* fix: entity mode width fix
* diffbot fix for async (#797)
* Minor changes (#798): added config variable for default diffbot chat model; fulltext index creation is skipped when the labels are empty; entity vector change; added optional to communities for entity mode; updated the entity query (co-authored by kartikpersistent)
* New: added the supported llm models for ragas evaluation
* Fix: Communities tab is displayed based on communities length
* added the conversation download button (#800)
* model name correction
* chat-mode switch fix
* Add API payload GCP logging (#805)
* Adding links to get neighboring nodes (#796): addition of link; added neighbours query; implemented with driver; updated the query; communitiesInfo name change; communities.tsx removed; api integration; modified response; entities change; chunk and communities; chunk space removal; added element id to chunks; loading on click; format changes; added file name for Document node; chat token cut-off model name update; icon change; duplicate sources removal; entity change (co-authored by vasanthasaikalluri)
* added error message for doc retriever (#807)
* copy row (#803): copy row; column for copy; column copy
* RAGAS Evaluation For Multi Modes (#806): updated models for ragas eval; context utilization metrics removed; updated supported llms for ragas; removed context utilization; implemented parallel API; multi api calls error resolved; multi-mode metrics; fix: metric evaluation for single mode; multi-mode ragas evaluation; api payload changes; metric api output format changed; multi-mode ragas changes; removed pre-process dataset; api response changes; multi-mode metrics api integration; NaN error for no answer resolved; QA integration changes (co-authored by kaustubh-darekar)
* lint fixes
* fix: multimode metrics state handling; fix: lint fixes
* fix: multimode metrics mode-change state issue; fix: chunk list style issue
* fix: list style fix
* corrected typo
* added new env for ragas embedding model
* Props name changes (#811): props name changes; removed the access token from row on copy action; props changes for dropzone component; graph view changes (co-authored by Prakriti Solankey)
* test
* view graph
* nodes count and relationship count update fix
* sourceUrl fix
* empty string "" fix — to keep the default values we should keep the value blank instead of ""
* prop changes
* props changes
* retry condition update for failed files (#820)
* Chat modes name changes (#815): props name changes; removed the access token from row on copy action; updated chat mode names; chat modes name changes; lint fixes; using readable format in UI; removal of size to avoid console warning; key add (co-authored by vasanthasaikalluri, Prakriti Solankey)
* Youtube transcript fix with proxy (#822)
* update script for async func
* ragas changes for graph retrieval mode; context added in api output (#825)
* Remove extract latency from logging and add LIMIT in duplicate nodes
* Document updates (#828): document updated with ragas evaluation information; formatting changes; chatbot api documentation updated; api details added in document; function name changed for drop-create vector index api
* Update README.md
* updated api structure in docs (#827)
* Update backend_docs.adoc
* 821 llm model listing (#823): added logic for document filters; LLM models; message change; link added; removed the text (co-authored by vasanthasaikalluri)
* Exclude session label node from duplicate nodes list
* Added the tooltip for disabled llm option (#835)
* node size changes
* mode removal of rows check
* formatting
* Exclude __Entity__ node label from duplicate node list
* Update README.md
* Update README.md
* Update README.md
* fixed the youtube link
* Security header and GZipMiddleware (#847): added security header to all APIs; added GZipMiddleware
* Chunk Text Details (#850): community title added; added api for fetching chunk text details; output format changed for chunk text; integrated the service layer for chunk data; added the chunks; formatting output of llm call for title generation; formatting llm output for title generation; added flex row; changes related to pagination of fetch-chunk api; integrated the pagination; page-change error resolved for fetch-chunk api; for get-neighbours api, community title added in properties; moved community-title-related changes to a separate branch; removed Query module from fastapi import statement; icon changes (co-authored by kartikpersistent)
* Communities Id to Title (#851)
* Staging to main (#735):
  * Dev (#537): format fixes and graph schema indication fix; Update README.md; added chat modes variable in env and updated the readme; spell fix; added the chat mode in env table; added the logos; fixed the overflow issues; removed the extra fix; fixed specific scenario "when the text from schema closes it should reopen the previous modal"; readme changes; removed dev console logs; added new retrieval query (#533); format fixes and tab rendering fix; fixed the setting modal reopen issue (co-authored by Prakriti Solankey, vasanthasaikalluri)
  * disabled the submit button on loading
  * Deduplication tab (#566): de-duplication API; update de-duplicate query; created the Deduplication tab; added the API service; added the removable tags for similar nodes in deduplication tab; integrate tag; added GraphLabel; added loader state; added the merge service; integrated the merge API; merge query issue fixed; auto-refresh the duplicate nodes after merging operation; added the description for de-duplication; reset on merging (co-authored by Pravesh Kumar)
  * Update frontend_docs.adoc (#538): doc update; images folder change; test image; image change; added the Graph Mode screenshot; added the Query screenshot; conflict fixes (co-authored by aashipandya, kartikpersistent)
  * updated langchain versions (#565)
  * Update the De-Duplication query
  * Node relationship id type none issue (#547): de-duplication API; update de-duplicate query; issue fixed: nodes, relationship id and type none or blank; added the tooltips; type fix; removed unnecessary import
  * added score threshold and added some error handling (#571)
  * Update requirements.txt
  * Tooltip and other UI fixes (#572)
  * Staging To Main (#495):
    * Integration_qa test (#375): Test IntegrationQA added; update test cases; update node count assertions; test changes; modification test; code refactor test cases; handle allowed-list issue in test; test case execution; test chatbot updates; test case update file; added file (co-authored by Pravesh Kumar)
    * recent merges
    * pdf deletion due to out of disk space
    * fixed status blank issue
    * rendering the file name instead of link for gcs and s3 sources in the info modal
    * convert is_cancelled value from string to bool
    * added the default page size
    * issue fixed: processed chunks shown as 0 when a file is re-processed
    * Youtube timestamps (#386): Wikipedia source to accept all valid urls; wikipedia url to support multiple languages; integrated wiki language param for extract api; youtube video timestamps (co-authored by kartikpersistent)
    * groq llm integration backend (#286): groq llm integration backend; groq and description in node properties; added groq in options (co-authored by kartikpersistent)
    * offset in chunks (#389)
    * page number in gcs loader (#393)
    * added youtube timestamps (#392)
    * chat pop-up button (#387): expand; minimize icon; css changes; chat history; chatbot wider side nav; expand icon; chatbot UI; delete; merge fixes; code suggestions (co-authored by kartikpersistent)
    * chunks create before extraction using is_pre_process variable (#383): return total pages for model; update requirement.txt; total pages on upload API; added the confirmation dialog; added the selected files into the confirmation modal; format and lint fixes; added the stopwatch image; file selection on alert dialog; add timeout in docker for gunicorn workers
    * Add cancel icon to info popup (#384): info modal changes; css changes; recent merges; save total pages in DB; added total pages; file selection when we didn't select anything from main table; added the danger icon only for large files; added the overflow for more files and file selection for all new files; moved the interface to types; added the icon according to the source; set total page for wiki and youtube; h3 heading; merge; updated the alert on basis of total pages; deleted chunks; polling based on total pages; isNaN check; large file based on file size for s3 and gcs; file source in server-side event; time calculation based on chunks for gcs and s3 (co-authored by kartikpersistent, Prakriti Solankey, abhishekkumar-27, aashipandya)
    * fixed the layout issue
    * Populate graph schema (#399): create new endpoint populate_graph_schema and update the query for getting labels from DB; added main.py changes
    * conditionally including the gcs login flow in gcs as source (#396): added the condition; removed llms
    * fixed issue: remove extra unused param
    * get emb only if used (#278)
    * Chatbot chunks (#402): added file name to the content sent to LLM; added chunk text in the response; increased the doc parts sent to llm; modified graph query; markdown rendering; youtube start time; icons; offset changes; removed the files due to codespace space issue (co-authored by vasanthasaikalluri, kartikpersistent)
    * Settings modal to support generating the labels from the llm by using text given by user (#405): added the json; added schema-from-text dialog; integrated the schema API; added the alert; resize fixes; fixed css issue
    * fixed status blank issue
    * modified response when no docs are retrieved (#413)
    * Fixed env/docker-compose for local deployments + README doc (#410): wrong place for ENV in README; by default removed langsmith and fixed knn score string-to-float; fixed strings in docker-compose env; added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop); added the missed TIME_PER_PAGE env that was causing a NaN issue in the approximate processing-time notification
    * Support for all unstructured files (#401): all unstructured files; responsiveness; added file type; added the extensions; spelling fix; ppt file changes (co-authored by kartikpersistent)
    * Settings modal to support generating the labels from the llm by using text given by user, with checkbox (#415): added the json; added schema-from-text dialog; integrated the schema API; added the alert; resize fixes; extract schema using direct ChatOpenAI API and chain; integrated the checkbox for the schema-to-text dialog; update SettingModal.tsx (co-authored by Pravesh Kumar)
    * gcs file content read via storage client (#417): added the access token to the file state (co-authored by kartikpersistent)
    * pypdf2 to read files from gcs (#420)
    * 407 remove driver from frontend (#416): removed driver; removed API; connecting to database on page refresh (co-authored by kartikpersistent)
    * Css handling of info modal and tooltips (#418): css changes; tooltips; sidebar tooltips; copy to clip; added image types; added gcs; type fix; docker changes; speech; added the tooltip for dropzone sources (co-authored by kartikpersistent)
    * Fixed retrieval bugs (#421): yarn format fixes; changed the delete message; added the cancel button; changed the message on tooltip; added space; UI fixes; tooltip for setting; updated req
    * wikipedia URL input (#424): accept only wikipedia links; added wikilink regex; wikipedia single url only; changed the alert message; wording change; fixed validation-state persist error (co-authored by aashipandya)
    * speech and copy (#422): speech and copy; startTime; added chunk properties; tooltips (co-authored by vasanthasaikalluri, kartikpersistent)
    * fixed issue for out of range in KNN API
    * solved conflicts
    * remove logging info from update KNN API
    * tooltip changes
    * format and lint fixes
    * responsiveness changes
    * fixed issue for total pages GCS, S3
    * UI polishing (#428): button and tooltip changes; checking validation on change; settings module populate fix; format fixes; opening the modal after auth success; removed the limit; added the scrollbar for dropdowns
    * speech state (#426): speech state; button details changes; delete wording change
    * Total pages in buckets (#431): page number N/A for buckets; added N/A for gcs and s3 pages; total pages for gcs; removed unwanted logger (co-authored by kartikpersistent)
    * removed the max width
    * Update FileTable.tsx
    * Update the docker file
    * Modified prompt (#438)
    * Update Dockerfile
    * rendering fix
    * Local file upload gcs (#442): upload file to GCS; fixed GCS local upload issue and delete file from GCS after processing, failure, or cancellation; add lifecycle rule on upload bucket; pdf upload local and gcs bucket check; delete files when processed, and extract changes (co-authored by Pravesh Kumar)
    * Modified chat length and entities used (#443)
    * metadata for unstructured files (#446)
    * Unstructured file metadata (#447): metadata for unstructured files; sleep in gcs upload; updated
    * icons added to chunks (#435): icons added to chunks; info modal icons
  * Dev (#433) (co-authored by abhishekkumar-27, Pravesh Kumar, kartikpersistent, vasanthasaikalluri, Prakriti Solankey, Ajay Meena, Morgan Senechal, karanchellani)
  * fixed gcs status message issue
  * added if check for failed count
  * null issue fixed from backend for upload API and graph_document when model name mismatches
  * added word-break fix
  * Added neo4j-rust-ext
  * processing time estimation based on bytes
  * file extension upper case fixed; file delete from GCS or local based on env variable
  * timer per byte
  * Update Dockerfile
  * Adding sort rows on the table (#451)
  * Gcs upload folder hashed (#453): implement folder name hashing in GCS bucket upload; raise exception if invalid model selected; folder name for gcs upload (co-authored by aashipandya)
  * upload all unstructured files to gcs (#455)
  * Modified chunk query (#454)
  * Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice on your system and try again. Install instructions: https://www.libreoffice.org/get-help/install-howto/ - Mac: https://formulae.brew.sh/cask/libreoffice - Debian: https://wiki.debian.org/LibreOffice"
  * Fix the PARTIAL CONTENT issue
  * File-table no data found (#456): 'file-table'; review comment
  * Llm format change (#459): changed the llm models format to lowercase; added the error message; llm model changes; format fixes; removed unused import; added the capitalize method; delete files from merged_file_path only if source is local file (co-authored by aashipandya)
  * commented total page code (#460)
  * format fixes
  * removed the disabled check on dropdown
  * Large file env
  * DEV to STAGING (#461) (co-authored by abhishekkumar-27, kartikpersistent, aashipandya, vasanthasaikalluri, Prakriti Solankey, Ajay Meena, Morgan Senechal, karanchellani)
  * DEV to STAGING (#462) (co-authored by abhishekkumar-27, kartikpersistent, aashipandya, vasanthasaikalluri, Prakriti Solankey, Ajay Meena, Morgan Senechal, karanchellani)
  * added upload api
  * changed the dropzone error message
  * Dev to staging (#466) …
1 parent 8b169f5 commit 57bfc44


77 files changed: +832 additions, −602 deletions

Diff for: README.md (+3)

@@ -149,6 +149,9 @@ Allow unauthenticated request : Yes
 | VITE_GOOGLE_CLIENT_ID | Optional | | Client ID for Google authentication |
 | VITE_LLM_MODELS_PROD | Optional | openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash | To Distinguish models based on the Enviornment PROD or DEV
 | VITE_LLM_MODELS | Optional | 'diffbot,openai_gpt_3.5,openai_gpt_4o,openai_gpt_4o_mini,gemini_1.5_pro,gemini_1.5_flash,azure_ai_gpt_35,azure_ai_gpt_4o,ollama_llama3,groq_llama3_70b,anthropic_claude_3_5_sonnet' | Supported Models For the application
+| VITE_AUTH0_CLIENT_ID | Mandatory if you are enabling Authentication otherwise it is optional | | Okta OAuth Client ID for authentication
+| VITE_AUTH0_DOMAIN | Mandatory if you are enabling Authentication otherwise it is optional | | Okta OAuth Client Domain
+| VITE_SKIP_AUTH | Optional | true | Flag to skip the authentication
 
 ## LLMs Supported
 1. OpenAI

Diff for: backend/example.env

+5-1

@@ -44,4 +44,8 @@ LLM_MODEL_CONFIG_ollama_llama3="model_name,model_local_url"
 YOUTUBE_TRANSCRIPT_PROXY="https://user:pass@domain:port"
 EFFECTIVE_SEARCH_RATIO=5
 GRAPH_CLEANUP_MODEL="openai_gpt_4o"
-CHUNKS_TO_BE_PROCESSED="50"
+CHUNKS_TO_BE_CREATED="50"
+BEDROCK_EMBEDDING_MODEL="model_name,aws_access_key,aws_secret_key,region_name" #model_name="amazon.titan-embed-text-v1"
+LLM_MODEL_CONFIG_bedrock_nova_micro_v1="model_name,aws_access_key,aws_secret_key,region_name" #model_name="amazon.nova-micro-v1:0"
+LLM_MODEL_CONFIG_bedrock_nova_lite_v1="model_name,aws_access_key,aws_secret_key,region_name" #model_name="amazon.nova-lite-v1:0"
+LLM_MODEL_CONFIG_bedrock_nova_pro_v1="model_name,aws_access_key,aws_secret_key,region_name" #model_name="amazon.nova-pro-v1:0"
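
All four Bedrock entries share the comma-separated layout model_name,aws_access_key,aws_secret_key,region_name. A minimal sketch of splitting such a value into its four parts; the sample value and variable names here are illustrative, not from the commit:

import os

# Hypothetical sample value following the documented four-field format.
os.environ["BEDROCK_EMBEDDING_MODEL"] = "amazon.titan-embed-text-v1,AKIA...,SECRET...,us-east-1"

model_name, aws_access_key, aws_secret_key, region_name = (
    part.strip() for part in os.environ["BEDROCK_EMBEDDING_MODEL"].split(",")
)
print(model_name, region_name)  # amazon.titan-embed-text-v1 us-east-1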

Diff for: backend/score.py

+72-60
Large diffs are not rendered by default.

Diff for: backend/src/create_chunks.py

+12-6

@@ -4,6 +4,7 @@
 import logging
 from src.document_sources.youtube import get_chunks_with_timestamps, get_calculated_timestamps
 import re
+import os
 
 logging.basicConfig(format="%(asctime)s - %(message)s", level="INFO")
 
@@ -25,23 +26,28 @@ def split_file_into_chunks(self):
         """
         logging.info("Split file into smaller chunks")
         text_splitter = TokenTextSplitter(chunk_size=200, chunk_overlap=20)
+        chunk_to_be_created = int(os.environ.get('CHUNKS_TO_BE_CREATED', '50'))
         if 'page' in self.pages[0].metadata:
             chunks = []
             for i, document in enumerate(self.pages):
                 page_number = i + 1
-                for chunk in text_splitter.split_documents([document]):
-                    chunks.append(Document(page_content=chunk.page_content, metadata={'page_number':page_number}))
+                if len(chunks) >= chunk_to_be_created:
+                    break
+                else:
+                    for chunk in text_splitter.split_documents([document]):
+                        chunks.append(Document(page_content=chunk.page_content, metadata={'page_number':page_number}))
 
         elif 'length' in self.pages[0].metadata:
             if len(self.pages) == 1 or (len(self.pages) > 1 and self.pages[1].page_content.strip() == ''):
                 match = re.search(r'(?:v=)([0-9A-Za-z_-]{11})\s*',self.pages[0].metadata['source'])
                 youtube_id=match.group(1)
                 chunks_without_time_range = text_splitter.split_documents([self.pages[0]])
-                chunks = get_calculated_timestamps(chunks_without_time_range, youtube_id)
-
+                chunks = get_calculated_timestamps(chunks_without_time_range[:chunk_to_be_created], youtube_id)
             else:
-                chunks_without_time_range = text_splitter.split_documents(self.pages)
-                chunks = get_chunks_with_timestamps(chunks_without_time_range)
+                chunks_without_time_range = text_splitter.split_documents(self.pages)
+                chunks = get_chunks_with_timestamps(chunks_without_time_range[:chunk_to_be_created])
         else:
             chunks = text_splitter.split_documents(self.pages)
+
+        chunks = chunks[:chunk_to_be_created]
         return chunks
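
The new CHUNKS_TO_BE_CREATED cap both short-circuits the per-page loop and trims the final list. A standalone sketch of that control flow, with a stub splitter standing in for TokenTextSplitter (the pages and chunk contents are made up):

import os

os.environ["CHUNKS_TO_BE_CREATED"] = "3"
chunk_to_be_created = int(os.environ.get("CHUNKS_TO_BE_CREATED", "50"))

# Stub: pretend each page splits into two chunks.
pages = [f"page-{i}" for i in range(5)]
chunks = []
for page in pages:
    if len(chunks) >= chunk_to_be_created:
        break  # stop splitting once the cap is reached
    chunks.extend([f"{page}-a", f"{page}-b"])

chunks = chunks[:chunk_to_be_created]  # final trim, as in split_file_into_chunks
print(chunks)  # ['page-0-a', 'page-0-b', 'page-1-a']

Note that, as in the real method, the per-page check can overshoot the cap within one page; the final slice is what enforces the exact limit.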

Diff for: backend/src/graphDB_dataAccess.py

+27-1

@@ -535,4 +535,30 @@ def update_node_relationship_count(self,document_name):
             "nodeCount" : nodeCount,
             "relationshipCount" : relationshipCount
         }
-        return response
+        return response
+
+    def get_nodelabels_relationships(self):
+        node_query = """
+            CALL db.labels() YIELD label
+            WITH label
+            WHERE NOT label IN ['Document', 'Chunk', '_Bloom_Perspective_', '__Community__', '__Entity__']
+            CALL apoc.cypher.run("MATCH (n:`" + label + "`) RETURN count(n) AS count",{}) YIELD value
+            WHERE value.count > 0
+            RETURN label order by label
+            """
+
+        relation_query = """
+            CALL db.relationshipTypes() yield relationshipType
+            WHERE NOT relationshipType IN ['PART_OF', 'NEXT_CHUNK', 'HAS_ENTITY', '_Bloom_Perspective_','FIRST_CHUNK','SIMILAR','IN_COMMUNITY','PARENT_COMMUNITY']
+            return relationshipType order by relationshipType
+            """
+
+        try:
+            node_result = self.execute_query(node_query)
+            node_labels = [record["label"] for record in node_result]
+            relationship_result = self.execute_query(relation_query)
+            relationship_types = [record["relationshipType"] for record in relationship_result]
+            return node_labels, relationship_types
+        except Exception as e:
+            logging.error(f"Error in getting node labels/relationship types from db: {e}")
+            return [], []  # return a pair so callers can always unpack two lists
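
Callers are expected to unpack two lists from get_nodelabels_relationships (the consolidation code in backend/src/post_processing.py below does exactly that), which is why the error path returns an empty pair. A usage sketch, assuming a Neo4j connection via LangChain's Neo4jGraph wrapper and placeholder credentials (the exact import path depends on the installed LangChain version):

from langchain_neo4j import Neo4jGraph  # or langchain_community.graphs in older versions
from src.graphDB_dataAccess import graphDBdataAccess

# Placeholder connection details.
graph = Neo4jGraph(url="neo4j://localhost:7687", username="neo4j", password="password")

graphDb_data_Access = graphDBdataAccess(graph)
node_labels, relation_labels = graphDb_data_Access.get_nodelabels_relationships()
print(f"{len(node_labels)} node labels, {len(relation_labels)} relationship types")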

Diff for: backend/src/llm.py

+1-1

@@ -89,7 +89,7 @@ def get_llm(model: str):
         )
 
         llm = ChatBedrock(
-            client=bedrock_client, model_id=model_name, model_kwargs=dict(temperature=0)
+            client=bedrock_client, region_name=region_name, model_id=model_name, model_kwargs=dict(temperature=0)
         )
 
     elif "ollama" in model:

Diff for: backend/src/main.py

+1-2

@@ -361,7 +361,6 @@ async def processing_source(uri, userName, password, database, model, file_name,
 
     logging.info('Update the status as Processing')
     update_graph_chunk_processed = int(os.environ.get('UPDATE_GRAPH_CHUNKS_PROCESSED'))
-    chunk_to_be_processed = int(os.environ.get('CHUNKS_TO_BE_PROCESSED', '50'))
     # selected_chunks = []
     is_cancelled_status = False
     job_status = "Completed"
@@ -676,7 +675,7 @@ def get_labels_and_relationtypes(graph):
     query = """
             RETURN collect {
             CALL db.labels() yield label
-            WHERE NOT label IN ['Chunk','_Bloom_Perspective_', '__Community__', '__Entity__']
+            WHERE NOT label IN ['Document','Chunk','_Bloom_Perspective_', '__Community__', '__Entity__']
            return label order by label limit 100 } as labels,
            collect {
            CALL db.relationshipTypes() yield relationshipType as type

Diff for: backend/src/post_processing.py

+26-40

@@ -8,7 +8,8 @@
 from langchain_core.prompts import ChatPromptTemplate
 from src.shared.constants import GRAPH_CLEANUP_PROMPT
 from src.llm import get_llm
-from src.main import get_labels_and_relationtypes
+from src.graphDB_dataAccess import graphDBdataAccess
+import time
 
 DROP_INDEX_QUERY = "DROP INDEX entities IF EXISTS;"
 LABELS_QUERY = "CALL db.labels()"
@@ -195,50 +196,35 @@ def update_embeddings(rows, graph):
     return graph.query(query,params={'rows':rows})
 
 def graph_schema_consolidation(graph):
-    nodes_and_relations = get_labels_and_relationtypes(graph)
-    logging.info(f"nodes_and_relations in existing graph : {nodes_and_relations}")
-    node_labels = []
-    relation_labels = []
-
-    node_labels.extend(nodes_and_relations[0]['labels'])
-    relation_labels.extend(nodes_and_relations[0]['relationshipTypes'])
-
+    graphDb_data_Access = graphDBdataAccess(graph)
+    node_labels, relation_labels = graphDb_data_Access.get_nodelabels_relationships()
     parser = JsonOutputParser()
-    prompt = ChatPromptTemplate(messages=[("system",GRAPH_CLEANUP_PROMPT),("human", "{input}")],
-                                partial_variables={"format_instructions": parser.get_format_instructions()})
-
-    graph_cleanup_model = os.getenv("GRAPH_CLEANUP_MODEL",'openai_gpt_4o')
+    prompt = ChatPromptTemplate(
+        messages=[("system", GRAPH_CLEANUP_PROMPT), ("human", "{input}")],
+        partial_variables={"format_instructions": parser.get_format_instructions()}
+    )
+    graph_cleanup_model = os.getenv("GRAPH_CLEANUP_MODEL", 'openai_gpt_4o')
     llm, _ = get_llm(graph_cleanup_model)
     chain = prompt | llm | parser
-    nodes_dict = chain.invoke({'input':node_labels})
-    relation_dict = chain.invoke({'input':relation_labels})
-
-    node_match = {}
-    relation_match = {}
-    for new_label , values in nodes_dict.items() :
-        for old_label in values:
-            if new_label != old_label:
-                node_match[old_label]=new_label
-
-    for new_label , values in relation_dict.items() :
-        for old_label in values:
-            if new_label != old_label:
-                relation_match[old_label]=new_label
 
-    logging.info(f"updated node labels : {node_match}")
-    logging.info(f"updated relationship labels : {relation_match}")
-
-    # Update node labels in graph
-    for old_label, new_label in node_match.items():
-        query = f"""
-                MATCH (n:`{old_label}`)
-                SET n:`{new_label}`
-                REMOVE n:`{old_label}`
-                """
-        graph.query(query)
+    nodes_relations_input = {'nodes': node_labels, 'relationships': relation_labels}
+    mappings = chain.invoke({'input': nodes_relations_input})
+    node_mapping = {old: new for new, old_list in mappings['nodes'].items() for old in old_list if new != old}
+    relation_mapping = {old: new for new, old_list in mappings['relationships'].items() for old in old_list if new != old}
+
+    logging.info(f"Node Labels: Total = {len(node_labels)}, Reduced to = {len(set(node_mapping.values()))} (from {len(node_mapping)})")
+    logging.info(f"Relationship Types: Total = {len(relation_labels)}, Reduced to = {len(set(relation_mapping.values()))} (from {len(relation_mapping)})")
+
+    if node_mapping:
+        for old_label, new_label in node_mapping.items():
+            query = f"""
+                    MATCH (n:`{old_label}`)
+                    SET n:`{new_label}`
+                    REMOVE n:`{old_label}`
+                    """
+            graph.query(query)
 
-    # Update relation types in graph
-    for old_label, new_label in relation_match.items():
+    for old_label, new_label in relation_mapping.items():
         query = f"""
         MATCH (n)-[r:`{old_label}`]->(m)
         CREATE (n)-[r2:`{new_label}`]->(m)
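
The two dict comprehensions invert the LLM's category-to-members mapping into an old-to-new rename map, dropping identity entries so unchanged labels are never rewritten. A small standalone illustration with made-up labels:

mappings = {
    "nodes": {
        "Person": ["Person", "Human", "People"],
        "Product": ["Product"],
    }
}

# old label -> consolidated label; self-mappings such as Product -> Product are skipped
node_mapping = {
    old: new
    for new, old_list in mappings["nodes"].items()
    for old in old_list
    if new != old
}
print(node_mapping)  # {'Human': 'Person', 'People': 'Person'}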

Diff for: backend/src/shared/common_fn.py

+41-2

@@ -11,7 +11,8 @@
 import os
 from pathlib import Path
 from urllib.parse import urlparse
-
+import boto3
+from langchain_community.embeddings import BedrockEmbeddings
 
 def check_url_source(source_type, yt_url:str=None, wiki_query:str=None):
     language=''
@@ -77,6 +78,10 @@ def load_embedding_model(embedding_model_name: str):
         )
         dimension = 768
         logging.info(f"Embedding: Using Vertex AI Embeddings , Dimension:{dimension}")
+    elif embedding_model_name == "titan":
+        embeddings = get_bedrock_embeddings()
+        dimension = 1536
+        logging.info(f"Embedding: Using bedrock titan Embeddings , Dimension:{dimension}")
     else:
         embeddings = HuggingFaceEmbeddings(
             model_name="all-MiniLM-L6-v2"#, cache_folder="/embedding_model"
@@ -134,4 +139,38 @@ def last_url_segment(url):
     parsed_url = urlparse(url)
     path = parsed_url.path.strip("/")  # Remove leading and trailing slashes
     last_url_segment = path.split("/")[-1] if path else parsed_url.netloc.split(".")[0]
-    return last_url_segment
+    return last_url_segment
+
+def get_bedrock_embeddings():
+    """
+    Creates and returns a BedrockEmbeddings object configured from the
+    BEDROCK_EMBEDDING_MODEL environment variable.
+    Returns:
+        BedrockEmbeddings: An instance of the BedrockEmbeddings class.
+    """
+    try:
+        env_value = os.getenv("BEDROCK_EMBEDDING_MODEL")
+        if not env_value:
+            raise ValueError("Environment variable 'BEDROCK_EMBEDDING_MODEL' is not set.")
+        try:
+            model_name, aws_access_key, aws_secret_key, region_name = env_value.split(",")
+        except ValueError:
+            raise ValueError(
+                "Environment variable 'BEDROCK_EMBEDDING_MODEL' is improperly formatted. "
+                "Expected format: 'model_name,aws_access_key,aws_secret_key,region_name'."
+            )
+        bedrock_client = boto3.client(
+            service_name="bedrock-runtime",
+            region_name=region_name.strip(),
+            aws_access_key_id=aws_access_key.strip(),
+            aws_secret_access_key=aws_secret_key.strip(),
+        )
+        bedrock_embeddings = BedrockEmbeddings(
+            model_id=model_name.strip(),
+            client=bedrock_client
+        )
+        return bedrock_embeddings
+    except Exception as e:
+        logging.error(f"An unexpected error occurred: {e}")
+        raise
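
A usage sketch for the new "titan" branch, assuming load_embedding_model returns the (embeddings, dimension) pair its branches prepare and that BEDROCK_EMBEDDING_MODEL is set in the format documented in backend/example.env (the credentials below are placeholders):

import os
from src.shared.common_fn import load_embedding_model

# Placeholder credentials in the documented comma-separated format.
os.environ["BEDROCK_EMBEDDING_MODEL"] = "amazon.titan-embed-text-v1,AKIA...,SECRET...,us-east-1"

embeddings, dimension = load_embedding_model("titan")
vector = embeddings.embed_query("knowledge graph construction")  # standard LangChain embeddings API
print(dimension, len(vector))  # expected: 1536 1536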

Diff for: backend/src/shared/constants.py

+55-20

@@ -831,27 +831,62 @@
 DELETE_ENTITIES_AND_START_FROM_BEGINNING = "delete_entities_and_start_from_beginning"
 START_FROM_LAST_PROCESSED_POSITION = "start_from_last_processed_position"
 
-GRAPH_CLEANUP_PROMPT = """Please consolidate the following list of types into a smaller set of more general, semantically
-related types. The consolidated types must be drawn from the original list; do not introduce new types.
-Return a JSON object representing the mapping of original types to consolidated types. Every key is the consolidated type
-and value is list of the original types that were merged into the consolidated type. Prioritize using the most generic and
-repeated term when merging. If a type doesn't merge with any other type, it should still be included in the output,
-mapped to itself.
-
-**Input:** A list of strings representing the types to be consolidated. These types may represent either node
-labels or relationship labels. Your algorithm should do appropriate groupings based on semantic similarity.
-
-Example 1:
-Input:
-["Person", "Human", "People", "Company", "Organization", "Product"]
-Output:
-["Person": ["Person", "Human", "People"], "Organization": ["Company", "Organization"], "Product": ["Product"]]
-
-Example 2:
-Input:
-["CREATED_FOR", "CREATED_TO", "CREATED", "PLACE", "LOCATION", "VENUE"]
+GRAPH_CLEANUP_PROMPT = """
+You are tasked with organizing a list of types into semantic categories based on their meanings, including synonyms or morphological similarities. The input will include two separate lists: one for **Node Labels** and one for **Relationship Types**. Follow these rules strictly:
+### 1. Input Format
+The input will include two keys:
+- `nodes`: A list of node labels.
+- `relationships`: A list of relationship types.
+### 2. Grouping Rules
+- Group similar items into **semantic categories** based on their meaning or morphological similarities.
+- The name of each category must be chosen from the types in the input list (node labels or relationship types). **Do not create or infer new names for categories**.
+- Items that cannot be grouped must remain in their own category.
+### 3. Naming Rules
+- The category name must reflect the grouped items and must be an existing type in the input list.
+- Use a widely applicable type as the category name.
+- **Do not introduce new names or types** under any circumstances.
+### 4. Output Rules
+- Return the output as a JSON object with two keys:
+  - `nodes`: A dictionary where each key represents a category name for nodes, and its value is a list of original node labels in that category.
+  - `relationships`: A dictionary where each key represents a category name for relationships, and its value is a list of original relationship types in that category.
+- Every key and value must come from the provided input lists.
+### 5. Examples
+#### Example 1:
+Input:
+{{
+  "nodes": ["Person", "Human", "People", "Company", "Organization", "Product"],
+  "relationships": ["CREATED_FOR", "CREATED_TO", "CREATED", "PUBLISHED", "PUBLISHED_BY", "PUBLISHED_IN", "PUBLISHED_ON"]
+}}
+Output in JSON:
+{{
+  "nodes": {{
+    "Person": ["Person", "Human", "People"],
+    "Organization": ["Company", "Organization"],
+    "Product": ["Product"]
+  }},
+  "relationships": {{
+    "CREATED": ["CREATED_FOR", "CREATED_TO", "CREATED"],
+    "PUBLISHED": ["PUBLISHED", "PUBLISHED_BY", "PUBLISHED_IN", "PUBLISHED_ON"]
+  }}
+}}
+#### Example 2: Avoid redundant or incorrect grouping
+Input:
+{{
+  "nodes": ["Process", "Process_Step", "Step", "Procedure", "Method", "Natural Process", "Step"],
+  "relationships": ["USED_FOR", "USED_BY", "USED_WITH", "USED_IN"]
+}}
 Output:
-["CREATED": ["CREATED_FOR", "CREATED_TO", "CREATED"], "PLACE": ["PLACE", "LOCATION", "VENUE"]]
+{{
+  "nodes": {{
+    "Process": ["Process", "Process_Step", "Step", "Procedure", "Method", "Natural Process"]
+  }},
+  "relationships": {{
+    "USED": ["USED_FOR", "USED_BY", "USED_WITH", "USED_IN"]
+  }}
+}}
+### 6. Key Rule
+If any item cannot be grouped, it must remain in its own category using its original name. Do not repeat values or create incorrect mappings.
+Use these rules to group and name categories accurately without introducing errors or new types.
 """
 
 ADDITIONAL_INSTRUCTIONS = """Your goal is to identify and categorize entities while ensuring that specific data
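
The doubled braces ({{ and }}) in the JSON examples above are deliberate: GRAPH_CLEANUP_PROMPT is rendered through ChatPromptTemplate, whose f-string templating treats single braces as input variables, so literal braces must be escaped. A minimal standalone sketch of that behaviour (not the repo's code):

from langchain_core.prompts import ChatPromptTemplate

template = 'Return JSON like {{"nodes": {{}}}} for this input: {input}'
prompt = ChatPromptTemplate.from_messages([("system", template)])
message = prompt.format_messages(input="[...]")[0]
print(message.content)
# Return JSON like {"nodes": {}} for this input: [...]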

Diff for: docker-compose.yml

+3

@@ -63,6 +63,9 @@ services:
         - VITE_BATCH_SIZE=${VITE_BATCH_SIZE-2}
         - VITE_LLM_MODELS=${VITE_LLM_MODELS-}
         - VITE_LLM_MODELS_PROD=${VITE_LLM_MODELS_PROD-openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash}
+        - VITE_AUTH0_DOMAIN=${VITE_AUTH0_DOMAIN-}
+        - VITE_AUTH0_CLIENT_ID=${VITE_AUTH0_CLIENT_ID-}
+        - VITE_SKIP_AUTH=${VITE_SKIP_AUTH-true}
         - DEPLOYMENT_ENV=local
     volumes:
       - ./frontend:/app

Diff for: frontend/Dockerfile

+6

@@ -13,6 +13,9 @@ ARG VITE_CHAT_MODES=""
 ARG VITE_ENV="DEV"
 ARG VITE_BATCH_SIZE=2
 ARG VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
+ARG VITE_AUTH0_CLIENT_ID=""
+ARG VITE_AUTH0_DOMAIN=""
+ARG VITE_SKIP_AUTH="false"
 
 WORKDIR /app
 COPY package.json yarn.lock ./
@@ -30,6 +33,9 @@ RUN VITE_BACKEND_API_URL=$VITE_BACKEND_API_URL \
     VITE_BATCH_SIZE=$VITE_BATCH_SIZE \
     VITE_LLM_MODELS=$VITE_LLM_MODELS \
     VITE_LLM_MODELS_PROD=$VITE_LLM_MODELS_PROD \
+    VITE_AUTH0_CLIENT_ID=$VITE_AUTH0_CLIENT_ID \
+    VITE_AUTH0_DOMAIN=$VITE_AUTH0_DOMAIN \
+    VITE_SKIP_AUTH=$VITE_SKIP_AUTH \
     yarn run build
 
 # Step 2: Serve the application using Nginx

Diff for: frontend/example.env

+3

@@ -12,3 +12,6 @@ VITE_BATCH_SIZE=2
 VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
 VITE_FRONTEND_HOSTNAME="localhost:8080"
 VITE_SEGMENT_API_URL=""
+VITE_AUTH0_CLIENT_ID=""
+VITE_AUTH0_DOMAIN=""
+VITE_SKIP_AUTH=true

Diff for: frontend/package.json

+1

@@ -11,6 +11,7 @@
     "preview": "vite preview"
   },
   "dependencies": {
+    "@auth0/auth0-react": "^2.2.4",
     "@emotion/styled": "^11.11.0",
     "@mui/material": "^5.15.10",
     "@mui/styled-engine": "^5.15.9",
