
Additional metrics using ground truth #855


Merged: 47 commits, Nov 15, 2024

Commits
1dfac56
Updating ragas metrics
a-s-poorna Oct 30, 2024
10ed123
added the service for additional metrics
kartikpersistent Oct 30, 2024
c79ce37
additional metrics api
kartikpersistent Oct 30, 2024
3bd2dce
Adding Rouge to requirement
a-s-poorna Nov 4, 2024
a1f7d9d
changes done for additional metrics for gemini model
kaustubh-darekar Nov 5, 2024
952e426
Additional metrics changes related to gemini model
kaustubh-darekar Nov 5, 2024
8a1af97
Adding Rouge_Score Version
a-s-poorna Nov 6, 2024
ab180cd
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Nov 7, 2024
8a04328
Api Integration
kartikpersistent Nov 7, 2024
06006bc
payload changes
kartikpersistent Nov 7, 2024
8589013
payload fix
kartikpersistent Nov 8, 2024
ba18bdc
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Nov 8, 2024
c2767b7
Fixing Eval Error
a-s-poorna Nov 8, 2024
fd9c12e
Adding fact_score metric
a-s-poorna Nov 8, 2024
015b5e0
code refactoring
kartikpersistent Nov 8, 2024
727af4e
table integration
kartikpersistent Nov 8, 2024
5c5fe4b
data binding
kartikpersistent Nov 8, 2024
640c348
Integrated additional metrics on multimodes
kartikpersistent Nov 11, 2024
5b79a47
removed fact score
kartikpersistent Nov 11, 2024
84c7207
Removing Fact Score
a-s-poorna Nov 11, 2024
79ab316
fix: Multimode fix
kartikpersistent Nov 11, 2024
efc76d4
Merge branch 'ragas_metric_addition' of https://github.com/neo4j-labs…
kartikpersistent Nov 11, 2024
90ae3f5
custommiddleware for gzip
kartikpersistent Nov 11, 2024
60f97d6
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Nov 11, 2024
b09915f
removed unused state
kartikpersistent Nov 11, 2024
a998bf5
message changes
kartikpersistent Nov 11, 2024
7e3f5a0
uncommented gzipmiddleware
kartikpersistent Nov 11, 2024
e50d3ef
code refactoring
kartikpersistent Nov 12, 2024
ec7d3ff
removed settings modal code
kartikpersistent Nov 12, 2024
2baeee4
Table UI Fixes
kartikpersistent Nov 12, 2024
609cb0f
removed state
kartikpersistent Nov 12, 2024
0c9fbfc
Merge branch 'DEV' into ragas_metric_addition
kartikpersistent Nov 12, 2024
3b90eb2
UX improvements for chunks popup
kartikpersistent Nov 12, 2024
20288bb
added the status check
kartikpersistent Nov 12, 2024
89b5d78
ndl version changes
kartikpersistent Nov 13, 2024
fb9b410
tip and dropdown changes
kartikpersistent Nov 13, 2024
25c729a
icon fixes
kartikpersistent Nov 13, 2024
85a233a
contextmenu fix
kartikpersistent Nov 13, 2024
19fe776
Box CSS fix
kartikpersistent Nov 13, 2024
17b3d27
icon fixes
kartikpersistent Nov 13, 2024
090bf76
icon changes
kartikpersistent Nov 13, 2024
1bdcbc9
IsRoot fix
kartikpersistent Nov 13, 2024
4bc1f45
added the tooltip for metrics
kartikpersistent Nov 14, 2024
b333215
Menu fix inside modal
kartikpersistent Nov 14, 2024
3362e80
hover color fix
kartikpersistent Nov 14, 2024
28df550
menu changes
kartikpersistent Nov 14, 2024
8ec99c5
format and lint fixes
kartikpersistent Nov 15, 2024
4 changes: 2 additions & 2 deletions backend/requirements.txt
@@ -179,5 +179,5 @@ PyMuPDF==1.24.5
 pypandoc==1.13
 graphdatascience==1.10
 Secweb==1.11.0
-ragas==0.1.14
+ragas==0.2.2
+rouge_score==0.1.2
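
Note: ragas 0.2 introduced per-sample metric classes (SingleTurnSample / single_turn_ascore, used below in ragas_eval.py), and its RougeScore metric relies on Google's rouge-score package, which appears to be why rouge_score is pinned here. As a quick illustration of what that dependency computes, a minimal sketch (the sample strings are invented):

import json
from rouge_score import rouge_scorer

# rougeL scores the longest common subsequence between target and prediction.
scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=True)
scores = scorer.score('the cat sat on the mat', 'a cat sat on the mat')
print(scores['rougeL'].fmeasure)  # each Score exposes precision, recall, fmeasure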
35 changes: 34 additions & 1 deletion backend/score.py
@@ -85,7 +85,6 @@ async def __call__(self, scope: Scope, receive: Receive, send: Send):
 app.add_middleware(ContentSecurityPolicy, Option={'default-src': ["'self'"], 'base-uri': ["'self'"], 'block-all-mixed-content': []}, script_nonce=False, style_nonce=False, report_only=False)
 app.add_middleware(XContentTypeOptions)
 app.add_middleware(XFrame, Option={'X-Frame-Options': 'DENY'})
-#app.add_middleware(GZipMiddleware, minimum_size=1000, compresslevel=5)
 app.add_middleware(CustomGZipMiddleware, minimum_size=1000, compresslevel=5,paths=["/sources_list","/url/scan","/extract","/chat_bot","/chunk_entities","/get_neighbours","/graph_query","/schema","/populate_graph_schema","/get_unconnected_nodes_list","/get_duplicate_nodes","/fetch_chunktext"])
 app.add_middleware(
     CORSMiddleware,
@@ -847,6 +846,40 @@ async def calculate_metric(question: str = Form(),
     )
     finally:
         gc.collect()
 
+
+@app.post('/additional_metrics')
+async def calculate_additional_metrics(question: str = Form(),
+                                       context: str = Form(),
+                                       answer: str = Form(),
+                                       reference: str = Form(),
+                                       model: str = Form(),
+                                       mode: str = Form(),
+                                       ):
+    try:
+        context_list = [str(item).strip() for item in json.loads(context)] if context else []
+        answer_list = [str(item).strip() for item in json.loads(answer)] if answer else []
+        mode_list = [str(item).strip() for item in json.loads(mode)] if mode else []
+        result = await get_additional_metrics(question, context_list,answer_list, reference, model)
+        if result is None or "error" in result:
+            return create_api_response(
+                'Failed',
+                message='Failed to calculate evaluation metrics.',
+                error=result.get("error", "Ragas evaluation returned null")
+            )
+        data = {mode: {metric: result[i][metric] for metric in result[i]} for i, mode in enumerate(mode_list)}
+        return create_api_response('Success', data=data)
+    except Exception as e:
+        logging.exception(f"Error while calculating evaluation metrics: {e}")
+        return create_api_response(
+            'Failed',
+            message="Error while calculating evaluation metrics",
+            error=str(e)
+        )
+    finally:
+        gc.collect()
+
+
 @app.post("/fetch_chunktext")
 async def fetch_chunktext(
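
For context, a minimal sketch of how a client might exercise the new endpoint. The base URL, model name, and mode value are placeholders, and the list-valued fields are JSON-encoded strings because the endpoint decodes them with json.loads, one entry per chat mode:

import json
import requests

# Hypothetical local deployment; adjust host and port to your setup.
BASE_URL = 'http://localhost:8000'

payload = {
    'question': 'What is Neo4j?',
    'context': json.dumps(['Neo4j is a graph database.']),
    'answer': json.dumps(['Neo4j is a native graph database.']),
    'reference': 'Neo4j is a graph database management system.',
    'model': 'openai_gpt_4o',        # placeholder model id
    'mode': json.dumps(['vector']),  # placeholder chat mode
}

response = requests.post(f'{BASE_URL}/additional_metrics', data=payload)
print(response.json())  # on success: {'status': 'Success', 'data': {mode: {metric: score, ...}}}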
2 changes: 1 addition & 1 deletion backend/src/QA_integration.py
@@ -660,7 +660,7 @@ def QA_RAG(graph,model, question, document_names, session_id, mode, write_access
     if document_names and not chat_mode_settings["document_filter"]:
         result = {
             "session_id": "",
-            "message": "This chat mode does support document selection",
+            "message": "Please deselect all documents in the table before using this chat mode",
             "info": {
                 "sources": [],
                 "model": "",
48 changes: 48 additions & 0 deletions backend/src/ragas_eval.py
@@ -7,6 +7,16 @@
 from ragas import evaluate
 from ragas.metrics import answer_relevancy, faithfulness
 from src.shared.common_fn import load_embedding_model
+from ragas.dataset_schema import SingleTurnSample
+from ragas.metrics import BleuScore, RougeScore, SemanticSimilarity, ContextEntityRecall
+from ragas.metrics._factual_correctness import FactualCorrectness
+from ragas.llms import LangchainLLMWrapper
+from langchain_openai import ChatOpenAI
+from langchain.embeddings import OpenAIEmbeddings
+from ragas.embeddings import LangchainEmbeddingsWrapper
+import nltk
+
+nltk.download('punkt')
 load_dotenv()
 
 EMBEDDING_MODEL = os.getenv("RAGAS_EMBEDDING_MODEL")
@@ -52,3 +62,41 @@ def get_ragas_metrics(question: str, context: list, answer: list, model: str):
     except Exception as e:
         logging.exception(f"Error during metrics evaluation: {e}")
         return {"error": str(e)}
+
+
+async def get_additional_metrics(question: str, contexts: list, answers: list, reference: str, model_name: str):
+    """Calculates multiple metrics for given question, answers, contexts, and reference."""
+    try:
+        if ("diffbot" in model_name) or ("ollama" in model_name):
+            raise ValueError(f"Unsupported model for evaluation: {model_name}")
+        llm, model_name = get_llm(model=model_name)
+        ragas_llm = LangchainLLMWrapper(llm)
+        embeddings = EMBEDDING_FUNCTION
+        embedding_model = LangchainEmbeddingsWrapper(embeddings=embeddings)
+        rouge_scorer = RougeScore()
+        semantic_scorer = SemanticSimilarity()
+        entity_recall_scorer = ContextEntityRecall()
+        entity_recall_scorer.llm = ragas_llm
+        semantic_scorer.embeddings = embedding_model
+        metrics = []
+        for response, context in zip(answers, contexts):
+            sample = SingleTurnSample(response=response, reference=reference)
+            rouge_score = await rouge_scorer.single_turn_ascore(sample)
+            rouge_score = round(rouge_score,4)
+            semantic_score = await semantic_scorer.single_turn_ascore(sample)
+            semantic_score = round(semantic_score, 4)
+            if "gemini" in model_name:
+                entity_recall_score = "Not Available"
+            else:
+                entity_sample = SingleTurnSample(reference=reference, retrieved_contexts=[context])
+                entity_recall_score = await entity_recall_scorer.single_turn_ascore(entity_sample)
+                entity_recall_score = round(entity_recall_score, 4)
+            metrics.append({
+                "rouge_score": rouge_score,
+                "semantic_score": semantic_score,
+                "context_entity_recall_score": entity_recall_score
+            })
+        return metrics
+    except Exception as e:
+        logging.exception("Error in get_additional_metrics")
+        return {"error": str(e)}
9 changes: 5 additions & 4 deletions frontend/package.json
@@ -15,11 +15,12 @@
   "@mui/material": "^5.15.10",
   "@mui/styled-engine": "^5.15.9",
   "@neo4j-devtools/word-color": "^0.0.8",
-  "@neo4j-ndl/base": "^2.12.7",
-  "@neo4j-ndl/react": "^2.16.9",
-  "@neo4j-nvl/base": "^0.3.3",
-  "@neo4j-nvl/react": "^0.3.3",
+  "@neo4j-ndl/base": "^3.0.10",
+  "@neo4j-ndl/react": "^3.0.17",
+  "@neo4j-nvl/base": "^0.3.6",
+  "@neo4j-nvl/react": "^0.3.6",
   "@react-oauth/google": "^0.12.1",
+  "@tanstack/react-table": "^8.20.5",
   "@types/uuid": "^9.0.7",
   "axios": "^1.6.5",
   "clsx": "^2.1.1",
11 changes: 10 additions & 1 deletion frontend/src/App.css
@@ -25,7 +25,7 @@
 }
 
 .contentWithExpansion {
-  width: calc(-840px + 100dvw);
+  width: calc(-807px + 100dvw);
   height: calc(100dvh - 58px);
   padding: 3px;
   display: flex;
@@ -386,4 +386,13 @@
 .custom-menu {
   min-width: 250px;
   max-width: 305px;
 }
+.ndl-modal-root{
+  z-index: 39 !important;
+}
+.tbody-dark .ndl-data-grid-tr:hover {
+  --cell-background: rgb(60 63 68) !important;
+}
+.tbody-light .ndl-data-grid-tr:hover {
+  --cell-background: rgb(226 227 229) !important;
+}
7 changes: 4 additions & 3 deletions frontend/src/HOC/CustomModal.tsx
@@ -16,7 +16,7 @@ const CustomModal: React.FC<CustomModalProps> = ({
   return (
     <Dialog
       size='small'
-      open={open}
+      isOpen={open}
       modalProps={{
         id: 'default-menu',
       }}
@@ -25,16 +25,17 @@
       <Dialog.Content className='n-flex n-flex-col n-gap-token-4 mt-6'>
         {status !== 'unknown' && (
           <Banner
-            closeable
+            isCloseable
             description={statusMessage}
             onClose={() => setStatus('unknown')}
             type={status}
+            name='Custom Banner'
             usage='inline'
           />
         )}
         <div className='n-flex n-flex-row n-flex-wrap'>{children}</div>
         <Dialog.Actions className='mt-4'>
-          <Button onClick={submitHandler} size='medium' disabled={isDisabled}>
+          <Button onClick={submitHandler} size='medium' isDisabled={isDisabled}>
             {submitLabel}
           </Button>
         </Dialog.Actions>
14 changes: 14 additions & 0 deletions frontend/src/HOC/withVisibility.tsx
@@ -0,0 +1,14 @@
+interface VisibilityProps {
+  isVisible: boolean;
+}
+export function withVisibility<P>(WrappedComponent: React.ComponentType<P>) {
+  const VisibityControlled = (props: P & VisibilityProps) => {
+    if (props.isVisible === false) {
+      return null;
+    }
+
+    return <WrappedComponent {...props} />;
+  };
+
+  return VisibityControlled;
+}