Multimodal AI & Agent API Evaluation Framework (GSoC 2026 PoC) by uddalak2005 · Pull Request #76 · foss42/gsoc-poc

uddalak2005 · 2026-04-20T15:07:59Z

EvalForge

EvalForge is a comprehensive evaluation framework for AI APIs across four core modalities:

Text (MMLU)
Multimodal (VQA)*
Agent Tool-Call Tracing (TFS)
MCP App Integration

Core Features

Multimodal VQA

Evaluate vision-language models with base64 image support.

Agent Trajectory Fidelity Score (TFS)

An original metric to measure how faithfully agents follow tool-call sequences compared to a gold standard.

MCP Apps Integration

The evaluation dashboard is itself an MCP App, allowing researchers to:

Trigger evaluations
View results directly inside an AI agent chat window

Live Analytics

A real-time React dashboard with:

Accuracy tracking
Latency monitoring
Token cost analysis
Historical insights

Remove outdated instructions and examples for React, TypeScript, and ESLint configuration.

uddalak2005 and others added 10 commits April 16, 2026 22:41

POC working

64454af

POC Ready

969d402

Delete 2026/uddalak_multimodal_ai_agent_eval/gsoc_poc_project_plan.md

7d47241

POC Ready with DOCS

582070a

Merge branch 'main' of https://github.com/uddalak2005/gsoc-poc

842421b

Create README.md

97c8473

Readme Updated

ad47355

Delete README.md content for React setup

03404fd

Remove outdated instructions and examples for React, TypeScript, and ESLint configuration.

Delete 2026/uddalak_multimodal_ai_agent_eval/frontend/README.md

3369a8f

Merge branch 'foss42:main' into main

0cff4f8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multimodal AI & Agent API Evaluation Framework (GSoC 2026 PoC)#76

Multimodal AI & Agent API Evaluation Framework (GSoC 2026 PoC)#76
uddalak2005 wants to merge 10 commits intofoss42:mainfrom
uddalak2005:main

uddalak2005 commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

uddalak2005 commented Apr 20, 2026

EvalForge

Core Features

Multimodal VQA

Agent Trajectory Fidelity Score (TFS)

MCP Apps Integration

Live Analytics

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant