This repository contains lesson materials, code examples, and evaluation scripts for Unit 7 of the Agentic AI Developer Certification Program by Ready Tensor. This week is all about evaluating agentic systems — how to measure performance, diagnose behavior, and build trust in AI that thinks and acts.
- Why traditional evaluation breaks down for adaptive, agentic systems
- A practical toolkit of evaluation methods — from LLM-as-judge to red teaming
- How to select the right metrics based on your system’s goals and design
- How to use RAGAS for scoring retrieval-augmented generation pipelines
- How to use DeepEval for multi-step evaluation with custom metrics
- How to evaluate multi-agent systems — measuring coordination, not just components
Learn why adaptive AI systems require new thinking in evaluation. We go beyond test cases and accuracy scores to ask: What does “success” look like for a system that reasons, adapts, and collaborates?
Explore seven hands-on methods — from human review to red teaming to automated scoring. Understand strengths, weaknesses, and best-fit use cases for each.
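To make one of these methods concrete, here is a minimal LLM-as-judge sketch. It is illustrative only, not the course's implementation: the rubric, the 1–5 scale, and the model name are assumptions, and it expects an `OPENAI_API_KEY` in your environment (see the setup steps below).

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are an impartial evaluator.
Rate the RESPONSE to the QUESTION on a 1-5 scale for helpfulness.
Reply with only the integer score.

QUESTION: {question}
RESPONSE: {response}"""

def judge(question: str, response: str) -> int:
    """Ask an LLM to score a response; returns a 1-5 integer."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, response=response)}],
        temperature=0,
    )
    return int(completion.choices[0].message.content.strip())

print(judge(
    "What is RAG?",
    "Retrieval-augmented generation grounds LLM answers in retrieved documents.",
))
```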
Choose evaluation metrics that actually matter. We walk through seven design dimensions (like output type and interaction mode) to help you tailor metrics to your own system.
Hands-on walkthrough of RAGAS — a framework for evaluating RAG pipelines. Learn how to generate test sets, define evaluation workflows, and create custom metrics.
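As a preview of what the Lesson 4 script covers, here is a minimal RAGAS scoring sketch. It assumes the 0.1-style `ragas.evaluate` API (newer releases restructure the metric imports), a configured OpenAI API key, and a made-up single-row dataset; the actual `code/run_lesson4_ragas_eval.py` script is more complete.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# A tiny, made-up evaluation set: one question with its retrieved
# contexts and the pipeline's generated answer.
data = {
    "question": ["What does RAGAS evaluate?"],
    "contexts": [["RAGAS scores retrieval-augmented generation pipelines."]],
    "answer": ["RAGAS evaluates RAG pipelines."],
}
dataset = Dataset.from_dict(data)

# Each metric calls an LLM under the hood, so an API key must be set.
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(result)
```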
Dive into DeepEval, a flexible evaluation toolkit for real-world LLM apps. Learn how to create structured evaluation flows, define correctness and faithfulness, and integrate with your dev workflow.
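Here is a minimal DeepEval sketch showing a custom correctness metric defined with `GEval`. The criteria wording, threshold, and sample test case are illustrative assumptions, not taken from `code/run_lesson5_deepeval_demo.py`, and running it requires an LLM API key for the judge model.

```python
from deepeval import evaluate
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# A custom "correctness" metric described in natural language (G-Eval style).
correctness = GEval(
    name="Correctness",
    criteria="Is the actual output factually consistent with the expected output?",
    evaluation_params=[
        LLMTestCaseParams.ACTUAL_OUTPUT,
        LLMTestCaseParams.EXPECTED_OUTPUT,
    ],
    threshold=0.7,  # illustrative pass/fail cutoff
)

test_case = LLMTestCase(
    input="What does DeepEval do?",
    actual_output="DeepEval scores LLM app outputs with configurable metrics.",
    expected_output="DeepEval is a framework for evaluating LLM applications.",
)

evaluate(test_cases=[test_case], metrics=[correctness])
```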
A real case study for evaluating a multi-agent system using golden datasets, coordination metrics, and system-level scoring. Learn how to assess collaboration — not just component quality.
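Since there is no off-the-shelf coordination metric, here is a plain-Python sketch of one possible approach: scoring whether agents handed off work in the order a golden dataset expects. The `AgentTurn` structure and the scoring rule are hypothetical, not the ones used in the case study.

```python
from dataclasses import dataclass

@dataclass
class AgentTurn:
    agent: str    # which agent produced this turn
    task_id: str  # which golden-dataset task the turn addressed
    output: str   # the agent's output for that turn

def coordination_score(turns: list[AgentTurn], expected_order: list[str]) -> float:
    """Hypothetical metric: fraction of hand-offs that occurred in the
    agent order the golden dataset expects for a single task."""
    actual_order = [t.agent for t in turns]
    hits = sum(1 for a, e in zip(actual_order, expected_order) if a == e)
    return hits / max(len(expected_order), 1)

# Toy golden expectation: researcher hands off to writer, then reviewer.
turns = [
    AgentTurn("researcher", "t1", "found 3 sources"),
    AgentTurn("writer", "t1", "drafted summary"),
    AgentTurn("reviewer", "t1", "approved draft"),
]
print(coordination_score(turns, ["researcher", "writer", "reviewer"]))  # 1.0
```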
```txt
rt-agentic-ai-cert-unit7/
├── code/
│   ├── llm.py                                # LLM utility wrapper
│   ├── paths.py                              # Standardized file path management
│   ├── prompt_builder.py                     # Modular prompt construction functions
│   ├── run_lesson4_ragas_eval.py             # Lesson 4: Example script for RAGAS-based evaluation
│   ├── run_lesson5_deepeval_demo.py          # Lesson 5: Evaluation pipeline using DeepEval
│   ├── run_lesson6_multiagent_case_study.py  # Lesson 6: Multi-agent evaluation case study
│   └── utils.py                              # Common utilities
├── config/                                   # Configuration files
├── data/                                     # Input data for code examples
├── outputs/                                  # Output files from code examples
├── lessons/                                  # Lesson content and assets
├── .env.example                              # Sample environment file (e.g., for API keys)
├── .gitignore
├── LICENSE
├── README.md                                 # You are here
└── requirements.txt                          # Python dependencies for evaluation tools
```
- Clone the repository:

  ```bash
  git clone https://github.com/readytensor/rt-agentic-ai-cert-week7.git
  cd rt-agentic-ai-cert-week7
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up your environment variables:

  Copy `.env.example` to `.env` and fill in the required values (e.g., OpenAI or Groq API keys):

  ```bash
  cp .env.example .env
  ```
Each code example is runnable as a standalone script:

- Lesson 4 – RAGAS Evaluation:

  ```bash
  python code/run_lesson4_ragas_eval.py
  ```

- Lesson 5 – DeepEval Evaluation:

  ```bash
  python code/run_lesson5_deepeval_demo.py
  ```

- Lesson 6 – Multi-Agent Evaluation:

  ```bash
  python code/run_lesson6_multiagent_case_study.py
  ```
Evaluation reports will be saved to the `outputs/evaluation_reports/` folder.
This project is licensed under the CC BY-NC-SA 4.0 License – see the LICENSE file for details.
Ready Tensor, Inc.
- Email: contact at readytensor dot com
- Issues & Contributions: Open an issue or PR on this repo
- Website: https://readytensor.ai