A comprehensive suite of tools for tracking and analyzing OpenShift CI test failures.
This repository contains two complementary tools:
- Dashboard - Web-based CI test health tracking with pass rate analytics and weekly reports
- ReportPortal to Jira Bridge - Automated failure tracking and ticket creation
Both tools are generic and customizable for any QE team tracking OpenShift CI tests.
This tool works for any QE team running periodic CI tests. To use it:

1. Fork this repository to your GitHub account/org.

2. Choose an example configuration:

       # See examples for Networking, Storage, or other teams
       cp examples/networking-team-config.yaml dashboard/config.yaml
       # OR customize from scratch - see dashboard/config.yaml
3. Edit the configuration for your team:
   - Update `job_patterns` to match your periodic CI jobs
   - Set `versions` (e.g., 4.21, 4.22, 4.23)
   - Choose `platforms` (aws, gcp, azure, vsphere, etc.)
4. Deploy:
   - OpenShift: See `dashboard/openshift/README.md`
   - Local: `cd dashboard && pip install -r requirements.txt && ./dashboard.py serve`
See CONTRIBUTING.md for detailed setup instructions.
- Examples: Check the `examples/` directory for sample configurations
- Documentation: See CONTRIBUTING.md for a detailed customization guide
- Issues: Open an issue if you need assistance
Example job patterns for different teams:
- WinC: `periodic-ci-*-winc-*`
- Networking: `periodic-ci-*-network-*`, `periodic-ci-*-ovn-*`
- Storage: `periodic-ci-*-storage-*`, `periodic-ci-*-csi-*`
- Your team: Find your jobs at https://prow.ci.openshift.org/
┌─────────────────┐
│ ReportPortal │ ← Prow periodic jobs report results
│ (Data Source) │
└────────┬────────┘
│ API queries
↓
┌─────────────────┐
│ Failure Tracker │ ← Python script running periodically
│ Script │
└────────┬────────┘
│
├─→ Query ReportPortal API for recent failures
├─→ Filter by: version (4.20, 4.21, 4.22), test type (winc)
├─→ Analyze failure patterns
├─→ Check for existing Jira tickets
│
↓
┌─────────────────┐
│ Jira (WINC) │ ← Auto-create tickets for new failures
│ Project │
└─────────────────┘
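The flow in the diagram can be sketched end to end with stand-in clients; all class and function names here are illustrative, not the real script's API:

```python
from dataclasses import dataclass

@dataclass
class Failure:
    test_name: str
    platform: str

class FakeReportPortal:
    """Stand-in for the ReportPortal client (illustrative only)."""
    def fetch_failures(self, days):
        return [Failure("OCP-11111", "aws"), Failure("OCP-11111", "gcp")]

class FakeJira:
    """Stand-in for the Jira client (illustrative only)."""
    def __init__(self):
        self.created = []
    def find_ticket(self, test_name):
        return None  # pretend no ticket exists yet
    def create_ticket(self, test_name, instances):
        self.created.append(test_name)

def run_tracker(rp, jira, days=7, dry_run=False):
    """Pipeline from the diagram: query -> group by test -> check Jira -> create."""
    grouped = {}
    for failure in rp.fetch_failures(days):
        grouped.setdefault(failure.test_name, []).append(failure)
    for test_name, instances in grouped.items():
        if jira.find_ticket(test_name) is None and not dry_run:
            jira.create_ticket(test_name, instances)

jira = FakeJira()
run_tracker(FakeReportPortal(), jira)
print(jira.created)  # ['OCP-11111']
```

Note that both failures for OCP-11111 land in a single ticket, which is the aggregation behavior described below.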
- One Ticket Per Test Case: Creates a single Jira ticket for each unique test (e.g., OCP-11111)
- Multi-Platform Aggregation: If OCP-11111 fails on AWS, GCP, and Azure, all failures appear in ONE ticket
- Comprehensive Failure Links: Lists every failure instance with direct links to ReportPortal logs
- Platform Breakdown: Shows failure count per platform (AWS: 3 failures, GCP: 2 failures, etc.)
- Version Tracking: Track failures across multiple OpenShift versions (4.19, 4.20, 4.21, 4.22)
- Team Configuration: YAML-based per-team configuration (jobs, platforms, thresholds)
- Server-Side Filtering: Efficient API queries with status and name filtering
- Configurable CLI: Adjust page size, max pages, and workers via command-line options
- SAML/OAuth Authentication: Optional Red Hat SSO integration for dashboard access control (see SAML Authentication)
GET /api/v1/{projectName}/launch
Filter by:
- Launch name pattern: periodic-ci-openshift-openshift-tests-private-release-*-winc-*
- Status: FAILED
- Time range: Last 7 days
GET /api/v1/{projectName}/item/{launchId}
For each failed launch:
- Get test items with status FAILED
- Extract test name, error message, stack trace
- Identify test file and line number
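The two queries above can be issued over plain HTTP. A sketch of building the launch-search URL; the filter parameter names follow common ReportPortal v1 REST API conventions and should be verified against your instance:

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

def launch_query(base_url, project, name_pattern, days=7, page_size=150, page=1):
    """Build a launch-search URL. Filter parameter names are assumptions
    based on ReportPortal v1 API conventions; verify against your instance."""
    since = datetime.now(timezone.utc) - timedelta(days=days)
    params = {
        "filter.eq.status": "FAILED",
        "filter.cnt.name": name_pattern,  # substring match on launch name
        "filter.gte.startTime": int(since.timestamp() * 1000),  # epoch millis
        "page.size": page_size,
        "page.page": page,
    }
    return f"{base_url}/api/v1/{project}/launch?{urlencode(params)}"

url = launch_query("https://rp.example.com", "prow", "winc")
print(url)
```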
Failure Pattern = {
"test_name": "OCP-39451",
"test_file": "test/extended/winc/winc.go",
"error_signature": hash(error_message),
"versions_affected": ["4.20", "4.21", "4.22"],
"platforms_affected": ["aws", "azure", "gcp"],
"failure_count": 15,
"first_seen": "2026-01-25",
"last_seen": "2026-02-02"
}

JQL Query:
project = WINC AND
labels = "ci-failure" AND
summary ~ "OCP-39451" AND
status NOT IN (Closed, Resolved)
- If no ticket exists → Create new
- If ticket exists but pattern changed → Add comment
- If ticket is old but failure recurring → Reopen with new data
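The three rules above can be expressed as a small decision function; a sketch with illustrative dict keys (the real script's data model may differ):

```python
def decide_action(ticket, pattern):
    """Apply the create/comment/reopen rules; field names are illustrative."""
    if ticket is None:
        return "create"
    if ticket["status"] in ("Closed", "Resolved") and pattern["last_seen"] > ticket["resolved_on"]:
        return "reopen"    # old ticket, but the failure is recurring
    if pattern["error_signature"] != ticket.get("last_signature"):
        return "comment"   # ticket exists but the pattern changed
    return "skip"

print(decide_action(None, {}))  # create
print(decide_action(
    {"status": "Closed", "resolved_on": "2026-01-01", "last_signature": "abc"},
    {"last_seen": "2026-02-02", "error_signature": "abc"},
))  # reopen
```

Since the dates are ISO-8601 strings, plain string comparison orders them correctly.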
ci-failure-tracker/
├── README.md # This file
├── requirements.txt # Python dependencies
├── ci_failure_tracker.py # Main script (new tool)
├── src/
│ └── core/
│ ├── config_loader.py # Team configuration loader
│ └── jira_client.py # Jira API client
├── teams/
│ ├── winc.yaml # WINC team configuration (example)
│ └── README.md # Team config documentation
└── venv/ # Python virtual environment
reportportal:
url: "https://reportportal-openshift.apps.dno.ocp-hub.prod.psi.redhat.com"
project: "prow"
api_token: "${REPORTPORTAL_API_TOKEN}" # Set in environment
jira:
url: "https://issues.redhat.com"
project: "WINC"
parent_story: "WINC-1552" # Parent epic for CI failures
# Authentication via environment variables:
# - JIRA_USER (username or email)
# - JIRA_API_TOKEN (personal API token)
tracking:
versions:
- "4.20"
- "4.21"
- "4.22"
platforms:
- "aws"
- "azure"
- "gcp"
- "vsphere"
- "nutanix"
job_patterns:
- "periodic-ci-openshift-openshift-tests-private-release-*-winc-*"
- "periodic-ci-openshift-windows-machine-config-operator-release-*"
lookback_days: 7 # How far back to search
failure_threshold: 3 # Minimum failures before creating ticket
labels:
- "ci-failure"
- "automated"
- "phase-1-stabilization"
ticket_template: |
h2. Automated CI Failure Report
*Test*: {test_name}
*Test File*: {test_file}:{line_number}
*Affected Versions*: {versions}
*Affected Platforms*: {platforms}
h2. Failure Summary
* *First Seen*: {first_seen}
* *Last Seen*: {last_seen}
* *Failure Count*: {failure_count} failures in {lookback_days} days
* *Failure Rate*: {failure_rate}%
h2. Error Message
{code}
{error_message}
{code}
h2. Recent Failures
{failure_table}
h2. ReportPortal Links
{reportportal_links}
h2. Recommended Actions
# Review test code at {test_file}:{line_number}
# Check for recent changes in affected test
# Verify if issue is platform-specific or version-specific
# Investigate error logs in ReportPortal
---
_This ticket was automatically created by CI Failure Tracker_
_Configuration: versions={versions}, threshold={failure_threshold}, lookback={lookback_days}d_

cd /Users/rrasouli/Documents/GitHub/openshift-tests-private/tools/ci-failure-tracker
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Required for all operations
export REPORTPORTAL_API_TOKEN="your-reportportal-token"
# Required only for creating tickets (not needed for --dry-run)
export JIRA_USER="your-jira-username-or-email"
export JIRA_API_TOKEN="your-jira-api-token"

Note:
- For dry-run mode, only `REPORTPORTAL_API_TOKEN` is needed
- Jira authentication uses the REST API with username/token
- Get a Jira API token from: https://id.atlassian.com/manage-profile/security/api-tokens
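The `${REPORTPORTAL_API_TOKEN}` placeholder in the team YAML can be expanded from the environment at load time; a minimal sketch using the stdlib's `os.path.expandvars` (the actual loader in `src/core/config_loader.py` may work differently):

```python
import os

# Normally exported in the shell; set here only so the example is self-contained.
os.environ["REPORTPORTAL_API_TOKEN"] = "secret-token"

raw = 'api_token: "${REPORTPORTAL_API_TOKEN}"'
expanded = os.path.expandvars(raw)  # substitutes ${VAR} from the environment
print(expanded)  # api_token: "secret-token"
```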
# Dry run (no ticket creation, analyze only)
./ci_failure_tracker.py --team winc --dry-run --days 7
# Analyze with custom lookback period
./ci_failure_tracker.py --team winc --dry-run --days 14
# Fetch more launches (if failures are missing)
./ci_failure_tracker.py --team winc --dry-run --days 7 --max-pages 20
# Use larger page sizes for fewer API calls
./ci_failure_tracker.py --team winc --dry-run --page-size 300 --max-pages 3
# Run and create tickets (remove --dry-run)
./ci_failure_tracker.py --team winc --days 7
# Show help
./ci_failure_tracker.py --help

- `--team TEXT`: Team ID (required) - must match a YAML file in the `teams/` directory
- `--dry-run`: Analyze failures without creating Jira tickets
- `--verbose`: Enable verbose output for debugging
- `--days N`: Override lookback period from config (default: from team config)
- `--page-size N`: Number of launches per API page (default: 150)
- `--max-pages N`: Maximum API pages to fetch (default: 5)
- `--max-workers N`: Number of parallel workers (default: from team config)
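These options map naturally onto `argparse`; a sketch of what such a parser could look like, with defaults taken from the option list (not necessarily the script's actual implementation):

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="CI Failure Tracker")
    p.add_argument("--team", required=True, help="Team ID; must match a YAML file in teams/")
    p.add_argument("--dry-run", action="store_true", help="Analyze without creating tickets")
    p.add_argument("--verbose", action="store_true", help="Verbose debugging output")
    p.add_argument("--days", type=int, default=None, help="Override lookback period")
    p.add_argument("--page-size", type=int, default=150, help="Launches per API page")
    p.add_argument("--max-pages", type=int, default=5, help="Maximum API pages to fetch")
    p.add_argument("--max-workers", type=int, default=None, help="Parallel workers")
    return p

args = build_parser().parse_args(["--team", "winc", "--dry-run", "--days", "7"])
print(args.team, args.dry_run, args.page_size)  # winc True 150
```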
# Add to crontab to run daily at 9 AM
0 9 * * * cd /path/to/ci-failure-tracker && ./venv/bin/python ci_failure_tracker.py --team winc
# Or create a Prow periodic job (recommended for running in CI cluster)

The tool creates one Jira ticket per unique test case, regardless of:
- How many platforms it failed on (AWS, GCP, Azure, etc.)
- How many times it failed
- What the specific error messages were
- Which OCP versions were affected
Example:
- Test: `OCP-11111`
- Failures:
  - AWS: 3 failures across 4.19, 4.20
  - GCP: 2 failures on 4.21
  - Azure: 1 failure on 4.20
- Result: ONE ticket with all 6 failures listed, grouped by platform
# Group by test_name ONLY (not error signature)
grouped_failures = defaultdict(list)  # from collections import defaultdict
for instance in all_failures:
    ticket_key = instance.test_name  # e.g., "OCP-11111"
    grouped_failures[ticket_key].append(instance)

# Check whether a ticket already exists in Jira
for test_name, instances in grouped_failures.items():
    if jira.find_ticket(summary=f"CI Failure: {test_name}"):
        continue  # Ticket already exists
    create_ticket(test_name, instances)

CI Failure Tracker - Run at 2026-02-02 10:00:00
============================================
Querying ReportPortal for versions: 4.20, 4.21, 4.22
Looking back: 7 days
Found 45 failed periodic jobs:
- 4.20: 12 failures
- 4.21: 18 failures
- 4.22: 15 failures
Analyzing failure patterns...
Pattern 1: OCP-39451 - Windows→Linux ClusterIP failure
Versions: 4.20, 4.21, 4.22
Platforms: aws, azure, gcp
Failures: 15 (23% failure rate)
Status: Existing ticket WINC-1605 ✓
Action: Skipped (ticket exists)
Pattern 2: OCP-77777 - WMCO metrics timeout
Versions: 4.21, 4.22
Platforms: vsphere
Failures: 8 (45% failure rate)
Status: No ticket found
Action: Creating ticket... WINC-1606 ✓
Pattern 3: OCP-43832 - BYOH zone parsing
Versions: 4.20
Platforms: nutanix
Failures: 2 (5% failure rate)
Status: Below threshold (3)
Action: Skipped (not enough failures)
Summary:
- Total patterns found: 12
- Tickets created: 3
- Tickets updated: 1
- Skipped (existing): 5
- Skipped (threshold): 3
See example in the ticket template above.
- ReportPortal API: https://developers.reportportal.io/api-docs/
- Jira REST API: https://developer.atlassian.com/cloud/jira/platform/rest/v3/
- How to get ReportPortal API token: https://reportportal.io/docs/log-data-in-reportportal/HowToGetAnAccessTokenInReportPortal/
A companion tool for tracking test health over time. Located in /dashboard:
Features:
- Track test pass rates over 7/14/30/60/90 days
- Compare performance across OpenShift versions (4.21 vs 4.22)
- Identify worst-performing tests
- Interactive web dashboard with charts and trends
- Filter by version and time range
Quick Start:
cd dashboard
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Collect data from ReportPortal
./dashboard.py collect --days 30
# Start web dashboard at http://localhost:8080
./dashboard.py serve

Documentation:
- User Guide - Complete walkthrough
- Quick Start - One-page reference
- Technical README - Architecture and development
Use Case: While this CI Failure Tracker creates Jira tickets for individual failures, the dashboard provides a high-level view of overall test health and trends.
- Slack Notifications: Send daily summary to #windows-containers channel
- Trend Analysis: Track if failures are increasing/decreasing over time
- Auto-assignment: Assign tickets based on test ownership (CODEOWNERS)
- Integration with TestGrid: Cross-reference with OpenShift TestGrid data
- ML-based Grouping: Use machine learning to group similar failures
- Auto-close: Close tickets when test passes consistently for N days
Edit config.yaml:

tracking:
  versions:
    - "4.23"  # Add new version

tracking:
  failure_threshold: 5  # Require 5 failures instead of 3
  lookback_days: 14     # Look back 14 days instead of 7

Create a skip list in config.yaml:

skip_tests:
  - "OCP-12345"  # Known flaky, tracked elsewhere
  - "OCP-67890"  # Intentionally failing

- Check the ReportPortal API token is set: `echo $REPORTPORTAL_API_TOKEN`
- Check Jira credentials are set (only needed for creating tickets): `echo $JIRA_USER` and `echo $JIRA_API_TOKEN`
- For dry-run mode, only `REPORTPORTAL_API_TOKEN` is needed
- Verify job name patterns in config match actual Prow job names
- Check date range (expand lookback_days)
- Verify ReportPortal project name is correct
- Check the JQL query is working: `jira issue list -q "project=WINC AND labels=ci-failure"`
- Verify error signature generation is stable
- Test changes locally with `--dry-run`
- Add unit tests for new features
- Update this README with configuration changes
- Submit PR to openshift-tests-private repo