
# Benchmark Flows

This directory contains all benchmark implementations organized by category.

## Running Benchmarks

```bash
# Run a specific benchmark by file path, default is 5 iterations
yarn test:e2e:benchmark test/e2e/benchmarks/flows/user-journey/onboarding-import-wallet.ts

# Run a preset (group of benchmarks)
yarn test:e2e:benchmark --preset userJourneyOnboardingNew

# Run all benchmarks
yarn test:e2e:benchmark

# Run with custom iterations and retries
yarn test:e2e:benchmark --preset interactionUserActions --iterations 5 --retries 3

# Save results to file
yarn test:e2e:benchmark --preset userJourneyOnboardingNew --out results.json
```

## Available Presets

| Preset | Description | Benchmarks |
| --- | --- | --- |
| `startupStandardHome` | Standard user cold-start | `standard-home.ts` |
| `startupPowerUserHome` | Power user cold-start | `power-user-home.ts` |
| `interactionUserActions` | Single-action interaction times | `load-new-account.ts`, `confirm-tx.ts`, `bridge-user-actions.ts` |
| `userJourneyOnboardingImport` | Import wallet onboarding | `onboarding-import-wallet.ts` |
| `userJourneyOnboardingNew` | New wallet onboarding | `onboarding-new-wallet.ts` |
| `userJourneyAssets` | Asset detail page loads | `asset-details.ts`, `solana-asset-details.ts` |
| `userJourneyAccountManagement` | Login from home (import SRP) | `import-srp-home.ts` |
| `userJourneyTransactions` | Send and swap transaction flows | `send-transactions.ts`, `swap.ts` |
| `pageLoadBenchmark` | Dapp page load (Playwright) | `dapp-page-load/` |
| `all` | All benchmarks | Everything above |

### User journey benchmarks

- User journey presets run on Chrome and Firefox with the Webpack test build (`build-test-webpack` / `build-test-mv2-webpack` in CI), the same as the startup and interaction benchmarks in `run-benchmarks.yml`.

## Special CI Requirements

| Preset | Requirement |
| --- | --- |
| `userJourneyAccountManagement` | Requires the `TEST_SRP_2` secret (12-word seed phrase). Set as a CI secret in GitHub. |
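
CI injects `TEST_SRP_2` as a repository secret. For a local run you would supply the value yourself; the snippet below assumes the runner reads it from the environment, which is an assumption rather than something this README documents:

```bash
# Hypothetical local run: assumes TEST_SRP_2 is read from the environment
TEST_SRP_2="word1 word2 ... word12" \
  yarn test:e2e:benchmark --preset userJourneyAccountManagement
```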

## Benchmark Categories and Types

Each benchmark belongs to a category and has a BENCHMARK_TYPE:

| Category | Directory | `BENCHMARK_TYPE` | Description |
| --- | --- | --- | --- |
| Startup | `startup/` | `BENCHMARK` | Extension cold-start and initialization times |
| Interaction | `interaction/` | `USER_ACTION` | Single discrete user interaction timings |
| User Journey | `user-journey/` | `PERFORMANCE` | Multi-step E2E user flows with multiple timers |
| Dapp Page Load | `dapp-page-load/` | `PERFORMANCE` | Playwright-based dapp page load metrics (Core Web Vitals) |

## Adding a New Benchmark

### 1. Create a new file in the appropriate subdirectory

Choose the category that best fits your benchmark:

- `startup/` - For measuring extension cold-start and page load times
- `interaction/` - For measuring single discrete user interaction timings
- `user-journey/` - For E2E multi-step user flows with multiple timers
- `dapp-page-load/` - For Playwright-based dapp page load benchmarks (Core Web Vitals); not a Selenium `run()` flow. The Playwright benchmark project sets `testDir` to this folder.

### 2. Export a `run` function

Every benchmark must export a `run` function. The exact signature depends on the benchmark type.

#### TimerHelper API (for user journey benchmarks)

User journey benchmarks use the TimerHelper class to measure operations:

```ts
// Constructor takes a camelCase identifier for the metric, e.g. 'assetClickToPriceChart'
const timer = new TimerHelper('assetClickToPriceChart');

// measure() wraps the async action being timed
await timer.measure(async () => {
  // Your async actions to measure
  await page.load();
  await element.click();
});
```
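
Putting the pieces together, a user-journey benchmark file looks roughly like the sketch below. Treat it as an illustration only: the `run` parameters, the `TimerHelper` import path, the selectors, and the idea that timers are handed back to the runner are assumptions here, not the harness's documented API.

```ts
// test/e2e/benchmarks/flows/user-journey/my-flow.ts (illustrative sketch)
import { TimerHelper } from '../../utils/timers'; // hypothetical import path

// Assumed shape: the runner calls the exported run() once per iteration and
// collects the timers it produces. Check an existing benchmark file for the
// real signature before copying this.
export async function run(driver: any): Promise<TimerHelper[]> {
  const assetClickToPriceChart = new TimerHelper('assetClickToPriceChart');

  await assetClickToPriceChart.measure(async () => {
    // The steps whose total duration becomes the 'assetClickToPriceChart' metric
    await driver.clickElement('[data-testid="asset-row"]'); // illustrative selector
    await driver.waitForSelector('[data-testid="price-chart"]'); // illustrative selector
  });

  return [assetClickToPriceChart];
}
```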

**Threshold Validation (mandatory):**

Every benchmark must have a threshold entry in `THRESHOLD_REGISTRY` inside `test/e2e/benchmarks/utils/thresholds.ts`. Thresholds are validated after collecting statistics from all iterations.

To add thresholds for a new benchmark:

1. Define a threshold config constant in `test/e2e/benchmarks/utils/thresholds.ts`
2. Add it to `BENCHMARK_THRESHOLDS` with a camelCase key matching the filename

```ts
const MY_BENCHMARK: ThresholdConfig = {
  myTimerId: {
    p75: { warn: 1000, fail: 1500 },
    p95: { warn: 2000, fail: 3000 },
    ciMultiplier: CI_MULTIPLIER.DEFAULT,
  },
};

// Add to BENCHMARK_THRESHOLDS:
const BENCHMARK_THRESHOLDS = {
  myBenchmark: MY_BENCHMARK, // camelCase key matching filename (my-benchmark.ts → myBenchmark)
};
```

The key must be camelCase, matching the filename: `my-benchmark.ts` → `myBenchmark`.

For startup benchmarks, use the `startup` prefix: `standard-home.ts` → `startupStandardHome`.

**Quality Gate enforcement:**

`test/e2e/benchmarks/utils/gated-metrics.ts` lists which metrics block PRs on regression. Metrics in that file fail the quality-gate CI job when their threshold is breached; metrics not listed surface as warnings in the PR comment but do not fail the gate. Edit that file to graduate or demote a metric; its module JSDoc covers the mechanics. Both that file and `thresholds.ts` are co-owned by @MetaMask/qa and @MetaMask/extension-platform; either team can approve.

### 3. Add to a preset (optional)

If your benchmark should run in CI, add its file path to the appropriate preset in `run-benchmark.ts`.

To create a new preset, update these locations (a sketch of the first two follows the list):

1. `test/e2e/benchmarks/utils/constants.ts` — add a key to the relevant preset object (`USER_JOURNEY_PRESETS` for user journeys, `INTERACTION_PRESETS` for interactions, `STARTUP_PRESETS` for startup) or create a new one
2. `test/e2e/benchmarks/run-benchmark.ts` — add a `PRESETS` entry (using the constant key) mapping to benchmark file paths
3. `.github/workflows/run-benchmarks.yml` — add the preset to the CI matrix
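
A rough sketch of steps 1 and 2; the exact object shapes in both files are assumptions here, so mirror an existing preset rather than copying this verbatim:

```ts
// test/e2e/benchmarks/utils/constants.ts (sketch; the key name is illustrative)
export const USER_JOURNEY_PRESETS = {
  // ...existing keys
  userJourneyMyFlow: 'userJourneyMyFlow',
} as const;

// test/e2e/benchmarks/run-benchmark.ts (sketch; the real PRESETS shape may differ)
const PRESETS = {
  // ...existing presets
  [USER_JOURNEY_PRESETS.userJourneyMyFlow]: [
    'test/e2e/benchmarks/flows/user-journey/my-flow.ts',
  ],
};
```

Step 3 is an edit to the CI matrix in `.github/workflows/run-benchmarks.yml`; its exact form depends on how that workflow is laid out, so it is not sketched here.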

## Output Format

Benchmark results are output as JSON, either to stdout or to a file via `--out`:

```json
{
  "onboardingImportWallet": {
    "testTitle": "benchmark-onboarding-import-wallet",
    "persona": "standard",
    "mean": {
      "importWalletToSocialScreen": 209,
      "srpButtonToForm": 53
    },
    "p75": { ... },
    "p95": { ... }
  }
}
```

Results are sent to Sentry in CI via `send-to-sentry.ts` for monitoring and analysis.

Each benchmark entry becomes a Sentry Structured Log (`Sentry.logger.info`):

- Message: `<benchmarkType>.<presetName>` — e.g. `performance.userJourneyOnboardingImport`, `userAction.interactionUserActions`, `benchmark.startupStandardHome`
- Attributes:
  - `ci.branch`, `ci.commitHash`, `ci.prNumber` — Git/CI context
  - `ci.browser`, `ci.buildType` — e.g. `chrome` / `browserify`
  - `ci.persona` — `standard` or `powerUser`
  - `ci.testTitle` — human-readable test name from the benchmark file
  - Metric values, namespaced by stat type:
    - Statistical benchmarks (startup / user journey): `<type>.mean.<metric>`, `<type>.p75.<metric>`, `<type>.p95.<metric>` — e.g. `performance.mean.uiStartup`
    - Interaction benchmarks: flat numeric keys — e.g. `loadNewAccount`, `confirmTx`

Example Sentry log for a startup benchmark:

```
message:    "benchmark.startupStandardHome"
attributes: {
  "ci.branch":      "main",
  "ci.commitHash":  "abc1234",
  "ci.browser":     "chrome",
  "ci.buildType":   "browserify",
  "ci.persona":     "standard",
  "ci.testTitle":   "benchmark-standard-home",
  "benchmark.mean.uiStartup":          1443,
  "benchmark.mean.backgroundConnect":  210,
  "benchmark.p75.uiStartup":           1530,
  "benchmark.p95.uiStartup":           1620
}
```