feat: add 'process-compose analyze critical-chain' subcommand #462

Open
ryantm wants to merge 1 commit into F1bonacc1:main from ryantm:feat/analyze-critical-chain

Contributor

ryantm commented Apr 18, 2026

Note this code was generated with AI assistance.

Summary

Adds a new top-level analyze command, with a first subcommand
analyze critical-chain, inspired by systemd-analyze critical-chain.
It prints a tree of processes from the ones nothing depends on, down
through their dependencies, annotated with startup timings.

For each process two times are shown:

  • @<offset> — time after the project started that the process became ready
    (or was launched, for processes with no readiness signal).
  • +<duration> — time the process spent between launch and becoming ready.
    Only shown for processes with a readiness probe / liveness probe /
    ready_log_line.

Example

$ process-compose analyze critical-chain

The time when unit became ready is printed after the "@" character.
The time the unit took to become ready is printed after the "+" character.

Project: example/compose
Started: 2026-04-18T09:01:20-07:00
Up time: 4m48.517s

app (not started) [Pending]
├─migrations (not started) [Pending]
│ ├─postgres @4min 30.148s +270.145s
│ └─setup @4ms (not ready)
├─postgres @4min 30.148s +270.145s
├─redis @2min 10.446s +130.435s
└─setup @4ms (not ready)
...

You can also restrict the output to a specific process (and its sub-chain):

$ process-compose analyze critical-chain app

What changed

  • src/types/process.go: three optional *time.Time fields on
    ProcessState (process_start_time, process_ready_time,
    process_end_time). All are omitempty, so the JSON wire format is
    backward-compatible for existing clients.
  • src/app/process.go: populates the new timestamps at existing
    lifecycle hook points:
    • run() — sets ProcessStartTime. For processes with no readiness
      signal (no readiness probe and no ready_log_line) this is also used
      as ProcessReadyTime.
    • setProcHealth(Ready) — sets ProcessReadyTime when a readiness
      probe succeeds.
    • handleOutput() — sets ProcessReadyTime when ready_log_line
      matches.
    • onProcessEnd() — sets ProcessEndTime.
  • src/cmd/analyze.go — new parent analyze Cobra command.
  • src/cmd/analyze_critical_chain.go: critical-chain [process...]
    subcommand. Implemented purely on the client side: composes the
    existing /project/state, /processes, and /graph endpoints, walks
    the dependency graph, and sorts siblings by descending ready-time
    (same order systemd-analyze uses). No new API endpoints.

Motivation

When a process-compose project has many processes with deep dependency
chains, it can be hard to see which process is the startup bottleneck.
process list shows status and age, but doesn't show how long each
process took to reach the state that unblocks its dependents, and
doesn't walk the dependency graph.

analyze critical-chain answers the question "what was the slow path?"
at a glance.

Test plan

  • Built locally with go build -v -o process-compose . on Go 1.26.
  • go vet ./... clean.
  • Ran against a real project with many processes and a mixture of
    readiness-probe / ready_log_line / no-probe processes; confirmed:
    • Top-level processes are those nothing depends on.
    • Siblings sort by descending ready-time (slowest first), matching
      systemd-analyze.
    • @ and + values match the expected wall-clock deltas.
    • Processes that never became ready are annotated (not ready) /
      (not started) and don't crash the output.
    • [Skipped], [Error], [Restarting], [Launching], etc. status
      annotations are surfaced.
    • Passing explicit process names restricts the tree to those
      sub-chains.
    • Tested with a read-only unix-socket client as well as TCP.

Backward compatibility

  • JSON additions are all omitempty pointers, so existing API consumers
    see no difference.
  • No existing commands or flags were changed.
  • No new API endpoints are introduced.

Open questions

Happy to iterate on any of these:

  • Naming: analyze critical-chain mirrors systemd-analyze. Open to
    analyze chain, analyze timing, etc. if preferred.
  • Color output: currently uses fatih/color like other commands
    (state, list). Can add a --no-color flag / respect NO_COLOR if
    desired.
  • Output format: currently text-only. A -o json mode would be a
    straightforward follow-up if there's interest.

Adds a new 'analyze' top-level command with a 'critical-chain' subcommand
that prints a tree of processes from top-level (nothing depends on them)
down through their dependencies, annotated with startup timings -- in the
spirit of 'systemd-analyze critical-chain'.

For each process two times are shown:

  @<offset>   Time after the project started that the process became ready.
              For processes without a readiness signal this is the time the
              process was launched.
  +<duration> Time the process spent between launch and becoming ready.
              Only shown for processes with a readiness probe / liveness
              probe / 'ready_log_line'.

Example output:

  app (not started) [Pending]
  ├─migrations (not started) [Pending]
  │ ├─postgres @4min 30.148s +270.145s
  │ └─setup @4ms (not ready)
  ├─postgres @4min 30.148s +270.145s
  ├─redis @2min 10.446s +130.435s
  └─setup @4ms (not ready)

To support this, three optional timestamp fields are added to ProcessState:

  - process_start_time: when the process first entered a launched/running
    state.
  - process_ready_time: when the process became ready (readiness probe
    succeeded, ready_log_line matched, or -- for processes without any
    readiness signal -- equal to process_start_time).
  - process_end_time:   when the process ended (completed / errored /
    terminated / skipped).

These fields are populated in src/app/process.go from the existing
lifecycle hook points (run(), setProcHealth(), handleOutput() ready-log
match, and onProcessEnd()), are surfaced through the existing
GET /processes endpoint, and are JSON-omitempty so existing clients are
unaffected.

The critical-chain command itself is implemented purely on the client
side -- it composes the existing /project/state, /processes, and /graph
endpoints and walks the dependency graph (sorting siblings by descending
ready-time, like systemd does), so no new server-side endpoints are
introduced.

If process names are passed as positional arguments, only those processes
and their dependency sub-chains are printed; otherwise every top-level
process (plus any isolated processes) is printed.
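
One way to pick the top-level processes is sketched below, assuming the dependency information is available as a plain map from process name to the names it depends on. That map shape and the `topLevel` function are assumptions for illustration; the actual command reads the /graph endpoint.

```go
package main

import (
	"fmt"
	"sort"
)

// topLevel returns the names that no other process depends on.
// deps maps a process name to the names it depends on; a process without
// dependencies maps to an empty slice. This map shape is an assumption.
func topLevel(deps map[string][]string) []string {
	dependedOn := make(map[string]bool)
	for _, ds := range deps {
		for _, d := range ds {
			dependedOn[d] = true
		}
	}
	var roots []string
	for name := range deps {
		if !dependedOn[name] {
			roots = append(roots, name)
		}
	}
	sort.Strings(roots) // map iteration order is random; sort for stable output
	return roots
}

func main() {
	deps := map[string][]string{
		"app":        {"migrations", "postgres", "redis", "setup"},
		"migrations": {"postgres", "setup"},
		"postgres":   {},
		"redis":      {},
		"setup":      {},
		"docs":       {}, // isolated: depends on nothing, nothing depends on it
	}
	fmt.Println(topLevel(deps))
}
```

In this model an isolated process is simply a root with no children, so it is printed alongside the other top-level processes.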

ryantm marked this pull request as ready for review April 18, 2026 18:17
Owner

F1bonacc1 left a comment

Thanks @ryantm,

This can be a super useful feature for all PC users.
I left a few minor comments.
Please run make testrace after you address those, as there is one potential race condition that I flagged.
Do you think you can add some tests to cover the most critical sections?

BTW, I like it that you reused all the existing APIs, especially the graph!

Comment on lines +237 to +244
	out := ""
	for i, p := range parts {
		if i > 0 {
			out += " "
		}
		out += p
	}
	return out
Owner

Suggested change
-	out := ""
-	for i, p := range parts {
-		if i > 0 {
-			out += " "
-		}
-		out += p
-	}
-	return out
+	return strings.Join(parts, " ")

Comment thread src/cmd/analyze.go
Comment on lines +19 to +24
	Run: func(cmd *cobra.Command, args []string) {
		if len(args) == 0 {
			_ = cmd.Help()
			os.Exit(0)
		}
	},
Owner

Suggested change
-	Run: func(cmd *cobra.Command, args []string) {
-		if len(args) == 0 {
-			_ = cmd.Help()
-			os.Exit(0)
-		}
-	},

Cobra auto-prints help when a parent command has no Run and no subcommand is provided.

Comment on lines +126 to +137
func readyOffsetForSort(s *types.ProcessState, projectStart time.Time) time.Duration {
	if s == nil {
		return time.Duration(1<<62 - 1)
	}
	if s.ProcessReadyTime != nil {
		return s.ProcessReadyTime.Sub(projectStart)
	}
	if s.ProcessStartTime != nil {
		return s.ProcessStartTime.Sub(projectStart)
	}
	return time.Duration(1<<62 - 1)
}
Owner

Suggested change
-func readyOffsetForSort(s *types.ProcessState, projectStart time.Time) time.Duration {
-	if s == nil {
-		return time.Duration(1<<62 - 1)
-	}
-	if s.ProcessReadyTime != nil {
-		return s.ProcessReadyTime.Sub(projectStart)
-	}
-	if s.ProcessStartTime != nil {
-		return s.ProcessStartTime.Sub(projectStart)
-	}
-	return time.Duration(1<<62 - 1)
-}
+func readyOffsetForSort(s *types.ProcessState, projectStart time.Time) time.Duration {
+	const unreadySortOrder = time.Duration(math.MaxInt64)
+	if s == nil {
+		return unreadySortOrder
+	}
+	if s.ProcessReadyTime != nil {
+		return s.ProcessReadyTime.Sub(projectStart)
+	}
+	if s.ProcessStartTime != nil {
+		return s.ProcessStartTime.Sub(projectStart)
+	}
+	return unreadySortOrder
+}

for _, name := range args {
	node, ok := graph.AllNodes[name]
	if !ok {
		fmt.Fprintf(os.Stderr, "unknown process (or process has no dependencies): %s\n", name)
Owner

That's true for the graph, but if the user does name an existing isolated process, the CLI could still render a timings line for it (you already do this when len(args) == 0). Consider:

node, ok := graph.AllNodes[name]
if !ok {
	if _, exists := stateByName[name]; exists {
		node = &types.DependencyNode{Name: name}
	} else {
		fmt.Fprintf(os.Stderr, "unknown process: %s\n", name)
		os.Exit(1)
	}
}
roots = append(roots, node)

Comment thread src/app/process.go
Comment on lines 750 to +754
	p.procState.Health = types.ProcessHealthReady
	if p.procState.ProcessReadyTime == nil {
		now := time.Now()
		p.procState.ProcessReadyTime = &now
	}
Owner

We should protect this with stateMtx, as you already do in other places. The easiest fix is to reuse setProcHealth.

Suggested change
-	p.procState.Health = types.ProcessHealthReady
-	if p.procState.ProcessReadyTime == nil {
-		now := time.Now()
-		p.procState.ProcessReadyTime = &now
-	}
+	p.setProcHealth(types.ProcessHealthReady)
