AI-6670 Add recommended monitors for NiFi integration #23303

Closed
philjlee wants to merge 1 commit into philip.lee/AI-6668-nifi from philip.lee/AI-6670-nifi-monitors

@philjlee philjlee commented Apr 13, 2026

Jira: AI-6670
Epic: AI-6662
Stacks on: #23110

Summary

  • Add 5 recommended monitors for the NiFi integration covering key operational failure modes
  • Thresholds calibrated against a survey of 406 existing monitors across the repo

Monitors

| Monitor | Type | Critical | Warning | Metric |
| --- | --- | --- | --- | --- |
| Instance unreachable | service check | 2 consecutive failures | - | nifi.can_connect |
| JVM heap utilization high | query alert | 85% | 75% | nifi.system.jvm.heap_utilization |
| Connection backpressure high | query alert | 80% | 60% | nifi.connection.percent_use_count |
| Content repo disk high | query alert | 85% | 75% | nifi.system.content_repo.utilization |
| Invalid processors detected | query alert | >0 | - | nifi.flow.invalid_count |

Threshold rationale

  • JVM heap 85/75: Follows the JVM convention (sonatype_nexus uses 80%), raised to 85% because NiFi heap is spikier and 80% would be noisy
  • Backpressure 80/60: NiFi backpressure at 100% is a hard stop (not graceful degradation like connection pools at 90/80), so earlier warning is needed
  • Content repo 85/75: 100% halts all flow processing; tighter than generic disk (80%) but less noisy than ActiveMQ's 95%
  • Invalid processors >0: Matches system-critical count pattern (k8s crashes, Kafka offline partitions); invalid processors don't self-heal
  • Service check critical=2: Matches Consul pattern; suppresses transient network blips
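
As a rough illustration of how the thresholds above would classify a sample, here is a minimal Python sketch. It assumes the utilization metrics are reported on a 0-100 percent scale, as the table writes them. The `THRESHOLDS` dictionary, `classify`, and `service_check_critical` are hypothetical names used only for this example; they are not part of the PR or of any Datadog API.

```python
# Thresholds from the rationale above, assumed to be on a 0-100 percent scale.
# The layout and names here are illustrative only, not taken from the PR.
THRESHOLDS = {
    "nifi.system.jvm.heap_utilization": {"critical": 85.0, "warning": 75.0},
    "nifi.connection.percent_use_count": {"critical": 80.0, "warning": 60.0},
    "nifi.system.content_repo.utilization": {"critical": 85.0, "warning": 75.0},
    "nifi.flow.invalid_count": {"critical": 0.0},  # any invalid processor alerts
}


def classify(metric, value):
    """Return 'critical', 'warning', or 'ok' for a single metric sample."""
    levels = THRESHOLDS[metric]
    if value > levels["critical"]:
        return "critical"
    warning = levels.get("warning")
    if warning is not None and value > warning:
        return "warning"
    return "ok"


def service_check_critical(results, required_failures=2):
    """True when the most recent `required_failures` results all failed.

    `results` is a list of booleans, oldest first, where True means the
    check succeeded.
    """
    recent = results[-required_failures:]
    return len(recent) == required_failures and not any(recent)
```

With these definitions, a heap sample of 80.0 falls in the warning band, while a single failed connection attempt surrounded by successes does not reach critical, which is the "transient blip" case the service check threshold is meant to suppress.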

Test plan

  • ddev validate config -s nifi passes
  • ddev validate models -s nifi passes
  • All monitor JSON files are valid v2 schema with correct tags
  • All manifest.json paths resolve to existing files
  • APW validator passes (runs as GitHub check)
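
The "paths resolve to existing files" item can also be checked by hand. The sketch below is one possible way to do that, not the validation that ddev performs. It assumes the manifest has a top-level `assets` object whose string leaves ending in `.json` are paths relative to the integration directory; that layout and the helper names are assumptions made for illustration.

```python
from pathlib import Path


def iter_string_leaves(node):
    """Yield every string found in a nested structure of dicts and lists."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_string_leaves(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_string_leaves(value)


def missing_asset_paths(manifest, root):
    """Return the .json asset paths in `manifest` that are not files under `root`.

    Assumes asset paths live under a top-level "assets" key (hypothetical layout).
    """
    assets = manifest.get("assets", {})
    return sorted(
        path
        for path in iter_string_leaves(assets)
        if path.endswith(".json") and not (Path(root) / path).is_file()
    )
```

Calling `missing_asset_paths(json.load(open("manifest.json")), ".")` from the integration directory would return an empty list when every referenced file exists.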

5 monitors covering key NiFi operational scenarios:
- can_connect: service check for API reachability
- jvm_heap_utilization: JVM heap > 85% (warn 75%)
- connection_backpressure: backpressure > 80% (warn 60%)
- content_repo_utilization: content repo disk > 85% (warn 75%)
- invalid_processors: any invalid processor count > 0

Thresholds calibrated against 406 existing monitors in the repo.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b6d4b5f69d

Comment on lines +24 to +25
"critical": 0.85,
"warning": 0.75

P1: Use percent-scale thresholds for NiFi utilization alerts

These thresholds are on the wrong scale for NiFi metrics. The check submits utilization and backpressure as 0..100 percentages: for example, _parse_utilization('16.0%') returns 16.0 in nifi/datadog_checks/nifi/check.py, and the unit tests assert nifi.system.jvm.heap_utilization=20.0 and nifi.connection.percent_use_count=10. This monitor, however, uses 0.85/0.75 (and the other new monitor files similarly use 0.8/0.6). The alert therefore triggers above 0.85%, so it will stay in alert on almost any healthy host and produce persistent false positives.
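
The scale mismatch described in this comment can be reproduced in a few lines. The sketch below uses `parse_utilization` as a hypothetical stand-in for the check's `_parse_utilization` (only the '16.0%' to 16.0 behaviour quoted in the review is assumed), and `is_alerting` as an illustrative comparison rather than Datadog's actual evaluation logic.

```python
def parse_utilization(text):
    """Hypothetical stand-in for the check's parser: '16.0%' -> 16.0."""
    return float(text.strip().rstrip("%"))


def is_alerting(value, critical):
    """Illustrative threshold comparison: alert when the value exceeds critical."""
    return value > critical


healthy_heap = parse_utilization("16.0%")  # a healthy host at 16% heap

# Fraction-scale threshold from the PR: 16.0 > 0.85, so the monitor fires.
print(is_alerting(healthy_heap, 0.85))  # True

# Percent-scale threshold: 16.0 > 85 is false, so the monitor stays quiet.
print(is_alerting(healthy_heap, 85))  # False
```

Any host above 0.85% utilization would trip the fraction-scale threshold, which is why the review expects the monitor to sit in alert permanently.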

AAraKKe commented Apr 14, 2026

Hi @philjlee, since this is a new integration I don't think we want to handle assets here but instead we should handle assets through developer platform. Any reason we have added a manifest file to this integration and are handling assets here? See this for how to build them without the manifest.

@philjlee

> Hi @philjlee, since this is a new integration I don't think we want to handle assets here but instead we should handle assets through developer platform. Any reason we have added a manifest file to this integration and are handling assets here? See this for how to build them without the manifest.

Wow, I got this very wrong! Let me close this PR and add the assets the correct way.

@philjlee philjlee closed this Apr 14, 2026
@philjlee philjlee deleted the philip.lee/AI-6670-nifi-monitors branch April 14, 2026 13:51