Commit cef86c5

blog about check_vsphere cluster-health
1 parent 2eb64da commit cef86c5

1 file changed

Lines changed: 76 additions & 0 deletions

@@ -0,0 +1,76 @@
---
title: "Monitoring vSphere cluster health with check_vsphere"
date: 2026-04-01
---

## What's new?

The `cluster-health` command in **check_vsphere** inspects the members of a
vSphere cluster, checks their state, and decides whether the cluster as a whole
is healthy. By default it treats nodes that are *disconnected* or *in
maintenance* as faulty; use `--faulty` to customize which states count as a
failure.

## How the threshold works

The `--cluster-threshold` flag tells the command when to raise a warning or a
critical alert. Its format is:

```
[max_members:]warn_threshold:crit_threshold
```

* `max_members` (optional) – Apply the rule to clusters with up to this many members.
* `warn_threshold` – Number or percentage of faulty nodes that triggers a **WARN**.
* `crit_threshold` – Number or percentage of faulty nodes that triggers a **CRIT**.

You can pass `--cluster-threshold` several times to cover different cluster
sizes. A rule applies to every cluster with at most `max_members` members; if
several rules match, the one with the smallest `max_members` wins. One rule must
omit `max_members`; it serves as the fallback for clusters that no other rule
covers.

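The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the selection logic as this post describes it, not the actual check_vsphere code; the names `Rule`, `parse_rule` and `select_rule` are made up for the example.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    max_members: Optional[int]  # None marks the fallback rule
    warn: str                   # e.g. "1" or "10%"
    crit: str


def parse_rule(spec: str) -> Rule:
    """Parse '[max_members:]warn_threshold:crit_threshold'."""
    parts = spec.split(":")
    if len(parts) == 2:
        return Rule(None, parts[0], parts[1])
    if len(parts) == 3:
        return Rule(int(parts[0]), parts[1], parts[2])
    raise ValueError(f"invalid threshold spec: {spec!r}")


def select_rule(rules: List[Rule], members: int) -> Rule:
    """Pick the matching rule with the smallest max_members, else the fallback."""
    sized = [r for r in rules
             if r.max_members is not None and members <= r.max_members]
    if sized:
        return min(sized, key=lambda r: r.max_members)
    fallback = [r for r in rules if r.max_members is None]
    if not fallback:
        raise ValueError("one rule must omit max_members")
    return fallback[0]


rules = [parse_rule(s) for s in ("3:1:1", "5:1:3", "10:2:5", "50:5:15", "10%:20%")]
print(select_rule(rules, 4))    # matches 5, 10 and 50; the 5:1:3 rule wins
print(select_rule(rules, 200))  # no sized rule matches; the fallback applies
```

Because the smallest matching `max_members` wins, the order in which the flags are given should not matter in this sketch.
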
## Quick examples

* `3:1:1` - For clusters of up to 3 nodes: a single faulty node is already critical, because the warning and critical thresholds are equal.
* `5:1:3` - For clusters of up to 5 nodes: warning at >=1 faulty node, critical at >=3.
* `10:2:5` - For clusters of up to 10 nodes: warning at >=2 faulty nodes, critical at >=5.
* `50:5:15` - For clusters of up to 50 nodes: warning at >=5 faulty nodes, critical at >=15.
* `10%:20%` - Fallback for larger clusters: warning at 10% faulty nodes, critical at 20%.

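To make the examples concrete, here is a rough Python sketch of how a single rule could be turned into a state. It assumes that thresholds are compared with `>=` (as in the examples above), that percentages refer to the total number of cluster members, and that critical takes precedence over warning. The helper names `reached` and `state` are hypothetical and not part of check_vsphere.

```python
def reached(threshold: str, faulty: int, members: int) -> bool:
    """True if the faulty count meets an absolute or percentage threshold."""
    if threshold.endswith("%"):
        # Assumption: percentages are relative to the total member count.
        percent = float(threshold[:-1])
        return members > 0 and faulty * 100 >= percent * members
    return faulty >= int(threshold)


def state(warn: str, crit: str, faulty: int, members: int) -> str:
    # Critical is checked first, so it wins when both thresholds are equal.
    if reached(crit, faulty, members):
        return "CRIT"
    if reached(warn, faulty, members):
        return "WARN"
    return "OK"


print(state("1", "1", faulty=1, members=3))         # rule 3:1:1
print(state("1", "3", faulty=2, members=5))         # rule 5:1:3
print(state("10%", "20%", faulty=5, members=100))   # fallback rule
```

With these inputs the sketch prints `CRIT`, `WARN` and `OK`: one fault in a 3-node cluster is critical, two faults in a 5-node cluster only warn, and 5% faulty nodes in a 100-node cluster stay below the 10% warning threshold.
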
## Usage snippet

```bash
check_vsphere cluster-health \
  --host vcenter.example.com \
  -u naemon@vsphere.local \
  --cluster-threshold 3:1:1 \
  --cluster-threshold 5:1:3 \
  --cluster-threshold 10:2:5 \
  --cluster-threshold 50:5:15 \
  --cluster-threshold '10%:20%' \
  --cluster-name MyCluster
```

## Naemon integration

```
define command{
  command_name check_vsphere_cluster_health
  command_line VSPHERE_PASS=$ARG4$ $USER2$/check_vsphere cluster-health \
    -u $ARG3$ \
    --host $ARG1$ \
    --cluster-name $ARG2$ \
    --cluster-threshold 3:1:1 \
    --cluster-threshold 5:1:3 \
    --cluster-threshold 10:2:5 \
    --cluster-threshold 50:5:15 \
    --cluster-threshold '10%:20%'
}

define service{
  use generic-service
  host_name vcenter.example.com
  service_description vSphere Cluster Health
  check_command check_vsphere_cluster_health!vcenter.example.com!MyCluster!user!pw
}
```
