dolphindb exporter

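A tool for exporting DolphinDB monitoring metrics to Prometheus. Features:

- Collects monitoring metrics from a DolphinDB cluster concurrently
- Supports multiple metric types
- Supports custom metrics
- Reports the scrape duration of each metric
- Supports monitoring only the nodes running on the local machine
- Supports monitoring the machine that hosts the exporter
- Supports self-monitoring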

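Dependencies

- Prometheus, any version; the latest is recommended.
- Prometheus Alertmanager, any version; the latest is recommended.
- Grafana 9 or later; the latest is recommended.
- dolphindb-datasource-next data source plugin, latest version.

Installation

Download the binary release package

Download the binary release package from the release page.

Build from source

Clone the repository: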

git clone https://dolphindb.net/aftersale/dolphindb_exporter.git

Build the project:
```
cd dolphindb_exporter
make build
```

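Install with Docker

Build from source, or copy the Dockerfile from the project root into the extracted release package folder, then build the image with the following command:

docker build -t dolphindb_exporter .

Note that the port and the config folder must be bound when starting the container: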

docker run -p 8000:8000 -v /path/to/clusterDemo/config:/app/config localhost/dolphindb_exporter

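Configuration

Configure the exporter through environment variables:

DOLPHINDB_USERNAME: DolphinDB username. Defaults to admin.
DOLPHINDB_PASSWORD: DolphinDB password. Defaults to 123456.

It is recommended to create an administrator user dedicated to the exporter, for example: createUser("dolphindb_exporter", "123456", isAdmin=true)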
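
As a rough illustration of the documented defaults, the following Python sketch resolves the credentials the way the text above describes: read each environment variable and fall back to the documented default when it is unset. The function name `resolve_credentials` is hypothetical and is not part of the exporter.

```python
import os

# Defaults documented above. The exporter is a separate program;
# this only mirrors the documented fallback behaviour.
DEFAULT_USERNAME = "admin"
DEFAULT_PASSWORD = "123456"


def resolve_credentials(env=None):
    """Return (username, password), using the documented defaults when unset."""
    env = os.environ if env is None else env
    username = env.get("DOLPHINDB_USERNAME", DEFAULT_USERNAME)
    password = env.get("DOLPHINDB_PASSWORD", DEFAULT_PASSWORD)
    return username, password


if __name__ == "__main__":
    user, _ = resolve_credentials()
    print(f"exporter would connect as: {user}")
```

Setting both variables before launching the exporter overrides the defaults, which is advisable whenever the dedicated user has a non-default password.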

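Parameters

-collect.<scraper_name>: whether to collect metrics of a specific category (default: true)
  - <scraper_name> is the metric category; run --help to list all categories
-config-dir: path to the configuration directory (default: "config")
  - The directory that contains the DolphinDB cluster configuration
  - The configuration files should match the DolphinDB cluster configuration
-listen-port: listening port (default: 8000)
-local-ip-only: monitor only nodes on the local machine (default: false)
  - When enabled, only metrics of DolphinDB nodes on the machine running the exporter are collected
-log-level: log level (default: info)
-pool-size: connection pool size (default: 10)
-reconnect-num: number of connection pool reconnection attempts (default: 2)
-version: print version information and exit
-help: print help information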

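Usage

Start the exporter

The recommended installation path is <DolphinDB installation directory>/clusterDemo/dolphindb_exporter. With this layout, the startExporter.sh script in the release package can be used directly to start the exporter in the background: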

cd clusterDemo/dolphindb_exporter
bash startExporter.sh
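# Port 8000 is used by default; to use a different port, pass it as an argument
# bash startExporter.sh 8001

The startup script monitors the nodes on the local machine according to the configuration files in the ../config directory. If the cluster uses a different configuration folder, modify the -config-dir parameter in the script.

To monitor a single node, edit the cluster.nodes file in the config folder so that it contains one record whose mode is single, for example: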

localSite,mode,computeGroup,zone
192.168.100.44:8848:local8848,single,,
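
To make the record layout concrete, here is a minimal Python sketch that parses a cluster.nodes file of the shape shown above: a header row followed by comma-separated records whose localSite field has the form host:port:alias. The helper name `parse_cluster_nodes` is hypothetical, and this is not necessarily how the exporter itself reads the file.

```python
import csv
import io


def parse_cluster_nodes(text):
    """Parse cluster.nodes content into a list of dicts, one per node record."""
    reader = csv.DictReader(io.StringIO(text.strip()))
    nodes = []
    for row in reader:
        # localSite is "host:port:alias", as in the example above.
        host, port, alias = row["localSite"].split(":")
        nodes.append({
            "host": host,
            "port": int(port),
            "alias": alias,
            "mode": row["mode"],
        })
    return nodes


SAMPLE = """localSite,mode,computeGroup,zone
192.168.100.44:8848:local8848,single,,
"""

if __name__ == "__main__":
    for node in parse_cluster_nodes(SAMPLE):
        print(node)
```

A quick check like this can catch a malformed localSite entry before the exporter is started.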

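After startup, the log file dolphindb_exporter.log is generated in the same directory.

Configure Prometheus to scrape metrics

Edit the Prometheus configuration file (located at /etc/prometheus/prometheus.yml by default) and add rule_files and scrape_configs.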

rule_files:
  - "/path/to/rules.yaml" # change to the actual path
  - "/path/to/alerts.yaml" # change to the actual path

scrape_configs:
  - job_name: "dolphindb_exporter_targets_<MACHINE-ID>" # must be unique
    http_sd_configs:
      - url: http://<DOLPHINDB-EXPORTER-HOSTNAME>:8000/targets # change to the actual address
        refresh_interval: 1m
    metrics_path: /probe
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: <DOLPHINDB-EXPORTER-HOSTNAME>:8000 # change to the actual address
  - job_name: 'dolphindb_exporter_<MACHINE-ID>' # must be unique
    static_configs:
      - targets: [<DOLPHINDB-EXPORTER-HOSTNAME>:8000] # change to the actual address
  - job_name: 'dolphindb_exporter_machine_<MACHINE-ID>' # must be unique
    static_configs:
      - targets: [<DOLPHINDB-EXPORTER-HOSTNAME>:8000] # change to the actual address
    metrics_path: /machine_metrics

Notes:

- Replace <MACHINE-ID> with a machine ID of your choice so that every job_name is unique across machines, for example job_name: "dolphindb_exporter_targets_1"
- Replace <DOLPHINDB-EXPORTER-HOSTNAME> with the address of the server where dolphindb_exporter runs
- Point the rules.yaml and alerts.yaml paths to the files of the same name in the release package
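
The relabel_configs of the first job follow the common Prometheus pattern for probing through an exporter: each address discovered from /targets becomes the target URL parameter and the instance label, and the scrape request itself is redirected to the exporter. The Python sketch below mimics those three rules on a plain dict. It assumes that /targets returns node addresses of the form host:port; the function names and the example addresses are made up for illustration.

```python
from urllib.parse import urlencode


def apply_relabeling(discovered_address, exporter_address):
    """Mimic the three relabel rules of the dolphindb_exporter_targets job."""
    labels = {"__address__": discovered_address}
    # Rule 1: copy the discovered node address into the `target` URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: expose the node address as the `instance` label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: send the actual scrape request to the exporter instead.
    labels["__address__"] = exporter_address
    return labels


def scrape_url(labels, metrics_path="/probe"):
    """Build the URL that would be requested after relabeling."""
    query = urlencode({"target": labels["__param_target"]})
    return f"http://{labels['__address__']}{metrics_path}?{query}"


if __name__ == "__main__":
    result = apply_relabeling("192.168.100.44:8848", "exporter-host:8000")
    print(result["instance"])
    print(scrape_url(result))
```

The effect is that every DolphinDB node appears in Prometheus under its own instance label, even though all requests go to the single exporter address.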

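Configure Grafana to display metrics

In Grafana, add a Prometheus data source and a DolphinDB data source pointing to any data node, then import the JSON template file from the release package on the Dashboards page.

Configure Alertmanager alerts

Refer to the official Prometheus Alertmanager documentation. Taking an Outlook mailbox with SMTP enabled as an example, the configuration steps are as follows:

Edit the configuration file alertmanager.yml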

global:
  resolve_timeout: 1m
  smtp_smarthost: 'smtp-mail.outlook.com:587'
  smtp_from: 'alertsender@example.com'            # change to the sender's email address
  smtp_auth_username: 'alertsender@example.com'   # change to the sender's email address
  smtp_auth_password: 'password'                  # SMTP authorization code or password of the sender's mailbox
  smtp_require_tls: true

route:
  receiver: 'email'  
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 1h

receivers:
- name: 'email'
  email_configs:
  - to: 'alertreceiver@example.com'               # change to the recipient's email address
    send_resolved: true

Start Alertmanager

mkdir -p ./data
nohup ./alertmanager --config.file=./alertmanager.yml --storage.path=./data --web.listen-address=":<ALERTMANAGER-PORT>" --log.level=info >> alertmanager.log 2>&1 &  # change to the actual port

Configure Prometheus

Edit prometheus.yml, add the Alertmanager configuration, and restart Prometheus

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - <ALERTMANAGER-HOST>:<ALERTMANAGER-PORT>  # change to the actual address and port

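Metrics reference

| Metric | Category | Type | Description | Minimum DolphinDB version | Enabled by default |
| --- | --- | --- | --- | --- | --- |
| dolphindb_up | none | Gauge | Whether DolphinDB is online | 1.0 | true |
| dolphindb_perf_cpu_usage | perf | Gauge | Percentage of CPU used by the DolphinDB process (unit: none) | 1.0 | true |
| dolphindb_perf_memory_used | perf | Gauge | Memory used by the node (unit: bytes) | 1.0 | true |
| dolphindb_perf_memory_alloc | perf | Gauge | Current capacity of the DolphinDB memory pool on the node (unit: bytes) | 1.0 | true |
| dolphindb_perf_disk_capacity | perf | Gauge | Disk capacity (unit: bytes) | 1.0 | true |
| dolphindb_perf_disk_free_space | perf | Gauge | Free disk space (unit: bytes) | 1.0 | true |
| dolphindb_perf_last_minute_write_volume | perf | Gauge | Volume written to disk in the last minute (unit: bytes) | 1.0 | true |
| dolphindb_perf_last_minute_read_volume | perf | Gauge | Volume read from disk in the last minute (unit: bytes) | 1.0 | true |
| dolphindb_perf_last_minute_network_recv | perf | Gauge | Bytes received over the network in the last minute (unit: bytes) | 1.0 | true |
| dolphindb_perf_last_minute_network_send | perf | Gauge | Bytes sent over the network in the last minute (unit: bytes) | 1.0 | true |
| dolphindb_perf_disk_read_rate | perf | Gauge | Disk read rate (unit: bytes/second) | 1.0 | true |
| dolphindb_perf_disk_write_rate | perf | Gauge | Disk write rate (unit: bytes/second) | 1.0 | true |
| dolphindb_perf_network_send_rate | perf | Gauge | Network send rate (unit: bytes/second) | 1.0 | true |
| dolphindb_perf_network_recv_rate | perf | Gauge | Network receive rate (unit: bytes/second) | 1.0 | true |
| dolphindb_perf_cum_msg_latency | perf | Gauge | Average latency of all messages received by the streaming subscriber node (unit: nanoseconds) | 1.0 | true |
| dolphindb_perf_last_msg_latency | perf | Gauge | Latency of the last message received by the streaming subscriber node (unit: nanoseconds) | 1.0 | true |
| dolphindb_perf_max_last10_query_time | perf | Gauge | Maximum execution time of the last 10 completed queries (unit: nanoseconds) | 1.0 | true |
| dolphindb_perf_med_last10_query_time | perf | Gauge | Median execution time of the last 10 completed queries (unit: nanoseconds) | 1.0 | true |
| dolphindb_perf_med_last100_query_time | perf | Gauge | Median execution time of the last 100 completed queries (unit: nanoseconds) | 1.0 | true |
| dolphindb_perf_max_last100_query_time | perf | Gauge | Maximum execution time of the last 100 completed queries (unit: nanoseconds) | 1.0 | true |
| dolphindb_perf_max_running_query_time | perf | Gauge | Maximum elapsed time among the queries currently running (unit: nanoseconds) | 1.0 | true |
| dolphindb_perf_avg_load | perf | Gauge | Average load (unit: none) | 1.0 | true |
| dolphindb_perf_job_load | perf | Gauge | Job load (unit: none) | 1.0 | true |
| dolphindb_perf_running_jobs | perf | Gauge | Number of running jobs and tasks (unit: none) | 1.0 | true |
| dolphindb_perf_queued_jobs | perf | Gauge | Number of queued jobs and tasks (unit: none) | 1.0 | true |
| dolphindb_perf_connection_num | perf | Gauge | Number of connections to the node | 1.0 | true |
| dolphindb_cluster_chunk_status_incomplete | cluster_chunk_status | Gauge | Number of chunks in an incomplete state in the DolphinDB cluster (unit: none) | 1.0 | false |
| dolphindb_cluster_chunk_status_count | cluster_chunk_status | Gauge | Number of chunks per database (unit: none) | 1.0 | false |
| dolphindb_get_running_queries_total_count | get_running_queries | Gauge | Total number of queries currently running | 1.0 | true |
| dolphindb_get_running_queries_count | get_running_queries | Gauge | Number of queries currently running, per IP and per user | 1.0 | true |
| dolphindb_get_recovery_task_status_total_count | get_recovery_task_status | Gauge | Total number of recovery tasks | 1.0 | true |
| dolphindb_get_recovery_task_status_waiting_count | get_recovery_task_status | Gauge | Number of waiting recovery tasks | 1.0 | true |
| dolphindb_get_recovery_task_status_in_progress_count | get_recovery_task_status | Gauge | Number of recovery tasks in progress | 1.0 | true |
| dolphindb_get_recovery_task_status_finished_count | get_recovery_task_status | Gauge | Number of finished recovery tasks | 1.0 | true |
| dolphindb_get_recovery_task_status_aborted_count | get_recovery_task_status | Gauge | Number of aborted recovery tasks | 1.0 | true |
| dolphindb_get_session_memory_stat_dimensional_table_bytes | get_session_memory_stat | Gauge | Memory used by dimension tables (unit: bytes) | 1.0 | true |
| dolphindb_get_session_memory_stat_shared_table_bytes | get_session_memory_stat | Gauge | Memory used by shared tables (unit: bytes) | 1.0 | true |
| dolphindb_get_session_memory_stat_olap_tablet_bytes | get_session_memory_stat | Gauge | Memory used by OLAP tables (unit: bytes) | 1.0 | true |
| dolphindb_get_session_memory_stat_olap_cache_engine_bytes | get_session_memory_stat | Gauge | Memory used by the OLAP cache engine (unit: bytes) | 1.0 | true |
| dolphindb_get_session_memory_stat_olap_cached_symbol_base_bytes | get_session_memory_stat | Gauge | Memory used by OLAP cached symbol bases (unit: bytes) | 1.0 | true |
| dolphindb_get_session_memory_stat_dfs_metadata_bytes | get_session_memory_stat | Gauge | Memory used to store DFS metadata (unit: bytes) | 1.0 | true |
| dolphindb_get_session_memory_stat_tsdb_cache_engine_bytes | get_session_memory_stat | Gauge | Memory used by the TSDB cache engine (unit: bytes) | 2.0 | true |
| dolphindb_get_session_memory_stat_tsdb_level_file_index_bytes | get_session_memory_stat | Gauge | Memory used by the TSDB level file index (unit: bytes) | 2.0 | true |
| dolphindb_get_session_memory_stat_tsdb_cached_symbol_base_bytes | get_session_memory_stat | Gauge | Memory used by TSDB cached symbol bases (unit: bytes) | 2.0 | true |
| dolphindb_get_session_memory_stat_iotdb_latest_key_cache_bytes | get_session_memory_stat | Gauge | Memory used by the IoTDB latest key cache (unit: bytes) | 3.0 | true |
| dolphindb_get_session_memory_stat_iotdb_static_table_cache_bytes | get_session_memory_stat | Gauge | Memory used by the IoTDB static table cache (unit: bytes) | 3.0 | true |
| dolphindb_get_session_memory_stat_streaming_pub_queue | get_session_memory_stat | Gauge | Depth of the streaming publish queue | 1.0 | true |
| dolphindb_get_session_memory_stat_streaming_sub_queue | get_session_memory_stat | Gauge | Depth of the streaming subscribe queue | 1.0 | true |
| dolphindb_get_session_memory_stat_olap_cache_engine_capacity_bytes | get_session_memory_stat | Gauge | Total capacity of the OLAP cache engine (unit: bytes) | 1.0 | true |
| dolphindb_get_session_memory_stat_tsdb_cache_engine_capacity_bytes | get_session_memory_stat | Gauge | Total capacity of the TSDB cache engine (unit: bytes) | 2.0 | true |
| dolphindb_get_session_memory_stat_user_mem_bytes | get_session_memory_stat | Gauge | Total memory used by users under each IP (unit: bytes) | 1.0 | true |
| dolphindb_get_console_jobs_total_count | get_console_jobs | Gauge | Total number of console jobs currently running | 1.0 | true |
| dolphindb_get_console_jobs_count | get_console_jobs | Gauge | Number of console jobs currently running, per IP and per user | 1.0 | true |
| dolphindb_get_recent_jobs_total_count | get_recent_jobs | Gauge | Total number of batch jobs currently running | 1.0 | true |
| dolphindb_get_recent_jobs_count | get_recent_jobs | Gauge | Number of batch jobs currently running, per IP and per user | 1.0 | true |
| dolphindb_get_recent_jobs_error_count | get_recent_jobs | Gauge | Cumulative number of failed batch jobs, per IP and per user | 1.0 | true |
| dolphindb_get_transaction_status_total_count | get_transaction_status | Gauge | Total number of transactions currently in progress | 1.0 | true |
| dolphindb_get_transaction_status_count | get_transaction_status | Gauge | Number of transactions currently in progress, grouped by transaction type | 1.0 | true |
| dolphindb_get_streaming_stat_sub_workers_total_count | get_streaming_stat | Gauge | Total number of subscription worker threads | 1.0 | true |
| dolphindb_get_streaming_stat_sub_workers_error_count | get_streaming_stat | Gauge | Number of subscription worker threads with errors in the last minute | 1.0 | true |
| dolphindb_get_streaming_stat_sub_workers_queue_depth | get_streaming_stat | Gauge | Queue depth per worker thread ID | 1.0 | true |
| dolphindb_get_streaming_stat_sub_conns_total_count | get_streaming_stat | Gauge | Total number of subscription connections | 1.0 | true |
| dolphindb_get_streaming_stat_sub_conns_last_msg_latency | get_streaming_stat | Gauge | Latency of the last message per publish queue (unit: nanoseconds) | 1.0 | true |
| dolphindb_get_streaming_stat_pub_conns_queue_depth | get_streaming_stat | Gauge | Publish queue depth per worker thread ID | 1.0 | true |
| dolphindb_get_streaming_stat_pub_tables_total_count | get_streaming_stat | Gauge | Total number of published stream tables | 1.0 | true |
| dolphindb_get_streaming_stat_pub_tables_count | get_streaming_stat | Gauge | Number of published tables per subscriber IP | 1.0 | true |
| dolphindb_get_streaming_stat_udp_pub_tables_total_count | get_streaming_stat | Gauge | Total number of UDP published tables | 1.0 | true |
| dolphindb_get_stream_engine_stat_count | get_stream_engine_stat | Gauge | Number of stream engines (grouped by type) | 1.0 | true |
| dolphindb_get_stream_engine_stat_error_count | get_stream_engine_stat | Gauge | Stream engine error count (grouped by type) | 1.0 | true |
| dolphindb_get_stream_engine_stat_mem_bytes | get_stream_engine_stat | Gauge | Stream engine memory usage (grouped by type, unit: bytes) | 1.0 | true |
| dolphindb_get_replication_status_is_master | get_replication_stat | Gauge | Whether the cluster is the master cluster of asynchronous replication | 2.00.9 | true |
| dolphindb_get_replication_status_total_tasks | get_replication_stat | Gauge | Number of asynchronous replication tasks in the last minute | 2.00.9 | true |
| dolphindb_get_replication_status_truncated_tasks | get_replication_stat | Gauge | Number of tasks reclaimed by the master cluster in the last minute | 2.00.9 | true |
| dolphindb_get_replication_status_completed_tasks | get_replication_stat | Gauge | Number of tasks completed by the slave cluster in the last minute | 2.00.9 | true |
| dolphindb_get_replication_status_failed_tasks | get_replication_stat | Gauge | Number of tasks failed on the slave cluster in the last minute | 2.00.9 | true |
| dolphindb_get_replication_status_waiting_tasks | get_replication_stat | Gauge | Number of tasks waiting on the slave cluster in the last minute | 2.00.9 | true |
| dolphindb_get_replication_status_executing_tasks | get_replication_stat | Gauge | Number of tasks executing on the slave cluster in the last minute | 2.00.9 | true |
| dolphindb_config_max_mem_gigabytes | config | Gauge | Maximum memory of the node (unit: GB) | 1.0 | true |
| dolphindb_config_max_cores | config | Gauge | Maximum number of cores of the node | 1.0 | true |
| dolphindb_config_max_connections | config | Gauge | Maximum number of connections of the node | 1.0 | true |
| dolphindb_config_max_sub_queue_depth | config | Gauge | Upper limit of the subscriber-side queue depth of the node | 1.0 | true |
| dolphindb_config_max_pub_queue_depth | config | Gauge | Upper limit of the publisher-side queue depth of the node | 1.0 | true |
| dolphindb_config_license_expiration | config | Gauge | License expiration time | 1.0 | true |
| dolphindb_get_cluster_volume_usage_bytes | get_cluster_volume | Gauge | Usage of each volume of the node | 3.00.4 | true |
| dolphindb_get_cluster_volume_capacity_bytes | get_cluster_volume | Gauge | Total capacity of each volume of the node | 3.00.4 | true |
| machine_up | none | Gauge | Whether the machine hosting the exporter is online | no requirement | true |
| machine_system_perf_cpu_counts | system_perf | Gauge | Number of logical CPU cores on the machine hosting the exporter | no requirement | true |
| machine_system_perf_cpu_percent | system_perf | Gauge | CPU utilization of the machine hosting the exporter | no requirement | true |
| machine_system_perf_disk_io_read_bytes | system_perf | Gauge | Total bytes read from disk on the machine hosting the exporter | no requirement | true |
| machine_system_perf_disk_io_write_bytes | system_perf | Gauge | Total bytes written to disk on the machine hosting the exporter | no requirement | true |
| machine_system_perf_disk_total_bytes | system_perf | Gauge | Total disk capacity of the machine hosting the exporter | no requirement | true |
| machine_system_perf_disk_usage_bytes | system_perf | Gauge | Disk space used on the machine hosting the exporter | no requirement | true |
| machine_system_perf_disk_usage_percent | system_perf | Gauge | Disk utilization of the machine hosting the exporter | no requirement | true |
| machine_system_perf_network_io_read_bytes | system_perf | Gauge | Total bytes read from the network on the machine hosting the exporter | no requirement | true |
| machine_system_perf_network_io_write_bytes | system_perf | Gauge | Total bytes written to the network on the machine hosting the exporter | no requirement | true |
| machine_system_perf_virtual_memory_available_bytes | system_perf | Gauge | Available virtual memory on the machine hosting the exporter | no requirement | true |
| machine_system_perf_virtual_memory_total_bytes | system_perf | Gauge | Total virtual memory on the machine hosting the exporter | no requirement | true |
| machine_system_perf_virtual_memory_used_bytes | system_perf | Gauge | Used virtual memory on the machine hosting the exporter | no requirement | true |
| dolphindb_running_log_error_count | running_log | Gauge | Number of ERROR entries among the latest 100 log entries of the node | 1.0 | true |
| dolphindb_running_log_warning_count | running_log | Gauge | Number of WARNING entries among the latest 100 log entries of the node | 1.0 | true |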

Custom metrics

Custom metrics can be added through YAML configuration files, which must be placed in the custom_metrics directory.

Configuration file format

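name: "metric set name"
help: "metric set description"
type: ["node type"]  # e.g. ["controller"]
versions: ["version"]
enabled: "whether enabled by default"
dosfile: "script file path"  # optional; either dosfile or script is required
script: "inline script content"   # optional; either script or dosfile is required
metrics:
  - name: "metric name"
    type: "metric type"     # gauge/counter
    desc: "metric description"
    labels: ["label list"] # optional
    value_col_name: "value column name" # optional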

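Script return value

A custom metric script must return a dictionary in which:

- Each key is a metric name
- Each value is one of the following two types:
1. Numeric: the DolphinDB INT/LONG/FLOAT/DOUBLE types are supported, and the value is used directly as the metric value
2. Table: must contain the following columns:
  - Metric value column: numeric (INT/LONG/FLOAT/DOUBLE), used as the metric value. The column name must match the value_col_name setting of the corresponding metric in the YAML configuration file; if value_col_name is not set, the column name must match the name setting
  - Metric label columns: string type, used as the label values of the metric. The column names must match the labels setting of the corresponding metric in the YAML configuration file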
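
To illustrate the two accepted value shapes, the following Python sketch flattens a script result into (metric name, labels, value) samples according to the rules above. A table is modelled here as a list of row dicts standing in for a DolphinDB table, and the function name `to_samples` is hypothetical; this is a model of the documented rules, not the exporter's implementation.

```python
def to_samples(result, metric_configs):
    """Flatten a script result dict into (metric_name, labels, value) tuples."""
    samples = []
    for cfg in metric_configs:
        name = cfg["name"]
        value = result[name]
        if isinstance(value, (int, float)):
            # Numeric value: used directly as the metric value, without labels.
            samples.append((name, {}, float(value)))
            continue
        # Table value: one sample per row.
        value_col = cfg.get("value_col_name", name)  # falls back to the metric name
        label_cols = cfg.get("labels", [])
        for row in value:
            labels = {col: str(row[col]) for col in label_cols}
            samples.append((name, labels, float(row[value_col])))
    return samples


if __name__ == "__main__":
    configs = [
        {"name": "total_count"},
        {"name": "count", "labels": ["user_id", "client_ip"], "value_col_name": "count"},
    ]
    result = {
        "total_count": 3,
        "count": [
            {"user_id": "admin", "client_ip": "10.0.0.1", "count": 2},
            {"user_id": "guest", "client_ip": "10.0.0.2", "count": 1},
        ],
    }
    for sample in to_samples(result, configs):
        print(sample)
```

The user names and IP addresses in the sample data are invented. The point of the sketch is the column lookup: the value column is taken from value_col_name when present and from name otherwise, and every entry in labels must exist as a column.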

Example

custom_metrics/dolphindb_get_recent_jobs_metrics.yaml:

name: "get_recent_jobs"
help: "metrics from getRecentJobs()"
type: ["datanode", "computenode", "controller"]
versions: ["1.0.0"]
enabled: true
dosfile: "./dolphindb_get_recent_jobs_metrics.dos"
metrics:
  - name: "total_count"
    type: "gauge"
    desc: "Number of running jobs"
  - name: "count"
    type: "gauge"
    desc: "Number of running jobs per user id and per client ip"
    labels: ["user_id", "client_ip"]
    value_col_name: "count"
  - name: "error_count"
    type: "counter"
    desc: "Number of finished jobs with error per user id and per client ip"
    labels: ["user_id", "client_ip"]
    value_col_name: "count"

custom_metrics/dolphindb_get_recent_jobs_metrics.dos:

t = getRecentJobs()

{
    // Numeric return value
    "total_count": (exec count(*) from t where endTime == NULL),
    // Table return value
    "count": (select count(*) from t where endTime == NULL group by userID as user_id, clientIp as client_ip),
    "error_count": (select count(*) from t where errorMsg != NULL group by userID as user_id, clientIp as client_ip)
}

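Usage notes

1. Create a YAML configuration file and place it in the custom_metrics directory
2. Write the script, using one of two options:
  - Put the script directly in the script field of the configuration
  - Or specify the script file path through dosfile
3. Restart dolphindb_exporter to load the new configuration
4. Open <exporter server address>:8000/probe?target=<node IP>:<node port> in a browser to view the added metrics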
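
The same check can be scripted instead of done in a browser. The Python sketch below fetches the probe page and keeps only the lines for one metric prefix. It assumes that the page is served in the Prometheus text exposition format, which the source does not state explicitly; the host names, the sample text, and both function names are illustrative.

```python
from urllib.request import urlopen


def select_metrics(exposition_text, prefix):
    """Return the sample lines whose metric name starts with `prefix`."""
    return [
        line
        for line in exposition_text.splitlines()
        if line.startswith(prefix) and not line.startswith("#")
    ]


def fetch_metric_lines(url, prefix, timeout=10):
    """Download the probe page and filter it by metric name prefix."""
    with urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return select_metrics(text, prefix)


if __name__ == "__main__":
    # Made-up sample of what a probe page might contain.
    sample = "\n".join([
        "# HELP dolphindb_up example help text",
        "dolphindb_up 1",
        "dolphindb_get_recent_jobs_total_count 0",
    ])
    print(select_metrics(sample, "dolphindb_get_recent_jobs_"))
    # Against a running exporter (addresses are placeholders):
    # fetch_metric_lines(
    #     "http://exporter-host:8000/probe?target=192.168.100.44:8848",
    #     "dolphindb_get_recent_jobs_",
    # )
```

If the filtered list is empty after a restart, the YAML file is probably not in the custom_metrics directory or the metric set is not enabled.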
