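Nginx Active Health Checks: A Practical Guide
Active health checks are built into Nginx Plus only; open-source Nginx needs a third-party module or an external tool to get the same behavior. This guide covers both approaches in detail: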
I. Nginx Plus Native Approach
1. Core configuration
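upstream backend {
    zone backend_servers 64k;   # shared memory zone, required for health checks
    # "resolve" requires a resolver directive to be configured
    server backend1.example.com:80 resolve;
    server backend2.example.com:80 resolve;
}
# Conditions a probe response must satisfy (defined in the http context)
match status_ok {
    status 200;
    body ~ "healthy";
    header Content-Type = text/html;
}
server {
    location / {
        proxy_pass http://backend;
        # Active health check: health_check belongs in the proxying location, not in upstream
        health_check interval=5s
                     passes=3
                     fails=2
                     uri=/health
                     match=status_ok;
    }
}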
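To make the match semantics concrete, the following Python sketch mirrors the three conditions of the status_ok block. ProbeResponse and matches_status_ok are hypothetical names used only for illustration, and the sketch assumes that `=` compares the whole header value while `~` performs a regex search; it is not a description of how Nginx Plus implements matching internally.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ProbeResponse:
    """Hypothetical container for what a health probe received."""
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


def matches_status_ok(resp: ProbeResponse) -> bool:
    """Mirror the three conditions of the status_ok match block."""
    if resp.status != 200:                       # status 200;
        return False
    if not re.search("healthy", resp.body):      # body ~ "healthy";
        return False
    # header Content-Type = text/html;  (assumed: exact comparison of the value)
    lowered = {name.lower(): value for name, value in resp.headers.items()}
    return lowered.get("content-type") == "text/html"
```

Under these assumptions a body of "unhealthy" also passes, because the pattern "healthy" matches as a substring, and a Content-Type of "text/html; charset=utf-8" fails the exact comparison. Tightening the body pattern or using `~` for the header test may be worth considering.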
2. Complete configuration example
http {
    upstream myapp {
        zone myapp_zone 64k;
        least_conn;
        server 10.0.0.1:8080 slow_start=30s;
        server 10.0.0.2:8080 max_fails=3 fail_timeout=30s;
        server backup.example.com:8080 backup;
    }
    match health_check {
        status 200-399;
        header Cache-Control ~ "no-cache";
        body !~ "maintenance";
    }
    server {
        listen 80;
        location / {
            proxy_pass http://myapp;
            proxy_set_header Host $host;
            proxy_next_upstream error timeout http_500;
            # Active health check (must sit in the location that proxies to the upstream)
            health_check interval=10s
                         jitter=2s
                         fails=2
                         passes=1
                         uri=/api/health
                         port=8080
                         match=health_check;
        }
        # Status API (optional): exposes upstream and health-check state; restrict access in production
        location /api {
            api;
        }
    }
}
3. Advanced parameter reference
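| Parameter | Description | Example |
|---|---|---|
| interval | Time between two consecutive checks | 5s, 10s |
| jitter | Random delay added to each check | 2s |
| fails | Consecutive failed checks before a server is marked unhealthy | 2 |
| passes | Consecutive passed checks before a server is marked healthy again | 1 |
| uri | URI requested by the check | /health |
| port | Port the check connects to | 8080 |
| mandatory | A newly added server receives no traffic until it passes its first check; the optional persistent flag keeps the previous state across a reload | mandatory persistent |
| match | Name of the match block that defines the pass conditions | custom match block |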
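The interaction of fails and passes can be sketched as a small state machine. The Python below is an illustrative model only: PeerHealth is a hypothetical class, and it assumes both thresholds count consecutive results, which is how the parameters are described above.

```python
class PeerHealth:
    """Tracks one upstream server using fails/passes thresholds."""

    def __init__(self, fails: int = 1, passes: int = 1) -> None:
        self.fails = fails
        self.passes = passes
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Feed one probe result and return the resulting health state."""
        if check_passed == self.healthy:
            # The result agrees with the current state, so any streak is broken.
            self._streak = 0
            return self.healthy
        self._streak += 1
        threshold = self.fails if self.healthy else self.passes
        if self._streak >= threshold:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy
```

With fails=2 and passes=1, a single failed probe leaves the server in rotation, a second consecutive failure removes it, and one successful probe brings it back.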
II. Alternatives for Open-Source Nginx
1. nginx_upstream_check_module
# Build and install (apply the patch that matches your Nginx version)
cd nginx-1.20.1
patch -p1 < /path/to/nginx_upstream_check_module/check_1.20.1+.patch
./configure --add-module=/path/to/nginx_upstream_check_module
make && make install
upstream backend {
    server 192.168.1.100:80;
    server 192.168.1.101:80;
    check interval=3000
          rise=2
          fall=3
          timeout=1000
          type=http;
    check_http_send "GET /health HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx http_3xx;
}
2. nginx_upstream_hc_module (dynamic version)
upstream backend {
    server 10.0.0.1:80 max_fails=1 fail_timeout=10s;
    server 10.0.0.2:80;
    hc interval=5s
       timeout=1s
       type=http
       port=80
       uri=/health
       status=200
       up_status=up
       down_status=down;
}
III. Practical Scenarios
Scenario 1: Microservice health checks
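# Match conditions for microservices
match microservice_health {
    status 200;
    header Content-Type ~ "application/json";
    body ~ '"status":"UP"';
    # Spring Boot Actuator reports this status as OUT_OF_SERVICE
    body !~ "OUT_OF_SERVICE";
}
upstream account_service {
    zone account_zone 128k;
    server account-svc-1:8080;
    server account-svc-2:8080;
    server account-svc-3:8080;
}
server {
    location / {
        proxy_pass http://account_service;
        # health_check belongs in the proxying location, not in upstream
        health_check interval=3s
                     uri=/actuator/health
                     match=microservice_health
                     fails=1
                     passes=2;
    }
}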
Scenario 2: Database connection checks
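stream {
    upstream db_backend {
        zone db_zone 64k;
        server db1.example.com:3306;
        server db2.example.com:3306;
    }
    match mysql_check {
        # A MySQL server sends its handshake first, so no "send" is needed.
        # Assumption: the greeting names one of these auth plugins; adjust for your server.
        expect ~ "mysql_native_password|caching_sha2_password";
    }
    server {
        listen 3306;
        proxy_pass db_backend;
        # In the stream module, health_check goes in the server block
        health_check interval=30s
                     port=3306
                     passes=1
                     fails=2
                     match=mysql_check;
    }
}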
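Because a MySQL server speaks first on a new connection, an expect pattern is tested against the server's initial handshake packet. The sketch below parses the start of such a packet so you can see what a pattern has to match. It assumes the HandshakeV10 layout (3-byte little-endian payload length, 1-byte sequence number, a protocol version byte, then a NUL-terminated server version string); parse_mysql_greeting is a hypothetical helper, not part of any library.

```python
from typing import Optional, Tuple


def parse_mysql_greeting(packet: bytes) -> Optional[Tuple[int, str]]:
    """Return (protocol_version, server_version) from a greeting, or None if malformed."""
    if len(packet) < 6:
        # Too short to hold the 4-byte header, a version byte and a terminator.
        return None
    payload_len = int.from_bytes(packet[0:3], "little")
    payload = packet[4:4 + payload_len]
    if not payload:
        return None
    end = payload.find(b"\x00", 1)  # the version string is NUL-terminated
    if end == -1:
        return None
    return payload[0], payload[1:end].decode("ascii", errors="replace")
```

Feeding it the first bytes read from port 3306 (for example with a plain TCP socket) shows the version string and protocol byte your server actually advertises, which helps when choosing the expect pattern.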
IV. Monitoring and Alerting
1. Status endpoints
# Nginx Plus status API (read-only; add write=on only if upstreams must be modified through it)
location /api {
    api;
    allow 10.0.0.0/8;
    deny all;
}
# Built-in live activity dashboard
location = /dashboard.html {
    root /usr/share/nginx/html;
}
2. Prometheus monitoring configuration
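# Prometheus scrapes nginx-prometheus-exporter, not the Nginx Plus JSON API itself.
# Assumes the exporter runs with: --nginx.plus --nginx.scrape-uri=http://nginx-host:8080/api
# nginx-exporter-host is a placeholder; 9113 is the exporter's default port.
scrape_configs:
  - job_name: 'nginx-plus'
    metrics_path: /metrics
    static_configs:
      - targets: ['nginx-exporter-host:9113']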
3. Alert rule example
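groups:
  - name: nginx_alerts
    rules:
      - alert: NginxUpstreamUnhealthy
        # Assumes nginx-prometheus-exporter's numeric state encoding, where 6 means "unhealthy";
        # verify the metric name and value against your exporter version.
        expr: nginxplus_upstream_server_state == 6
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.upstream }} upstream has unhealthy nodes"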
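V. Best Practices
Health-check endpoint design
- Use a dedicated health-check endpoint (such as /health)
- Avoid probing primary business endpoints
- Include the status of dependencies (DB, Redis, etc.)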
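The points above can be sketched as a minimal dedicated endpoint. This example uses only the Python standard library; the CHECKS table, the probe functions in it, and the port are placeholders to replace with real DB and Redis pings. The JSON is serialized without spaces so that a body pattern such as '"status":"UP"' would match it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple

# Hypothetical dependency probes; replace with real DB/Redis pings.
CHECKS: Dict[str, Callable[[], bool]] = {
    "db": lambda: True,
    "redis": lambda: True,
}


def health_payload(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, bytes]:
    """Run each dependency check and build the HTTP status and JSON body."""
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = "UP" if probe() else "DOWN"
        except Exception:
            # A probe that raises counts as a failed dependency.
            results[name] = "DOWN"
    overall_up = all(state == "UP" for state in results.values())
    body = json.dumps(
        {"status": "UP" if overall_up else "DOWN", "checks": results},
        separators=(",", ":"),
    )
    return (200 if overall_up else 503), body.encode("utf-8")


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_payload(CHECKS)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start the endpoint; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling serve() starts the endpoint on localhost. Returning 503 when any dependency is down lets a plain status check take the instance out of rotation; whether a failing dependency should do that is a design choice that depends on how critical it is.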
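Parameter tuning
# Suggested starting values for production (place inside the proxying location)
health_check interval=5s   # not too short, to limit load on the backends
             fails=3       # avoids flapping on transient errors
             passes=2;     # requires a stable recovery
# health_check has no timeout parameter; probe timeouts follow the proxy settings
proxy_connect_timeout 2s;  # tune to the workload
proxy_read_timeout 2s;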
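Canary rollout strategy
upstream backend {
    zone backend 64k;
    server new-version weight=10 slow_start=60s;   # slow_start ramps traffic up after recovery
    server old-version weight=90;
}
# health_check has no gradual_start parameter; slow_start on the server provides the ramp-up
location / {
    proxy_pass http://backend;
    health_check uri=/health;
}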
Failure handling strategy
server {
    proxy_next_upstream error timeout http_502 http_503;
    proxy_next_upstream_timeout 2s;
    proxy_next_upstream_tries 3;
}
VI. Troubleshooting Common Problems
Health checks do not take effect
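# Validate the configuration
nginx -t
# Watch the error log for health-check messages
tail -f /var/log/nginx/error.log | grep -i "health check"
# Confirm the upstream defines a shared memory zone (required for health checks)
nginx -T 2>/dev/null | grep -E 'zone|health_check'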
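Performance tuning
- Size the shared memory zone for the number of servers, for example: zone backend 1M;
- Choose a sensible check interval to avoid flooding backends with probes
- Use jitter to spread checks out over time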
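The effect of jitter can be illustrated with a short simulation. The model below is an assumption made for illustration: each gap between probes is the interval plus a uniformly random delay between 0 and the jitter value. probe_times is a hypothetical helper and does not reproduce Nginx's actual scheduler.

```python
import random
from typing import List, Optional


def probe_times(count: int, interval: float, jitter: float,
                rng: Optional[random.Random] = None) -> List[float]:
    """Return `count` probe timestamps; each gap is interval plus a random delay in [0, jitter]."""
    rng = rng or random.Random()
    now = 0.0
    times: List[float] = []
    for _ in range(count):
        now += interval + rng.uniform(0.0, jitter)
        times.append(now)
    return times
```

Running probe_times for several simulated Nginx instances with interval=10 and jitter=2 shows their probes drifting apart instead of hitting the backend at the same instant, which is the point of the parameter.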
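Test commands
# Inspect peer state and health-check counters for one upstream
# (the API reports state; it has no call that triggers a check on demand)
curl -s http://nginx/api/6/http/upstreams/backend
# List all upstreams and their state (use the API version your release supports)
curl -s http://nginx/api/6/http/upstreams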
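The JSON returned for an upstream can be filtered in a few lines. This sketch assumes the response contains a peers list whose entries carry server and state fields, as the Nginx Plus API does for HTTP upstreams; the sample document is made up for illustration, and unhealthy_peers is a hypothetical helper.

```python
import json
from typing import Dict, List

# Made-up sample in the assumed shape of an upstream object from the API.
SAMPLE = json.loads("""
{
  "zone": "myapp_zone",
  "peers": [
    {"id": 0, "server": "10.0.0.1:8080", "state": "up"},
    {"id": 1, "server": "10.0.0.2:8080", "state": "unhealthy"}
  ]
}
""")


def unhealthy_peers(upstream: Dict) -> List[str]:
    """Return the addresses of peers whose state is anything other than "up"."""
    return [
        peer["server"]
        for peer in upstream.get("peers", [])
        if peer.get("state") != "up"
    ]
```

In practice the dictionary would come from json.loads applied to the curl output for a single upstream; treating every non-"up" state as a problem is a deliberately strict choice and also flags servers that are draining or still being checked.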
VII. Comparison of Approaches
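| Approach | Pros | Cons | Best for |
|---|---|---|---|
| Nginx Plus | Native support, full feature set | Commercial license | Enterprise production environments |
| nginx_upstream_check_module | Free and open source, fairly capable | Requires recompiling Nginx | Teams that maintain their own builds |
| lua-resty-upstream-healthcheck | Dynamic and flexible | Requires a Lua environment | OpenResty users |
| External probe + API | Decoupled and independent | More complex architecture | Multi-cloud / hybrid cloud |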
VIII. Security Considerations
Access control
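location /health {
    # Restrict to the internal network. Do not add "internal" here:
    # it rejects every direct request, including health probes.
    allow 10.0.0.0/8;
    deny all;
}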
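Protecting sensitive information
# The health-check endpoint should not leak sensitive information
location = /health {
    default_type text/plain;   # sets Content-Type without adding a duplicate header
    return 200 "OK";
}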
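With the configuration and practices above, you can build a robust active health-check mechanism that keeps services highly available. Choose the approach that fits your environment and pair it with monitoring and alerting so that health management forms a closed loop.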