
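HPA: Principles and Practice of Autoscaling

An in-depth look at how the Kubernetes Horizontal Pod Autoscaler (HPA) works: its metrics pipeline, scaling algorithm, and behavior controls.

Under high-concurrency workloads, adjusting replica counts by hand is too slow to track demand and wastes resources. The Horizontal Pod Autoscaler (HPA) scales workloads automatically based on load and is a key building block of an elastic, cloud-native architecture.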


1. HPA Architecture

1.1 Metrics API Layers

| API | Path | Provider |
|---|---|---|
| Resource Metrics | metrics.k8s.io | Metrics Server |
| Custom Metrics | custom.metrics.k8s.io | Prometheus Adapter |
| External Metrics | external.metrics.k8s.io | Cloud providers, KEDA |
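
As a rough illustration of the table above, the `type` of each entry under `spec.metrics` determines which aggregated API the HPA controller queries. The lookup below is a hypothetical helper written for this page, not part of any Kubernetes client library.

```python
# Hypothetical lookup: which metrics API group serves each HPA metric source type.
METRIC_API_GROUPS = {
    "Resource": "metrics.k8s.io",            # Pod CPU / memory, served by Metrics Server
    "ContainerResource": "metrics.k8s.io",   # per-container CPU / memory
    "Pods": "custom.metrics.k8s.io",         # per-Pod custom metrics
    "Object": "custom.metrics.k8s.io",       # metrics describing another Kubernetes object
    "External": "external.metrics.k8s.io",   # metrics that originate outside the cluster
}


def api_group_for(metric_type: str) -> str:
    """Return the API group queried for the given HPA metric type."""
    try:
        return METRIC_API_GROUPS[metric_type]
    except KeyError:
        raise ValueError(f"unknown HPA metric type: {metric_type!r}") from None


print(api_group_for("Pods"))
```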

2. HPA Configuration

2.1 Basic Configuration (v2)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 500Mi

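2.2 Target Types

| Type | Description | Example |
|---|---|---|
| Utilization | Usage as a percentage of the Pods' resource requests (Resource metrics only) | CPU utilization of 70% |
| AverageValue | Metric value averaged per Pod | 500Mi of memory |
| Value | Raw metric value compared directly against the target, not divided by the number of Pods | 1000 total requests |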
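
To make the difference between the three target types concrete, the sketch below computes the ratio of current to desired metric value for each one. The function name `usage_ratio` and its arguments are assumptions made for illustration; the real controller also handles missing metrics and not-yet-ready Pods, which is omitted here.

```python
def usage_ratio(target_type, target, values, requests=None):
    """Return currentMetricValue / desiredMetricValue for one metric.

    values:   per-Pod readings (a single raw reading for "Value")
    requests: per-Pod resource requests, only used for "Utilization"
    """
    if target_type == "Utilization":
        # Share of the Pods' summed requests that is actually in use, in percent.
        current = 100 * sum(values) / sum(requests)
    elif target_type == "AverageValue":
        # Plain per-Pod average of the readings.
        current = sum(values) / len(values)
    elif target_type == "Value":
        # One raw reading, compared to the target as-is.
        (current,) = values
    else:
        raise ValueError(f"unknown target type: {target_type!r}")
    return current / target


# Two Pods using 140m CPU each with 200m requested, target 70% utilization.
print(usage_ratio("Utilization", 70, [140, 140], requests=[200, 200]))
```

A ratio above 1 pushes toward scaling up and a ratio below 1 toward scaling down.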

3. Scaling Algorithm

3.1 Core Formula

desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)]

Example:

  • Current replicas: 2
  • Current CPU usage: 100m (average per Pod)
  • Target CPU: 50m
  • Calculation: ceil[2 × (100/50)] = ceil[4] = 4
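
The formula and the worked example can be sketched in a few lines of Python. The function name is an assumption; the `tolerance` argument mirrors the controller's default tolerance of 0.1, within which HPA leaves the replica count unchanged.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Apply desiredReplicas = ceil(currentReplicas * current / target)."""
    ratio = current_value / target_value
    # Skip scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# The example above: 2 replicas at 100m average CPU with a 50m target.
print(desired_replicas(2, 100, 50))
```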

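3.2 Handling Multiple Metrics

When several metrics are configured, HPA:

  1. Computes the desired replica count for each metric independently
  2. Takes the largest of those counts as the final decision

metrics: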
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50  # → 4 replicas
- type: Pods
  pods:
    metric:
      name: requests_per_second
    target:
      type: AverageValue
      averageValue: "100"     # → 6 replicas
# Result: max(4, 6) = 6 replicas
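
A minimal sketch of the "largest proposal wins" rule, including the clamp to `minReplicas` and `maxReplicas`. The starting point of 2 replicas and the metric ratios (2.0 for CPU, 3.0 for requests per second) are assumed values chosen to reproduce the 4 and 6 in the comments above.

```python
import math


def combined_replicas(current_replicas, ratios, min_replicas, max_replicas):
    """Pick the largest per-metric proposal, then clamp to the HPA bounds."""
    proposals = [math.ceil(current_replicas * r) for r in ratios]
    return max(min_replicas, min(max_replicas, max(proposals)))


# CPU proposes 4 replicas, requests_per_second proposes 6.
print(combined_replicas(2, [2.0, 3.0], min_replicas=2, max_replicas=10))
```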

4. Behavior Control (behavior)

4.1 Flapping Prevention and the Stabilization Window

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # 5-minute stabilization window
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
      - type: Pods
        value: 4
        periodSeconds: 60
      selectPolicy: Min  # pick the policy that allows the smallest change (more conservative)
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max  # pick the policy that allows the largest change (more aggressive)

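4.2 Policy Parameters

| Parameter | Description |
|---|---|
| stabilizationWindowSeconds | Look-back window over past recommendations: scale-down uses the highest value in the window, scale-up the lowest |
| policies.type | Pods (absolute number of Pods) or Percent (percentage of current replicas) |
| policies.periodSeconds | Length of the period, in seconds, over which the policy's limit applies |
| selectPolicy | Max, Min, or Disabled |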
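
The interaction between `policies` and `selectPolicy` is easier to see with numbers. The sketch below is a simplified model, not the controller's implementation: it assumes no scaling has happened yet in the current period and that all policies share the same `periodSeconds`. The function name and the dict layout are assumptions.

```python
import math


def scale_down_floor(current_replicas, policies, select_policy="Max"):
    """Lowest replica count reachable in one period under the given policies."""
    if select_policy == "Disabled":
        return current_replicas  # scaling in this direction is switched off
    allowed = []
    for policy in policies:
        if policy["type"] == "Pods":
            allowed.append(policy["value"])
        elif policy["type"] == "Percent":
            allowed.append(math.ceil(current_replicas * policy["value"] / 100))
        else:
            raise ValueError(f"unknown policy type: {policy['type']!r}")
    # Min selects the smallest permitted change, Max the largest.
    change = min(allowed) if select_policy == "Min" else max(allowed)
    return max(current_replicas - change, 0)


# The scaleDown policies from section 4.1, applied to 100 replicas.
policies = [{"type": "Percent", "value": 10}, {"type": "Pods", "value": 4}]
print(scale_down_floor(100, policies, "Min"))
print(scale_down_floor(100, policies, "Max"))
```

With `Min`, only 4 Pods may be removed per period; with `Max`, the 10% policy wins and 10 Pods may be removed.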

4.3 Disabling Scale-Down

spec:
  behavior:
    scaleDown:
      selectPolicy: Disabled

5. HPA with Custom Metrics

5.1 Prometheus Adapter

# prometheus-adapter configuration
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
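
The `name` rule above is a regular-expression rename, and it explains where the metric name `http_requests_per_second` comes from. The sketch below reproduces the rename with Python's `re` module; note that the adapter itself uses Go regular expressions, where the capture group is written `${1}` rather than `\1`. The function name is an assumption.

```python
import re


def adapter_metric_name(series: str,
                        matches: str = r"^(.*)_total$",
                        as_template: str = r"\1_per_second") -> str:
    """Rename a Prometheus series the way the `name` rule above describes."""
    return re.sub(matches, as_template, series)


print(adapter_metric_name("http_requests_total"))
```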

5.2 Using a Custom Metric

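apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa-custom       # placeholder name
spec:
  scaleTargetRef:            # assumed: the web Deployment from section 2.1
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second  # name produced by the adapter rule in 5.1
      target:
        type: AverageValue
        averageValue: "100"             # average requests per second per Pod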

5.3 External Metrics

spec:
  metrics:
  - type: External
    external:
      metric:
        name: queue_messages_ready
        selector:
          matchLabels:
            queue: worker
      target:
        type: Value
        value: "30"

6. Metrics Pipeline

6.1 Metrics Server Requirements

# Check that Metrics Server is returning data
kubectl top pods
kubectl top nodes

# Verify the API directly
kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods

7. VPA (Vertical Pod Autoscaler)

7.1 HPA vs VPA

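| Aspect | HPA | VPA |
|---|---|---|
| Scaling direction | Horizontal (replica count) | Vertical (per-Pod resource requests) |
| Best suited for | Fluctuating traffic | Resource right-sizing |
| Production use | Widespread | With caution (applying changes restarts Pods) |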

7.2 VPA Configuration

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"  # Off, Initial, Recreate, Auto
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi

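7.3 Using HPA and VPA Together

| Scenario | Recommendation |
|---|---|
| CPU-based scaling | Use HPA only |
| Memory tuning | Use VPA in recommendation-only mode (updateMode: "Off") |
| Combining both | HPA scales on CPU, VPA manages memory; do not let both act on the same resource |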

8. KEDA (Kubernetes Event-driven Autoscaling)

8.1 Scenarios Beyond Plain HPA

  • Scaling on message queue depth
  • Scaling down to 0 replicas
  • Scheduled scaling with Cron

8.2 ScaledObject

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-scaler
spec:
  scaleTargetRef:
    name: consumer
  minReplicaCount: 0
  maxReplicaCount: 30
  triggers:
  - type: rabbitmq
    metadata:
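      queueName: tasks
      queueLength: "10"  # target messages per replica; newer KEDA releases express this as mode: QueueLength plus value: "10"
      # A connection setting (host / hostFromEnv, or a TriggerAuthentication) is also required; omitted here.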
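
As a rough mental model of the ScaledObject above, the steady-state consumer count is the queue depth divided by the per-replica target, clamped to the configured bounds, with an empty queue allowing scale-to-zero. This is a simplified sketch under those assumptions; the real decision goes through KEDA's activation logic and the HPA algorithm, and the function name is hypothetical.

```python
import math


def consumers_for_queue(queue_depth: int, target_per_replica: int = 10,
                        min_replicas: int = 0, max_replicas: int = 30) -> int:
    """Approximate steady-state replica count for a queue-length trigger."""
    if queue_depth <= 0:
        return min_replicas  # empty queue: fall back to the minimum (0 here)
    wanted = math.ceil(queue_depth / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


for depth in (0, 25, 1000):
    print(depth, consumers_for_queue(depth))
```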

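9. Best Practices

9.1 Setting Resource Requests

HPA computes utilization percentages against requests, not limits:

# Correct: set realistic requests
resources:
  requests:
    cpu: 200m    # HPA computes utilization against this value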
  limits:
    cpu: 1000m
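
To see why the base matters, the sketch below compares the same usage measured against the 200m request and against the 1000m limit from the snippet above. The 300m usage figure and the function name are assumptions for illustration.

```python
def percent_of(usage_millicores: float, base_millicores: float) -> float:
    """Express usage as a percentage of a base (a request or a limit)."""
    return 100 * usage_millicores / base_millicores


usage = 300  # assumed average CPU usage per Pod, in millicores
print(percent_of(usage, 200))   # against requests: what HPA actually sees
print(percent_of(usage, 1000))  # against limits: not what HPA uses
```

Against requests the Pod is at 150%, well above a 70% target, so HPA scales up even though usage is far below the limit. Requests that are set too low therefore cause over-scaling, and requests that are set too high suppress it.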

9.2 Application Startup Time

Slow-starting applications should pair HPA with a scale-up stabilization window:

spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120  # wait for new Pods to become ready
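
A scale-up stabilization window works by looking back over recent recommendations and acting on the lowest one, so a short spike while new Pods are still starting does not trigger another round of scale-up. The sketch below models only that selection step; the function name and the `(timestamp, replicas)` tuple layout are assumptions.

```python
def stabilized_scale_up(recommendations, window_seconds, now):
    """Return the lowest recommendation seen within the look-back window.

    recommendations: list of (timestamp_seconds, desired_replicas) pairs,
    which must include the current recommendation at time `now`.
    """
    recent = [replicas for ts, replicas in recommendations
              if now - ts <= window_seconds]
    return min(recent)


history = [(0, 4), (60, 9), (120, 6)]
print(stabilized_scale_up(history, window_seconds=120, now=120))
print(stabilized_scale_up(history, window_seconds=0, now=120))
```

With a 120-second window the spike to 9 is ignored and the result stays at 4; with a window of 0 the latest recommendation of 6 is applied immediately.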

9.3 Pod Disruption Budget

Protect a minimum number of available replicas during voluntary disruptions:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2  # or maxUnavailable: 1
  selector:
    matchLabels:
      app: web

10. Debugging and Monitoring

# Show HPA status
kubectl get hpa web-hpa -o wide

# Show events and conditions
kubectl describe hpa web-hpa

# Inspect raw metrics
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods" | jq

# Generate load
kubectl run -it load-generator --rm --image=busybox \
  -- /bin/sh -c "while true; do wget -q -O- http://web; done"

Autoscaling is not only a way to save cost; it is also the last line of defense for system availability.
