[Kubernetes] - Health Probes (Startup, Liveness, Readiness)

2025-12-26

2025-12-27

health-check, k8s, liveness, probe, readiness, startup

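Overview

Kubernetes provides three types of health probes for monitoring and managing container state:

Startup Probe: checks whether the application inside the container has finished starting
Liveness Probe: checks whether the container is still running properly
Readiness Probe: checks whether the container is ready to receive traffic

Probe Mechanisms

Each probe uses one of the following three mechanisms (recent Kubernetes versions also support a gRPC check, which this post does not cover):

1. HTTP GET Request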

httpGet:
  path: /healthz
  port: 8080
  httpHeaders:
  - name: Custom-Header
    value: Awesome
  scheme: HTTP  # or HTTPS

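Sends an HTTP GET request to the specified path
A status code from 200 through 399 counts as success
Any other status code, or no response within timeoutSeconds, counts as failure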
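The success rule above can be sketched in plain Java. This is only an illustration of the rule, not the kubelet's implementation; the class name HttpProbeSketch and its methods are hypothetical, and the header mirrors the httpHeaders entry in the snippet.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper that mimics an httpGet probe from the client side.
final class HttpProbeSketch {

    // Success rule from the text: any status code from 200 through 399.
    static boolean isSuccessStatus(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }

    // Sends one GET request; errors and timeouts count as failure.
    static boolean check(String url, Duration timeout) {
        try {
            HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .timeout(timeout)
                    .header("Custom-Header", "Awesome")
                    .GET()
                    .build();
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return isSuccessStatus(response.statusCode());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (Exception e) {
            // Connection refused, timeout, malformed URL, and so on.
            return false;
        }
    }
}
```

Calling HttpProbeSketch.check("http://localhost:8080/healthz", Duration.ofSeconds(1)) against a locally running application is a quick way to see what a probe with timeoutSeconds: 1 would observe.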

2. TCP Socket Connection

tcpSocket:
  port: 8080

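Attempts to open a TCP connection to the specified port
If the connection is established, the probe succeeds
If the connection cannot be established, the probe fails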
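A TCP check of this kind amounts to opening a socket and closing it again. The following is a minimal sketch under that assumption; TcpProbeSketch is a hypothetical name, and the timeout parameter stands in for timeoutSeconds.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper that mimics a tcpSocket probe.
final class TcpProbeSketch {

    // Returns true if a TCP connection to host:port is established within timeoutMillis.
    static boolean check(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, unreachable host, or timeout: the probe fails.
            return false;
        }
    }
}
```

Note that a successful connection only shows that something is listening on the port. It says nothing about whether the application behind it can handle requests.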

3. Exec Command

exec:
  command:
  - cat
  - /tmp/healthy

Runs the command inside the container
An exit code of 0 counts as success
Any other exit code counts as failure
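The exit-code rule can be tried locally with a small Java sketch. It runs the command on the local machine rather than inside a container, and ExecProbeSketch is a hypothetical name used only for illustration.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper that mimics the exit-code rule of an exec probe.
final class ExecProbeSketch {

    // Runs the command and returns true only if it exits with code 0 in time.
    static boolean check(List<String> command, long timeoutSeconds) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                // Still running after the timeout: treat as failure.
                process.destroyForcibly();
                return false;
            }
            return process.exitValue() == 0;
        } catch (IOException e) {
            // The command could not be started at all.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With the snippet above, ExecProbeSketch.check(List.of("cat", "/tmp/healthy"), 1) returns true while the file exists and false once it has been deleted, which is the usual way this example is demonstrated.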

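Common Probe Settings

Parameters available on every probe:

initialDelaySeconds: 0      # seconds to wait after the container starts before the first probe (default: 0)
periodSeconds: 10           # interval between probes, in seconds (default: 10)
timeoutSeconds: 1           # seconds after which a probe times out (default: 1)
successThreshold: 1         # consecutive successes required after a failure to count as successful (default: 1; must be 1 for liveness and startup probes)
failureThreshold: 3         # consecutive failures required after a success to count as failed (default: 3)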
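How successThreshold and failureThreshold interact is easier to see as a small state machine. The sketch below is a simplified model of the counting described in the comments above, not kubelet code; ProbeThresholdSketch and its members are hypothetical names.

```java
// Hypothetical model of how consecutive results flip a probe's state.
final class ProbeThresholdSketch {
    private final int successThreshold;
    private final int failureThreshold;
    private boolean healthy;
    private int consecutiveSuccesses;
    private int consecutiveFailures;

    ProbeThresholdSketch(int successThreshold, int failureThreshold, boolean initiallyHealthy) {
        this.successThreshold = successThreshold;
        this.failureThreshold = failureThreshold;
        this.healthy = initiallyHealthy;
    }

    // Records one probe result and returns the state after applying the thresholds.
    boolean record(boolean success) {
        if (success) {
            consecutiveSuccesses++;
            consecutiveFailures = 0; // a success resets the failure streak
            if (!healthy && consecutiveSuccesses >= successThreshold) {
                healthy = true;
            }
        } else {
            consecutiveFailures++;
            consecutiveSuccesses = 0; // a failure resets the success streak
            if (healthy && consecutiveFailures >= failureThreshold) {
                healthy = false;
            }
        }
        return healthy;
    }
}
```

With the defaults (successThreshold 1, failureThreshold 3), two failures followed by a success leave the state unchanged; only three failures in a row flip it.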

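📍 1. Startup Probe - Has startup finished?

Purpose

Checks whether the application in the container has started
Protects slow-starting applications (legacy applications, for example) from being killed before they finish starting

Characteristics

Liveness and readiness probes are disabled until the startup probe succeeds
If the startup probe fails failureThreshold times in a row, the kubelet kills the container, which is then handled according to the Pod's restartPolicy
Once the startup probe has succeeded, it does not run again for the lifetime of that container

Use Cases

Applications that take a long time to start (for example, JVM-based applications)
Applications with heavy initialization work
Applications that run database migrations at startup

Example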

apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-example
spec:
  containers:
  - name: app
    image: myapp:latest
    ports:
    - containerPort: 8080
    startupProbe:
      httpGet:
        path: /startup
        port: 8080
      initialDelaySeconds: 0
      periodSeconds: 10
      failureThreshold: 30    # wait up to 300 seconds (10 seconds * 30 attempts) for startup
      successThreshold: 1
      timeoutSeconds: 1

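In the example above:

The /startup endpoint is checked every 10 seconds
The application gets up to 300 seconds (30 failed attempts) to start
A single success marks the application as started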
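The 300-second figure is simple arithmetic, and the same arithmetic can be turned around to pick a failureThreshold for a known startup time. The sketch below is an approximation that ignores probe timeouts and scheduling jitter; StartupBudgetSketch and its method names are hypothetical.

```java
// Hypothetical helper for the startup budget arithmetic.
final class StartupBudgetSketch {

    // Approximate time the application gets before the container is restarted.
    static int maxStartupSeconds(int initialDelaySeconds, int periodSeconds, int failureThreshold) {
        return initialDelaySeconds + periodSeconds * failureThreshold;
    }

    // Smallest failureThreshold whose budget covers the expected startup time.
    // This is a lower bound; add a safety margin on top of it in practice.
    static int minFailureThreshold(int expectedStartupSeconds, int initialDelaySeconds, int periodSeconds) {
        int remaining = Math.max(0, expectedStartupSeconds - initialDelaySeconds);
        int attempts = (remaining + periodSeconds - 1) / periodSeconds; // round up
        return Math.max(1, attempts);
    }
}
```

For the example above, maxStartupSeconds(0, 10, 30) gives 300. An application that needs about 245 seconds with the same period would need a failureThreshold of at least 25.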

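📍 2. Liveness Probe - Are you alive?

Purpose

Checks whether the container is still running properly
Detects containers that have stopped responding because of a deadlock, an infinite loop, and so on

Characteristics

If the liveness probe fails, the kubelet kills the container, which is then handled according to the Pod's restartPolicy
Use it only for problems that restarting the container can fix

Use Cases

Applications that can deadlock
Applications that can become unresponsive because of a memory leak
Applications whose internal state can become corrupted and require a restart

Cautions

Overly sensitive settings cause unnecessary restarts
The liveness probe should fail only when the application cannot recover without a restart
Avoid checking external dependencies (databases, external APIs, and so on); if a shared dependency goes down, every replica is restarted even though a restart cannot fix it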
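One way to follow these cautions is to base the liveness answer purely on internal progress, for example a heartbeat that the main work loop updates. The sketch below uses the JDK's built-in com.sun.net.httpserver server; the class LivenessEndpointSketch, the heartbeat approach, and the /health path are assumptions for illustration, not something the probe API requires.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical liveness endpoint that looks only at internal state.
final class LivenessEndpointSketch {
    // Updated by the application's main work loop via heartbeat().
    private final AtomicLong lastHeartbeatMillis = new AtomicLong(System.currentTimeMillis());
    private final long maxStaleMillis;

    LivenessEndpointSketch(long maxStaleMillis) {
        this.maxStaleMillis = maxStaleMillis;
    }

    void heartbeat(long nowMillis) {
        lastHeartbeatMillis.set(nowMillis);
    }

    // 200 while the work loop is making progress, 503 once it has stalled.
    int statusCode(long nowMillis) {
        return nowMillis - lastHeartbeatMillis.get() <= maxStaleMillis ? 200 : 503;
    }

    // Serves the status on /health with an empty body; no database or remote call is involved.
    HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/health", exchange -> {
            exchange.sendResponseHeaders(statusCode(System.currentTimeMillis()), -1);
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

Because the check never touches a database or remote API, an outage elsewhere cannot trigger a restart; only a stalled work loop can.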

Example

apiVersion: v1
kind: Pod
metadata:
  name: liveness-probe-example
spec:
  containers:
  - name: app
    image: myapp:latest
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
      failureThreshold: 3
      timeoutSeconds: 5

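# TCP socket example
livenessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10

# Exec example
# pgrep is run directly rather than through /bin/sh -c: with -f, the wrapping
# shell's own command line contains "myapp" and could match by itself.
livenessProbe:
  exec:
    command:
    - pgrep
    - -f
    - myapp
  initialDelaySeconds: 15
  periodSeconds: 10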

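📍 3. Readiness Probe - Can you take traffic?

Purpose

Checks whether the container is ready to receive traffic
Prevents traffic from being routed to Pods that are not ready to handle requests

Characteristics

If the readiness probe fails, the Pod's IP address is removed from the endpoints of every Service that selects the Pod
The container is not killed; it keeps running
Once the probe succeeds again, the Pod is added back to the endpoints

Use Cases

Applications that must load initial data
Applications that depend on connections to external services
Applications that need warm-up time
Temporarily overloaded instances that should stop receiving traffic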
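The warm-up and overload scenarios can be combined into one readiness decision inside the application. The following is a rough sketch of that idea; ReadinessStateSketch, the in-flight request limit, and the 200/503 mapping are illustrative assumptions rather than a prescribed design.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical readiness state that a /ready handler could consult.
final class ReadinessStateSketch {
    private final AtomicBoolean warmedUp = new AtomicBoolean(false);
    private final AtomicInteger inFlightRequests = new AtomicInteger();
    private final int maxInFlightRequests;

    ReadinessStateSketch(int maxInFlightRequests) {
        this.maxInFlightRequests = maxInFlightRequests;
    }

    // Called once initial data loading and warm-up are finished.
    void markWarmedUp() {
        warmedUp.set(true);
    }

    void requestStarted() {
        inFlightRequests.incrementAndGet();
    }

    void requestFinished() {
        inFlightRequests.decrementAndGet();
    }

    // Ready only after warm-up and while below the in-flight limit.
    boolean isReady() {
        return warmedUp.get() && inFlightRequests.get() < maxInFlightRequests;
    }

    // Status code the readiness endpoint would return.
    int statusCode() {
        return isReady() ? 200 : 503;
    }
}
```

Returning 503 here only takes the Pod out of rotation. Unlike a liveness failure, it does not restart the container, so the instance can rejoin as soon as load drops.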

Example

apiVersion: v1
kind: Pod
metadata:
  name: readiness-probe-example
spec:
  containers:
  - name: app
    image: myapp:latest
    ports:
    - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3
      successThreshold: 1

Readiness Gate

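For more complex readiness logic, you can use a Readiness Gate. The Pod is reported as Ready only when its containers are ready and every condition listed under readinessGates has status True in the Pod's status. Kubernetes does not set these conditions itself; an external controller has to update them: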

apiVersion: v1
kind: Pod
metadata:
  name: pod-with-readiness-gate
spec:
  readinessGates:
  - conditionType: "www.example.com/feature-1"
  containers:
  - name: app
    image: myapp:latest

Combining the Three Probes

In production, using all three probes together is recommended:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:latest
        ports:
        - containerPort: 8080

        # Startup Probe: wait up to 5 minutes for the application to start
        startupProbe:
          httpGet:
            path: /startup
            port: 8080
          periodSeconds: 10
          failureThreshold: 30

        # Liveness Probe: check that the application is alive
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 0  # starts as soon as the startup probe succeeds
          periodSeconds: 10
          failureThreshold: 3
          timeoutSeconds: 5

        # Readiness Probe: check that the application is ready to receive traffic
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 0
          periodSeconds: 5
          failureThreshold: 3
          successThreshold: 1

Probe Flow

Container starts
    ↓
Startup probe runs (until it succeeds)
    ↓
Liveness probe enabled (runs periodically)
Readiness probe enabled (runs periodically)
    ↓
┌────────────────────────────────────────────────────┐
│ Liveness fails  → container is restarted           │
│ Readiness fails → Pod is removed from the Service  │
└────────────────────────────────────────────────────┘
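The flow above can be written down as a small decision function. This is a simplified model for reasoning about the diagram, not how the kubelet is structured; ProbeFlowSketch, its Action values, and the boolean inputs (each meaning the probe's state after thresholds have been applied) are hypothetical.

```java
// Hypothetical model of the probe flow for a single container.
final class ProbeFlowSketch {

    enum Action { KEEP_WAITING, RESTART_CONTAINER, REMOVE_FROM_ENDPOINTS, SERVE_TRAFFIC }

    static Action decide(boolean startupSucceeded,
                         boolean startupBudgetExhausted,
                         boolean livenessHealthy,
                         boolean readinessHealthy) {
        if (!startupSucceeded) {
            // Liveness and readiness are not evaluated until startup succeeds.
            return startupBudgetExhausted ? Action.RESTART_CONTAINER : Action.KEEP_WAITING;
        }
        if (!livenessHealthy) {
            return Action.RESTART_CONTAINER;
        }
        if (!readinessHealthy) {
            return Action.REMOVE_FROM_ENDPOINTS;
        }
        return Action.SERVE_TRAFFIC;
    }
}
```

The order of the checks reflects the diagram: startup gates everything, and a liveness failure takes precedence because the container is about to be restarted anyway.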

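Spring Boot Application Example

Since Spring Boot 2.3, Actuator provides dedicated liveness and readiness health endpoints. They are enabled automatically when the application runs on Kubernetes and can be enabled explicitly elsewhere:

1. Add the Dependency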

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

2. Configure application.yml

management:
  endpoint:
    health:
      probes:
        enabled: true
      show-details: always
  health:
    livenessState:
      enabled: true
    readinessState:
      enabled: true

3. Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spring-app
  template:
    metadata:
      labels:
        app: spring-app
    spec:
      containers:
      - name: spring-app
        image: spring-app:latest
        ports:
        - containerPort: 8080

        startupProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          periodSeconds: 10
          failureThreshold: 30

        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          periodSeconds: 10
          failureThreshold: 3

        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          periodSeconds: 5
          failureThreshold: 3

4. Custom Health Indicator

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Contributes to /actuator/health. By default it is not part of the liveness
// or readiness groups; it has to be added to a group explicitly. Because it
// checks external dependencies, it suits readiness rather than liveness.
@Component
public class CustomHealthIndicator implements HealthIndicator {

    @Override
    public Health health() {
        // Custom health check logic
        boolean isHealthy = checkDatabaseConnection() && checkExternalService();

        if (isHealthy) {
            return Health.up()
                .withDetail("database", "connected")
                .withDetail("externalService", "available")
                .build();
        }

        return Health.down()
            .withDetail("reason", "Database or external service unavailable")
            .build();
    }

    private boolean checkDatabaseConnection() {
        // Placeholder: replace with a real database connectivity check
        return true;
    }

    private boolean checkExternalService() {
        // Placeholder: replace with a real external service check
        return true;
    }
}

Best Practices

1. Startup Probe

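Use it only for applications with long startup times
Set failureThreshold * periodSeconds higher than the longest expected startup time
It stops running after its first success, so a generous budget costs nothing once the application is up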

2. Liveness Probe

Keep the check lightweight (exclude external dependencies such as database connections)
Detect only problems that a restart can fix
Set initialDelaySeconds to 0 when a startup probe is used
Do not set failureThreshold too low (avoids false positives)

3. Readiness Probe

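May include checks on external dependencies (database, cache, external APIs)
Use a short periodSeconds so the Pod returns to service quickly after it recovers
Essential for zero-downtime deployments

4. General

Make the /health endpoint reachable without authentication
Exclude probe endpoints from logging or lower their log level (prevents excessive log output)
In production, configuring all three probes is recommended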

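Troubleshooting

CrashLoopBackOff

# Check the Pod status and events
kubectl describe pod <pod-name>

# Check the logs of the previous (terminated) container
kubectl logs <pod-name> --previous

Causes:

Repeated restarts caused by liveness probe failures
The startup probe's failureThreshold is set too low

Fixes:

Relax the probe settings (increase failureThreshold and timeoutSeconds)
Reduce the application's startup time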

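Pod Is Not Registered with the Service

# Check the Service endpoints
kubectl get endpoints <service-name>

Cause:

Readiness probe failure

Fixes:

Verify the readiness probe endpoint
Check the application logs for errors
Check connectivity to external dependencies (database, APIs)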
