
[Kubernetes] Troubleshooting OOMKilled Caused by Kubernetes Resource Limits (Resource Requirements)

by virusuk 2023. 5. 3.

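The container is configured with the following memory requests and limits:

  • limits - 10Mi
  • requests - 5Mi

However, the container tries to allocate 15M of memory (the --vm-bytes 15M argument passed to stress), which exceeds the 10Mi limit, so it is OOM-killed.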

apiVersion: v1
kind: Pod
metadata:
  name: elephant
  namespace: default
spec:
  containers:
  - args:
    - --vm
    - "1"
    - --vm-bytes
    - 15M
    - --vm-hang
    - "1"
    command:
    - stress
    image: polinux/stress
    imagePullPolicy: Always
    name: mem-stress
    resources:
      limits:
        memory: 10Mi
      requests:
        memory: 5Mi
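
To make the mismatch in the manifest above concrete, here is a minimal Python sketch that converts the quantities to bytes and compares them. The helper name k8s_mem_to_bytes is hypothetical and handles only the binary suffixes (Ki, Mi, Gi). It also assumes that stress reads "15M" as 15 * 1024 * 1024 bytes; even under a decimal reading (15,000,000 bytes) the allocation would still exceed 10Mi.

```python
def k8s_mem_to_bytes(quantity):
    """Convert a Kubernetes memory quantity with a binary suffix to bytes.

    Hypothetical helper: only Ki, Mi and Gi are handled here.
    """
    binary = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}
    for suffix, factor in binary.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # no suffix: plain byte count


limit = k8s_mem_to_bytes("10Mi")    # resources.limits.memory
request = k8s_mem_to_bytes("5Mi")   # resources.requests.memory

# Assumption: stress interprets the M suffix as 1024 * 1024 bytes.
wanted = 15 * 1024 ** 2

print(limit, wanted, wanted > limit)  # 10485760 15728640 True
```

The request (5Mi) only influences scheduling; it is the limit (10Mi) that the allocation has to stay under.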

 

Inspect the elephant pod with the "kubectl describe po elephant" command.

controlplane ~ ➜  kubectl describe po elephant 
Name:             elephant
Namespace:        default
Priority:         0
Service Account:  default
Node:             controlplane/172.25.0.22
Start Time:       Wed, 03 May 2023 00:57:29 +0000
Labels:           <none>
Annotations:      <none>
Status:           Running
IP:               10.42.0.10
IPs:
  IP:  10.42.0.10
Containers:
  mem-stress:
    Container ID:  containerd://d36dbf6e953fdcc45352f9e5216ff391240f4aafea2e7bf061eba576ef009bc7
    Image:         polinux/stress
    Image ID:      docker.io/polinux/stress@sha256:b6144f84f9c15dac80deb48d3a646b55c7043ab1d83ea0a697c09097aaad21aa
    Port:          <none>
    Host Port:     <none>
    Command:
      stress
    Args:
      --vm
      1
      --vm-bytes
      15M
      --vm-hang
      1
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    1
      Started:      Wed, 03 May 2023 00:59:02 +0000
      Finished:     Wed, 03 May 2023 00:59:02 +0000
    Ready:          False
    Restart Count:  4
    Limits:
      memory:  10Mi
    Requests:
      memory:     5Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fbjt7 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-fbjt7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  2m26s                default-scheduler  Successfully assigned default/elephant to controlplane
  Normal   Pulled     2m20s                kubelet            Successfully pulled image "polinux/stress" in 5.22084839s (5.220872444s including waiting)
  Normal   Pulled     2m18s                kubelet            Successfully pulled image "polinux/stress" in 415.016959ms (415.033096ms including waiting)
  Normal   Pulled     2m4s                 kubelet            Successfully pulled image "polinux/stress" in 491.98717ms (492.03185ms including waiting)
  Normal   Pulled     99s                  kubelet            Successfully pulled image "polinux/stress" in 483.63568ms (483.661985ms including waiting)
  Normal   Started    98s (x4 over 2m20s)  kubelet            Started container mem-stress
  Warning  BackOff    69s (x6 over 2m17s)  kubelet            Back-off restarting failed container mem-stress in pod elephant_default(60b4af6c-4277-4dc1-ba4c-52a2b7874f5b)
  Normal   Pulling    54s (x5 over 2m25s)  kubelet            Pulling image "polinux/stress"
  Normal   Pulled     53s                  kubelet            Successfully pulled image "polinux/stress" in 634.685952ms (634.711668ms including waiting)
  Normal   Created    53s (x5 over 2m20s)  kubelet            Created container mem-stress

 

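The Last State of the container shows Reason: OOMKilled, and the pod is stuck in CrashLoopBackOff: the container was killed because it exceeded its memory limit.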
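
The same check can be done programmatically. The sketch below is one possible approach, not the only one: it reads the lastState.terminated.reason field of each entry in status.containerStatuses, which is where the pod's JSON representation carries the value shown as Last State above. The function name oom_killed_containers and the trimmed sample document are hypothetical; the sample only mirrors the describe output.

```python
import json


def oom_killed_containers(pod):
    """Return names of containers whose previous run ended with OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            names.append(status["name"])
    return names


# Hypothetical, heavily trimmed stand-in for `kubectl get pod elephant -o json`.
sample = json.loads("""
{
  "metadata": {"name": "elephant", "namespace": "default"},
  "status": {
    "containerStatuses": [
      {
        "name": "mem-stress",
        "restartCount": 4,
        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 1}}
      }
    ]
  }
}
""")

print(oom_killed_containers(sample))  # ['mem-stress']
```

In practice the input would come from the real output of kubectl get pod elephant -o json rather than from an inline sample.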

The fix is to raise the memory limit to 20Mi, which leaves room for the 15M allocation.

apiVersion: v1
kind: Pod
metadata:
  name: elephant
  namespace: default
spec:
  containers:
  - args:
    - --vm
    - "1"
    - --vm-bytes
    - 15M
    - --vm-hang
    - "1"
    command:
    - stress
    image: polinux/stress
    imagePullPolicy: Always
    name: mem-stress
    resources:
      limits:
        memory: 20Mi
      requests:
        memory: 5Mi
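
If the manifest is generated or patched by a script rather than edited by hand, the change above amounts to updating one field. The following is a minimal sketch under that assumption; with_memory_limit is a hypothetical helper, and the dictionary simply restates the original manifest.

```python
import copy
import json


def with_memory_limit(pod, container, limit):
    """Return a copy of the pod manifest with one container's memory limit set."""
    updated = copy.deepcopy(pod)
    for c in updated["spec"]["containers"]:
        if c["name"] == container:
            c.setdefault("resources", {}).setdefault("limits", {})["memory"] = limit
    return updated


# The original manifest from this post, expressed as a Python dictionary.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "elephant", "namespace": "default"},
    "spec": {"containers": [{
        "name": "mem-stress",
        "image": "polinux/stress",
        "imagePullPolicy": "Always",
        "command": ["stress"],
        "args": ["--vm", "1", "--vm-bytes", "15M", "--vm-hang", "1"],
        "resources": {"limits": {"memory": "10Mi"},
                      "requests": {"memory": "5Mi"}},
    }]},
}

fixed = with_memory_limit(pod, "mem-stress", "20Mi")
print(json.dumps(fixed["spec"]["containers"][0]["resources"]))
```

kubectl accepts JSON manifests as well as YAML, so the output of json.dumps(fixed) can be saved to a file and applied. Assuming the resources of a running pod cannot be edited in place on your cluster, recreating the pod (for example with kubectl replace --force -f <file>) is one way to roll out the new limit.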

 

That wraps up troubleshooting an OOMKilled issue caused by a memory resource limit.
