
EKS autoscaler behavior, as seen through its logs

엘티엘 2022. 11. 13. 10:19

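Let's take a close look at how Cluster Autoscaler, which is essentially a required component on EKS, actually behaves. I could not find a suitable introductory document, so I analyzed its behavior by reading the logs it prints.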

Checking the Cluster Autoscaler logs

If Cluster Autoscaler is installed correctly, you can view its logs with the command below.

$ kubectl logs -f --tail=100 -n kube-system deployment.apps/cluster-autoscaler

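Watching the logs, a new batch of output appears every 10 seconds. In other words, every 10 seconds Cluster Autoscaler checks the state of the nodes, decides whether to scale up or down, and acts on that decision. To change the interval, use the "--scan-interval" option (see the cluster-autoscaler help link below).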
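To check the interval on your own cluster, you can measure the gap between log lines that appear once per loop. The following is a minimal sketch, not part of Cluster Autoscaler. It assumes the standard klog header (severity letter, month and day, time with microseconds), and it assumes that "Scale down status:" is printed once per iteration, which is what the excerpt in this post suggests. The function names and the `marker` and `year` parameters are made up for illustration.

```python
import re
from datetime import datetime

# klog header starts with: severity letter, mmdd, hh:mm:ss.uuuuuu
KLOG_TIME = re.compile(r"^[IWEF](\d{2})(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{6})")


def klog_time(line, year=2022):
    """Return the timestamp of a klog line, or None if there is no klog header.

    klog does not print the year, so `year` is an assumption supplied by the caller.
    """
    m = KLOG_TIME.match(line)
    if m is None:
        return None
    month, day, hour, minute, second, micro = (int(g) for g in m.groups())
    return datetime(year, month, day, hour, minute, second, micro)


def loop_intervals(lines, marker="Scale down status:", year=2022):
    """Return the gaps in seconds between consecutive lines containing `marker`.

    `marker` is an assumed once-per-loop message; pick another one if your
    log verbosity does not print it.
    """
    times = [klog_time(line, year) for line in lines if marker in line]
    times = [t for t in times if t is not None]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Save the output of the kubectl logs command to a file and pass its lines to `loop_intervals`. The gaps are the scan interval plus however long each loop took to run, so expect values slightly above the configured interval.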

A look at the logs

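Below is an excerpt from the full log. The message text, together with the source file name in each line, is enough to get a rough idea of what each line means.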

I1113 00:15:10.817120       1 flags.go:57] FLAG: --add-dir-header="false"
I1113 00:15:10.817156       1 flags.go:57] FLAG: --address=":8085"
I1113 00:15:10.817159       1 flags.go:57] FLAG: --alsologtostderr="false"
I1113 00:15:10.817163       1 flags.go:57] FLAG: --aws-use-static-instance-list="true"
I1113 00:15:10.817165       1 flags.go:57] FLAG: --balance-similar-node-groups="false"
I1113 00:15:10.817168       1 flags.go:57] FLAG: --balancing-ignore-label="[]"
I1113 00:15:10.817172       1 flags.go:57] FLAG: --cloud-config=""
I1113 00:15:10.817175       1 flags.go:57] FLAG: --cloud-provider="aws"
...
I1113 00:15:33.314721       1 auto_scaling_groups.go:378] Regenerating instance to ASG map for ASGs: [asg-xxxx]
I1113 00:15:34.218094       1 auto_scaling_groups.go:152] Registering ASG asg-xxxx
...
I1113 00:15:36.514816       1 auto_scaling_groups.go:426] Extracted autoscaling options from "asg-xxxxx" ASG tags: map[]
...
W1113 00:15:46.716934       1 clusterstate.go:432] AcceptableRanges have not been populated yet. Skip checking
...
I1113 00:15:46.717187       1 filter_out_schedulable.go:132] Filtered out 0 pods using hints
...
I1113 00:15:46.717634       1 klogx.go:86] Pod pod-xxxxx is unschedulable
...
I1113 00:15:46.717945       1 scale_up.go:300] Pod pod-xxxxx, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
I1113 00:15:46.717999       1 scale_up.go:449] No pod can fit to nodegroup-xxxx
...
I1113 00:15:57.814604       1 scale_down.go:444] Node node-xxxxx is not suitable for removal - cpu utilization too big (0.606230)
I1113 00:15:57.814639       1 scale_down.go:444] Node node-xxxxx1 is not suitable for removal - memory utilization too big (0.818781)
...
I1113 00:15:57.816128       1 cluster.go:148] Fast evaluation: node-xxxx for removal
I1113 00:15:57.816317       1 cluster.go:360] Pod pod-xxxx can be moved to node-xxxx1
...
I1113 00:15:57.818303       1 scale_down.go:613] 1 nodes found to be unremovable in simulation, will re-check them at 2022-11-13 00:20:57.524027474 +0000 UTC m=+347.106527308
I1113 00:15:57.818342       1 static_autoscaler.go:509] node-xxxxx is unneeded since 2022-11-13 00:15:46.715549789 +0000 UTC m=+36.298049691 duration 10.808477617s
I1113 00:15:57.818359       1 static_autoscaler.go:509] node-xxxxx1 is unneeded since 2022-11-13 00:15:46.715549789 +0000 UTC m=+36.298049691 duration 10.808477617s
I1113 00:15:57.818382       1 static_autoscaler.go:520] Scale down status: unneededOnly=false lastScaleUpTime=2022-11-12 23:15:36.71435708 +0000 UTC m=-3573.703143078 lastScaleDownDeleteTime=2022-11-12 23:15:36.71435708 +0000 UTC m=-3573.703143078 lastScaleDownFailTime=2022-11-12 23:15:36.71435708 +0000 UTC m=-3573.703143078 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=false
I1113 00:15:57.818397       1 static_autoscaler.go:533] Starting scale down
I1113 00:15:57.818503       1 scale_down.go:829] node-xxxx was unneeded for 10.808477617s
I1113 00:15:57.818517       1 scale_down.go:829] node-xxxx was unneeded for 10.808477617s
I1113 00:15:57.818545       1 scale_down.go:918] No candidates for scale down
...
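Since the source file name in each line hints at which stage of the loop produced it (flags.go, auto_scaling_groups.go, scale_up.go, scale_down.go and so on), it can help to split the lines into fields and group them by file. This is a rough sketch that assumes the klog header layout seen in the excerpt above. `Entry`, `parse_klog_line` and `count_by_file` are illustrative names, not part of any Cluster Autoscaler tooling.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional

# severity + mmdd, time, thread id, file:line], message
KLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})"
    r"\s+(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)


class Entry(NamedTuple):
    severity: str  # I, W, E or F
    time: str      # hh:mm:ss.uuuuuu as printed
    file: str      # source file that emitted the line
    line: int      # line number inside that file
    message: str


def parse_klog_line(text: str) -> Optional[Entry]:
    """Split one klog line into fields; return None for lines such as '...'."""
    m = KLOG_LINE.match(text)
    if m is None:
        return None
    return Entry(m["severity"], m["time"], m["file"], int(m["line"]), m["message"])


def count_by_file(lines):
    """Count how many parsed lines each source file produced."""
    entries = (parse_klog_line(line) for line in lines)
    return Counter(e.file for e in entries if e is not None)


# A few lines copied from the excerpt above.
SAMPLE = [
    "I1113 00:15:46.717634       1 klogx.go:86] Pod pod-xxxxx is unschedulable",
    "I1113 00:15:46.717999       1 scale_up.go:449] No pod can fit to nodegroup-xxxx",
    "I1113 00:15:57.818397       1 static_autoscaler.go:533] Starting scale down",
    "I1113 00:15:57.818545       1 scale_down.go:918] No candidates for scale down",
]
```

For example, `parse_klog_line(SAMPLE[1]).file` gives the file that reported the scale-up decision, and `count_by_file` over a longer capture shows which stages are the most talkative.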

 

References

cluster-autoscaler help

This link lists the flags available in Cluster Autoscaler along with their default values.

Scaling up / down not working

If scaling does not behave as you expect, either the node is unremovable or the conditions in the scale down status were not satisfied. Look at the corresponding log lines.
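Both kinds of lines can be pulled out of a log capture mechanically. The sketch below is only an illustration that assumes the message formats shown in the excerpt earlier in this post ("is not suitable for removal" and "Scale down status:"). The function names are hypothetical, and the message wording may differ in other Cluster Autoscaler versions.

```python
import re

# e.g. "Node node-xxxxx is not suitable for removal - cpu utilization too big (0.606230)"
UNREMOVABLE = re.compile(
    r"Node (\S+) is not suitable for removal - (\w+) utilization too big \(([0-9.]+)\)"
)
# Only the boolean key=value pairs of the status line are extracted.
BOOL_FLAG = re.compile(r"(\w+)=(true|false)\b")


def unremovable_nodes(lines):
    """Map node name -> (resource, utilization) for nodes rejected for scale down."""
    result = {}
    for line in lines:
        m = UNREMOVABLE.search(line)
        if m:
            result[m.group(1)] = (m.group(2), float(m.group(3)))
    return result


def scale_down_flags(line):
    """Return the boolean flags of a 'Scale down status:' line as a dict."""
    if "Scale down status:" not in line:
        return {}
    return {name: value == "true" for name, value in BOOL_FLAG.findall(line)}


# Lines copied from the excerpt in this post.
SAMPLE = [
    "I1113 00:15:57.814604       1 scale_down.go:444] Node node-xxxxx is not suitable for removal - cpu utilization too big (0.606230)",
    "I1113 00:15:57.814639       1 scale_down.go:444] Node node-xxxxx1 is not suitable for removal - memory utilization too big (0.818781)",
    "I1113 00:15:57.818382       1 static_autoscaler.go:520] Scale down status: unneededOnly=false "
    "lastScaleUpTime=2022-11-12 23:15:36.71435708 +0000 UTC m=-3573.703143078 "
    "lastScaleDownDeleteTime=2022-11-12 23:15:36.71435708 +0000 UTC m=-3573.703143078 "
    "lastScaleDownFailTime=2022-11-12 23:15:36.71435708 +0000 UTC m=-3573.703143078 "
    "scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=false",
]
```

In the sample, every flag is false, which matches the "Starting scale down" line that follows it in the excerpt. The utilization values can be compared against the "--scale-down-utilization-threshold" flag, and the "was unneeded for" durations against "--scale-down-unneeded-time". Check the help link above for the defaults that apply to your version.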

 
