[Automation] Building an AI Development Environment #4-Kubernetes Cluster(automate with Ansible)

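Machine learning, deep learning, and AI (artificial intelligence)!

This series builds an environment for developing AI services.

The end goal is a Kubeflow-based AutoML environment, and the aim is to automate it with IaC (Infrastructure as Code).

Part 4: Automating a Kubernetes Cluster with Ansible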

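So far in this series, Vagrant manages the VirtualBox virtual machine resources and Ansible deploys and runs the software and applications inside those VMs, which together form a basic IaC (Infrastructure as Code) environment. Modern IT workloads such as big data, machine learning, and AI, the headline topics of the fourth industrial revolution, demand ever more computing resources and unrestricted communication between machines.

Software and applications built on MSA (Microservice Architecture) have simpler individual functions, but modularization and inter-module communication grow more complex, and the software itself adopts complex clustering schemes for stability and recoverability.

Developing applications and services for big data and AI in particular requires an environment in which many more pieces of software and modules can work together.

I use Kubernetes (hereafter k8s) for that environment, and automated the creation and management of the k8s cluster with Ansible so that it can all be driven from code.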

Configuration so far

1. Vagrantfile

Vagrant.configure("2") do |config|
  common = <<-SCRIPT  # ------------------------------------------------------- A
    # Register every node in /etc/hosts (skipped when the name is already present).
    # The shell provisioner runs as root by default, so the redirection succeeds.
    if ! grep -q ansible /etc/hosts; then sudo echo "192.168.2.21 ansible" >> /etc/hosts; fi
    if ! grep -q node01 /etc/hosts; then sudo echo "192.168.2.22 node01" >> /etc/hosts; fi
    if ! grep -q node02 /etc/hosts; then sudo echo "192.168.2.23 node02" >> /etc/hosts; fi
    if ! grep -q node03 /etc/hosts; then sudo echo "192.168.2.24 node03" >> /etc/hosts; fi
    if ! grep -q node04 /etc/hosts; then sudo echo "192.168.2.25 node04" >> /etc/hosts; fi

    # Create the vagrant user if it does not exist yet
    if ! id vagrant > /dev/null 2>&1 ; then sudo useradd -G 10 -m -p $(openssl passwd -1 vagrant) vagrant; fi

    # Allow passwordless sudo for the vagrant user
    if [ ! -f /etc/sudoers.d/vagrant ] ; then sudo echo "vagrant ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/vagrant ; fi

    sudo apt-get -y install vim tree net-tools telnet git python3-pip
    sudo ufw status
    sudo ufw disable
  SCRIPT

  controlz_setup = <<-SCRIPT  # ------------------------------------------------ B
    KUB_PATH=/home/vagrant/kubespray
    sudo apt-get install -y ansible

    # Download kubespray once
    if [ ! -f "$KUB_PATH/ansible.cfg" ]; then git clone https://github.com/kubernetes-sigs/kubespray.git; fi

    # Add remote_user and private_key_file under [defaults], then append [privilege_escalation]
    if ! grep -q remote_user $KUB_PATH/ansible.cfg; then
      LINE=$(grep -n defaults $KUB_PATH/ansible.cfg | cut -d: -f1)
      echo 'LINE===>'$LINE
      sudo perl -p -i -e '$.=='$LINE+1' and print "remote_user = vagrant\nprivate_key_file = /home/vagrant/.ssh/id_rsa\n"' $KUB_PATH/ansible.cfg
      echo "[privilege_escalation]" >> $KUB_PATH/ansible.cfg
      echo "become=true" >> $KUB_PATH/ansible.cfg
      echo "become_method=sudo" >> $KUB_PATH/ansible.cfg
      echo "become_user=root" >> $KUB_PATH/ansible.cfg
      echo "become_ask_pass=false" >> $KUB_PATH/ansible.cfg
      cd $KUB_PATH && pip3 install -r requirements.txt
    fi
  SCRIPT

  nodes_setup = <<-SCRIPT  # --------------------------------------------------- C
    sed -i -e 's/PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config
    sed -i -e 's/#PubkeyAuthentication yes/PubkeyAuthentication yes/g' /etc/ssh/sshd_config
    service ssh restart
  SCRIPT

  config.vm.box = "ubuntu/bionic64"

  config.vm.define "node01" do |node1|  # -------------------------------------- D
    node1.vm.hostname = "node01"
    node1.vm.network "private_network", ip: "192.168.2.22"
    node1.vm.provider "virtualbox" do |v|
      v.customize [ "modifyvm", :id, "--cpus", "2" ]
      v.customize [ "modifyvm", :id, "--memory", "4096" ]
    end
    node1.vm.provision :shell, inline: nodes_setup
  end
  config.vm.define "node02" do |node2|
    ...
    # Nodes 1 through N are the actual worker nodes
    # Add as many worker nodes as needed
  end
  config.vm.define "ansible" do |ansible|  # ----------------------------------- E
    ansible.vm.hostname = "ansible"
    ansible.vm.network "private_network", ip: "192.168.2.21"
    ansible.vm.provision :shell, inline: common
    ansible.vm.provision :shell, inline: controlz_setup
    ansible.vm.provision :file, source: "ansible-control-play.yml", destination: "ansible-control-play.yml"
    ansible.vm.provision :shell, inline: "ansible-playbook -i /home/vagrant/kubespray/inventory/my-k8s/hosts ansible-control-play.yml"
    ansible.vm.provision :file, source: "ansible-all-play.yml", destination: "ansible-all-play.yml"
    ansible.vm.provision :shell, inline: "ansible-playbook -i /home/vagrant/kubespray/inventory/my-k8s/hosts ansible-all-play.yml"
    ansible.vm.provision :file, source: "k8s-control-play.yml", destination: "k8s-control-play.yml"
    ansible.vm.provision :shell, inline: "ansible-playbook -i /home/vagrant/kubespray/inventory/my-k8s/hosts k8s-control-play.yml"
    ansible.vm.provision :file, source: "k8s-all-play.yml", destination: "k8s-all-play.yml"
    ansible.vm.provision :shell, inline: "cd ./kubespray && ansible-playbook -i ./inventory/my-k8s/hosts ../k8s-all-play.yml"
    ansible.vm.provision :shell, inline: "cd /home/vagrant/kubespray && ansible-playbook --flush-cache -i /home/vagrant/kubespray/inventory/my-k8s/hosts cluster.yml -v"
    ansible.vm.provision :file, source: "k8s-kubectl-play.yml", destination: "k8s-kubectl-play.yml"
    ansible.vm.provision :shell, inline: "cd ./kubespray && ansible-playbook -i ./inventory/my-k8s/hosts ../k8s-kubectl-play.yml"
  end
end

[Key points]

A> common: script for settings shared by all nodes

Registers the host entries.
Grants sudo privileges to the vagrant account.
Disables the firewall.
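
The host registration in A stays safe across repeated `vagrant provision` runs because each line checks for the name before appending. Here is a minimal sketch of that idempotent-append pattern. `add_host_entry` is a hypothetical helper that is not part of the original Vagrantfile, and the demo writes to a scratch file instead of /etc/hosts. Unlike the plain `grep -q` in the script, it compares whole fields, so a name such as node01 is not matched by a longer name that merely contains it.

```shell
#!/bin/sh
# Hypothetical helper (not in the original Vagrantfile):
# append "IP NAME" to a hosts-style file only when NAME is not listed yet.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # awk exits 0 only if NAME appears as a whole field after the IP column.
    if ! awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$hosts_file"; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Demo against a scratch file rather than the real /etc/hosts.
demo_hosts=$(mktemp)
add_host_entry "$demo_hosts" 192.168.2.22 node01
add_host_entry "$demo_hosts" 192.168.2.22 node01   # second call changes nothing
add_host_entry "$demo_hosts" 192.168.2.23 node02
cat "$demo_hosts"
```

Pointing the function at /etc/hosts would need root, which the Vagrant shell provisioner has by default.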

B> controlz_setup: script that sets up the Ansible control node

Installs Ansible.
Downloads the k8s installation package (kubespray).
Edits ansible.cfg to add the remote_user, private_key_file, and privilege escalation settings that Ansible needs.
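
The ansible.cfg edit in B can be hard to follow as a one-liner, so here is a minimal sketch of the same change written as a function. `configure_ansible_cfg` is a hypothetical name, it uses awk where the original uses perl, and the demo file with its `host_key_checking` line is made up for illustration. The settings written are the ones from the Vagrantfile.

```shell
#!/bin/sh
# Hypothetical helper (the Vagrantfile does this inline with perl and echo):
# add connection settings under [defaults] and append a [privilege_escalation] section.
configure_ansible_cfg() {
    cfg=$1
    # Same guard as the Vagrantfile: do nothing if remote_user is already set.
    if grep -q '^remote_user' "$cfg"; then
        return 0
    fi
    tmp=$(mktemp)
    # Print every line; right after the [defaults] header, print the two new settings.
    awk '{ print }
         /^\[defaults\]/ {
             print "remote_user = vagrant"
             print "private_key_file = /home/vagrant/.ssh/id_rsa"
         }' "$cfg" > "$tmp"
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
    printf '%s\n' '[privilege_escalation]' 'become=true' 'become_method=sudo' \
        'become_user=root' 'become_ask_pass=false' >> "$cfg"
}

# Demo on a scratch file; the initial content is an assumption, not kubespray's real ansible.cfg.
demo_cfg=$(mktemp)
printf '%s\n' '[defaults]' 'host_key_checking = False' > "$demo_cfg"
configure_ansible_cfg "$demo_cfg"
configure_ansible_cfg "$demo_cfg"   # second call is a no-op
cat "$demo_cfg"
```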

C> nodes_setup: script for settings shared by the worker nodes

Configures the SSH access environment.

D> Creates the worker nodes (node 1 through N).

E> Creates the Ansible control node.

Runs the Ansible playbook "ansible-control-play.yml".
Runs the Ansible playbook "ansible-all-play.yml".
Runs the Ansible playbook "k8s-control-play.yml".
Runs the Ansible playbook "k8s-all-play.yml".
Runs the Ansible playbook "cluster.yml". (cluster.yml is the playbook provided by Kubespray.)
Runs the Ansible playbook "k8s-kubectl-play.yml".
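
The order of the playbooks in E matters, since each one prepares what the next one needs. As a rough sketch of that sequence outside Vagrant, the wrapper below runs playbooks in order and stops at the first failure. `run_playbooks` and the `DRY_RUN` switch are hypothetical and not part of the original setup. The sketch is also simplified: the real Vagrantfile changes into the kubespray directory for some playbooks and passes `--flush-cache` and `-v` to cluster.yml.

```shell
#!/bin/sh
# Hypothetical wrapper: run playbooks in order and stop at the first failure.
# With DRY_RUN=1 (the default in this sketch) it only prints the commands.
run_playbooks() {
    inventory=$1
    shift
    for playbook in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ansible-playbook -i $inventory $playbook"
        else
            ansible-playbook -i "$inventory" "$playbook" || return 1
        fi
    done
}

INVENTORY=/home/vagrant/kubespray/inventory/my-k8s/hosts
plan=$(run_playbooks "$INVENTORY" \
    ansible-control-play.yml ansible-all-play.yml \
    k8s-control-play.yml k8s-all-play.yml \
    cluster.yml k8s-kubectl-play.yml)
echo "$plan"
```

Setting `DRY_RUN=0` on the control node would execute the commands instead of printing them.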

2. [Playbook] ansible-control-play.yml

Applies only to the Ansible control node.

---
- name: Setup ansible control node
  hosts: localhost
  become: no
  become_user: vagrant
  gather_facts: no
  vars:
    ansible_password: vagrant
    ansible_python_interpreter: /usr/bin/python3

  tasks:
    - debug: var=ansible_host

    # Copy the kubespray sample and use it as a separate inventory
    - name: copy sample
      copy:
        src: /home/vagrant/kubespray/inventory/sample/
        dest: /home/vagrant/kubespray/inventory/my-k8s
    - name: empty inventory file(hosts)
      file:
        path: /home/vagrant/kubespray/inventory/my-k8s/hosts
        state: touch
      run_once: true

    # Write the node information (inventory) of the k8s cluster to build
    - name: write inventory file(hosts)
      blockinfile:
        path: /home/vagrant/kubespray/inventory/my-k8s/hosts
        block: |
          [all]
          node01 ansible_host=node01 ip=192.168.2.22 etcd_member_name=etcd1
          node02 ansible_host=node02 ip=192.168.2.23 etcd_member_name=etcd2
          node03 ansible_host=node03 ip=192.168.2.24 etcd_member_name=etcd3
          node04 ansible_host=node04 ip=192.168.2.25

          [kube-master]
          node01

          [etcd]
          node01
          node02
          node03

          [kube-node]
          node02
          node03
          node04

          [k8s-cluster:children]
          kube-master
          kube-node

          [all:vars]
          ansible_python_interpreter=/usr/bin/python3
    - name: set bashrc file
      lineinfile:
        path: /home/vagrant/.bashrc
        line: "{{ item }}"
      with_items:
        - "alias ans='ansible'"
        - "alias anp='ansible-playbook'"
        - "alias ang='ansible-galaxy'"

    # Generate the SSH key pair and register public host keys in known_hosts
    # The key pair is generated as the vagrant user
    # Host keys are registered for both the IP and the hostname
    - name: install sshpass
      command: "apt install -y sshpass"
      run_once: true
    - name: ssh-keygen for vagrant user
      become: true
      command: "ssh-keygen -b 2048 -t rsa -f ~/.ssh/id_rsa -q -N ''"
      ignore_errors: true
      run_once: true
    - name: key-scan with hostname
      command: /usr/bin/ssh-keyscan -t ecdsa {{ ansible_host }}
      register: key_hostname
    - name: put key_hostname on known_hosts
      lineinfile:
        path: ~/.ssh/known_hosts
        line: "{{ item }}"
        create: true
      with_items:
        - "{{ key_hostname.stdout_lines }}"
    # NOTE: as written, this scans the same ansible_host value as the hostname task above
    - name: key-scan with hostip
      command: /usr/bin/ssh-keyscan -t ecdsa {{ ansible_host }}
      register: key_hostip
    - name: put key_hostip on known_hosts
      lineinfile:
        path: ~/.ssh/known_hosts
        line: "{{ item }}"
        create: true
      with_items:
        - "{{ key_hostip.stdout_lines }}"
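
The inventory written by the playbook above decides which node gets which role, so it is worth a quick sanity check before running kubespray. The sketch below is a stand-alone illustration: `hosts_in_group` is a hypothetical helper, and the scratch file reproduces the groups from the playbook. It counts the etcd members, since etcd is normally run with an odd number of members. On the control node itself, `ansible-inventory -i <inventory file> --graph` shows the parsed groups.

```shell
#!/bin/sh
# Hypothetical helper: print the host names listed under [group] in an INI inventory.
hosts_in_group() {
    awk -v g="[$2]" '
        /^\[/ { in_group = ($0 == g); next }    # a section header opens or closes the group
        in_group && NF && !/^#/ { print $1 }    # first field of each entry is the host name
    ' "$1"
}

# Scratch copy of the inventory groups from ansible-control-play.yml.
demo_inv=$(mktemp)
cat > "$demo_inv" <<'EOF'
[all]
node01 ansible_host=node01 ip=192.168.2.22 etcd_member_name=etcd1
node02 ansible_host=node02 ip=192.168.2.23 etcd_member_name=etcd2
node03 ansible_host=node03 ip=192.168.2.24 etcd_member_name=etcd3
node04 ansible_host=node04 ip=192.168.2.25

[kube-master]
node01

[etcd]
node01
node02
node03

[kube-node]
node02
node03
node04
EOF

etcd_count=$(hosts_in_group "$demo_inv" etcd | wc -l | tr -d ' ')
if [ $((etcd_count % 2)) -eq 1 ]; then
    echo "etcd members: $etcd_count (odd)"
else
    echo "etcd members: $etcd_count (even, consider adding or removing one)"
fi
```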

3. [Playbook] ansible-all-play.yml

Applies to every node except the Ansible control node.

---
- name: Setup Ansible to each node
  hosts: all
  connection: local
  become: no
  become_user: vagrant
  serial: 1
  gather_facts: no
  vars:
    ansible_password: vagrant
    ansible_python_interpreter: /usr/bin/python3

  tasks:
    - debug: var=ansible_host

    # The nodes_setup script in the Vagrantfile edited /etc/ssh/sshd_config on every node
    # Restart ssh so that the new ssh settings take effect
    # NOTE: because of connection: local, tasks in this play execute on the control node;
    #       nodes_setup already restarts ssh on each node
    - name: restart ssh service
      service:
        name: ssh
        state: restarted
      run_once: true

    # Distribute the vagrant user's public key to every node
    - name: copy public key to each node
      become: true
      command: sshpass -p {{ ansible_password }} ssh-copy-id -i ~/.ssh/id_rsa.pub vagrant@{{ item }} -f -o StrictHostKeyChecking=no
      with_items:
        - "{{ ansible_host }}"
      run_once: true

4. [Playbook] k8s-control-play.yml

node01 is used as the k8s master node (the inventory is created in ansible-control-play.yml). Because kubespray's playbooks run as the root account, this playbook also generates an SSH key for root.

---
- name: Setup Kubernetes Master
  hosts: kube-master
  connection: local
  gather_facts: no
  vars:
    ansible_password: vagrant
    ansible_python_interpreter: /usr/bin/python3
    k8s_work_home: /home/vagrant/kubespray/inventory/my-k8s

  tasks:

    - debug: var=ansible_host

    # Generate the root user's ssh key and add the settings needed for the k8s installation
    - name: ssh-keygen for root user
      command: "ssh-keygen -f ~/.ssh/id_rsa -q -N ''"
      ignore_errors: true
      run_once: true
    - name: use flannel interface for k8s' network
      lineinfile:
        path: "{{ k8s_work_home }}/group_vars/k8s-cluster/k8s-net-flannel.yml"
        regexp: '^# flannel_interface:$'
        line: 'flannel_interface: enp0s8'
    - name: enable addons helm metrics_server ingress_controller
      lineinfile:
        path: "{{ k8s_work_home }}/group_vars/k8s-cluster/addons.yml"
        regexp: '{{ item.from }}'
        line: '{{ item.to }}'
        state: present
      with_items:
        - { from: 'helm_enabled: false', to: 'helm_enabled: true' }
        - { from: 'metrics_server_enabled: false', to: 'metrics_server_enabled: true' }
        - { from: 'ingress_nginx_enabled: false', to: 'ingress_nginx_enabled: true' }
    - name: change proxy mode ipvs -> iptables
      lineinfile:
        path: "{{ k8s_work_home }}/group_vars/k8s-cluster/k8s-cluster.yml"
        regexp: '^kube_proxy_mode: ipvs'
        line: 'kube_proxy_mode: iptables'

5. [Playbook] k8s-all-play.yml

Registers root's public key on every node so that kubespray can run as the root account.

---
- name: Setup Kubernetes to each node
  hosts: all
  connection: local
  serial: 1
  gather_facts: no
  vars:
    ansible_password: vagrant
    ansible_python_interpreter: /usr/bin/python3

  tasks:

    - debug: var=ansible_host
    - name: copy public key of root to each node
      become: false
      command: "sudo sshpass -p {{ ansible_password }} ssh-copy-id -i ~/.ssh/id_rsa.pub vagrant@{{ item }} -f -o StrictHostKeyChecking=no"
      with_items:
        - "{{ ansible_host }}"
      run_once: true

6. [Playbook] cluster.yml

cluster.yml is the k8s installation playbook provided by kubespray. A new inventory was created from the sample in the kubespray repository downloaded with git. Running the cluster.yml playbook against that new inventory creates the Kubernetes cluster.

7. [Playbook] k8s-kubectl-play.yml

After the k8s cluster is created, this playbook sets up kubectl for managing it, so that kubectl commands work on node01 (the Kubernetes master).

---
- name: Setup kubectl
  hosts: kube-master
  gather_facts: no

  tasks:

    - debug: var=ansible_host
    - name: create .kube dir
      file:
        state: directory
        path: /home/vagrant/.kube
    - name: copy admin.conf -> .kube/config
      copy:
        src: /etc/kubernetes/admin.conf
        remote_src: true
        dest: /home/vagrant/.kube/config
        owner: "{{ ansible_user }}"
        group: "{{ ansible_user }}"
        mode: "0644"

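Installation result

Kubernetes cluster creation can now be automated. Of course, creating and managing k8s pods will also require automating Docker management.

Part 3 covers the k8s installation in detail, so Part 4 focuses on the source.

Now that k8s is usable, I will keep automating the training and deployment environment for AI service development.

Git source for the AI development environment build (full)

Part 5: Automating Kubeflow on the k8s cluster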


겨울나기 바캉스(by 전성재)
