Airflow Setup
Note: because this guide replaces the OS default Python, the system's built-in command-line tools (yum, urlgrabber-ext-down, yum-config-manager) will stop working until they are patched; run everything as root. For pip, use the Aliyun mirror; the Douban mirror can cause version-dependency problems during installation.
1. Local deployment
1. Dependencies
2. Upgrade to Python 3.7
Run as root
#!/bin/bash
# File: upgrade_python37.sh
# User: root
# Os: CentOS 7.9
# 1. Install required package
yum install -y gcc gcc-c++ python-devel openssl-devel zlib-devel readline-devel libffi-devel sqlite-devel wget
# 2. Install Python-3.7
#wget https://www.python.org/ftp/python/3.7.10/Python-3.7.10.tar.xz
wget https://zhengyansheng.oss-cn-beijing.aliyuncs.com/Python-3.7.10.tar.xz
tar xf Python-3.7.10.tar.xz
cd Python-3.7.10
#./configure --prefix=/usr/local/python37 --enable-optimizations --with-ssl --enable-loadable-sqlite-extensions
./configure --prefix=/usr/local/python37 --enable-optimizations --with-ssl
make -j 4
make install
# 3. Link python3.7 to python
unlink /usr/bin/python
ln -sv /usr/local/python37/bin/python3.7 /usr/bin/python
# 4. Add pip.conf file
#cat > /etc/pip.conf << EOF
#[global]
#trusted-host = pypi.douban.com
#index-url = http://pypi.douban.com/simple
#[list]
#format=columns
#EOF
cat > /etc/pip.conf << EOF
[global]
trusted-host = mirrors.aliyun.com
index-url = http://mirrors.aliyun.com/pypi/simple/
[list]
format=columns
EOF
# 5. Add local env
echo 'export PATH="/usr/local/python37/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
pip3.7 install --upgrade pip==20.2.4 # fix https://github.com/apache/airflow/issues/12838
# 6. View version
python --version
pip3.7 --version
Caveats:
Because the OS default Python was changed, the built-in command-line tools (yum, urlgrabber-ext-down, yum-config-manager) no longer run directly; their shebang lines must be updated:
"""
1. vi /usr/bin/yum
2. vi /usr/libexec/urlgrabber-ext-down
3. vi /usr/bin/yum-config-manager
"""
3. Deploy the MySQL 5.7 database
1. Installation
#!/bin/bash
# File: install_mysql57.sh
# User: root
# Os: CentOS 7.9
# Reference: https://tecadmin.net/install-mysql-5-7-centos-rhel/
# 1. Install yum source
yum localinstall -y https://dev.mysql.com/get/mysql57-community-release-el7-9.noarch.rpm
# 2. Install mysql
#yum install -y mysql-community-server
#yum localinstall *.rpm # yum install --downloadonly --downloaddir=./ mysql-community-server
wget https://zhengyansheng.oss-cn-beijing.aliyuncs.com/mysql-yum-57.tar.gz
tar xf mysql-yum-57.tar.gz
cd mysql-yum-57
yum localinstall -y *.rpm
2. Start the database
# 1. Start mysql
systemctl start mysqld.service
# 2. view mysql login password
grep 'A temporary password' /var/log/mysqld.log |tail -1
# 3. set secure option
/usr/bin/mysql_secure_installation
# 4. view version
mysql -V
"""
echo explicit_defaults_for_timestamp=1 >> /etc/my.cnf
systemctl restart mysqld.service
"""
3. Create the database
> mysql -uroot -p<xxx>
set global validate_password_policy=LOW;
set global validate_password_length=6;
alter user user() identified by "123456";
CREATE DATABASE `airflow` /*!40100 DEFAULT CHARACTER SET utf8 */;
CREATE USER 'airflow_user'@'localhost' IDENTIFIED BY 'airflow12345678';
GRANT ALL ON airflow.* TO 'airflow_user'@'localhost';
FLUSH PRIVILEGES;
4. Deploy the Redis 6.x database
1. Installation
# 1. Install remi yum repo
yum install -y epel-release yum-utils
yum install -y http://rpms.remirepo.net/enterprise/remi-release-7.rpm
yum-config-manager --enable remi
# 2. Install redis latest version
yum install -y redis
2. Configuration
# vi /etc/redis.conf
bind 0.0.0.0
3. Start
# 1. Start redis
systemctl start redis && systemctl enable redis
systemctl status redis
# 2. View redis
ps -ef |grep redis
# 3. Test
redis-cli ping
# 4. View version
redis-cli --version
5. Deploy Airflow
1. Installation
# 1. Set env
export AIRFLOW_HOME=~/airflow
# 2. Install apache-airflow 2.1.0
AIRFLOW_VERSION=2.1.0
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.1.0/constraints-3.6.txt # may be unreachable from some networks; see the Caveats section below
pip3.7 install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
2. Initialize the database
# 1. Set up database
## https://airflow.apache.org/docs/apache-airflow/2.1.0/howto/set-up-database.html#
pip3.7 install pymysql
airflow config get-value core sql_alchemy_conn # errors out, but creates the ~/airflow directory
# 2. Initialize the database
"""
# vi ~/airflow/airflow.cfg
[core]
sql_alchemy_conn = mysql+pymysql://airflow_user:airflow12345678@localhost:3306/airflow
"""
airflow db init
"""
...
Initialization done
"""
3. Create a user
# Create superuser
airflow users create \
--username admin \
--firstname zheng \
--lastname yansheng \
--role Admin \
--email zhengyansheng@gmail.com
4. Start services
# start the web server; default port is 8080
airflow webserver --port 8080
# start the scheduler
# open a new terminal or else run webserver with ``-D`` option to run it as a daemon
airflow scheduler
# visit localhost:8080 in the browser and use the admin account you just
# created to login. Enable the example_bash_operator dag in the home page
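For reference, a DAG comparable to `example_bash_operator` looks roughly like this (a minimal sketch against the Airflow 2.x API; the dag_id and schedule here are arbitrary). Drop it into `~/airflow/dags/` and it will show up in the web UI:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_bash",              # arbitrary example id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,                    # don't backfill past runs
) as dag:
    hello = BashOperator(task_id="hello", bash_command="echo hello")
```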
5. Admin console
Web admin
Dashboard
6. Distributed deployment
1. Installation
pip install 'apache-airflow[celery]'
pip install 'celery[redis]'
2. Configure the executor
[core]
# The executor class that airflow should use. Choices include
# ``SequentialExecutor``, ``LocalExecutor``, ``CeleryExecutor``, ``DaskExecutor``,
# ``KubernetesExecutor``, ``CeleryKubernetesExecutor`` or the
# full import path to the class when using a custom executor.
# executor = SequentialExecutor
executor = CeleryExecutor
[celery]
# broker_url = redis://redis:6379/0
broker_url = redis://localhost:6379/0
# result_backend = db+postgresql://postgres:airflow@postgres/airflow
result_backend = redis://localhost:6379/0
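Both `broker_url` and `result_backend` here use the `redis://host:port/db` scheme (database 0 in this case). A quick standard-library check that the URL decomposes the way Celery will read it:

```python
from urllib.parse import urlparse

url = urlparse("redis://localhost:6379/0")
# scheme, host, port, and the Redis database number (as the path)
print(url.scheme, url.hostname, url.port, url.path)
```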
3. Start
# 1. Start webserver
airflow webserver -p 8000
# 2. Start scheduler
airflow scheduler
# 3. Start celery worker
airflow celery worker
# 4. Start celery flower
airflow celery flower
4. Admin pages
Webserver
flower
7. Demo
Trigger a DAG
5. HA Airflow installation
Full installation script example: 1 master, 1 worker
##### Airflow HA setup (Airflow version 2.4.0)
##### Prerequisite: clean CentOS 7 system
# User: root
# Os: CentOS 7.9
# Workdir: /server/
# I. Upgrade Python to Python 3 (system dependencies for Airflow are installed in step 1 below)
# 1. Install required package
yum install -y mysql-devel gcc gcc-c++ gcc-devel python-devel openssl-devel zlib-devel readline-devel libffi-devel wget cyrus-sasl-lib python3-devel cyrus-sasl cyrus-sasl-devel vim yum-utils
# 2. Install Python-3.7
wget https://zhengyansheng.oss-cn-beijing.aliyuncs.com/Python-3.7.10.tar.xz
tar xf Python-3.7.10.tar.xz
cd Python-3.7.10
./configure --prefix=/usr/local/python37 --enable-optimizations --with-ssl
make -j 4
make install
# 3. Link python3.7 to python
unlink /usr/bin/python
ln -sv /usr/local/python37/bin/python3.7 /usr/bin/python
# 4. Add pip.conf file
cat > /etc/pip.conf << EOF
[global]
trusted-host = mirrors.aliyun.com
index-url = http://mirrors.aliyun.com/pypi/simple/
[list]
format=columns
EOF
# 5. Add local env
echo 'export PATH="/usr/local/python37/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
pip3.7 install --upgrade pip==20.2.4 # fix https://github.com/apache/airflow/issues/12838
# 6. View version
python --version
pip3.7 --version
Caveats:
Because the OS default Python was changed, the built-in command-line tools (yum, urlgrabber-ext-down, yum-config-manager) no longer run directly; point their shebangs back at the old interpreter (python >> python2.7):
"""
1. vi /usr/bin/yum
2. vi /usr/libexec/urlgrabber-ext-down
3. vi /usr/bin/yum-config-manager
"""
## II. Deploy the MySQL 5.7 database
# 1. Install yum source
yum localinstall -y https://dev.mysql.com/get/mysql57-community-release-el7-9.noarch.rpm
# 2. Install mysql
#yum install -y mysql-community-server
#yum localinstall *.rpm # yum install --downloadonly --downloaddir=./ mysql-community-server
wget https://zhengyansheng.oss-cn-beijing.aliyuncs.com/mysql-yum-57.tar.gz
tar xf mysql-yum-57.tar.gz
cd mysql-yum-57
yum localinstall -y *.rpm
#3. Install (via the yum repo; --nogpgcheck skips GPG signature verification)
sudo yum install -y mysql-server --nogpgcheck
#4. Start mysql
systemctl start mysqld.service
# 2. view mysql login password
grep 'A temporary password' /var/log/mysqld.log |tail -1
# 3. set secure option
/usr/bin/mysql_secure_installation
# 4. view version
mysql -V
echo explicit_defaults_for_timestamp=1 >> /etc/my.cnf
systemctl restart mysqld.service
#5. Create the database
> mysql -uroot -p<xxx>
set global validate_password_policy=LOW;
set global validate_password_length=6;
alter user user() identified by "123456";
CREATE DATABASE `airflow` /*!40100 DEFAULT CHARACTER SET utf8 */;
CREATE USER 'airflow_user'@'localhost' IDENTIFIED BY 'airflow12345678';
GRANT ALL ON airflow.* TO 'airflow_user'@'localhost';
# Allow other workers to connect to the database
CREATE USER 'airflow_user'@'%' IDENTIFIED BY 'airflow12345678';
GRANT ALL ON airflow.* TO 'airflow_user'@'%';
FLUSH PRIVILEGES;
# III. Deploy the Redis 6.x database
# 1. Install remi yum repo
yum install -y epel-release yum-utils
yum install -y http://rpms.remirepo.net/enterprise/remi-release-7.rpm
yum-config-manager --enable remi
# 2. Install redis latest version
yum install -y redis
#3. Configuration
# vi /etc/redis.conf
bind 0.0.0.0
# Disable Redis protected mode so remote workers can connect
# (protected mode is enabled by default)
# protected-mode yes
protected-mode no
#4. Start
# 1. Start redis
systemctl start redis && systemctl enable redis
systemctl status redis
# 2. View redis
ps -ef |grep redis
# 3. Test
redis-cli ping
# 4. View version
redis-cli --version
# IV. Install Airflow 2.4.0
# Airflow needs a home. `~/airflow` is the default, but you can put it
# somewhere else if you prefer (optional)
export AIRFLOW_HOME=~/airflow
# Install Airflow using the constraints file
AIRFLOW_VERSION=2.4.0
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
# For example: 3.7
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.4.0/constraints-3.7.txt
pip install "apache-airflow==${AIRFLOW_VERSION}"
# V. Initialize the database
# 1. Set up database
## https://airflow.apache.org/docs/apache-airflow/2.1.0/howto/set-up-database.html#
pip3.7 install pymysql
airflow config get-value core sql_alchemy_conn # errors out, but creates the ~/airflow directory
# 2. Initialize the database
"""
# vi ~/airflow/airflow.cfg
[core]
executor = CeleryExecutor
sql_alchemy_conn = mysql+pymysql://airflow_user:airflow12345678@localhost:3306/airflow
[celery]
# broker_url = redis://redis:6379/0
broker_url = redis://localhost:6379/0
# result_backend = db+postgresql://postgres:airflow@postgres/airflow
result_backend = redis://localhost:6379/0
"""
airflow db init
"""
...
Initialization done
"""
V.1 For HA, install the Celery-related extras (required on both master and worker nodes)
pip3 install 'apache-airflow[mysql]'
pip3 install 'apache-airflow[celery]'
pip3 install 'apache-airflow[redis]'
# VI. Create a user
# Create superuser
airflow users create \
--username admin \
--firstname admin \
--lastname admin \
--role Admin \
--email admin@admin.com
# VII. Start services (master)
# 1. Start webserver
#airflow webserver -p 8000 # default port is 8080; -p overrides it
airflow webserver
# 2. Start scheduler
airflow scheduler
# 3. Start celery worker
airflow celery worker
# 4. Start celery flower
airflow celery flower
# VIII. Configure airflow.cfg and start services (the worker needs at least 5 GB of RAM)
## The worker must have the same Airflow version and dependencies installed as the master
## In airflow.cfg, $MASTER_IP is the master's IP (MySQL and Redis run on the master here; if they run on another machine, use that machine's IP)
"""
# vi ~/airflow/airflow.cfg
[core]
executor = CeleryExecutor
sql_alchemy_conn = mysql+pymysql://airflow_user:airflow12345678@$MASTER_IP:3306/airflow
[celery]
# broker_url = redis://redis:6379/0
broker_url = redis://$MASTER_IP:6379/0
# result_backend = db+postgresql://postgres:airflow@postgres/airflow
result_backend = redis://$MASTER_IP:6379/0
"""
#Start celery worker
airflow celery worker
5.1 Check that the master and worker are running normally
ps -aux |grep airflow
ps -axu | grep celeryd
ps -axu | grep webserver
ps -axu | grep scheduler
ps -axu | grep flower
Expected on the master node:
airflow
celeryd
webserver
scheduler
flower
Expected on the worker node: only the celery worker (celeryd) process.
5.2 Kill Airflow processes
ps -axu | grep celeryd | awk '{print $2}' | xargs kill -9
ps -axu | grep webserver | awk '{print $2}' | xargs kill -9
ps -axu | grep scheduler | awk '{print $2}' | xargs kill -9
ps -axu | grep flower | awk '{print $2}' | xargs kill -9
Caveats
1. Local deployment
pip version
# https://github.com/apache/airflow/issues/12838
# pip and Airflow 2.x have a compatibility issue; downgrade pip, otherwise installation via pip fails
pip3.7 install --upgrade pip==20.2.4
pip.conf
For pip, use the Aliyun mirror (mirrors.aliyun.com); the Douban mirror (pypi.douban.com) causes version-dependency problems during installation.
[root@k8s ~]# pip config set global.index-url https://mirrors.aliyun.com/pypi/simple
[root@k8s ~]# pip config set install.trusted-host mirrors.aliyun.com
[root@k8s ~]# pip config list
global.index-url='http://mirrors.aliyun.com/pypi/simple/'
global.trusted-host='mirrors.aliyun.com'
list.format='columns'
https://raw.githubusercontent.com/ may be unreachable
# For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.1.0/constraints-3.6.txt # may be unreachable from some networks
pip3.7 install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
# If the command above cannot reach raw.githubusercontent.com, use this mirrored constraints file instead:
pip3.7 install "apache-airflow==${AIRFLOW_VERSION}" --constraint https://zhengyansheng.oss-cn-beijing.aliyuncs.com/constraints-3.7.txt
2. Troubleshooting
2.1 airflow db init fails: Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql
File "/usr/python3.7/lib/python3.7/site-packages/airflow/migrations/versions/0e2a74e0fc9f_add_time_zone_awareness.py", line 44, in upgrade
raise Exception("Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql")
Exception: Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql
Fix:
Log in to MySQL and enable the explicit_defaults_for_timestamp global variable:
SHOW GLOBAL VARIABLES LIKE '%timestamp%';
SET GLOBAL explicit_defaults_for_timestamp =1;
mysql> SHOW GLOBAL VARIABLES LIKE '%timestamp%';
+---------------------------------+-------+
| Variable_name | Value |
+---------------------------------+-------+
| explicit_defaults_for_timestamp | OFF |
| log_timestamps | UTC |
+---------------------------------+-------+
2 rows in set (0.00 sec)
mysql> SET GLOBAL explicit_defaults_for_timestamp =1;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW GLOBAL VARIABLES LIKE '%timestamp%';
+---------------------------------+-------+
| Variable_name | Value |
+---------------------------------+-------+
| explicit_defaults_for_timestamp | ON |
| log_timestamps | UTC |
+---------------------------------+-------+
2 rows in set (0.00 sec)
2.2 Airflow commands fail with error: sqlite C library version too old (< {min_sqlite_version}).
Full traceback:
Traceback (most recent call last):
File "/usr/python3.7/bin/airflow", line 5, in <module>
from airflow.__main__ import main
File "/usr/python3.7/lib/python3.7/site-packages/airflow/__init__.py", line 34, in <module>
from airflow import settings
File "/usr/python3.7/lib/python3.7/site-packages/airflow/settings.py", line 35, in <module>
from airflow.configuration import AIRFLOW_HOME, WEBSERVER_CONFIG, conf # NOQA F401
File "/usr/python3.7/lib/python3.7/site-packages/airflow/configuration.py", line 1114, in <module>
conf.validate()
File "/usr/python3.7/lib/python3.7/site-packages/airflow/configuration.py", line 202, in validate
self._validate_config_dependencies()
File "/usr/python3.7/lib/python3.7/site-packages/airflow/configuration.py", line 243, in _validate_config_dependencies
f"error: sqlite C library version too old (< {min_sqlite_version}). "
airflow.exceptions.AirflowConfigException: error: sqlite C library version too old (< 3.15.0). See https://airflow.apache.org/docs/apache-airflow/2.1.1/howto/set-up-database.rst#setting-up-a-sqlite-database
Cause: Airflow uses SQLite as its metastore by default, but we are using MySQL, so SQLite is not actually needed here.
Fix: edit {AIRFLOW_HOME}/airflow.cfg and change the metadata database setting sql_alchemy_conn to:
sql_alchemy_conn = mysql+pymysql://airflow:yourpassword@hostname:3306/airflow
2.3 Errors installing Airflow packages
The error "xxx setup command: use_2to3 is invalid." appears because setuptools removed support for use_2to3 during builds; this was a breaking change in setuptools 58.0.0. To fix it, pin setuptools below 58.0.0 before installing packages. Open a terminal and run whichever of the following matches your environment:
pip install "setuptools<58.0"
pip3 install "setuptools<58.0"
python -m pip install "setuptools<58.0"
python3 -m pip install "setuptools<58.0"
py -m pip install "setuptools<58.0"
Then re-run the installation.
2.4 error: sqlite C library version too old
yum update
yum install sqlite
If the yum package is still too old, download the source from https://sqlite.org/ and build and install it locally.
1) Download the source
[root@stg-airflow001 ~]$ wget https://www.sqlite.org/2019/sqlite-autoconf-3290000.tar.gz
2) Build
[root@stg-airflow001 ~]$ tar zxvf sqlite-autoconf-3290000.tar.gz
[root@stg-airflow001 ~]$ cd sqlite-autoconf-3290000/
[root@stg-airflow001 ~/sqlite-autoconf-3290000]$ ./configure --prefix=/usr/local
[root@stg-airflow001 ~/sqlite-autoconf-3290000]$ make && make install
3) Replace the system's older sqlite3
[root@stg-airflow001 ~/sqlite-autoconf-3290000]$ cd
[root@stg-airflow001 ~]$ mv /usr/bin/sqlite3 /usr/bin/sqlite3_old
[root@stg-airflow001 ~]$ ln -s /usr/local/bin/sqlite3 /usr/bin/sqlite3
[root@stg-airflow001 ~]$ echo "/usr/local/lib" > /etc/ld.so.conf.d/sqlite3.conf
[root@stg-airflow001 ~]$ ldconfig
[root@stg-airflow001 ~]$ sqlite3 -version
3.29.0 2019-07-10 17:32:03 fc82b73eaac8b36950e527f12c4b5dc1e147e6f4ad2217ae43ad82882a88bfa6
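Airflow's check is against the SQLite C library that Python itself links to, not the `sqlite3` CLI binary, so after replacing the library it is worth confirming from Python. The 3.15.0 threshold below is the one quoted in the error message above:

```python
import sqlite3

# Version of the SQLite C library loaded by this Python interpreter
print(sqlite3.sqlite_version)

parts = tuple(int(p) for p in sqlite3.sqlite_version.split("."))
assert parts >= (3, 15, 0), "sqlite C library too old for Airflow"
```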
2.5 Worker task log errors
Failed to fetch log file from worker. [Errno -2] Name or service not known
Add the IP/hostname mapping of every machine to /etc/hosts on all nodes.
2.6 Log volume fills the disk
After running for a while, the web UI stopped coming up and the logs reported a full disk; the scheduler log directory had grown to 44 GB. Clearing it restored service; to avoid a recurrence, raise the logging level.
vi airflow.cfg
Set the logging level (see the logging-level reference), then restart the services:
[core]
#logging_level = INFO
logging_level = WARNING
NOTSET < DEBUG < INFO < WARNING < ERROR < CRITICAL
If the level is set to INFO, messages below INFO are suppressed and messages at INFO and above are emitted; the higher the level, the less detailed the log. The default logging level is WARNING.
Note: raising logging_level to WARNING or above affects not only the log files but also command-line output, which will likewise only show messages at or above that level. So if CLI output looks incomplete and there is no error log, the logging level is probably set too high.