CentOS 7 Greenplum 6.1 Open-Source Cluster Deployment
1. Preface
1.1 Reference Documentation
https://github.com/greenplum-db/gpdb
1.2 Deployment Package
The package contains the binaries built from the open-source source code; for the build steps see "CentOS 7 Greenplum 6.1 Open-Source Build".
tar czvf greenplum-db.tar.gz /usr/local/gpdb
1.3 Server Environment
Operating system: CentOS 7
Role | Count | Hostname | IP
Greenplum Master | 1 | gp1 | 192.168.81.154
Greenplum Standby | 0 | none | none
Greenplum Segment | 3 | gp1, gp2, gp3 | 192.168.81.154, 192.168.81.155, 192.168.81.156
2 Preparation
2.1 Linux Users
Create the Greenplum administrator user on all nodes:
groupadd -g 530 gpadmin
useradd -g 530 -u 530 -m -d /home/gpadmin -s /bin/bash gpadmin
chown -R gpadmin:gpadmin /home/gpadmin
echo "gpadmin" | passwd --stdin gpadmin
2.2 Hostname and hosts Configuration
Configure shared settings on one node first; they are copied to the other nodes in section 2.6.
vi /etc/hosts
192.168.81.154 gp1
192.168.81.155 gp2
192.168.81.156 gp3
Set each host's hostname to match its entry in /etc/hosts; run the following on each of gp1–gp3, substituting the node's own name:
hostnamectl set-hostname gp1
2.3 Firewall and SELinux
Disable SELinux:
vi /etc/selinux/config
SELINUX=disabled
Stop and disable the firewall on all machines:
systemctl stop firewalld.service
systemctl disable firewalld.service
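The SELINUX=disabled setting only takes effect after a reboot (done at the end of section 2.6). To stop enforcing immediately and confirm the firewall is off, the following quick checks can be run on every node:
setenforce 0           # optional: switch SELinux to permissive right away
getenforce             # should print Permissive or Disabled
firewall-cmd --state   # should print "not running"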
2.4 System Resource Configuration
sudo bash -c 'cat >> /etc/sysctl.conf <<-EOF
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 500 1024000 200 4096
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1025 65535
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.overcommit_memory = 2
EOF'
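The new kernel parameters can be applied without waiting for the reboot in section 2.6:
sudo sysctl -p    # re-read /etc/sysctl.conf and apply the values above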
sudo bash -c 'cat >> /etc/security/limits.conf <<-EOF
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
EOF'
sudo bash -c 'cat >> /etc/security/limits.d/90-nproc.conf <<-EOF
* soft nproc 131072
root soft nproc unlimited
EOF'
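After the next login the new limits should be in effect; a quick check as gpadmin:
ulimit -n    # expected: 65536 (open files)
ulimit -u    # expected: 131072 (max user processes)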
2.5 Temporarily Enable sudo for gpadmin
Installing Greenplum on the cluster nodes later involves creating directories and files, so grant gpadmin temporary sudo access here and revoke it after the installation succeeds (the NOPASSWD line avoids password prompts).
visudo
gpadmin ALL=(ALL) ALL
gpadmin ALL=(ALL) NOPASSWD:ALL
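To confirm the entry is active, root can list gpadmin's sudo privileges:
sudo -l -U gpadmin    # shows the commands gpadmin may run via sudo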
2.6 Copy the Configuration Files to All Nodes
scp /etc/hosts gp2:/etc
scp /etc/sysctl.conf gp2:/etc
scp /etc/security/limits.conf gp2:/etc/security
scp /etc/security/limits.d/90-nproc.conf gp2:/etc/security/limits.d
scp /etc/selinux/config gp2:/etc/selinux
scp /etc/hosts gp3:/etc
scp /etc/sysctl.conf gp3:/etc
scp /etc/security/limits.conf gp3:/etc/security
scp /etc/security/limits.d/90-nproc.conf gp3:/etc/security/limits.d
scp /etc/selinux/config gp3:/etc/selinux
Reboot the operating system on all nodes.
3 Install Greenplum DB
3.1 Install Greenplum DB on the Master Node
Create the installation directory on all three hosts:
mkdir -p /opt/greenplum
Extract the deployment package so that it ends up at /opt/greenplum/greenplum-db:
cd /opt/greenplum/
tar xzvf greenplum-db.tar.gz
# if the archive was packed from /usr/local/gpdb as in section 1.2, the extracted
# path is usr/local/gpdb; move it into place:
mv usr/local/gpdb /opt/greenplum/greenplum-db
Change the directory ownership to gpadmin:
chown -R gpadmin:gpadmin /opt/greenplum/
chown -R gpadmin:gpadmin /opt/greenplum/greenplum-db
3.2 Configure the Cluster Host Lists on the Master Node
Create a hostlist file containing every host in the cluster:
su - gpadmin
mkdir -p /opt/greenplum/greenplum-db/conf
vi /opt/greenplum/greenplum-db/conf/hostlist
gp1
gp2
gp3
Create a seg_hosts file containing the hostnames of all Segment hosts:
vi /opt/greenplum/greenplum-db/conf/seg_hosts
gp1
gp2
gp3
3.3 Configure Passwordless SSH
1. As root on the cluster master, uncomment the following options in /etc/ssh/sshd_config:
vi /etc/ssh/sshd_config
RSAAuthentication yes      # enable key-based authentication
PubkeyAuthentication yes   # enable public-key authentication
2. Copy the modified /etc/ssh/sshd_config from the master to every other node with scp:
for a in {2..3} ; do scp /etc/ssh/sshd_config gp$a:/etc/ssh/sshd_config ; done
3. Generate the key pairs. On every node, as gpadmin, run ssh-keygen -t rsa -P '' and accept the defaults:
su - gpadmin
cd ~
ssh-keygen -t rsa -P ''
4. On the master, append every node's public key (id_rsa.pub) to the master's authorized_keys file:
for a in {1..3}; do ssh root@gp$a cat /home/gpadmin/.ssh/id_rsa.pub >> /home/gpadmin/.ssh/authorized_keys; done
5. From the master, copy the assembled authorized_keys file to every other node's /home/gpadmin/.ssh/ with scp:
for a in {2..3}; do scp /home/gpadmin/.ssh/authorized_keys root@gp$a:/home/gpadmin/.ssh/authorized_keys ; done
6. Restart the ssh service on every node (as root):
sudo systemctl restart sshd.service
7. Verify passwordless SSH login.
Open another session and check that login works without a password. For example, on gp2:
ssh gp3
exit
Similarly, ssh gp1 should log in without prompting for a password.
This usually succeeds; when it fails, the cause is almost always a configuration problem.
Check the permissions on the authorized_keys file and set them to 700:
chmod 700 .ssh/authorized_keys
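If login still prompts for a password, the most common cause is over-permissive modes on the .ssh directory itself; a conventional layout (run as gpadmin on each node) is:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
As an alternative to the manual steps above, Greenplum also ships gpssh-exkeys (gpssh-exkeys -f /opt/greenplum/greenplum-db/conf/hostlist), which exchanges keys across all hosts in one step once the Python dependencies from section 3.4 are installed.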
3.4 Install Dependencies
Run the following on every machine:
yum install -y epel-release
yum install -y \
apr-devel \
bison \
bzip2-devel \
cmake3 \
flex \
gcc \
gcc-c++ \
krb5-devel \
libcurl-devel \
libevent-devel \
libkadm5 \
libyaml-devel \
libxml2-devel \
libzstd-devel \
openssl-devel \
perl-ExtUtils-Embed \
python-devel \
python-pip \
readline-devel \
xerces-c-devel \
zlib-devel
cat >> python-dependencies.txt <<-EOF
argparse==1.2.1
behave==1.2.4
epydoc==3.0.1
lockfile==0.9.1
logilab-astng==0.20.1
logilab-common==0.50.1
MarkupSafe==1.0
mock==1.0.1
parse==1.8.2
psutil==4.0.0
setuptools==36.6.0
unittest2==0.5.1
EOF
pip install -r python-dependencies.txt
# register Greenplum's bundled Python packages with the system Python
echo "/opt/greenplum/greenplum-db/lib/python" > /usr/lib/python2.7/site-packages/greenplum.pth
vi /etc/ld.so.conf
Add the following line:
/opt/greenplum/greenplum-db/lib
# reload the dynamic linker cache
ldconfig
If ldconfig reports "ldconfig: xxxx is not a symbolic link", the installation was not packaged, copied, and extracted properly (copying files directly with scp does not preserve symbolic links).
vi /opt/greenplum/greenplum-db/greenplum_path.sh
Set:
GPHOME=/opt/greenplum/greenplum-db
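To confirm the dynamic linker now resolves the Greenplum libraries, a quick sanity check:
ldconfig -p | grep greenplum-db                                    # libraries under /opt/greenplum/greenplum-db/lib should be listed
ldd /opt/greenplum/greenplum-db/bin/postgres | grep "not found"    # should print nothing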
3.5 Install Greenplum DB on the Segment Nodes
As root, create the installation directory on each Segment node and change its ownership to gpadmin:
sudo mkdir -p /opt/greenplum && sudo chown gpadmin:gpadmin -R /opt/greenplum
Log in to gp1 and copy the installation to the Segment nodes:
su - gpadmin
scp -r /opt/greenplum/greenplum-db gp2:/opt/greenplum/
scp -r /opt/greenplum/greenplum-db gp3:/opt/greenplum/
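If the ldconfig "not a symbolic link" error from section 3.4 later appears on the Segment nodes, it is because scp -r dereferences symbolic links; a tar-based copy preserves them (a sketch):
cd /opt/greenplum
tar czf greenplum-db.tar.gz greenplum-db
for h in gp2 gp3; do
  scp greenplum-db.tar.gz $h:/opt/greenplum/
  ssh $h "cd /opt/greenplum && tar xzf greenplum-db.tar.gz"
done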
Check the installation directory on every node:
su - gpadmin
source /opt/greenplum/greenplum-db/greenplum_path.sh
/opt/greenplum/greenplum-db/bin/gpssh -f /opt/greenplum/greenplum-db/conf/hostlist -e ls -l $GPHOME
[gp1] ls -l /opt/greenplum/greenplum-db
[gp1] total 20
[gp1] drwxr-xr-x 7 gpadmin gpadmin 4096 Dec 12 10:41 bin
[gp1] drwxrwxr-x 2 gpadmin gpadmin 39 Dec 12 11:13 conf
[gp1] drwxr-xr-x 3 gpadmin gpadmin 22 Dec 12 10:41 docs
[gp1] -rw-r--r-- 1 gpadmin gpadmin 749 Dec 12 13:38 greenplum_path.sh
[gp1] drwxr-xr-x 4 gpadmin gpadmin 4096 Dec 12 10:41 include
[gp1] drwxr-xr-x 5 gpadmin gpadmin 4096 Dec 12 10:41 lib
[gp1] -rw-r--r-- 1 gpadmin gpadmin 196 Dec 12 13:38 python-dependencies.txt
[gp1] drwxr-xr-x 2 gpadmin gpadmin 176 Dec 12 10:41 sbin
[gp1] drwxr-xr-x 4 gpadmin gpadmin 41 Dec 12 10:41 share
[gp2] ls -l /opt/greenplum/greenplum-db
[gp2] total 20
[gp2] drwxr-xr-x 7 gpadmin gpadmin 4096 Dec 12 13:38 bin
[gp2] drwxrwxr-x 2 gpadmin gpadmin 39 Dec 12 13:38 conf
[gp2] drwxr-xr-x 3 gpadmin gpadmin 22 Dec 12 13:38 docs
[gp2] -rw-r--r-- 1 gpadmin gpadmin 749 Dec 12 13:38 greenplum_path.sh
[gp2] drwxr-xr-x 4 gpadmin gpadmin 4096 Dec 12 13:38 include
[gp2] drwxr-xr-x 5 gpadmin gpadmin 4096 Dec 12 13:38 lib
[gp2] -rw-r--r-- 1 gpadmin gpadmin 196 Dec 12 13:38 python-dependencies.txt
[gp2] drwxr-xr-x 2 gpadmin gpadmin 176 Dec 12 13:38 sbin
[gp2] drwxr-xr-x 4 gpadmin gpadmin 41 Dec 12 13:38 share
[gp3] ls -l /opt/greenplum/greenplum-db
[gp3] total 20
[gp3] drwxr-xr-x 7 gpadmin gpadmin 4096 Dec 12 13:43 bin
[gp3] drwxrwxr-x 2 gpadmin gpadmin 39 Dec 12 13:43 conf
[gp3] drwxr-xr-x 3 gpadmin gpadmin 22 Dec 12 13:43 docs
[gp3] -rw-r--r-- 1 gpadmin gpadmin 749 Dec 12 13:43 greenplum_path.sh
[gp3] drwxr-xr-x 4 gpadmin gpadmin 4096 Dec 12 13:43 include
[gp3] drwxr-xr-x 5 gpadmin gpadmin 4096 Dec 12 13:43 lib
[gp3] -rw-r--r-- 1 gpadmin gpadmin 196 Dec 12 13:43 python-dependencies.txt
[gp3] drwxr-xr-x 2 gpadmin gpadmin 176 Dec 12 13:43 sbin
[gp3] drwxr-xr-x 4 gpadmin gpadmin 41 Dec 12 13:43 share
Create the data storage directory on all hosts:
su - gpadmin
source /opt/greenplum/greenplum-db/greenplum_path.sh
/opt/greenplum/greenplum-db/bin/gpssh -f /opt/greenplum/greenplum-db/conf/hostlist -e 'mkdir -p /opt/greenplum/data'
Create the master data storage directory on the master:
su - gpadmin
source /opt/greenplum/greenplum-db/greenplum_path.sh
/opt/greenplum/greenplum-db/bin/gpssh -h gp1 -e 'mkdir -p /opt/greenplum/data/master'
Create the data storage directories on the Segment nodes:
su - gpadmin
source /opt/greenplum/greenplum-db/greenplum_path.sh
/opt/greenplum/greenplum-db/bin/gpssh -f /opt/greenplum/greenplum-db/conf/seg_hosts -e 'mkdir -p /opt/greenplum/data/primary && mkdir -p /opt/greenplum/data/mirror'
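A quick check that the directories exist with the right ownership on every host:
gpssh -f /opt/greenplum/greenplum-db/conf/hostlist -e 'ls -ld /opt/greenplum/data /opt/greenplum/data/*'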
3.6 Configure Environment Variables
source /opt/greenplum/greenplum-db/greenplum_path.sh
gpssh -f /opt/greenplum/greenplum-db/conf/hostlist -e -v "cat >> /home/gpadmin/.bashrc <<EOF
source /opt/greenplum/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/opt/greenplum/data/master/gpseg-1
export PGPORT=5432
export PGDATABASE=gp_sydb
EOF"
4 Initialize Greenplum DB
4.1 Pre-initialization Checks
Check the hostname configuration:
su gpadmin
source /opt/greenplum/greenplum-db/greenplum_path.sh
gpssh -f /opt/greenplum/greenplum-db/conf/hostlist -e hostname
[gp1] hostname
[gp1] gp1
[gp3] hostname
[gp3] gp3
[gp2] hostname
[gp2] gp2
Check disk and network performance between nodes (optional):
gpcheckperf -h gp1 -h gp2 -d /tmp -r d -D -v
gpcheckperf -f /opt/greenplum/greenplum-db/conf/hostlist -d /tmp -r d -D -v
Example network test (the sample output below comes from a different cluster, hence the gp-master/gp-sdw hostnames):
$ gpcheckperf -f /opt/greenplum/greenplum-db/conf/hostlist -r N -d /tmp
/opt/greenplum/greenplum-db/./bin/gpcheckperf -f /opt/greenplum/greenplum-db/conf/hostlist -r N -d /tmp
-------------------
-- NETPERF TEST
-------------------
====================
== RESULT
====================
Netperf bisection bandwidth test
gp-master -> gp-sdw1 = 72.220000
gp-sdw2 -> gp-sdw3 = 21.470000
gp-sdw1 -> gp-master = 43.510000
gp-sdw3 -> gp-sdw2 = 44.200000
Summary:
sum = 181.40 MB/sec
min = 21.47 MB/sec
max = 72.22 MB/sec
avg = 45.35 MB/sec
median = 44.20 MB/sec
[Warning] connection between gp-sdw2 and gp-sdw3 is no good
[Warning] connection between gp-sdw1 and gp-master is no good
[Warning] connection between gp-sdw3 and gp-sdw2 is no good
4.2 Initialization
The initialization configuration templates are in /opt/greenplum/greenplum-db/docs/cli_help/gpconfigs. gpinitsystem_config is the template for initializing Greenplum; in it, the Mirror Segment settings are commented out. Make a copy and edit it:
cd /opt/greenplum/greenplum-db/docs/cli_help/gpconfigs
cp gpinitsystem_config initgp_config
vi initgp_config
declare -a DATA_DIRECTORY=(/opt/greenplum/data/primary /opt/greenplum/data/primary /opt/greenplum/data/primary)
MASTER_HOSTNAME=gp1
MASTER_DIRECTORY=/opt/greenplum/data/master
declare -a MIRROR_DATA_DIRECTORY=(/opt/greenplum/data/mirror /opt/greenplum/data/mirror /opt/greenplum/data/mirror)
DATABASE_NAME=gp_sydb
MACHINE_LIST_FILE=/opt/greenplum/greenplum-db/conf/seg_hosts
Each repeated entry in DATA_DIRECTORY creates one primary segment per host (three entries = three primaries per host), and likewise for MIRROR_DATA_DIRECTORY. Run the initialization:
gpinitsystem -c initgp_config
If initialization fails, run the generated backout script to clean up the data directories before re-initializing:
chmod 777 /home/gpadmin/gpAdminLogs/backout_gpinitsystem_gpadmin_20191212_144741
/home/gpadmin/gpAdminLogs/backout_gpinitsystem_gpadmin_20191212_144741
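Once gpinitsystem completes successfully, the segment layout can be inspected from the master via the gp_segment_configuration system catalog:
psql -d gp_sydb -c "select dbid, content, role, port, hostname, datadir from gp_segment_configuration order by content, role;"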
5 Post-installation Operations
5.1 Stop and Start the Cluster
gpstop -a
gpstart -a
5.2 Log In to the Database
$ psql -d postgres
postgres=# \l # list databases
List of databases
Name | Owner | Encoding | Access privileges
-----------+---------+----------+---------------------
gp_sydb | gpadmin | UTF8 |
postgres | gpadmin | UTF8 |
template0 | gpadmin | UTF8 | =c/gpadmin
: gpadmin=CTc/gpadmin
template1 | gpadmin | UTF8 | =c/gpadmin
: gpadmin=CTc/gpadmin
(4 rows)
postgres=# \dt # list tables in the current database
5.3 集群状态
gpstate -e #查看mirror的状态
gpstate -f #查看standby master的状态
gpstate -s #查看整个GP群集的状态
gpstate -i #查看GP的版本
gpstate --help #帮助文档,可以查看gpstate更多用法
5.4 Test Cluster Status
gpstate -e
5.5 Set a Password for gpadmin Remote Access
psql postgres gpadmin
alter user gpadmin encrypted password 'gpadmin';
\q
5.6 Query Tests
psql -hgp1 -p 5432 -d postgres -U gpadmin -c 'select dfhostname, dfspace,dfdevice from gp_toolkit.gp_disk_free order by dfhostname;'
psql -h gp1 -p 5432 -d postgres -U gpadmin -c '\l+'
5.7 Modify pg_hba.conf on the Master
pg_hba.conf controls client access to the database.
vi $MASTER_DATA_DIRECTORY/pg_hba.conf
host all gpadmin 0.0.0.0/0 md5
Run gpstop -u to reload the configuration.
Clients can now connect with any standard PostgreSQL client (connect to port 5432 on the master).
$MASTER_DATA_DIRECTORY/postgresql.conf is the instance configuration file and can be tuned to the environment. The two settings most commonly adjusted are:
the listen address (listen_addresses)
the connection limit (max_connections)
A minimal example of both follows.
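The values shown below are illustrative only, not prescribed by this guide; both settings require a restart (gpstop -a followed by gpstart -a) to take effect:
# $MASTER_DATA_DIRECTORY/postgresql.conf
listen_addresses = '*'     # interfaces the master listens on; '*' means all
max_connections = 250      # client connection limit on the master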
Load a table with at least tens of thousands of rows, then check how its data is distributed across the segments:
select gp_segment_id,count(*) from gis_person_info group by 1 order by 1;
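If no suitable table is at hand, a throwaway test table can be generated directly (test_distribution and the row count are arbitrary examples):
psql -d gp_sydb -c "create table test_distribution as select i as id, md5(i::text) as payload from generate_series(1, 100000) i distributed by (id);"
psql -d gp_sydb -c "select gp_segment_id, count(*) from test_distribution group by 1 order by 1;"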