目录

 

1.前言

1.1参照文档

1.2部署包

1.3服务器环境

2 准备工作

2.1 Linux用户

2.2 主机名和hosts配置

2.3 防火墙

2.4 系统资源配置

2.5 暂时启用gpadmin sudo

2.6 复制配置文件到所有节点上

3 安装Greenplum DB

3.1 在Master节点上安装Greenplum DB

3.2 在Master节点上配置集群host

3.3 配置SSH免密连接

3.4 依赖环境安装

3.5 Segment节点上安装Greenplum DB

3.6 环境变量配置

4 初始化Greenplum DB

4.1 初始化前检查

4.2 初始化

5 后续操作

5.1 停止和启动集群

5.2 登录数据库

5.3 集群状态

5.4 测试GPDB集群状态

5.5 设置gpadmin远程访问密码

5.6 查询测试

5.7修改master的pg_hba.conf


 

1.前言

1.1参照文档

https://github.com/greenplum-db/gpdb

 

1.2部署包

基于开源版本编译的源码,

编译参见《centos7 greenplum6.1开源版本编译

tar czvf greenplum-db.tar.gz /usr/local/gpdb

 

1.3服务器环境

操作系统centos7

角色

数量

主机名

IP

Greenplum Master

1

gp1

192.168.81.154

Greenplum Standby

Greenplum Segment

3

gp1、gp2、gp3

192.168.81.154、192.168.81.155、192.168.81.156

2 准备工作

2.1 Linux用户

在所有节点上创建greenplum管理员用户。

groupadd -g 530 gpadmin
useradd -g 530 -u 530 -m -d /home/gpadmin -s /bin/bash gpadmin
chown -R gpadmin:gpadmin /home/gpadmin
echo "gpadmin" | passwd --stdin gpadmin

2.2 主机名和hosts配置

相同的配置先在一个节点上配置,配置完成后在2.6小节中复制到其它节点上。

vi /etc/hosts

192.168.81.154 gp1
192.168.81.155 gp2
192.168.81.156 gp3

分别对应每一台主机修改主机名;

依次修改所有节点 gp[1-3]上分别执行

hostnamectl set-hostname gp1

2.3 防火墙

禁用防火墙;

vi /etc/selinux/config

SELINUX=disabled

 

所有机器执行

systemctl stop firewalld.service

systemctl disable firewalld.service

2.4 系统资源配置

sudo bash -c 'cat >> /etc/sysctl.conf <<-EOF

kernel.shmmax = 500000000

kernel.shmmni = 4096

kernel.shmall = 4000000000

kernel.sem = 500 1024000 200 4096

kernel.sysrq = 1

kernel.core_uses_pid = 1

kernel.msgmnb = 65536

kernel.msgmax = 65536

kernel.msgmni = 2048

net.ipv4.tcp_syncookies = 1

net.ipv4.ip_forward = 0

net.ipv4.conf.default.accept_source_route = 0

net.ipv4.tcp_tw_recycle = 1

net.ipv4.tcp_max_syn_backlog = 4096

net.ipv4.conf.all.arp_filter = 1

net.ipv4.ip_local_port_range = 1025 65535

net.core.netdev_max_backlog = 10000

net.core.rmem_max = 2097152

net.core.wmem_max = 2097152

vm.overcommit_memory = 2

 

EOF'

 

sudo bash -c 'cat >> /etc/security/limits.conf <<-EOF

* soft nofile 65536

* hard nofile 65536

* soft nproc 131072

* hard nproc 131072

 

EOF'

 

cat >> /etc/security/limits.d/90-nproc.conf <<-EOF

* soft nproc 131072

root soft nproc unlimited

 

EOF

2.5 暂时启用gpadmin sudo

因为后面的集群节点上安装greenplum时会涉及到创建目录和文件操作,在此临时启用sudo,安装成功后撤销。

visudo

gpadmin ALL=(ALL) ALL

gpadmin ALL=(ALL) NOPASSWD:ALL

2.6 复制配置文件到所有节点上

 

scp /etc/hosts gp2:/etc
scp /etc/sysctl.conf gp2:/etc
scp /etc/security/limits.d/90-nproc.conf gp2:/etc/security/limits.d
scp /etc/selinux/config gp2:/etc/selinux

scp /etc/hosts gp3:/etc
scp /etc/sysctl.conf gp3:/etc
scp /etc/security/limits.d/90-nproc.conf gp3:/etc/security/limits.d
scp /etc/selinux/config gp3:/etc/selinux

 

重启操作系统。

 

3 安装Greenplum DB

3.1 在Master节点上安装Greenplum DB

三台上

mkdir -p /opt/greenplum

将部署包解压到

/opt/greenplum/greenplum-db

cd /opt/greenplum/

tar xzvf greenplum-db.tar.gz

 

修改目录权限和所有者为gpadmin;

chown -R gpadmin:gpadmin /opt/greenplum/

chown -R gpadmin:gpadmin /opt/greenplum/greenplum-db

3.2 在Master节点上配置集群host

su - gpadmin mkdir -p /opt/greenplum/greenplum-db/conf

vi /opt/greenplum/greenplum-db/conf/hostlist

gp1

gp2

gp3

创建一个 seg_hosts ,包含所有的Segment Host的主机名;

vi /opt/greenplum/greenplum-db/conf/seg_hosts

gp1

gp2

gp3

3.3 配置SSH免密连接

使用root账号

 

1.在集群master的 /etc/ssh/sshd_config  文件去掉以下选项的注释

vi /etc/ssh/sshd_config

RSAAuthentication yes #开启私钥验证 PubkeyAuthentication yes #开启公钥验证

2.将集群master 修改后的 /etc/ssh/sshd_config  通过 scp 命令复制发送到集群的每一个节点

for a in {2..3} ; do scp /etc/ssh/sshd_config gp$a:/etc/ssh/sshd_config ; done

3.生成公钥、私钥

1.在集群的每一个节点节点输入命令 ssh-keygen -t rsa -P '',生成 key,一律回车

su - gpadmin

cd ~

ssh-keygen -t rsa -P ''

4.在集群的master 节点输入命令

将集群每一个节点的公钥id_rsa.pub放入到自己的认证文件中authorized_keys;

for a in {1..3}; do ssh root@gp$a cat /home/gpadmin/.ssh/id_rsa.pub >> /home/gpadmin/.ssh/authorized_keys; done

5.在集群的master 节点输入命令

将自己的认证文件 authorized_keys  通过 scp 命令复制发送到每一个节点上去: /root/.ssh/authorized_keys`

for a in {2..3}; do scp /home/gpadmin/.ssh/authorized_keys root@gp$a:/home/gpadmin/.ssh/authorized_keys ; done

6.在集群的每一个节点节点输入命令

接重启ssh服务(使用root用户)

sudo systemctl restart sshd.service

7.验证 ssh 无密登录

开一个其他窗口测试下能否免密登陆

例如:在gp2

ssh gp3

exit 退出

测试ssh gp1,不需要密码即可登录。

 

多数情况下,可以登录成功。但是也会出现配置不正确,导致失败的时候。

检查authorized_keys文件权限,并设置为700

chmod 700 .ssh/authorized_keys

 

3.4 依赖环境安装

每台机器都需要执行

yum install -y epel-release

 

yum install -y \

apr-devel \

bison \

bzip2-devel \

cmake3 \

flex \

gcc \

gcc-c++ \

krb5-devel \

libcurl-devel \

libevent-devel \

libkadm5 \

libyaml-devel \

libxml2-devel \

libzstd-devel \

openssl-devel \

perl-ExtUtils-Embed \

python-devel \

python-pip \

readline-devel \

xerces-c-devel \

zlib-devel

 

 

cat >> python-dependencies.txt <<-EOF

argparse==1.2.1

behave==1.2.4

epydoc==3.0.1

lockfile==0.9.1

logilab-astng==0.20.1

logilab-common==0.50.1

MarkupSafe==1.0

mock==1.0.1

parse==1.8.2

psutil==4.0.0

setuptools==36.6.0

unittest2==0.5.1

EOF

 

 

pip install -r python-dependencies.txt

 

#添加gp编译的python包

echo "/opt/greenplum/greenplum-db/lib/python" > /usr/lib/python2.7/site-packages/greenplum.pth

 

vi /etc/ld.so.conf

添加

/opt/greenplum/greenplum-db/lib

 

#配置生效

ldconfig

如果报错ldconfig: xxxx 不是符号连接,说明不是正常压缩、拷贝、解压过来的(直接scp文件不会带符号连接)

 

 

vi /opt/greenplum/greenplum-db/greenplum_path.sh

修改

GPHOME=/opt/greenplum/greenplum-db

 

 

3.5 Segment节点上安装Greenplum DB

root用户创建Segment节点所需的目录,并更改目录权限和所有者为gpadmin;

sudo mkdir -p /opt/greenplum && sudo chown gpadmin:gpadmin -R /opt/greenplum

 

登录gp1,拷贝

su - gpadmin

scp -r /opt/greenplum/greenplum-db gp2:/opt/greenplum/

scp -r /opt/greenplum/greenplum-db gp3:/opt/greenplum/

 

检查每个节点安装和目录情况;

su - gpadmin source /opt/greenplum/greenplum-db/greenplum_path.sh /opt/greenplum/greenplum-db/bin/gpssh -f /opt/greenplum/greenplum-db/conf/hostlist -e ls -l $GPHOME

[gp1] ls -l /opt/greenplum/greenplum-db

[gp1] 总用量 20

[gp1] drwxr-xr-x 7 gpadmin gpadmin 4096 12月 12 10:41 bin

[gp1] drwxrwxr-x 2 gpadmin gpadmin 39 12月 12 11:13 conf

[gp1] drwxr-xr-x 3 gpadmin gpadmin 22 12月 12 10:41 docs

[gp1] -rw-r--r-- 1 gpadmin gpadmin 749 12月 12 13:38 greenplum_path.sh

[gp1] drwxr-xr-x 4 gpadmin gpadmin 4096 12月 12 10:41 include

[gp1] drwxr-xr-x 5 gpadmin gpadmin 4096 12月 12 10:41 lib

[gp1] -rw-r--r-- 1 gpadmin gpadmin 196 12月 12 13:38 python-dependencies.txt

[gp1] drwxr-xr-x 2 gpadmin gpadmin 176 12月 12 10:41 sbin

[gp1] drwxr-xr-x 4 gpadmin gpadmin 41 12月 12 10:41 share

[gp2] ls -l /opt/greenplum/greenplum-db

[gp2] 总用量 20

[gp2] drwxr-xr-x 7 gpadmin gpadmin 4096 12月 12 13:38 bin

[gp2] drwxrwxr-x 2 gpadmin gpadmin 39 12月 12 13:38 conf

[gp2] drwxr-xr-x 3 gpadmin gpadmin 22 12月 12 13:38 docs

[gp2] -rw-r--r-- 1 gpadmin gpadmin 749 12月 12 13:38 greenplum_path.sh

[gp2] drwxr-xr-x 4 gpadmin gpadmin 4096 12月 12 13:38 include

[gp2] drwxr-xr-x 5 gpadmin gpadmin 4096 12月 12 13:38 lib

[gp2] -rw-r--r-- 1 gpadmin gpadmin 196 12月 12 13:38 python-dependencies.txt

[gp2] drwxr-xr-x 2 gpadmin gpadmin 176 12月 12 13:38 sbin

[gp2] drwxr-xr-x 4 gpadmin gpadmin 41 12月 12 13:38 share

[gp3] ls -l /opt/greenplum/greenplum-db

[gp3] 总用量 20

[gp3] drwxr-xr-x 7 gpadmin gpadmin 4096 12月 12 13:43 bin

[gp3] drwxrwxr-x 2 gpadmin gpadmin 39 12月 12 13:43 conf

[gp3] drwxr-xr-x 3 gpadmin gpadmin 22 12月 12 13:43 docs

[gp3] -rw-r--r-- 1 gpadmin gpadmin 749 12月 12 13:43 greenplum_path.sh

[gp3] drwxr-xr-x 4 gpadmin gpadmin 4096 12月 12 13:43 include

[gp3] drwxr-xr-x 5 gpadmin gpadmin 4096 12月 12 13:43 lib

[gp3] -rw-r--r-- 1 gpadmin gpadmin 196 12月 12 13:43 python-dependencies.txt

[gp3] drwxr-xr-x 2 gpadmin gpadmin 176 12月 12 13:43 sbin

[gp3] drwxr-xr-x 4 gpadmin gpadmin 41 12月 12 13:43 share

 

 

 

创建数据存储区域目录;

su - gpadmin

source /opt/greenplum/greenplum-db/greenplum_path.sh

/opt/greenplum/greenplum-db/bin/gpssh -f /opt/greenplum/greenplum-db/conf/hostlist -e 'mkdir -p /opt/greenplum/data'

在master上创建master数据存储区域;

su - gpadmin

source /opt/greenplum/greenplum-db/greenplum_path.sh

/opt/greenplum/greenplum-db/bin/gpssh -h gp1 -e 'mkdir -p /opt/greenplum/data/master'

在Segment节点上创建数据存储区域

su - gpadmin

source /opt/greenplum/greenplum-db/greenplum_path.sh

/opt/greenplum/greenplum-db/bin/gpssh -f /opt/greenplum/greenplum-db/conf/seg_hosts -e 'mkdir -p /opt/greenplum/data/primary && mkdir -p /opt/greenplum/data/mirror'

3.6 环境变量配置

source /opt/greenplum/greenplum-db/greenplum_path.sh
gpssh -f /opt/greenplum/greenplum-db/conf/hostlist -e -v "cat >> /home/gpadmin/.bashrc <<EOF

source /opt/greenplum/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/opt/greenplum/data/master/gpseg-1
export GPPORT=5432
export PGDATABASE=gp_sydb
EOF"

4 初始化Greenplum DB

4.1 初始化前检查

检查主机名配置;

su gpadmin
source /opt/greenplum/greenplum-db/greenplum_path.sh
gpssh -f /opt/greenplum/greenplum-db/conf/hostlist -e hostname

[gp1] hostname
[gp1] gp1
[gp3] hostname
[gp3] gp3
[gp2] hostname
[gp2] gp2

检查节点与节点之间文件读取(可以不检查);

 

gpcheckperf -h gp1 -h gp2 -d /tmp -r d -D -v
gpcheckperf -f /opt/greenplum/greenplum-db/conf/hostlist -d /tmp -r d -D -v
$ gpcheckperf -f /opt/greenplum/greenplum-db/conf/hostlist -r N -d /tmp
/opt/greenplum/greenplum-db/./bin/gpcheckperf -f /opt/greenplum/greenplum-db/conf/hostlist -r N -d /tmp

-------------------
--  NETPERF TEST
-------------------

====================
==  RESULT
====================
Netperf bisection bandwidth test
gp-master -> gp-sdw1 = 72.220000
gp-sdw2 -> gp-sdw3 = 21.470000
gp-sdw1 -> gp-master = 43.510000
gp-sdw3 -> gp-sdw2 = 44.200000

Summary:
sum = 181.40 MB/sec
min = 21.47 MB/sec
max = 72.22 MB/sec
avg = 45.35 MB/sec
median = 44.20 MB/sec

[Warning] connection between gp-sdw2 and gp-sdw3 is no good
[Warning] connection between gp-sdw1 and gp-master is no good
[Warning] connection between gp-sdw3 and gp-sdw2 is no good

 

4.2 初始化

初始化 Greenplum 配置文件模板都在/opt/greenplum/greenplum-db/docs/cli_help/gpconfigs目录下,gpinitsystem_config是初始化 Greenplum 的模板,此模板中 Mirror Segment的配置都被注释;创建一个副本,对其修改;

 

cd /opt/greenplum/greenplum-db/docs/cli_help/gpconfigs
cp gpinitsystem_config initgp_config
vi initgp_config        

declare -a DATA_DIRECTORY=(/opt/greenplum/data/primary /opt/greenplum/data/primary /opt/greenplum/data/primary)
MASTER_HOSTNAME=gp1
MASTER_DIRECTORY=/opt/greenplum/data/master
declare -a MIRROR_DATA_DIRECTORY=(/opt/greenplum/data/mirror /opt/greenplum/data/mirror /opt/greenplum/data/mirror)
DATABASE_NAME=gp_sydb
MACHINE_LIST_FILE=/opt/greenplum/greenplum-db/conf/seg_hosts

执行初始化;

gpinitsystem -c initgp_config

若初始化失败,需要删除数据目录重新初始化;

chmod 777 /home/gpadmin/gpAdminLogs/backout_gpinitsystem_gpadmin_20191212_144741

/home/gpadmin/gpAdminLogs/backout_gpinitsystem_gpadmin_20191212_144741

 

5 后续操作

5.1 停止和启动集群

gpstop -a gpstart -a

5.2 登录数据库

$ psql -d postgres

postgres=# \l # 查询数据库
                 List of databases
   Name    |  Owner  | Encoding |  Access privileges  
-----------+---------+----------+---------------------
 gp_sydb   | gpadmin | UTF8     | 
 postgres  | gpadmin | UTF8     | 
 template0 | gpadmin | UTF8     | =c/gpadmin          
                                : gpadmin=CTc/gpadmin
 template1 | gpadmin | UTF8     | =c/gpadmin          
                                : gpadmin=CTc/gpadmin
(4 rows)

postgres=# \l # 查询数据库表

5.3 集群状态

gpstate -e #查看mirror的状态
gpstate -f #查看standby master的状态
gpstate -s #查看整个GP群集的状态
gpstate -i #查看GP的版本
gpstate --help #帮助文档,可以查看gpstate更多用法

 

 

5.4 测试GPDB集群状态

gpstate -e

5.5 设置gpadmin远程访问密码

psql postgres gpadmin

alter user gpadmin encrypted password 'gpadmin';

\q

5.6 查询测试

psql -hgp1 -p 5432 -d postgres -U gpadmin -c 'select dfhostname, dfspace,dfdevice from gp_toolkit.gp_disk_free order by dfhostname;'

psql -h gp1 -p 5432 -d postgres -U gpadmin -c '\l+'

 

5.7修改master的pg_hba.conf

pg_hba.conf是控制数据库访问的文件

 

vi $MASTER_DATA_DIRECTORY/pg_hba.conf

host all gpadmin 0.0.0.0/0 md5

执行gpstop -u使配置生效

 

此时可以使用客户端进行连接(postgres连接方式,连接到master的5432)

 

$MASTER_DATA_DIRECTORY/postgresql.conf是集群的配置文件,可以根据实际情况进行配置。主要有两个配置:

监听地址

连接数

 

 

拷贝过来一张数据量上万的表,然后查询数据分布情况

select gp_segment_id,count(*) from gis_person_info group by 1 order by 1;

 

Logo

开放原子开发者工作坊旨在鼓励更多人参与开源活动,与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动,如meetup、训练营等,主打技术交流,干货满满,真诚地邀请各位开发者共同参与!

更多推荐