Loading data into Greenplum with gpload and gpfdist
1. Python 2.4.4 or later is required
[root@test install]# python
Python 2.6.2 (r262:71600, May 14 2009, 10:46:21)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
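The same version check can be scripted instead of read off the interpreter banner. The following is a minimal sketch; the helper name meets_minimum is made up for illustration, and it only encodes the 2.4.4 lower bound stated above.

```python
import sys

# Lower bound taken from step 1 of this article.
MIN_VERSION = (2, 4, 4)


def meets_minimum(version_info, minimum=MIN_VERSION):
    """Return True if (major, minor, micro) is at least `minimum`."""
    return tuple(version_info[:3]) >= tuple(minimum)


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    status = "OK" if meets_minimum(current) else "too old"
    print("Python %d.%d.%d: %s" % (current + (status,)))
```

Tuples compare element by element, so (2, 6, 2) passes and (2, 4, 3) does not.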
2. Install the PyYAML 3.10 package
YAML parser and emitter for Python
YAML is a data serialization format designed for human readability and interaction with scripting languages. PyYAML is a YAML parser and emitter for Python.
PyYAML features a complete YAML 1.1 parser, Unicode support, pickle support, a capable extension API, and sensible error messages. PyYAML supports standard YAML tags and provides Python-specific tags that can represent an arbitrary Python object.
PyYAML is applicable to a broad range of tasks, from complex configuration files to object serialization and persistence.
[root@test dataload]# tar -zxvf PyYAML-3.10.tar.gz
[root@test PyYAML-3.10]# python setup.py install
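After the install, it may be worth confirming that the interpreter can actually import the package (PyYAML's import name is yaml). A small sketch, where module_available is a hypothetical helper and not part of PyYAML or gpload:

```python
import importlib


def module_available(name):
    """Return True if the module `name` can be imported by this interpreter."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # PyYAML installs under the module name "yaml".
    print("yaml importable: %s" % module_available("yaml"))
```

Run it with the same python binary that gpload will use, since a package installed for one interpreter is not visible to another.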
3. Install yaml-0.1.4 (LibYAML)
Download the source package: http://pyyaml.org/download/libyaml/yaml-0.1.4.tar.gz. Then build and install LibYAML:
[root@test dataload]# tar -zxvf yaml-0.1.4.tar.gz
[root@test dataload]# cd yaml-0.1.4
[root@test yaml-0.1.4]# ./configure
[root@test yaml-0.1.4]# make && make install
4. Install the gpload and gpfdist tools
Download greenplum-loaders-4.2.1.0-build-2-RHEL5-x86_64.zip, then run the .bin installer from it:
[root@test dataload]# ./greenplum-loaders-4.2.1.0-build-2-RHEL5-x86_64.bin
********************************************************************************
Do you accept the Greenplum Loaders license agreement? [yes | no]
********************************************************************************
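Enter yes and continue until the installation completes.
In greenplum_loaders_path.sh, set GPHOME_LOADERS to your installation path.
Add the Greenplum environment variables to .bash_profile: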
export PGDATABASE=gptest
export PGHOST=127.0.0.1
export PGPORT=5432
export PGUSER=gpadmin
export PGPASSWORD=gpadmin
source ~/.bash_profile
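Before running gpload, a quick check that the five variables above are really present in the current shell can save a failed connection attempt. A minimal sketch; missing_vars is an illustrative helper name, and the variable list is simply the one exported above.

```python
import os

# The connection variables exported in .bash_profile in step 4.
REQUIRED_VARS = ("PGDATABASE", "PGHOST", "PGPORT", "PGUSER", "PGPASSWORD")


def missing_vars(env, required=REQUIRED_VARS):
    """Return the names from `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    print("missing: %s" % (", ".join(missing) if missing else "none"))
```

If anything is reported missing, re-run source ~/.bash_profile in that shell.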
5. Write the YAML control file for the load
[root@test bin]# more gpload.yml
---
VERSION: 1.0.0.1
DATABASE: gptest
USER: gpadmin
HOST: 127.0.0.1
PORT: 5432
GPLOAD:
   INPUT:
    - SOURCE:
         LOCAL_HOSTNAME:
           - test
         PORT: 55555
         FILE:
           - /home/tmp/test1
    - COLUMNS:
           - id: int
           - name: text
           - aa: text
           - time: timestamp without time zone
           - bb: text
           - cc: text
           - dd: int
           - ee: int
           - ff: text
           - gg: text
           - hh: text
           - ii: text
           - jj: text
           - kk: text
           - ll: text
    - FORMAT: text
    - DELIMITER: ','
    - ERROR_LIMIT: 25
   OUTPUT:
    - TABLE: test_gpload
    - MODE: INSERT
Note: the fields listed under COLUMNS should match the target table's columns and their data types.
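Since mismatched rows count against ERROR_LIMIT, it can help to sanity-check a few lines of the input file against the COLUMNS list before loading. The sketch below is a rough pre-check, not part of gpload: check_row is a hypothetical helper, it splits naively on the delimiter, and it does not handle escaped delimiters, NULL markers, or timestamp validation.

```python
# Column names and types copied from the COLUMNS section of gpload.yml.
COLUMNS = [
    ("id", "int"), ("name", "text"), ("aa", "text"),
    ("time", "timestamp without time zone"),
    ("bb", "text"), ("cc", "text"), ("dd", "int"), ("ee", "int"),
    ("ff", "text"), ("gg", "text"), ("hh", "text"), ("ii", "text"),
    ("jj", "text"), ("kk", "text"), ("ll", "text"),
]


def check_row(line, columns=COLUMNS, delimiter=","):
    """Return a list of problems found in one delimited line (empty if none)."""
    fields = line.rstrip("\r\n").split(delimiter)
    if len(fields) != len(columns):
        return ["expected %d fields, got %d" % (len(columns), len(fields))]
    problems = []
    for (name, col_type), value in zip(columns, fields):
        # Only int columns are checked; other types are passed through.
        if col_type == "int":
            try:
                int(value)
            except ValueError:
                problems.append("column %s: %r is not an integer" % (name, value))
    return problems


if __name__ == "__main__":
    sample = "1,alice,a,2012-10-30 00:06:42,b,c,2,3,f,g,h,i,j,k,l"
    print(check_row(sample))
```

The sample line is invented for illustration; in practice one would feed the first few lines of the real file through check_row.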
6. Run gpload
[root@test bin]# gpload -f gpload.yml
2012-10-30 00:06:42|INFO|gpload session started 2012-10-30 00:06:42
2012-10-30 00:06:43|INFO|started gpfdist -p 55555 -P 55556 -f "/home/tmp/test" -t 30
2012-10-30 00:06:50|INFO|running time: 7.92 seconds
2012-10-30 00:06:50|INFO|rows Inserted = 205092
2012-10-30 00:06:50|INFO|rows Updated = 0
2012-10-30 00:06:50|INFO|data formatting errors = 0
2012-10-30 00:06:50|INFO|gpload succeeded
[root@test bin]#
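When gpload runs from a script, the counters in this output are easier to act on once parsed. The following sketch assumes the log lines have the timestamp|LEVEL|message shape shown above and recognizes only the messages that appear in this sample; parse_gpload_summary is an illustrative name, not a gpload API.

```python
import re

# Counter lines as they appear in the sample output above.
COUNTER_RE = re.compile(
    r"^(rows Inserted|rows Updated|data formatting errors)\s*=\s*(\d+)$"
)


def parse_gpload_summary(log_text):
    """Collect the row counters and the success flag from gpload output."""
    summary = {"succeeded": False}
    for line in log_text.splitlines():
        # Each log line is "timestamp|LEVEL|message".
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue
        message = parts[2].strip()
        if message == "gpload succeeded":
            summary["succeeded"] = True
            continue
        match = COUNTER_RE.match(message)
        if match:
            summary[match.group(1)] = int(match.group(2))
    return summary


if __name__ == "__main__":
    import sys
    print(parse_gpload_summary(sys.stdin.read()))
```

For example, piping the gpload output into this script prints a dictionary with the inserted, updated, and error counts, which a wrapper could compare against expected values.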