使用go-mysql-elasticsearch开源工具同步数据到ES

使用go-mysql-elasticsearch开源工具同步数据到ESgo-mysql-elasticsearch是用于同步mysql数据到ES集群的一个开源工具，项目github地址：https://github.com/siddontang/go-mysql-elasticsearchgo-mysql-elasticsearch的基本原理是：如果是第一次启动该程序，首先使用mysqld...

forever_insist

5626人浏览 · 2018-09-19 15:16:20

forever_insist · 2018-09-19 15:16:20 发布

使用go-mysql-elasticsearch开源工具同步数据到ES

go-mysql-elasticsearch是用于同步mysql数据到ES集群的一个开源工具，
项目github地址：https://github.com/siddontang/go-mysql-elasticsearch

go-mysql-elasticsearch的基本原理是：如果是第一次启动该程序，首先使用mysqldump工具对源mysql数据库进行一次全量同步，通过elasticsearch client执行操作写入数据到ES；然后实现了一个mysql client,作为slave连接到源mysql,源mysql作为master会将所有数据的更新操作通过binlog event同步给slave， 通过解析binlog event就可以获取到数据的更新内容，之后写入到ES.

另外，该工具还提供了操作统计的功能，每当有数据增删改操作时，会将对应操作的计数加1，程序启动时会开启一个http服务，通过调用http接口可以查看增删改操作的次数。

使用事项：
1. mysql binlog必须是ROW模式
    2. 要同步的mysql数据表必须包含主键，否则直接忽略，这是因为如果数据表没有主键，UPDATE和DELETE操作就会因为在ES中找不到对应的document而无法进行同步
    3. 不支持程序运行过程中修改表结构
    4. 要赋予用于连接mysql的账户RELOAD权限以及REPLICATION权限, SUPER权限：
       GRANT REPLICATION SLAVE ON *.* TO 'elastic'@'172.16.32.44';
       GRANT RELOAD ON *.* TO 'elastic'@'172.16.32.44';
       UPDATE mysql.user SET Super_Priv='Y' WHERE user='elastic' AND host='172.16.32.44';

使用方法

git clone https://github.com/siddontang/go-mysql-elasticsearch
cd go-mysql-elasticsearch/src/github.com/siddontang/go-mysql-elasticsearch
vi etc/river.toml, 修改配置文件，同步172.16.0.101:3306数据库中的webservice.building表到ES集群172.16.32.64:9200的building index(更详细的配置文件说明可以参考项目文档)

# MySQL address, user and password
    # user must have replication privilege in MySQL.
    my_addr = "172.16.0.101:3306"
    my_user = "bellen"
    my_pass = "Elastic_123"
    my_charset = "utf8"
    
    # Set true when elasticsearch use https
    #es_https = false
    # Elasticsearch address
    es_addr = "172.16.32.64:9200"
    # Elasticsearch user and password, maybe set by shield, nginx, or x-pack
    es_user = ""
    es_pass = ""
    
    # Path to store data, like master.info, if not set or empty,
    # we must use this to support breakpoint resume syncing.
    # TODO: support other storage, like etcd.
    data_dir = "./var"
    
    # Inner Http status address
    stat_addr = "127.0.0.1:12800"
    
    # pseudo server id like a slave
    server_id = 1001
    
    # mysql or mariadb
    flavor = "mariadb"
    
    # mysqldump execution path
    # if not set or empty, ignore mysqldump.
    mysqldump = "mysqldump"
    
    # if we have no privilege to use mysqldump with --master-data,
    # we must skip it.
    #skip_master_data = false
    
    # minimal items to be inserted in one bulk
    bulk_size = 128
    
    # force flush the pending requests if we don't have enough items >= bulk_size
    flush_bulk_time = "200ms"
    
    # Ignore table without primary key
    skip_no_pk_table = false
    
    # MySQL data source
    [[source]]
    schema = "webservice"
    tables = ["building"]
    [[rule]]
    schema = "webservice"
    table = "building"
    index = "building"
    type = "buildingtype"

在ES集群中创建building index, 因为该工具并没有使用ES的auto create index功能，如果index不存在会报错

执行命令：./bin/go-mysql-elasticsearch -config=./etc/river.toml

开放原子开发者工作坊

开放原子开发者工作坊旨在鼓励更多人参与开源活动，与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动，如meetup、训练营等，主打技术交流，干货满满，真诚地邀请各位开发者共同参与！

更多推荐

“小满”安全车控操作系统正式在AtomGit开源

10月24日，由中国汽车工业协会指导，普华基础软件股份有限公司主办的“小满”安全车控操作系统开源发布会暨共建计划说明会成功举行。普华基础软件宣布将安全车控操作系统“小满”（简称“小满”）V24.10源代码正式在开放原子开源基金会（简称“基金会”）旗下AtomGit开源协作平台开源，并在AtomGit平