StreamSets is pleasant to use and feature-complete, but it is no longer open source. The same goes for Cloudera, which is a real shame.

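By the time I came across StreamSets, downloading it already required registration, and I could not get registered. Fortunately not every door has been closed: you can still build it yourself from source. Download links: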

https://codeload.github.com/designmind/datacollector-plugin-api/zip/refs/heads/master
https://codeload.github.com/designmind/datacollector/zip/refs/heads/master
https://codeload.github.com/designmind/datacollector-api/zip/refs/heads/master
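
For anyone who prefers the command line, here is a minimal sketch of fetching and unpacking the three archives. It assumes `curl` and `unzip` are installed; the helper names `zip_url` and `fetch_all` are made up for this example, and the script only downloads when called with the argument `run`.

```shell
#!/bin/sh
# Sketch only: fetch the three source archives listed above and unpack them.
# zip_url and fetch_all are illustrative helper names, not part of any tool.

# Repositories in the order they are built later.
REPOS="datacollector-api datacollector-plugin-api datacollector"

# zip_url REPO: print the codeload URL for REPO's master branch.
zip_url() {
  printf 'https://codeload.github.com/designmind/%s/zip/refs/heads/master\n' "$1"
}

# fetch_all: download each archive into the current directory and unzip it.
fetch_all() {
  for repo in $REPOS; do
    curl -fL -o "$repo-master.zip" "$(zip_url "$repo")" || return 1
    unzip -q "$repo-master.zip" || return 1
  done
}

# Download only when explicitly asked, e.g. `sh fetch.sh run`.
if [ "${1:-}" = "run" ]; then
  fetch_all
fi
```

GitHub names the unpacked directory after the repository and branch, so this should leave `datacollector-api-master`, `datacollector-plugin-api-master`, and `datacollector-master` in the current directory.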

## The downloaded source and the compiled tgz are also available from a network drive:
Link: https://pan.baidu.com/s/1SqD7jNPYDi_dxQ42V9x9qA?pwd=9jn6 Extraction code: 9jn6

Compiling is genuinely maddening, because many of the URLs the build relies on are dead.

Environment needed on the server:

Java, Maven, Node.js, Bower, and npm. Set these up yourself.
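
A quick way to see which of these tools are still missing is sketched below. It assumes the usual executable names (`java`, `mvn`, `node`, `npm`, `bower`); `missing_tools` is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: report which required build tools are not on the PATH.
# missing_tools is an illustrative helper name.

# missing_tools NAME...: print each NAME that `command -v` cannot find.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing=$(missing_tools java mvn node npm bower)
if [ -n "$missing" ]; then
  echo "Not found on PATH:"
  echo "$missing"
else
  echo "All required tools found."
fi
```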

Edit the Maven configuration file (settings.xml):

<?xml version="1.0" encoding="UTF-8"?>

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->

<!--
 | This is the configuration file for Maven. It can be specified at two levels:
 |
 |  1. User Level. This settings.xml file provides configuration for a single user,
 |                 and is normally provided in ${user.home}/.m2/settings.xml.
 |
 |                 NOTE: This location can be overridden with the CLI option:
 |
 |                 -s /path/to/user/settings.xml
 |
 |  2. Global Level. This settings.xml file provides configuration for all Maven
 |                 users on a machine (assuming they're all using the same Maven
 |                 installation). It's normally provided in
 |                 ${maven.conf}/settings.xml.
 |
 |                 NOTE: This location can be overridden with the CLI option:
 |
 |                 -gs /path/to/global/settings.xml
 |
 | The sections in this sample file are intended to give you a running start at
 | getting the most out of your Maven installation. Where appropriate, the default
 | values (values used when the setting is not specified) are provided.
 |
 |-->
<settings xmlns="http://maven.apache.org/SETTINGS/1.2.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.2.0 https://maven.apache.org/xsd/settings-1.2.0.xsd">
  <!-- localRepository
   | The path to the local repository maven will use to store artifacts.
   |
   | Default: ${user.home}/.m2/repository
  <localRepository>/path/to/local/repo</localRepository>
  -->

  <!-- interactiveMode
   | This will determine whether maven prompts you when it needs input. If set to false,
   | maven will use a sensible default value, perhaps based on some other setting, for
   | the parameter in question.
   |
   | Default: true
  <interactiveMode>true</interactiveMode>
  -->

  <!-- offline
   | Determines whether maven should attempt to connect to the network when executing a build.
   | This will have an effect on artifact downloads, artifact deployment, and others.
   |
   | Default: false
  <offline>false</offline>
  -->

  <!-- pluginGroups
   | This is a list of additional group identifiers that will be searched when resolving plugins by their prefix, i.e.
   | when invoking a command line like "mvn prefix:goal". Maven will automatically add the group identifiers
   | "org.apache.maven.plugins" and "org.codehaus.mojo" if these are not already contained in the list.
   |-->
  <pluginGroups>
    <!-- pluginGroup
     | Specifies a further group identifier to use for plugin lookup.
    <pluginGroup>com.your.plugins</pluginGroup>
    -->
  </pluginGroups>

  <!-- proxies
   | This is a list of proxies which can be used on this machine to connect to the network.
   | Unless otherwise specified (by system property or command-line switch), the first proxy
   | specification in this list marked as active will be used.
   |-->
  <proxies>
    <!-- proxy
     | Specification for one proxy, to be used in connecting to the network.
     |
    <proxy>
      <id>optional</id>
      <active>true</active>
      <protocol>http</protocol>
      <username>proxyuser</username>
      <password>proxypass</password>
      <host>proxy.host.net</host>
      <port>80</port>
      <nonProxyHosts>local.net|some.host.com</nonProxyHosts>
    </proxy>
    -->
  </proxies>

  <!-- servers
   | This is a list of authentication profiles, keyed by the server-id used within the system.
   | Authentication profiles can be used whenever maven must make a connection to a remote server.
   |-->
  <servers>
    <!-- server
     | Specifies the authentication information to use when connecting to a particular server, identified by
     | a unique name within the system (referred to by the 'id' attribute below).
     |
     | NOTE: You should either specify username/password OR privateKey/passphrase, since these pairings are
     |       used together.
     |
    <server>
      <id>deploymentRepo</id>
      <username>repouser</username>
      <password>repopwd</password>
    </server>
    -->

    <!-- Another sample, using keys to authenticate.
    <server>
      <id>siteServer</id>
      <privateKey>/path/to/private/key</privateKey>
      <passphrase>optional; leave empty if not used.</passphrase>
    </server>
    -->
  </servers>

  <!-- mirrors
   | This is a list of mirrors to be used in downloading artifacts from remote repositories.
   |
   | It works like this: a POM may declare a repository to use in resolving certain artifacts.
   | However, this repository may have problems with heavy traffic at times, so people have mirrored
   | it to several places.
   |
   | That repository definition will have a unique id, so we can create a mirror reference for that
   | repository, to be used as an alternate download site. The mirror site will be the preferred
   | server for that repository.
   |-->
  <mirrors>
    <!-- mirror
     | Specifies a repository mirror site to use instead of a given repository. The repository that
     | this mirror serves has an ID that matches the mirrorOf element of this mirror. IDs are used
     | for inheritance and direct lookup purposes, and must be unique across the set of mirrors.
     |
    <mirror>
      <id>mirrorId</id>
      <mirrorOf>repositoryId</mirrorOf>
      <name>Human Readable Name for this Mirror.</name>
      <url>http://my.repository.com/repo/path</url>
    </mirror>
     -->
    <mirror>
      <id>apache snapshots</id>
      <mirrorOf>external:https:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>https://maven.aliyun.com/repository/apache-snapshots</url>
      <blocked>true</blocked>
    </mirror>
    <mirror>
      <id>central</id>
      <mirrorOf>external:https:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>https://maven.aliyun.com/repository/central</url>
      <blocked>true</blocked>
    </mirror>
    <mirror>
      <id>staging-alpha-group</id>
      <mirrorOf>external:https:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>https://maven.aliyun.com/repository/staging-alpha-group</url>
      <blocked>true</blocked>
    </mirror>
    <mirror>
      <id>staging-alpha</id>
      <mirrorOf>external:https:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>https://maven.aliyun.com/repository/staging-alpha</url>
      <blocked>true</blocked>
    </mirror>
    <mirror>
      <id>mapr-public</id>
      <mirrorOf>external:https:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>https://maven.aliyun.com/repository/mapr-public</url>
      <blocked>true</blocked>
    </mirror>
    <mirror>
      <id>grails-core</id>
      <mirrorOf>external:https:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>https://maven.aliyun.com/repository/grails-core</url>
      <blocked>true</blocked>
    </mirror>
    <mirror>
      <id>snapshots</id>
      <mirrorOf>external:https:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>https://maven.aliyun.com/repository/snapshots</url>
      <blocked>true</blocked>
    </mirror>
    <mirror>
      <id>releases</id>
      <mirrorOf>external:https:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>https://maven.aliyun.com/repository/releases</url>
      <blocked>true</blocked>
    </mirror>
    <mirror>
      <id>public</id>
      <mirrorOf>external:https:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>https://maven.aliyun.com/repository/public</url>
      <blocked>true</blocked>
    </mirror>
    <mirror>
      <id>spring-plugin</id>
      <mirrorOf>external:https:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>https://maven.aliyun.com/repository/spring-plugin</url>
      <blocked>true</blocked>
    </mirror>
    <mirror>
      <id>spring</id>
      <mirrorOf>external:https:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>https://maven.aliyun.com/repository/spring</url>
      <blocked>true</blocked>
    </mirror>
    <mirror>
      <id>jcenter</id>
      <mirrorOf>external:https:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>https://maven.aliyun.com/repository/jcenter</url>
      <blocked>true</blocked>
    </mirror>
    <mirror>
      <id>gradle-plugin</id>
      <mirrorOf>external:https:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>https://maven.aliyun.com/repository/gradle-plugin</url>
      <blocked>true</blocked>
    </mirror>
    <mirror>
      <id>google</id>
      <mirrorOf>external:https:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>https://maven.aliyun.com/repository/google</url>
      <blocked>true</blocked>
    </mirror>
  </mirrors>

  <!-- profiles
   | This is a list of profiles which can be activated in a variety of ways, and which can modify
   | the build process. Profiles provided in the settings.xml are intended to provide local machine-
   | specific paths and repository locations which allow the build to work in the local environment.
   |
   | For example, if you have an integration testing plugin - like cactus - that needs to know where
   | your Tomcat instance is installed, you can provide a variable here such that the variable is
   | dereferenced during the build process to configure the cactus plugin.
   |
   | As noted above, profiles can be activated in a variety of ways. One way - the activeProfiles
   | section of this document (settings.xml) - will be discussed later. Another way essentially
   | relies on the detection of a system property, either matching a particular value for the property,
   | or merely testing its existence. Profiles can also be activated by JDK version prefix, where a
   | value of '1.4' might activate a profile when the build is executed on a JDK version of '1.4.2_07'.
   | Finally, the list of active profiles can be specified directly from the command line.
   |
   | NOTE: For profiles defined in the settings.xml, you are restricted to specifying only artifact
   |       repositories, plugin repositories, and free-form properties to be used as configuration
   |       variables for plugins in the POM.
   |
   |-->
  <profiles>
    <!-- profile
     | Specifies a set of introductions to the build process, to be activated using one or more of the
     | mechanisms described above. For inheritance purposes, and to activate profiles via <activatedProfiles/>
     | or the command line, profiles have to have an ID that is unique.
     |
     | An encouraged best practice for profile identification is to use a consistent naming convention
     | for profiles, such as 'env-dev', 'env-test', 'env-production', 'user-jdcasey', 'user-brett', etc.
     | This will make it more intuitive to understand what the set of introduced profiles is attempting
     | to accomplish, particularly when you only have a list of profile id's for debug.
     |
     | This profile example uses the JDK version to trigger activation, and provides a JDK-specific repo.
    <profile>
      <id>jdk-1.4</id>

      <activation>
        <jdk>1.4</jdk>
      </activation>

      <repositories>
        <repository>
          <id>jdk14</id>
          <name>Repository for JDK 1.4 builds</name>
          <url>http://www.myhost.com/maven/jdk14</url>
          <layout>default</layout>
          <snapshotPolicy>always</snapshotPolicy>
        </repository>
      </repositories>
    </profile>
    -->

    <!--
     | Here is another profile, activated by the system property 'target-env' with a value of 'dev',
     | which provides a specific path to the Tomcat instance. To use this, your plugin configuration
     | might hypothetically look like:
     |
     | ...
     | <plugin>
     |   <groupId>org.myco.myplugins</groupId>
     |   <artifactId>myplugin</artifactId>
     |
     |   <configuration>
     |     <tomcatLocation>${tomcatPath}</tomcatLocation>
     |   </configuration>
     | </plugin>
     | ...
     |
     | NOTE: If you just wanted to inject this configuration whenever someone set 'target-env' to
     |       anything, you could just leave off the <value/> inside the activation-property.
     |
    <profile>
      <id>env-dev</id>

      <activation>
        <property>
          <name>target-env</name>
          <value>dev</value>
        </property>
      </activation>

      <properties>
        <tomcatPath>/path/to/tomcat/instance</tomcatPath>
      </properties>
    </profile>
    -->
  </profiles>

  <!-- activeProfiles
   | List of profiles that are active for all builds.
   |
  <activeProfiles>
    <activeProfile>alwaysActiveProfile</activeProfile>
    <activeProfile>anotherAlwaysActiveProfile</activeProfile>
  </activeProfiles>
  -->
</settings>
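
When downloads misbehave, it helps to see what the mirror section really says. As far as I know, the `mirrorOf` forms Maven documents are `*`, `external:*`, `external:http:*`, and comma-separated repository ids (optionally negated with `!`), and `<blocked>true</blocked>` tells Maven to refuse downloads through that mirror. The sketch below only lists the patterns and counts the blocked flags in a settings file; `list_mirror_patterns` and `count_blocked` are made-up helper names, and the `SETTINGS` variable is an assumption of this example.

```shell
#!/bin/sh
# Sketch only: summarize the mirror entries in a Maven settings file.
# Plain text matching, so commented-out sample entries are counted as well.
# list_mirror_patterns and count_blocked are illustrative helper names.

# list_mirror_patterns FILE: print each distinct <mirrorOf> value with its count.
list_mirror_patterns() {
  sed -n 's:.*<mirrorOf>\(.*\)</mirrorOf>.*:\1:p' "$1" | sort | uniq -c
}

# count_blocked FILE: print how many lines set <blocked>true</blocked>.
count_blocked() {
  grep -c '<blocked>true</blocked>' "$1" || true
}

# Default to the user-level settings file; override with SETTINGS=/path/to/file.
settings="${SETTINGS:-${HOME:-.}/.m2/settings.xml}"
if [ -f "$settings" ]; then
  list_mirror_patterns "$settings"
  echo "blocked mirrors: $(count_blocked "$settings")"
fi
```

In the settings file above, every active mirror shares the pattern `external:https:*` and is marked blocked, so if Maven reports a blocked mirror or seems to ignore the Aliyun URLs, those two elements are the first place to look.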

1. Unzip the three downloaded zip files.

2. Build datacollector-api-master

cd datacollector-api-master
mvn install -DskipTests

3. Build datacollector-plugin-api-master

cd datacollector-plugin-api-master
mvn install -DskipTests
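
Steps 2 and 3 can be wrapped in a small script so that a failure in the first build stops the second. This is only a sketch: `build_module` is a made-up helper, and the `MVN` variable is an assumption added so a different Maven executable can be substituted.

```shell
#!/bin/sh
# Sketch only: build the two API projects in the order given above.
# build_module is an illustrative helper; MVN can point at another Maven binary.

# build_module DIR: run `mvn install -DskipTests` inside DIR.
build_module() {
  if [ ! -d "$1" ]; then
    echo "directory not found: $1" >&2
    return 1
  fi
  (cd "$1" && "${MVN:-mvn}" install -DskipTests)
}

# Build only when explicitly asked, e.g. `sh build-api.sh run`.
if [ "${1:-}" = "run" ]; then
  build_module datacollector-api-master &&
    build_module datacollector-plugin-api-master
fi
```

Run it from the directory that contains the unpacked sources. Because `mvn install` copies each artifact into the local repository (by default `~/.m2/repository`), the later datacollector-master build can resolve them from there.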

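4. Build datacollector-master

Many of the URLs referenced in this project no longer work, and many packages have been taken down, so the pom.xml file has to be updated: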

<?xml version="1.0" encoding="UTF-8"?>
<!--

    Copyright 2019 StreamSets Inc.

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.

-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.streamsets</groupId>
  <artifactId>streamsets-datacollector</artifactId>
  <version>3.23.0-SNAPSHOT</version>
  <description>StreamSets Data Collector</description>
  <name>StreamSets Data Collector</name>
  <packaging>pom</packaging>
  <url>http://www.streamsets.com</url>
  <scm>
    <url>https://github.com/streamsets/datacollector</url>
  </scm>

  <organization>
    <name>StreamSets</name>
    <url>http://www.streamsets.com</url>
  </organization>

  <licenses>
    <license>
      <name>Apache License, Version 2.0</name>
      <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
    </license>
  </licenses>

  <developers>
    <!-- TODO add rest of team-->
    <developer>
      <id>brock</id>
      <name>Brock Noland</name>
      <email>brock@streamsets.com</email>
      <timezone>America/Chicago</timezone>
    </developer>
  </developers>

  <properties>
    <rat-plugin.version>0.12</rat-plugin.version>

    <!--
         Stage libraries that are always built with the data collector (they don't have protolibs)

         IMPORTANT: keep this in alphabetical order

         IMPORTANT: define a property matching the module directory name
         for every stage library here with the directory name as value
    -->

    <aerospike-lib>aerospike-lib</aerospike-lib>
    <aws-lib>aws-lib</aws-lib>
    <aws-secrets-manager-credentialstore-lib>aws-secrets-manager-credentialstore-lib</aws-secrets-manager-credentialstore-lib>
    <azure-keyvault-credentialstore-lib>azure-keyvault-credentialstore-lib</azure-keyvault-credentialstore-lib>
    <basic-lib>basic-lib</basic-lib>
    <bigtable-lib>bigtable-lib</bigtable-lib>
    <crypto-lib>crypto-lib</crypto-lib>
    <cyberark-credentialstore-lib>cyberark-credentialstore-lib</cyberark-credentialstore-lib>
    <dev-lib>dev-lib</dev-lib>
    <dataformats-lib>dataformats-lib</dataformats-lib>
    <google-cloud-lib>google-cloud-lib</google-cloud-lib>
    <influxdb_0_9-lib>influxdb_0_9-lib</influxdb_0_9-lib>
    <jks-credentialstore-lib>jks-credentialstore-lib</jks-credentialstore-lib>
    <jdbc-lib>jdbc-lib</jdbc-lib>
    <jms-lib>jms-lib</jms-lib>
    <kinesis-lib>kinesis-lib</kinesis-lib>
    <mleap-lib>mleap-lib</mleap-lib>
    <mysql-binlog-lib>mysql-binlog-lib</mysql-binlog-lib>
    <omniture-lib>omniture-lib</omniture-lib>
    <orchestrator-lib>orchestrator-lib</orchestrator-lib>
    <rabbitmq-lib>rabbitmq-lib</rabbitmq-lib>
    <redis-lib>redis-lib</redis-lib>
    <salesforce-lib>salesforce-lib</salesforce-lib>
    <sap_hana-lib>sap_hana-lib</sap_hana-lib>
    <stats-lib>stats-lib</stats-lib>
    <tensorflow-lib>tensorflow-lib</tensorflow-lib>
    <vault-credentialstore-lib>vault-credentialstore-lib</vault-credentialstore-lib>
    <wholefile-transformer-lib>wholefile-transformer-lib</wholefile-transformer-lib>
    <windows-lib>windows-lib</windows-lib>
    <rootProject>true</rootProject>
    <datacollector-api.version>3.23.0-SNAPSHOT</datacollector-api.version>
    <datacollector-spark-api.version>3.23.0-SNAPSHOT</datacollector-spark-api.version>
    <thycotic-credentialstore-lib>thycotic-credentialstore-lib</thycotic-credentialstore-lib>
  </properties>

  <!-- StreamSets Data Collector API being used -->
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>com.streamsets</groupId>
        <artifactId>streamsets-datacollector-api</artifactId>
        <version>${datacollector-api.version}</version>
      </dependency>
      <dependency>
        <groupId>com.streamsets</groupId>
        <artifactId>streamsets-datacollector-spark-api</artifactId>
        <version>${datacollector-spark-api.version}</version>
      </dependency>
      <dependency>
        <groupId>javax.servlet</groupId>
        <artifactId>javax.servlet-api</artifactId>
        <version>3.1.0</version>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <modules>

    <!-- IMPORTANT: The main section of the POM must not include any stage library module -->

    <module>rbgen-maven-plugin</module>
    <module>root-proto</module>
    <module>root</module>
    <module>testing</module>
    <module>bootstrap</module>
    <module>utils</module>
    <module>sso</module>
    <module>aster-client</module>
    <module>common</module>
    <module>upgrader</module>
    <module>container-common</module>
    <module>metadata-generator</module>
    <module>google-common</module>
    <module>google-connection</module>
    <module>json-dto</module>
    <module>messaging-client</module>
    <module>container</module>
    <module>miniSDC</module>
    <module>sdk</module>
    <module>stage-lib-archetype</module>
    <module>hadoop-common</module>
    <module>mapr-common</module>
    <module>jks-common</module>
    <module>aws-support</module>
    <module>aws-s3-connection</module>
    <module>aws-kinesis-connection</module>
    <module>jdbc-connection</module>
    <module>aws-sqs-connection</module>
    <module>salesforce-connection</module>
    <module>kafka-connection</module>
    <module>elasticsearch-connection</module>
    <module>aws-shared</module>

    <!-- cluster connections -->
    <module>cluster-connections/emr-cluster-connection</module>

    <module>root-lib</module>

    <module>stagesupport</module>
    <module>guavasupport</module>
    <module>commonlib</module>
    <module>httpcommonlib</module>
    <module>net-commonlib</module>

    <module>aws-secrets-manager-credentialstore-protolib</module>
    <module>azure-keyvault-credentialstore-protolib</module>
    <module>cyberark-credentialstore-protolib</module>
    <module>lookup-protolib</module>

    <module>hdfs-protolib</module>
    <module>mapreduce-protolib</module>
    <module>maprfs-protolib</module>
    <module>maprdb-protolib</module>
    <module>mapr_json-protolib</module>
    <module>mapr_json-5_2-protolib</module>
    <module>mapr_json-6_0-protolib</module>
    <module>hive-protolib</module>
    <module>jks-credentialstore-protolib</module>

    <module>dir-spooler-protolib</module>

    <module>sdc-kafka-api</module>
    <module>sdc-kafka_0_8</module>
    <module>sdc-kafka_0_9-common</module>
    <module>sdc-kafka_0_9</module>
    <module>sdc-kafka_0_9_mapr_5_1</module>
    <module>sdc-kafka_0_9_mapr_5_2</module>
    <module>sdc-kafka_0_10</module>
    <module>sdc-kafka_0_11-common</module>
    <module>sdc-kafka_0_11</module>
    <module>sdc-kafka_1_0</module>
    <module>sdc-kafka_2_0</module>
    <module>sdc-kafka_0_11_mapr_6_1</module>
    <module>kafka-common</module>
    <module>kafka_source-protolib</module>
    <module>kafka_multisource-protolib</module>
    <module>kafka_multisource-0_9-protolib</module>
    <module>kafka_multisource-0_10-protolib</module>
    <module>kafka_target-protolib</module>
    <module>maprstreams-common</module>
    <module>maprstreams-target-protolib</module>
    <module>maprstreams-source-protolib</module>
    <module>maprstreams-multisource-protolib</module>
    <module>jython-protolib</module>
    <module>groovy-protolib</module>
    <module>kinetica-protolib</module>
    <module>kinetica-6_2-protolib</module>
    <module>couchbase-protolib</module>

    <module>snowflake-connection</module>

    <module>elasticsearch-protolib</module>

    <module>solr-protolib</module>

    <module>cassandra-protolib</module>

    <module>mongodb-protolib</module>

    <module>flume-protolib</module>

    <module>cluster-hdfs-protolib</module>
    <module>sdc-hbase-0_98</module>
    <module>sdc-hbase-2_0</module>
    <module>sdc-hbase-api</module>
    <module>hbase-protolib</module>
    <module>kudu-protolib</module>
    <module>cluster-common</module>
    <module>cluster-kafka-protolib</module>
    <module>cluster-bootstrap-api</module>
    <module>cluster-bootstrap</module>
    <module>mapr-cluster-bootstrap</module>
    <module>mapr-cluster-bootstrap_2_2</module>
    <module>mesos-bootstrap</module>
    <module>client-api</module>
    <module>cli</module>
    <module>sdc-solr-api</module>
    <module>sdc-solr_cdh_4</module>
    <module>sdc-solr_6</module>
    <module>sdc-solr_7</module>
    <module>sdc-solr_8</module>
    <module>spark-executor-protolib</module>
    <module>spark-processor-protolib</module>
    <module>scripting-protolib</module>
    <module>wholefile-converter-protolib</module>
    <module>emr-protolib</module>

    <!--
         Stage libraries that are always built with the data collector (they don't have protolibs)

         IMPORTANT: keep this in alphabetical order

         IMPORTANT: define a property matching the module directory name
         for every stage library here with the directory name as value
    -->
    <module>aerospike-lib</module>
    <module>aws-lib</module>
    <module>aws-secrets-manager-credentialstore-lib</module>
    <module>azure-lib</module>
    <module>azure-keyvault-credentialstore-lib</module>
    <module>basic-lib</module>
    <module>file-transfer-connection</module>
    <module>bigtable-lib</module>
    <module>crypto-lib</module>
    <module>cyberark-credentialstore-lib</module>
    <module>dev-lib</module>
    <module>dataformats-lib</module>
    <module>google-cloud-lib</module>
    <module>influxdb_0_9-lib</module>
    <module>jks-credentialstore-lib</module>
    <module>jdbc-lib</module>
    <module>jdbc-protolib</module>
    <module>jms-lib</module>
    <module>kinesis-lib</module>
    <module>mleap-lib</module>
    <module>mysql-binlog-lib</module>
    <module>omniture-lib</module>
    <module>orchestrator-lib</module>
    <module>rabbitmq-lib</module>
    <module>redis-lib</module>
    <module>salesforce-lib</module>
    <module>sap_hana-lib</module>
    <module>stats-lib</module>
    <module>tensorflow-lib</module>
    <module>vault-credentialstore-lib</module>
    <module>wholefile-transformer-lib</module>
    <module>windows-lib</module>
    <module>pulsar-protolib</module>
    <module>thycotic-credentialstore-lib</module>
    <module>google-cloud-support</module>
    <module>apache-kudu-connection</module>
    <module>azure-connection</module>
    <module>jms-connection</module>
  </modules>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-deploy-plugin</artifactId>
        <version>2.8.2</version>
        <configuration>
          <skip>false</skip>
        </configuration>
      </plugin>

      <plugin>
        <groupId>org.apache.rat</groupId>
        <artifactId>apache-rat-plugin</artifactId>
        <version>${rat-plugin.version}</version>
        <configuration>
          <excludeSubProjects>false</excludeSubProjects>
          <excludes>
            <exclude>CHANGES.txt</exclude>
            <!-- Eclipse -->
            <exclude>**/.settings/*</exclude>
            <exclude>**/.classpath</exclude>
            <exclude>**/.project</exclude>
            <!-- IntelliJ IDE -->
            <exclude>**/.idea/**</exclude>
            <exclude>**/*.iml</exclude>

            <!-- Git -->
            <exclude>**/.gitignore</exclude>
            <exclude>.gitreview</exclude>
            <exclude>.git/**</exclude>

            <!-- Jenkins -->
            <exclude>**/buildInfo*.properties</exclude>

            <!-- Maven -->
            <exclude>**/target/**</exclude>
            <exclude>.m2/**</exclude> <!-- maven on jenkins -->

            <!-- Node.js Modules -->
            <exclude>**/node_modules/**</exclude>

            <!-- Bower -->
            <exclude>**/.bowerrc</exclude>

            <!-- Rat doesn't properly recognize header in some files -->
            <exclude>salesforce-connection/src/main/java/com/streamsets/pipeline/lib/salesforce/connection/mutualauth/ClientSSLTransport.java</exclude>

            <!-- Files that do not support comments, cannot have Licence header -->
            <exclude>**/META-INF/services/**</exclude>
            <exclude>sdk/src/main/services/**</exclude>
            <exclude>**/*.conf</exclude>
            <exclude>**/*.svg</exclude>
            <exclude>**/MANIFEST.MF</exclude>
            <exclude>**/service.sdl</exclude>
            <exclude>**/*.avro</exclude>
            <exclude>**/*.db</exclude>
            <exclude>**/*.csv</exclude>
            <exclude>**/*.txt</exclude>
            <exclude>**/*.json</exclude>
            <exclude>**/*.log</exclude>
            <exclude>**/*.html</exclude>
            <exclude>common/src/main/resources/*</exclude>
            <exclude>**/*.desc</exclude>
            <exclude>**/*.proto</exclude>
            <exclude>**/*.md</exclude>
            <exclude>**/*.xls</exclude>
            <exclude>**/*.xlsx</exclude>
            <exclude>**/*.properties</exclude>

            <!-- Protobuf generated files which are checked in for testing-->
            <exclude>commonlib/src/test/java/com/streamsets/pipeline/lib/util/EmployeeProto.java</exclude>
            <exclude>commonlib/src/test/java/com/streamsets/pipeline/lib/util/EngineerProto.java</exclude>
            <exclude>commonlib/src/test/java/com/streamsets/pipeline/lib/util/ExecutiveProto.java</exclude>
            <exclude>commonlib/src/test/java/com/streamsets/pipeline/lib/util/ExtensionsProto.java</exclude>
            <exclude>commonlib/src/test/java/com/streamsets/pipeline/lib/util/PersonProto.java</exclude>
            <exclude>commonlib/src/test/java/com/streamsets/pipeline/lib/util/RepeatedProto.java</exclude>
            <exclude>commonlib/src/test/java/com/streamsets/pipeline/lib/util/OneofProto.java</exclude>
            <exclude>commonlib/src/test/resources/*.ser</exclude>
            <exclude>commonlib/src/main/resources/*</exclude>
            <exclude>basic-lib/src/test/resources/*.ser</exclude>
            <exclude>**/id_rsa_test</exclude>
            <exclude>**/id_rsa_test_unencrypted</exclude>
            <exclude>**/*.pub</exclude>

            <!-- TestOverrunStreamingXmlParser relies on this file being small, cannot add license header -->
            <exclude>common/src/test/resources/TestStreamingXmlParser-records.xml</exclude>

            <!-- Modules to exclude (for now), not java stuff -->
            <exclude>python/**</exclude>
            <exclude>docs/**</exclude>
            <exclude>cloudera-integration/csd/**</exclude>
            <exclude>datacollector-ui/src/main/webapp/common/directives/**</exclude>

            <!-- Test Databricks ML Model files -->
            <exclude>databricks-ml-protolib/src/test/resources/**</exclude>

            <!-- Example scripts for scripting stages -->
            <exclude>jython-protolib/src/main/resources/com/streamsets/pipeline/stage/processor/jython/default_init_script.py</exclude>
            <exclude>jython-protolib/src/main/resources/com/streamsets/pipeline/stage/processor/jython/default_script.py</exclude>
            <exclude>jython-protolib/src/main/resources/com/streamsets/pipeline/stage/processor/jython/default_destroy_script.py</exclude>
            <exclude>basic-lib/src/main/resources/com/streamsets/pipeline/stage/processor/javascript/default_init_script.js</exclude>
            <exclude>basic-lib/src/main/resources/com/streamsets/pipeline/stage/processor/javascript/default_script.js</exclude>
            <exclude>basic-lib/src/main/resources/com/streamsets/pipeline/stage/processor/javascript/default_destroy_script.js</exclude>
            <exclude>groovy-protolib/src/main/resources/com/streamsets/pipeline/stage/processor/groovy/default_init_script.groovy</exclude>
            <exclude>groovy-protolib/src/main/resources/com/streamsets/pipeline/stage/processor/groovy/default_script.groovy</exclude>
            <exclude>groovy-protolib/src/main/resources/com/streamsets/pipeline/stage/processor/groovy/default_destroy_script.groovy</exclude>
            <exclude>groovy-protolib/src/main/resources/com/streamsets/pipeline/stage/origin/groovy/GeneratorOriginScript.groovy</exclude>
            <exclude>basic-lib/src/main/resources/com/streamsets/pipeline/stage/origin/javascript/GeneratorOriginScript.js</exclude>
            <exclude>jython-protolib/src/main/resources/com/streamsets/pipeline/stage/origin/jython/GeneratorOriginScript.py</exclude>

          </excludes>
          <goal>run</goal>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <reporting>
    <plugins>
      <plugin>
        <groupId>org.owasp</groupId>
        <artifactId>dependency-check-maven</artifactId>
        <version>3.1.2</version>
        <inherited>false</inherited>
        <configuration>
          <!-- The plugin's report is not manageable since we don't control what stage libraries ship. The
            GitHub interface is a bit more useful than this one.
          -->
          <skip>true</skip>

          <!-- We don't have any .NET code -->
          <assemblyAnalyzerEnabled>false</assemblyAnalyzerEnabled>
          <!-- skip non-bundled jars -->
          <skipProvidedScope>true</skipProvidedScope>
          <skipRuntimeScope>true</skipRuntimeScope>
          <!-- We want HTML for easy viewing, but XML for reporting via SonarQube -->
          <format>ALL</format>
          <suppressionFile>${basedir}/dependency-check-suppression.xml</suppressionFile>
        </configuration>
        <reportSets>
          <reportSet>
            <id>aggregate</id>
            <inherited>false</inherited>
            <reports>
              <report>aggregate</report>
            </reports>
          </reportSet>
        </reportSets>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-project-info-reports-plugin</artifactId>
        <version>2.8</version>
        <configuration>
          <dependencyLocationsEnabled>false</dependencyLocationsEnabled>
          <dependencyDetailsEnabled>false</dependencyDetailsEnabled>
        </configuration>
        <reportSets>
          <reportSet>
            <reports>
              <report>dependencies</report>
            </reports>
          </reportSet>
        </reportSets>
      </plugin>
    </plugins>
  </reporting>

  <profiles>
    <profile>
      <id>rat-check</id>
      <activation>
        <activeByDefault>true</activeByDefault>
        <property>
          <name>!skipRat</name>
        </property>
      </activation>
      <build>
        <plugins>
          <plugin>
            <inherited>false</inherited>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>1.6.0</version>
            <executions>
              <execution>
                <id>rat-check</id>
                <phase>generate-sources</phase>
                <goals>
                  <goal>exec</goal>
                </goals>
              </execution>
            </executions>
            <configuration>
              <executable>mvn</executable>
              <workingDirectory>${basedir}</workingDirectory>
              <arguments>
                <argument>apache-rat:check</argument>
                <argument>-N</argument>
              </arguments>
            </configuration>
          </plugin>
        </plugins>
      </build>
    </profile>

    <profile>
      <id>all-libs</id>
      <activation>
        <activeByDefault>false</activeByDefault>
        <property>
          <name>release</name>
        </property>
      </activation>
      <properties>
        <!--
             IMPORTANT: keep this in alphabetical order

             IMPORTANT: define a property matching the module directory name
             for every stage library here with the directory name as value
        -->
        <apache-kafka_2_1-lib>apache-kafka_2_1-lib</apache-kafka_2_1-lib>
        <apache-kudu_1_7-lib>apache-kudu_1_7-lib</apache-kudu_1_7-lib>
        <apache-pulsar_2-lib>apache-pulsar_2-lib</apache-pulsar_2-lib>
        <apache-solr_6_1_0-lib>apache-solr_6_1_0-lib</apache-solr_6_1_0-lib>
        <cassandra_3-lib>cassandra_3-lib</cassandra_3-lib>
        <cdh-spark_2_3-lib>cdh-spark_2_3-lib</cdh-spark_2_3-lib>
        <cdh_6_2-lib>cdh_6_2-lib</cdh_6_2-lib>
        <cdh_kafka_2_1-lib>cdh_kafka_2_1-lib</cdh_kafka_2_1-lib>
        <couchbase_5-lib>couchbase_5-lib</couchbase_5-lib>
        <elasticsearch_7-lib>elasticsearch_7-lib</elasticsearch_7-lib>
        <groovy_2_4-lib>groovy_2_4-lib</groovy_2_4-lib>
        <jython_2_7-lib>jython_2_7-lib</jython_2_7-lib>
        <kinetica_7_0-lib>kinetica_7_0-lib</kinetica_7_0-lib>
        <mapr_6_1-lib>mapr_6_1-lib</mapr_6_1-lib>
        <mongodb_4-lib>mongodb_4-lib</mongodb_4-lib>
      </properties>
      <modules>
        <!--
             IMPORTANT: keep this in alphabetical order

             IMPORTANT: the modules for all stage libraries
        -->
        <module>apache-kafka_2_1-lib</module>
        <module>apache-kudu_1_7-lib</module>
        <module>apache-pulsar_2-lib</module>
        <module>apache-solr_6_1_0-lib</module>
        <module>cassandra_3-lib</module>
        <module>cdh_6_2-lib</module>
        <module>cdh_kafka_2_1-lib</module>
        <module>cdh-spark_2_3-lib</module>
        <module>couchbase_5-lib</module>
        <module>elasticsearch_7-lib</module>
        <module>groovy_2_4-lib</module>
        <module>kinetica_7_0-lib</module>
        <module>jython_2_7-lib</module>
        <module>mapr_6_1-lib</module>
        <module>mongodb_4-lib</module>
      </modules>
    </profile>

    <profile>
      <id>sample-dev-libs</id>
      <activation>
        <property>
          <name>!protolibs-only</name>
        </property>
      </activation>
      <properties>
        <!--
             Use only the latest versions here. If a new version needs to be added, move the older version to all-libs

             IMPORTANT: keep this in alphabetical order

             IMPORTANT: define a property matching the module directory name
             for all the sample dev stage libraries here with the directory name as value
        -->
        <apache-kafka_2_7-lib>apache-kafka_2_7-lib</apache-kafka_2_7-lib>
        <apache-pulsar_2-lib>apache-pulsar_2-lib</apache-pulsar_2-lib>
        <apache-solr_6_1_0-lib>apache-solr_6_1_0-lib</apache-solr_6_1_0-lib>
        <cassandra_3-lib>cassandra_3-lib</cassandra_3-lib>
        <couchbase_5-lib>couchbase_5-lib</couchbase_5-lib>
        <elasticsearch_7-lib>elasticsearch_7-lib</elasticsearch_7-lib>
        <groovy_2_4-lib>groovy_2_4-lib</groovy_2_4-lib>
        <jython_2_7-lib>jython_2_7-lib</jython_2_7-lib>
        <kinetica_7_0-lib>kinetica_7_0-lib</kinetica_7_0-lib>
        <mongodb_4-lib>mongodb_4-lib</mongodb_4-lib>
        <azure-lib>azure_lib</azure-lib>
      </properties>
      <modules>
        <!--
             Use only the latest versions here. If a new version needs to be added, move the older version to all-libs

             IMPORTANT: keep this in alphabetical order

             IMPORTANT: the modules for the sample dev stage libraries
        -->
        <module>apache-kafka_2_7-lib</module>
        <module>apache-pulsar_2-lib</module>
        <module>apache-solr_6_1_0-lib</module>
        <module>cassandra_3-lib</module>
        <module>couchbase_5-lib</module>
        <module>elasticsearch_7-lib</module>
        <module>groovy_2_4-lib</module>
        <module>kinetica_7_0-lib</module>
        <module>jython_2_7-lib</module>
        <module>mongodb_4-lib</module>
      </modules>
    </profile>

    <profile>
      <id>archetype</id>
      <activation>
        <activeByDefault>true</activeByDefault>
      </activation>
      <modules>
        <module>stage-lib-archetype</module>
      </modules>
    </profile>

    <profile>
      <id>sign</id>
      <build>
        <plugins>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-gpg-plugin</artifactId>
            <executions>
              <execution>
                <phase>verify</phase>
                <goals>
                  <goal>sign</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </profile>

    <!-- IMPORTANT: keep the following profiles at the end -->

    <!--
         Some maven plugins don't work properly unless the stage-lib-parent is being
         built as well. An example is the mvn versions:set command.
     -->
    <profile>
      <id>stage-lib-parent</id>
      <activation>
        <activeByDefault>false</activeByDefault>
      </activation>
      <modules>
        <module>stage-lib-parent</module>
      </modules>
    </profile>

    <profile>
      <!-- include all "base" modules, where a set of stagelibs needs a common parent pom that doesn't necessarily
           need to be imported as a module in its own right; this is necessary in order to run the version update
           properly -->
      <id>all-poms</id>
      <activation>
        <activeByDefault>false</activeByDefault>
      </activation>
      <modules>
        <module>hdp-stagelib-base</module>
        <module>cdh_6-stagelib-base</module>
      </modules>
    </profile>

    <profile>
      <id>ui</id>
      <activation>
        <activeByDefault>false</activeByDefault>
        <property>
          <name>release</name>
        </property>
      </activation>
      <modules>
        <module>datacollector-ui</module>
      </modules>
    </profile>

    <profile>
      <id>docs</id>
      <activation>
        <activeByDefault>false</activeByDefault>
        <property>
          <name>release</name>
        </property>
      </activation>
      <modules>
        <module>docs</module>
      </modules>
    </profile>

    <profile>
      <id>dist</id>
      <activation>
        <activeByDefault>false</activeByDefault>
        <property>
          <name>release</name>
        </property>
      </activation>
      <modules>
        <module>dist</module>
        <module>cloudera-integration</module>
      </modules>
    </profile>

    <profile>
      <id>release</id>
      <activation>
        <activeByDefault>false</activeByDefault>
        <property>
          <name>release</name>
        </property>
      </activation>
      <modules>
        <module>release</module>
      </modules>
    </profile>

    <profile>
      <id>rpm</id>
      <activation>
        <activeByDefault>false</activeByDefault>
        <property>
          <name>release</name>
        </property>
      </activation>
      <modules>
        <module>rpm</module>
      </modules>
    </profile>

    <profile>
      <id>java-src</id>
      <activation>
        <activeByDefault>false</activeByDefault>
      </activation>
      <build>
        <plugins>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <!--<version>2.6</version>-->
            <inherited>false</inherited>
            <configuration>
              <appendAssemblyId>false</appendAssemblyId>
              <attach>false</attach>
              <tarLongFileMode>gnu</tarLongFileMode>
              <finalName>streamsets-datacollector-java-src-${project.version}</finalName>
              <descriptors>
                <descriptor>release/src/main/assemblies/java-src.xml</descriptor>
              </descriptors>
            </configuration>
          </plugin>
        </plugins>
      </build>
    </profile>

    <profile>
      <id>generate-sources</id>
      <activation>
        <activeByDefault>false</activeByDefault>
      </activation>
      <build>
        <plugins>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-source-plugin</artifactId>
            <executions>
              <execution>
                <id>attach-sources</id>
                <goals>
                  <goal>jar</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </profile>
  </profiles>

  <pluginRepositories>
    <pluginRepository>
      <id>cdh.plugin.repo</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
      <name>Cloudera Repositories</name>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </pluginRepository>
  </pluginRepositories>

  <repositories>
    <repository>
      <id>cdh.repo</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
      <name>Cloudera Repositories</name>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>confluent</id>
      <url>http://packages.confluent.io/maven/</url>
    </repository>
    <repository>
      <id>elasticsearch-releases</id>
      <url>https://artifacts.elastic.co/maven</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>mapr-releases</id>
      <url>http://repository.mapr.com/maven/</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
      <releases>
        <enabled>true</enabled>
      </releases>
    </repository>
    <repository>
      <releases>
        <enabled>true</enabled>
        <updatePolicy>always</updatePolicy>
        <checksumPolicy>warn</checksumPolicy>
      </releases>
      <snapshots>
        <enabled>false</enabled>
        <updatePolicy>never</updatePolicy>
        <checksumPolicy>fail</checksumPolicy>
      </snapshots>
      <id>HDPReleases</id>
      <name>HDP Releases</name>
      <url>http://repo.hortonworks.com/content/repositories/releases/</url>
      <layout>default</layout>
    </repository>
    <repository>
      <releases>
        <enabled>true</enabled>
        <updatePolicy>always</updatePolicy>
        <checksumPolicy>warn</checksumPolicy>
      </releases>
      <snapshots>
        <enabled>false</enabled>
        <updatePolicy>never</updatePolicy>
        <checksumPolicy>fail</checksumPolicy>
      </snapshots>
      <id>HDPRehosted</id>
      <name>HDP Releases</name>
      <url>http://repo.hortonworks.com/content/repositories/releases/</url>
      <layout>default</layout>
    </repository>
    <repository>
      <releases>
        <enabled>true</enabled>
        <updatePolicy>always</updatePolicy>
        <checksumPolicy>warn</checksumPolicy>
      </releases>
      <snapshots>
        <enabled>false</enabled>
        <updatePolicy>never</updatePolicy>
        <checksumPolicy>fail</checksumPolicy>
      </snapshots>
      <id>HDPJetty</id>
      <name>HDP Jetty</name>
      <url>http://repo.hortonworks.com/content/repositories/jetty-hadoop/</url>
      <layout>default</layout>
    </repository>
    <repository>
      <id>snapshots-repo</id>
      <url>https://oss.sonatype.org/content/repositories/snapshots</url>
      <releases><enabled>false</enabled></releases>
      <snapshots><enabled>true</enabled></snapshots>
    </repository>

    <!-- for Kinetica -->
    <repository>
      <id>kinetica-releases</id>
      <url>http://files.kinetica.com/nexus/content/repositories/releases/</url>
    </repository>

    <!-- Databricks ML -->
    <repository>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
      <id>bintray-databricks-maven</id>
      <name>bintray</name>
      <url>https://maven.aliyun.com/repository/central</url>
    </repository>
    <repository>
      <id>spring</id>
      <url>https://maven.aliyun.com/repository/spring</url>
    </repository>
    <repository>
      <id>central</id>
      <url>https://maven.aliyun.com/repository/central</url>
    </repository>
    <repository>
      <id>mapr-public</id>
      <url>https://maven.aliyun.com/repository/mapr-public</url>
    </repository>
  </repositories>

</project>

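Go into datacollector-master/datacollector-ui and edit pom.xml. I'm broke and can't afford a proxy, and downloading resources from git fails all the time, so I set bower to install in offline mode: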

   <properties>
     <bowerInstallArgs>install --offline</bowerInstallArgs>
   </properties>

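Manually install the JS packages listed in bower.json. I won't go through them here. If your connection is unfriendly, it is best to download them one at a time. Finish all the downloads before compiling, otherwise the build of the whole project will not get through.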
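
The one-at-a-time approach can be sketched as a small shell helper. This is only a sketch, not part of the StreamSets build: the function name `install_one_by_one` and the `BOWER_CMD` and `MAX_TRIES` variables are made up for illustration, and the package names you pass in have to come from your own bower.json.

```shell
# Install bower packages one at a time, retrying each a few times.
# BOWER_CMD and MAX_TRIES are hypothetical knobs for this sketch;
# by default the real `bower` binary is used with up to 3 attempts.
install_one_by_one() {
  bower_cmd="${BOWER_CMD:-bower}"
  max_tries="${MAX_TRIES:-3}"
  for pkg in "$@"; do
    try=1
    # `bower install <endpoint>` fetches one package (name or name#version).
    until "$bower_cmd" install "$pkg"; do
      if [ "$try" -ge "$max_tries" ]; then
        echo "giving up on $pkg after $try attempts" >&2
        return 1
      fi
      try=$((try + 1))
      sleep 1
    done
    echo "installed $pkg"
  done
}
```

Run it from datacollector-master/datacollector-ui, for example `install_one_by_one some-package another-package#1.2.3` (placeholder names). Stopping at the first package that keeps failing makes it obvious which download still needs attention.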

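Next, when the dist module is built, you will get a copy error. You have to download the packages yourself and put them into the local m2 repository. The packages are at the links below: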

Link: https://pan.baidu.com/s/1BQZa1CK7S8khjIjrIAtZmg?pwd=nu2i Extraction code: nu2i
Link: https://pan.baidu.com/s/1pN6lXY6BbEo_If9NUOT4pg?pwd=vcyk Extraction code: vcyk
Link: https://pan.baidu.com/s/17W4DeXq1_0RkP5w7EK5wqA?pwd=y84r Extraction code: y84r
Link: https://pan.baidu.com/s/1Ft9tkIQ6OeFgiMr61RGz2g?pwd=8bdj Extraction code: 8bdj

Put the four packages into the .m2/repository/com/streamsets/streamsets-datacollector-edge/3.23.0-SNAPSHOT/ folder.
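
A minimal sketch of that copy step follows. It assumes the downloaded files sit together in one directory of your choosing and that the local repository lives under `$HOME/.m2` (Maven's default). The function name `copy_edge_packages` is hypothetical, and the sketch copies whatever files it finds rather than assuming any file names.

```shell
# Copy every regular file from a download directory into the
# streamsets-datacollector-edge 3.23.0-SNAPSHOT folder of the local repo.
# $1: directory holding the downloaded packages (your choice)
# $2: optional m2 root, defaults to $HOME/.m2
copy_edge_packages() {
  src_dir="$1"
  m2_root="${2:-$HOME/.m2}"
  dest="$m2_root/repository/com/streamsets/streamsets-datacollector-edge/3.23.0-SNAPSHOT"
  mkdir -p "$dest" || return 1
  count=0
  for f in "$src_dir"/*; do
    [ -f "$f" ] || continue      # skip subdirectories and an unmatched glob
    cp "$f" "$dest/" || return 1
    count=$((count + 1))
  done
  if [ "$count" -eq 0 ]; then
    echo "no files found in $src_dir" >&2
    return 1
  fi
  echo "copied $count file(s) to $dest"
}
```

For example, `copy_edge_packages ~/Downloads/edge` would place the files under `~/.m2` (the download path here is a placeholder).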

Go back to datacollector-master.

Build in release mode:

mvn -T 8 clean package -Drelease -DskipTests -P-rpm
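
Since a missing tool only shows up after a long wait, a preflight check before the build can save time. The sketch below is an illustration, not part of the project: `require_tools`, `release_build`, and the `THREADS` variable are hypothetical names, and the tool list simply mirrors the environment this post asks for.

```shell
# Check that every named command is on PATH and report the missing ones.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run the release build only when the tools are present.
# THREADS is a hypothetical variable; 8 matches the command above.
release_build() {
  require_tools java mvn node npm bower || return 1
  mvn -T "${THREADS:-8}" clean package -Drelease -DskipTests -P-rpm
}
```

On the flags: `-T 8` builds with 8 threads, `-Drelease` sets the `release` property that the pom uses to activate profiles such as all-libs, ui, docs, dist, and release, `-DskipTests` skips running tests, and `-P-rpm` switches the rpm profile back off.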

5. Now wait, and wait some more. Once the build succeeds, the packages are all under datacollector-master/release/target.
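
To confirm the result, something like the following lists the archives in that directory. It is a sketch under one assumption: that the packages you care about end in `.tgz`. The function name `list_release_packages` is made up for illustration.

```shell
# List the .tgz archives directly under <root>/release/target.
# $1: path to datacollector-master, defaults to the current directory
list_release_packages() {
  target_dir="${1:-.}/release/target"
  if [ ! -d "$target_dir" ]; then
    echo "no such directory: $target_dir" >&2
    return 1
  fi
  found=0
  for f in "$target_dir"/*.tgz; do
    if [ -f "$f" ]; then
      echo "$f"
      found=1
    fi
  done
  # A non-zero status signals that the build produced no archive here.
  [ "$found" -eq 1 ]
}
```

Called as `list_release_packages /path/to/datacollector-master`, it prints one path per archive and returns a failure status when none exist.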
