What’s New in 3.20.x

Data Collector version 3.20.x includes the following new features and enhancements:

Stage Enhancements
  • Amazon stages – You can configure Amazon stages to assume another role when using AWS access key authentication.

    To assume another role, you must first define a trust policy in AWS that allows the role to be assumed. Then, you configure the required stage properties in Data Collector.
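    For reference, a minimal trust policy of this kind allows a principal to call the sts:AssumeRole action on the role. The account ID and user name below are placeholders, not values from an actual deployment:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:user/sdc-user"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

    You attach this policy to the role to be assumed, then reference the role in the stage properties.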

  • Azure Data Lake Storage Gen2 stages – The “OAuth Token” authentication method has been renamed “OAuth with Service Principal.” When using this authentication method, you no longer specify an OAuth 2.0 token endpoint. Instead, you specify a tenant ID.

    This change does not affect existing pipelines. During an upgrade, the tenant ID value is retrieved from the full OAuth 2.0 token endpoint value.
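    This works because an Azure AD OAuth 2.0 token endpoint embeds the tenant ID in its URL path, so one value can be derived from the other. A rough sketch of that extraction, using a well-known Azure AD example endpoint rather than a value from an actual pipeline:

```python
from urllib.parse import urlparse

def tenant_id_from_token_endpoint(endpoint: str) -> str:
    """Extract the tenant ID from an Azure AD OAuth 2.0 token endpoint.

    Endpoints follow the pattern:
    https://login.microsoftonline.com/<tenant-id>/oauth2/token
    so the tenant ID is the first path segment.
    """
    path_segments = urlparse(endpoint).path.strip("/").split("/")
    return path_segments[0]

endpoint = "https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/token"
print(tenant_id_from_token_endpoint(endpoint))
# prints 72f988bf-86f1-41af-91ab-2d7cd011db47
```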

  • JMS stages:
    • The Use Credentials property has been moved from the JMS tab to the Credentials tab.
    • On the Credentials tab, you can optionally define additional JMS or JNDI security properties, such as the java.naming.security.principal and java.naming.security.credentials properties. The additional security properties support using credential functions to retrieve sensitive information from supported credential stores.

      The additional security properties take precedence over additional JMS configuration properties defined on the JMS tab.

    These changes do not affect existing pipelines.
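    As an illustration, the additional security properties might pair a JNDI principal with a credential function that reads the password from a credential store. The store ID, group, and secret name below are hypothetical:

```
java.naming.security.principal=jms-user
java.naming.security.credentials=${credential:get("jks", "all", "jms-password")}
```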

  • Kudu Lookup processor – When the processor performs local caching, you can configure a new Retry on Missing Value property to retry a lookup before using the default value.
  • PostgreSQL Metadata processor:
    • By default, the processor creates new columns using names as they appear in the record. Previously, it lowercased column names.
    • You can configure a new Lowercase Column Names advanced property to create columns with lowercased names.

    These changes do not affect existing pipelines.
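    The distinction matters because PostgreSQL folds unquoted identifiers to lowercase, while quoted identifiers keep their exact case. A quick illustration in plain SQL, with hypothetical table and column names:

```sql
CREATE TABLE demo ("orderId" integer);  -- quoted: column is created as orderId, case preserved
SELECT "orderId" FROM demo;             -- matches: the quoted name preserves the mixed case
SELECT orderId FROM demo;               -- fails: unquoted orderId is folded to orderid, which does not exist
```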

  • Google Pub/Sub Producer destination – The default values for the following properties have changed:
    • Max Outstanding Message Count – The default value is now 1000 messages. Previously, the default was 0 for no maximum.
    • Max Outstanding Request Bytes – The default value is now 8000 bytes. Previously, the default was 0 for no maximum.

    This can affect upgraded pipelines. For more information, see Review Google Pub/Sub Producer Pipelines.

  • To Error destination:
    • You can configure the destination to stop the pipeline when the destination receives an error record.
    • You can configure a custom error message that is added to each error record.

Connections when Registered with Control Hub
When Data Collector version 3.20.x is registered with Control Hub cloud or with Control Hub on-premises version 3.19.x or later, the following stages support using Control Hub connections:

  • Azure Data Lake Storage Gen2 stages
  • JMS stages

Connections define the information required to access data in external systems. You create connections in Control Hub and then use those connections when configuring pipelines in Control Hub Pipeline Designer. You cannot use Control Hub connections in the Data Collector pipeline canvas.

Security Enhancements
  • AWS Secrets Manager credential store – When Data Collector runs on an Amazon EC2 instance that has an associated instance profile, you can configure Data Collector to use the instance profile credentials to automatically authenticate with AWS Secrets Manager.

Stage Libraries
This release includes the following new stage libraries:

  • streamsets-datacollector-apache-kafka_2_1-lib – For Kafka 2.1.x.
  • streamsets-datacollector-apache-kafka_2_2-lib – For Kafka 2.2.x.
  • streamsets-datacollector-apache-kafka_2_3-lib – For Kafka 2.3.x.
  • streamsets-datacollector-apache-kafka_2_4-lib – For Kafka 2.4.x.
  • streamsets-datacollector-apache-kafka_2_5-lib – For Kafka 2.5.x.
  • streamsets-datacollector-apache-kafka_2_6-lib – For Kafka 2.6.x.

Additional Enhancements
  • Health Inspector – You can now view information about the basic health of your Data Collector. Health Inspector provides a snapshot of how the Data Collector JVM, machine, and network are performing. It also lists related Data Collector configuration details.