3.17.0

What’s New in 3.17.0

Data Collector version 3.17.0 includes the following new features and enhancements:

New Stage
This release includes the following new stage:
  • SAP HANA Query Consumer – Use this origin to read data from an SAP HANA database with a user-defined query.
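    For example, assuming the origin supports the ${OFFSET} placeholder used by the JDBC Query Consumer origin and a hypothetical id offset column, the query might look like this: SELECT * FROM orders WHERE id > ${OFFSET} ORDER BY id.
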
Stage Enhancements
This release includes the following stage enhancements:
  • Control Hub API processor – The processor can process responses of any size. Previously, the maximum response size was 50,000 characters.
  • Elasticsearch destination – You can use record functions and delimited data record functions in the Additional Properties field.
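    For example, assuming a hypothetical /region field, an Additional Properties entry such as "_routing" : "${record:value('/region')}" uses a record function to set a per-record value.
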
  • Elasticsearch stages – The Elasticsearch origin and destination include a User Name property and a Password property instead of a single Security Username/Password property.

    This change does not affect existing pipelines. During an upgrade, existing configurations for the Security Username/Password property are placed into the User Name property, which supports the username:password format.
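    For example, a hypothetical existing value of elastic:changeme is placed into the User Name property and interpreted as the user name elastic with the password changeme.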

  • HTTP Client processor – You can configure the processor to use the following enhancements:
    • Actions to take based on the response status.
    • Pagination properties to enable processing large volumes of data from paginated APIs.
    • Action to take when the request times out because the HTTP service did not respond within the read timeout period.
  • JDBC MySQL data type conversions – The JDBC origins and JDBC processors convert MySQL unsigned integers as follows:
    • Bigint Unsigned converts to Decimal.
    • Int Unsigned and Mediumint Unsigned convert to Long.
    • Smallint Unsigned converts to Integer.
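    For example, a hypothetical MySQL table created with CREATE TABLE totals (a BIGINT UNSIGNED, b INT UNSIGNED, c SMALLINT UNSIGNED) now produces records with a Decimal field /a, a Long field /b, and an Integer field /c.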

    This change can require that you perform a post-upgrade task.

  • Kinesis stages – The Kinesis Consumer origin, Kinesis Firehose destination, and Kinesis Producer destination provide an Authentication Method property that allows you to select either IAM Roles or AWS Keys.

    Previously, you used IAM roles by omitting AWS keys when configuring the stages. This change does not affect existing pipelines.

  • Orchestration stages:
    • Several orchestration stages and properties have been renamed. These changes do not affect existing pipelines. The renamed stages include the following:
      • The Start Job origin and processor are now the Start Jobs origin and processor.
      • The Start Pipeline origin and processor are now the Start Pipelines origin and processor.
      • The Wait for Job Completion processor is now the Wait for Jobs processor.
      • The Wait for Pipeline Completion processor is now the Wait for Pipelines processor.
    • Records generated by the Start Jobs and Start Pipelines stages, and updated by the Wait for Jobs and Wait for Pipelines stages, include pipeline and stage metrics when available. These metrics include input record, output record, error record, and error message counts.
  • Scripting origins – You can reset the origin for pipelines that include the Groovy Scripting, JavaScript Scripting, or Jython Scripting origin.
  • SFTP/FTP/FTPS origin – Instead of stopping the pipeline, the origin generates an error when it encounters a file that it does not have permission to read.
  • TensorFlow Evaluator processor:
    • The processor uses the 1.15 TensorFlow client library and supports all 1.x TensorFlow versions.
    • In the Fields to Convert property for each input configuration, you can configure a field type expression that defines a set of fields.
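      For example, assuming standard field path expression syntax, an expression such as /*[${f:type() == 'FLOAT'}] defines the set of all first-level fields with the Float data type.
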
Pipeline Enhancements
This release includes the following pipeline enhancements:

  • Pipeline run history – The pipeline run history displays the input, output, and error record count for each pipeline run.
  • Pipeline run summary – Information about the most recent pipeline run remains available on the Summary tab of the pipeline after the pipeline stops. The summary includes run details such as the start time and duration.
  • Pipeline start and stop events – The event records generated when a pipeline starts and stops include fields for the related Control Hub job ID and job name.
  • Stage library panel display and stage installation:
    • The stage library panel displays all Data Collector stages, instead of only the installed stages. Stages that are not installed appear disabled, or greyed out.
    • When you click a disabled stage, you can install the stage library that includes the stage.
Security Enhancements
This release includes the following security-related enhancements:
  • File-based user authentication – You can use the Data Collector UI to change your password when Data Collector is configured for file-based authentication.
  • HashiCorp Vault credential store – You can enable the use of a namespace in HashiCorp Vault by configuring a namespace path for the credentialStore.vault.config.namespace property in the $SDC_CONF/credential-stores.properties file.

    For example, credentialStore.vault.config.namespace=nspace1/nspace2/.

  • runtime:resourcesDirPath() function – Returns the full path to the directory for runtime resource files.
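
    For example, assuming a hypothetical keystore.jks file stored in the runtime resources directory, the expression ${runtime:resourcesDirPath()}/keystore.jks evaluates to the full path to that file.
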
  • SSL/TLS enhancement – Stages that use SSL/TLS can load the contents of the keystore and truststore from a credential store.
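
    For example, assuming a credential store with the ID vault and a hypothetical secret named web/keystore, a credential function such as ${credential:get("vault", "all", "web/keystore")} might supply the keystore contents.
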
Additional Enhancement
This release includes the following additional enhancement:

  • Data Collector production batch size – The default value for the production.maxBatchSize property in the Data Collector configuration file has increased to 50,000 records. This change does not affect existing pipelines.
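
    For example, to set the property explicitly, you can include production.maxBatchSize=50000 in the $SDC_CONF/sdc.properties file.
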
Deprecated Feature
This release includes the following deprecated feature:

  • Databricks ML Evaluator processor – This processor is deprecated and will be removed in a future release. Do not use the processor in new pipelines.