3.11.0

Data Collector version 3.11.0 includes the following new features and enhancements:

Origins
This release includes enhancements to the following origins:

  • Amazon S3 – The origin now generates event records when it starts processing a new object and when it finishes processing an object.
  • Azure Data Lake Storage Gen1 – The origin is no longer considered a Technology Preview feature and is approved for use in production.
  • Azure Data Lake Storage Gen2 – The origin is no longer considered a Technology Preview feature and is approved for use in production.
  • Google BigQuery – The origin now supports JSON service-account credentials pasted directly into the UI.
  • Google Cloud Storage – The origin now supports JSON service-account credentials pasted directly into the UI.
  • Google Pub/Sub Subscriber – The origin now supports JSON service-account credentials pasted directly into the UI.
  • HTTP Client – The origin now supports time functions in the Resource URL property.
  • Kafka Consumer – The origin can now be configured to save the Kafka message key in the record. The origin can save the key in a record header attribute, a record field, or both.
  • Kafka Multitopic Consumer – The origin can now be configured to save the Kafka message key in the record. The origin can save the key in a record header attribute, a record field, or both.
  • Salesforce – The origin has a new Mismatched Types Behavior property, which specifies how to handle fields with types that do not match the schema.
  • SFTP/FTP/FTPS Client – The origin has three new timeout properties: Socket Timeout, Connection Timeout, and Data Timeout.
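
The HTTP Client enhancement above means the Resource URL property can embed expression-language time functions so that each request targets a time-dependent URL. A hedged sketch follows; the endpoint and query parameter are hypothetical, while `time:now` and `time:extractStringFromDate` are existing Data Collector time functions:

```
https://example.com/api/events?since=${time:extractStringFromDate(time:now(), 'yyyy-MM-dd')}
```

With this Resource URL, the origin would request only records dated on or after the current day each time it polls.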
Processors
This release includes enhancements to the following processors:

  • Field Type Converter – The processor can now convert to the Zoned Datetime data type from the Datetime data type or the Date data type.
  • Groovy Evaluator – The processor now supports the use of the sdc wrapper object to access the constants, methods, and objects available to each script type.
  • HTTP Client – When responses to requests contain multiple values, the processor can now return the first value, all values in a list in a single record, or all values in separate records.
  • JavaScript Evaluator – The processor now supports the use of the sdc wrapper object to access the constants, methods, and objects available to each script type.
  • Jython Evaluator – The processor now supports the use of the sdc wrapper object to access the constants, methods, and objects available to each script type.
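
The sdc wrapper object mentioned for the three scripting processors gives each script one entry point to the batch of records, the output stream, and error handling. The following standalone Python sketch only mimics that interface to illustrate the access pattern; the `Sdc` and `Record` classes and their attribute names here are illustrative stand-ins, not the actual Data Collector scripting API (in a real evaluator script, the wrapper is provided by the stage):

```python
# Illustrative stand-in for the sdc wrapper available in scripting
# processors; this is NOT the Data Collector API, just a model of the
# access pattern an evaluator script would use.
class Record:
    def __init__(self, value):
        self.value = value          # the record's root field

class Sdc:
    def __init__(self, records):
        self.records = records      # batch of incoming records
        self.output = []            # records written downstream
        self.error = []             # records routed to error handling

    def write(self, record):
        self.output.append(record)

    def to_error(self, record, message):
        self.error.append((record, message))

# A script body would iterate the batch through the wrapper:
sdc = Sdc([Record({"id": 1}), Record({"id": None})])
for record in sdc.records:
    try:
        if record.value["id"] is None:
            raise ValueError("missing id")
        record.value["processed"] = True
        sdc.write(record)
    except ValueError as e:
        sdc.to_error(record, str(e))
```

The point of the wrapper is uniformity: the same names resolve to the same constants, methods, and objects regardless of whether the script is Groovy, JavaScript, or Jython.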
Destinations
This release includes enhancements to the following destinations:

  • Azure Data Lake Storage Gen1 – The destination is no longer considered a Technology Preview feature and is approved for use in production.
  • Azure Data Lake Storage Gen2 – The destination is no longer considered a Technology Preview feature and is approved for use in production.
  • Cassandra – The destination has new properties to disable batches and to set a timeout for individual write requests.
  • Google BigQuery – The destination now supports JSON service-account credentials pasted directly into the UI.
  • Google Cloud Storage – The destination now supports JSON service-account credentials pasted directly into the UI.
  • Google Pub/Sub Publisher – The destination now supports JSON service-account credentials pasted directly into the UI.
  • HTTP Client – The destination now supports time functions in the Resource URL property.
  • Kafka Producer – The destination can now read the Kafka message key stored in a record header attribute and write it to Kafka. You configure the expected format of the key on the Data Format tab.
  • Salesforce – The destination now writes data to Salesforce objects by matching case-sensitive field names by default. To override the default mappings, you can still define specific field mappings.
  • SFTP/FTP/FTPS Client – The destination has three new timeout properties: Socket Timeout, Connection Timeout, and Data Timeout.
  • Solr – The destination has two new timeout properties: Connection Timeout and Socket Timeout.
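
Together with the Kafka Consumer enhancement above, the Kafka Producer change lets a pipeline round-trip a message key: the origin saves the key in a record header attribute, and the destination reads it back when writing to Kafka. The following minimal Python model only illustrates that flow; the `Record` class and the `kafkaMessageKey` attribute name are assumptions for this sketch, not Data Collector internals:

```python
# Toy model of a record with header attributes, showing how a Kafka
# message key could travel from origin to destination. The attribute
# name "kafkaMessageKey" is an assumption made for this sketch.
class Record:
    def __init__(self, value):
        self.value = value
        self.header = {}  # record header attributes

def consume(key, payload):
    """Origin side: save the message key in a header attribute."""
    record = Record(payload)
    record.header["kafkaMessageKey"] = key
    return record

def produce(record):
    """Destination side: read the key back from the header."""
    key = record.header.get("kafkaMessageKey")
    return key, record.value

rec = consume("user-42", {"action": "login"})
key, value = produce(rec)
```

Keeping the key in a header attribute rather than the record body means downstream stages can transform the payload freely without losing the partitioning key.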
Executors
This release includes enhancements to the following executors:

  • ADLS Gen1 File Metadata – The executor is no longer considered a Technology Preview feature and is approved for use in production.
  • ADLS Gen2 File Metadata – The executor is no longer considered a Technology Preview feature and is approved for use in production.
  • JDBC Query – The executor can now generate events that you can use in an event stream. You can configure the executor to include the number of rows returned or affected by the query when generating events.
  • Spark – The executor now includes the following:
    • Additional fields in generated event records to store the user who submitted the job and the time that the job started.
    • Additional JARs property for applications written in Python.
Technology Preview Functionality
Data Collector includes certain new features and stages with the Technology Preview designation. Technology Preview functionality is available for use in development and testing, but is not meant for use in production.

Technology Preview stages are marked with a Technology Preview image on the stage icon.

When Technology Preview functionality becomes approved for use in production, the release notes and documentation reflect the change, and the Technology Preview icon is removed from the UI.

The following Technology Preview stages are newly available in this release:

  • Cron Scheduler origin – Generates a record with the current datetime as scheduled by a cron expression.
  • Start Pipeline origin – Starts a Data Collector, Data Collector Edge, or Transformer pipeline.
  • Control Hub API processor – Calls a Control Hub API.
  • Start Job processor – Starts a Control Hub job.
  • Start Pipeline processor – Starts a Data Collector, Data Collector Edge, or Transformer pipeline.
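
For the Cron Scheduler origin, the trigger is a cron expression. Assuming Quartz-style syntax (a seconds field followed by minutes, hours, day-of-month, month, and day-of-week), expressions such as the following would drive record generation:

```
# every 30 seconds
0/30 * * * * ?
# at the top of every hour
0 0 * * * ?
```

Each time the expression fires, the origin emits a record carrying the current datetime, which the other orchestration stages can act on.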
Pipelines
This release includes the following pipeline enhancement:

  • You can now configure pipelines to write error records to Amazon S3.
Data Collector Configuration
This release includes the following Data Collector configuration enhancement:

  • The Data Collector configuration file sdc.properties contains a new stage-specific property, stage.conf_com.streamsets.pipeline.stage.hive.impersonate.current.user. You can set the property to true to enable the Hive Metadata processor, the Hive Metastore destination, and the Hive Query executor to impersonate the current user when connecting to Hive.
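
For example, to enable Hive impersonation, the property can be set in sdc.properties as follows (a restart of Data Collector is generally required for configuration changes to take effect):

```
stage.conf_com.streamsets.pipeline.stage.hive.impersonate.current.user=true
```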
Stage Libraries
This release includes the following stage library enhancements:

  • New stage libraries – This release includes the following new stage libraries:
    • streamsets-datacollector-cdh_6_3-lib – For the Cloudera CDH version 6.3 distribution of Apache Hadoop.
    • streamsets-datacollector-orchestrator-lib – For the orchestrator stages.
  • Updated stage libraries – This release includes updates to the following stage library:
    • streamsets-datacollector-hdp_3_1-lib – For Hortonworks 3.1, the library now includes two additional stages:
      • Spark Evaluator processor
      • Spark executor