Cloudera DataFlow Concepts
Concepts
Learn about basic concepts of flow design and how those concepts relate to their NiFi counterparts.
Controller Service
Controller Services are extension points that provide information for use by other components (such
as processors or other controller services). The idea is that, rather than configure this information
in every processor that might need it, the controller service provides it for any processor to use as
needed.
Draft
A draft is a flow definition in development, and its lifecycle is tied to the workspace it is created
in. You can edit, update, and continuously test drafts by starting a Test Session. When your draft is
ready to be deployed in production, you can publish it as a flow definition to the CDF catalog.
Workspace
A workspace is where your drafts are stored. You can manage and delete drafts from this view. The
workspace is automatically created when DataFlow is enabled for an environment and is tied to the
lifecycle of your DataFlow service.
Environment
A logical environment defined with a specific virtual network and region on a customer’s cloud
provider account. After enabling DataFlow for an environment, its service components run in an
environment.
Parameter group
One default parameter group is auto-created when you create a new draft. You can then add
parameters this one group, you cannot create additional ones. When you initiate a test session, a
number of parameters are auto-generated in this parameter group. Depending on the configuration
options you chose, these are necessary for your NiFi sandbox deployment, and also make it possible
to integrate your NiFi instance with CDP components or external data sources.
This default group is automatically bound to each Process Group you create within your draft,
making all parameters available to be used in any process group.
Processor
The Processor is the NiFi component that is used to listen for incoming data; pull data from
external sources; publish data to external sources; and route, transform, or extract information from
FlowFiles.
For more information, see Apache NiFi User Guide.
Process group
When a data flow becomes complex, it often is beneficial to reason about the data flow at a higher,
more abstract level. NiFi allows multiple components, such as Processors, to be grouped together
into a Process Group.
In CDF Flow Designer, creating a draft automatically creates a process group with the name you
specified when creating the draft, acting as the root process group. You can create child process
groups as you see fit to organize your data processing logic.
For more information, see Apache NiFi User Guide.
Service
A logical environment defined with a specific virtual network and region on a customer’s cloud
provider account. CDF service components run in an environment. In the CDF Service, you can
provision a workload. This workload allows you to create a number of NiFi Deployments.
4