Create pipeline component

Building data pipelines often involves merging outputs from multiple data sources. The data from these sources may, however, require separate merging or transformation operations, necessitating their own data pipelines. To enable us to create complex data pipelines involving the combination of smaller data pipelines, we use pipeline components.

Pipeline components allow us to create sub-pipelines that can be used to create a main data pipeline. For instance, imagine that a procure-to-pay process has the transaction history for two months in separate files. We can create a sub-pipeline (called pipeline component) that appends the two files. To enrich the resulting transaction history with some purchase header, we can create a main data pipeline that merges the pipeline component with the purchase header. The figure below visually depicts the data pipeline operation.

Releas019

To perform such an operation, we begin by creating the pipeline component. From Portal, click Data > Create pipeline.

Releas020

Click Create pipeline component.

Releas021

The Data pipeline creation window appears. We can extract the history tables from any of the available sources. In our case, we use File upload. Review the data and click Import.

Releas022

The tables appear in the extract tab.

Releas023

In the Merge step, we can use an append join.

Releas024

In the Transformation step, add the desired columns.

Releas025

We can perform any further transformation steps to update the columns further. In our case, we leave as is. In the Load step, save and exit the pipeline component.

Releas026

Note

While a data pipeline must have the Case ID, Activity, and End timestamp columns to be successfully imported, a pipeline component may not have these field names since it is not a complete event log.

Enter a pipeline component name and click Create.

Releas027

As a next step, we create a data pipeline that enriches the pipeline component with a header table. From Portal, click Data > Create pipeline.

Releas028

Click Create data pipeline.

Releas029

Since the combined history table is the pipeline component, click Pipeline Component to extract it.

Releas030

Select the pipeline component and click Extract.

Releas031

Next, extract the header table through file upload or other means. Both tables appear in the extract tab.

Releas032

Note

When the schema of an existing pipeline component changes, a refresh button appears in the Extract step of other pipelines utilizing the pipeline component. When we click this button, the pipeline adopts the latest pipeline component.

Releas052

We can then merge the tables using a left join.

Releas033

Perform any necessary transformation steps in the Transform tab. In our case, we add the columns.

Releas034

We can then schedule the pipeline or upload the log to our Apromore Workspace.

Releas035

We can create a more complex data pipeline that combines multiple pipeline components. For instance, a first pipeline component (PC1) that appends multiple history tables, and a second pipeline component (PC2) joins multiple header tables. The main data pipeline then joins the output of PC1 and PC2. Note that to join these pipeline components, each pipeline component must have the same joining key.

The figure below describes this process visually.

Releas036

We begin by creating PC1 — appending multiple history tables. This is the same step as the previous example. Next, we create PC2 — appending multiple header tables. From Portal, click Data > Create pipeline. Click Create pipeline component.

Releas037

Extract the header tables.

Releas038

Merge both tables using append join.

Releas039

We can perform any transformation we deem necessary. In our case, we add all the columns from one of the tables.

Releas040

To save the pipeline component, click the save and exit icon Releas041.

Releas042

We can call the pipeline component, headers pipeline component. Click Create.

Releas043

Now we have both pipeline components, “History tables pipeline component” and “Headers pipeline component”, we can create a main data pipeline that combines them.

From Portal, click Data > Create pipeline. Click Create data pipeline.

Releas044

In the Extract step, click Pipeline component.

Releas045

This displays the saved pipeline components. Select the two pipeline components we wish to create this data pipeline off and click Extract.

Releas046

Once extracted successfully, the data components appear in the Extract step.

Releas047

In the Merge step, we perform a left join from the history table pipeline component (PC1).

Releas048

In the Transform step, we can add the relevant columns and perform any relevant transformation.

Releas049

In the Load step, we now must specify the Case ID, Activity, and End timestamp fields since this is a complete event log. Click Schedule Pipeline to schedule the pipeline at a regular interval. Alternatively, we can click Upload log to upload on our Apromore Workspace as a one-off event.

Releas050

Enter a name and click Upload.

Releas051

The log appears in the specified folder.