Dataset Parameterisation with Lookup & ForEach

1. Dataset Parameterisation:

  • Definition: Dataset parameterization in Azure Data Factory allows you to create flexible datasets by defining parameters for specific properties like file names, folder paths, or table names.
  • Purpose: It enables dynamic data movement without hardcoding values, making your pipelines reusable and adaptable to different data scenarios.
  • Example:
    • You have a dataset pointing to a file in Blob Storage. Instead of hardcoding the file name, you define a dataset parameter FileName and reference it in the dataset's file path with the expression @dataset().FileName. At runtime, the pipeline passes a different value, such as emp.csv or orders.csv, each time it uses the dataset.
  • Use Case: Copying multiple files using the same pipeline by changing the file name parameter for each run.
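
As a rough illustration of the idea above, the following Python sketch builds the kind of JSON definition a parameterised dataset has. It is a minimal sketch, not a definitive implementation: the dataset name BlobCsvDataset, the linked service name BlobStorageLS, the container name input, and the helper make_parameterised_dataset are all hypothetical names chosen for this example. Only the FileName parameter and the emp.csv / orders.csv file names come from the text.

```python
import json


def make_parameterised_dataset(name, linked_service, container):
    """Build an ADF-style dataset definition whose file name is a parameter.

    All names passed in are illustrative; nothing here calls Azure.
    """
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": linked_service,  # hypothetical linked service
                "type": "LinkedServiceReference",
            },
            # Declares the parameter that the pipeline supplies at runtime.
            "parameters": {"FileName": {"type": "string"}},
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,  # hypothetical container
                    # Resolved on each run from the value passed to FileName.
                    "fileName": {
                        "value": "@dataset().FileName",
                        "type": "Expression",
                    },
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
            },
        },
    }


dataset = make_parameterised_dataset("BlobCsvDataset", "BlobStorageLS", "input")
print(json.dumps(dataset, indent=2))
```

The same definition can then serve emp.csv on one run and orders.csv on the next, because the file name is never stored in the dataset itself.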

2. Lookup Activity:

  • Definition: The Lookup activity in Azure Data Factory retrieves data from a source (e.g., a table or a file) and stores the result for use in subsequent activities.
  • Purpose: It is used to fetch configuration data, such as a list of files to be processed, parameter values, or control data for conditional execution in a pipeline.
  • Example:
    • Reading a JSON file or a database table that contains a list of file names (emp.csv, orders.csv) to be processed. The output of this activity is then passed to a ForEach activity.
  • Use Case: Fetching a list of files to be copied or processing a set of records in a controlled manner.
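
The Lookup step can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the activity name LookupFileList, the dataset name FileListJson, and the control-file column FileName are hypothetical, and simulate_lookup_output only mimics the shape of a Lookup result (a count plus a value array, which is what the activity returns when firstRowOnly is false). It does not call Azure.

```python
import json

LOOKUP_NAME = "LookupFileList"  # hypothetical activity name

# ADF-style definition of a Lookup that returns every row, not just the first.
lookup_activity = {
    "name": LOOKUP_NAME,
    "type": "Lookup",
    "typeProperties": {
        "source": {"type": "JsonSource"},
        "dataset": {
            "referenceName": "FileListJson",  # hypothetical control-file dataset
            "type": "DatasetReference",
        },
        "firstRowOnly": False,
    },
}


def simulate_lookup_output(rows):
    """Mimic the output shape of a Lookup with firstRowOnly set to false."""
    return {"count": len(rows), "value": list(rows)}


# Rows as they might appear in the control file (file names from the text).
control_rows = [{"FileName": "emp.csv"}, {"FileName": "orders.csv"}]
output = simulate_lookup_output(control_rows)
print(json.dumps(output, indent=2))
```

A downstream activity would read the array with the expression @activity('LookupFileList').output.value.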

3. ForEach Activity:

  • Definition: The ForEach activity in Azure Data Factory iterates over a collection of items (e.g., files, rows) and executes specified activities for each item in the collection.
  • Purpose: It automates repetitive tasks, such as processing multiple files or executing a set of activities for each item in a list.
  • Example:
    • Iterating over a list of file names obtained from the Lookup activity and copying each file from Blob Storage to ADLS using a Copy Data activity inside the ForEach loop.
  • Use Case: Copying or processing multiple files dynamically without hardcoding each file’s configuration.
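
The loop described above might be defined roughly as in the Python sketch below, which builds an ADF-style ForEach definition containing one Copy activity. The names LookupFileList, BlobCsvDataset, AdlsCsvDataset, ForEachFile and CopyOneFile, and the helpers dataset_ref and make_foreach, are hypothetical. The sketch also assumes that both datasets expose a FileName parameter and that the Lookup rows carry a FileName column.

```python
import json


def dataset_ref(name):
    """Reference a parameterised dataset, passing the current item's file name."""
    return {
        "referenceName": name,
        "type": "DatasetReference",
        "parameters": {
            # @item() is the element of the collection being processed.
            "FileName": {"value": "@item().FileName", "type": "Expression"},
        },
    }


def make_foreach(lookup_name, source_dataset, sink_dataset):
    """Build a ForEach that copies one file per row returned by the Lookup."""
    copy_activity = {
        "name": "CopyOneFile",
        "type": "Copy",
        "inputs": [dataset_ref(source_dataset)],
        "outputs": [dataset_ref(sink_dataset)],
        "typeProperties": {
            "source": {"type": "DelimitedTextSource"},
            "sink": {"type": "DelimitedTextSink"},
        },
    }
    return {
        "name": "ForEachFile",
        "type": "ForEach",
        # Run only after the Lookup has succeeded.
        "dependsOn": [
            {"activity": lookup_name, "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            # The collection to iterate: the array returned by the Lookup.
            "items": {
                "value": f"@activity('{lookup_name}').output.value",
                "type": "Expression",
            },
            "isSequential": True,  # process one file at a time
            "activities": [copy_activity],
        },
    }


foreach_activity = make_foreach("LookupFileList", "BlobCsvDataset", "AdlsCsvDataset")
print(json.dumps(foreach_activity, indent=2))
```

Setting isSequential to false would let the service copy several files in parallel; sequential execution is used here only to keep the example easy to follow.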

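These three components work together to make a pipeline dynamic: the Lookup activity supplies the list of files, the ForEach activity iterates over that list, and the parameterised dataset resolves each file name at runtime. The result is a single pipeline that can handle a changing set of files without any hardcoded values.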

4. Key Differences: Parameterised Dataset vs. Lookup and ForEach

| Aspect | Parameterised Dataset | Lookup and ForEach |
| --- | --- | --- |
| Control | Parameters are manually set before execution. | Automatically reads and processes from a list. |
| Scalability | Limited to pre-defined files or folders. | Scales to handle unknown or dynamic file lists. |
| Configuration Complexity | Simpler, fewer activities needed. | More complex, involves multiple activities. |
| Use Case | Static, known files/folders. | Dynamic, unknown or changing files/folders. |
| Example | Copy specific files like emp.csv, orders.csv. | Copy all files listed in a control file. |

vamsi manyam