1. What is Azure Blob Storage?
Azure Blob Storage is a scalable cloud storage service designed to store unstructured data such as text, binary data, images, and files. It is commonly used for data storage, backup, and archiving.
2. What is Azure Data Lake Storage (ADLS)?
Azure Data Lake Storage (ADLS) Gen2 is a data storage service optimized for big data analytics. It provides hierarchical namespace capabilities on top of Azure Blob Storage, making it suitable for large-scale data processing and management.
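Because ADLS Gen2 sits on top of Blob Storage, the same storage account can be addressed through two endpoints. The sketch below builds both address forms for the public Azure cloud. The account, container, and file names are hypothetical placeholders, and the helper functions are illustrative only, not part of any Azure SDK.

```python
def blob_url(account: str, container: str, blob_name: str) -> str:
    """HTTPS URL of a blob on the Blob Storage endpoint."""
    return f"https://{account}.blob.core.windows.net/{container}/{blob_name}"


def adls_uri(account: str, filesystem: str, path: str) -> str:
    """abfss:// URI for the same data on the ADLS Gen2 (dfs) endpoint."""
    return f"abfss://{filesystem}@{account}.dfs.core.windows.net/{path}"


# Hypothetical account and container names, for illustration only.
print(blob_url("mystorageacct", "raw", "hr/employees.csv"))
print(adls_uri("mystorageacct", "raw", "hr/employees.csv"))
```

The abfss form is the one analytics engines such as Spark typically use when reading from ADLS Gen2.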
3. What is the Difference Between Blob Storage & ADLS?
- Blob Storage:
- Designed for storing and managing unstructured data.
- Uses a flat namespace: folders are only simulated through "/" characters in blob names, so there are no true directories.
- Ideal for general-purpose object storage such as backups, media files, and application data.
- ADLS:
- Built for big data analytics and hierarchical file organization.
- Supports hierarchical namespaces, enabling directory and file-level operations similar to a file system.
- Suitable for scenarios requiring complex data organization and large-scale analytics.
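The practical effect of the namespace difference shows up when a folder is renamed. The sketch below is a toy in-memory model, not Azure SDK code: the flat store is a dictionary keyed by full blob name, and the hierarchical store is a nested dictionary. All function and folder names are hypothetical.

```python
from typing import Dict


def rename_prefix_flat(blobs: Dict[str, bytes], old: str, new: str) -> int:
    """Flat namespace: every blob under the prefix must be rewritten.

    Returns the number of blobs touched.
    """
    touched = 0
    # Snapshot the matching names first, since the dict is mutated below.
    for name in [n for n in blobs if n.startswith(old + "/")]:
        blobs[new + name[len(old):]] = blobs.pop(name)
        touched += 1
    return touched


def rename_dir_hierarchical(root: dict, old: str, new: str) -> int:
    """Hierarchical namespace: one directory entry is re-pointed.

    Returns the number of entries touched, which is always 1 here.
    """
    root[new] = root.pop(old)
    return 1


flat = {
    "logs/2024/a.csv": b"1",
    "logs/2024/b.csv": b"2",
    "data/c.csv": b"3",
}
tree = {
    "logs": {"2024": {"a.csv": b"1", "b.csv": b"2"}},
    "data": {"c.csv": b"3"},
}

print(rename_prefix_flat(flat, "logs", "archive"))       # one operation per blob
print(rename_dir_hierarchical(tree, "logs", "archive"))  # a single operation
```

In the flat model the cost grows with the number of blobs under the prefix, while in the hierarchical model it stays constant. This is why directory-level operations suit ADLS better for large analytics datasets.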
4. What is Azure SQL Database?
Azure SQL Database is a fully managed relational database service built on Microsoft SQL Server technology. It provides high availability, scalability, and security, making it suitable for hosting a wide variety of applications, including web, mobile, and enterprise solutions.
5. What is Azure Data Factory (ADF)?
Azure Data Factory is a cloud-based data integration service that allows you to create, schedule, and manage data pipelines. It enables the movement and transformation of data across different data stores and processing systems.
6. What is Copy Activity in ADF?
Copy Activity in Azure Data Factory copies data from a source data store to a destination (sink) data store. It is the primary activity for moving data between data stores, for example from Blob Storage to ADLS Gen2 or to a SQL database.
7. What are the Different Components of ADF Copy Activity?
- Linked Services:
- Define connections to data sources and destinations.
- Example: One linked service connects to Azure Blob Storage (source) and another connects to ADLS Gen2 (sink).
- Datasets:
- Represent the structure of the data in the source or sink, including file paths, table names, or query parameters.
- Example: The source dataset points to employees.csv in Blob Storage, and the sink dataset points to the output folder in ADLS.
- Pipeline:
- A logical grouping of activities that define a data workflow.
- Example: A pipeline containing a Copy Data activity to move data from Blob Storage to ADLS.
- Copy Activity:
- The core activity within a pipeline that performs the data movement.
- Example: The Copy Data activity reads employees.csv from Blob Storage and writes it to ADLS.
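The four components above reference each other by name. The sketch below writes them as Python dictionaries that mirror the JSON definitions ADF stores, then checks that every reference resolves. All resource names (BlobSourceLS, AdlsSinkLS, EmployeesCsv, OutputFolder, CopyEmployeesPipeline) are hypothetical, the angle-bracket values are placeholders, and the helper unresolved_references is illustrative only. The type names should be verified against the current connector documentation before use.

```python
import json

# Linked services: connections to the source and sink stores (hypothetical names).
blob_linked_service = {
    "name": "BlobSourceLS",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {"connectionString": "<blob-connection-string>"},
    },
}
adls_linked_service = {
    "name": "AdlsSinkLS",
    "properties": {
        "type": "AzureBlobFS",  # connector type used for ADLS Gen2
        "typeProperties": {"url": "https://<account>.dfs.core.windows.net"},
    },
}

# Datasets: where the data lives inside each store.
source_dataset = {
    "name": "EmployeesCsv",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {"referenceName": "BlobSourceLS", "type": "LinkedServiceReference"},
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "<container>",
                "fileName": "employees.csv",
            }
        },
    },
}
sink_dataset = {
    "name": "OutputFolder",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {"referenceName": "AdlsSinkLS", "type": "LinkedServiceReference"},
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "<filesystem>",
                "folderPath": "output",
            }
        },
    },
}

# Pipeline containing a single Copy activity.
pipeline = {
    "name": "CopyEmployeesPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyEmployees",
                "type": "Copy",
                "inputs": [{"referenceName": "EmployeesCsv", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "OutputFolder", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "DelimitedTextSink"},
                },
            }
        ]
    },
}


def unresolved_references(linked_services, datasets, pipeline):
    """Return the names that are referenced but not defined (illustrative helper)."""
    ls_names = {ls["name"] for ls in linked_services}
    ds_names = {ds["name"] for ds in datasets}
    missing = []
    for ds in datasets:
        ref = ds["properties"]["linkedServiceName"]["referenceName"]
        if ref not in ls_names:
            missing.append(ref)
    for activity in pipeline["properties"]["activities"]:
        for ref in activity.get("inputs", []) + activity.get("outputs", []):
            if ref["referenceName"] not in ds_names:
                missing.append(ref["referenceName"])
    return missing


print(json.dumps(pipeline, indent=2))
print(unresolved_references(
    [blob_linked_service, adls_linked_service],
    [source_dataset, sink_dataset],
    pipeline,
))
```

The chain reads from the bottom up: the Copy activity names its input and output datasets, and each dataset names the linked service that holds the connection details.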