


Question

For our discussion, we will cover ETL jobs. This discussion is intended to include practical scenarios that are faced in the workplace together with some general guidelines that make scheduling and completing ETL jobs more effective. In your discussion, be sure to address the following points:

• Get organized with ETL job scheduling.

How should priority be given when multiple jobs are scheduled during the same timeframe?

• How should ETL job output be validated?

• How can ETL jobs be optimized for more efficient processing?

Explanation / Answer

Hi,

Please find the answers below:

How should priority be given when multiple jobs are scheduled during the same timeframe?

ETL jobs are typically executed in nested batches. A batch is a group of ETL jobs that run together as a single operation; jobs are grouped and batched to load a single data mart, a technique known as nested batching. You should define dependency rules between ETL jobs and enforce them during job execution. Your batch management tool should be able to set and resolve such dependencies on a batch-by-batch basis according to the business rules. There are many ways to schedule ETL jobs: vendor scheduler tools, custom scripts run via cron jobs, OS schedulers, and so on.
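As a minimal sketch of dependency-driven ordering (the job names and dependencies here are hypothetical), Python's standard `graphlib` module can resolve a safe run order for a batch:

```python
from graphlib import TopologicalSorter

# Hypothetical batch: each job maps to the set of jobs it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "conform_dimensions": {"extract_customers"},
    "load_sales_mart": {"extract_orders", "conform_dimensions"},
}

# static_order() yields a run order that never starts a job
# before all of its dependencies have completed.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

A real batch manager would also weigh job priorities and run independent jobs concurrently; `TopologicalSorter` supports that incremental style via its `prepare()`/`get_ready()`/`done()` methods.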

How should ETL job output be validated?

There are two simple steps to validate the output of an ETL job.

1. Confirm whether the ETL job ran successfully.

How you do this depends on how you run and schedule your ETL jobs: check the vendor-specific tools, script exit codes, or logs for job success or failure.
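For jobs launched from scripts, the exit code is the simplest success signal. A minimal sketch, where the command shown is just a stand-in for a real ETL job invocation:

```python
import subprocess
import sys

def job_succeeded(command):
    """Run a job command and report success via its exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    # A non-zero exit code means failure; in practice you would
    # also log result.stderr and raise an alert here.
    return result.returncode == 0

# A trivial command stands in for an actual ETL job.
ok = job_succeeded([sys.executable, "-c", "print('load complete')"])
```

The same pattern applies when a vendor tool is invoked from the command line: trust the exit code first, then inspect the tool's logs for detail.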

2. Validate the job execution by verifying the data.

The usual data flow is: Extract, Clean, Conform, and Deliver. ETL testing and automation usually confirm data quality and verify it, most often by comparing the source and target databases for correctness against the business rules defined in the ETL and data quality policies.
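A common lightweight form of this comparison is checking row counts and column aggregates between source and target. A sketch using in-memory SQLite databases as stand-ins for the real systems (table and column names are illustrative):

```python
import sqlite3

# Hypothetical source and target databases with identical data,
# as you would expect after a correct load.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.99), (2, 25.00)])

def row_count(db, table):
    return db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

def column_total(db, table, column):
    return db.execute(f"SELECT SUM({column}) FROM {table}").fetchone()[0]

counts_match = row_count(source, "sales") == row_count(target, "sales")
totals_match = (column_total(source, "sales", "amount")
                == column_total(target, "sales", "amount"))
```

Counts and totals catch gross load failures cheaply; row-by-row or checksum comparisons can then be reserved for deeper audits.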

How can ETL jobs be optimized for more efficient processing?

There are several ways to optimize ETL job processing, and doing so is core to achieving efficient processing. Some key points are:

• Use filters: extract only what you need from source files, and process only what you need.

• Choose an optimal sort algorithm when sorted records need to be fed downstream.

• Use database bulk loader utilities to speed up data inserts (look at the database-specific loader utility).

• Run independent activities in parallel.

• Tune database performance: reduce I/O, drop unwanted constraints during loads, etc.
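Two of the points above, filtering at the source and bulk loading, can be sketched with SQLite (the table, column, and region values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, region TEXT, amount REAL)")

# 1000 hypothetical source rows, alternating between two regions.
rows = [(i, "EU" if i % 2 else "US", float(i)) for i in range(1000)]

# Bulk insert: one executemany call instead of 1000 single-row inserts.
conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)

# Filter at the source: push the predicate into SQL so only the
# rows the job actually needs are extracted.
eu_rows = conn.execute(
    "SELECT id, amount FROM staging WHERE region = 'EU'"
).fetchall()
```

For very large volumes, the same idea scales up to the database's native loader (e.g. a vendor-specific bulk load utility) rather than driver-level inserts.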