Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

1. What do you think is the basis for the organizational design of the original

ID: 3809122 • Letter: 1

Question

1. What do you think is the basis for the organizational design of the original traditional transaction processing database?

2. What would be the basis for the data subsets that were derived from the transaction database?

3. What is the difference between a programmed query and an ad hoc query?

4. Why is the ETL (Extract, Transfer, and Load) process which is used to take data from one or more sources (primarily transaction processing systems) and move it to a data warehouse such a critical step in setting up and refreshing a data warehouse?

5. What is the difference between a data warehouse and a data mart?

Explanation / Answer

Answer 1: Transaction processing is an activity in which a transaction is either completed or failed. It cannot be half done or partially complete. Example - If a user is transferring money into someone's account, either the money will be transferred successfully or the transaction will fail and the amount will be refunded to the user. However, if this scenario fails there can be a major loss to the user as well as bank.
So this technology is responsible for maintaining a balance between both the parties involved in transaction.

The basis for the organizational design of the original traditional transaction processing database is ACID property:
1. Atomicity - The change of state in a transaction is atomic. The state of the transaction is defined either before the start or after the completion of transaction.
2. Consistency - The completed transaction should not have an inappropriate effect on the database. i.e. the database should remain in consistent state after every transaction.
3. Durability - The database should not be adversly effected if the database is updated or if the system fails/restarts. It should be durable enough to hold the modified data. If a transaction commits but the system fails before the data could be written on to the disk, then that data will be updated once the system springs back into action.
4. Isolation - When two or more transactions run simultaneously in a database, then the property of isolation comes into picture. It states that no transaction will effect any other transaction in the system.

Answer 2: The data subsets derieved from the transaction database is based on the performance, reliability and consistency of the real time processing. The following 4 properties show the process of transactions -

1. Atomicity - The change of state in a transaction is atomic. The state of the transaction is defined either before the start or after the completion of transaction.
2. Consistency - The completed transaction should not have an inappropriate effect on the database. i.e. the database should remain in consistent state after every transaction.
3. Durability - The database should not be adversly effected if the database is updated or if the system fails/restarts. It should be durable enough to hold the modified data. If a transaction commits but the system fails before the data could be written on to the disk, then that data will be updated once the system springs back into action.
4. Isolation - When two or more transactions run simultaneously in a database, then the property of isolation comes into picture. It states that no transaction will effect any other transaction in the system.

Answer 3: Programmed query - It is a type of query programmed to retrieve information from the database by performing various calculations. Query language is the interface by which such manipulations are specified. These queries are a set of commands that support database searching and gives the result to the user in a tabular form.
Ad Hoc query - It is a type of query which is not determined prior to the moment the query is issued. It is created when a need arises to get information. It is basically designed to answer very specific problems. An ad hoc query does not reside in the computer or the database manager but is dynamically created depending on the needs of the data user.

Answer 4: The process of ETL (Extract, Transfer, and Load) which is used to take data from one or more sources (primarily transaction processing systems) and move it to a data warehouse is such a critical step in setting up and refreshing a data warehouse because it involves number of challenges to be faced in real time system. Almost all ETL tools and systems, based on both off-the-shelf products or custom-coded, operate in a batch mode assuming that the data becomes available on a definite schedule be it weekly or monthly as an extract file. The the data is transformed and cleansing of data is done and finally the data is loaded into the data warehouse.
This whole process involves a downtime of the data warehouse so that none of the users is able to access it while the load takes place. This is not really inconvinient to users as the loads are usually performed at night time.
When loading data continuously in real-time, there can't be any system downtime. The heaviest periods in terms of data warehouse usage may very well coincide with the peak periods of incoming data. The requirements for continuous updates with no warehouse downtime are generally inconsistent with traditional ETL tools and systems. Fortunately, there are new tools on the market that specialize in real-time ETL and data loading. There are also ways of modifying existing ETL systems to perform real-time or near real-time warehouse loading.

Answer 5: Data Warehouse - It is a main component of business intelligence which is used for reporting and data analysis. Data warehouse is basically a central repository of integrated data from one or more disparate sources. The content and all the historical data is stored in one place and are used for creating analytical reports for knowledge workers throughout the enterprise.
Data Mart - It is basically the subset of data warehouse i.e. the access point of data warehouse to the users. It contains the information related to single department. This alows each department to isolate, manipulate and develop their own data.

Basic points of difference between data warehouse and data mart are:
Data Warehouse -
1. Holds multiple subject areas
2. Holds very detailed information
3. Works to integrate all data sources
4. Does not necessarily use a dimensional model but feeds dimensional models.

Data Mart -
1. Often holds only one subject area- for example, Finance, or Sales
2. May hold more summarised data (although many hold full detail)
3. Concentrates on integrating information from a given subject area or set of source systems
4. Is built focused on a dimensional model using a star schema.