
Question

A. Convert this data model to a database design. Specify tables, primary keys, and foreign keys, using Figure 6-41 as a guide, and specify column properties.

B.  Describe how you have represented weak entities, if any exist.

C.  Describe how you have represented supertype and subtype entities, if any exist.

D. Create a visual representation of your database design as a Crow's Foot E-R diagram similar to the one in Figure 6-37.

E. Document your minimum cardinality enforcement using referential integrity actions for required parents, if any, and the form in Figure 6-28(b) for required children, if any.

Explanation / Answer

The purpose of this article is to define the process for converting a logical data model to a physical data model, especially in a warehouse environment.

Before discussing the specific methods for optimizing a data warehouse data model, let us first review the overall process for developing a logical data model and eventually building it as a physical, optimized database. The following procedures can be successfully used to transform a logical analytical data model into a physical database, such as a data warehouse or a data mart. By following these procedures, one can ensure that the physical design stays faithful to the logical model and to the business requirements behind it.

Overview

Let us start with an overview of the process for creating logical and physical data models. The major steps in the transformation of the model are as follows:

Analysis Phase - Develop a Logical Data Model

This step is obviously the crucial starting point. There is no substitute for business knowledge: any team that does not know the business has no business building the logical data model. The sections that follow explain the key activities of this phase in slightly more detail.

Information Gathering

The most critical task in the development of the data model is requirements gathering. There are three general ways to approach model expansion:

Top-down expansion essentially starts with a high-level model that represents the breadth of data in the DW. This model is rough-cut and coarse, but it captures the major subject areas of data that the DW must support. Top-down expansion starts with this rough-cut model and progressively enlarges it by adding data elements.

Inside-out expansion starts with an initial high-level model and expands it by adding details. For example, we can start with Customer and a few attributes and eventually populate it with many attributes.

Bottom-up expansion proceeds by collecting views or use cases, whose data is then incorporated into the model. A typical view could be as simple as a query, such as: "How much volume did we sell last month (and this year to date) versus a year ago last month (and last year to date)?"
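To make the bottom-up idea concrete, here is a minimal sketch of how that volume question might look as a query (the SALES_FACT table, its columns, and the hardcoded dates are hypothetical, not from the original model):

    -- Hypothetical schema: SALES_FACT(sale_date DATE, volume DECIMAL).
    -- Last month's volume versus the same month a year ago.
    SELECT SUM(CASE WHEN sale_date >= DATE '2024-05-01'
                     AND sale_date <  DATE '2024-06-01'
                    THEN volume ELSE 0 END) AS last_month_volume,
           SUM(CASE WHEN sale_date >= DATE '2023-05-01'
                     AND sale_date <  DATE '2023-06-01'
                    THEN volume ELSE 0 END) AS year_ago_month_volume
    FROM SALES_FACT;

The year-to-date comparisons in the question would follow the same pattern with wider date ranges.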

It is generally best to begin the model in an inside-out fashion and to expand the data model independently of specific views. This encourages creativity and out-of-the-box thinking. Once this is done, it is then appropriate to identify queries and merge their results into the model. A good way is to identify the top 10 queries. (It does not have to be 10; it can be any number: 15 or 20, or even more.) These queries can then be used to cross-check and expand the model.

We have found that when modeling begins by focusing exclusively or heavily on user queries, the result is a model with more limited usage and therefore a more limited life. A model produced this way satisfies those queries but runs into trouble when the kind of query or the granularity of data in the queries changes.

There are several vehicles to use for information gathering:

User interviews work well for getting an overall sense of the business background and for providing detailed feedback. Facilitated sessions are useful for generating a synergistic and creative answer to a problem. These are mostly inside-out (or top-down) methods of development. Examination of current data and queries is a bottom-up method, good for providing details and confirming the model. Direct observation of business people involved in business processes can provide invaluable insights.

It is important to go beyond external observation. In a DW it is also important to understand the work that people do and how they do it, which often goes beyond queries. For example, say a marketing department has been producing a set of 35 queries and reports for over a year. If the model is designed specifically to support those queries, it will break down as the query needs change; we have found this to be true. A typical marketing group changes its reporting and analytical requirements frequently. It is important to consider this wider range of requirements when designing the model. Flexibility is essential.

Scenarios

Queries and reports can be used to verify that the database will support them. From these it is best to identify a set of key user scenarios that can be run against the model. A scenario is a sequence of activities that responds to a major business event; some people call scenarios use cases. A scenario is actually a test case, but a business test case: it consists of two parts, the definition of the business case and a script of values. The purpose of scenarios is to test the correctness and completeness of the model. A major business event is something of importance that happens to the business. Usually, but not always, the event is initiated by some agent outside of the business area. A scenario should be defined for each major business event.

A scenario could be as simple as a sample report from last month, or it could be more complex, such as: "IBM just made a big announcement. Can the model tell us which of our customers own IBM stock, and of those, which should we call?"
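As a sketch of how the stock scenario could be scripted against the model (CUSTOMER and HOLDING are hypothetical tables invented for illustration):

    -- Hypothetical tables: CUSTOMER(cust_id, cust_name, phone),
    --                      HOLDING(cust_id, ticker, shares).
    SELECT c.cust_id, c.cust_name, c.phone
    FROM   CUSTOMER c
           JOIN HOLDING h ON h.cust_id = c.cust_id
    WHERE  h.ticker = 'IBM'
    ORDER BY h.shares DESC;  -- call the largest holders first

Walking sample values through a query like this quickly shows whether the model carries the data (holdings per customer) and the navigation path (customer to holding) that the scenario needs.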

How to Use Scenarios

The simplest way to use scenarios is to walk them through the model manually, which is easiest to do in a facilitated session. The facilitator takes the model and the scenario, and each activity, data element, and sample value in the scenario is compared to the model to ensure that the model is complete and correct. One wants to ensure the data is there and that the model allows proper navigation. Usually, the first three or four scenarios disrupt the model; this trauma is essential for the health of the model.

Using scenarios and queries with actual sample values validates the data model, ensures that the model satisfies user needs, confirms model completeness, and simplifies database changes.

It is important to collect whatever volumetric information you can: the number of occurrences of data, the frequency of query execution, the ratios between tables, and so on. These factors substantially influence the physical design and should be part of the process of gathering business requirements.

Finalization of the Logical Model

Before it is complete, the logical model still needs some important additions, most notably full definitions for its entities and attributes. In the next step, do not neglect to forward engineer these definitions into the physical model.

Design Phase - First Cut Physical Model

Before doing extensive usage analysis, it is possible and even advisable to construct a first-cut physical model. This can be done quickly by applying a standard series of mechanical modifications to the data model; the result of those steps is what is considered the first-cut physical model.

The process for creating the first-cut physical model should be rapid. The point is to get a neutral model up so that real testing can be done.

Forward Engineer Logical Model to a Physical Database Design

Using these scenarios, the entire data model should be reviewed table by table and column by column to confirm what is really required, what could be eliminated, which data elements should be combined, and which should be broken into smaller tables to improve ease of use and reduce query and report response time. For example, it is conceivable to reduce the number of tables from 100 to 60. It is also possible to eliminate unnecessary columns when many of the remaining tables are code tables used only to look up descriptions; consequently, the number of actual fact tables and dimensions can be significantly reduced. Information gathered during this activity is also useful in determining index deployment, load balancing across devices, and optimal data clustering. All optimization changes based on discoveries made during this process should be implemented in the physical model. Any business rule changes or additional columns should be incorporated into the logical model and then forward engineered again.

The forward engineering process from a logical model to a first-cut physical database can be automated using one of many tools, such as Visible Systems' Visible Advantage, CA's Erwin, Sybase's PowerDesigner, Silverrun, and others. It is even possible to develop the logical model in one CASE product and then bridge it into another toolset for physical design, for example, doing the logical model in Erwin or Visible Analyst and the physical design in PowerDesigner or Platinum; different components are more robust or rich in different toolsets. In this case, a product such as the Reichmann Toolbus can be used to convert the logical entity-relationship diagram to the physical format. Staying within one toolset, however, will minimize data movement errors, save time and cost, and reduce the possibility that the logical and physical models will get out of sync.

At this point, the physical modeling tool is used to forward engineer the logical model into the physical design. Initially, it may be best to take most defaults to automate the creation of the first-cut physical design. An exception to this could be the conversion of entity and attribute names to table and column names: many entity and attribute names may exceed the length limits imposed by database management systems, in which case a manual effort will be needed during forward engineering to ensure that standard abbreviations are assigned where necessary. The forward engineering process should convert the names according to a physical naming standard. The process can be accomplished without such tools, but the level of manual intervention becomes much more significant.
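For illustration only (the logical names, abbreviations, and data types below are hypothetical, not from any published standard), the name conversion might look like this:

    -- Logical entity: CUSTOMER ORDER LINE ITEM
    --   (Customer Order Number, Line Item Number, Quantity Ordered)
    -- Physical table, after applying abbreviation standards
    -- such as CUST, ORD, NBR, and QTY:
    CREATE TABLE CUST_ORD_LINE_ITEM (
        cust_ord_nbr  INTEGER      NOT NULL,
        line_item_nbr SMALLINT     NOT NULL,
        qty_ordered   DECIMAL(9,2) NOT NULL,
        PRIMARY KEY (cust_ord_nbr, line_item_nbr)
    );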

Remember to forward engineer all relevant data, including definitions. A data warehouse is useless without these.

Design Data Model Optimization

There are three general ways to optimize a database: through safe trade-offs, through aggressive trade-offs, and through technology choices. Each is discussed below.

Before the database can be generated, these optimizations need to be evaluated for their applicability to the model. The optimizations are actually trade-offs: a trade-off emphasizes one feature, considered an advantage, while compromising another, considered less important. Trade-offs are of two types: safe trade-offs and aggressive trade-offs.

Safe Trade-Offs

Safe trade-offs do not compromise the integrity of the data.
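As one illustration (my example, assuming a hypothetical SALES_FACT table rather than anything from the original list), adding a secondary index is a classic safe trade-off: it spends disk space and load time to buy query speed, and no business rule or data relationship is affected:

    -- Safe trade-off: faster month/product queries at the cost of extra
    -- storage and slower loads; data integrity is untouched.
    CREATE INDEX ix_sales_fact_month_prod
        ON SALES_FACT (sale_month, product_id);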

Aggressive Trade-Offs

Aggressive trade-offs compromise the integrity of the model.
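A minimal sketch of one such modification (hypothetical ORDER_FACT and STATUS_CODE tables, my example): denormalizing a code table's description into a fact table removes a join but duplicates data, which is exactly the kind of integrity compromise meant here:

    -- Before: STATUS_CODE(status_cd, status_desc) is joined to ORDER_FACT
    -- on every query. After: the description is copied into the fact table.
    -- (ADD COLUMN syntax varies slightly by DBMS.)
    ALTER TABLE ORDER_FACT ADD COLUMN status_desc VARCHAR(40);

    UPDATE ORDER_FACT o
    SET    status_desc = (SELECT s.status_desc
                          FROM   STATUS_CODE s
                          WHERE  s.status_cd = o.status_cd);
    -- Integrity risk: status_desc can now drift out of step with
    -- STATUS_CODE whenever a description changes.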

Some other, very aggressive trade-offs are possible, such as the use of hybrid data structures, though in practice these are rare. A hybrid structure is a column whose contents have their own internal, non-relational structure: the table contains an embedded data structure such as an array, usually a variable structure, which the RDBMS sees as a long string column. Exceedingly aggressive trade-offs like hybrid data would require extraordinary justification. Some data mining applications, such as SAS, might take advantage of such a hybrid structure. Often, these structures are not part of the permanent database but are retained separately.

Some form of load or data usage analysis should justify aggressive trade-offs. These types of modifications to the data model represent thoughtful changes that require careful study of the data and its relationships.

Technology Choices and Trade-offs

Here is where choices like the use of indices and referential integrity are addressed.

Once the above optimizations are performed, the next activity is a basic clean-up of the forward-engineered physical design. Some very substantial choices have to be made at this stage, such as alternate keys and indices. This step also includes defining devices, segments in Sybase (or tablespaces in Oracle or DB2), a database, and a server name in the physical design tool. A verification of domain and dimension information (i.e., column size and alphanumeric requirements) should also be performed.
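A hedged sketch of what these choices look like in DDL (Oracle-flavored syntax; the table, column, and tablespace names are hypothetical):

    -- Alternate key, declarative referential integrity, and
    -- tablespace placement decided during physical clean-up.
    CREATE TABLE CUSTOMER (
        cust_id   INTEGER      NOT NULL PRIMARY KEY,
        cust_ssn  CHAR(9)      NOT NULL UNIQUE,  -- alternate key
        cust_name VARCHAR2(60) NOT NULL
    ) TABLESPACE dw_dim_ts;

    CREATE TABLE CUST_ORDER (
        ord_id  INTEGER NOT NULL PRIMARY KEY,
        cust_id INTEGER NOT NULL,
        FOREIGN KEY (cust_id) REFERENCES CUSTOMER (cust_id)
    ) TABLESPACE dw_fact_ts;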

Volume Determination and Database Sizing

Volume determination and database sizing is facilitated by identifying the best source of data knowledge and getting a best estimate of row counts for each table. The numbers can be plugged into a tool, such as Platinum Desktop DBA, that can generate space requirements, entered into a spreadsheet designed to perform the same estimations, or worked out manually. It is important to pad your estimates in case actual row counts exceed what was anticipated; an additional 20 percent when sizing each table is reasonable.
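As a worked example of the padding rule (the numbers are illustrative only): a table with an average row length of 220 bytes and an expected 5,000,000 rows needs

    220 bytes/row x 5,000,000 rows = 1,100,000,000 bytes, roughly 1.1 GB
    1.1 GB x 1.20 (20 percent pad) = about 1.32 GB

before indexes and DBMS storage overhead are added.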

Determine Hardware Requirements

Disk requirements are determined by volume estimates and by system software disk utilization requirements. Aspects such as the number of processors and the amount of memory are best determined by the hardware and database vendors, based on information gathered in earlier steps combined with detailed user information (i.e., number of users, queries, reports, etc.). Once the hardware requirements are determined, the hardware and system software can be ordered, installed, and configured. For example, in one situation an IBM UDB database could be installed on a parallel platform, while in another a SQL Server database could be deployed on a Sun server; the rules above remain applicable on most platforms.

Generate the Physical Database

After all the previous information has been gathered and entered into the physical design tool, a physical database definition (DDL for DB2, a schema for Sybase or Oracle) can be generated. This must be reviewed carefully and, in all likelihood, modified to some degree, since no physical design tool generates 100 percent perfect database definitions. The script can then be run against the database management system to define the physical environment.
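As an illustration of the kind of manual touch-up usually needed (a hypothetical example, not tied to any particular tool), a generator might emit a loose default that the designer tightens by hand:

    -- Tool-generated default:
    --   prod_desc VARCHAR(255)          -- nullable, oversized
    -- After manual review, sized to the real domain and made mandatory:
    CREATE TABLE PRODUCT (
        prod_id   INTEGER     NOT NULL PRIMARY KEY,
        prod_desc VARCHAR(80) NOT NULL
    );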

Subsequent Steps

While the completion of the above step signifies the end of the logical-model-to-physical-database transformation, delivering a production database includes several additional activities that could result in modifications. Issues like load file content, new query requirements, additional data requirements, or new user requirements can necessitate database modifications. It is recommended that all database changes be implemented through the physical design tool, as this provides a central, authoritative source for the physical database environment and allows the generation of scripts that can recreate the database server if necessary.
