Q. Regardless of the general acceptance of dimensional modeling, some mispercept
ID: 3610547 • Letter: Q
Question
Q. Regardless of the general acceptance of dimensional modeling, some misperceptions continue to be disseminated in the industry. Briefly explain each of the following misperceptions:
Marks
I. Dimensional models and data marts are for summary data only.
II. Dimensional models and data marts are departmental, not enterprise, solutions.
III. Dimensional models and data marts are not scalable.
IV. Dimensional models and data marts are only appropriate when there is a predictable usage pattern.
V. Dimensional models and data marts can't be integrated and therefore lead to stovepipe solutions.
Explanation / Answer
Dimensional models and data marts are for summary data only. This myth is the root cause of many ill-designed dimensional models. Because we can't possibly predict all the questions asked by business users, we need to provide them with queryable access to the most detailed data so that they can roll it up based on the business question at hand. Data at the lowest level of detail is practically impervious to surprises or changes. Our data marts also will include commonly requested summarized data in dimensional schemas. This summary data should complement the granular detail solely to provide improved performance for common queries, but not attempt to serve as a replacement for the details.
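To make the roll-up point concrete, here is a minimal sketch in Python. The fact rows, the column names (`date`, `store`, `product`, `amount`), and the `roll_up` helper are all hypothetical and chosen only for illustration; a real data mart would hold these rows in a fact table and roll them up with SQL.

```python
from collections import defaultdict

# Hypothetical atomic fact rows: one row per product, per store, per day.
sales_facts = [
    {"date": "2024-01-01", "store": "S1", "product": "P1", "amount": 10.0},
    {"date": "2024-01-01", "store": "S2", "product": "P1", "amount": 5.0},
    {"date": "2024-01-02", "store": "S1", "product": "P2", "amount": 7.5},
    {"date": "2024-01-02", "store": "S1", "product": "P1", "amount": 2.5},
]


def roll_up(facts, *keys):
    """Sum `amount` grouped by any combination of dimension columns."""
    totals = defaultdict(float)
    for row in facts:
        group = tuple(row[k] for k in keys)
        totals[group] += row["amount"]
    return dict(totals)


# The same detailed rows answer questions nobody planned for in advance.
by_store = roll_up(sales_facts, "store")
by_date_product = roll_up(sales_facts, "date", "product")
```

Because the rows are kept at the lowest grain, any grouping can be derived on demand. A table stored only as totals by store could not produce the date-by-product view.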
A related corollary to this myth is that only a limited amount of historical data should be stored in dimensional structures. There is nothing about a dimensional model that prohibits the storage of substantial history. The amount of history available in data marts must be driven by the business's requirements.
Dimensional models and data marts are departmental, not enterprise, solutions. Rather than drawing boundaries based on organizational departments, we maintain that data marts should be organized around business processes, such as orders, invoices, and service calls. Multiple business functions often want to analyze the same metrics resulting from a single business process. We strive to avoid duplicating the core measurements in multiple databases around the organization.
Supporters of the normalized data warehouse approach sometimes draw spiderweb diagrams with multiple extracts from the same source feeding into multiple data marts. The illustration supposedly depicts the perils of proceeding without a normalized data warehouse to feed the data marts. These supporters caution about increased costs and potential inconsistencies as changes in the source system of record would need to be rippled to each mart's ETL process.
This argument falls apart because no one advocates multiple extracts from the same source. The spiderweb diagrams fail to appreciate that the data marts are process-centric, not department-centric, and that the data is extracted once from the operational source and presented in a single place. Clearly, the operational system support folks would frown on the multiple-extract approach. So do we.
Dimensional models and data marts are not scalable. Modern fact tables have many billions of rows in them. The dimensional models within our data marts are extremely scalable. Relational DBMS vendors have embraced data warehousing and incorporated numerous capabilities into their products to optimize the scalability and performance of dimensional models.
A corollary to this myth is that dimensional models are only appropriate for retail or sales data. This notion is rooted in the historical origins of dimensional modeling but not in its current-day reality. Dimensional modeling has been applied to virtually every industry, including banking, insurance, brokerage, telephone, newspaper, oil and gas, government, manufacturing, travel, gaming, health care, education, and many more. In this book we use the retail industry to illustrate several early concepts mainly because it is an industry to which we have all been exposed; however, these concepts are extremely transferable to other businesses.
Dimensional models and data marts are only appropriate when there is a predictable usage pattern. A related corollary is that dimensional models aren't responsive to changing business needs. On the contrary, because of their symmetry, the dimensional structures in our data marts are extremely flexible and adaptive to change. The secret to query flexibility is building the fact tables at the most granular level. In our opinion, the source of this myth is the designer struggling with fact tables that have been prematurely aggregated based on the designer's unfortunate belief in myth 1 regarding summary data. Dimensional models that only deliver summary data are bound to be problematic. Users run into analytic brick walls when they try to drill down into details not available in the summary tables. Developers also run into brick walls because they can't easily accommodate new dimensions, attributes, or facts with these prematurely summarized tables. The correct starting point for your dimensional models is to express data at the lowest detail possible for maximum flexibility and extensibility.
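The extensibility claim can be illustrated with a small Python sketch. It assumes a hypothetical new `promotion` dimension added to granular rows after the fact, with historical rows given a "No promotion" placeholder value; the data, column names, and `total_by` helper are made up for illustration and are not taken from the answer above.

```python
# Hypothetical atomic fact rows loaded before anyone asked about promotions.
atomic_facts = [
    {"date": "2024-01-01", "product": "P1", "amount": 10.0},
    {"date": "2024-01-02", "product": "P1", "amount": 5.0},
]


def total_by(facts, key):
    """Sum `amount` grouped by a single dimension column."""
    totals = {}
    for row in facts:
        totals[row[key]] = totals.get(row[key], 0.0) + row["amount"]
    return totals


totals_before = total_by(atomic_facts, "product")

# A new dimension arrives: existing rows get a placeholder value and
# newly loaded rows carry the real value. The grain does not change.
extended_facts = [dict(row, promotion="No promotion") for row in atomic_facts]
extended_facts.append(
    {"date": "2024-01-03", "product": "P1", "amount": 4.0, "promotion": "Coupon"}
)

# The old query still runs unchanged, and the new dimension is queryable.
totals_by_product = total_by(extended_facts, "product")
totals_by_promotion = total_by(extended_facts, "promotion")
```

With a table that had been pre-summarized by product alone, there would be no detailed rows to attach the new `promotion` value to, which is the "brick wall" described above.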
Dimensional models and data marts can't be integrated and therefore lead to stovepipe solutions. Dimensional models and data marts most certainly can be integrated if they conform to the data warehouse bus architecture. Presentation area databases that don't adhere to the data warehouse bus architecture will lead to standalone solutions. You can't hold dimensional modeling responsible for the failure of some organizations to embrace one of its fundamental tenets.
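As a rough sketch of what integration through shared dimensions can look like, the Python example below combines two process-centric fact tables through one common product dimension. Everything here is hypothetical: the table contents, the `product_key` column, and the `total_by_product` helper are assumptions made for illustration, and in practice this would be two SQL queries whose results are merged on the shared dimension attribute.

```python
# Hypothetical product dimension shared by both process marts.
product_dim = {1: "Widget", 2: "Gadget"}

# Two fact tables from different business processes, using the same keys.
orders_fact = [
    {"product_key": 1, "order_amount": 100.0},
    {"product_key": 2, "order_amount": 40.0},
    {"product_key": 1, "order_amount": 60.0},
]
service_calls_fact = [
    {"product_key": 1, "calls": 2},
    {"product_key": 1, "calls": 1},
    {"product_key": 2, "calls": 4},
]


def total_by_product(facts, measure):
    """Sum one measure per product name, resolved via the shared dimension."""
    totals = {}
    for row in facts:
        name = product_dim[row["product_key"]]
        totals[name] = totals.get(name, 0) + row[measure]
    return totals


orders = total_by_product(orders_fact, "order_amount")
calls = total_by_product(service_calls_fact, "calls")

# Merge the two result sets on the shared product name.
combined = {
    name: {"order_amount": orders.get(name, 0), "calls": calls.get(name, 0)}
    for name in sorted(set(orders) | set(calls))
}
```

The merge only works because both fact tables use the same product keys and names. If each mart defined its own product list, the results could not be lined up, which is the stovepipe outcome the answer attributes to ignoring the bus architecture.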