Question
From textbook: Patterson and Hennessy, Computer Organization and Design: The
Hardware/Software Interface, 5th Ed.
6.6 Matrix multiplication plays an important role in a number of applications.
Two matrices can only be multiplied if the number of columns of the first matrix is
equal to the number of rows in the second.
Let’s assume we have an m × n matrix A and we want to multiply it by an n × p
matrix B. We can express their product as an m × p matrix denoted by AB (or A · B).
If we assign C = AB, and ci,j denotes the entry in C at position (i, j), then
ci,j = ai,1 × b1,j + ai,2 × b2,j + … + ai,n × bn,j for each pair of indices i and j
with 1 ≤ i ≤ m and 1 ≤ j ≤ p. Now we want to see if we can parallelize
the computation of C. Assume that matrices are laid out in memory sequentially as
follows: a1,1, a2,1, a3,1, a4,1, …, etc.
6.6.1 [10] <§6.5> Assume that we are going to compute C on both a single-core
shared-memory machine and a 4-core shared-memory machine. Compute the
speedup we would expect to obtain on the 4-core machine, ignoring any memory
issues.
6.6.2 [10] <§6.5> Repeat Exercise 6.6.1, assuming that updates to C incur a cache
miss due to false sharing when consecutive elements in a row (i.e., index i) are
updated.
6.6.3 [10] <§6.5> How would you fix the false sharing issue that can occur?
Explanation / Answer
6.6.1) Ignoring memory issues, every element of C can be computed independently, so the work divides evenly across the 4 cores available to us and we would expect a speedup of 4.
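To make that concrete, here is a minimal sketch in C with pthreads, not part of the original answer: each of the 4 threads computes a contiguous block of rows of C. The sizes M, N, P, the thread count NCORES and the names worker and thread_arg are illustrative assumptions, and the sketch uses ordinary row-major 2-D C arrays for readability rather than the column-major layout described in the problem.

#include <pthread.h>
#include <stdio.h>

#define M 512        /* rows of A and C (assumed size)        */
#define N 512        /* columns of A, rows of B (assumed)     */
#define P 512        /* columns of B and C (assumed)          */
#define NCORES 4     /* number of cores given in the exercise */

static double A[M][N], B[N][P], C[M][P];

struct thread_arg { int row_start, row_end; };

/* Each thread computes rows [row_start, row_end) of C = A * B. */
static void *worker(void *p)
{
    struct thread_arg *arg = p;
    for (int i = arg->row_start; i < arg->row_end; i++)
        for (int j = 0; j < P; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
    return NULL;
}

int main(void)
{
    pthread_t tid[NCORES];
    struct thread_arg args[NCORES];
    int rows_per_core = M / NCORES;   /* assumes M is divisible by NCORES */

    /* Deterministic test data. */
    for (int i = 0; i < M; i++)
        for (int k = 0; k < N; k++)
            A[i][k] = i + k;
    for (int k = 0; k < N; k++)
        for (int j = 0; j < P; j++)
            B[k][j] = k - j;

    /* Hand each thread a contiguous block of rows of C. */
    for (int t = 0; t < NCORES; t++) {
        args[t].row_start = t * rows_per_core;
        args[t].row_end   = (t + 1) * rows_per_core;
        pthread_create(&tid[t], NULL, worker, &args[t]);
    }
    for (int t = 0; t < NCORES; t++)
        pthread_join(tid[t], NULL);

    printf("C[0][0] = %f, C[M-1][P-1] = %f\n", C[0][0], C[M - 1][P - 1]);
    return 0;
}

Compiled with something like cc -O2 mm.c -lpthread, each core runs the same loop nest over one quarter of the rows, which is where the expected factor-of-4 speedup comes from once memory effects are ignored.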
6.6.2) Since every update to C now incurs a cache miss due to false sharing, the 4-core machine would take about the same time as the single-core one: the benefit of splitting the work across cores is lost to the miss penalty on every write to C.
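As a side illustration of why that assumption hurts so much (my own sketch, with an assumed matrix height and cache-line size), the column-major layout in the problem puts consecutive values of i next to each other in memory, so with 8-byte elements and 64-byte lines eight neighbouring rows of a column of C share one cache line:

#include <stdio.h>

#define M    512    /* rows of C (assumed size)           */
#define LINE 64     /* cache-line size in bytes (assumed) */

/* Cache line holding c(i,j), with 1-based indices and the column-major
 * layout from the problem: element (i,j) sits at byte offset
 * ((i-1) + (j-1)*M) * sizeof(double) from the start of C. */
static long line_of(long i, long j)
{
    long byte_offset = ((i - 1) + (j - 1) * (long)M) * (long)sizeof(double);
    return byte_offset / LINE;
}

int main(void)
{
    /* Rows 1..8 of column 1 all land on cache line 0: if, say, core 0 owns
     * row 1 and core 1 owns row 2, every write one core makes to that line
     * invalidates the copy the other core is working on. */
    for (long i = 1; i <= 8; i++)
        printf("c(%ld,1) -> cache line %ld\n", i, line_of(i, 1));
    return 0;
}

Any partitioning that gives neighbouring rows to different cores therefore has those cores bouncing the same line back and forth on every update.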
6.6.3) We can fix it with a tiling (blocking) approach, i.e., dividing the matrices into smaller blocks that fit exactly into the cache, so that each core works on and writes its own blocks of C and no cache line is shared between cores.
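Here is a sketch of that tiling idea, loosely in the style of the cache-blocked DGEMM presented elsewhere in the same textbook. The value BLOCKSIZE = 32, the square n × n shape, and the assumptions that n is a multiple of BLOCKSIZE and that C starts out zeroed are all mine, not part of the answer.

#define BLOCKSIZE 32   /* picked so three tiles fit in the cache (an assumption) */

/* Update one BLOCKSIZE x BLOCKSIZE tile of C using tiles of A and B.
 * Matrices are n x n and stored column by column (a1,1, a2,1, a3,1, ...),
 * matching the layout in the problem, so element (i,j) is at index i + j*n
 * (0-based). C is assumed to start out zeroed. */
static void do_block(int n, int si, int sj, int sk,
                     const double *A, const double *B, double *C)
{
    for (int j = sj; j < sj + BLOCKSIZE; ++j)
        for (int i = si; i < si + BLOCKSIZE; ++i) {
            double cij = C[i + j * n];
            for (int k = sk; k < sk + BLOCKSIZE; ++k)
                cij += A[i + k * n] * B[k + j * n];
            C[i + j * n] = cij;
        }
}

/* Blocked C = A * B for n divisible by BLOCKSIZE. */
void dgemm_blocked(int n, const double *A, const double *B, double *C)
{
    for (int sj = 0; sj < n; sj += BLOCKSIZE)
        for (int si = 0; si < n; si += BLOCKSIZE)
            for (int sk = 0; sk < n; sk += BLOCKSIZE)
                do_block(n, si, sj, sk, A, B, C);
}

A parallel version would hand each core a disjoint set of (si, sj) tiles, so no two cores write into the same region of C; provided the tile boundaries line up with cache-line boundaries, the false sharing of 6.6.2 disappears, and each tile is small enough to stay resident in a core's cache while it is being updated.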