
1. #include <x86intrin.h> 2. void dgemm (int n, double* A, double* B, double* C)

ID: 3725877 • Letter: 1

Question

1. #include <x86intrin.h>
2. void dgemm (int n, double* A, double* B, double* C)
3. {
4. for (int i = 0; i < n; i += 4)
5. for (int j = 0; j < n; j++) {
6. __m256d c0 = _mm256_load_pd(C + i + j * n); /* c0 = C[i][j] */
7. for (int k = 0; k < n ; k++)
8. c0 = _mm256_add_pd (c0 , /*c0 += A[i][k] * B[k][j] */
9. _mm256_mul_pd (_mm256_load_pd(A+i+k*n),
10. _mm256_broadcast_sd(B+k+j*n)));
11. _mm256_store_pd(C+i+j*n, c0); /* C[i][j] = c0 */
12. }
13. }

In line 10, what is the intrinsic function _mm256_broadcast_sd(B+k+j*n) trying to do?

Explanation / Answer

The complete statement, spanning lines 8 to 10 of the listing, is:

c0 = _mm256_add_pd (c0 , _mm256_mul_pd (_mm256_load_pd(A+i+k*n), _mm256_broadcast_sd(B+k+j*n)));

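In scalar terms it is c0 += A[i][k] * B[k][j]. Because c0 holds four doubles, one execution actually updates four rows of column j at once: C[i+l][j] += A[i+l][k] * B[k][j] for l = 0, 1, 2, 3.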
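Written out lane by lane, one pass of the k loop performs the update below, where c0 holds C[i][j] through C[i+3][j] and ⊙ denotes lane-wise multiplication. The last vector, with B[k][j] repeated in every lane, is what the broadcast supplies.

$$
\begin{bmatrix} c_{i,j} \\ c_{i+1,j} \\ c_{i+2,j} \\ c_{i+3,j} \end{bmatrix}
\leftarrow
\begin{bmatrix} c_{i,j} \\ c_{i+1,j} \\ c_{i+2,j} \\ c_{i+3,j} \end{bmatrix}
+
\begin{bmatrix} a_{i,k} \\ a_{i+1,k} \\ a_{i+2,k} \\ a_{i+3,k} \end{bmatrix}
\odot
\begin{bmatrix} b_{k,j} \\ b_{k,j} \\ b_{k,j} \\ b_{k,j} \end{bmatrix}
$$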

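To do this, _mm256_load_pd() loads four consecutive elements of A, A[i][k] through A[i+3][k], which sit next to each other in memory because the matrices are stored in column-major order (element (r, c) is at offset r + c*n). To multiply each of those four elements by the single element B[k][j], we use _mm256_broadcast_sd().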
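To see exactly which elements the load and the broadcast touch, here is a minimal portable sketch that models each 256-bit register as a plain array of four doubles. It assumes column-major n x n matrices (element (r, c) at M[r + c*n], as the address expressions in the listing imply) and n a multiple of 4, the same assumption the AVX loop makes. The names dgemm_ref and dgemm_4lane are hypothetical, chosen for illustration. Because it uses no intrinsics, it runs on any CPU and can be checked against a naive triple loop.

```c
#include <assert.h>

/* Naive column-major DGEMM: C[i][j] += sum over k of A[i][k] * B[k][j],
   where element (row r, col c) of an n x n matrix M is M[r + c*n]. */
void dgemm_ref(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double cij = C[i + j * n];
            for (int k = 0; k < n; k++)
                cij += A[i + k * n] * B[k + j * n];
            C[i + j * n] = cij;
        }
}

/* Scalar model of the AVX kernel (hypothetical helper): each "register"
   is an array of 4 doubles, one entry per lane. */
void dgemm_4lane(int n, const double *A, const double *B, double *C)
{
    assert(n % 4 == 0);                    /* AVX version needs this too */
    for (int i = 0; i < n; i += 4)
        for (int j = 0; j < n; j++) {
            double c0[4];
            for (int l = 0; l < 4; l++)        /* _mm256_load_pd(C+i+j*n) */
                c0[l] = C[i + l + j * n];
            for (int k = 0; k < n; k++) {
                double b = B[k + j * n];       /* _mm256_broadcast_sd(B+k+j*n): */
                double bvec[4] = { b, b, b, b }; /* same scalar in every lane */
                for (int l = 0; l < 4; l++)    /* add_pd(c0, mul_pd(load A, bvec)) */
                    c0[l] += A[i + l + k * n] * bvec[l];
            }
            for (int l = 0; l < 4; l++)        /* _mm256_store_pd(C+i+j*n, c0) */
                C[i + l + j * n] = c0[l];
        }
}
```

Replacing each inner `for (int l ...)` loop with the corresponding intrinsic is what lets the AVX version perform four multiply-adds per instruction.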

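The intrinsic _mm256_broadcast_sd(B+k+j*n) reads the one double-precision value at address B+k+j*n, which is B[k][j], and makes four identical copies of it, one in each 64-bit lane of a 256-bit __m256d register. _mm256_mul_pd() can then multiply it lane by lane with the four elements of A.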
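A minimal sketch that isolates the broadcast, assuming GCC or Clang on an x86-64 CPU that supports AVX. The `target("avx")` attribute lets these functions use AVX intrinsics even if the file is not compiled with -mavx, but the CPU running it must still have AVX. The names show_broadcast and kstep are hypothetical; kstep performs one k-step of the listing for a single 4-row block. Pointers passed to _mm256_load_pd and _mm256_store_pd must be 32-byte aligned, while the pointer passed to _mm256_broadcast_sd has no such requirement.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>

/* Hypothetical helper: store the broadcast register so its four lanes can
   be inspected. out4 must be 32-byte aligned. */
__attribute__((target("avx")))
void show_broadcast(const double *b, double *out4)
{
    __m256d vb = _mm256_broadcast_sd(b);   /* *b copied into all four lanes */
    _mm256_store_pd(out4, vb);
}

/* Hypothetical helper: one k-step of the listing for one 4-row block,
   c4[l] += a4[l] * (*b) for l = 0..3. */
__attribute__((target("avx")))
void kstep(double *c4, const double *a4, const double *b)
{
    /* aligned load/store require 32-byte aligned addresses */
    assert((uintptr_t)c4 % 32 == 0 && (uintptr_t)a4 % 32 == 0);
    __m256d c0 = _mm256_load_pd(c4);
    c0 = _mm256_add_pd(c0,
                       _mm256_mul_pd(_mm256_load_pd(a4),
                                     _mm256_broadcast_sd(b)));
    _mm256_store_pd(c4, c0);
}
```

For example, with c4 = {10, 20, 30, 40}, a4 = {1, 2, 3, 4} and *b = 0.5, kstep leaves c4 = {10.5, 21, 31.5, 42}: every lane of A is scaled by the same broadcast value.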

