Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Write C console programs of the “axpy” algorithm for integer data types. Instrum

ID: 3588300 • Letter: W

Question

Write C console programs of the “axpy” algorithm for integer data types. Instrument and monitor and measure execution time as a function of problem size. For each of the 2 data types:

Write it as a straight forward, single processor, sequential execution program, which you can scale the size of the axpy computation for N = 1, 5, 10 , 50, 100, 500, 1000, 5000, 10000, 50000, 100000, 500000 and 1,000,000 vector elements . Use a random number generator to fill the vectors. Set the vector dimension using a DEFINE statement then recompile & execution each vector length.

Your program should print out the vector length, and wall clock execution time in the console window, and computation time per axpy vector for each vector length used in your program code, and the number of accuracy improvement loops you run the daxpy computation to improve accuracy.

Your console program should print out using formatted printf commands

Vector length(number of vector elements)

The number of accuracy improvement loops you run the daxpy computation to improve accuracy

Total computation time

Computation time per axpy vector

Computation time per vector element

Number of machine cycles per arithmetic operation

Explanation / Answer

AXPY algorithm means to multiply a scalar value to a vector X and then add it to vector Y

Y = a*X + Y

c code for axpy algorithm is

#include <stdio.h>
#include <stdlib.h> // for random number generator rand()
#include <time.h> // to calculate time works on all platform
#define DIM 1000000

static __inline__ unsigned long long rdtsc(void) // inline function to calculate the CPU clock cycles
{
unsigned long long int x;
     __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x)); //executes x86 instruction, it has opcode of 0x0f, 0x31 (0F31). A processor always keeps track of cpu clock cycles and the function RDTSC reads that value
     return x;
}

void fillIntVector(int *a)
{
    int i=0;
    for(i=0;i<DIM;i++)
        a[i]=(rand()%100)+1; // generates a random number variable in the range of 1 to 100. You can change it by changing the number in modulo operation
}

void performAXPYOperationInt(int *x,int *y, int a)
{
    int i=0;
    for(i=0;i<DIM;i++)
        y[i]=a*x[i] + y[i]; // performs AXPY operation
}

void performAXPYOperationIntImproved(int *x,int *y, int a)
{
    int i=0;
    for(i=0;i<DIM;)
    {
        y[i]=a*x[i] + y[i];
        y[i+1]=a*x[i+1] + y[i+1];
        y[i+2]=a*x[i+2] + y[i+2];
        i=i+3; // mentioned the logic below
    }
}

void fillDoubleVector(double *a)
{
    int i=0;
    for(i=0;i<DIM;i++)
    {
        double d=RAND_MAX / 100;
        a[i]= 1.0 + (rand() / d); // generates the random number in double in the range of 1.0 to 100.0
    }
}

void performAXPYOperationDouble(double *x, double *y, int a)
{
    int i=0;
    for(i=0;i<DIM;i++)
        y[i]=a*x[i]+y[i]; // the AXPY operation for double
}

int main()
{
    int a=2;
    clock_t t; // clock object
    int x[DIM]={0};
    int y[DIM]={0};
    double xd[DIM]={0};
    double yd[DIM]={0};
    fillIntVector(x);
    fillIntVector(y);
    t=clock(); // clock starts (wall clock)
    unsigned long long cycles = rdtsc(); //current value of CPU cycle
    performAXPYOperationInt(x,y,a);
    cycles = rdtsc() - cycles; // current value of CPU cycles - previous cycles. So we got total cycles needed to compute AXPY operation.
    t=clock()-t; // clock ends
    double time_taken_int = ((double)t)/CLOCKS_PER_SEC; // wall clock time
  
    unsigned long long cyclesi=rdtsc(); // cycle starts
    performAXPYOperationIntImproved(x,y,a);
    cyclesi = rdtsc() - cyclesi; // cycle ends
  
  
    t=clock(); // clock starts
    performAXPYOperationDouble(xd,yd,a);
    t=clock()-t; // clock ends
    double time_taken_double = ((double)t)/CLOCKS_PER_SEC; // total time in seconds
  
  
    printf("Vector length : %d , execution time (wall clock) for int : %f , number of CPU cycles : %d , After improvement CPU cycles : %d, saved cycles : %d, execution time(wall clock) for double : %f",DIM,time_taken_int, (unsigned)cycles,(unsigned)cyclesi,(unsigned)(cycles-cyclesi), time_taken_double);

    return 0;
}

Here I have done everything for int. You can do same for double.

Accuracy Improvement in AXPY algorithm

For that we will combine c programming with MIPS.

The main loop to perform AXPY operation is

for(i=0;i<DIM;i++)
        y[i] +=a*x[i];

Convert this code to assembly language

When you draw a timing diagram for this assembly code you will find 3 stalls (CPU cycles gets free because of dependency). So unroll the loop 3 times

for eg

for(i=0;i<DIM;)
    {
        y[i]=a*x[i] + y[i];
        y[i+1]=a*x[i+1] + y[i+1];
        y[i+2]=a*x[i+2] + y[i+2];
        i=i+3;
    }

When you convert this code to assembly language and then calculate the timing diagram, then there will be no stall. So to improve the accuracy unroll the loop for 3 times

Hire Me For All Your Tutoring Needs
Integrity-first tutoring: clear explanations, guidance, and feedback.
Drop an Email at
drjack9650@gmail.com
Chat Now And Get Quote