


Question

Questions are based on Chapters 8 and 10.

1. [15 points] Why do you have to write Assembly functions inside a C program?
   a. [10 points] How do you import an assembly function in your C program? Provide an example.
   b. [10 points] How do you import a C function in your assembly code? Provide an example.
2. [20 points] What are static and volatile variables? When do you use a static or volatile variable?
3. [15 points] Write a simple if-then-else, for loop, and while loop example in C code and write the equivalent assembly code.
4. [20 points] What are the advantages of alignment in memory? Why does it take more time to access misaligned memory?
5. [40 points] Write a C program with a subroutine called MoviePassMember that checks if the person is a member or not. If he/she is a member, the price is $4 for all ages. If the person is not a member, he/she must pay a variable price based on age. Write another subroutine called MoviePrice that calculates the movie ticket price based on the input argument called age. If the age is 12 or under, the price is $6. If the age is between 13 and 64, the price is $8. If the age is 65 or above, the price is $7. Implement the MoviePassMember subroutine in Assembly and the MoviePrice subroutine in C.

Explanation / Answer

1. Assembly language is a low-level language (closer to machine language), whereas C is a high-level language. The closer a language is to the machine, the faster its execution, so performance-critical routines are sometimes written in assembly inside a C program.

a) Importing an assembly function into a C program, using an inline asm block (Turbo C-style syntax; GCC uses __asm__ with a different format):

#include <stdio.h>

int main() {
    int a = 3, b = 3, c;

    asm {
        mov ax, a    ; load a into AX
        mov bx, b    ; load b into BX
        add ax, bx   ; AX = a + b
        mov c, ax    ; store the sum in c
    }

    printf("%d", c); /* prints 6 */
    return 0;
}

b) To import a C function into assembly code, declare it as external in the assembly file and call it through the platform's calling convention, as sketched below.
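A minimal sketch, assuming a NASM/GCC 32-bit toolchain; the file names and the function name greet are illustrative, not from the original.

/* cfunc.c - the C function we want to call from assembly */
#include <stdio.h>

void greet(int n) {
    printf("Hello from C, n = %d\n", n);
}

; caller.asm - imports and calls the C function (NASM syntax, 32-bit cdecl)
extern greet            ; import the C function by name
global main

section .text
main:
    push 42             ; cdecl: push the argument
    call greet          ; call into the C code
    add  esp, 4         ; caller cleans up the stack
    xor  eax, eax       ; return 0 from main
    ret

Assembled and linked with something like: nasm -f elf32 caller.asm -o caller.o && gcc -m32 caller.o cfunc.c -o demo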

2.

The volatile keyword instructs the compiler not to apply certain compile-time optimisations to a variable. For example, take the countdown loop below.
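A minimal sketch of such a countdown (in a real use case i would be updated by hardware or an interrupt rather than by the loop itself):

#include <stdio.h>

int main(void) {
    volatile int i = 100;   /* volatile: the compiler must re-read i on every access */
    while (i > 0) {
        printf("%d\n", i);  /* prints 100 down to 1 */
        i--;
    }
    return 0;
}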

The code above prints 100 down to 1. Without volatile, the compiler may optimise accesses to i by keeping its value in a register instead of re-reading it from memory each time. Sometimes we don't want that: when the data comes from an I/O device, we want the compiler to fetch the fresh value from the I/O port, not a stale copy. Marking the variable volatile tells the compiler not to perform that optimisation.

The static keyword gives a variable a single copy that lives for the entire program. Normally a local variable in a function is a new copy on every call; a static local variable is created once, keeps its value across calls, and any change made to it inside the function persists. Let me write an example.

Non-static example:
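A minimal sketch (the function name inc is illustrative):

#include <stdio.h>

void inc(void) {
    int i = 0;          /* a fresh copy of i on every call */
    i++;
    printf("%d ", i);
}

int main(void) {
    inc();
    inc();
    inc();              /* prints: 1 1 1 */
    return 0;
}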

In this case the values printed will be 1 1 1, not 1 2 3, because i is a new copy every time inc() is called.

Case 2: static variable
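The same sketch with a static local variable:

#include <stdio.h>

void inc(void) {
    static int i = 0;   /* initialized once; retains its value across calls */
    i++;
    printf("%d ", i);
}

int main(void) {
    inc();
    inc();
    inc();              /* prints: 1 2 3 */
    return 0;
}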

Here the output will be 1 2 3, since every call operates on the same copy of i. One thing to note: a static variable is initialized only once.

3.

if-else

=======================

C code
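A minimal sketch (the value and the messages are illustrative):

#include <stdio.h>

int main() {
    int a = 5;
    if (a > 3) {
        printf("big\n");
    } else {
        printf("small\n");
    }
    return 0;
}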

Assembly:
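An equivalent sketch in 16-bit x86 syntax, assuming a is held in AX and the printing is elided:

    cmp ax, 3           ; compare a with 3
    jle else_part       ; if a <= 3, take the else branch
    ; then-branch: "big"
    jmp end_if
else_part:
    ; else-branch: "small"
end_if: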

for loop:

===================

C code
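A minimal sketch counting from 0 to 9:

#include <stdio.h>

int main() {
    int i;
    for (i = 0; i < 10; i++) {
        printf("%d\n", i);
    }
    return 0;
}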

Assembly

xor cx,cx ; cx-register is the counter, set to 0
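; a sketch completing the loop around that counter (assuming 10 iterations; body elided)
for_loop:
    ; ... loop body ...
    inc cx              ; i++
    cmp cx, 10          ; i < 10 ?
    jl  for_loop        ; repeat while cx < 10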

while loop

================

C code
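A minimal sketch, mirroring the for loop above:

#include <stdio.h>

int main() {
    int i = 0;
    while (i < 10) {
        printf("%d\n", i);
        i++;
    }
    return 0;
}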

Assembly
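An equivalent sketch in the same 16-bit x86 style (body elided, counter in CX):

    xor cx, cx          ; i = 0
while_start:
    cmp cx, 10          ; i < 10 ?
    jge while_end       ; exit when the condition fails
    ; ... loop body ...
    inc cx              ; i++
    jmp while_start
while_end: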

4.

Memory alignment and the arrangement of your data can make a big difference in performance: not just a few percent, but hundreds of percent.

Take a tight countdown loop like the sketch below; two instructions matter if you run enough iterations.
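A representative sketch (ARM syntax, assumed; r0 holds the iteration count):

ASMDELAY:
    subs r0, r0, #1     @ decrement the counter and set the flags
    bne  ASMDELAY       @ branch back until it hits zero
    bx   lr             @ return to the caller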

Run it with and without cache, aligned and misaligned, and with branch prediction tossed in, and the measured performance of those two instructions varies by a significant number of timer ticks.

This is a performance test you can very easily do yourself: add or remove nops around the code under test, do an accurate job of timing, and move the instructions under test across a wide enough range of addresses to touch the edges of cache lines.

The same kind of thing happens with data accesses. Some architectures complain about unaligned accesses (performing a 32-bit read at address 0x1001, for example) by giving you a data fault. On some of those you can disable the fault and take the performance hit instead; others allow unaligned accesses, and you just get the performance hit.

It is sometimes "instructions" but most of the time it is clock/bus cycles.

Look at the memcpy implementations in gcc for various targets. Say you are copying a structure that is 0x43 bytes: you may find an implementation that copies one byte, leaving 0x42, then copies 0x40 bytes in large efficient chunks, then does the last 0x2 as two individual bytes or as a 16-bit transfer. Alignment and target come into play: if the source and destination addresses are on the same alignment, say 0x1003 and 0x2003, you can do the one byte, then 0x40 in big chunks, then 0x2. But if one is 0x1002 and the other 0x1003, it gets real ugly and real slow.
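In that spirit, a minimal alignment-aware copy (a sketch, not gcc's actual implementation; assumes 4-byte words):

#include <stddef.h>
#include <stdint.h>

void *my_memcpy(void *dst, const void *src, size_t n) {
    uint8_t *d = dst;
    const uint8_t *s = src;

    /* copy leading bytes until the destination is word-aligned */
    while (n && ((uintptr_t)d & 3)) { *d++ = *s++; n--; }

    /* bulk-copy in 32-bit chunks, but only if the source is aligned too */
    if (((uintptr_t)s & 3) == 0) {
        while (n >= 4) {
            *(uint32_t *)d = *(const uint32_t *)s;
            d += 4; s += 4; n -= 4;
        }
    }

    /* copy the trailing bytes one at a time */
    while (n--) *d++ = *s++;
    return dst;
}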

Most of the time it is bus cycles, or worse, the number of transfers. Take a processor with a 64-bit wide data bus, like some ARMs, and do a four-word transfer (read or write, LDM or STM) at address 0x1004. That is a word-aligned address and perfectly legal, but if the bus is 64 bits wide the single instruction will likely turn into three transfers: a 32-bit at 0x1004, a 64-bit at 0x1008, and a 32-bit at 0x1010. The same instruction at address 0x1008 could do a single four-word transfer. Each transfer has a setup time associated with it, so the 0x1004-to-0x1008 address difference by itself can make the access several times faster, even (and especially) when using a cache and all accesses are cache hits.

Speaking of which: even a two-word read at address 0x0FFC instead of 0x1000 will, on cache misses, cause two cache line reads where 0x1000 needs only one. You pay the penalty of a cache line read anyway for a random access (reading more data than you use), but then it doubles. How your structures are aligned, how your data is laid out in general, and how frequently you access it can all cause cache thrashing.

You can end up striping your data such that, as you process it, you create evictions; you could get really unlucky and end up using only a fraction of your cache, with each blob of data you jump to colliding with a prior blob. By mixing up your data or re-arranging functions in the source code you can create or remove collisions. Since not all caches are created equal, the compiler is not going to help you here; it is on you. Even detecting the performance hit or improvement is on you.

All the things we have added to improve performance (wider data busses, pipelines, caches, branch prediction, multiple execution units/paths, etc.) will most often help, but they all have weak spots that can be exploited either intentionally or accidentally, and there is very little the compiler or libraries can do about it. If you are interested in performance you need to tune, and one of the biggest tuning factors is the alignment of code and data: not just on 32-, 64-, 128-, or 256-bit boundaries, but also where things sit relative to each other. You want heavily used loops or re-used data not to land in the same cache way; they each want their own. Compilers can help, for example by ordering instructions for a superscalar architecture: re-arranging instructions whose relative order does not matter can produce a big performance gain or hit depending on how efficiently the execution paths are used, but you have to tell the compiler what you are running on.

The biggest oversight is the assumption that the processor is the bottleneck. That has not been true for a decade or more; feeding the processor is the problem, and that is where issues like alignment performance hits and cache thrashing come into play. With a little work, even at the source code level (re-arranging the data in a structure, ordering variable/struct declarations, ordering functions within the source file, adding a little extra code to align data), you can improve performance several times over or more.
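As a small illustration of the structure re-arrangement point (sizes assume a typical ABI with a 4-byte int and 8-byte double alignment):

#include <stdio.h>

/* padding inflates this layout: c1, 7 pad, d, c2, 7 pad -> typically 24 bytes */
struct bad  { char c1; double d; char c2; };

/* largest members first: d, c1, c2, 6 pad -> typically 16 bytes */
struct good { double d; char c1; char c2; };

int main(void) {
    printf("bad: %zu bytes, good: %zu bytes\n",
           sizeof(struct bad), sizeof(struct good));
    return 0;
}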

NOTE: As per Chegg policy, I am allowed to answer only 4 questions on a single post. Kindly post the remaining questions separately and I will try to answer them. Sorry for the inconvenience caused.