


Question

MAE 384: Numerical Methods

*Note* Problem must be solved handwritten. Please show all steps.

Problem 1 (18 points) (Core Course Outcome 2)

Suppose a new standard MAE384-12 is defined to store floating point numbers using 12 bits. The first bit is used to store the sign as in IEEE-754, the next 7 bits store the exponent plus the appropriate bias, and the remaining 4 bits store the mantissa. As in IEEE-754, an exponent and mantissa of all zero bits is used to indicate true zero, and an exponent of all one bits together with a mantissa of all zero bits is used to store +/- infinity. Determine by hand in decimal format

a) the bias for same number of positive and negative exponents;
b) the smallest possible non-zero positive number;
c) the largest possible finite positive number;
d) the smallest possible difference between the number 1.0×10^0 and the next larger possible number;
e) the smallest possible difference between the number 5.0×10^0 and the next larger possible number.

Explanation / Answer

a) Here 7 bits are used for the exponent, so the exponent field can hold 128 values (0 to 127). If the field were used directly, every exponent would be non-negative. Since negative exponents are needed too, a biased exponent is stored in memory instead.

For a 7-bit exponent field, bias = 2^(7-1) - 1 = 63

Hence 63 is added to the real exponent, and the result is what gets stored in memory.
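
The bias rule above can be checked with a short Python sketch. The function name `bias` is made up for illustration; it is not part of any standard library.

```python
def bias(exponent_bits: int) -> int:
    """Bias that centres the exponent range: 2^(k-1) - 1 for a k-bit field."""
    return 2 ** (exponent_bits - 1) - 1


b = bias(7)
print(b)               # 63
# Real exponents covered by stored fields 0..127, before reserving any patterns.
print(0 - b, 127 - b)  # -63 64
```

The same formula gives 127 for the 8-bit exponent of IEEE-754 single precision, which is a quick sanity check.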

b)

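For the smallest positive value, take sign = 0, exponent = 000 0001 and mantissa = 0000 (exponent 000 0000 with mantissa 0000 is reserved for zero).

Hence the number formed will be 2^(1-63) * 1.0000 = 2^(-62)

log10(2^(-62)) = -62 * log10(2) = -18.66

Hence 2^(-62) = 10^(-18.66) ≈ 2.17 x 10^(-19)

So about 2.17 x 10^(-19) is the smallest positive number represented by the given format, assuming only normalized numbers (hidden leading 1, no denormals).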
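
As a rough check of part b), here is a small sketch that decodes the three fields exactly using fractions. The helper `decode_normalized` is hypothetical, and it assumes a hidden leading 1 with no denormals, which is the reading used in this answer.

```python
from fractions import Fraction

BIAS = 63       # from part a)
MANT_BITS = 4   # mantissa width of the 12-bit format


def decode_normalized(sign: int, exp_field: int, mant_field: int) -> Fraction:
    """Exact value of a normalized number; assumes an implicit leading 1."""
    significand = 1 + Fraction(mant_field, 2 ** MANT_BITS)
    value = significand * Fraction(2) ** (exp_field - BIAS)
    return -value if sign else value


smallest = decode_normalized(0, 0b0000001, 0b0000)
print(smallest == Fraction(1, 2 ** 62))  # True
print(f"{float(smallest):.3e}")          # 2.168e-19
```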

c)

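For the largest positive number, take sign = 0, exponent = 111 1111 and mantissa = 1111 (only mantissa 0000 with this exponent is reserved for infinity).

Hence the number = 2^(127-63) * 1.1111 = 2^(64) * 1.1111, with 1.1111 in binary.

Here binary 1.1111 = 1 + 1/2 + 1/4 + 1/8 + 1/16 = 31/16 = 1.9375

log10(2^64) = 64 * log10(2) = 19.27, so 2^64 = 10^(19.27) ≈ 1.845 x 10^19

Hence the largest possible finite positive number = 1.9375 * 2^64 ≈ 3.57 x 10^19

Note: if the all-ones exponent is reserved entirely, as IEEE-754 does for infinity and NaN, the largest usable exponent field is 111 1110 and the answer becomes 1.9375 * 2^63 ≈ 1.79 x 10^19.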
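
The arithmetic for part c) can be reproduced exactly with integers. This is only a sketch: `largest_finite` is a made-up helper. The call with 126 shows what happens under the assumption (not stated in the problem) that the whole all-ones exponent is reserved, as IEEE-754 does for infinity and NaN.

```python
BIAS = 63
MANT_BITS = 4


def largest_finite(max_exp_field: int) -> int:
    """Largest value with mantissa 1111 and the given stored exponent field."""
    significand_numerator = 2 ** (MANT_BITS + 1) - 1  # 31, i.e. 1.1111b = 31/16
    return significand_numerator * 2 ** (max_exp_field - BIAS - MANT_BITS)


print(f"{largest_finite(127):.4e}")  # 3.5741e+19  (exponent field 111 1111)
print(f"{largest_finite(126):.4e}")  # 1.7870e+19  (exponent field 111 1110)
```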

d)

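Note: this reads 1.0 x 100 as the decimal number 100.

Here 1.0 x 100 = 100 = 110 0100 in binary = 1.100100 x 2^6

Here exponent = 6, biased exponent = 6+63 = 69

mantissa = 100100, but only 4 bits can be stored, so mantissa = 1001 (the dropped bits are 00, so 100 is stored exactly)

So the next larger value has the same biased exponent 69 and mantissa = 1010

Representing the next number in decimal:

binary number = 1.1010 * 2^6 = 1101000 = 104 in decimal.

Hence smallest possible difference = 104 - 100 = 4.

If the value is 1.0 x 10^0 = 1 instead, the same steps give exponent = 0, mantissa = 0000, next number = 1.0001 (binary) = 1.0625, and difference = 2^(-4) = 0.0625.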
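
The gap to the next larger number depends only on the exponent: with 4 mantissa bits it is 2^(exponent - 4). A minimal sketch of that rule follows. `spacing` is an illustrative name, and the code assumes x is positive and normalized in this format.

```python
import math

MANT_BITS = 4


def spacing(x: float) -> float:
    """Gap between adjacent representable values in the binade containing x."""
    # math.frexp returns (m, e) with x = m * 2**e and 0.5 <= m < 1,
    # so the exponent of the 1.xxxx form is e - 1.
    _, e = math.frexp(x)
    return 2.0 ** (e - 1 - MANT_BITS)


print(spacing(100.0))  # 4.0
print(spacing(1.0))    # 0.0625
```

The second call covers the other reading of part d), where the value is 1.0 x 10^0 = 1.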

e)

Note: this reads 5.0 x 100 as the decimal number 500.

Here 5.0 x 100 = 500 = 1 1111 0100 in binary = 1.1111 0100 * 2^8

Here exponent = 8, biased exponent = 8+63 = 71

mantissa = 1111 0100, but only 4 bits can be stored, so mantissa = 1111 (500 is not exactly representable; the stored value is 1.1111 * 2^8 = 496)

So the next value will have mantissa = 0000 and the exponent increased by 1, hence new biased exponent = 72

Representing the next number in decimal:

binary number = 1.0000 * 2^9 = 10000 00000 = 512 in decimal.

Hence smallest possible difference = 512 - 500 = 12 (the gap between the two representable neighbours 496 and 512 is 16)
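
Since 500 does not fit in 4 mantissa bits, it helps to look at the representable numbers on either side of it. The sketch below does that. `neighbours` is a hypothetical helper and assumes x is positive and inside the normalized range.

```python
import math

MANT_BITS = 4


def neighbours(x: float) -> tuple[float, float]:
    """Largest representable value <= x and the next representable value above it."""
    _, e = math.frexp(x)                # x = m * 2**e with 0.5 <= m < 1
    step = 2.0 ** (e - 1 - MANT_BITS)   # spacing in this binade
    below = math.floor(x / step) * step
    return below, below + step


print(neighbours(500.0))  # (496.0, 512.0)
print(neighbours(100.0))  # (100.0, 104.0)
```

For 500 the two neighbours are 16 apart, and 512 is 12 above the original value, which is where the 12 in the answer comes from.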

EDIT :

e) 5.0x10^0 = 5:

Binary of 5 = 101 = 1.01 x 2^2

so mantissa here = 0100

exponent = 2

biased exponent = 2+63 = 65

For the next larger possible number, add 1 to the mantissa:

hence mantissa for the next larger number = 0101, exponent = 2

hence next larger number (in binary) = 1.0101 x 2^2 = 101.01

Decimal value of 101.01:

101 (in binary) = 5 (in decimal)

.01 (in binary) = 1/4 (in decimal) = 0.25 (in decimal)

hence next larger number = 5.25

hence smallest possible difference = 5.25 - 5 = 0.25
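
"Add 1 to the mantissa" can also be done on the packed bit pattern. Here is a small sketch, assuming the 11 bits below the sign are laid out as exponent then mantissa. The `decode` helper is made up for illustration and handles normalized positive patterns only.

```python
BIAS = 63
MANT_BITS = 4


def decode(bits: int) -> float:
    """Value of a positive normalized pattern: 7 exponent bits, then 4 mantissa bits."""
    exp_field = (bits >> MANT_BITS) & 0x7F
    mant_field = bits & 0xF
    return (1 + mant_field / 2 ** MANT_BITS) * 2.0 ** (exp_field - BIAS)


five = (65 << MANT_BITS) | 0b0100   # biased exponent 65, mantissa 0100
nxt = five + 1                      # next pattern: mantissa 0101
print(decode(five), decode(nxt), decode(nxt) - decode(five))  # 5.0 5.25 0.25
```

Incrementing the whole pattern also handles the carry case automatically: mantissa 1111 rolls over to 0000 and bumps the exponent by one.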

If you have any doubts, you can ask in the comment section.