Chapter 3: Arithmetic for Computers
4th edition

Arithmetic for Computers
- Operations on integers
  - Addition and subtraction
  - Multiplication and division
  - Dealing with overflow
- Floating-point real numbers
  - Representation and operations

Integer Addition
- Example: 7 + 6
  [Figure f03-01-P374493]
- Overflow if result out of range
  - Adding a positive and a negative operand: no overflow
  - Adding two positive operands: overflow if result sign is 1
  - Adding two negative operands: overflow if result sign is 0

Integer Subtraction
- Add negation of second operand
- Example: 7 - 6 = 7 + (-6)

      +7:  0000 0000 … 0000 0111
      -6:  1111 1111 … 1111 1010
      +1:  0000 0000 … 0000 0001

- Overflow if result out of range
  - Subtracting two positive or two negative operands: no overflow
  - Subtracting a positive from a negative operand: overflow if result sign is 0
  - Subtracting a negative from a positive operand: overflow if result sign is 1

Dealing with Overflow
- Some languages (e.g., C) ignore overflow
  - Use MIPS addu, addiu, subu instructions
- Other languages (e.g., Ada, Fortran) require raising an exception
  - Use MIPS add, addi, sub instructions
  - On overflow, invoke exception handler
    - Save PC in exception program counter (EPC) register
    - Jump to predefined handler address
    - mfc0 (move from coprocessor register) instruction can retrieve EPC value, to return after corrective action

Arithmetic for Multimedia
- Graphics and media processing operates on vectors of 8-bit and 16-bit data
  - Use 64-bit adder, with partitioned carry chain
    - Operate on 8 × 8-bit, 4 × 16-bit, or 2 × 32-bit vectors
  - SIMD (single-instruction, multiple-data)
- Saturating operations
  - On overflow, result is largest representable value
    - cf. 2s-complement modulo arithmetic
  - E.g., clipping in audio, saturation in video

Multiplication
- Start with long-multiplication approach

        1000      multiplicand
      × 1001      multiplier
        ----
        1000
       0000
      0000
     1000
     -------
     1001000      product

- Length of product is the sum of operand lengths
  [Figure f03-04-P374493]

Multiplication Hardware
  [Figure f03-05-P374493; label: "Initially 0"]
  [Figure f03-04-P374493]

Optimized Multiplier
  [Figure f03-06-P374493]
- Perform steps in parallel: add/shift
- One cycle per partial-product addition
  - That's OK if the frequency of multiplications is low

Faster Multiplier
- Uses multiple adders
  - Cost/performance tradeoff
  [Figure f03-08-P374493]
- Can be pipelined
  - Several multiplications performed in parallel

MIPS Multiplication
- Two 32-bit registers for product
  - HI: most-significant 32 bits
  - LO: least-significant 32 bits
- Instructions
  - mult rs, rt / multu rs, rt
    - 64-bit product in HI/LO
  - mfhi rd / mflo rd
    - Move from HI/LO to rd
    - Can test HI value to see if product overflows 32 bits
  - mul rd, rs, rt
    - Least-significant 32 bits of product -> rd

Division
- Check for 0 divisor
- Long division approach
  - If divisor ≤ dividend bits
    - 1 bit in quotient, subtract
  - Otherwise
    - 0 bit in quotient, bring down next dividend bit
- Restoring division
  - Do the subtract, and if remainder goes < 0, add divisor back
- Signed division
  - Divide using absolute values
  - Adjust sign of quotient and remainder as required

              1001        quotient
    1000 ) 1001010        dividend (divisor on the left)
          -1000
              10
              101
              1010
             -1000
                10        remainder

- n-bit operands yield n-bit quotient and remainder

Division Hardware
  [Figure f03-10-P374493; labels: "Initially dividend", "Initially divisor in left half"]
  [Figure f03-09-P374493]

Optimized Divider
  [Figure f03-12-P374493]
- One cycle per partial-remainder subtraction
- Looks a lot like a multiplier!
  - Same hardware can be used for both

Faster Division
- Can't use parallel hardware as in multiplier
  - Subtraction is conditional on sign of remainder
- Faster dividers (e.g., SRT division) generate multiple quotient bits per step
  - Still require multiple steps

MIPS Division
- Use HI/LO registers for result
  - HI: 32-bit remainder
  - LO: 32-bit quotient
- Instructions
  - div rs, rt / divu rs, rt
  - No overflow or divide-by-0 checking
    - Software must perform checks if required
  - Use mfhi, mflo to access result

Floating Point
- Representation for non-integral numbers
  - Including very small and very large numbers
- Like scientific notation
  - -2.34 × 10^56      (normalized)
  - +0.002 × 10^-4     (not normalized)
  - +987.02 × 10^9     (not normalized)
- In binary
  - ±1.xxxxxxx₂ × 2^yyyy
- Types float and double in C

Floating Point Standard
- Defined by IEEE Std 754-1985
- Developed in response to divergence of representations
  - Portability issues for scientific code
- Now almost universally adopted
- Two representations
  - Single precision (32-bit)
  - Double precision (64-bit)

IEEE Floating-Point Format

    Field      S        Exponent    Fraction
    single     1 bit    8 bits      23 bits
    double     1 bit    11 bits     52 bits

- S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
- Normalize significand: 1.0 ≤ |significand| < 2.0
  - Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  - Significand is Fraction with the "1." restored
- Exponent: excess representation: actual exponent + Bias
  - Ensures exponent is unsigned
  - Single: Bias = 127; Double: Bias = 1023

Single-Precision Range
- Exponents 00000000 and 11111111 reserved
- Smallest value
  - Exponent: 00000001 ⇒ actual exponent = 1 - 127 = -126
  - Fraction: 000…00 ⇒ significand = 1.0
  - ±1.0 × 2^-126 ≈ ±1.2 × 10^-38
- Largest value
  - Exponent: 11111110 ⇒ actual exponent = 254 - 127 = +127
  - Fraction: 111…11 ⇒ significand ≈ 2.0
  - ±2.0 × 2^+127 ≈ ±3.4 × 10^+38

Double-Precision Range
- Exponents 0000…00 and 1111…11 reserved
- Smallest value
  - Exponent: 00000000001 ⇒ actual exponent = 1 - 1023 = -1022
  - Fraction: 000…00 ⇒ significand = 1.0
  - ±1.0 × 2^-1022 ≈ ±2.2 × 10^-308
- Largest value
  - Exponent: 11111111110 ⇒ actual exponent = 2046 - 1023 = +1023
  - Fraction: 111…11 ⇒ significand ≈ 2.0
  - ±2.0 × 2^+1023 ≈ ±1.8 × 10^+308

Floating-Point Precision
- Relative precision
  - All fraction bits are significant
  - Single: approx 2^-23
    - Equivalent to 23 × log10(2) ≈ 23 × 0.3 ≈ 7 decimal digits of precision
  - Double: approx 2^-52
    - Equivalent to 52 × log10(2) ≈ 52 × 0.3 ≈ 16 decimal digits of precision

Floating-Point Example
- Represent -0.75
  - -0.75 = (-1)^1 × 1.1₂ × 2^-1
  - S = 1
  - Fraction = 1000…00₂
  - Exponent = -1 + Bias
    - Single: -1 + 127 = 126 = 01111110₂
    - Double: -1 + 1023 = 1022 = 01111111110₂
- Single: 1 01111110 1000…00
- Double: 1 01111111110 1000…00

Floating-Point Example
- What number is represented by the single-precision float
  1 10000001 01000…00
  - S = 1
  - Fraction = 01000…00₂
  - Exponent = 10000001₂ = 129
- x = (-1)^1 × (1 + 0.01₂) × 2^(129 - 127)
    = (-1) × 1.25 × 2^2
    = -5.0

Floating-Point Addition
- Consider a 4-digit decimal example
  - 9.999 × 10^1 + 1.610 × 10^-1
- 1. Align decimal points
  - Shift number with smaller exponent
  - 9.999 × 10^1 + 0.016 × 10^1
- 2. Add significands
  - 9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1
- 3. Normalize result and check for over/underflow
  - 1.0015 × 10^2
- 4. Round and renormalize if necessary
  - 1.002 × 10^2

Floating-Point Addition
- Now consider a 4-digit binary example
  - 1.000₂ × 2^-1 + -1.110₂ × 2^-2 (0.5 + -0.4375)
- 1. Align binary points
  - Shift number with smaller exponent
  - 1.000₂ × 2^-1 + -0.111₂ × 2^-1
- 2. Add significands
  - 1.000₂ × 2^-1 + -0.111₂ × 2^-1 = 0.001₂ × 2^-1
- 3. Normalize result and check for over/underflow
  - 1.000₂ × 2^-4, with no over/underflow
- 4. Round and renormalize if necessary
  - 1.000₂ × 2^-4 (no change) = 0.0625

FP Adder Hardware
- Much more complex than integer adder
- Doing it in one clock cycle would take too long
  - Much longer than integer operations
  - Slower clock would penalize all instructions
- FP adder usually takes several cycles
  - Can be pipelined

FP Adder Hardware
  [Figure f03-16-P374493; labels: Step 1, Step 2, Step 3, Step 4]

Floating-Point Multiplication
- Consider a 4-digit decimal example
  - 1.110 × 10^10 × 9.200 × 10^-5
- 1. Add exponents
  - For biased exponents, subtract bias from sum
  - New exponent = 10 + -5 = 5
- 2. Multiply significands
  - 1.110 × 9.200 = 10.212 ⇒ 10.212 × 10^5
- 3. Normalize result and check for over/underflow
  - 1.0212 × 10^6
- 4. Round and renormalize if necessary
  - 1.021 × 10^6
- 5. Determine sign of result from signs of operands
  - +1.021 × 10^6

Floating-Point Multiplication
- Now consider a 4-digit binary example
  - 1.000₂ × 2^-1 × -1.110₂ × 2^-2 (0.5 × -0.4375)
- 1. Add exponents
  - Unbiased: -1 + -2 = -3
  - Biased: (-1 + 127) + (-2 + 127) - 127 = -3 + 254 - 127 = -3 + 127 = 124
- 2. Multiply significands
  - 1.000₂ × 1.110₂ = 1.110₂ ⇒ 1.110₂ × 2^-3
- 3. Normalize result and check for over/underflow
  - 1.110₂ × 2^-3 (no change), with no over/underflow
- 4. Round and renormalize if necessary
  - 1.110₂ × 2^-3 (no change)
- 5. Determine sign: positive × negative ⇒ negative
  - -1.110₂ × 2^-3 = -0.21875

FP Arithmetic Hardware
- FP multiplier is of similar complexity to FP adder
  - But uses a multiplier for significands instead of an adder
- FP arithmetic hardware usually does
  - Addition, subtraction, multiplication, division, reciprocal, square root
  - FP ↔ integer conversion
- Operations usually take several cycles
  - Can be pipelined

FP Instructions in MIPS
- FP hardware is coprocessor 1
  - Adjunct processor that extends the ISA
- Separate FP registers
  - 32 single-precision: $f0, $f1, … $f31
  - Paired for double-precision: $f0/$f1, $f2/$f3, …
    - Release 2 of the MIPS ISA supports 32 × 64-bit FP registers
- FP instructions operate only on FP registers
  - Programs generally don't do integer ops on FP data, or vice versa
  - More registers with minimal code-size impact
- FP load and store instructions
  - lwc1, ldc1, swc1, sdc1
    - e.g., ldc1 $f8, 32($sp)

FP Instructions in MIPS
- Single-precision arithmetic
  - add.s, sub.s, mul.s, div.s
    - e.g., add.s $f0, $f1, $f6
- Double-precision arithmetic
  - add.d, sub.d, mul.d, div.d
    - e.g., mul.d $f4, $f4, $f6
- Single- and double-precision comparison
  - c.xx.s, c.xx.d (xx is eq, lt, le, …)
  - Sets or clears FP condition-code bit
    - e.g., c.lt.s $f3, $f4
- Branch on FP condition code true or false
  - bc1t, bc1f
    - e.g., bc1t TargetLabel

FP Example: °F to °C
- C code:

    float f2c (float fahr) {
      return ((5.0/9.0)*(fahr - 32.0));
    }

  - fahr in $f12, result in $f0, literals in global memory space
- Compiled MIPS code:

    f2c: lwc1  $f16, const5($gp)     # $f16 = 5.0
         lwc1  $f18, const9($gp)     # $f18 = 9.0
         div.s $f16, $f16, $f18      # $f16 = 5.0 / 9.0
         lwc1  $f18, const32($gp)    # $f18 = 32.0
         sub.s $f18, $f12, $f18      # $f18 = fahr - 32.0
         mul.s $f0, $f16, $f18       # $f0 = (5.0/9.0) * (fahr - 32.0)
         jr    $ra                   # return

FP Example: Array Multiplication
- X = X + Y × Z
  - All 32 × 32 matrices, 64-bit double-precision elements
- C code:

    void mm (double x[][32], double y[][32], double z[][32]) {
      int i, j, k;
      for (i = 0; i != 32; i = i + 1)
        for (j = 0; j != 32; j = j + 1)
          for (k = 0; k != 32; k = k + 1)
            x[i][j] = x[i][j] + y[i][k] * z[k][j];
    }

  - Addresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2

FP Example: Array Multiplication
- MIPS code:

        li    $t1, 32          # $t1 = 32 (row size/loop end)
        li    $s0, 0           # i = 0; initialize 1st for loop
    L1: li    $s1, 0           # j = 0; restart 2nd for loop
    L2: li    $s2, 0           # k = 0; restart 3rd for loop
        sll   $t2, $s0, 5      # $t2 = i * 32 (size of row of x)
        addu  $t2, $t2, $s1    # $t2 = i * size(row) + j
        sll   $t2, $t2, 3      # $t2 = byte offset of [i][j]
        addu  $t2, $a0, $t2    # $t2 = byte address of x[i][j]
        l.d   $f4, 0($t2)      # $f4 = 8 bytes of x[i][j]
    L3: sll   $t0, $s2, 5      # $t0 = k * 32 (size of row of z)
        addu  $t0, $t0, $s1    # $t0 = k * size(row) + j
        sll   $t0, $t0, 3      # $t0 = byte offset of [k][j]
        addu  $t0, $a2, $t0    # $t0 = byte address of z[k][j]
        l.d   $f16, 0($t0)     # $f16 = 8 bytes of z[k][j]
        …

FP Example: Array Multiplication

        …
        sll   $t0, $s0, 5        # $t0 = i * 32 (size of row of y)
        addu  $t0, $t0, $s2      # $t0 = i * size(row) + k
        sll   $t0, $t0, 3        # $t0 = byte offset of [i][k]
        addu  $t0, $a1, $t0      # $t0 = byte address of y[i][k]
        l.d   $f18, 0($t0)       # $f18 = 8 bytes of y[i][k]
        mul.d $f16, $f18, $f16   # $f16 = y[i][k] * z[k][j]
        add.d $f4, $f4, $f16     # $f4 = x[i][j] + y[i][k] * z[k][j]
        addiu $s2, $s2, 1        # k = k + 1
        bne   $s2, $t1, L3       # if (k != 32) go to L3
        s.d   $f4, 0($t2)        # x[i][j] = $f4
        addiu $s1, $s1, 1        # j = j + 1
        bne   $s1, $t1, L2       # if (j != 32) go to L2
        addiu $s0, $s0, 1        # i = i + 1
        bne   $s0, $t1, L1       # if (i != 32) go to L1

Accurate Arithmetic
- IEEE Std 754 specifies additional rounding control
  - Extra bits of precision (guard, round, sticky)
  - Choice of rounding modes
  - Allows programmer to fine-tune numerical behavior of a computation
- Not all FP units implement all options
  - Most programming languages and FP libraries just use defaults
- Trade-off between hardware complexity, performance, and market requirements

Interpretation of Data (The BIG Picture)
- Bits have no inherent meaning
  - Interpretation depends on the instructions applied
- Computer representations of numbers
  - Finite range and precision
  - Need to account for this in programs

Associativity
- Parallel programs may interleave operations in unexpected orders
  - Assumptions of associativity may fail
- Need to validate parallel programs under varying degrees of parallelism

x86 FP Architecture
- Originally based on 8087 FP coprocessor
  - 8 × 80-bit extended-precision registers
  - Used as a push-down stack
  - Registers indexed from TOS: ST(0), ST(1), …
- FP values are 32-bit or 64-bit in memory
  - Converted on load/store of memory operand
  - Integer operands can also be converted on load/store
- Very difficult to generate and optimize code
  - Result: poor FP performance

x86 FP Instructions
- Optional variations
  - I: integer operand
  - P: pop operand from stack
  - R: reverse operand order
  - But not all combinations allowed

    Data transfer      Arithmetic            Compare         Transcendental
    FILD mem/ST(i)     FIADDP mem/ST(i)      FICOMP          FPATAN
    FISTP mem/ST(i)    FISUBRP mem/ST(i)     FIUCOMP         F2XM1
    FLDPI              FIMULP mem/ST(i)      FSTSW AX/mem    FCOS
    FLD1               FIDIVRP mem/ST(i)                     FPTAN
    FLDZ               FSQRT                                 FPREM
                       FABS                                  FSIN
                       FRNDINT                               FYL2X

Streaming SIMD Extension 2 (SSE2)
- Uses 8 × 128-bit registers
  - Extended to 16 registers in AMD64/EM64T
- Can be used for multiple FP operands
  - 2 × 64-bit double precision
  - 4 × 32-bit single precision
  - Instructions operate on them simultaneously
    - Single-Instruction Multiple-Data

Right Shift and Division
- Left shift by i places multiplies an integer by 2^i
- Right shift divides by 2^i?
  - Only for unsigned integers
- For signed integers
  - Arithmetic right shift: replicate the sign bit
  - e.g., -5 / 4
    - 11111011₂ >> 2 = 11111110₂ = -2
    - Rounds toward -∞
  - cf. 11111011₂ >>> 2 = 00111110₂ = +62

Who Cares About FP Accuracy?
- Important for scientific code
  - But for everyday consumer use?
    - "My bank balance is out by 0.0002¢!"
- The Intel Pentium FDIV bug
  - The market expects accuracy
  - See Colwell, The Pentium Chronicles

Concluding Remarks
- ISAs support arithmetic
  - Signed and unsigned integers
  - Floating-point approximation to reals
- Bounded range and precision
  - Operations can overflow and underflow
- MIPS ISA
  - Core instructions: 54 most frequently used
    - 100% of SPECINT, 97% of SPECFP
  - Other instructions: less frequent
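
Sketch: overflow detection and saturation

The sign rules from the Integer Addition and Integer Subtraction slides, and the saturating behavior from the Arithmetic for Multimedia slide, can be sketched in C. This is a minimal illustration rather than code from the slides: the function names add_overflows, sub_overflows, and sat_add8 are hypothetical, 32-bit 2s-complement operands are assumed, and the wrapped results are computed on unsigned values because signed overflow is undefined behavior in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* a + b overflows when both operands have the same sign
   and the sign of the wrapped sum differs from it. */
bool add_overflows(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;                      /* wraps modulo 2^32, like addu */
    return ((~(ua ^ ub) & (ua ^ sum)) >> 31) != 0;
}

/* a - b overflows when the operands have different signs
   and the sign of the wrapped difference differs from a's sign. */
bool sub_overflows(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t diff = ua - ub;                     /* wraps modulo 2^32, like subu */
    return (((ua ^ ub) & (ua ^ diff)) >> 31) != 0;
}

/* Saturating 8-bit add: on overflow, clamp to the nearest representable
   limit instead of wrapping (assumption: clamp at both ends). */
int8_t sat_add8(int8_t a, int8_t b) {
    int sum = a + b;                             /* operands promote to int; no overflow */
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}
```

For example, add_overflows(INT32_MAX, 1) is true while add_overflows(7, 6) is false, and sat_add8(100, 100) yields 127 where modulo arithmetic would wrap to a negative value.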
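
Sketch: shift-add multiplication and restoring division

The long-multiplication and restoring-division procedures from the Multiplication and Division slides can be modeled as bit-serial loops. This is a software sketch under stated assumptions, not the hardware in the figures: the names shift_add_multiply, restoring_divide, and divmod32 are hypothetical, operands are unsigned 32-bit values, and the divisor starts shifted left by 31 places so that the loop runs exactly 32 times.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: quotient and remainder (LO and HI in MIPS). */
typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} divmod32;

/* Long multiplication: add the shifted multiplicand for each 1 bit
   of the multiplier. The product is as long as both operands together. */
uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t shifted = multiplicand;   /* multiplicand, shifted left each step */
    uint64_t product = 0;              /* product starts at 0 */
    for (int step = 0; step < 32; step++) {
        if (multiplier & 1u) {
            product += shifted;
        }
        shifted <<= 1;
        multiplier >>= 1;
    }
    return product;
}

/* Restoring division: do the subtract, and if the remainder goes
   negative, add the divisor back and record a 0 quotient bit. */
divmod32 restoring_divide(uint32_t dividend, uint32_t divisor) {
    assert(divisor != 0);              /* check for 0 divisor first */
    int64_t remainder = dividend;
    int64_t shifted = (int64_t)divisor << 31;
    uint32_t quotient = 0;
    for (int step = 0; step < 32; step++) {
        remainder -= shifted;
        if (remainder < 0) {
            remainder += shifted;      /* restore */
            quotient <<= 1;            /* quotient bit 0 */
        } else {
            quotient = (quotient << 1) | 1u;   /* quotient bit 1 */
        }
        shifted >>= 1;
    }
    divmod32 result = { quotient, (uint32_t)remainder };
    return result;
}
```

With the slide operands, shift_add_multiply(8, 9) gives 72 (1001000 in binary), and restoring_divide(74, 8) gives quotient 9 and remainder 2 (1001 and 10 in binary).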
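
Sketch: packing and unpacking single-precision fields

The two Floating-Point Example slides can be checked mechanically by packing and unpacking the S, Exponent, and Fraction fields. The helper names below (float_to_bits, pack_fp32, fp32_value, and so on) are hypothetical, and the sketch assumes that the C type float is the IEEE 754 32-bit single-precision format, which holds on mainstream platforms but is not guaranteed by the C standard. Only normalized numbers are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy out the 32 bits of a float; memcpy avoids aliasing problems. */
uint32_t float_to_bits(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Pack S (1 bit), biased Exponent (8 bits), and Fraction (23 bits). */
uint32_t pack_fp32(uint32_t sign, uint32_t exponent, uint32_t fraction) {
    return ((sign & 1u) << 31) | ((exponent & 0xFFu) << 23) | (fraction & 0x7FFFFFu);
}

uint32_t fp32_sign(uint32_t bits)     { return bits >> 31; }
uint32_t fp32_exponent(uint32_t bits) { return (bits >> 23) & 0xFFu; }
uint32_t fp32_fraction(uint32_t bits) { return bits & 0x7FFFFFu; }

/* Value of a normalized number:
   (-1)^S * (1 + Fraction / 2^23) * 2^(Exponent - 127). */
double fp32_value(uint32_t bits) {
    int exponent = (int)fp32_exponent(bits);
    assert(exponent >= 1 && exponent <= 254);    /* 0 and 255 are reserved */
    double value = 1.0 + (double)fp32_fraction(bits) / 8388608.0;  /* restore hidden 1 */
    for (int e = exponent - 127; e > 0; e--) value *= 2.0;
    for (int e = exponent - 127; e < 0; e++) value /= 2.0;
    return fp32_sign(bits) ? -value : value;
}
```

Under that assumption, float_to_bits(-0.75f) and pack_fp32(1, 126, 0x400000) both give 0xBF400000, the pattern 1 01111110 1000…00 from the first example, and fp32_value(0xC0A00000) gives -5.0, matching the second.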
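
Sketch: floating-point addition is not associative

The Associativity slide warns that assumptions of associativity may fail. A small C sketch shows how the grouping of a floating-point sum can change the result. The function names and the operand values in the usage note are assumptions chosen for illustration and do not come from the slides; the sketch also assumes the compiler is not told to reassociate FP operations (for example through a fast-math option).

```c
/* Adds as (x + y) + z. */
float sum_left_first(float x, float y, float z) {
    float partial = x + y;
    return partial + z;
}

/* Adds as x + (y + z). */
float sum_right_first(float x, float y, float z) {
    float partial = y + z;   /* a small z can be lost entirely when y is huge */
    return x + partial;
}
```

With x = -1.5e38f, y = 1.5e38f, and z = 1.0f, sum_left_first returns 1.0 because x and y cancel exactly before z is added, while sum_right_first returns 0.0 because 1.0 is far below the precision available at magnitude 1.5e38 and is rounded away in y + z.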
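
Sketch: arithmetic versus logical right shift

The 8-bit example on the Right Shift and Division slide can be reproduced with explicit shift helpers. The names arithmetic_shift_right8 and logical_shift_right8 are hypothetical. The sign replication is written out by hand because the result of >> on a negative signed value is implementation-defined in C, so this is a portable model of the two shifts rather than a statement about what any particular compiler emits.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift (>>> on the slide): vacated bits are filled with 0. */
uint8_t logical_shift_right8(uint8_t bits, unsigned places) {
    assert(places < 8);
    return (uint8_t)(bits >> places);
}

/* Arithmetic right shift: vacated bits are filled with copies of the sign bit. */
int8_t arithmetic_shift_right8(int8_t value, unsigned places) {
    assert(places < 8);
    uint8_t bits = (uint8_t)value;                     /* 2s-complement bit pattern */
    uint8_t shifted = (uint8_t)(bits >> places);
    if (bits & 0x80u) {
        shifted |= (uint8_t)(0xFFu << (8 - places));   /* replicate the sign bit */
    }
    /* Map the bit pattern back to a signed value without relying on
       an implementation-defined narrowing conversion. */
    return (int8_t)((int)shifted - ((shifted & 0x80u) ? 256 : 0));
}
```

Here arithmetic_shift_right8(-5, 2) gives -2 (rounding toward -∞) and logical_shift_right8(0xFB, 2) gives 62, as on the slide. C integer division truncates toward zero, so -5 / 4 evaluates to -1, which is why an arithmetic shift is not a drop-in replacement for signed division.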