MKP-logo-white-transparent Title 4th-edition
Chapter 3
Arithmetic for Computers

MKP-logo
Chapter 3 — Arithmetic for Computers — 2
Arithmetic for Computers
nOperations on integers
nAddition and subtraction
nMultiplication and division
nDealing with overflow
nFloating-point real numbers
nRepresentation and operations

Integer Addition
nExample: 7 + 6
nOverflow if result out of range
nAdding +ve and –ve operands, no overflow
nAdding two +ve operands
nOverflow if result sign is 1
nAdding two –ve operands
nOverflow if result sign is 0
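
The sign rules above can be checked in software. The following is a minimal sketch, assuming 32-bit two's-complement integers; the function name `add_overflows` is hypothetical and chosen only for illustration. The sum is formed in unsigned arithmetic because wraparound is well defined there in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a + b overflows 32-bit two's-complement addition.
 * Hypothetical helper: the sum is computed on unsigned copies, then
 * only the sign bits (bit 31) are inspected. */
bool add_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t sum = ua + ub;               /* wraps modulo 2^32 */

    /* Overflow iff both operands have the same sign
     * and the sign of the sum differs from it. */
    return ((~(ua ^ ub) & (ua ^ sum)) >> 31) != 0;
}
```

For example, `add_overflows(7, 6)` is false, while adding 1 to `INT32_MAX` (two +ve operands, result sign 1) is reported as overflow.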

Integer Subtraction
nAdd negation of second operand
nExample: 7 – 6 = 7 + (–6)
n +7: 0000 0000 … 0000 0111
–6: 1111 1111 … 1111 1010
+1: 0000 0000 … 0000 0001
nOverflow if result out of range
nSubtracting two +ve or two –ve operands, no overflow
nSubtracting +ve from –ve operand
nOverflow if result sign is 0
nSubtracting –ve from +ve operand
nOverflow if result sign is 1
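
The subtraction rules can be sketched the same way. This assumes 32-bit two's-complement integers; `sub_overflows` is a hypothetical name used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a - b overflows 32-bit two's-complement subtraction.
 * Hypothetical helper: the difference is computed on unsigned copies,
 * where wraparound is well defined in C. */
bool sub_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t diff = ua - ub;              /* wraps modulo 2^32 */

    /* Overflow iff the operands have different signs
     * and the sign of the result differs from the sign of a. */
    return (((ua ^ ub) & (ua ^ diff)) >> 31) != 0;
}
```

Subtracting a +ve operand from `INT32_MIN` yields a result with sign 0, and subtracting a –ve operand from `INT32_MAX` yields a result with sign 1; both cases are flagged.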

Dealing with Overflow
nSome languages (e.g., C) ignore overflow
nUse MIPS addu, addiu, subu instructions
nOther languages (e.g., Ada, Fortran) require raising an exception
nUse MIPS add, addi, sub instructions
nOn overflow, invoke exception handler
nSave PC in exception program counter (EPC) register
nJump to predefined handler address
nmfc0 (move from coprocessor reg) instruction can retrieve EPC value, to return after corrective action

Arithmetic for Multimedia
nGraphics and media processing operates on vectors of 8-bit and 16-bit data
nUse 64-bit adder, with partitioned carry chain
nOperate on 8×8-bit, 4×16-bit, or 2×32-bit vectors
nSIMD (single-instruction, multiple-data)
nSaturating operations
nOn overflow, result is largest representable value
nc.f. 2s-complement modulo arithmetic
nE.g., clipping in audio, saturation in video
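
A minimal sketch of the difference between saturating and modulo arithmetic, assuming unsigned 8-bit samples such as pixel values. The function names are hypothetical, and a real SIMD unit would apply the operation to every lane of the partitioned adder at once.

```c
#include <stdint.h>

/* Saturating unsigned 8-bit add: clamps to the largest
 * representable value (255) instead of wrapping around. */
uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}

/* Modulo (wraparound) 8-bit add, shown for comparison. */
uint8_t wrap_add_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

With inputs 200 and 100, the saturating version returns 255, while the modulo version wraps to 44.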

Multiplication
nStart with long-multiplication approach
   1000    (multiplicand)
×  1001    (multiplier)
   ----
   1000
  0000
 0000
1000
-------
1001000    (product)
Length of product is the sum of operand lengths

Multiplication Hardware
nProduct register: initially 0

Optimized Multiplier
nPerform steps in parallel: add/shift
nOne cycle per partial-product addition
nThat’s ok, if frequency of multiplications is low
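
The add/shift loop can be sketched in software. This is a minimal model, assuming unsigned 32-bit operands; the name `shift_add_multiply` is hypothetical. It follows the simpler arrangement in which the multiplicand shifts left each step, and each loop iteration stands for one partial-product cycle.

```c
#include <stdint.h>

/* Shift-and-add multiplication of two unsigned 32-bit operands.
 * One loop iteration corresponds to one partial-product step. */
uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier)
{
    uint64_t product = 0;             /* product starts at 0 */
    uint64_t shifted = multiplicand;  /* multiplicand, shifted left each step */

    for (int step = 0; step < 32; step++) {
        if (multiplier & 1u)          /* low multiplier bit selects the add */
            product += shifted;
        shifted <<= 1;
        multiplier >>= 1;
    }
    return product;
}
```

Called with 8 and 9 (binary 1000 and 1001), it returns 72, which is 1001000 in binary.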

Faster Multiplier
nUses multiple adders
nCost/performance tradeoff
nCan be pipelined
nSeveral multiplications performed in parallel

MIPS Multiplication
nTwo 32-bit registers for product
nHI: most-significant 32 bits
nLO: least-significant 32 bits
nInstructions
nmult rs, rt  /  multu rs, rt
n64-bit product in HI/LO
nmfhi rd  /  mflo rd
nMove from HI/LO to rd
nCan test HI value to see if product overflows 32 bits
nmul rd, rs, rt
nLeast-significant 32 bits of product –> rd
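
As a rough C model of the HI/LO split for the unsigned case (multu), assuming a 64-bit intermediate product. The type `hilo_t` and the function `multu_model` are hypothetical names introduced for this sketch, not part of any MIPS toolchain.

```c
#include <stdint.h>

/* Hypothetical model of the HI/LO register pair. */
typedef struct {
    uint32_t hi;  /* most-significant 32 bits of the product */
    uint32_t lo;  /* least-significant 32 bits of the product */
} hilo_t;

/* Models multu: unsigned 32 x 32 -> 64-bit product, split into HI and LO. */
hilo_t multu_model(uint32_t rs, uint32_t rt)
{
    uint64_t product = (uint64_t)rs * rt;
    hilo_t result = { (uint32_t)(product >> 32), (uint32_t)product };
    return result;
}
```

A nonzero `hi` field indicates that the unsigned product does not fit in 32 bits, which mirrors the "test HI" check above.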

Division
nCheck for 0 divisor
nLong division approach
nIf divisor ≤ dividend bits
n1 bit in quotient, subtract
nOtherwise
n0 bit in quotient, bring down next dividend bit
nRestoring division
nDo the subtract, and if remainder goes < 0, add divisor back
nSigned division
nDivide using absolute values
nAdjust sign of quotient and remainder as required
          1001      (quotient)
1000 ) 1001010      (divisor on the left, dividend on the right)
      -1000
          10
          101
          1010
         -1000
            10      (remainder)
n-bit operands yield n-bit quotient and remainder

Division Hardware
nRemainder register: initially dividend
nDivisor register: initially divisor in left half

Optimized Divider
nOne cycle per partial-remainder subtraction
nLooks a lot like a multiplier!
nSame hardware can be used for both
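
The bit-serial division loop can be sketched in C. This is a minimal model, assuming unsigned 32-bit operands and a nonzero divisor; the name `restoring_divide` is hypothetical. It compares before subtracting instead of subtracting and then restoring, which gives the same quotient and remainder.

```c
#include <stdint.h>

/* Bit-serial division of unsigned 32-bit operands, one quotient bit
 * per iteration. The caller must ensure divisor != 0. */
void restoring_divide(uint32_t dividend, uint32_t divisor,
                      uint32_t *quotient, uint32_t *remainder)
{
    uint64_t rem = 0;   /* partial remainder (may need 33 bits) */
    uint32_t quo = 0;

    for (int bit = 31; bit >= 0; bit--) {
        /* Bring down the next dividend bit. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        quo <<= 1;
        if (rem >= divisor) {   /* subtraction would not go negative */
            rem -= divisor;
            quo |= 1u;          /* 1 bit in quotient */
        }                       /* otherwise 0 bit in quotient */
    }
    *quotient = quo;
    *remainder = (uint32_t)rem;
}
```

Dividing 74 (1001010) by 8 (1000) gives quotient 9 (1001) and remainder 2 (10).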

Faster Division
nCan’t use parallel hardware as in multiplier
nSubtraction is conditional on sign of remainder
nFaster dividers (e.g. SRT division) generate multiple quotient bits per step
nStill require multiple steps

MIPS Division
nUse HI/LO registers for result
nHI: 32-bit remainder
nLO: 32-bit quotient
nInstructions
ndiv rs, rt  /  divu rs, rt
nNo overflow or divide-by-0 checking
nSoftware must perform checks if required
nUse mfhi, mflo to access result
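
Because the hardware performs no checks, software that needs them has to test the operands first. A minimal sketch, assuming 32-bit signed operands; `checked_div` is a hypothetical helper, and the two guarded cases are a zero divisor and the one quotient that does not fit in 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Checked signed division. Returns false, leaving the outputs untouched,
 * when the divisor is 0 or the quotient is unrepresentable
 * (INT32_MIN / -1); otherwise stores quotient and remainder. */
bool checked_div(int32_t dividend, int32_t divisor,
                 int32_t *quotient, int32_t *remainder)
{
    if (divisor == 0)
        return false;
    if (dividend == INT32_MIN && divisor == -1)
        return false;

    *quotient = dividend / divisor;   /* C truncates toward zero */
    *remainder = dividend % divisor;  /* takes the sign of the dividend */
    return true;
}
```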

Floating Point
nRepresentation for non-integral numbers
nIncluding very small and very large numbers
nLike scientific notation
n–2.34 × 10^56 (normalized)
n+0.002 × 10^–4 (not normalized)
n+987.02 × 10^9 (not normalized)
nIn binary
n±1.xxxxxxx_2 × 2^yyyy
nTypes float and double in C

Floating Point Standard
nDefined by IEEE Std 754-1985
nDeveloped in response to divergence of representations
nPortability issues for scientific code
nNow almost universally adopted
nTwo representations
nSingle precision (32-bit)
nDouble precision (64-bit)

IEEE Floating-Point Format
nS: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
nNormalize significand: 1.0 ≤ |significand| < 2.0
nAlways has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
nSignificand is Fraction with the “1.” restored
nExponent: excess representation: actual exponent + Bias
nEnsures exponent is unsigned
nSingle: Bias = 127; Double: Bias = 1023
Field layout (most-significant bit first):
S: 1 bit | Exponent: 8 bits (single), 11 bits (double) | Fraction: 23 bits (single), 52 bits (double)
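
The three fields can be pulled out of a single-precision value in C. This is a minimal sketch, assuming the platform's `float` is the IEEE 754 32-bit format; `float_fields` and `decode_float` are hypothetical names used only here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a single-precision value. */
typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, stored with Bias = 127 */
    uint32_t fraction;  /* 23 bits; the hidden leading 1 is not stored */
} float_fields;

/* Splits a float into S, Exponent and Fraction.
 * Assumes float is the IEEE 754 32-bit format. */
float_fields decode_float(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined way to view the bits */

    float_fields f = { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
    return f;
}
```

For 1.0f this yields sign 0, exponent 127 (actual exponent 0 plus the bias) and fraction 0.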

Single-Precision Range
nExponents 00000000 and 11111111 reserved
nSmallest value
nExponent: 00000001 ⇒ actual exponent = 1 – 127 = –126
nFraction: 000…00 ⇒ significand = 1.0
n±1.0 × 2^–126 ≈ ±1.2 × 10^–38
nLargest value
nExponent: 11111110 ⇒ actual exponent = 254 – 127 = +127
nFraction: 111…11 ⇒ significand ≈ 2.0
n±2.0 × 2^+127 ≈ ±3.4 × 10^+38

Double-Precision Range
nExponents 0000…00 and 1111…11 reserved
nSmallest value
nExponent: 00000000001 ⇒ actual exponent = 1 – 1023 = –1022
nFraction: 000…00 ⇒ significand = 1.0
n±1.0 × 2^–1022 ≈ ±2.2 × 10^–308
nLargest value
nExponent: 11111111110 ⇒ actual exponent = 2046 – 1023 = +1023
nFraction: 111…11 ⇒ significand ≈ 2.0
n±2.0 × 2^+1023 ≈ ±1.8 × 10^+308
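
These limits can be cross-checked against the constants in `<float.h>`. The sketch below assumes that `float` and `double` are the IEEE 754 single and double formats and uses C99 hexadecimal floating-point literals; the function names are hypothetical.

```c
#include <float.h>
#include <stdbool.h>

/* True if the smallest normalized and largest finite float values are
 * 1.0 x 2^-126 and (2 - 2^-23) x 2^127. Assumes IEEE 754 single precision. */
bool float_limits_match(void)
{
    return FLT_MIN == 0x1p-126f && FLT_MAX == 0x1.fffffep+127f;
}

/* Same check for double: 1.0 x 2^-1022 and (2 - 2^-52) x 2^1023. */
bool double_limits_match(void)
{
    return DBL_MIN == 0x1p-1022 && DBL_MAX == 0x1.fffffffffffffp+1023;
}
```

The largest significand is written as all ones after the binary point, which is slightly below 2.0, matching the "≈ 2.0" above.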

Floating-Point Precision
nRelative precision
nall fraction bits are significant
nSingle: approx 2^–23
nEquivalent to 23 × log10(2) ≈ 23 × 0.3 ≈ 6 decimal digits of precision
nDouble: approx 2^–52
nEquivalent to 52 × log10(2) ≈ 52 × 0.3 ≈ 16 decimal digits of precision
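
A small experiment makes the relative precision concrete. This sketch assumes IEEE 754 single precision with the default rounding mode; `changes_one` is a hypothetical helper. The `volatile` store forces the sum to be rounded to `float` even on machines that evaluate in a wider format.

```c
#include <float.h>
#include <stdbool.h>

/* True if adding delta to 1.0f produces a float different from 1.0f. */
bool changes_one(float delta)
{
    volatile float sum = 1.0f + delta;  /* rounded to single precision */
    return sum != 1.0f;
}
```

Adding 2^–23 (written `0x1p-23f`, equal to `FLT_EPSILON`) changes the value, while adding 2^–25 is lost in rounding.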

Floating-Point Example
nRepresent –0.75
n–0.75 = (–1)^1 × 1.1_2 × 2^–1
nS = 1
nFraction = 1000…00_2
nExponent = –1 + Bias
nSingle: –1 + 127 = 126 = 01111110_2
nDouble: –1 + 1023 = 1022 = 01111111110_2
nSingle: 1011111101000…00
nDouble: 1011111111101000…00

Floating-Point Example
nWhat number is represented by the single-precision float
n 11000000101000…00
nS = 1
nFraction = 01000…00_2
nExponent = 10000001_2 = 129
nx = (–1)^1 × (1 + 0.01_2) × 2^(129 – 127)
n = (–1) × 1.25 × 2^2
n = –5.0
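
Both worked examples can be confirmed by reinterpreting bit patterns. A minimal sketch, assuming `float` is the IEEE 754 32-bit format; the helper names are hypothetical. The pattern 1 10000001 0100…0 is 0xC0A00000 in hexadecimal, and 1 01111110 1000…0 is 0xBF400000.

```c
#include <stdint.h>
#include <string.h>

/* Reinterprets a 32-bit pattern as a float (assumes IEEE 754 single). */
float float_from_bits(uint32_t bits)
{
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Returns the 32-bit pattern that encodes a float. */
uint32_t bits_from_float(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

`float_from_bits(0xC0A00000u)` gives –5.0, and `bits_from_float(-0.75f)` gives 0xBF400000.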

Floating-Point Addition
nConsider a 4-digit decimal example
n9.999 × 10^1 + 1.610 × 10^–1
n1. Align decimal points
nShift number with smaller exponent
n9.999 × 10^1 + 0.016 × 10^1
n2. Add significands
n9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1
n3. Normalize result & check for over/underflow
n1.0015 × 10^2
n4. Round and renormalize if necessary
n1.002 × 10^2

Floating-Point Addition
nNow consider a 4-digit binary example
n1.000_2 × 2^–1 + –1.110_2 × 2^–2 (0.5 + –0.4375)
n1. Align binary points
nShift number with smaller exponent
n1.000_2 × 2^–1 + –0.111_2 × 2^–1
n2. Add significands
n1.000_2 × 2^–1 + –0.111_2 × 2^–1 = 0.001_2 × 2^–1
n3. Normalize result & check for over/underflow
n1.000_2 × 2^–4, with no over/underflow
n4. Round and renormalize if necessary
n1.000_2 × 2^–4 (no change) = 0.0625
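
The align, add and normalize steps can be traced with a toy 4-bit format. This is only a sketch under stated assumptions: `toy_fp` and `toy_add` are hypothetical, the significand is a signed integer counted in units of 2^–3 (so 1.000_2 is 8 and 1.110_2 is 14), alignment simply truncates, and the rounding step and over/underflow checks are left out.

```c
#include <stdlib.h>

/* Toy format: value = sig x 2^(exp - 3); normalized when 8 <= |sig| <= 15,
 * i.e. 1.xxx in binary. Hypothetical, for illustration only. */
typedef struct {
    int sig;
    int exp;
} toy_fp;

toy_fp toy_add(toy_fp a, toy_fp b)
{
    /* Step 1: align binary points by shifting the operand
     * with the smaller exponent to the right (truncating). */
    while (a.exp < b.exp) { a.sig /= 2; a.exp++; }
    while (b.exp < a.exp) { b.sig /= 2; b.exp++; }

    /* Step 2: add significands. */
    toy_fp r = { a.sig + b.sig, a.exp };
    if (r.sig == 0)
        return r;                       /* exact zero, nothing to normalize */

    /* Step 3: normalize so that 8 <= |sig| <= 15. */
    while (abs(r.sig) >= 16) { r.sig /= 2; r.exp++; }
    while (abs(r.sig) < 8)   { r.sig *= 2; r.exp--; }
    return r;
}
```

With a = {8, –1} and b = {–14, –2}, alignment turns b into {–7, –1}, the sum is {1, –1}, and normalization gives {8, –4}, that is 1.000_2 × 2^–4 = 0.0625.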

FP Adder Hardware
nMuch more complex than integer adder
nDoing it in one clock cycle would take too long
nMuch longer than integer operations
nSlower clock would penalize all instructions
nFP adder usually takes several cycles
nCan be pipelined

FP Adder Hardware
[Figure: FP adder datapath, annotated with Steps 1 to 4 of the addition algorithm]

Floating-Point Multiplication
nConsider a 4-digit decimal example
n1.110 × 10^10 × 9.200 × 10^–5
n1. Add exponents
nFor biased exponents, subtract bias from sum
nNew exponent = 10 + –5 = 5
n2. Multiply significands
n1.110 × 9.200 = 10.212  ⇒  10.212 × 10^5
n3. Normalize result & check for over/underflow
n1.0212 × 10^6
n4. Round and renormalize if necessary
n1.021 × 10^6
n5. Determine sign of result from signs of operands
n+1.021 × 10^6

Floating-Point Multiplication
nNow consider a 4-digit binary example
n1.000_2 × 2^–1 × –1.110_2 × 2^–2 (0.5 × –0.4375)
n1. Add exponents
nUnbiased: –1 + –2 = –3
nBiased: (–1 + 127) + (–2 + 127) – 127 = –3 + 254 – 127 = –3 + 127
n2. Multiply significands
n1.000_2 × 1.110_2 = 1.110_2  ⇒  1.110_2 × 2^–3
n3. Normalize result & check for over/underflow
n1.110_2 × 2^–3 (no change) with no over/underflow
n4. Round and renormalize if necessary
n1.110_2 × 2^–3 (no change)
n5. Determine sign: +ve × –ve ⇒ –ve
n–1.110_2 × 2^–3  = –0.21875
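
The five steps can be traced with a toy sign-magnitude format. This is a sketch under stated assumptions: `toy_sfp` and `toy_mul` are hypothetical, significands are integers in units of 2^–3 (1.000_2 is 8, 1.110_2 is 14), inputs are normalized and nonzero, rounding is round-half-up instead of the IEEE default, and over/underflow checks are omitted.

```c
/* Toy format: value = (-1)^sign x sig x 2^(exp - 3), with 8 <= sig <= 15.
 * Hypothetical, for illustration only. */
typedef struct {
    int sign;  /* 0 = non-negative, 1 = negative */
    int sig;
    int exp;
} toy_sfp;

toy_sfp toy_mul(toy_sfp a, toy_sfp b)
{
    toy_sfp r;

    r.exp = a.exp + b.exp;             /* Step 1: add (unbiased) exponents */
    int prod = a.sig * b.sig;          /* Step 2: product in units of 2^-6 */

    int shift = 3;                     /* Step 3: normalize if product >= 2.0 */
    if (prod >= 128) { shift = 4; r.exp++; }

    /* Step 4: round back to 4 significant bits, renormalize if needed. */
    r.sig = (prod + (1 << (shift - 1))) >> shift;
    if (r.sig == 16) { r.sig = 8; r.exp++; }

    r.sign = a.sign ^ b.sign;          /* Step 5: sign from operand signs */
    return r;
}
```

With a = {0, 8, –1} and b = {1, 14, –2}, the result is {1, 14, –3}, that is –1.110_2 × 2^–3 = –0.21875.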

FP Arithmetic Hardware
nFP multiplier is of similar complexity to FP adder
nBut uses a multiplier for significands instead of an adder
nFP arithmetic hardware usually does
nAddition, subtraction, multiplication, division, reciprocal, square-root
nFP ↔ integer conversion
nOperations usually take several cycles
nCan be pipelined

FP Instructions in MIPS
nFP hardware is coprocessor 1
nAdjunct processor that extends the ISA
nSeparate FP registers
n32 single-precision: $f0, $f1, … $f31
nPaired for double-precision: $f0/$f1, $f2/$f3, …
nRelease 2 of MIPS ISA supports 32 × 64-bit FP reg’s
nFP instructions operate only on FP registers
nPrograms generally don’t do integer ops on FP data, or vice versa
nMore registers with minimal code-size impact
nFP load and store instructions
nlwc1, ldc1, swc1, sdc1
ne.g., ldc1 $f8, 32($sp)

FP Instructions in MIPS
nSingle-precision arithmetic
nadd.s, sub.s, mul.s, div.s
ne.g., add.s $f0, $f1, $f6
nDouble-precision arithmetic
nadd.d, sub.d, mul.d, div.d
ne.g., mul.d $f4, $f4, $f6
nSingle- and double-precision comparison
nc.xx.s, c.xx.d (xx is eq, lt, le, …)
nSets or clears FP condition-code bit
ne.g. c.lt.s $f3, $f4
nBranch on FP condition code true or false
nbc1t, bc1f
ne.g., bc1t TargetLabel

FP Example: °F to °C
nC code:
n float f2c (float fahr) {
  return ((5.0/9.0)*(fahr - 32.0));
}
nfahr in $f12, result in $f0, literals in global memory space
nCompiled MIPS code:
n f2c: lwc1  $f16, const5($gp)
     lwc1  $f18, const9($gp)
     div.s $f16, $f16, $f18
     lwc1  $f18, const32($gp)
     sub.s $f18, $f12, $f18
     mul.s $f0,  $f16, $f18
     jr    $ra

FP Example: Array Multiplication
nX = X + Y × Z
nAll 32 × 32 matrices, 64-bit double-precision elements
nC code:
n void mm (double x[][32],
         double y[][32], double z[][32]) {
  int i, j, k;
  for (i = 0; i != 32; i = i + 1)
    for (j = 0; j != 32; j = j + 1)
      for (k = 0; k != 32; k = k + 1)
        x[i][j] = x[i][j]
                  + y[i][k] * z[k][j];
}
nAddresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2

FP Example: Array Multiplication
n  MIPS code:
    li   $t1, 32       # $t1 = 32 (row size/loop end)
    li   $s0, 0        # i = 0; initialize 1st for loop
L1: li   $s1, 0        # j = 0; restart 2nd for loop
L2: li   $s2, 0        # k = 0; restart 3rd for loop
    sll  $t2, $s0, 5   # $t2 = i * 32 (size of row of x)
    addu $t2, $t2, $s1 # $t2 = i * size(row) + j
    sll  $t2, $t2, 3   # $t2 = byte offset of [i][j]
    addu $t2, $a0, $t2 # $t2 = byte address of x[i][j]
    l.d  $f4, 0($t2)   # $f4 = 8 bytes of x[i][j]
L3: sll  $t0, $s2, 5   # $t0 = k * 32 (size of row of z)
    addu $t0, $t0, $s1 # $t0 = k * size(row) + j
    sll  $t0, $t0, 3   # $t0 = byte offset of [k][j]
    addu $t0, $a2, $t0 # $t0 = byte address of z[k][j]
    l.d  $f16, 0($t0)  # $f16 = 8 bytes of z[k][j]
    …

FP Example: Array Multiplication
    …
    sll  $t0, $s0, 5       # $t0 = i*32 (size of row of y)
    addu  $t0, $t0, $s2    # $t0 = i*size(row) + k
    sll   $t0, $t0, 3      # $t0 = byte offset of [i][k]
    addu  $t0, $a1, $t0    # $t0 = byte address of y[i][k]
    l.d   $f18, 0($t0)     # $f18 = 8 bytes of y[i][k]
    mul.d $f16, $f18, $f16 # $f16 = y[i][k] * z[k][j]
    add.d $f4, $f4, $f16   # $f4 = x[i][j] + y[i][k]*z[k][j]
    addiu $s2, $s2, 1      # k = k + 1
    bne   $s2, $t1, L3     # if (k != 32) go to L3
    s.d   $f4, 0($t2)      # x[i][j] = $f4
    addiu $s1, $s1, 1      # $j = j + 1
    bne   $s1, $t1, L2     # if (j != 32) go to L2
    addiu $s0, $s0, 1      # $i = i + 1
    bne   $s0, $t1, L1     # if (i != 32) go to L1

Accurate Arithmetic
nIEEE Std 754 specifies additional rounding control
nExtra bits of precision (guard, round, sticky)
nChoice of rounding modes
nAllows programmer to fine-tune numerical behavior of a computation
nNot all FP units implement all options
nMost programming languages and FP libraries just use defaults
nTrade-off between hardware complexity, performance, and market requirements

Interpretation of Data
nBits have no inherent meaning
nInterpretation depends on the instructions applied
nComputer representations of numbers
nFinite range and precision
nNeed to account for this in programs
The BIG Picture

Associativity
nParallel programs may interleave operations in unexpected orders
nAssumptions of associativity may fail
nNeed to validate parallel programs under varying degrees of parallelism
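
A minimal sketch of how floating-point addition fails to be associative, assuming IEEE 754 single precision. The function names are hypothetical, and the `volatile` temporaries force each intermediate sum to be rounded to `float`.

```c
/* Computes (x + y) + z with the intermediate sum rounded to float. */
float sum_left(float x, float y, float z)
{
    volatile float t = x + y;
    return t + z;
}

/* Computes x + (y + z) with the intermediate sum rounded to float. */
float sum_right(float x, float y, float z)
{
    volatile float t = y + z;
    return x + t;
}
```

With x = –1.5 × 10^38, y = 1.5 × 10^38 and z = 1.0, `sum_left` returns 1.0 but `sum_right` returns 0.0, because 1.0 is far below the precision of 1.5 × 10^38 and is lost when added to it first.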

x86 FP Architecture
nOriginally based on 8087 FP coprocessor
n8 × 80-bit extended-precision registers
nUsed as a push-down stack
nRegisters indexed from TOS: ST(0), ST(1), …
nFP values are 32-bit or 64-bit in memory
nConverted on load/store of memory operand
nInteger operands can also be converted on load/store
nVery difficult to generate and optimize code
nResult: poor FP performance

x86 FP Instructions
nOptional variations
nI: integer operand
nP: pop operand from stack
nR: reverse operand order
nBut not all combinations allowed
Data transfer:   FILD mem/ST(i), FISTP mem/ST(i), FLDPI, FLD1, FLDZ
Arithmetic:      FIADDP mem/ST(i), FISUBRP mem/ST(i), FIMULP mem/ST(i), FIDIVRP mem/ST(i), FSQRT, FABS, FRNDINT
Compare:         FICOMP, FIUCOMP, FSTSW AX/mem
Transcendental:  FPATAN, F2XM1, FCOS, FPTAN, FPREM, FSIN, FYL2X

Streaming SIMD Extension 2 (SSE2)
nAdds 8 × 128-bit registers
nExtended to 16 registers in AMD64/EM64T
nCan be used for multiple FP operands
n2 × 64-bit double precision
n4 × 32-bit single precision
nInstructions operate on them simultaneously
nSingle-Instruction Multiple-Data

Right Shift and Division
nLeft shift by i places multiplies an integer by 2^i
nRight shift divides by 2^i?
nOnly for unsigned integers
nFor signed integers
nArithmetic right shift: replicate the sign bit
ne.g., –5 / 4
n11111011_2 >> 2 = 11111110_2 = –2
nRounds toward –∞
nc.f. 11111011_2 >>> 2 = 00111110_2 = +62
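
The mismatch between shifting and dividing can be reproduced in C. This sketch rests on one loud assumption: right-shifting a negative signed value is implementation-defined in C, and the code assumes the compiler performs an arithmetic shift, as common compilers such as GCC and Clang do. The function names are hypothetical.

```c
#include <stdint.h>

/* Signed division by 4: C division truncates toward zero. */
int32_t div4_truncating(int32_t x)
{
    return x / 4;
}

/* Right shift by 2 of a signed value. ASSUMPTION: the compiler uses an
 * arithmetic shift (sign bit replicated), so the result rounds toward
 * negative infinity. */
int32_t div4_by_shift(int32_t x)
{
    return x >> 2;
}

/* Logical right shift by 2 of an 8-bit pattern (zeros shifted in). */
uint8_t logical_shift2_u8(uint8_t bits)
{
    return (uint8_t)(bits >> 2);
}
```

For –5, division gives –1 while the arithmetic shift gives –2; the logical shift of the 8-bit pattern 11111011 (0xFB) gives 00111110, which is 62.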

Who Cares About FP Accuracy?
nImportant for scientific code
nBut for everyday consumer use?
n“My bank balance is out by 0.0002¢!” ☹
nThe Intel Pentium FDIV bug
nThe market expects accuracy
nSee Colwell, The Pentium Chronicles

Concluding Remarks
nISAs support arithmetic
nSigned and unsigned integers
nFloating-point approximation to reals
nBounded range and precision
nOperations can overflow and underflow
nMIPS ISA
nCore instructions: 54 most frequently used
n100% of SPECINT, 97% of SPECFP
nOther instructions: less frequent