E2011: Theoretical fundamentals of computer science
Topic 4: Introduction to computer architectures
Vlad Popovici, Ph.D.
Fac. of Science - RECETOX

Outline
1. Introduction
2. A bit of computer architecture
   - Central processing unit
   - Memory

Motivation

An abstract view of a computer system
Levels of abstraction, from lowest to highest:
- Physics: electrons
- Devices: transistors, diodes
- Analog circuits: amplifiers, filters
- Digital circuits: gates
- Logic: adders, memory
- Microarchitecture: data paths, controllers
- Architecture: instructions, registers
- Operating system: device drivers, kernels
- Application software: programs

Another view
- Hardware
- System software
- Applications

Eight great ideas in computer design
(from Patterson and Hennessy’s “Computer Organization and Design”)
1. design for Moore’s Law
2. use abstraction to simplify design
3. make the common case fast
4. performance via parallelism
5. performance via pipelining
6. performance via prediction
7. hierarchy of memories
8. dependability via redundancy

Moore’s law
The number of transistors in a cost-effective integrated circuit doubles every 18-24 months.

Chip manufacturing process
[Figure from Patterson and Hennessy’s “Computer Organization and Design”]
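
To get a feel for what a doubling every 18-24 months means, the short Python sketch below computes the growth factor implied by Moore's law over a given number of years. The function name `transistor_growth` and the 10-year horizon are illustrative choices, not part of the lecture.

```python
def transistor_growth(years: float, doubling_months: float) -> float:
    """Growth factor of the transistor count after `years`,
    assuming one doubling every `doubling_months` months."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings


# Compare the two ends of the 18-24 month range over one decade.
for months in (18, 24):
    factor = transistor_growth(10, months)
    print(f"doubling every {months} months -> about {factor:.0f}x in 10 years")
```

The gap between the two results shows how sensitive an exponential is to the doubling period.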

Performance
- what is the performance of a computer?
- response time vs throughput
- hardware vs software performance
- energy per instruction
- measuring performance

[Figure from Patterson and Hennessy’s “Computer Organization and Design”]

General architecture
In a very simplistic view, Computer = Central Processing Unit + Memory
- von Neumann architecture: the CPU is connected to a single memory that holds both data and instructions
- Harvard architecture: the CPU is connected to two separate memories, one for data and one for instructions

Central Processing Unit (CPU)
[Figure: the CPU, containing the Control Unit and the Arithmetic and Logic Unit (ALU), connected to an input device, an output device and the memory unit]

Central Processing Unit (CPU)
The CPU executes instructions read from memory:
- instructions for loading and storing values
- instructions that operate on values from registers, e.g. additions, bitwise operations, math functions etc.
- branching instructions
- etc.

CPU
Registers: internal (to the CPU) memory cells
[Figure: CPU with its registers, next to the instructions and the memory]
    Instructions:              Memory (address | value):
      R1=100                     0x090 | 0
      R2=LOAD 0x100              0x100 | 10
      R3=ADD R1,R2               0x110 | 110
      STORE 0x110=R3             0x120 | 0
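
The register example above can be traced with a toy interpreter. This is only a sketch under stated assumptions: the tuple encoding of instructions, the `SET` opcode (used here for `R1=100`) and the function `run` are invented for illustration, and address 0x110 is assumed to hold 0 before the program runs, so that the 110 shown in the diagram is the value after the STORE.

```python
def run(program, memory):
    """Execute a list of instruction tuples and return the register contents."""
    registers = {}
    for op, *args in program:
        if op == "SET":        # SET reg, constant (hypothetical opcode for R1=100)
            reg, value = args
            registers[reg] = value
        elif op == "LOAD":     # LOAD reg, address: copy memory -> register
            reg, addr = args
            registers[reg] = memory[addr]
        elif op == "ADD":      # ADD dest, src1, src2: works on registers only
            dest, a, b = args
            registers[dest] = registers[a] + registers[b]
        elif op == "STORE":    # STORE address, reg: copy register -> memory
            addr, reg = args
            memory[addr] = registers[reg]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return registers


# Memory contents before execution (0x110 assumed to start at 0).
memory = {0x090: 0, 0x100: 10, 0x110: 0, 0x120: 0}
program = [
    ("SET", "R1", 100),
    ("LOAD", "R2", 0x100),
    ("ADD", "R3", "R1", "R2"),
    ("STORE", 0x110, "R3"),
]
registers = run(program, memory)
print(registers)
print(hex(0x110), "|", memory[0x110])
```

Note that the addition never touches memory directly: values are first loaded into registers, combined there, and only then stored back.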

Speed, clock, cycles
- internal clock: used to keep the operations synchronized
- the frequency of the clock (in MHz, or GHz nowadays) gives the speed of the CPU: one operation may start on each tick
[Figure: clock signal (amplitude over time) with one cycle marked]

Instruction cycle
Main steps in executing an instruction:
- fetch: read the instruction from memory
- decode: figure out what to do
- execute: take values from registers and execute the instruction
- store: save the result in a register
Fetch → Decode → Execute → Store

CPU: more details
[Figure: CPU block diagram; labels: Instruction Decode, Integer Register File, Floating Point Register File, ALU, AGU, FP (* / + -), SSE/MMX (etc), Load, Store, Cache, program code, CPU, RAM]
- register: fast internal storage; small, several bytes per register
- register file: the set of similar registers within the CPU
- registers are specialized: storing integers, floating point values, instructions, addresses etc.
- AGU: address generation unit, handles data access

CPU: pipelines
[Figure: Instructions 1 to 5, each passing through Fetch, Decode, Execute and Store, with each instruction starting one time step after the previous one; time axis t1 to t7]
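
The benefit of pipelining can be sketched by counting cycles. The code below assumes an idealized pipeline in which every stage takes exactly one cycle and there are no stalls; the names `pipeline_schedule` and `total_cycles` are illustrative.

```python
STAGES = ["Fetch", "Decode", "Execute", "Store"]


def pipeline_schedule(n_instructions, stages=STAGES):
    """Map each cycle to the (instruction, stage) pairs active in it.

    Instruction i (counting from 1) enters the first stage in cycle i
    and advances by one stage per cycle.
    """
    schedule = {}
    for i in range(n_instructions):
        for s, stage in enumerate(stages):
            schedule.setdefault(i + s + 1, []).append((i + 1, stage))
    return schedule


def total_cycles(n_instructions, n_stages=len(STAGES), pipelined=True):
    """Cycles needed to finish all instructions, with or without overlap."""
    if n_instructions == 0:
        return 0
    if pipelined:
        return n_stages + n_instructions - 1
    return n_stages * n_instructions


for cycle, active in sorted(pipeline_schedule(5).items()):
    print(f"t{cycle}: " + ", ".join(f"I{i} {stage}" for i, stage in active))

print("pipelined:", total_cycles(5), "cycles")
print("sequential:", total_cycles(5, pipelined=False), "cycles")
```

Once the pipeline is full, one instruction completes per cycle even though each individual instruction still needs four cycles from fetch to store.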

CPU: CISC vs RISC
CISC: Complex Instruction Set Computer
- the original style of instruction set architecture (ISA)
- one instruction may take several cycles
- emphasizes hardware over software
- complex instructions (e.g. memory-to-memory, with LOAD/STORE built into the instruction)
- shorter programs
- high cycles per second
RISC: Reduced Instruction Set Computer
- improvement on CISC
- one clock cycle per instruction
- emphasis on software
- register-to-register operations, with LOAD/STORE as separate instructions
- uses many internal registers
- low cycles per second

CPU: CISC vs RISC
Example: compute A × B. Assume A is stored at memory location 1200 and B at 1201. The following instructions perform the multiplication and store the result at the first memory location.
CISC:
    MUL 1200,1201
RISC:
    Load A, 1200
    Load B, 1201
    Mul A, B
    Store 1200, A

CPU: multilevel cache
- cache: fast memory closer to the CPU
- improves data access speed by reducing the miss penalty
[Figure: hierarchy CPU, L1 cache, L2 cache, L3 cache, main memory (RAM); axes: price, storage space]

Moving bits and bytes - data buses
- a (computer) bus refers to the hardware and protocols for transferring data
- internal buses: data (memory) bus, system bus, control bus, etc.
- external (expansion) buses: connect devices to the computer
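
Looking back at the multilevel cache slide, one common way to quantify how extra cache levels reduce the miss penalty is the average memory access time: the hit time of a level plus its miss rate times the cost of going one level further out. The slides do not give this formula or any numbers, so the sketch below is an illustration only; the function name and all latencies and miss rates are made-up assumptions.

```python
def average_access_time(levels, memory_time):
    """Average access time in cycles.

    levels: list of (hit_time, miss_rate) pairs ordered from L1 outward,
            where miss_rate is the fraction of accesses reaching that
            level which miss in it.
    memory_time: access time of main memory (RAM).
    """
    penalty = memory_time
    # Work from the level closest to RAM back toward the CPU.
    for hit_time, miss_rate in reversed(levels):
        penalty = hit_time + miss_rate * penalty
    return penalty


RAM = 200            # assumed RAM latency in cycles
L1 = (4, 0.10)       # assumed: 4-cycle hit, 10% of accesses miss
L2 = (12, 0.50)      # assumed: 12-cycle hit, 50% of L1 misses also miss here
L3 = (40, 0.50)      # assumed: 40-cycle hit, 50% of L2 misses also miss here

for name, levels in [("no cache", []), ("L1", [L1]),
                     ("L1+L2", [L1, L2]), ("L1+L2+L3", [L1, L2, L3])]:
    print(f"{name:10s} {average_access_time(levels, RAM):6.1f} cycles")
```

With these assumed numbers, each added level lowers the average cost of an access, because a miss in one level is more often served by the next cache instead of RAM.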

Parallelism
SMP: symmetric multiprocessor systems
Advantages:
- increased throughput
- redundancy, hence reliability
- easy configuration
- more processes executing at the same time: multiprocessing
Drawbacks:
- increased traffic over the bus, longer distances between two CPUs
- risk of bottlenecks on shared resources
- coordination becomes much more complex

Parallelism
Multicore
Example: OpenSPARC (Sun Microsystems)

Parallelism
Multicore
Advantages:
- run instructions in parallel on different cores
- the cores usually sit on a single die, or on multiple dies in a single chip package
- more energy efficient: higher performance at lower energy
- less traffic, shorter distances than SMP
Drawbacks:
- overhead in writing specific code
- a dual-core processor does not run at 2× the speed of a single processor, but at 60%-80% more speed
- some operating systems still do not exploit multiple cores

Memory organization
Computer = Central Processing Unit + Memory
[Figure: main memory in the system; CORE 0 to CORE 3, each with its own L2 cache (L2 CACHE 0 to L2 CACHE 3), a shared L3 cache, the DRAM interface, the DRAM memory controller and the DRAM banks]
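
Returning to the multicore drawback that two cores give 60%-80% more speed rather than 2×: one standard way to reason about such figures is Amdahl's law, which the slides do not name. It assumes a fraction p of the work can be spread over n cores while the rest stays serial. The function names below are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup on `cores` cores when `parallel_fraction` of the work
    can run in parallel and the remainder is serial (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def parallel_fraction_for(speedup: float, cores: int) -> float:
    """Invert Amdahl's law: the parallel fraction that yields `speedup`."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


# Which parallel fractions correspond to 60% and 80% more speed on 2 cores?
for gain in (1.6, 1.8):
    p = parallel_fraction_for(gain, 2)
    print(f"{gain:.1f}x on 2 cores <-> about {p:.0%} of the work parallel")

# The same workloads on 4 cores, under the same model.
for p in (0.75, 8 / 9):
    print(f"p = {p:.2f}: {amdahl_speedup(p, 4):.2f}x on 4 cores")
```

Under this model the serial part of a program, not the number of cores, limits the achievable speedup.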

Wishes:
- instantaneous access to any bit (0-latency)
- infinite capacity
- cheap (i.e. 0$)
- infinite bandwidth
Reality:
- larger memory is slower: more time to locate the desired position
- faster memory is more expensive (SRAM vs DRAM)
- larger bandwidth is more expensive

Memory technology
SRAM: Static Random Access Memory
- per bit: 2 transistors for access, 4 transistors for storage
- it keeps state as long as the power is on
DRAM: Dynamic Random Access Memory
- per bit: 1 capacitor, 1 access transistor
- loses charge over time → needs refresh cycles

- level 0 (volatile): CPU registers: data for instructions, etc.
- level 1 (volatile): L1 cache: SRAM, separate data and instruction space, KBs/core
- level 2 (volatile): L2/L3 cache: SRAM, normally within the same chip as the CPU, MBs/core
- level 3 (volatile): main memory: usually DRAM; tens of GB (less often hundreds of GB or 1 TB); in embedded devices it could be SRAM (KBs to MBs in size)
- level 4 (permanent): disks, SSDs; TBs in size

Memory - other storage media
- Floppy disks - now mostly extinct
- Magnetic tapes - still relevant since the 1950s

Flash memory
- non-volatile electronic memory that can be electrically reprogrammed
- based on NAND or NOR gates
- limited number of write/erase cycles
- data degradation over time

Questions?