CPU Simulator

The CPU Simulator incorporates data and instruction cache simulators as well as a 5-stage CPU instruction pipeline simulator. It supports multiple CPU simulations in shared memory or loosely coupled architectures. The CPU instructions are generated by the compiler.


The CPU Simulator is loosely based on the Reduced Instruction Set Computer (RISC) architecture. It has a prominent register file of 8 to 64 configurable fast registers, a minimal set of variable-length instructions (pure RISC has fixed-length instructions), a limited number of addressing modes, data and instruction caches, and a 5-stage instruction pipeline. Except for two instructions, load and store, the CPU instruction set is based on register-to-register addressing. CPU instructions can be entered manually by selecting valid instructions and operands from lists, and the addressing mode of each operand can be specified at the same time. The selected assembler instruction is then added to the CPU instruction memory. The stored instructions can be selected individually and executed manually one by one, or run as a program. The simulator provides runtime debugging facilities for the selected instructions, registers and memory locations. A stack is provided to demonstrate support for interrupts, system calls, subroutine parameters, saving register values between subroutine calls, and return addresses.

A further refinement to the CPU simulator is the inclusion of cache and pipeline simulations, both of which are highly configurable and visual. These advanced simulators can be used to demonstrate technology-specific details and their impact on system performance. The cache placement and replacement policies can be selected, and the hit/miss ratios for different cache organizations can be plotted and compared. The pipeline stages are colour coded and animated. Different methods of eliminating pipeline hazards can be demonstrated, together with their effect on performance. A history of pipeline activity is maintained and can be used to investigate the stages of the pipeline.

To support the study of systems with multiple processors, e.g. multicore processors, the simulator can optionally start multiple processor simulations. The processors are identical, and code loaded into one can optionally be duplicated in the others, thus simulating shared-memory (tightly coupled) architectures. It is also possible to configure the CPU simulators as loosely coupled architectures. The processors can be used to demonstrate load balancing and virtualization with multiple operating systems. The CPU simulator defines a list of vectored interrupts. Each interrupt vector is triggered by a pre-defined event, e.g. console input or a timer event. The inbuilt high-level language has constructs for defining interrupt routines as interrupt handlers; their addresses are placed in the interrupt vectors at program load time.


Caches and Pipeline

Main Memory (RAM): The simulator data memory for each process is shown as a maximum of 10 pages, where each page is 256 bytes in size (in real hardware pages are much larger). The contents are shown in hexadecimal and also in printable form: bytes that represent printable characters are shown as those characters, and all other bytes are shown as dots. The hex data is arranged in rows of 16 bytes and can be modified manually. Note that data held in the data cache may not appear in the data memory until the cache is flushed, or until the cache replacement policy copies the data from the cache to main memory because the cache space is being re-used by new data. This is not necessary for the instruction cache, as instructions are not modified.
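
The memory display described above can be sketched in a few lines of Python. This is only an illustration of the layout (rows of 16 bytes, hex on the left, printable characters or dots on the right); the names PAGE_SIZE, ROW_WIDTH, format_row and dump_page are hypothetical and are not taken from the simulator.

```python
PAGE_SIZE = 256  # bytes per page, as stated in the text
ROW_WIDTH = 16   # bytes shown per row, as stated in the text


def format_row(offset, chunk):
    """Format one row: offset, hex bytes, then the printable view."""
    hex_part = " ".join(f"{b:02X}" for b in chunk)
    # Printable ASCII (space to tilde) is shown as-is, anything else as a dot.
    text_part = "".join(chr(b) if 32 <= b <= 126 else "." for b in chunk)
    width = ROW_WIDTH * 3 - 1  # pad short rows so the text column lines up
    return f"{offset:04X}  {hex_part:<{width}}  {text_part}"


def dump_page(page):
    """Return the display rows for one page of memory."""
    return [format_row(i, page[i:i + ROW_WIDTH])
            for i in range(0, len(page), ROW_WIDTH)]


page = bytearray(PAGE_SIZE)  # a zero-filled page
page[0:5] = b"Hello"         # simulate a manual modification
rows = dump_page(page)
print(rows[0])
```

A 256-byte page produces 16 such rows; the zero bytes after "Hello" appear as dots in the right-hand column.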

Page Table: The page table displays the list of pages belonging to a process. Each entry in the table gives page statistics, including whether or not the page is swapped out. Each page can be swapped in and out manually. A count of page faults indicates the frequency of page swapping caused by memory space shortages as a result of multitasking.
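
As a rough illustration of the bookkeeping involved, the sketch below models a page table whose entries record a swapped-out flag and whose fault counter goes up whenever a swapped-out page is accessed. The class and field names (PageEntry, PageTable, accesses, page_faults) are assumptions made for this example and do not reflect the simulator's internals.

```python
from dataclasses import dataclass


@dataclass
class PageEntry:
    page_number: int
    swapped_out: bool = False
    accesses: int = 0  # a simple per-page statistic (assumed for illustration)


class PageTable:
    def __init__(self, num_pages):
        self.entries = [PageEntry(n) for n in range(num_pages)]
        self.page_faults = 0

    def swap_out(self, n):
        """Manually mark page n as swapped out."""
        self.entries[n].swapped_out = True

    def access(self, n):
        """Access page n; a swapped-out page causes a fault and is swapped in."""
        entry = self.entries[n]
        if entry.swapped_out:
            self.page_faults += 1
            entry.swapped_out = False
        entry.accesses += 1


table = PageTable(10)  # up to 10 pages per process, as in the text
table.swap_out(3)
table.access(3)        # page fault: page 3 is swapped back in
table.access(3)        # no fault this time
table.access(0)
print(table.page_faults)
```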

Caches: The simulated CPU has two separate caches: a data cache and an instruction cache. This split arrangement is sometimes called the Harvard architecture. It is an attempt to mitigate the so-called von Neumann bottleneck, where instructions and data share the same storage and the same path to it. The cache tutorial is designed to demonstrate the various aspects of modern caches. For example, a special mechanism is needed to keep the caches of multiple CPUs in, say, a multicore processor in step with each other; this is known as cache coherency. One such mechanism uses the MESI protocol, which is implemented in the data cache simulator (available in version 8.5, which will be made available soon).
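
To give a feel for what a coherency protocol such as MESI does, here is a minimal sketch that tracks the state of a single cache line across several CPUs. The four states (Modified, Exclusive, Shared, Invalid) are those of the standard protocol; the function names and the list-of-states representation are assumptions for this example, and the sketch ignores the data itself and the bus traffic a real cache, or the simulator, would model.

```python
def read(states, cpu):
    """CPU `cpu` reads the line; `states` holds one MESI state per CPU."""
    if states[cpu] != "I":
        return  # read hit: the state does not change
    holders = [i for i, s in enumerate(states) if i != cpu and s != "I"]
    for i in holders:
        # A Modified copy would be written back first; M, E and S all end as Shared.
        states[i] = "S"
    # Exclusive if no other cache holds the line, otherwise Shared.
    states[cpu] = "S" if holders else "E"


def write(states, cpu):
    """CPU `cpu` writes the line: all other copies are invalidated."""
    for i in range(len(states)):
        if i != cpu:
            states[i] = "I"
    states[cpu] = "M"


line = ["I", "I"]  # one cache line, two CPUs, nothing cached yet
read(line, 0)      # CPU 0 is the only holder
first = list(line)
read(line, 1)      # CPU 1 also reads, so both copies become Shared
second = list(line)
write(line, 0)     # CPU 0 writes, so CPU 1's copy is invalidated
third = list(line)
print(first, second, third)
```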

Pipeline: A 5-stage instruction pipeline is simulated to demonstrate the way multiple instructions are executed in parallel rather than in sequence. For example, once the pipeline is full, a CPU with a 5-stage pipeline can complete one instruction per clock cycle, i.e. 5 instructions in 5 clock cycles. A CPU without a pipeline, where each instruction passes through all 5 stages before the next one starts, needs 25 clock cycles for the same 5 instructions, i.e. it is 5 times slower than the pipelined CPU. This is the theory. In reality things are a little less straightforward, e.g. jump instructions can upset a pipeline, increasing the number of clock cycles per instruction. The pipeline simulator is designed to demonstrate these aspects of computer architecture. The pipeline simulator tutorial is designed to show the different aspects of modern pipelines, including operand forwarding and jump prediction optimisers.
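
The arithmetic behind this comparison can be written out as a small sketch. It assumes every stage takes exactly one clock cycle and treats hazards simply as a number of extra stall cycles; the function names are hypothetical. Note that starting from an empty pipeline, the first instruction still needs all 5 cycles, so the ideal 5-times speed-up is only approached for long instruction sequences.

```python
def cycles_unpipelined(n, stages=5):
    """Each instruction passes through every stage before the next one starts."""
    return n * stages


def cycles_pipelined(n, stages=5, stall_cycles=0):
    """Starting from an empty pipeline: `stages` cycles for the first
    instruction, then one more cycle per further instruction, plus stalls."""
    if n == 0:
        return 0
    return stages + (n - 1) + stall_cycles


for n in (5, 1000):
    slow = cycles_unpipelined(n)
    fast = cycles_pipelined(n)
    print(n, slow, fast, round(slow / fast, 2))
```

For 5 instructions this gives 25 cycles against 9; for 1000 instructions it gives 5000 against 1004, which is close to the theoretical factor of 5. Passing a non-zero stall_cycles shows how hazards such as jumps erode that gain.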
