commercially available high-performance processors in September, 1996. There is a distinct possibility that they will be somewhat out of date by the time you are reading this (even if it is only October, 1996!). Even in 1996, some very large machines were larger or faster than the figures below would indicate.
One of the most important considerations in understanding the performance capabilities of a modern processor is the memory hierarchy. We can classify memory based on its "distance" from the processor: here distance is measured by the number of machine cycles required to access it. As memory gets further from the processor (i.e. becomes slower to access), the number of words of it in a typical system increases. Some indicative numbers for 1996 processors would be:
Name | Access Time (cycles) | Number of words |
---|---|---|
Register | 1 | 32 |
Cache Level 1 | 2 | 16 × 10^3 |
Cache Level 2 | 5 | 0.25 × 10^6 |
Main memory | 30 | 10^8 |
Disc | 10^6 | 10^9 |
In 1996, high performance processors had clock frequencies of 200-400 MHz, i.e. cycle times of 2.5-5.0 nanoseconds.
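The relationship between clock frequency and cycle time can be sketched in a few lines of Python. This is only an illustration: the function names and the `ACCESS_CYCLES` dictionary are made up for this example, and the cycle counts are the indicative 1996 figures from the table, not measurements of any particular machine.

```python
# Indicative access times in cycles (figures from the table of 1996 processors).
ACCESS_CYCLES = {
    "Register": 1,
    "Cache Level 1": 2,
    "Cache Level 2": 5,
    "Main memory": 30,
    "Disc": 10**6,
}


def cycle_time_ns(clock_mhz):
    """Return the cycle time in nanoseconds for a clock frequency in MHz."""
    return 1000.0 / clock_mhz


def access_times_ns(clock_mhz):
    """Convert each level's access time from cycles to nanoseconds."""
    t = cycle_time_ns(clock_mhz)
    return {name: cycles * t for name, cycles in ACCESS_CYCLES.items()}


if __name__ == "__main__":
    for mhz in (200, 400):
        print(f"{mhz} MHz: cycle time {cycle_time_ns(mhz):.1f} ns")
        for name, ns in access_times_ns(mhz).items():
            print(f"  {name:<14} {ns:>12.1f} ns")
```

At 200 MHz the cycle time is 5.0 ns, so a 30-cycle main memory access takes about 150 ns; at 400 MHz the same 30 cycles take 75 ns.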
Many high performance systems will have a number of levels of cache: a small level 1 cache "close" to the processor (typically needing 2 cycles to access) and as many as 2 more levels of successively slower and larger caches built from high performance (but expensive!) memory chips.
For state-of-the-art (2 cycle access) performance, a processor needs to have the level 1 cache on the same die as the processor itself. Sizes of 16 Kwords (64 Kbytes, often separated into instruction and data cache) were common.
The bus between the cache and main memory is a significant bottleneck: system designers usually organise the cache as a set of "lines" of, typically, 8 words. A whole line of 8 contiguous memory locations will be fetched into the cache each time it is updated. (8 words will be fetched in a 4-cycle burst - 2 words in each cycle on a 64-bit bus.) This means that when one memory word is fetched into the cache, 7 of its neighbours will also be fetched. A program which is able to use this factor (by, for instance, keeping closely related data in contiguous memory locations) will make more effective use of the cache and see a reduced effective memory access time.
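The effect of cache lines on access patterns can be illustrated with a very rough model. The sketch below is not a description of any real cache: it assumes a hypothetical cache large enough that no line is ever evicted, charges a flat 30 cycles for every line fetched from main memory and 2 cycles for every hit (the indicative figures from the table), and assumes 4-byte words (implied by 16 Kwords = 64 Kbytes). All function and constant names are invented for the example.

```python
LINE_WORDS = 8    # words per cache line (figure from the text)
WORD_BYTES = 4    # assumed word size: 16 Kwords = 64 Kbytes
HIT_CYCLES = 2    # level 1 cache access time (indicative)
MISS_CYCLES = 30  # main memory access time (indicative)


def burst_cycles(line_words=LINE_WORDS, word_bytes=WORD_BYTES, bus_bits=64):
    """Bus cycles needed to transfer one cache line over a bus of bus_bits."""
    return (line_words * word_bytes * 8) // bus_bits


def count_line_fetches(addresses, line_words=LINE_WORDS):
    """Count the distinct cache lines touched by a sequence of word addresses.

    With no evictions (an assumption of this model), each distinct line is
    fetched from main memory exactly once.
    """
    return len({addr // line_words for addr in addresses})


def effective_access_cycles(addresses, line_words=LINE_WORDS):
    """Average cycles per access: misses cost MISS_CYCLES, hits HIT_CYCLES."""
    addresses = list(addresses)
    misses = count_line_fetches(addresses, line_words)
    hits = len(addresses) - misses
    return (misses * MISS_CYCLES + hits * HIT_CYCLES) / len(addresses)


if __name__ == "__main__":
    contiguous = range(1024)            # 1024 neighbouring words
    strided = range(0, 1024 * 8, 8)     # 1024 words, one per cache line
    print("burst cycles per line:", burst_cycles())
    print("contiguous:", effective_access_cycles(contiguous), "cycles/access")
    print("strided:   ", effective_access_cycles(strided), "cycles/access")
```

Under these assumptions, 1024 contiguous words touch only 128 lines and average 5.5 cycles per access, whereas 1024 words spaced 8 words apart each fall in a different line and average the full 30 cycles.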
At least one processor (DEC's Alpha) has a larger level 2 cache on the processor die. Other systems place the level 2 cache on a separate die within the same physical package (such packages are sometimes referred to as multi-chip modules).
The large gap between access times (a factor of 10^4) for the last two levels of the hierarchy is probably one of the factors driving DRAM research and development towards higher density rather than higher speed. However, work on cache-DRAMs and synchronous DRAM is pushing DRAM access times down.
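To see how the main memory to disc gap compares with the others, the ratio between each pair of neighbouring levels can be computed directly from the indicative cycle counts. The `LEVELS` list and `adjacent_gaps` function are names invented for this sketch.

```python
# Indicative 1996 access times in cycles, fastest level first.
LEVELS = [
    ("Register", 1),
    ("Cache Level 1", 2),
    ("Cache Level 2", 5),
    ("Main memory", 30),
    ("Disc", 10**6),
]


def adjacent_gaps(levels=LEVELS):
    """Return (faster, slower, ratio) for each pair of neighbouring levels."""
    return [
        (fast, slow, slow_cycles / fast_cycles)
        for (fast, fast_cycles), (slow, slow_cycles) in zip(levels, levels[1:])
    ]


if __name__ == "__main__":
    for fast, slow, ratio in adjacent_gaps():
        print(f"{fast} -> {slow}: {ratio:,.1f}x slower")
```

Every step up to main memory is less than a factor of 10; the step from main memory to disc is of the order of 10^4.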