How a Computer’s CPU Cache Works

In the 1980s, microprocessor speeds began to increase far faster than memory access times. It quickly became evident that something had to be done to improve the speed with which memory could be accessed and make the entire system more efficient. That growing discrepancy between processing speed and memory speed led to the development of the cache.

The invention of the cache was one of the most critical events in the history of computer science. But what exactly is the cache? How does it work?


At its most basic level, a cache is a small pool of very fast memory. It holds the data and instructions the computer will most likely need next when carrying out a particular task, loaded using prediction algorithms and knowledge of how programs tend to execute. The purpose of a cache is to make sure the CPU has unhindered access to the data it needs, in the order it needs it.

To see how this works, you need to know that computers have three types of memory. First there is the primary storage found on the hard drive or the SSD. It is the largest repository of memory in the machine. Then there is the RAM, or Random Access Memory, which is faster, but smaller, than the primary storage device. Lastly, there are memory units within the CPU itself, known as the cache. The cache is the fastest of all the memory types.
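The three tiers trade capacity for speed. The sketch below models that trade-off with rough, illustrative orders of magnitude; the latency figures are commonly cited ballpark numbers, not measurements from any specific machine.

```python
# A toy model of the memory hierarchy described above.
# Capacities and latencies are rough illustrative figures only.
MEMORY_HIERARCHY = [
    # (name, typical capacity, approximate access latency in nanoseconds)
    ("CPU cache", "KB-MB", 1),
    ("RAM", "GB", 100),
    ("SSD / hard drive", "GB-TB", 100_000),
]

for name, capacity, latency_ns in MEMORY_HIERARCHY:
    print(f"{name:18} capacity ~{capacity:6}  latency ~{latency_ns:,} ns")
```

Each step down the list holds roughly a thousand times more data but takes on the order of a hundred times longer to answer, which is why the CPU wants its working set as high up the list as possible.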

When a program launches, that program begins to execute a series of instructions found in the program’s code. That information first loads into the RAM and then moves on to the CPU. To best use the data to carry out the instructions, the CPU needs a high-speed memory. That’s where the cache comes in.

Within the CPU, there are three different levels of cache: the L1, L2, and L3. Some companies are even working on an L4 cache.

The L1 cache is the fastest and smallest of the three. It contains the data the CPU is most likely to need to perform its operations. The L1 usually holds around 256KB in total across a chip's cores, although some designs have pushed it up to 1MB.


This small cache has a dual purpose, being split into an instruction cache and a data cache. The instruction cache holds the operations the CPU has to perform, and the data cache holds the information those operations work on.

Next, there is the L2 cache. The L2 is slower but holds more than the L1. It contains between 256KB and 8MB of data that the computer will most likely need to access next.

Lastly, we see the L3 cache. It is the largest and slowest cache, storing anywhere from 4MB to 50MB.

When a program starts on your computer, data flows from the RAM to the L3 cache, then to the L2, and finally to the L1. While the program is running, the CPU looks for the information it needs starting in the L1 cache and working outward from there. If the CPU finds the needed information, it's called a cache hit. If it cannot find the information, it's a cache miss, and the computer has to go looking in the next level down to find what it needs.
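The lookup order above can be sketched as a simple simulation. The addresses and cache contents here are invented for illustration; real caches work on fixed-size lines and hardware tags, but the search order is the same.

```python
# A minimal sketch of the lookup order described above: the CPU checks
# L1 first, then L2, then L3, and finally falls back to RAM on a miss.

def find_data(address, l1, l2, l3, ram):
    """Return (value, where_found), searching the hierarchy in order."""
    for name, level in (("L1 hit", l1), ("L2 hit", l2), ("L3 hit", l3)):
        if address in level:
            return level[address], name
    # Every cache level missed: fetch from RAM (far slower on real hardware).
    return ram[address], "cache miss (RAM)"

# Invented example contents; RAM holds everything, caches hold subsets.
l1 = {0x10: "add"}
l2 = {0x20: "load"}
l3 = {0x30: "store"}
ram = {0x40: "jump", **l1, **l2, **l3}

print(find_data(0x10, l1, l2, l3, ram))  # ('add', 'L1 hit')
print(find_data(0x40, l1, l2, l3, ram))  # ('jump', 'cache miss (RAM)')
```

On a hit at any level, the search stops immediately; only a miss at every level forces the slow trip out to RAM.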


Latency is an important factor in the efficiency of a computer. Latency is the time needed to retrieve a piece of information. The L1 cache is the fastest, and therefore it has the lowest latency. When a cache miss occurs, latency increases, because the computer must keep searching through the slower caches to find the information it needs.
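A standard way to put a number on this is average memory access time (AMAT): the hit time plus the miss rate times the miss penalty. The figures below are illustrative assumptions, not measurements, but they show how even a modest miss rate dominates the average.

```python
# Average memory access time:
#     AMAT = hit_time + miss_rate * miss_penalty
# Numbers below are illustrative assumptions only.

def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    return hit_time_ns + miss_rate * miss_penalty_ns

# Assume an L1 hit takes ~1 ns and a miss that goes all the way
# to RAM costs an extra ~100 ns.
print(amat(1.0, 0.05, 100.0))  # 6.0  -> a 5% miss rate sextuples access time
print(amat(1.0, 0.50, 100.0))  # 51.0 -> at 50% misses, the penalty dominates
```

This is why hardware designers spend so much effort keeping miss rates low: the penalty term is so large that small changes in the miss rate swing the average dramatically.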

Newer computers use much smaller CPU transistors, which has made it possible to place larger caches directly on the CPU die itself. Physically putting the cache closer to the processor's cores reduces latency.

Although the cache is not a specification that computer sellers often highlight, it's worth checking into. A faster, larger cache means lower latency, making your programs run faster and more efficiently.
