Why did we need GPUs

Introduction

Moores law is the prediction that the numbers of transistors per unit area would double every 18-24 months.

This was been true until 2005.

The transistors are getting harder to make smaller due to the physical limitations.
[look up] why is it actually getting harder? -> physics was an issue, heat became too high to realistically rely on cooling to solve the problem

When they get smaller it is easier to switch them on and off faster, so we can increase the clock frequency faster.

The frequency (clock rate) followed the same trend as a result but this lasted only until 2005.

Even if the transistors were doubling we stopped trying to increase the frequency because it takes more power to do this. Burning more power generates more heat thus needing cooling or the CPU would begin melting.

There was a bottleneck because of the cooling technology we had.

Single thread performance also started to stagnate but because of compiler advancements and improvements in architecture there were still some improvements. [look up] speculative execution, branch prediction, out of order execution

Because the single thread performance was stagnating but we were still getting more transistors. We started to use extra transistors to make the CPU cores more complex, better at single threaded performance, we hit physical and architectural walls (power, ILP (Instruction-Level Parallelism), heat). So we began to use those transistors to build many simpler cores with more ALUs, focused on parallelism rather than per-core complexity.

This all happened at around 2005.

Before the slow down in single thread performance, it was possible to just see massive performance gains by running an older program on a new processor. However, to now see performance gains in software it was now required to update programs to take advantage of parallel processing.

Design approaches

Latency-Oriented Design: minimize the time it takes for a single task
Throughput-Oriented Design: maximise the number of task that can be performed in a given time frame

CPU: Latency-Oriented Design

has few powerful ALU to reduce operation latency
large caches reduce the missrate of the CPU,
sophisticated control units for branch prediction to reduce control hazards
- data forwarding to reduce data hazards
- out-of-order operation to reduce latency

[look up] n-bit multi cycle and single cycle why do smaller ALUs give higher latency for operations?

GPU: Throughput-Oriented Design

ALU are much smaller and have higher latency but have higher throughput. [look up] Heavily pipelined for further throughput.
small caches so we can dedicate more area in the silicon to compute
- memory access operations take longer as a result
simpler control units so we have more area dedicated to computation

minimize stalls in the pipeline with scheduling instructions with compiler techniques.

On the hardware side we can do out-of-order ROB [look up] they can also do multithreading to hide short latency

use the same core to execute multiple threads
hyperthreading commonly uses 2 threads running at the same time in the same core
CPU have a modest amount of multithreading

On the GPU we have a massive number of threads to hide the high latency

many more cores means more threads that can be used
more threads can be used on the same core, instead of just 2 we could use 32

CPU have high clock frequency while GPU have moderate clock frequency

What is a GPU

In graphics we don’t care about the speed at which we can render 1 pixel, we care more about rendering as many pixels as possible at the same time.

People realised that GPUs althought built for graphics, were actually great at general purpose computing as well. However, before 2007 there were only graphics APIs to program on GPU like openGL.

So people had to reformulate their computation as functions that operate on pixels to do their operations.

NVIDIA then released CUDA as a result. A programming interface to use the power of GPUs in a general purpose way. This still required extensions to the GPU architecture.

2007 marks the beginning of the modern computing era.

Why did GPUs succeed?

Chips are very expensives to build and require a large volume of sales to balance the costs.

this makes the chip market hard to get into and succeed in
when parallel computing becamse mainstream, GPUs were already being used in the gaming sector so it gave them a large head start compared to other potential massively parallel accelerators

One issue is that because the gaming sector is the largest market share for GPUs, if there is a new advancement that could improve performance in scientefic computing but would negatively impact the performance in gaming, the likelyhood of the manufacturer going ahead with it is low.

[look up] tenstorrent

References

PMPP lecture 1