Why Amdahl's Law is important

2022 January 05 01:13 Kostadin

Amdahl's Law tells us how much, in theory, the time it takes to complete a task can be reduced if portions of the task are executed in parallel by multiple task executors (processors, cores, threads) instead of a single one.

theoretical_speedup = 1 / ((1 - p) + p/s)


p = original_execution_time_of_parallel_section_on_a_single_core / total_execution_time_of_the_entire_task_on_a_single_core


s = original_execution_time_of_parallel_section_on_a_single_core / execution_time_of_parallel_section_on_all_available_cores
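The formula can be wrapped in a small helper function (a minimal Python sketch; the name `amdahl_speedup` is my own, and `p` and `s` simply mirror the definitions above):

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Theoretical speedup when a fraction p of the single-core
    execution time is parallelisable and that section runs s times
    faster on the available cores."""
    return 1.0 / ((1.0 - p) + p / s)

# Half the work parallelisable, parallel section 4x faster:
print(amdahl_speedup(0.5, 4))  # 1.6
# Even with an absurdly high s, the speedup stays below 1 / (1 - p) = 2:
print(amdahl_speedup(0.5, 1_000_000))
```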

As you throw more and faster cores at the problem of executing the task, the theoretical_speedup approaches 1 / (1 - p) but never reaches it.

An example: You have a task that takes 4 seconds to complete on a single core. 1 second of it (25%) must be executed serially, and the remaining 3 seconds (75%) can be parallelised. On all available cores, the parallel section completes in 0.3 seconds.

We plug in the numbers:

p = 3 seconds / 4 seconds = 0.75

s = 3 seconds / 0.3 seconds = 10

theoretical_speedup = 1 / ((1 - 0.75) + 0.75 / 10) = 1 / (0.25 + 0.075) = 1 / 0.325 = approx. 3.0769 times faster

Even if we throw infinitely more cores at the problem so that s approaches infinity, the theoretical_speedup will approach 1 / (1 - 0.75) = 1 / 0.25 = 4 but never really achieve it.
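The worked example can be verified numerically (a minimal Python sketch using the same numbers as above):

```python
serial_time = 1.0    # seconds that cannot be parallelised
parallel_time = 3.0  # seconds of parallelisable work on a single core
total_time = serial_time + parallel_time  # 4 seconds in total

p = parallel_time / total_time  # 0.75
s = parallel_time / 0.3         # parallel section now takes 0.3 s -> s = 10

speedup = 1 / ((1 - p) + p / s)
print(round(speedup, 4))  # 3.0769

# As s grows towards infinity, only the serial part remains:
print(1 / (1 - p))        # 4.0
```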

The biggest bottleneck in this idealised example is the 25% of the task that cannot be parallelised. If this percentage is instead closer to 1%, then the maximum theoretical_speedup we can achieve with parallelism is 1 / (1 - 0.99) = 100 times. If the non-parallel portion of the task is 0.1%, then the theoretical_speedup can approach 1 / (1 - 0.999) = 1000 times. But it can never be infinite.
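The same arithmetic for the serial fractions mentioned above (a minimal Python sketch):

```python
# The maximum theoretical speedup is 1 / (1 - p), i.e. 1 / serial_fraction.
for serial_fraction in (0.25, 0.01, 0.001):
    p = 1 - serial_fraction
    print(f"serial part {serial_fraction:.1%} -> at most {1 / (1 - p):g}x")
```

This prints limits of 4x, 100x and 1000x respectively.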

What does this mean for everyday people? In short, unless the task/algorithm is changed so that as small a portion of it as possible has to be executed serially, throwing more cores at the problem will quickly run into diminishing returns.

Let's say you are a PC gamer who likes to have the latest and greatest. The games you play use mostly the CPU for the serial portions of their logic and the GPU for the parallel portions (+ graphics). Say your game runs on 1 CPU core and your GPU has 2000 cores that can execute logic (rendering graphics is included in that). You run your favourite game at a buttery smooth 60 frames per second with no hiccups, and you already have a brand new display that can accept a 120 Hz video signal at the resolution in which you play. Wouldn't it be nice if you could upgrade your GPU so that it can achieve 120 frames per second for that game?

Here comes the brand new GPU generation with 4000 cores per GPU (and no other changes)! You are excited. You save up your hard earned money and buy the new GPU with 4000 cores (running at the same frequency for this example), expecting to reach 120 FPS. You assemble everything, start your favourite game, look at the FPS counter and what do you see? 96 FPS?!? How is that possible? Easy: the serial portion of your game is around 25% of the work for every frame. The result? You bought a GPU with 100% more cores but only got a 60% speedup.
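The FPS numbers from this story can be reproduced with the formula (a minimal Python sketch; the 25%/75% per-frame split and the assumption that doubling the cores doubles the parallel throughput come straight from the example):

```python
p = 0.75       # fraction of per-frame work that runs on the GPU cores
old_fps = 60.0

# 2000 -> 4000 cores at the same frequency: the parallel section
# runs roughly twice as fast, so s = 2.
s = 2.0
speedup = 1 / ((1 - p) + p / s)  # 1 / (0.25 + 0.375) = 1.6

print(round(old_fps * speedup))  # 96 FPS, not the hoped-for 120
```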

What does this mean for the consumer electronics industry? Some years ago, 1080p60fps was a decent spec. Now there is a hard push for 4K60fps and 1080p120fps, and the display manufacturers can barely hold back from pushing 8K60fps. From the perspective of PC gamers: why buy a higher resolution and/or higher refresh rate display if current generation PC hardware cannot deliver those framerates at those resolutions with all settings on Ultra, without image up-scaling and/or frame interpolation?

So far we have only looked at core count and frequency. What about memory bandwidth, cache speed, latency, size...? The speed of light is not infinite. All matter radiates heat. Energy transmission inefficiencies galore...

We are starting to hit fundamental physical limits. Moore's Law is showing its age. How can we use Amdahl's Law in order to extend our rate of progress as much as possible?

We can no longer expect that tomorrow's hardware will solve today's slowness. Algorithm and data structure optimisations are key, and programmers will have to think about performance more and more often. First and foremost: the quickest action that can ever be executed is no action at all. The careful balance between calculation time and total memory use will become harder and harder to achieve, and reducing the portion of the task that cannot run in parallel will become a goal in itself.

Some observations on how Amdahl's Law affects the current tech world...

Crypto currencies that use a blockchain:

The blockchain is fundamentally a data structure in which writes must be done serially, not in parallel. Writing to the blockchain is a synchronisation event: it requires a minimum amount of time to calculate everything needed to append the new block to the chain, and that time is never zero. Then it takes time for that new block to propagate through the entire network, which is hard limited by the speed of light and, in practice, by network hardware that is much slower than that. All of this restricts the total transactions per second that can be processed reliably. My humble prediction is that no crypto currency relying on a single blockchain cloned across all nodes will achieve an acceptable number of transactions per second without a consumer-focused Layer 2 solution. There is a reason why Visa, MasterCard, American Express and other networks still exist. There is a reason why each bank chain has a separate database, and why those databases prefer batch communication with each other at certain intervals.
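The "writing is a synchronisation event" point can be sketched in code (a toy Python model, not any real blockchain; the lock stands in for network-wide consensus and `sha256` for the block calculation):

```python
import hashlib
import threading

class ToyChain:
    """A chain of blocks where appending is strictly serial."""

    def __init__(self):
        self.blocks = ["genesis"]
        self._lock = threading.Lock()  # only one writer at a time

    def append(self, payload: str) -> str:
        # No matter how many cores or nodes exist, each new block
        # depends on the previous one, so this section cannot overlap.
        with self._lock:
            prev = self.blocks[-1]
            block = hashlib.sha256((prev + payload).encode()).hexdigest()
            self.blocks.append(block)
            return block

chain = ToyChain()
for tx in ("tx1", "tx2", "tx3"):
    chain.append(tx)
print(len(chain.blocks))  # 4: genesis plus three serially appended blocks
```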

The computer industry in general:

When a new CPU or GPU is released, it is no longer enough to increase just one of core/memory frequency, cache size or core count. You need to improve everything across the board; no stone is left unturned in the search for ways to calculate faster. Once Moore's Law is no longer your friend, you start looking at Amdahl's Law. Hardware architectures optimised for parallelism will be required no matter what, and programming languages with better parallel execution ergonomics will become more widely used.
