The term computation itself was termed mid-way through the 20th century with the notion of someone writing a bunch of numbers on a notepad and adding them up. As modern computing started to become a reality, and progress accelerated, our vocabulary about computation has increased and so has the complexity around it. Reasoning about computation problems is extremely important as the implications are very far reaching.
Let’s present a pretty simple problem of knowing who knows who, and forming cliques from that information. Represented below is a group of people and their relation represented by lines connected to them.
If I asked you to find a clique of size 3 in the above diagram, you wouldn’t even need to think about it and be able to point it out.
However, if we made this group of people to be just a bit bigger, it wouldn’t be super easy to find certain cliques. In the example below, try to find the largest clique with the same amount of ease as the previous example.
It’s not super hard but it quite clearly shows the fact that if you increase the number of people it gets much harder to find a solution. The primary reason for this is because each person is linked to other people and each check requires another 4 checks.
P and NP
In complexity theory, we have two classes of problems. Ones where you can find a solution quite easily in polynomial time and one where you can only verify the solution in polynomial time. Don’t worry about the definition of polynomial time, we’ll cover that soon.
As shown in our previous example, finding a clique can be hard but verifying if a clique is valid is much easier. Similarly, we describe problems where finding solutions easily is in P and problems where only verifying the solution easily is in NP.
How do you know if a problem is in P versus NP? That’s where the fun begins.
We’ve got plenty more to get through but just keep these two facts straight:
- P = easy to find a solution (in polynomial time)
- NP = easy to verify a solution is valid (in polynomial time)
Complexity theory’s fundamentals rely on how many steps it takes for a program to run not the amount of time taken to solve a problem. Why is this? According to Moore’s law the amount of computational power doubles every two years. Rather than reasoning about the state of hardware, we want to reason about how hard it is to solve a problem regardless of the machine.
Let’s take the simple example of a sorting algorithm called bubble sort. The way this algorithm works is by going through a list and then swapping any numbers if it finds a number which is greater than the number ahead of it.
Ultimately we’ll end up with the sorted list 1, 2, 4, 5, 8.
That sounds easy but let’s break down what we actually had to:
- Read through the entire list of numbers
- Find if any element is out of order by comparing against the next number
- If the next number is smaller than the current one, swap them around
- Repeat the first step multiple times until you can’t swap any numbers
Computationally this is extremely expensive because for every number we have in the list we might need to swap it as many times as the number of elements we have in the list. The worst case for a similar list with 5 numbers could take 5 x 5 = 25 steps.
We refer to this as having a complexity at most O(n²). It might be less if we get lucky with our data set but this is the worst case scenario.
On the other hand, if we wanted to write a program to check if the number 1 exists in our list we’d just have to read the entire list which would take at most 5 steps. This means the complexity is at most O(n).
To finish this off, if we just wanted to read the first number in our list that’d always take 1 step. We refer to this as having a complexity of O(1).
The examples we’ve described here are all examples of polynomial time algorithms.
Exponential time is when our exponent is based on the size of our list. So rather than having O(n²) we’d have O(2^n). This is referred to as exponential time. To demonstrate how much harder it is to solve these problems refer to the graph below:
Deterministic vs Non-Deterministic
Sounds fancy but it’s not once you get into the details of it. Deterministic machines are those in which the final state are predictable and will always yield the same results.
In a non-deterministic machine (they don’t actually exist) an example flow look would look something like this:
As you can see, there’s plenty that could go wrong in this diagram since the execution could hang (repeat) or continually be rejected until it finds that single acceptance path.
Revisiting P and NP
Now that we know what polynomial time, exponential time, deterministic and non-deterministic means we can clarify a more accurate description about what P vs NP actually means.
- P is commonly referred to as the class of problems which can be solved in polynomial time on a deterministic machine.
- NP is known as the class of problems which can be solved in polynomial time on a non-deterministic machine or a solution verifiable in polynomial time.
If non-deterministic machines don’t exist then how do we solve NP problems? The short answer is we can’t. The best we can do is try to solve NP problems with algorithms that take exponential time but that gets out of hand very quickly.
If someone finds a way to solve even a single NP problem in polynomial time on a deterministic machine they get an official prize for $1,000,000. Why is that? Well all NP problems share similar characteristics and if you solve one, you can solve all of them. It’s for this very reason no one has been able to prove that P = NP or vice versa.
What would happen if you solved an NP problem in polynomial time on a deterministic machine? You’d break all encryption, there’d be no Bitcoin and humanity would be a mess.
It doesn’t just stop at P and NP. We go as far as to saying that there exists NP-hard and NP-complete problems.
As you can see, everything inside P is also encapsulated in NP. We’re just not sure where the boundary is exactly.
NP-hard problems are problems where even checking the solution takes exponential time to verify, such as the winning moves for a chess game. NP-complete problems on the other hand are problems where excluding the wrong answer is easy, but finding the right one is still hard. Odd but interesting contradiction.
I still haven’t touched upon reductions, big-oh notation, SAT problems and so much more that makes complexity theory interesting but I hope you have enough to check it out yourself if you want!
The idea behind going into depth with this article is that it’ll give confidence to tackle more interesting articles in the future which talk about consensus algorithms, proof verification and more since they touch upon the concepts of complexity theory.
Loopring is a protocol for building high-performance, non-custodial, orderbook exchanges on Ethereum. You can sign up for our bi-weekly update, or learn more at:
⭑ Twitter: twitter.com/loopringorg
⭑ Reddit: reddit.com/r/loopringorg
⭑ Telegram: t.me/loopring_en
⭑ Discord: discord.gg/KkYccYp
⭑ GitHub: https://github.com/Loopring
⭑ Kakao: open.kakao.com/o/gJbSZdF (Korean)