# About Big-O Notation – The good, the bad, and the confusing

Often, prior to using a specific algorithm, it helps to understand information about how the algorithm scales.  In order to document this, computer science has borrowed a notation from mathematics called Big-O notation.

Understanding Big-O notation is critical in making good decisions about algorithms or libraries used to implement routines.  However, there is often a lot of misunderstanding about what it means, and I frequently see this used as justifications for poor decisions.

First, letâ€™s define Big-O notation.  Big-O notation describes the limiting factor of a function as the argument or domain size of the function approaches a specific value, typically infinity.  Big-O notation describes a growth rate, not a speed.  This is important â€“ one very common misunderstanding about Big-O notation is that it describes how â€œfastâ€ a function will be â€“ and this is never true.  It describes how a function or algorithm will scale, but says absolutely nothing about speed.

When programming, Big-O notation can be used to describe many different things.  Most frequently it is describing the algorithm in terms of instruction count or speed, but can also be used to describe memory or resource usage.  Itâ€™s important to know what is being described when making decisions based off this information, and seeing that an algorithm is â€œO(N)â€ isnâ€™t enough information, in and of itself, to know what that means.

When and why is this useful?  It becomes very important to understand this when we start dealing with collections, especially if the collection is large or of an unknown scale.  For example, say youâ€™re going to do some basic image processing, and you have the choice between a couple of different algorithms.  If each algorithmâ€™s limiting behavior is based off the total image size, and the algorithmâ€™s operation count is described in Big-O notation, the total number of operations will vary dramatically as your image size changes.  For example, letâ€™s say we have the choice of 3 algorithms described by three different, but common Big-O notations:

• Algorithm A: Linear – O(N)
• Algorithm B: Logarithmic – O(log N)
• Algorithm C: Quadratic – O(N^2)
Image Resolution Total Pixels Potential Instructions: A Potential Instructions: B Potential Instructions: C
16×16 256 256 2 65,536
256×256 65,536 65,536 5 4,294,967,296
512×512 262,144 262,144 5 68,719,476,736

[I am, for here, assuming Log in base 10, but for Big-O, it could be any logarithmic base and still be O(log N)]

Note the huge discrepancy in terms of instruction count!  Algorithm C is a quadratic algorithm, meaning that its instruction count increases by the square of the number of elements (pixels, in this case).

Now, given the information above, it seems like you would naturally always choose Algorithm B â€“ it scales much, much better.  Often, this is true, and this is a good way to choose between algorithms, but we donâ€™t have enough information to make an educated decision at this point.

Remember above â€“ I mentioned that Big-O describes a growth rate, and not a speed.  In addition, it describes an upper bound for the growth rate, not necessarily the average case scenario.  For example, it may be that algorithm C is the fastest algorithm for processing a single pixel â€“ perhaps by a factor of thousands.  Letâ€™s look at these algorithms, and assign the number of actual microseconds each pixel takes to process on average, and look at the same chart as above:

• Algorithm A: 0.1 seconds
• Algorithm B: 10 seconds
• Algorithm C: 50 microseconds
Image Resolution Total Pixels Theoretical Execution Time: A Theoretical Execution Time: B Theoretical Execution Time: C
16×16 256 25.6 seconds 20 seconds 3.3 seconds
256×256 65,536 6,553.6 seconds 50 seconds 214,748 seconds
512×512 262,144 26,214 seconds 50 seconds â€¦too longâ€¦

This shows off a basic misconception of Big-O notation.  Itâ€™s not about speed.  Granted, if youâ€™re planning to deal with megapixel imagery, youâ€™d probably still choose the algorithm above with the best Big-O notation.  However, there are two things that may trip you up here when youâ€™re developing this algorithm.

1. If you developed the three algorithms above looking only at a small image, youâ€™d probably have picked algorithm C, hands down, because the per-pixel routine is so dramatically faster.
2. If you developed the three algorithms above with a large image, youâ€™d have given up on algorithm C quickly, because of itâ€™s poor scalability.

However, both of those are not necessarily good things.

In case 1: The reason this is bad is obvious.  Later, when you fed the routine a large image, youâ€™d be in trouble (especially if you didnâ€™t realize that it was quadratic at the time).

In case 2: If this routine was always going to be used on very small images, Algorithm C is a superior choice, even though it has the worst Big-O description.  This is because itâ€™s single element speed is so much better.

Also, Big-O only describes the upper bound (which is very important), but the average case of an O(N^2) function may actually be closer to N Log N (quicksort is this way).  It may be that, when we actually look at real execution times in a profiler with real data, the algorithm performs, on average, in N log (N), in which case the â€œrealâ€ runtime may be more like:

Image Resolution Total Pixels â€œRealâ€ Execution Time: A â€œRealâ€ Execution Time: B â€œRealâ€ Execution Time: C
16×16 256 21 seconds 18 seconds 0.003 seconds
256×256 65,536 3,531 seconds 42 seconds 3.5 seconds
512×512 262,144 21,454 seconds 43 seconds 8 seconds

All of a sudden, our O(N^2) algorithm is looking very, very good â€“ much better than any of our other algorithms!

What can we take from this?  There are a couple of important points:

• Big-O Notation can provide critical information about the scalability of an algorithm.  Coupled with information about the algorithm itself, this can help us decide which of multiple algorithms will be the â€œbestâ€ choice.
• Big-O Notation does not provide information about speed.  Factors in the algorithm itself, including total number of instructions required, are not included as part of this notation.
• Knowing the Big-O notation for a method, or doing your own analysis of the algorithm, is not a substitute for profiling.  â€œBadâ€ routines in terms of their Big-O notation can outperform â€œgoodâ€ routines, in certain situations. About Reed
Reed Copsey, Jr. - http://www.reedcopsey.com - http://twitter.com/ReedCopsey

1. Youha says:
2. Marc says: