The Interview Question

Published: February 7, 2021

Recently, this tweet crossed my path:

Most candidates cannot solve this interview problem:

🔸 Input: "aaaabbbcca"
🔸 Output: [("a", 4), ("b", 3), ("c", 2), ("a", 1)]

Write a function that converts the input to the output

I ask it in the screening interview and give it 25 minutes

How would you solve it?
— Alexey Grigorev (@Al_Grigor) February 3, 2021

One can debate the merits of the question itself and wonder if the intended functionality is to produce a run-length encoding. Or perhaps a "simple" answer is good enough.

def func(x):
return [("a", 4), ("b", 3), ("c", 2), ("a", 1)]
— Robert Smallshire (@robsmallshire) February 3, 2021

However, assuming that trolling the interviewer isn't the intended outcome, what kind of answer would you provide?

A Simple Solution

Lately, I've been on a bit of a "back to basics" kick. Thus, I thought about how I'd solve the problem in the most "uninteresting" way possible. No imports. No fancy code golf. Just plain Python code. Here's what I came up with:

Ok, I'll bite. I'd hire myself solely for the fact that I'm the only lunatic in this whole thread that actually used a comment. pic.twitter.com/M8pLAj7QGr
— David Beazley (@dabeaz) February 6, 2021

An astute follower noted that perhaps an early "return" statement would be better and even less clever. Upon reflection, I think I agree. Here is a modified, even more uninteresting, version of the code.

I think this is a good idea. In fact, doing this makes the code generic! pic.twitter.com/BG5uIifARb
— David Beazley (@dabeaz) February 7, 2021

Not only was less clever better, it even had the extra side effect of making the code completely generic--now working with any sequence whatsoever. Win!

A Concise Solution

As a software engineer, it's easy to look at a "simple" solution and think that it's too verbose, too clumsy, or too embarrassing to share. Surely there must be a better way. Indeed, perhaps something like this:

I would think a "software engineer" would avoid the unnecessary dependency when the standard library already makes short work of the problem:

>>> [(k, len(list(g))) for k, g in groupby(s)]
[('a', 4), ('b', 3), ('c', 2), ('a', 1)]
— Raymond Hettinger (@raymondh) February 6, 2021

Hmmmm. A one-liner using Python's groupby() function from the itertools module--presented by Raymond Hettinger, the author of itertools. Let's put it in a function:

from itertools import groupby
def rle_groupby(s):
    return [ (k, len(list(g))) for k, g in groupby(s) ]

Now, let's load it up in iPython and have a little race...

In [1]: %timeit rle("aaaabbbcca")
758 ns ± 1.24 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [2]: %timeit rle_groupby("aaaabbbcca")
1.08 µs ± 1.94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Ah, simple beats concise by a good 40% in performance! Excellent!

Everything gets a bit more interesting if the input string, instead of having a single trailing "a" at the end, has 100 million! Like this:

>>> s = "aaaabbbcc" + "a"*100_000_000
>>> rle(s)
[('a', 4), ('b', 3), ('c', 2), ('a', 100000000)]
>>>

So, what's interesting about that you ask? Let's do a little memory profiling!

>>> import tracemalloc
>>> tracemalloc.start()
>>> rle(s)
[('a', 4), ('b', 3), ('c', 2), ('a', 100000000)]
>>> tracemalloc.get_traced_memory()
(8751, 11893)
>>> rle_groupby(s)
[('a', 4), ('b', 3), ('c', 2), ('a', 100000000)]
>>> tracemalloc.get_traced_memory()
(9831, 835138293)
>>>

No, you do not need to get your eyes checked. Using groupby() incurs more than 800 megabytes of memory overhead! Curiously, the rle_groupby() runs 3x faster on this input despite that--because, well, of course it would (although figuring out why is left as an exercise).

>>> s = "aaaabbbcc" + "a"*100_000_000
>>> import time
>>> start = time.time(); rle(s); time.time() - start
[('a', 4), ('b', 3), ('c', 2), ('a', 100000000)]
4.459298849105835
>>> start = time.time(); rle_groupby(s); time.time() - start
[('a', 4), ('b', 3), ('c', 2), ('a', 100000000)]
1.352071762084961
>>>

Someone suggests that maybe building a list is the problem.

Is it better to do len(list(g)) or sum(1 for _ in g)?
— Dr. Mahdi Hosseinali (@HosseinaliMahdi) February 7, 2021

An interesting idea. Let's put it in a function and try it out:

>>> def rle_sumby(text):
...     return [ (ch, sum(1 for _ in items)) for ch, items in itertools.groupby(text) ]
...
>>> s = "aaaabbbcc" + "a"*100_000_000
>>> import tracemalloc
>>> tracemalloc.start()
>>> rle_sumby(s)
[('a', 4), ('b', 3), ('c', 2), ('a', 100000000)]
>>> tracemalloc.get_traced_memory()
(8751, 11893)
>>>

Indeed, that does fix the memory problem. However, it's still more than twice as slow as the original simple solution on the original input.

In [2]: %timeit rle_sumby("aaaabbbcca")
1.67 µs ± 10.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: %timeit rle("aaaabbbcca")
754 ns ± 2.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

The thing that I like about this is the fact that just because you decide to use a built-in function doesn't mean that you get to ignore thinking about the problem. There are often subtle corner cases.

An Enterprise Solution

Instead of focusing on clever, perhaps the solution to this problem is to make it fully "enterprise grade". For example, maybe this generic fully-typed solution that's almost certain to land someone a job on appearance alone:

pic.twitter.com/9MTY4q9Efy
— glyph (@glyph) February 6, 2021

Let's take that code, put it in a function called rle_generic(), and test drive it:

In [1]: %timeit rle("aaaabbbcca")
749 ns ± 0.784 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [2]: %timeit rle_generic("aaaabbbcca")
1.97 µs ± 4.88 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Here, simple beats the performance of "enterprise grade" by a good 160%. Of course, naturally the whole point of this last solution was to be generic. So, surely both functions should work with a list of type list[Optional[int]]. For example:

>>> rle([1,1,1,1,None,None,None,3])
[(1, 4), (None, 3), (3, 1)]
>>> rle_generic([1,1,1,1,None,None,None,3])
[(1, 4), (3, 1)]
>>>

Alas, it seems that the simple solution beats the typed solution in actual correctness too. No, not correctness. Types.

Takeaways

I don't know if there's any big takeaway from this exercise other than you shouldn't be afraid of the "simple" solution. Sometimes it will run faster, use less memory, and be easier to understand--people will thank you! You might even thank yourself when you have to refactor your code six months from now.

Postscript

This modification was later suggested!

I think if count > 0 check is redundant now.
— . (@ayakzob) February 7, 2021

Yes! There is always room for making it even more simple!

Someone independently verified the performance and even compared it to using numpy. Simple is competitive.

I tried three solutions (David's, itertools, numpy) on both PyPy and CPython. On PyPy, David's is fastest. (On PyPy, numpy is SLOW.) But I was surprised to see that his solution ekes out a win on CPython, too! pic.twitter.com/XRu0ZNoccj
— Sigh of the Tiger (@double_thunk) February 7, 2021

[ More Bits ]