Published: February 7, 2021
Recently, this tweet crossed my path:
Most candidates cannot solve this interview problem:
— Alexey Grigorev (@Al_Grigor) February 3, 2021
🔸 Input: "aaaabbbcca"
🔸 Output: [("a", 4), ("b", 3), ("c", 2), ("a", 1)]
Write a function that converts the input to the output
I ask it in the screening interview and give it 25 minutes
How would you solve it?
One can debate the merits of the question itself and wonder if the intended functionality is to produce a run-length encoding. Or perhaps a "simple" answer is good enough.
def func(x):
    return [("a", 4), ("b", 3), ("c", 2), ("a", 1)]
— Robert Smallshire (@robsmallshire) February 3, 2021
However, assuming that trolling the interviewer isn't the intended outcome, what kind of answer would you provide?
Lately, I've been on a bit of a "back to basics" kick. Thus, I thought about how I'd solve the problem in the most "uninteresting" way possible. No imports. No fancy code golf. Just plain Python code. Here's what I came up with:
Ok, I'll bite. I'd hire myself solely for the fact that I'm the only lunatic in this whole thread that actually used a comment. pic.twitter.com/M8pLAj7QGr
— David Beazley (@dabeaz) February 6, 2021
An astute follower noted that perhaps an early "return" statement would be better and even less clever. Upon reflection, I think I agree. Here is a modified, even more uninteresting, version of the code.
I think this is a good idea. In fact, doing this makes the code generic! pic.twitter.com/BG5uIifARb
— David Beazley (@dabeaz) February 7, 2021
Not only was the less clever version better, it had the added side effect of making the code completely generic--it now works with any sequence whatsoever. Win!
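The code itself lives in the tweet images above, so it isn't reproduced in the text. As a rough sketch only--the name rle() and the exact details are my assumptions, not necessarily what appears in the image--a plain, early-return run-length encoder in that spirit might look like this:

```python
def rle(sequence):
    # Plain run-length encoding: no imports, no cleverness.
    # Works on any sequence (strings, lists, tuples, ...).
    parts = []
    if not sequence:
        return parts              # early return for empty input
    lastch = sequence[0]
    count = 0
    for ch in sequence:
        if ch != lastch:
            if count > 0:
                parts.append((lastch, count))
            lastch = ch
            count = 0
        count += 1
    if count > 0:
        parts.append((lastch, count))
    return parts
```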
As a software engineer, it's easy to look at a "simple" solution and think that it's too verbose, too clumsy, or too embarrassing to share. Surely there must be a better way. Indeed, perhaps something like this:
I would think a "software engineer" would avoid the unnecessary dependency when the standard library already makes short work of the problem:
— Raymond Hettinger (@raymondh) February 6, 2021
>>> [(k, len(list(g))) for k, g in groupby(s)]
[('a', 4), ('b', 3), ('c', 2), ('a', 1)]
Hmmmm. A one-liner using Python's groupby() function from the itertools module--presented by Raymond Hettinger, the author of itertools. Let's put it in a function:
from itertools import groupby

def rle_groupby(s):
    return [ (k, len(list(g))) for k, g in groupby(s) ]
Now, let's load it up in IPython and have a little race...
In [1]: %timeit rle("aaaabbbcca")
758 ns ± 1.24 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [2]: %timeit rle_groupby("aaaabbbcca")
1.08 µs ± 1.94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Ah, simple beats concise by a good 40% in performance! Excellent!
Everything gets a bit more interesting if the input string, instead of having a single trailing "a" at the end, has 100 million! Like this:
>>> s = "aaaabbbcc" + "a"*100_000_000
>>> rle(s)
[('a', 4), ('b', 3), ('c', 2), ('a', 100000000)]
>>>
So, what's interesting about that you ask? Let's do a little memory profiling!
>>> import tracemalloc
>>> tracemalloc.start()
>>> rle(s)
[('a', 4), ('b', 3), ('c', 2), ('a', 100000000)]
>>> tracemalloc.get_traced_memory()
(8751, 11893)
>>> rle_groupby(s)
[('a', 4), ('b', 3), ('c', 2), ('a', 100000000)]
>>> tracemalloc.get_traced_memory()
(9831, 835138293)
>>>
No, you do not need to get your eyes checked. Using groupby() incurs more than 800 megabytes of memory overhead! Curiously, rle_groupby() runs 3x faster on this input despite that--because, well, of course it would (although figuring out why is left as an exercise).
>>> s = "aaaabbbcc" + "a"*100_000_000
>>> import time
>>> start = time.time(); rle(s); time.time() - start
[('a', 4), ('b', 3), ('c', 2), ('a', 100000000)]
4.459298849105835
>>> start = time.time(); rle_groupby(s); time.time() - start
[('a', 4), ('b', 3), ('c', 2), ('a', 100000000)]
1.352071762084961
>>>
Someone suggests that maybe building a list is the problem.
Is it better to do len(list(g)) or sum(1 for _ in g)?
— Dr. Mahdi Hosseinali (@HosseinaliMahdi) February 7, 2021
An interesting idea. Let's put it in a function and try it out:
>>> import itertools
>>> def rle_sumby(text):
...     return [ (ch, sum(1 for _ in items)) for ch, items in itertools.groupby(text) ]
...
>>> s = "aaaabbbcc" + "a"*100_000_000
>>> import tracemalloc
>>> tracemalloc.start()
>>> rle_sumby(s)
[('a', 4), ('b', 3), ('c', 2), ('a', 100000000)]
>>> tracemalloc.get_traced_memory()
(8751, 11893)
>>>
Indeed, that does fix the memory problem. However, it's still more than twice as slow as the original simple solution on the original input.
In [2]: %timeit rle_sumby("aaaabbbcca")
1.67 µs ± 10.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [3]: %timeit rle("aaaabbbcca")
754 ns ± 2.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
What I like about this is that deciding to use a built-in function doesn't mean you get to stop thinking about the problem. There are often subtle corner cases.
Instead of focusing on clever, perhaps the solution to this problem is to make it fully "enterprise grade". For example, maybe this generic fully-typed solution that's almost certain to land someone a job on appearance alone:
— glyph (@glyph) February 6, 2021
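The code in that tweet is also an image, so it doesn't appear in the text. As a purely hypothetical reconstruction (my assumption of the general shape, not glyph's actual code), a fully-typed version that uses None as the "no current run" sentinel might look like the following--and note that the sentinel choice is exactly what gets it into trouble in the transcript further down:

```python
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")

def rle_generic(values: Iterable[T]) -> List[Tuple[T, int]]:
    result: List[Tuple[T, int]] = []
    current: Optional[T] = None   # sentinel: None means "no current run"
    count = 0
    for value in values:
        if value != current:
            if current is not None:
                # Flaw: a legitimate run of None values is silently dropped,
                # because it's indistinguishable from the sentinel.
                result.append((current, count))
            current = value
            count = 0
        count += 1
    if current is not None:
        result.append((current, count))
    return result
```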
Let's take that code, put it in a function called rle_generic(), and test drive it:
In [1]: %timeit rle("aaaabbbcca")
749 ns ± 0.784 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [2]: %timeit rle_generic("aaaabbbcca")
1.97 µs ± 4.88 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Here, simple beats the performance of "enterprise grade" by a good 160%. Of course, the whole point of this last solution was to be generic. So, surely both functions should work with a list of type list[Optional[int]]. For example:
>>> rle([1,1,1,1,None,None,None,3])
[(1, 4), (None, 3), (3, 1)]
>>> rle_generic([1,1,1,1,None,None,None,3])
[(1, 4), (3, 1)]
>>>
Alas, it seems that the simple solution beats the typed solution in actual correctness too. No, not correctness. Types.
I don't know if there's any big takeaway from this exercise other than you shouldn't be afraid of the "simple" solution. Sometimes it will run faster, use less memory, and be easier to understand--people will thank you! You might even thank yourself when you have to refactor your code six months from now.
This modification was later suggested!
I think if count > 0 check is redundant now.
— . (@ayakzob) February 7, 2021
Yes! There is always room for making it even simpler!
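With the early return handling empty input up front, the count > 0 checks can indeed go. Again as a sketch under the same assumptions (the real code is in the tweet image), the simplified version might read:

```python
def rle(sequence):
    # Run-length encoding with the redundant count > 0 checks removed.
    parts = []
    if not sequence:
        return parts              # empty input handled up front
    lastch = sequence[0]          # first run starts with the first element
    count = 0
    for ch in sequence:
        if ch != lastch:
            parts.append((lastch, count))   # count is always > 0 here
            lastch = ch
            count = 0
        count += 1
    parts.append((lastch, count)) # flush the final run
    return parts
```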
Someone independently verified the performance and even compared it to using numpy. Simple is competitive.
I tried three solutions (David's, itertools, numpy) on both PyPy and CPython. On PyPy, David's is fastest. (On PyPy, numpy is SLOW.) But I was surprised to see that his solution ekes out a win on CPython, too! pic.twitter.com/XRu0ZNoccj
— Sigh of the Tiger (@double_thunk) February 7, 2021
Copyright (C) 2005-2024, David Beazley