In our previous episode we wrote a merge sort implementation that runs a bit faster than the one in stdlibc++. The question then becomes, could it be made even faster. If you go through the relevant literature one potential improvement is to do a multiway merge. That is, instead of merging two arrays into one, you merge four into one using, for example, a priority queue.
This seems like a slam dunk for performance.
- Doubling the number of arrays to merge at a time halves the number of total passes needed
- The priority queue has a known static maximum size, so it can be put on the stack, which is guaranteed to be in the cache all the time
- Processing an element takes only log(#lists) comparisons
Why is this so? Maybe there are bugs that cause it to do extra work? Assuming that is not the case, what actually is? Measuring seems to indicate that a notable fraction of the runtime is spent in the priority queue code. Beyond that I got very little to nothing.
The best hypotheses I could come up with has to with the number of comparisons made. A classical merge sort does two if statements per output elements. One to determine which of the two lists has a smaller element at the front and one to see whether removing the element exhausted the list. The former is basically random and the latter is always false except when the last element is processed. This amounts to 0.5 mispredicted branches per element per round.
A priority queue has to do a bunch more work to preserve the heap property. The first iteration needs to check the root and its two children. That's three comparisons for value and two checks whether the children actually exist. Those are much less predictable than the comparisons in merge sort. Computers are really efficient at doing simple things, so it may be that the additional bookkeeping is so expensive that it negates the advantage of fewer rounds.
Or maybe it's something else. Who's to say? Certainly not me. If someone wants to play with the code, the implementation is here. I'll probably delete it at some point as it does not have really any advantage over the regular merge sort.
No comments:
Post a Comment