Such a cornucopia of madness wouldn’t be complete without a bot for certain administrative tasks such as keeping stats on each user.
The “riddle” I posted last year is a question we pondered about for a few days back then:
We keep a log-file of all that is said on our IRC channel. What’s the fastest way to extract one random line said by a specified person from that file, with every line having equal chance of being picked.
To keep the problem interesting, no “persistent” data can be kept in memory over multiple queries, such as an index or a counter.
Adhemar was the only person to propose a solution, but we also asked our professor for Datastructures & Algorithms, Gunnar Brinkman. As it turns out Adhemar’s solution was very close to the one Prof. Brinkmann suggested.
This is the algorithm we were using:
totallines = 1 while not eof(logfile) do currentline = readline(logfile) if (rand() mod totallines) == 0 then currentqoute = currentline totallines++ done
In plain English:
For every line i, pick that line with chance 1/i.
Adhemar’s solution however, is a tad faster on a real-life system because it does not need the relatively expensive mod-operation for every line:
currenthighest = 0 while not eof(logfile) do currentline = readline(logfile) currentrand = rand() if (currentrand >= currenthighest) currentquote = currentline currenthighest = currentrand done
The player who rolls the highest dice gets picked.
Emperical data suggests the second algorithm is about 1% faster than the first. It’s obvious that this problem is an I/O-limited one, so these algorithms are probably as good as it gets without storing any data in memory.
Although the problem is relatively simple, the interesting thing to remember here is how to randomly pick an item from a set with an unknown amount of items.