The Statistics::TopK module implements the top-k streaming algorithm, also
known as the "heavy hitters" algorithm. It is designed to process data streams
and probabilistically calculate the k most frequent items while using limited
memory.
.
A typical example would be to determine the top 10 IP addresses listed in an
access log. A simple solution would be to keep a hash that maps each IP
address to a counter and then sort the hash by counter value. But that hash
could theoretically require over 4 billion keys, one per possible IPv4 address.
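.
The simple solution can be sketched as follows. This is an illustrative
Python sketch, not the module's Perl interface; the function name
exact_top_k and the sample addresses are hypothetical. It shows why memory
grows with the number of distinct items: every new address adds a key.
.
```python
from collections import Counter


def exact_top_k(stream, k):
    """Exact top-k: keeps one counter per distinct item in the stream."""
    counts = Counter()
    for item in stream:
        counts[item] += 1  # a new key is created for every unseen item
    # most_common sorts by count, highest first
    return counts.most_common(k)


# Hypothetical sample data standing in for addresses parsed from a log.
log_ips = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1", "10.0.0.2"]
print(exact_top_k(log_ips, 2))
```
.
The result is exact, but the Counter ends up holding one entry for every
distinct address that appears in the input.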
.
The top-k algorithm requires only storage space proportional to the number of
items of interest. It accomplishes this by sacrificing precision: it is a
probabilistic counter, so the counts it reports are estimates, not exact totals.
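.
One way to bound memory like this is a Space-Saving style counter, sketched
below in Python. This is an assumption made for illustration: the description
does not say which algorithm the module uses, and the names
space_saving_top_k and capacity are hypothetical, not part of the module's
interface. The sketch keeps at most capacity counters and, when full, evicts
the smallest one and lets the new item inherit its count.
.
```python
def space_saving_top_k(stream, k, capacity):
    """Approximate top-k using at most `capacity` counters."""
    if capacity < k:
        raise ValueError("capacity must be at least k")
    counts = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < capacity:
            counts[item] = 1
        else:
            # Table is full: evict the item with the smallest counter and
            # let the newcomer take over that count plus one. This can
            # overestimate a count, but the table never grows.
            victim = min(counts, key=counts.get)
            counts[item] = counts.pop(victim) + 1
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


# Hypothetical stream: two frequent items followed by rare ones.
stream = ["a"] * 5 + ["b"] * 3 + ["c", "d", "e"]
print(space_saving_top_k(stream, k=2, capacity=3))
```
.
The linear scan for the smallest counter keeps the sketch short; a real
implementation would likely use a structure with cheaper minimum lookup.
Items that arrive late can inherit an inflated count, which is the loss of
precision described above.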
Installed Size: 22.5 kB
Architectures: all