Stream processing

document updated Sep 18, 2012

When processing large amounts of data (e.g. calculating statistics for reports, doing GROUP-BY operations, etc.), the simplest way to implement it is usually to collect all of the data in RAM, and then iterate over the resulting array.

When implemented this way, the code is relatively small and easy to comprehend.
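
For comparison, here's a minimal sketch of that collect-everything-first approach in Perl; the file name (big_data.log) and record layout (whitespace-separated, numeric value in the second column) are made up for illustration:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Slurp every record into RAM at once...
    open my $fh, '<', 'big_data.log' or die "can't open: $!";
    my @records = <$fh>;
    close $fh;

    # ...then iterate over the array to compute a statistic.
    my $total = 0;
    for my $line (@records) {
        my (undef, $value) = split ' ', $line;   # hypothetical layout: value in column 2
        $total += $value;
    }
    print "total: $total\n";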

However, there are several downsides when working with very large datasets, or datasets that take time to retrieve:

- memory usage grows with the size of the input, and can exhaust available RAM
- no results can be produced until every record has been loaded, so there is a long delay before any output appears
- if the data arrives slowly (e.g. over a network or from a database), the program sits idle during retrieval instead of processing records as they arrive

Stream processing means handling the input in smaller chunks, and keeping data in RAM only if you really need to.
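
Continuing the made-up example above, the same total can be computed as a stream, reading one line at a time; RAM usage then stays constant no matter how large the file is, and processing begins as soon as the first record arrives:

    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $fh, '<', 'big_data.log' or die "can't open: $!";

    # Process each record as it's read; only the running total is kept in RAM.
    my $total = 0;
    while (my $line = <$fh>) {
        my (undef, $value) = split ' ', $line;
        $total += $value;
    }
    close $fh;
    print "total: $total\n";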

Texts

Wikipedia articles

Implementations

From the user's standpoint, it's a relatively simple concept. However, it can be implemented in various ways, and the resulting code isn't always as easy to understand as the collect-everything version (one common style is sketched after the list below):

Perl modules
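
To illustrate one such implementation style (a hand-rolled sketch, not the API of any particular CPAN module): a closure-based iterator that hands back one parsed record per call. The file name and record layout are again hypothetical:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Returns a closure that yields one parsed record per call,
    # and undef once the stream is exhausted.
    sub make_record_iterator {
        my ($path) = @_;
        open my $fh, '<', $path or die "can't open $path: $!";
        return sub {
            my $line = <$fh>;
            return unless defined $line;    # end of stream
            chomp $line;
            my ($key, $value) = split ' ', $line;
            return { key => $key, value => $value };
        };
    }

    # A streaming GROUP-BY: only the per-key totals are held in RAM,
    # never the full dataset.
    my $next_record = make_record_iterator('big_data.log');
    my %totals;
    while ( my $rec = $next_record->() ) {
        $totals{ $rec->{key} } += $rec->{value};
    }
    print "$_: $totals{$_}\n" for sort keys %totals;

The closure hides the filehandle and parsing details, so the consuming loop stays about as small as the slurp-everything version; other implementations typically expose the same idea through objects, callbacks, or tied filehandles, which is where the readability differences come in.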