A Mapper maps or converts a value from one data structure to another and decides how it should be grouped. It does not matter what structure the value is mapped into, as long as that structure is shared with the corresponding Reducer. All Mappers must return an array of the form array($key, $structure), because the key is used to group the mapped values into buckets. The contents of a bucket are handed to the Reducer one at a time until a single value is left.
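The mapping and grouping step can be sketched with plain closures. This is only an illustration of the array($key, $structure) contract, not Chute's own classes: `$mapper` and `groupIntoBuckets` are hypothetical names introduced for this example.

```php
<?php
// Hypothetical mapper: turns a word into array($key, $structure).
// The key is the lowercased word; the structure is a count of 1.
$mapper = function ($word) {
    return array(strtolower($word), array('count' => 1));
};

// Hypothetical helper: groups the mapped structures into buckets by key.
function groupIntoBuckets(callable $mapper, $items)
{
    $buckets = array();
    foreach ($items as $item) {
        list($key, $structure) = $mapper($item);
        $buckets[$key][] = $structure;
    }
    return $buckets;
}

$buckets = groupIntoBuckets($mapper, array('Foo', 'bar', 'foo'));
// $buckets['foo'] now holds two structures and $buckets['bar'] holds one.
```

Every structure in a bucket has the same shape, which is what lets a Reducer combine them later.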
A Reducer receives two values and is responsible for reducing them to a single value. It is important that the reduced result has exactly the same structure as the given arguments, because that result is passed back in as an argument on the next step.
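As a minimal sketch of that contract, the following folds one bucket down to a single value using a plain closure. `$reducer` and `reduceBucket` are hypothetical names for this example and are not part of Chute.

```php
<?php
// Hypothetical reducer: takes two structures and returns one of the same shape.
$reducer = function (array $a, array $b) {
    return array('count' => $a['count'] + $b['count']);
};

// Hypothetical helper: folds a non-empty bucket down to a single structure.
function reduceBucket(callable $reducer, array $bucket)
{
    $result = array_shift($bucket);
    foreach ($bucket as $structure) {
        // The previous result is fed back in as the first argument,
        // which is why it must have the same shape as the inputs.
        $result = $reducer($result, $structure);
    }
    return $result;
}

$bucket = array(array('count' => 1), array('count' => 1), array('count' => 1));
$total = reduceBucket($reducer, $bucket);
// $total is array('count' => 3), the same shape as each input.
```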
When a MapReduce or Distributor returns its result, you get a ResultSet. By default this is just a simple wrapper over an array, but to limit memory use it could be implemented with Redis or Memcache support.
In fact, Chute provides a RedisSet, which uses a Redis hash for the results. To avoid collisions when multiple ResultSets are in use, each RedisSet has a $key, a Universally Unique Identifier (UUID) that isolates its results and makes it possible to retrieve earlier results.
The ResultSet used is controlled by a ResultSetFactory implementation, which is the optional third argument to the MapReduce constructor. When omitted, it defaults to ArrayFactory, which creates ArraySet instances.
<?php
use Chute\ResultSet\RedisFactory;
$mapReduce = new MapReduce($mapper, $reducer, new RedisFactory);
// $resultSet will now be a `Chute\ResultSet\RedisSet` instance.
$resultSet = $mapReduce->run(new ArrayIterator(array()));
Running in parallel is good for performance, and it can be done with forks or threads. Chute provides a Distributor for this.
<?php
use Chute\Iterator\ChunkedIterator;
use Chute\Distributor\SequentialDistributor;
// $mapReduce contains the same mapper and reducer as the simplistic example further above.
// The ChunkedIterator splits the ArrayIterator into two chunks containing (1, 2) and (3, 4).
// When both chunks have been completed, the two result sets are merged together.
$runner = new SequentialDistributor;
$runner->run($mapReduce, new ChunkedIterator(new ArrayIterator([1, 2, 3, 4]), 2));
Of course, the example above is very simple: it just splits the iterator into chunks and runs them one after another.
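To make the chunk-and-merge idea concrete, here is a sketch in plain PHP that does not use Chute at all. It assumes that partial results are merged by reducing values that share a key; whether Chute merges this way internally is an assumption. The names `runChunk` and `mergeResults` are hypothetical.

```php
<?php
// Hypothetical mapper and reducer: bucket numbers as odd or even and sum them.
$mapper = function ($n) {
    return array($n % 2 === 0 ? 'even' : 'odd', $n);
};
$reducer = function ($a, $b) {
    return $a + $b;
};

// Hypothetical helper: map and reduce one chunk into a key => value array.
function runChunk(callable $mapper, callable $reducer, array $chunk)
{
    $result = array();
    foreach ($chunk as $item) {
        list($key, $value) = $mapper($item);
        $result[$key] = array_key_exists($key, $result)
            ? $reducer($result[$key], $value)
            : $value;
    }
    return $result;
}

// Hypothetical helper: merge two partial results, reducing values that share a key.
function mergeResults(callable $reducer, array $left, array $right)
{
    foreach ($right as $key => $value) {
        $left[$key] = array_key_exists($key, $left)
            ? $reducer($left[$key], $value)
            : $value;
    }
    return $left;
}

// Process the chunks (1, 2) and (3, 4) one after another, merging as we go.
$merged = array();
foreach (array_chunk(array(1, 2, 3, 4), 2) as $chunk) {
    $merged = mergeResults($reducer, $merged, runChunk($mapper, $reducer, $chunk));
}
// $merged is array('odd' => 4, 'even' => 6)
```

Because each chunk is processed independently before merging, the loop body is the part that a fork- or thread-based Distributor could run concurrently.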