This note relates to:
- CouchDb version 1.0.1
- curl version 7.21.0
- Ubuntu 10.10
This article discusses some of the details of writing a reduce function for a CouchDb view. A reduce function performs a server-side operation over the rows returned by a view, so that only the end result is sent to the client instead of the rows themselves.
The tricky part of reduce functions is that they must be written to handle two “modes”: reduce and re-reduce. The signature of a reduce function is as follows:
    function(keys, values, rereduce) { ... }
If the parameter “rereduce” is false, the function is called in “reduce” mode. If the parameter “rereduce” is true, the function is called in “re-reduce” mode.
The aim of a reduce function is to return one value (one javascript entity, a scalar, a string, an array, an object…) that represents the result of an operation over a set of rows selected by a view. Ultimately, the result of the reduce function is sent to the client.
The reason for the two modes is that the reduce function is not always given, in a single call, all the rows that the operation must cover. For efficiency reasons, including caching and reasons related to the database architecture, there are circumstances where the operation is performed over subsets of the rows, and the partial results are then combined into a final one.
When the “reduce” mode is called over all the rows, it produces the final result. When it is given only a subset of the rows, it produces an intermediate result, which is later handed back to the reduce function in “re-reduce” mode.
The “re-reduce” mode can be called once or multiple times with intermediate results to produce the final result.
Therefore, the tricky part of reduce functions is to write them in such a way that:
- the keys and values from a view are accepted as input
- the result is convenient as output for the client
- the result of the reduce function is itself accepted as input in the “re-reduce” case
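The three constraints above can be exercised outside the database. The following is a minimal sketch, not CouchDb's actual scheduling: `sumAndCount` and `simulateReduction` are hypothetical names, the document ids are placeholders, and the fixed chunk size is only an assumption standing in for whatever subsets the server really chooses.

```javascript
// Hypothetical reduce function: sums the values and counts the rows.
function sumAndCount(keys, values, rereduce) {
  var result = { sum: 0, count: 0 };
  for (var i = 0; i < values.length; ++i) {
    if (rereduce) {
      // values[i] is an earlier result of this same function
      result.sum += values[i].sum;
      result.count += values[i].count;
    } else {
      // values[i] is a raw value emitted by the map function
      result.sum += values[i];
      result.count += 1;
    }
  }
  return result;
}

// Hypothetical stand-in for the server: reduce the rows chunk by chunk,
// then merge the intermediate results until a single one remains.
function simulateReduction(reduceFn, rows, chunkSize) {
  if (chunkSize < 2) {
    throw new Error('chunkSize must be at least 2');
  }
  // "reduce" mode: each call sees one chunk of the view rows.
  var results = [];
  for (var start = 0; start < rows.length; start += chunkSize) {
    var chunk = rows.slice(start, start + chunkSize);
    var keys = chunk.map(function (row) { return [row.key, row.id]; });
    var values = chunk.map(function (row) { return row.value; });
    results.push(reduceFn(keys, values, false));
  }
  // "re-reduce" mode: may run once or over several rounds.
  while (results.length > 1) {
    var merged = [];
    for (var i = 0; i < results.length; i += chunkSize) {
      merged.push(reduceFn(null, results.slice(i, i + chunkSize), true));
    }
    results = merged;
  }
  return results[0];
}

// Placeholder ids; the scores are the first six used later in this note.
var rows = [
  { id: 'doc1', key: 'Alicia', value: 85 },
  { id: 'doc2', key: 'Beth', value: 87 },
  { id: 'doc3', key: 'Carmen', value: 58 },
  { id: 'doc4', key: 'Dalida', value: 62 },
  { id: 'doc5', key: 'Elizabeth', value: 71 },
  { id: 'doc6', key: 'Fiona', value: 75 }
];

var allAtOnce = simulateReduction(sumAndCount, rows, rows.length);
var inChunks = simulateReduction(sumAndCount, rows, 2);
console.log(JSON.stringify(allAtOnce), JSON.stringify(inChunks));
```

With a chunk size of 2, the six rows go through three “reduce” calls and two rounds of “re-reduce”. A correctly written reduce function returns the same result as the single call over all rows.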
The remainder of this note is an example of a reduce function that computes simple statistics over a set of scores. The example follows these steps:
- Create a database in CouchDb
- Install a design document with the map and reduce functions under test
- Load a number of documents, which are score results
- Request the reduction to access the expected statistics
In this example, it is assumed that the CouchDb database is located at http://127.0.0.1:5984. Also, it is assumed that there are no assigned administrators (anyone can write to the database).
Create Database
curl is used to perform all operations.
    curl -X PUT http://127.0.0.1:5984/db
Install Design Document
Create a text file named “design.txt” with the following content:
    {
        "_id" : "_design/db"
        ,"views" : {
            "stats" : {
                "map" : "function(doc){
        if( typeof(doc.name) === 'string'
         && typeof(doc.score) === 'number' ) {
            emit(doc.name, doc.score);
        };
    }"
                ,"reduce" : "function(keys,values,rereduce){
        if( rereduce ) {
            var result = {
                topScore: values[0].topScore
                ,bottomScore: values[0].bottomScore
                ,sum: values[0].sum
                ,count: values[0].count
            };
            for(var i=1,e=values.length; i<e; ++i) {
                result.sum = result.sum + values[i].sum;
                result.count = result.count + values[i].count;
                if( result.topScore < values[i].topScore ) {
                    result.topScore = values[i].topScore;
                };
                if( result.bottomScore > values[i].bottomScore ) {
                    result.bottomScore = values[i].bottomScore;
                };
            };
            result.mean = (result.sum / result.count);
            log('rereduce keys:'+toJSON(keys)+' values:'+toJSON(values)+' result:'+toJSON(result));
            return result;
        };

        // Non-rereduce case
        var result = {
            topScore: values[0]
            ,bottomScore: values[0]
            ,sum: values[0]
            ,count: 1
        };
        for(var i=1,e=keys.length; i<e; ++i) {
            result.sum = result.sum + values[i];
            result.count = result.count + 1;
            if( result.topScore < values[i] ) {
                result.topScore = values[i];
            };
            if( result.bottomScore > values[i] ) {
                result.bottomScore = values[i];
            };
        };
        result.mean = (result.sum / result.count);
        log('reduce keys:'+toJSON(keys)+' values:'+toJSON(values)+' result:'+toJSON(result));
        return result;
    }"
            }
        }
    }
Load design document:
    curl -X PUT http://127.0.0.1:5984/db/_design/db --upload-file design.txt
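The design document stores each function as a string, and strict JSON parsers reject raw line breaks inside strings. One way to avoid hand-editing the file is to generate it from real JavaScript functions. The following is a sketch under assumptions: `buildDesignDocument` is a hypothetical helper, it assumes a JavaScript runtime such as Node.js is available, and only the map function is shown; the reduce function would be passed as the fourth argument in the same way.

```javascript
// Map function from the design document, written as real JavaScript.
// emit() is supplied by the CouchDb view server; it is never called here.
var mapFn = function(doc){
  if( typeof(doc.name) === 'string' && typeof(doc.score) === 'number' ) {
    emit(doc.name, doc.score);
  }
};

// Hypothetical helper: assemble a design document holding a single view.
// Functions are stored as source text, which is what the "views" section
// of a design document contains.
function buildDesignDocument(name, viewName, map, reduce) {
  var view = { map: map.toString() };
  if (reduce) {
    view.reduce = reduce.toString();
  }
  var views = {};
  views[viewName] = view;
  return { _id: '_design/' + name, views: views };
}

// JSON.stringify escapes the line breaks inside the function source.
var designJson = JSON.stringify(buildDesignDocument('db', 'stats', mapFn));
console.log(designJson);
```

Redirecting the printed text to “design.txt” gives a file that can be uploaded with the curl command above.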
Load Documents
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Alicia","score":85}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Beth","score":87}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Carmen","score":58}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Dalida","score":62}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Elizabeth","score":71}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Fiona","score":75}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Gertrude","score":94}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Halle","score":76}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Irene","score":82}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Julia","score":73}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Kim","score":75}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Lynn","score":91}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Mary","score":56}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Nancy","score":66}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Olie","score":80}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Pat","score":69}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Queen","score":89}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Roseline","score":93}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Sally","score":62}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Trudy","score":71}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Una","score":80}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Victoria","score":79}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Willow","score":68}'
Consume View and Reduction
To see the output of the view:
    curl -X GET 'http://127.0.0.1:5984/db/_design/db/_view/stats?reduce=false'
The following result should be reported:
    {"total_rows":23,"offset":0,"rows":[
    {"id":"7ab05a72d3cf2ad68c5816713e07efc5","key":"Alicia","value":85},
    {"id":"7ab05a72d3cf2ad68c5816713e07f78f","key":"Beth","value":87},
    {"id":"7ab05a72d3cf2ad68c5816713e07f81a","key":"Carmen","value":58},
    {"id":"7ab05a72d3cf2ad68c5816713e0804bc","key":"Dalida","value":62},
    {"id":"7ab05a72d3cf2ad68c5816713e081063","key":"Elizabeth","value":71},
    {"id":"7ab05a72d3cf2ad68c5816713e081657","key":"Fiona","value":75},
    {"id":"7ab05a72d3cf2ad68c5816713e081cf7","key":"Gertrude","value":94},
    {"id":"7ab05a72d3cf2ad68c5816713e0824d5","key":"Halle","value":76},
    {"id":"7ab05a72d3cf2ad68c5816713e08349e","key":"Irene","value":82},
    {"id":"7ab05a72d3cf2ad68c5816713e083a75","key":"Julia","value":73},
    {"id":"7ab05a72d3cf2ad68c5816713e083c86","key":"Kim","value":75},
    {"id":"7ab05a72d3cf2ad68c5816713e0845b6","key":"Lynn","value":91},
    {"id":"7ab05a72d3cf2ad68c5816713e084c70","key":"Mary","value":56},
    {"id":"7ab05a72d3cf2ad68c5816713e085c23","key":"Nancy","value":66},
    {"id":"7ab05a72d3cf2ad68c5816713e0863dc","key":"Olie","value":80},
    {"id":"7ab05a72d3cf2ad68c5816713e086808","key":"Pat","value":69},
    {"id":"7ab05a72d3cf2ad68c5816713e087734","key":"Queen","value":89},
    {"id":"7ab05a72d3cf2ad68c5816713e0878d9","key":"Roseline","value":93},
    {"id":"7ab05a72d3cf2ad68c5816713e087945","key":"Sally","value":62},
    {"id":"7ab05a72d3cf2ad68c5816713e0887ee","key":"Trudy","value":71},
    {"id":"7ab05a72d3cf2ad68c5816713e08978a","key":"Una","value":80},
    {"id":"7ab05a72d3cf2ad68c5816713e08a59f","key":"Victoria","value":79},
    {"id":"7ab05a72d3cf2ad68c5816713e08b14e","key":"Willow","value":68}
    ]}
To include the reduction:
    curl -X GET http://127.0.0.1:5984/db/_design/db/_view/stats
which should lead to this report:
    {"rows":[
    {"key":null,"value":{
        "topScore":94
        ,"bottomScore":56
        ,"sum":1742
        ,"count":23
        ,"mean":75.73913043478261}
    }
    ]}
Watching the reduction
Watching the CouchDb logs helps to understand the steps taken by the reduce function:
    sudo tail -f /var/log/couchdb/couch.log
Add more documents:
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Al","score":85}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Ben","score":87}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Carl","score":58}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"David","score":62}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Erik","score":71}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Fred","score":75}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Gordon","score":93}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Horton","score":76}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Ivan","score":82}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Jim","score":73}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Kyle","score":75}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Ludvig","score":91}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Mike","score":53}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Nefario(Dr)","score":66}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Oscar","score":80}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Peter","score":69}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Quentin","score":89}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Rob","score":93}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Sam","score":62}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Taylor","score":71}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Ulysse","score":80}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Victor","score":79}'
    curl -X POST http://127.0.0.1:5984/db -H 'Content-Type: application/json' -d '{"name":"Walter","score":68}'
Some of the logs show the function used in “reduce” mode:
    reduce keys:
    [
        ["Rob","7ab05a72d3cf2ad68c5816713e07c82c"]
        ,["Roseline","7ab05a72d3cf2ad68c5816713e0736c9"]
        ,["Sally","7ab05a72d3cf2ad68c5816713e0741b5"]
        ,["Sam","7ab05a72d3cf2ad68c5816713e07cc19"]
        ,["Taylor","7ab05a72d3cf2ad68c5816713e07cd53"]
        ,["Trudy","7ab05a72d3cf2ad68c5816713e0741b7"]
        ,["Ulysse","7ab05a72d3cf2ad68c5816713e07d97b"]
        ,["Una","7ab05a72d3cf2ad68c5816713e0746bf"]
        ,["Victor","7ab05a72d3cf2ad68c5816713e07e36f"]
        ,["Victoria","7ab05a72d3cf2ad68c5816713e07478c"]
        ,["Walter","7ab05a72d3cf2ad68c5816713e07eb73"]
        ,["Willow","7ab05a72d3cf2ad68c5816713e074906"]
    ]
    values:
    [
        93,93,62,62,71,71,80,80,79,79,68,68
    ]
    result:{
        "topScore":93
        ,"bottomScore":62
        ,"sum":906
        ,"count":12
        ,"mean":75.5
    }
Some of the logs show the function used in “re-reduce” mode:
    rereduce keys:null
    values:[
        {
            "topScore":91
            ,"bottomScore":53
            ,"sum":974
            ,"count":13
            ,"mean":74.92307692307692
        },{
            "topScore":94
            ,"bottomScore":58
            ,"sum":2506
            ,"count":33
            ,"mean":75.93939393939394}
    ]
    result:{
        "topScore":94
        ,"bottomScore":53
        ,"sum":3480
        ,"count":46
        ,"mean":75.65217391304348
    }
Explanation
To make the explanation easier to follow, the reduce function is reproduced here:
    function(keys,values,rereduce){
        if( rereduce ) {
            var result = {
                topScore: values[0].topScore
                ,bottomScore: values[0].bottomScore
                ,sum: values[0].sum
                ,count: values[0].count
            };
            for(var i=1,e=values.length; i<e; ++i) {
                result.sum = result.sum + values[i].sum;
                result.count = result.count + values[i].count;
                if( result.topScore < values[i].topScore ) {
                    result.topScore = values[i].topScore;
                };
                if( result.bottomScore > values[i].bottomScore ) {
                    result.bottomScore = values[i].bottomScore;
                };
            };
            result.mean = (result.sum / result.count);
            log('rereduce keys:'+toJSON(keys)+' values:'+toJSON(values)+' result:'+toJSON(result));
            return result;
        };

        // Non-rereduce case
        var result = {
            topScore: values[0]
            ,bottomScore: values[0]
            ,sum: values[0]
            ,count: 1
        };
        for(var i=1,e=keys.length; i<e; ++i) {
            result.sum = result.sum + values[i];
            result.count = result.count + 1;
            if( result.topScore < values[i] ) {
                result.topScore = values[i];
            };
            if( result.bottomScore > values[i] ) {
                result.bottomScore = values[i];
            };
        };
        result.mean = (result.sum / result.count);
        log('reduce keys:'+toJSON(keys)+' values:'+toJSON(values)+' result:'+toJSON(result));
        return result;
    }
In “reduce” mode, the parameter “keys” is populated with an array of elements, each element being an association (array) between a key and a document identifier. In that mode, the parameter “values” is an array of the values reported by the view. In the example above, the first part of the function is skipped in “reduce” mode. The last part of the function accepts scalar values and computes the top, bottom, sum and count of the scores. Finally, it computes the mean of those scores.
As discussed earlier, this result can be the final result, or an intermediate result. It is impossible for the reduce function to predict how the result is to be used.
In “re-reduce” mode, the parameter “keys” is null while the parameter “values” contains a set of intermediate results. In the example above, the first part of the function is used to merge the intermediate results into a new one. This new result could be the final result, or it could be a new intermediate result.
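The two call shapes can be replayed outside CouchDb using the inputs from the log excerpts above. The sketch below is a compact restatement of the same logic, not the function installed in the design document: `statsReduce` is a hypothetical name, the log calls are left out, and each raw score is wrapped into the result shape so that both modes share one merge step.

```javascript
// Compact restatement of the stats reduce function (log calls omitted).
function statsReduce(keys, values, rereduce) {
  var result = null;
  for (var i = 0; i < values.length; ++i) {
    // In "reduce" mode each value is a bare score; wrap it so that both
    // modes merge objects of the same shape.
    var part = rereduce
      ? values[i]
      : { topScore: values[i], bottomScore: values[i], sum: values[i], count: 1 };
    if (result === null) {
      result = {
        topScore: part.topScore,
        bottomScore: part.bottomScore,
        sum: part.sum,
        count: part.count
      };
    } else {
      result.topScore = Math.max(result.topScore, part.topScore);
      result.bottomScore = Math.min(result.bottomScore, part.bottomScore);
      result.sum += part.sum;
      result.count += part.count;
    }
  }
  // The mean is always recomputed from sum and count, never merged.
  result.mean = result.sum / result.count;
  return result;
}

// "reduce" mode: keys are [key, document id] pairs, values are raw scores.
// Keys and values are copied from the logged "reduce" call.
var reduceKeys = [
  ['Rob', '7ab05a72d3cf2ad68c5816713e07c82c'],
  ['Roseline', '7ab05a72d3cf2ad68c5816713e0736c9'],
  ['Sally', '7ab05a72d3cf2ad68c5816713e0741b5'],
  ['Sam', '7ab05a72d3cf2ad68c5816713e07cc19'],
  ['Taylor', '7ab05a72d3cf2ad68c5816713e07cd53'],
  ['Trudy', '7ab05a72d3cf2ad68c5816713e0741b7'],
  ['Ulysse', '7ab05a72d3cf2ad68c5816713e07d97b'],
  ['Una', '7ab05a72d3cf2ad68c5816713e0746bf'],
  ['Victor', '7ab05a72d3cf2ad68c5816713e07e36f'],
  ['Victoria', '7ab05a72d3cf2ad68c5816713e07478c'],
  ['Walter', '7ab05a72d3cf2ad68c5816713e07eb73'],
  ['Willow', '7ab05a72d3cf2ad68c5816713e074906']
];
var reduceValues = [93, 93, 62, 62, 71, 71, 80, 80, 79, 79, 68, 68];
var reduced = statsReduce(reduceKeys, reduceValues, false);

// "re-reduce" mode: keys is null, values are earlier results.
// Values are copied from the logged "rereduce" call.
var rereduced = statsReduce(null, [
  { topScore: 91, bottomScore: 53, sum: 974, count: 13, mean: 74.92307692307692 },
  { topScore: 94, bottomScore: 58, sum: 2506, count: 33, mean: 75.93939393939394 }
], true);

console.log(JSON.stringify(reduced));
console.log(JSON.stringify(rereduced));
```

The intermediate means are ignored during the merge. A mean of means would be wrong whenever the subsets differ in size, which is why the function carries the sum and the count and derives the mean from them at the end.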
Reduce functions over a subset of a View
A reduction does not have to be over the complete set returned by a view. For example, to see only a subset:
    curl -X GET 'http://127.0.0.1:5984/db/_design/db/_view/stats?startkey="k"&endkey="n"&reduce=false'
yields only some students:
    {"total_rows":46,"offset":20,"rows":[
    {"id":"7ab05a72d3cf2ad68c5816713e083c86","key":"Kim","value":75},
    {"id":"7ab05a72d3cf2ad68c5816713e08d612","key":"Kyle","value":75},
    {"id":"7ab05a72d3cf2ad68c5816713e08de9c","key":"Ludvig","value":91},
    {"id":"7ab05a72d3cf2ad68c5816713e0845b6","key":"Lynn","value":91},
    {"id":"7ab05a72d3cf2ad68c5816713e084c70","key":"Mary","value":56},
    {"id":"7ab05a72d3cf2ad68c5816713e08e00a","key":"Mike","value":53}
    ]}
If reduction is included:
    curl -X GET 'http://127.0.0.1:5984/db/_design/db/_view/stats?startkey="k"&endkey="n"'
then:
    {"rows":[
    {"key":null,"value":{"topScore":91,"bottomScore":53,"sum":441,"count":6,"mean":73.5}}
    ]}
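It may look surprising that the lowercase bounds "k" and "n" select capitalized names. CouchDb orders view keys with ICU collation rather than by raw character codes, so "k" sorts just before "Kim". The sketch below approximates that ordering with `Intl.Collator('en')`, which is an assumption standing in for the server's collation and not an exact replica of it. It then recomputes the statistics for the selected rows. The rows are taken from the documents loaded earlier, with two names on each side of the range added for contrast.

```javascript
// A few rows of the view around the requested range (ids omitted).
var rows = [
  { key: 'Jim', value: 73 },
  { key: 'Julia', value: 73 },
  { key: 'Kim', value: 75 },
  { key: 'Kyle', value: 75 },
  { key: 'Ludvig', value: 91 },
  { key: 'Lynn', value: 91 },
  { key: 'Mary', value: 56 },
  { key: 'Mike', value: 53 },
  { key: 'Nancy', value: 66 },
  { key: 'Nefario(Dr)', value: 66 }
];

// Assumption: an English collator approximates the view key ordering.
var collator = new Intl.Collator('en');

function inKeyRange(key, startkey, endkey) {
  return collator.compare(key, startkey) >= 0 &&
         collator.compare(key, endkey) <= 0;
}

var selected = rows.filter(function (row) {
  return inKeyRange(row.key, 'k', 'n');
});

// For contrast: plain string comparison uses character codes, where every
// uppercase letter sorts before "k", so no row qualifies.
var byCodeUnit = rows.filter(function (row) {
  return row.key >= 'k' && row.key <= 'n';
});

var scores = selected.map(function (row) { return row.value; });
var sum = scores.reduce(function (a, b) { return a + b; }, 0);
var stats = {
  topScore: Math.max.apply(null, scores),
  bottomScore: Math.min.apply(null, scores),
  sum: sum,
  count: scores.length,
  mean: sum / scores.length
};

console.log(selected.map(function (row) { return row.key; }).join(','));
console.log(JSON.stringify(stats));
console.log(byCodeUnit.length);
```

"Nancy" falls outside the range because it sorts after the bare "n". An end key such as "o" would be needed to include names starting with N.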
Conclusion
Reduce functions can be tricky because of the dual usage. The modes in use are controlled by the CouchDb database and the person designing a reduce function must take into account the various permutations.
NOTE: Do not leave the log statements in map and reduce functions, since they degrade performance.