2010
12.19

Writing a Reduce Function in CouchDB

This note relates to:

  • CouchDB version 1.0.1
  • curl version 7.21.0
  • Ubuntu 10.10

This article discusses some of the details of writing a reduce function for a CouchDB view. A reduce function performs a server-side operation over a number of rows returned by a view, so that only the end result, and not the rows themselves, has to be sent to the client.

The tricky part of reduce functions is that they must be written to handle two “modes”: reduce and re-reduce. A reduce function takes three parameters, in this order: keys, values and rereduce.

If the parameter “rereduce” is false, the function is being called in “reduce” mode. If the parameter “rereduce” is true, the function is being called in “re-reduce” mode.
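As a minimal sketch of that signature, here is a reduce function that simply counts rows. The name countRows is only for illustration; inside a design document the function would be stored as anonymous source text.

```javascript
// Sketch of the three-parameter reduce signature, using a row counter.
// keys:     array of [key, document id] pairs in "reduce" mode,
//           null in "re-reduce" mode
// values:   values emitted by the view in "reduce" mode,
//           earlier results of this function in "re-reduce" mode
// rereduce: false in "reduce" mode, true in "re-reduce" mode
function countRows(keys, values, rereduce) {
  var i;
  var total = 0;
  if (rereduce) {
    // Each value is a count returned by an earlier call: add them up.
    for (i = 0; i < values.length; i++) {
      total += values[i];
    }
    return total;
  }
  // Each value corresponds to one row of the view: count the rows.
  return values.length;
}
```

Called as countRows([["a", "doc1"], ["b", "doc2"]], [10, 20], false) it returns 2, and countRows(null, [2, 3], true) returns 5.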

The aim of a reduce function is to return one value (one JavaScript entity: a scalar, a string, an array, an object…) that represents the result of an operation over a set of rows selected by a view. Ultimately, the result of the reduce function is sent to the client.

The reason for the two modes is that the reduce function is not always given, in a single call, all the rows that the operation must be performed over. For efficiency reasons, including caching and reasons related to database architecture, there are circumstances where the operation is performed over subsets of the rows, and these results are then combined into a final one.

When the function is called in “reduce” mode over all the rows, it produces the final result. When it is given only a subset of the rows in “reduce” mode, it produces an intermediate result, which is later handed back to the reduce function in “re-reduce” mode.

The “re-reduce” mode can be called once or multiple times with intermediate results to produce the final result.

Therefore, the tricky part of reduce functions is to write them in such a way that:

  1. the keys and values from a view can be accepted as input
  2. the result must be convenient as the output for the client
  3. the result of the reduce function must be accepted as input in the case of “re-reduce”
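The third requirement is the one most easily missed. The sketch below shows what goes wrong when a reduce function ignores the “rereduce” flag: a naive average gives a different answer depending on how the rows are batched. The function name, the sample scores and the batch boundaries are made up for illustration; the batching that CouchDB actually performs is decided by the database.

```javascript
// Naive reduce: ignores the "rereduce" flag and always averages its input.
// It also ignores "keys", so null is passed for brevity in the calls below.
function naiveAverage(keys, values, rereduce) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum += values[i];
  }
  return sum / values.length;
}

// One "reduce" call over all five scores at once.
var allAtOnce = naiveAverage(null, [10, 20, 30, 40, 100], false);

// The same scores reduced in three batches of unequal size...
var partials = [
  naiveAverage(null, [10, 20], false), // 15
  naiveAverage(null, [30, 40], false), // 35
  naiveAverage(null, [100], false)     // 100
];
// ...and then combined in "re-reduce" mode.
var combined = naiveAverage(null, partials, true);

console.log(allAtOnce); // 40
console.log(combined);  // 50: an average of averages, not the true average
```

An intermediate result therefore has to carry enough information (here, a sum and a count) for the re-reduce step to combine it correctly.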

The remainder of this note is an example of a reduce function that computes simple statistics over a set of scores. The example follows these steps:

  1. Create a database in CouchDB
  2. Install a design document with the map and reduce functions under test
  3. Load a number of documents, which are score results
  4. Request the reduction to access the expected statistics

In this example, it is assumed that the CouchDB database is located at http://127.0.0.1:5984. It is also assumed that there are no assigned administrators (anyone can write to the database).

Create Database

curl is used to perform all operations.

Install Design Document

Create a text file named “design.txt” with the following content:
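In general, a CouchDB design document is a JSON document whose “views” member holds the map and reduce functions as source strings. The sketch below builds a document of that shape in JavaScript and prints the JSON that could be saved to the text file. The names “_design/scores” and “stats”, the document fields “student” and “score”, and the simple summing reduce are all hypothetical placeholders.

```javascript
// Hypothetical map function: emits one row per score document.
// The field names "student" and "score" are assumptions.
var mapScores = function (doc) {
  if (doc.student && typeof doc.score === "number") {
    emit(doc.student, doc.score);
  }
};

// Placeholder reduce function: sums its input in both modes.
var reduceSum = function (keys, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
};

// General shape of a design document with one view.
var designDoc = {
  _id: "_design/scores",   // hypothetical design document name
  language: "javascript",
  views: {
    stats: {               // hypothetical view name
      map: mapScores.toString(),
      reduce: reduceSum.toString()
    }
  }
};

// The functions are stored as strings, so the result is plain JSON.
console.log(JSON.stringify(designDoc, null, 2));
```

Keeping the functions as real JavaScript and converting them with toString() avoids having to escape quotes and newlines by hand.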

Load design document:

Load Documents

Consume View and Reduction
To see the output of the view:

The following result should be reported:

To include the reduction:

which should lead to this report:

Watching the reduction
Looking at the CouchDB logs helps in understanding the steps taken by the reduce function:

Add more documents:

Some of the logs show the function used in “reduce” mode:

Some of the logs show the function used in “re-reduce” mode:

Explanation
To help understanding, let’s reproduce the content of the reduce function, here:

In “reduce” mode, the parameter “keys” is an array in which each element is a pair (itself an array) associating a key with a document identifier. In that mode, the parameter “values” is an array of the values reported by the view. In the example above, the first part of the function is skipped in “reduce” mode. The last part of the function accepts scalar values and computes the top, bottom, sum and count of the scores. Finally, it computes an average over those scores.

As discussed earlier, this result can be the final result, or an intermediate result. It is impossible for the reduce function to predict how the result is to be used.

In “re-reduce” mode, the parameter “keys” is null while the parameter “values” contains a set of intermediate results. In the example above, the first part of the function is used to merge the intermediate results into a new one. This new result could be the final result, or it could be a new intermediate result.
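A reduce function with the structure described above might look like the following sketch. It is written from the description only, so the name scoreStats, the field names of the result object and the sample keys and scores are assumptions made for illustration.

```javascript
// Sketch of a statistics reduce function (names are assumptions).
var scoreStats = function (keys, values, rereduce) {
  var result = { top: null, bottom: null, sum: 0, count: 0, average: null };
  var i;

  if (rereduce) {
    // First part: "values" holds results returned by earlier calls.
    for (i = 0; i < values.length; i++) {
      var partial = values[i];
      if (partial.count === 0) { continue; } // nothing to merge
      if (result.count === 0 || partial.top > result.top) {
        result.top = partial.top;
      }
      if (result.count === 0 || partial.bottom < result.bottom) {
        result.bottom = partial.bottom;
      }
      result.sum += partial.sum;
      result.count += partial.count;
    }
  } else {
    // Last part: "values" holds scalar scores emitted by the view.
    for (i = 0; i < values.length; i++) {
      var score = values[i];
      if (result.count === 0 || score > result.top) { result.top = score; }
      if (result.count === 0 || score < result.bottom) { result.bottom = score; }
      result.sum += score;
      result.count += 1;
    }
  }

  // The average is always recomputed from sum and count.
  if (result.count > 0) {
    result.average = result.sum / result.count;
  }
  return result;
};

// "reduce" mode over two batches, with [key, document id] pairs as keys.
var partialA = scoreStats([["alice", "doc1"], ["bob", "doc2"]], [10, 20], false);
var partialB = scoreStats(
  [["carol", "doc3"], ["dave", "doc4"], ["erin", "doc5"]],
  [30, 40, 100],
  false
);

// "re-reduce" mode: keys is null, values are the intermediate results.
var merged = scoreStats(null, [partialA, partialB], true);
console.log(JSON.stringify(merged));
// {"top":100,"bottom":10,"sum":200,"count":5,"average":40}
```

Because every result carries the sum and the count, the merged average is the same one that a single “reduce” call over all five scores would produce.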

Reduce Functions over a Subset of a View

A reduction does not have to be over the complete set returned by a view. For example, to see only a subset:

yields only some students:

If reduction is included:

then:
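One common way to restrict a view to a range of keys is with the “startkey” and “endkey” query parameters, whose values are JSON-encoded; adding “reduce=false” returns the rows without the reduction. The sketch below only builds such URLs. The database, design document and view names in the path, and the helper buildViewUrl, are hypothetical.

```javascript
// Hypothetical view location; only the query parameters matter here.
var viewUrl = "http://127.0.0.1:5984/scores/_design/scores/_view/stats";

// Builds a view URL; key values are JSON-encoded as CouchDB expects.
function buildViewUrl(base, options) {
  var params = new URLSearchParams();
  if (options.startkey !== undefined) {
    params.set("startkey", JSON.stringify(options.startkey));
  }
  if (options.endkey !== undefined) {
    params.set("endkey", JSON.stringify(options.endkey));
  }
  if (options.reduce !== undefined) {
    params.set("reduce", String(options.reduce));
  }
  var query = params.toString();
  return query ? base + "?" + query : base;
}

// Rows only, for keys between "a" and "c".
var rowsOnly = buildViewUrl(viewUrl, { startkey: "a", endkey: "c", reduce: false });

// Same key range with the reduction applied (the reduce flag is omitted).
var reduced = buildViewUrl(viewUrl, { startkey: "a", endkey: "c" });

console.log(rowsOnly);
console.log(reduced);
```

Either URL can then be requested with curl, quoted so that the shell does not interpret the “?” and “&” characters.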

Conclusion
Reduce functions can be tricky because of their dual usage. The mode in use is chosen by CouchDB, not by the caller, so the person designing a reduce function must take every combination into account: a result may be final, or it may come back as input to a later “re-reduce” call.

NOTE: Do not leave the log statements in the map and reduce functions of a view, since they degrade performance.

9 comments so far

  1. This is great.
    It helped me a lot.
    Thanks

  2. Good explanation. In summary, Views work with a minimum of one map function. Map function helps to get the necessary keys/values available in view.

    If a reduce function is also specified AND used (via omitting reduce flag in the URI of view while calling), then the reduce function output will be shown in the view. The output of a reduce fn need not be just a single value, it can be a JSON object Or a set of JSON objects too.

    Also the reduce function has a re-reduce parameter which if true indicates that the reduce function is being called in intermediate stage by Couch, in which case the keys parameter will be null.

    Thanks!

  3. Congrats, you’re the first person I’ve found on the internet who’s given a practical example of the use of a reduce function! Give yourself a pat on the back friend.

  4. Hi all,
    I have a small problem with CouchDB. I’m using Command Prompt to execute my commands. I’m trying to create a document in CouchDB. The command that i’m trying to execute is as follows:

    C:\Users\Karthik>curl -X POST http://localhost:5984/music -d ‘{“Name”: “Wings”}’
    -H “Content-Type: application/json”
    {“error”:”bad_request”,”reason”:”invalid_json”}
    curl: (3) [globbing] unmatched close brace/bracket at pos 6

    But whenever i try executing this command, i get an error which is as shown above…

    The weird fact is, whenever I try executing the same command (Shown below) without any values, it executes successfully.

    curl -X POST http://localhost:5984/music -d {} -H “Content-Type
    : application/json”
    {“ok”:true,”id”:”8b8814aca5d4ab98cd1eee06da003944″,”rev”:”1-967a00dff5e02add4181
    9138abb3284d”}

    Please guide me about any mistakes that i have done with the first command…
    Thank you

  5. This is a bit off-topic. However, I can create a document using this command

    I wonder if you have to declare all the headers before the data.

  6. Thank you so much for a non trivial example of reduce functions. Excellent presentation, clear and concise. Well done!

  7. This helped me a lot. I used this guide to create a map and reduce for counting instances of specific tags across a set of documents. Thank you very much! https://github.com/elifesciences/elife-couch-flask-prototype/tree/funder-reports/_design/elife-articles/views/funder-report2

  8. The only example on the web that makes you really understand reduce functions beyond simple sum or count. Thank you!

  9. A nice and clean CouchDB rereduce, I’m sure ill find more good stuff from you jpfiset.