Writing a Reduce Function in CouchDb

This note relates to:

  • CouchDb version 1.0.1
  • curl version 7.21.0
  • Ubuntu 10.10

This article discusses some of the details of writing a reduce function for a CouchDb view. A reduce function performs a server-side operation over the rows returned by a view, so that only the end result, rather than the rows themselves, is sent to the client.

The tricky part of reduce functions is that they must be written to handle two “modes”: reduce and re-reduce. The signature of a reduce function is function(keys, values, rereduce).

If the parameter “rereduce” is false, the function is being called in “reduce” mode. If the parameter “rereduce” is true, the function is being called in “re-reduce” mode.

The aim of a reduce function is to return one value (one javascript entity, a scalar, a string, an array, an object…) that represents the result of an operation over a set of rows selected by a view. Ultimately, the result of the reduce function is sent to the client.

The reason for the two modes is that the reduce function is not always given, in a single call, all the rows that the operation must cover. For efficiency reasons, including caching and reasons related to database architecture, there are circumstances where the operation is performed over subsets of the rows, and these results are then combined into a final one.

When the function is called in “reduce” mode over all the rows, it produces the final result. When it is given only a subset of the rows in “reduce” mode, the result is an intermediate one, which is later given back to the reduce function in “re-reduce” mode.

The “re-reduce” mode can be called once or multiple times with intermediate results to produce the final result.

Therefore, the tricky part of reduce functions is writing them in such a way that:

  1. the keys and values from a view are accepted as input
  2. the result is convenient as output for the client
  3. the result of the reduce function is itself accepted as input in the case of “re-reduce”
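
The three requirements above can be tried out without a database. The following is only a sketch: the function names, the row layout and the chunk size are made up for illustration, and the chunking merely imitates what CouchDb might do internally. It reduces the rows in small groups, re-reduces the intermediate results, and compares the outcome with a single reduction over all the rows.

```javascript
// Hypothetical reduce function: totals the numeric values emitted by a view.
var sumReduce = function (keys, values, rereduce) {
  // View values ("reduce" mode) and partial totals ("re-reduce" mode)
  // are both plain numbers here, so one loop serves both modes.
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
};

// Stand-in for the database: reduce the rows chunk by chunk, then
// hand the intermediate results back in "re-reduce" mode.
function simulate(reduceFn, rows, chunkSize) {
  var intermediates = [];
  for (var i = 0; i < rows.length; i += chunkSize) {
    var chunk = rows.slice(i, i + chunkSize);
    var keys = chunk.map(function (row) { return [row.key, row.id]; });
    var values = chunk.map(function (row) { return row.value; });
    intermediates.push(reduceFn(keys, values, false));
  }
  return reduceFn(null, intermediates, true);
}

// Made-up view rows.
var rows = [
  { id: 'doc1', key: 'alice', value: 10 },
  { id: 'doc2', key: 'bob', value: 20 },
  { id: 'doc3', key: 'carol', value: 30 },
  { id: 'doc4', key: 'dave', value: 40 },
  { id: 'doc5', key: 'erin', value: 50 }
];

var allAtOnce = sumReduce(
  rows.map(function (row) { return [row.key, row.id]; }),
  rows.map(function (row) { return row.value; }),
  false
);
var chunked = simulate(sumReduce, rows, 2);

console.log(allAtOnce, chunked); // both totals are 150
```

A sum is the easy case, because an intermediate result looks exactly like a view value. As soon as the result is an object or an average, the two modes need different handling.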

The remainder of this note is an example of a reduce function that computes simple statistics over a set of scores. The example follows these steps:

  1. Create a database in CouchDb
  2. Install a design document with the map and reduce functions under test
  3. Load a number of documents, which are score results
  4. Request the reduction to access the expected statistics

In this example, it is assumed that the CouchDb database is located at http://127.0.0.1:5984. Also, it is assumed that there are no assigned administrators (anyone can write to the database).

Create Database

curl is used to perform all operations.
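
As a rough sketch of the request involved: CouchDb creates a database in response to an HTTP PUT on the database name. The database name “scores” below is an assumption made for illustration, and the helper function is hypothetical; it only describes the request and prints the matching curl command rather than contacting a server.

```javascript
// Hypothetical helper: describes the HTTP request that creates a database.
function createDatabaseRequest(baseUrl, dbName) {
  return { method: 'PUT', url: baseUrl + '/' + encodeURIComponent(dbName) };
}

// "scores" is an assumed database name, not one taken from this note.
var request = createDatabaseRequest('http://127.0.0.1:5984', 'scores');

// Print the equivalent curl invocation.
console.log('curl -X ' + request.method + ' ' + request.url);
// prints: curl -X PUT http://127.0.0.1:5984/scores
```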

Install Design Document

Create a text file named “design.txt” with the following content:
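
For orientation, a design document is a JSON document whose “views” member holds the map and reduce functions as strings. The sketch below builds such a document; the design name “stats”, the view name “scores”, the document fields “student” and “score”, and the placeholder reduce function are all assumptions made for illustration, not the content of the file used in this note.

```javascript
// Hypothetical map function: one row per score document.
// Assumes documents shaped like { "student": "alice", "score": 72 }.
var mapScores = function (doc) {
  if (typeof doc.score === 'number') {
    emit(doc.student, doc.score); // emit() is provided by CouchDb
  }
};

// Placeholder reduce function: a plain total of the scores.
var reduceScores = function (keys, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
};

// CouchDb stores view functions as source text inside the design document.
var designDoc = {
  _id: '_design/stats',
  language: 'javascript',
  views: {
    scores: {
      map: mapScores.toString(),
      reduce: reduceScores.toString()
    }
  }
};

// This text is what would be saved in design.txt.
var designText = JSON.stringify(designDoc, null, 2);
console.log(designText);
```

With these assumed names and a database called “scores”, the file could then be uploaded with: curl -X PUT http://127.0.0.1:5984/scores/_design/stats -d @design.txt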

Load design document:

Load Documents
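
One way to load several documents in a single request is the _bulk_docs endpoint, which accepts a JSON object with a “docs” array. The sketch below only prepares the request body; the database name “scores”, the field names and the score values are invented for illustration and are not the data used in this note.

```javascript
// Made-up score documents; CouchDb assigns an _id to each document
// that does not carry one.
var scoreDocs = [
  { student: 'alice', score: 72 },
  { student: 'bob', score: 85 },
  { student: 'carol', score: 90 },
  { student: 'dave', score: 60 }
];

// Body of a POST to http://127.0.0.1:5984/scores/_bulk_docs
var payload = JSON.stringify({ docs: scoreDocs });
console.log(payload);

// Equivalent curl invocation, with the payload saved in a file docs.json:
//   curl -X POST http://127.0.0.1:5984/scores/_bulk_docs \
//        -H "Content-Type: application/json" -d @docs.json
```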

Consume View and Reduction
To see the output of the view:
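
A detail worth knowing when forming these requests: when a view defines a reduce function, querying the view returns the reduced value by default, and the query parameter reduce=false is needed to see the rows themselves. The sketch below builds both URLs; the database, design and view names are assumptions made for illustration.

```javascript
var base = 'http://127.0.0.1:5984';
// Hypothetical names: database "scores", design "stats", view "scores".
var viewPath = '/scores/_design/stats/_view/scores';

// Rows as emitted by the map function, without the reduction.
var rowsUrl = base + viewPath + '?reduce=false';

// Reduced result (the default for a view that has a reduce function).
var reducedUrl = base + viewPath;

// Quote the URLs so that the shell leaves the "?" alone.
console.log("curl '" + rowsUrl + "'");
console.log("curl '" + reducedUrl + "'");
```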

The following result should be reported:

To include the reduction:

which should lead to this report:

Watching the reduction
Looking at the CouchDb logs helps to understand the steps taken by the reduce function:
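
Log entries of this kind come from calls to log(), a function that CouchDb makes available to map and reduce code. The sketch below shows how a reduce function might be instrumented. The row-counting function and its sample inputs are hypothetical, and a local stand-in replaces log() so that the code can run outside the database.

```javascript
// Local stand-in for the log() function that CouchDb provides to view code.
var logged = [];
function log(message) {
  logged.push(String(message));
  console.log(message);
}

// Hypothetical reduce function (a row count) instrumented with log().
var countReduce = function (keys, values, rereduce) {
  log('rereduce=' + rereduce +
      ' keys=' + JSON.stringify(keys) +
      ' values=' + JSON.stringify(values));
  if (rereduce) {
    // "re-reduce" mode: values are earlier counts, so add them up.
    var total = 0;
    for (var i = 0; i < values.length; i++) {
      total += values[i];
    }
    return total;
  }
  // "reduce" mode: values are view values, so count them.
  return values.length;
};

// Two "reduce" calls over subsets, then one "re-reduce" call.
var firstCount = countReduce([['alice', 'doc1'], ['bob', 'doc2']], [72, 85], false);
var secondCount = countReduce([['carol', 'doc3']], [90], false);
var finalCount = countReduce(null, [firstCount, secondCount], true);
console.log(finalCount); // 3
```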

Add more documents:

Some of the logs show the function used in “reduce” mode:

Some of the logs show the function used in “re-reduce” mode:

Explanation
To help understanding, let’s reproduce the content of the reduce function, here:

In “reduce” mode, the parameter “keys” is populated with an array of elements, each element being an association (array) between a key and a document identifier. In that mode, the parameter “values” is an array of values reported by the view. In the example above, the first part of the function is skipped during the “reduce” mode. The last part of the function accepts scalar values and computes top, bottom, sum and count of the scores. Finally, it computes an average over those scores.

As discussed earlier, this result can be the final result, or an intermediate result. It is impossible for the reduce function to predict how the result is to be used.

In “re-reduce” mode, the parameter “keys” is null while the parameter “values” contains a set of intermediate results. In the example above, the first part of the function is used to merge the intermediate results into a new one. This new result could be the final result, or it could be a new intermediate result.
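
A function matching this description could look like the sketch below. It is not the function from this note, only a reconstruction under assumptions: the name reduceStats and the member names of the result object are invented, and the view values are assumed to be numeric scores.

```javascript
// Hypothetical statistics reduce function.
var reduceStats = function (keys, values, rereduce) {
  var result = { top: null, bottom: null, sum: 0, count: 0, average: null };
  var i;

  if (rereduce) {
    // First part: merge intermediate results produced by earlier calls.
    for (i = 0; i < values.length; i++) {
      var partial = values[i];
      if (partial.count === 0) { continue; }
      if (result.top === null || partial.top > result.top) { result.top = partial.top; }
      if (result.bottom === null || partial.bottom < result.bottom) { result.bottom = partial.bottom; }
      result.sum += partial.sum;
      result.count += partial.count;
    }
  } else {
    // Last part: values are the scalar scores reported by the view.
    for (i = 0; i < values.length; i++) {
      var score = values[i];
      if (result.top === null || score > result.top) { result.top = score; }
      if (result.bottom === null || score < result.bottom) { result.bottom = score; }
      result.sum += score;
      result.count += 1;
    }
  }

  // The average is always recomputed from sum and count; averaging the
  // intermediate averages would give a wrong answer for uneven subsets.
  if (result.count > 0) {
    result.average = result.sum / result.count;
  }
  return result;
};

// One reduction over all scores...
var direct = reduceStats(null, [70, 85, 90, 60], false);
// ...and the same scores reduced in two subsets, then re-reduced.
var merged = reduceStats(null, [
  reduceStats(null, [70, 85], false),
  reduceStats(null, [90, 60], false)
], true);

console.log(JSON.stringify(direct));
console.log(JSON.stringify(merged));
// both print {"top":90,"bottom":60,"sum":305,"count":4,"average":76.25}
```

Keeping sum and count in the result is what makes it usable both as output for the client and as input for “re-reduce”.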

Reduce Functions over a Subset of a View

A reduction does not have to be over the complete set returned by a view. For example, to see only a subset:
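
A subset is typically selected with the startkey and endkey query parameters, whose values must be JSON (a string key therefore keeps its double quotes) and URL-encoded. The helper below is a hypothetical sketch: the database, design and view names and the key range are assumptions, and it only covers parameters whose values are JSON.

```javascript
// Hypothetical helper: builds a view URL, encoding every parameter
// value as JSON. Parameters that are not JSON values (for example
// stale=ok) are outside the scope of this sketch.
function viewUrl(base, db, designName, viewName, params) {
  var parts = [];
  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      parts.push(encodeURIComponent(name) + '=' +
                 encodeURIComponent(JSON.stringify(params[name])));
    }
  }
  var url = base + '/' + db + '/_design/' + designName + '/_view/' + viewName;
  return parts.length > 0 ? url + '?' + parts.join('&') : url;
}

var base = 'http://127.0.0.1:5984';

// Rows whose key falls between "a" and "m", without the reduction.
var subsetRowsUrl = viewUrl(base, 'scores', 'stats', 'scores',
  { startkey: 'a', endkey: 'm', reduce: false });

// The reduction computed over that same range only.
var subsetReducedUrl = viewUrl(base, 'scores', 'stats', 'scores',
  { startkey: 'a', endkey: 'm' });

console.log(subsetRowsUrl);
console.log(subsetReducedUrl);
```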

yields only some students:

If reduction is included:

then:

Conclusion
Reduce functions can be tricky because of the dual usage. The modes in use are controlled by the CouchDb database and the person designing a reduce function must take into account the various permutations.

NOTE: Do not leave the log statements in view map and reduce functions since they degrade performance.

Remove document duplicates from CouchDb view query using a list function

This note relates to CouchDb 1.0.1

In CouchDb, documents accessible via a view can be mapped to multiple keys. When querying for multiple keys, it is possible for a document to be returned multiple times. In some circumstances, this might be the desired behaviour. However, when the desired semantics are to retrieve only one copy of each document matching any key, without duplicates, a different approach is required.

As a note of caution, this article might provide a complicated solution to a problem easily solved another way. I was under the impression that the work covered here could be easily done using a special flag on a view query. However, I cannot readily find it. I am hoping someone will come around and comment on this article with a simpler approach. Until then, the solution presented here will suffice.

The result of a view query is a JSON object that contains an array of rows, each row reporting a document matching the query. A list function is used to transform the result of a view query into a format desired for output. One advantage of using a list function is that a list function has a chance of inspecting each row (or document) before sending to the output.

In this approach, we use a list function to output a result in the exact same format as a view query, suppressing duplicates of documents that were already sent.

The following list function is generic enough to be used with any view that emits the documents as values:

The input parameter called “head” is used to retrieve the total number of rows and the offset. Then, the list function outputs the “rows” member. Each row is sent as a JSON string, so the list function must take care of inserting the commas at the right place. A map (javascript object) called “ids” is used to recall which documents have already been sent. The key used in the map is the identifier of the document. When a document has already been sent, it is skipped.
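
A list function along these lines could look like the sketch below. It is a reconstruction from the description, not necessarily the exact function from this note. getRow() and send() are the functions CouchDb provides to list functions; local stand-ins and made-up rows are defined here so that the code can run outside the database, and the availability of JSON.stringify inside the database is assumed.

```javascript
// Sketch of a list function that drops rows for documents already sent.
var noduplicate = function (head, req) {
  var ids = {};     // identifiers of the documents already sent
  var first = true; // no comma before the first row
  var row;

  send('{"total_rows":' + head.total_rows +
       ',"offset":' + head.offset + ',"rows":[');
  while ((row = getRow())) {
    if (Object.prototype.hasOwnProperty.call(ids, row.id)) {
      continue; // duplicate: this document was already sent
    }
    ids[row.id] = true;
    send((first ? '' : ',') + JSON.stringify(row));
    first = false;
  }
  send(']}');
};

// Local stand-ins for the CouchDb environment, with made-up rows in
// which document "a" appears under two keys.
var pendingRows = [
  { id: 'a', key: 'k1', value: { _id: 'a' } },
  { id: 'b', key: 'k1', value: { _id: 'b' } },
  { id: 'a', key: 'k2', value: { _id: 'a' } }
];
var output = '';
function getRow() { return pendingRows.shift() || null; }
function send(chunk) { output += chunk; }

noduplicate({ total_rows: 3, offset: 0 }, {});
console.log(output);
```

Note that “total_rows” is copied from the view as is, so it can be larger than the number of rows actually sent once duplicates are dropped.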

For example, if a query to a view named “testview” yielded duplicates of a document using the following URL:
http://127.0.0.1:5984/db/_design/test/_view/testview
then duplicates would be removed if the above function was named “noduplicate” and the following URL employed:
http://127.0.0.1:5984/db/_design/test/_list/noduplicate/testview

In conclusion, the presented function is generic enough to be reused in many situations. However, I suspect that a much easier way to perform this will be designed shortly, if it does not already exist.

Installing CouchApp on Ubuntu 10.04

CouchApp is a Python tool to help develop, upload and clone applications meant for CouchDb. Those applications are also known as “couchApps”.

The following recipe is used to install couchapp on Ubuntu 10.04. To use couchapp, you probably first need to install “couchdb”, but this is readily available from the usual repositories.

The issue in installing couchapp on Ubuntu 10.04 is that one needs to rely on some personal packages made available via launchpad.net.

Warning: This recipe installs the signing keys of individual developers on your platform. From this point on, your platform will trust packages made available by those individuals.

From a high-level view, three packages are required:

  1. add-apt-repository: utility tool to easily add a new repository
  2. couchapp : the python tool itself
  3. python-restkit: a python library that couchapp is dependent on

Installing add-apt-repository

Installing python-restkit

Installing couchapp