View Cookbook for SQL Jockeys

This is a collection of some common SQL queries and how to get the same result in CouchDB. The key thing to remember is that CouchDB does not work like an SQL database at all, and that best practices from the SQL world translate poorly, if at all, to CouchDB. This chapter’s “cookbook” assumes that you are familiar with the CouchDB basics, such as creating and updating databases and documents.

Using Views

How you would do this in SQL:

CREATE TABLE

or:

ALTER TABLE

Using views is a two-step process. First you define a view; then you query it. This is analogous to defining a table structure (with indexes) using CREATE TABLE or ALTER TABLE and querying it using an SQL query.

Defining a View

Defining a view is done by creating a special document in a CouchDB database. The only thing that makes it special is its _id, which starts with _design/ (for example, _design/application). Other than that, it is just a regular CouchDB document. For CouchDB to recognize that you are defining a view, the contents of that design document must follow a specific format. Here is an example:

{
  "_id": "_design/application",
  "_rev": "1-C1687D17",
  "views": {
    "viewname": {
      "map": "function(doc) { ... }",
      "reduce": "function(keys, values) { ... }"
    }
  }
}

We are defining a view viewname. The definition of the view consists of two functions: the map function and the reduce function. Specifying a reduce function is optional. We’ll look at the nature of the functions later. Note that viewname can be whatever you like: users, by-name, or by-date are just some examples.

A single design document can also include multiple view definitions, each identified by a unique name:

{
  "_id": "_design/application",
  "_rev": "1-C1687D17",
  "views": {
    "viewname": {
      "map": "function(doc) { ... }",
      "reduce": "function(keys, values) { ... }"
    },
    "anotherview": {
      "map": "function(doc) { ... }",
      "reduce": "function(keys, values) { ... }"
    }
  }
}
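Writing JavaScript inside JSON strings is error-prone. One option is to write the functions as real JavaScript and serialize them with `Function.prototype.toString()`. The sketch below assumes Node.js, and `buildDesignDoc` is a hypothetical helper, not a CouchDB API. It omits `_rev`, which CouchDB assigns when a document is first saved. It also only builds the object: storing it would still take an HTTP PUT to /database/_design/application.

```javascript
// Hypothetical helper: turns real JavaScript functions into the string-based
// "views" structure that a CouchDB design document expects.
function buildDesignDoc(name, views) {
  const doc = { _id: "_design/" + name, views: {} };
  for (const [viewName, fns] of Object.entries(views)) {
    // Function.prototype.toString() returns the function's source text.
    const def = { map: fns.map.toString() };
    if (fns.reduce) {
      def.reduce = fns.reduce.toString(); // reduce is optional
    }
    doc.views[viewName] = def;
  }
  return doc;
}

// The functions are only serialized here, never executed, so the CouchDB
// built-ins emit() and sum() do not need to exist in Node.js.
const ddoc = buildDesignDoc("application", {
  viewname: {
    map: function (doc) { if (doc.name && doc.age) { emit(doc.name, doc.age); } },
    reduce: function (keys, values) { return sum(values); }
  },
  anotherview: {
    map: function (doc) { if (doc.value) { emit(doc.value, null); } }
  }
});

console.log(JSON.stringify(ddoc, null, 2));
```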

Querying a View

The name of the design document and the name of the view are significant for querying the view. To query the view viewname, you perform an HTTP GET request to the following URI:

/database/_design/application/_view/viewname

database is the name of the database you created your design document in. Next up is the design document name, and then the view name prefixed with _view/. To query anotherview, replace viewname in that URI with anotherview. If you want to query a view in a different design document, adjust the design document name.
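The URI pattern can be captured in a small function. In this sketch, `viewPath` is a hypothetical name. Each segment is passed through `encodeURIComponent` in case a name contains characters that are reserved in URLs.

```javascript
// Hypothetical helper assembling the view URI described above:
// /<database>/_design/<design document name>/_view/<view name>
function viewPath(database, designDoc, view) {
  return "/" + [database, "_design", designDoc, "_view", view]
    .map(encodeURIComponent) // escape reserved characters in each segment
    .join("/");
}

console.log(viewPath("database", "application", "viewname"));
// /database/_design/application/_view/viewname
console.log(viewPath("database", "application", "anotherview"));
// /database/_design/application/_view/anotherview
```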

MapReduce Functions

MapReduce is a concept that solves problems by applying a two-step process, aptly named the map phase and the reduce phase. The map phase looks at every document in the database separately, one after the other, and creates a map result. The map result is an ordered list of key/value pairs. Both key and value are chosen by whoever writes the map function. A map function may call the built-in emit(key, value) function zero or more times per document, and each call creates one row in the map result.
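The following local simulation in plain JavaScript (Node.js assumed) makes the map phase concrete. It is not how CouchDB executes views. It only mimics the behavior described above:

- the map function runs once per document;
- it may call `emit` any number of times;
- the rows end up ordered by key, with ties broken by document ID.

`runMap` is a hypothetical name. The plain `<` comparison is only a rough stand-in for CouchDB's collation rules, which differ, for example, for mixed-case strings.

```javascript
// Local simulation of the map phase (illustration only, not CouchDB's code).
function runMap(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // Provide a global emit() so map functions can be written as in CouchDB.
    globalThis.emit = (key, value) => rows.push({ id: doc._id, key, value });
    try {
      mapFn(doc); // may call emit() zero, one, or many times
    } finally {
      delete globalThis.emit;
    }
  }
  // Order rows by key; rows with equal keys are ordered by document ID.
  const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  rows.sort((a, b) => cmp(a.key, b.key) || cmp(a.id, b.id));
  return rows;
}

const docs = [
  { _id: "a", name: "Sarah", age: 6 },
  { _id: "b", name: "Lena", age: 5 },
  { _id: "c", name: "Lisa", age: 4 },
  { _id: "d", title: "not a person" } // emits nothing
];

const rows = runMap(docs, function (doc) {
  if (doc.name && doc.age) {
    emit(doc.name, doc.age);
  }
});
console.log(rows.map(r => r.key + "=" + r.value).join(", "));
// Lena=5, Lisa=4, Sarah=6
```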

CouchDB is smart enough to run a map function only once for every document, even on subsequent queries on a view. Only changes to documents or new documents need to be processed anew.
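A simplified model of this behavior caches each document's map output and reruns the map function only when the document's `_rev` changes. This is an illustration under stated assumptions, not CouchDB's indexer:

- the class name is hypothetical;
- `emit` is passed in as a parameter instead of being a global;
- rows are not sorted;
- deleted documents are not handled.

```javascript
// Simplified model of incremental view updates (illustration only).
class IncrementalView {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.cache = new Map(); // _id -> { rev, rows }
    this.mapCalls = 0;      // how often the map function actually ran
  }

  update(docs) {
    for (const doc of docs) {
      const cached = this.cache.get(doc._id);
      if (cached && cached.rev === doc._rev) continue; // unchanged: reuse rows
      const rows = [];
      this.mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
      this.mapCalls++;
      this.cache.set(doc._id, { rev: doc._rev, rows });
    }
  }

  rows() {
    return [...this.cache.values()].flatMap(entry => entry.rows);
  }
}

const view = new IncrementalView((doc, emit) => {
  if (doc.name && doc.age) emit(doc.name, doc.age);
});
view.update([{ _id: "a", _rev: "1-x", name: "Lena", age: 5 },
             { _id: "b", _rev: "1-y", name: "Lisa", age: 4 }]);
view.update([{ _id: "a", _rev: "1-x", name: "Lena", age: 5 },   // unchanged
             { _id: "b", _rev: "2-z", name: "Lisa", age: 5 }]); // edited
console.log(view.mapCalls); // 3: two initial runs plus one for the edited doc
```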

Map functions

Map functions run in isolation for every document. They can’t modify the document, and they can’t talk to the outside world; in other words, they can’t have side effects. This is required so that CouchDB can guarantee correct results without having to recalculate a complete result when only one document gets changed.

The map result looks like this:

{"total_rows":3,"offset":0,"rows":[
{"id":"fc2636bf50556346f1ce46b4bc01fe30","key":"Lena","value":5},
{"id":"1fb2449f9b9d4e466dbfa47ebe675063","key":"Lisa","value":4},
{"id":"8ede09f6f6aeb35d948485624b28f149","key":"Sarah","value":6}
]}

It is a list of rows sorted by key. The id is added automatically and refers back to the document that created the row. The value is the data you’re looking for; in this example, it’s the girl’s age.

The map function that produces this result is:

function(doc) {
  if(doc.name && doc.age) {
    emit(doc.name, doc.age);
  }
}

The if statement is a sanity check that skips documents lacking a name or an age field. For the remaining documents, the function calls emit with the name as the key and the age as the value.

Reduce functions

Reduce functions are explained in the section called “Aggregate Functions”.

Look Up by Key

How you would do this in SQL:

SELECT field FROM table WHERE value='searchterm'

Use case: get a result (which can be a record or set of records) associated with a key ("searchterm").

To look something up quickly, regardless of the storage mechanism, an index is needed. An index is a data structure optimized for quick search and retrieval. CouchDB’s map result is stored in such an index, which happens to be a B+ tree.

To look up a value by "searchterm", we need to put all values into the key of a view. All we need is a simple map function:

function(doc) {
  if(doc.value) {
    emit(doc.value, null);
  }
}

This creates a list of documents that have a value field sorted by the data in the value field. To find all the records that match "searchterm", we query the view and specify the search term as a query parameter:

/database/_design/application/_view/viewname?key="searchterm"
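Query parameters such as `key` carry JSON values, which is why the string is written in double quotes. When building such URLs in code, the value has to be JSON-encoded and then URL-encoded. This sketch assumes every parameter passed in is JSON-valued, as `key`, `group`, and `include_docs` are. `viewQuery` is a hypothetical helper.

```javascript
// Hypothetical helper: appends JSON-valued query parameters to a view path.
function viewQuery(path, params) {
  const query = Object.entries(params)
    // JSON.stringify adds the quotes around strings; encodeURIComponent
    // then escapes characters such as " and / for use in a URL.
    .map(([name, value]) => name + "=" + encodeURIComponent(JSON.stringify(value)))
    .join("&");
  return query ? path + "?" + query : path;
}

console.log(viewQuery("/database/_design/application/_view/viewname", { key: "searchterm" }));
// /database/_design/application/_view/viewname?key=%22searchterm%22
console.log(viewQuery("/ladies/_design/ladies/_view/age", { key: 5 }));
// /ladies/_design/ladies/_view/age?key=5
```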

Consider the documents from the previous section, and say we’re indexing on the age field of the documents to find all the five-year-olds:

function(doc) {
  if(doc.age && doc.name) {
    emit(doc.age, doc.name);
  }
}

Query:

/ladies/_design/ladies/_view/age?key=5

Result:

{"total_rows":3,"offset":1,"rows":[
{"id":"fc2636bf50556346f1ce46b4bc01fe30","key":5,"value":"Lena"}
]}

Easy.
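The lookup is fast because the map result is kept sorted by key. The first matching row can therefore be found by binary search rather than by scanning every row. This sketch uses a sorted array as a stand-in for CouchDB's on-disk B+ tree and assumes numeric keys; `lookupByKey` is hypothetical. It also shows where `"offset":1` in the result comes from: it is the number of index rows that sort before the first match.

```javascript
// Illustration: binary search over a sorted index (a B+ tree in CouchDB).
function lookupByKey(rows, key) {
  let lo = 0, hi = rows.length;
  while (lo < hi) {               // lower bound: first row with row.key >= key
    const mid = (lo + hi) >> 1;
    if (rows[mid].key < key) lo = mid + 1;
    else hi = mid;
  }
  const matches = [];
  for (let i = lo; i < rows.length && rows[i].key === key; i++) {
    matches.push(rows[i]);
  }
  // offset = number of rows that sort before the first match
  return { total_rows: rows.length, offset: lo, rows: matches };
}

// The age view from above, already sorted by key.
const ageIndex = [
  { id: "1fb2449f9b9d4e466dbfa47ebe675063", key: 4, value: "Lisa" },
  { id: "fc2636bf50556346f1ce46b4bc01fe30", key: 5, value: "Lena" },
  { id: "8ede09f6f6aeb35d948485624b28f149", key: 6, value: "Sarah" }
];
console.log(JSON.stringify(lookupByKey(ageIndex, 5)));
```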

Note that you have to emit a value, even if it is just null as in the earlier example. The view result includes the associated document ID in every row, which we can use to look up more data from the document itself. We can also use the ?include_docs=true parameter to have CouchDB fetch the documents individually for us.
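With `?include_docs=true`, each row of the response also carries a `doc` member holding the document identified by the row's `id`. The following local sketch of that join uses a `Map` from IDs to documents as a stand-in for the database. `includeDocs` is a hypothetical name, and returning `null` for an unknown ID is an assumption of this sketch.

```javascript
// Sketch of the join performed by include_docs=true (illustration only).
function includeDocs(rows, docsById) {
  // Copy each row and attach the document it points to (or null).
  return rows.map(row => ({ ...row, doc: docsById.get(row.id) ?? null }));
}

const docsById = new Map([
  ["fc2636bf50556346f1ce46b4bc01fe30",
   { _id: "fc2636bf50556346f1ce46b4bc01fe30", name: "Lena", age: 5 }]
]);
const rows = [{ id: "fc2636bf50556346f1ce46b4bc01fe30", key: 5, value: "Lena" }];
console.log(includeDocs(rows, docsById)[0].doc.age); // 5
```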

Look Up by Prefix

How you would do this in SQL:

SELECT field FROM table WHERE value LIKE 'searchterm%'

Use case: find all documents that have a field value that starts with searchterm. For example, say you stored a MIME type (like text/html or image/jpg) for each document and now you want to find all documents that are images according to the MIME type.

The solution is very similar to the previous example: all we need is a map function that is a little more clever than the first one. But first, an example document:

{
  "_id": "Hugh Laurie",
  "_rev": "1-9fded7deef52ac373119d05435581edf",
  "mime-type": "image/jpg",
  "description": "some dude"
}

The trick is to extract the prefix we want to search for from each document and put it into the view index. We use a regular expression to match the prefix:

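function(doc) {
  if(doc["mime-type"]) {
    // From the start (^), match everything that is not a slash ([^\/]+)
    // up to and including the first slash (\/). Slashes need to be escaped
    // with a backslash (\/) inside a regular expression literal.
    var prefix = doc["mime-type"].match(/^[^\/]+\//);
    if(prefix) {
      // match() returns an array; its first element is the matched text,
      // e.g. "image/" for "image/jpg".
      emit(prefix[0], null);
    }
  }
}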

We can now query this view with our desired MIME type prefix and not only find all images, but also text, video, and all other formats:

/files/_design/finder/_view/by-mime-type?key="image/"
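One detail of the prefix extraction deserves a closer look: `String.prototype.match` returns an array (or `null`), not a string. Emitting the match object itself would produce the key `["image/"]`, which `?key="image/"` does not match. The first element of the array is the prefix string to emit. Here is a quick check in plain JavaScript, with `mimePrefix` as a hypothetical helper:

```javascript
// Same pattern as in the map function: everything up to the first slash.
const mimePrefixPattern = /^[^\/]+\//;

// Hypothetical helper returning the prefix string, or null if there is none.
function mimePrefix(mimeType) {
  const match = mimeType.match(mimePrefixPattern);
  return match ? match[0] : null;
}

console.log(JSON.stringify("image/jpg".match(mimePrefixPattern))); // ["image/"]
console.log(mimePrefix("image/jpg")); // image/
console.log(mimePrefix("text/html")); // text/
console.log(mimePrefix("no-slash"));  // null
```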

Aggregate Functions

How you would do this in SQL:

SELECT COUNT(field) FROM table

Use case: calculate a derived value from your data.

We haven’t explained reduce functions yet. Reduce functions are similar to aggregate functions in SQL. They compute a value over multiple documents.

To explain the mechanics of reduce functions, we’ll create one that doesn’t make a whole lot of sense. But this example is easy to understand. We’ll explore more useful reductions later.

Reduce functions operate on the output of the map function (also called the map result or intermediate result). The reduce function’s job, unsurprisingly, is to reduce the list that the map function produces.

Here’s what our summing reduce function looks like:

function(keys, values) {
  var sum = 0;
  for(var idx in values) {
    sum = sum + values[idx];
  }
  return sum;
}

Here’s an alternate, more idiomatic JavaScript version:

function(keys, values) {
  var sum = 0;
  values.forEach(function(element) {
    sum = sum + element;
  });
  return sum;
}

This reduce function takes two arguments: a list of keys and a list of values. For summing, we can ignore the list of keys and consider only the list of values. We loop over the list, add each item to a running total, and return that total at the end of the function.
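Because reduce functions are ordinary JavaScript, they can be called directly outside CouchDB to check their behavior. This sketch calls the `forEach` version from above alongside an `Array.prototype.reduce` variant, which is our addition and not from the original text. Neither function uses the `keys` argument. It is filled with `[key, docId]` pairs here only as a placeholder.

```javascript
// The forEach-based reduce function from above, unchanged.
const forEachSum = function (keys, values) {
  var sum = 0;
  values.forEach(function (element) {
    sum = sum + element;
  });
  return sum;
};

// Variant using Array.prototype.reduce with an initial total of 0,
// so an empty values array yields 0 instead of throwing.
const arrayReduceSum = (keys, values) => values.reduce((total, v) => total + v, 0);

// Placeholder keys as [key, docId] pairs; neither function reads them.
const keys = [["Lena", "id1"], ["Lisa", "id2"], ["Sarah", "id3"]];
const values = [5, 4, 6];

console.log(forEachSum(keys, values), arrayReduceSum(keys, values)); // 15 15
```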

Note one difference between the map and reduce functions: the map function uses emit() to create its result, whereas the reduce function returns a value.

For example, suppose we want a news headline like “786 life years present at event.” From a list of integer ages, we calculate the sum of all years of life. It’s a little contrived, but very simple and thus good for demonstration purposes. Consider the documents and the map function we used earlier in this chapter.

The reduce function to calculate the total age of all girls is:

function(keys, values) {
  return sum(values);
}

Note that, instead of the two earlier versions, we use CouchDB’s predefined sum() function. It does the same thing as the other two, but it is such a common piece of code that CouchDB has it included.

The result for our reduce view now looks like this:

{"rows":[
{"key":null,"value":15}
]}

The total sum of all age fields in all our documents is 15, just what we wanted. The key member of the result object is null, because the reduced result no longer records which documents took part in creating it. We’ll cover more advanced reduce cases later on.
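The whole reduce view can be imitated locally. This sketch maps every document, reduces the emitted values in chunks, and then reduces the partial results once more. CouchDB may likewise call a reduce function on the outputs of earlier reduce calls, passing `null` keys and `true` as a third `rereduce` argument. This is one reason a summing reduce is safe, since a sum of partial sums is the total. All names are hypothetical: `sum` is a stand-in for CouchDB's built-in, the chunk size is arbitrary, and the document IDs are shortened.

```javascript
// Stand-in for CouchDB's built-in sum() (illustration only).
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

// Local sketch of an ungrouped reduce view: one output row with key null.
function reduceView(docs, mapFn, reduceFn, chunkSize) {
  const keys = [];
  const values = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => { keys.push([key, doc._id]); values.push(value); });
  }
  // First pass: reduce each chunk of map output separately.
  const partials = [];
  for (let i = 0; i < values.length; i += chunkSize) {
    partials.push(reduceFn(keys.slice(i, i + chunkSize), values.slice(i, i + chunkSize), false));
  }
  // Second pass ("rereduce"): combine partial results; keys are null here.
  return { rows: [{ key: null, value: reduceFn(null, partials, true) }] };
}

const girls = [
  { _id: "fc26", name: "Lena", age: 5 },
  { _id: "1fb2", name: "Lisa", age: 4 },
  { _id: "8ede", name: "Sarah", age: 6 }
];
const byName = (doc, emit) => { if (doc.name && doc.age) emit(doc.name, doc.age); };
const total = (keys, values) => sum(values);

console.log(JSON.stringify(reduceView(girls, byName, total, 2)));
// {"rows":[{"key":null,"value":15}]}
```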

As a rule of thumb, the reduce function should reduce its input to a single scalar value: an integer, a string, or a small, fixed-size list or object that holds an aggregated value (or values) computed from the values argument. It should never just return values or something similar. CouchDB will give you a warning if you try to use reduce “the wrong way”:

{"error":"reduce_overflow_error","message":"Reduce output must shrink more rapidly: Current output: ..."}
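The intuition behind the rule can be shown by measuring output size. This illustration is not CouchDB's actual overflow check. It compares the JSON size of a summing reduce with that of one returning its `values` unchanged, as the input grows. Function names are hypothetical.

```javascript
// A well-behaved reduce: output stays tiny (only its digits grow).
const goodReduce = (keys, values) => values.reduce((total, v) => total + v, 0);
// "The wrong way": returns its input, so output grows with the data.
const badReduce = (keys, values) => values;

// Size in characters of the JSON-serialized reduce output for n values.
function outputSize(reduceFn, n) {
  const values = Array.from({ length: n }, (_, i) => i % 10);
  return JSON.stringify(reduceFn(null, values)).length;
}

for (const n of [10, 100, 1000]) {
  console.log(n, outputSize(goodReduce, n), outputSize(badReduce, n));
}
// 10 2 21
// 100 3 201
// 1000 4 2001
```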

Get Unique Values

How you would do this in SQL:

SELECT DISTINCT field FROM table

Getting unique values is not as easy as adding a keyword. But a reduce view and a special query parameter give us the same result. Let’s say you want a list of tags that your users have tagged themselves with and no duplicates.

First, let’s look at the source documents. We omit the _id and _rev attributes here:

{
  "name":"Chris",
  "tags":["mustache", "music", "couchdb"]
}

{
  "name":"Noah",
  "tags":["hypertext", "philosophy", "couchdb"]
}

{
  "name":"Jan",
  "tags":["drums", "bike", "couchdb"]
}

Next, we need a list of all tags. A map function will do the trick:

function(dude) {
  if(dude.name && dude.tags) {
    dude.tags.forEach(function(tag) {
      emit(tag, null);
    });
  }
}

The result will look like this:

{"total_rows":9,"offset":0,"rows":[
{"id":"3525ab874bc4965fa3cda7c549e92d30","key":"bike","value":null},
{"id":"3525ab874bc4965fa3cda7c549e92d30","key":"couchdb","value":null},
{"id":"53f82b1f0ff49a08ac79a9dff41d7860","key":"couchdb","value":null},
{"id":"da5ea89448a4506925823f4d985aabbd","key":"couchdb","value":null},
{"id":"3525ab874bc4965fa3cda7c549e92d30","key":"drums","value":null},
{"id":"53f82b1f0ff49a08ac79a9dff41d7860","key":"hypertext","value":null},
{"id":"da5ea89448a4506925823f4d985aabbd","key":"music","value":null},
{"id":"da5ea89448a4506925823f4d985aabbd","key":"mustache","value":null},
{"id":"53f82b1f0ff49a08ac79a9dff41d7860","key":"philosophy","value":null}
]}

As promised, these are all the tags, including duplicates. Since each document gets run through the map function in isolation, it cannot know if the same key has been emitted already. At this stage, we need to live with that. To achieve uniqueness, we need a reduce:

function(keys, values) {
  return true;
}

This reduce doesn’t do anything, but it allows us to specify a special query parameter when querying the view:

/dudes/_design/dude-data/_view/tags?group=true

CouchDB replies:

{"rows":[
{"key":"bike","value":true},
{"key":"couchdb","value":true},
{"key":"drums","value":true},
{"key":"hypertext","value":true},
{"key":"music","value":true},
{"key":"mustache","value":true},
{"key":"philosophy","value":true}
]}

In this case, we can ignore the value part because it is always true, but the result includes a list of all our tags and no duplicates!

With a small change, we can put the reduce to good use, too. Let’s count how many times each tag occurs. To calculate the tag frequency, we just use the summing we already learned about. In the map function, we emit a 1 instead of null:

function(dude) {
  if(dude.name && dude.tags) {
    dude.tags.forEach(function(tag) {
      emit(tag, 1);
    });
  }
}

In the reduce function, we return the sum of all values:

function(keys, values) {
  return sum(values);
}

Now, if we query the view with the ?group=true parameter, we get back the count for each tag:

{"rows":[
{"key":"bike","value":1},
{"key":"couchdb","value":3},
{"key":"drums","value":1},
{"key":"hypertext","value":1},
{"key":"music","value":1},
{"key":"mustache","value":1},
{"key":"philosophy","value":1}
]}
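Both grouped results above can be reproduced with a small local sketch of `?group=true`. Rows with equal keys are reduced together, yielding one row per distinct key. In this sketch:

- all names are hypothetical;
- the map emits `1` for every tag in both cases, since the uniqueness reduce ignores its values anyway;
- the plain `<` sort stands in for CouchDB's collation.

```javascript
// Local sketch of a grouped reduce (?group=true); illustration only.
function groupReduce(docs, mapFn, reduceFn) {
  const groups = new Map(); // key -> { keys, values }
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, { keys: [], values: [] });
      groups.get(key).keys.push([key, doc._id]);
      groups.get(key).values.push(value);
    });
  }
  // One output row per distinct key, in key order.
  return [...groups.keys()]
    .sort((a, b) => (a < b ? -1 : a > b ? 1 : 0))
    .map(key => ({ key, value: reduceFn(groups.get(key).keys, groups.get(key).values) }));
}

const dudes = [
  { _id: "chris", name: "Chris", tags: ["mustache", "music", "couchdb"] },
  { _id: "noah", name: "Noah", tags: ["hypertext", "philosophy", "couchdb"] },
  { _id: "jan", name: "Jan", tags: ["drums", "bike", "couchdb"] }
];
const tagsMap = (dude, emit) => {
  if (dude.name && dude.tags) dude.tags.forEach(tag => emit(tag, 1));
};

// Reduce that ignores its input: one row per distinct tag.
console.log(groupReduce(dudes, tagsMap, () => true).map(r => r.key).join(" "));
// bike couchdb drums hypertext music mustache philosophy

// Summing reduce: tag frequency.
const countTags = (keys, values) => values.reduce((a, b) => a + b, 0);
console.log(groupReduce(dudes, tagsMap, countTags).map(r => r.key + ":" + r.value).join(" "));
// bike:1 couchdb:3 drums:1 hypertext:1 music:1 mustache:1 philosophy:1
```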

Enforcing Uniqueness

How you would do this in SQL:

UNIQUE KEY(column)

Use case: your application requires that a certain value exists only once in a database.

This is an easy one: within a CouchDB database, each document must have a unique _id field. If you require unique values in a database, just assign them to a document’s _id field and CouchDB will enforce uniqueness for you.
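The following is a minimal in-memory sketch of the single-node behavior. Creating a document whose `_id` already exists is rejected with a conflict, which CouchDB reports as HTTP 409. The `SingleNode` class and the `"user:" + email` ID scheme are hypothetical. Lowercasing the email first is a design choice, so that addresses differing only in case collide. As the caveat below explains, this guarantee holds only within one node.

```javascript
// In-memory stand-in for one CouchDB node (illustration only).
class SingleNode {
  constructor() {
    this.docs = new Map(); // _id -> document
  }
  create(doc) {
    if (this.docs.has(doc._id)) {
      return { status: 409, error: "conflict" }; // _id already taken
    }
    this.docs.set(doc._id, doc);
    return { status: 201, ok: true, id: doc._id };
  }
}

// Hypothetical ID scheme: put the value that must be unique into _id.
function userDoc(email) {
  return { _id: "user:" + email.toLowerCase(), email };
}

const node = new SingleNode();
console.log(node.create(userDoc("jan@example.com")).status); // 201
console.log(node.create(userDoc("Jan@example.com")).status); // 409
```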

There’s one caveat, though: in the distributed case, when you are running more than one CouchDB node that accepts write requests, uniqueness can be guaranteed only per node or outside of CouchDB. CouchDB will allow two identical IDs to be written to two different nodes. On replication, CouchDB will detect a conflict and flag the document accordingly.
