From 180ab427996b0c6066b7cd978aa70c7991c05e3e Mon Sep 17 00:00:00 2001
From: Bob Grabar
Date: Fri, 30 Nov 2012 14:04:48 -0500
Subject: [PATCH 1/2] DOCS-831 shard keys and gridfs

---
 source/core/sharding-internals.txt | 31 ++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/source/core/sharding-internals.txt b/source/core/sharding-internals.txt
index c2b5ce4af5d..c821d128b1a 100644
--- a/source/core/sharding-internals.txt
+++ b/source/core/sharding-internals.txt
@@ -534,6 +534,37 @@ a document with a ``msg`` field that holds the string
 If the application is instead connected to a :program:`mongod`, the
 returned document does not include the ``isdbgrid`` string.
 
+Shard GridFS Documents
+----------------------
+
+One common way to shard :term:`GridFS` is to do so based on pre-existing
+indexes and to configure the shard as follows:
+
+- Do not shard the "files" collection. This means all the file-metadata
+  documents live on one shard. It is highly recommended that the shard
+  is a replica set with at least three members, for resiliency.
+
+- Shard the "chunks" collection using the index "files_id: 1". You must
+  create this separate index. Do not use the existing "files_id, n"
+  index created by the drivers.
+
+  The new "files_id" index ensures that all chunks of a given file live
+  on the same shard, which is safer and allows FileMD5 hashing.
+
+  To shard "chunks" by "files_id", issue commands similar to the following:
+
+  .. code-block:: javascript
+
+     db.fs.chunks.ensureIndex( { files_id : 1 } )
+
+     db.runCommand( { shardcollection : "test.fs.chunks" , key : { files_id : 1 } } )
+
+  The default ``files_id`` is an :term:`ObjectId`. The ``files_id`` is
+  ascending, and all GridFS chunks are sent to a single sharding chunk.
+  If your write load is too high for a single server to handle, you may
+  want to shard on a different key or use a different value for ``_id``
+  in the ``files`` collection.
+
 .. index:: config database
 .. index:: database, config
 .. _sharding-internals-config-database:

From ebc9c847489a3fbe55bdf849a277264bd95c2989 Mon Sep 17 00:00:00 2001
From: Bob Grabar
Date: Mon, 3 Dec 2012 17:24:02 -0500
Subject: [PATCH 2/2] DOCS-831 review edits

---
 source/core/sharding-internals.txt | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/source/core/sharding-internals.txt b/source/core/sharding-internals.txt
index c821d128b1a..aaaef9e9cfa 100644
--- a/source/core/sharding-internals.txt
+++ b/source/core/sharding-internals.txt
@@ -537,27 +537,31 @@ returned document does not include the ``isdbgrid`` string.
 Shard GridFS Documents
 ----------------------
 
-One common way to shard :term:`GridFS` is to do so based on pre-existing
-indexes and to configure the shard as follows:
+A common way to shard :term:`GridFS` is to configure the shard as follows:
 
-- Do not shard the "files" collection. This means all the file-metadata
-  documents live on one shard. It is highly recommended that the shard
-  is a replica set with at least three members, for resiliency.
+- Do not shard the ``files`` collection, as the keys in this collection do
+  not easily lend themselves to even distributions.
 
-- Shard the "chunks" collection using the index "files_id: 1". You must
-  create this separate index. Do not use the existing "files_id, n"
-  index created by the drivers.
+  Leaving ``files`` unsharded means that all the file metadata documents
+  live on one shard. It is recommended that the shard is a replica set
+  with at least three members, for high availability.
 
-  The new "files_id" index ensures that all chunks of a given file live
-  on the same shard, which is safer and allows FileMD5 hashing.
+- Shard the ``chunks`` collection using a new ``files_id : 1 , n : 1``
+  index. You must create this index. Do not use the existing
+  ``files_id : 1 , n : 1`` index already created by the drivers.
 
-  To shard "chunks" by "files_id", issue commands similar to the following:
+  The new ``files_id : 1 , n : 1`` index ensures that all chunks of a
+  given file live on the same shard, which is safer and allows FileMD5
+  hashing.
+
+  To shard the ``chunks`` collection by ``files_id : 1 , n : 1``, issue
+  commands similar to the following:
 
   .. code-block:: javascript
 
-     db.fs.chunks.ensureIndex( { files_id : 1 } )
+     db.fs.chunks.ensureIndex( { files_id : 1 , n : 1 } )
 
-     db.runCommand( { shardcollection : "test.fs.chunks" , key : { files_id : 1 } } )
+     db.runCommand( { shardcollection : "test.fs.chunks" , key : { files_id : 1 , n : 1 } } )
 
   The default ``files_id`` is an :term:`ObjectId`. The ``files_id`` is
   ascending, and all GridFS chunks are sent to a single sharding chunk.
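Both patches end by noting that the default ``files_id`` is an ascending ObjectId, so all GridFS chunks are sent to a single sharding chunk. The following is a minimal sketch of why, assuming simple range-based routing. It is a model, not MongoDB's router or balancer code: plain numbers stand in for ObjectId values (which start with a timestamp and so increase roughly in creation order), and the helper names `makeChunks`, `routeKey`, and `simulateInserts` are hypothetical.

```javascript
// Hypothetical sketch of range-based shard key routing. Numbers stand in
// for ObjectId values; this is not MongoDB's actual routing code.

// Build chunks covering [min, max). The first chunk starts at -Infinity and
// the last ends at +Infinity, similar to MongoDB's MinKey/MaxKey bounds.
function makeChunks(splitPoints) {
  const bounds = [-Infinity, ...splitPoints, Infinity];
  const chunks = [];
  for (let i = 0; i < bounds.length - 1; i++) {
    chunks.push({ min: bounds[i], max: bounds[i + 1], count: 0 });
  }
  return chunks;
}

// Binary search for the first chunk whose upper bound exceeds `key`;
// that chunk's range [min, max) contains the key.
function routeKey(chunks, key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (key < chunks[mid].max) {
      hi = mid;
    } else {
      lo = mid + 1;
    }
  }
  return lo;
}

// Insert `howMany` monotonically increasing keys starting at `firstKey`,
// incrementing the per-chunk counters (counts accumulate across calls).
function simulateInserts(chunks, firstKey, howMany) {
  for (let k = firstKey; k < firstKey + howMany; k++) {
    chunks[routeKey(chunks, k)].count++;
  }
  return chunks.map((c) => c.count);
}
```

For example, with split points at 100 and 200, `simulateInserts(makeChunks([100, 200]), 1000, 50)` returns `[0, 0, 50]`: every new, larger key lands in the last chunk, while older keys such as 50 or 150 route to the earlier chunks. This is the hot spot behind the suggestion to shard on a different key or use a different ``_id`` value in ``files`` when one server cannot absorb the write load.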