← Back to MongoDB Mastery
Advanced24 min read

Replication & Sharding

Configure replica sets for high availability and shard collections for horizontal scale.

Replica Sets

A replica set consists of a primary node receiving writes and secondary nodes replicating the oplog. Automatic failover promotes a secondary to primary when the current primary is unreachable.

Deploy at least three voting members (or two data-bearing nodes plus an arbiter in constrained environments) to maintain quorum during elections. Secondaries can serve read queries with appropriate read preference.

  • Use readPreference "secondaryPreferred" for analytics reads off the primary
  • Enable majority write concern for durability across replica set members
  • Monitor replication lag; large lag indicates capacity or network issues
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017" },
    { _id: 1, host: "mongo2:27017" },
    { _id: 2, host: "mongo3:27017" }
  ]
})

Sharding Overview

Sharding distributes data across shards keyed by a shard key. Each shard is itself a replica set. mongos routers direct queries to the appropriate shards based on the shard key.

Sharding is for horizontal scale when a single replica set cannot hold your working set or write throughput. It adds operational complexity—choose sharding only when measurement proves it necessary.

sh.enableSharding("myapp")
sh.shardCollection("myapp.events", { tenantId: 1, createdAt: 1 })

Choosing a Shard Key

An ideal shard key provides high cardinality, even distribution, and supports your most common query patterns with targeted queries. Monotonically increasing keys (like _id alone or timestamp alone) cause hot shards.

Compound shard keys often balance distribution and query isolation—tenantId plus timestamp is a common multi-tenant pattern. Avoid low-cardinality keys that funnel all traffic to one chunk.

  • Targeted queries include the shard key in the filter
  • Scatter-gather queries hit all shards and scale poorly
  • Use hashed shard keys to improve distribution on monotonic fields (with query tradeoffs)

High Availability Patterns

Run replica set members across availability zones for fault tolerance. Use priority and tags to control election behavior and regional read routing.

Regularly test failover by stepping down the primary in staging. Ensure applications retry writes on transient errors and use appropriate write concern during elections.

// Driver write concern for critical operations
collection.insertOne(doc, { writeConcern: { w: "majority", wtimeout: 5000 } })

Balancing and Chunk Migration

The balancer migrates chunks between shards to maintain even data distribution. Disable balancing temporarily during maintenance windows if needed, but re-enable promptly.

Monitor chunk counts per shard and jumbo chunks that cannot split. Poor shard key choice manifests as unmovable hot spots that balancing cannot fix—often requiring collection reshard.

  • Zone sharding pins data to geographic regions for compliance
  • ReshardCollection allows changing shard key on existing sharded collections (MongoDB 4.4+)
  • Capacity plan each shard replica set independently

Get In Touch


Ready to discuss your next project? Drop me a message.