MongoDB PlanCache Memory Issue: Debugging & Fix

Mydbops
Dec 8, 2025

When MongoDB PlanCache Goes Rogue: Debugging a Real-World Memory Issue

We recently ran into a puzzling high-memory issue on one of our production MongoDB 7.0 replica sets. What started as occasional OOM kills turned into a weeks-long investigation that eventually traced back to an unexpected interaction between MongoDB's SBE query engine and the plan cache. This post walks through the full investigation, step by step, and how we finally nailed down the root cause - a bug specific to MongoDB 7.0's SBE implementation.

MongoDB 7.0 High Memory Usage: Environment and Initial Symptoms

We operate a 5-node MongoDB replica set where everything initially looked normal: CPU was low, cache behavior was healthy, and no slow-query spikes were visible. Over time, however, resident memory kept climbing until the Linux OOM killer started terminating mongod processes. The growth was gradual—not a sudden spike—which made the root cause harder to pinpoint.

Here’s what our setup looked like:

  • MongoDB Version: 7.0.14
  • Operating System: Ubuntu 22.04.3 LTS (Jammy Jellyfish)
  • RAM: 123GiB
  • Storage Engine: WiredTiger (default)

Despite tuning and restarts, MongoDB's memory usage kept increasing beyond what could be explained by the configured WiredTiger cache size plus free heap.

Troubleshooting MongoDB Memory Issues: The Investigation

Let’s break down our journey into a checklist of questions and discoveries:

Q1: Did we analyze the MongoDB logs for anomalies or slow queries? Did creating indexes help?

Checklist:

✅ Reviewed logs for anomalies

✅ Verified no COLLSCAN or significant key examination issues

✅ Ensured all queries were optimized with proper indexing strategies

Outcome: Logs were clean, and queries were efficient. Creating indexes didn’t resolve the issue. Memory usage continued to grow.
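
For context, this is the kind of spot check we ran in mongosh to confirm a representative query was served by an index rather than a collection scan (a minimal sketch; the filter values are illustrative):

// Minimal sketch: confirm a representative query is served by an index
const exp = db.test.find({ num: 24 }).explain("executionStats");
printjson(exp.queryPlanner.winningPlan);   // look for IXSCAN rather than COLLSCAN
printjson({
  keys: exp.executionStats.totalKeysExamined,
  docs: exp.executionStats.totalDocsExamined,
  returned: exp.executionStats.nReturned
});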

Q2: Was aggressive logging enabled?

Checklist:

✅ Enabled debug-level logging

✅ Verified no abnormalities in logs

Outcome: The logs revealed no clues.
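
For reference, log verbosity can be raised at runtime from mongosh; a sketch of the kind of command involved (component and level are illustrative):

// Raise log verbosity for the query component at runtime (levels 0-5), then reset it afterwards
db.setLogLevel(1, "query")
db.setLogLevel(0, "query")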

Q3: Could cache activity indicate high data volumes exceeding RAM capacity?

Checklist:

✅ Monitored cache activity using PMM (Percona Monitoring and Management)

✅ Verified normal cache evictions and activity levels

Outcome: WiredTiger Cache activity was not the issue, as there were no significant page faults or eviction events.
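
Alongside the PMM dashboards, the same signals can be pulled straight from serverStatus; a minimal sketch (WiredTiger statistic names as exposed by the server):

// Spot-check WiredTiger cache pressure from mongosh
const wt = db.serverStatus().wiredTiger.cache;
print("cache used (GiB):", (wt["bytes currently in the cache"] / 1024 ** 3).toFixed(1));
print("cache max  (GiB):", (wt["maximum bytes configured"] / 1024 ** 3).toFixed(1));
print("dirty      (GiB):", (wt["tracked dirty bytes in the cache"] / 1024 ** 3).toFixed(1));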

Q4: Were there temporary measures to prevent MongoDB from crashing?

Checklist:

✅ Scheduled weekly server restarts

Outcome: To temporarily mitigate the issue, we implemented regular weekly restarts of the MongoDB servers. At that point, however, the root cause was still unidentified.

Q5: Which part of MongoDB's memory usage was growing?

Checklist:

✅ Analyzed cache and heap memory patterns

Outcome: Memory usage did not align with the expected breakdown of WiredTiger cache plus free heap (memory tcmalloc had not yet released back to the OS). The cache itself grew only as configured; the excess sat elsewhere in the heap.
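
These are the serverStatus fields we compared when splitting resident memory into its parts (the tcmalloc paths match the diagnostics shown below):

// Where resident memory actually sits, per serverStatus
db.serverStatus().wiredTiger.cache["bytes currently in the cache"]   // WiredTiger cache usage
db.serverStatus().tcmalloc.generic.current_allocated_bytes           // heap currently in use by mongod
db.serverStatus().tcmalloc.tcmalloc.pageheap_free_bytes              // free heap not yet returned to the OS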

top - 19:14:57 up 80 days,  6:05,  5 users,  load average: 0.93, 0.89, 0.99
Tasks: 410 total,   1 running, 405 sleeping,   4 stopped,   0 zombie
%Cpu(s):  7.2 us,  0.3 sy,  0.0 ni, 92.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 55.8/126174.0 [                                                                                                    ]
MiB Swap: 12.1/51200.0  [                                                                                                    ]

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 704060 mongodb   20   0   73.6g  65.5g  36660 S 236.7  53.1   1746:34 mongod 
user@dbserver04-cds:~$ free 
               total        used        free      shared  buff/cache   available
Mem:       129202132    71075224     1408264        2220    56718644    57199256
Swap:       52428796     6333788    46095008
mydbops@dbserver04-cds:~$ 
cds_prod_repl [direct: secondary] test> db.serverStatus().tcmalloc
{
  generic: {
    current_allocated_bytes: Long('72638896664'),
    heap_size: Long('77213437952')
  },
  tcmalloc: {
    pageheap_free_bytes: 0,
    pageheap_unmapped_bytes: 1744842752,
    max_total_thread_cache_bytes: 1073741824,
    current_total_thread_cache_bytes: 157297696,
    total_free_bytes: Long('2829698536'),
    central_cache_free_bytes: Long('2672203592'),
    transfer_cache_free_bytes: 197248,
    thread_cache_free_bytes: 157297696,
    aggressive_memory_decommit: 1,
    pageheap_committed_bytes: Long('75468595200'),
    pageheap_scavenge_count: 63345,
    pageheap_commit_count: 167197448,
    pageheap_total_commit_bytes: Long('68210629558272'),
    pageheap_decommit_count: 161407512,
    pageheap_total_decommit_bytes: Long('68135160963072'),
    pageheap_reserve_count: 36834,
    pageheap_total_reserve_bytes: Long('77213437952'),
    spinlock_total_delay_ns: Long('166869211835'),
    release_rate: 10,
    formattedString: '------------------------------------------------\\n' +
      'MALLOC:    72638897112 (69273.9 MiB) Bytes in use by application\\n' +
      'MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist\\n' +
      'MALLOC: +   2672203592 ( 2548.4 MiB) Bytes in central cache freelist\\n' +
      'MALLOC: +       197248 (    0.2 MiB) Bytes in transfer cache freelist\\n' +
      'MALLOC: +    157297248 (  150.0 MiB) Bytes in thread cache freelists\\n' +
      'MALLOC: +    459014144 (  437.8 MiB) Bytes in malloc metadata\\n' +
      'MALLOC:   ------------\\n' +
      'MALLOC: =  75927609344 (72410.2 MiB) Actual memory used (physical + swap)\\n' +
      'MALLOC: +   1744842752 ( 1664.0 MiB) Bytes released to OS (aka unmapped)\\n' +
      'MALLOC:   ------------\\n' +
      'MALLOC: =  77672452096 (74074.2 MiB) Virtual address space used\\n' +
      'MALLOC:\\n' +
      'MALLOC:        6117591              Spans in use\\n' +
      'MALLOC:            404              Thread heaps in use\\n' +
      'MALLOC:           4096              Tcmalloc page size\\n' +
      '------------------------------------------------\\n' +
      'Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).\\n' +
      'Bytes released to the OS take up virtual address space but no physical memory.\\n'
  }
}
cds_prod_repl [direct: secondary] test> 

Actual TCMalloc Memory Stats (from db.serverStatus().tcmalloc)

  • Total allocated (current_allocated_bytes): 72.6 GB
  • Heap size (heap_size): 77.2 GB
  • Page heap free (pageheap_free_bytes): 0 GB, i.e. no free pages left to hand back to the OS

Q6: Could it be a bug in MongoDB?

Checklist:

✅ Upgraded to the latest stable release in the 7.x series

Outcome: The problem persisted. This prompted us to check MongoDB JIRA for known issues and consider upgrading to MongoDB 8.0.

Q7: Did we attempt to force MongoDB to release memory back to the OS?

Checklist:

✅ Applied tcmallocAggressiveMemoryDecommit=1

✅ Increased tcmallocReleaseRate to 10 (10x faster than the default of 1)

Outcome: Memory growth continued. The tcmalloc page heap freelist dropped to 0 bytes, and overall memory consumption still far exceeded what the configured WiredTiger cache size could account for.
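
Roughly the commands we used to apply these settings at runtime (a sketch; both are also available as startup parameters):

// Ask tcmalloc to decommit freed memory aggressively and to release it to the OS faster
db.adminCommand({ setParameter: 1, tcmallocAggressiveMemoryDecommit: 1 })
db.adminCommand({ setParameter: 1, tcmallocReleaseRate: 10 })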

Q8: Did we escalate the issue to MongoDB community & forums?

Checklist:

✅ Created a support ticket

✅ Provided diagnostic logs with heapProfilingEnabled=true (see the verification snippet after the findings below)

Outcome: Analysis revealed:

  • Stable WiredTiger cache size
  • The memory used by tcmalloc outside of the WiredTiger cache was increasing over time
  • The memory allocation stacks captured by the heap profiler were growing as well.
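
Note that heapProfilingEnabled can only be set at mongod startup; on a running node you can at least confirm it is active (a quick sketch):

// Confirm heap profiling is enabled on a running node (the parameter itself is startup-only)
db.adminCommand({ getParameter: 1, heapProfilingEnabled: 1 })
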
Q9: Did traffic levels impact the issue?

Checklist:

✅ Monitored memory behavior during low-traffic periods (Diwali week)

Outcome: Memory growth slowed significantly during low traffic, indicating a correlation between traffic and memory consumption.

MongoDB Plan Cache Analysis: Finding the Root Cause

Based on the analysis conducted so far, we observed the following:

  • Indexing for the queries is functioning correctly.
  • Cache eviction and activity are operating as expected.
  • Heap memory is being released back to the operating system properly.
  • Memory usage increases in correlation with traffic to the server.

It appears that traffic plays a role, with certain read or write operations contributing to the issue. We began analyzing the read queries, and while reviewing the MongoDB logs with heapProfilingEnabled=true, we observed logs related to memory allocation. Two key findings emerged:

  1. There was a high number of memory allocation logs for the query planner.
  2. For the query below, the queryHash and planCacheKey values changed with every execution, even though the query pattern remained the same (only the $in array contents differed).
{"t":{"$date":"2024-12-23T06:56:01.516+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn32","msg":"Slow query","attr":{"type":"command","ns":"prod.test","appName":"mongosh 2.2.5","command":{"aggregate":"test","pipeline":[{"$match":{"num":24,"red":{"$in":[{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567}]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],"allowDiskUse":true,"cursor":{},"lsid":{"id":{"$uuid":"a734f5b6-9dd4-44ec-ba89-5fd6a85f57c1"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1734936922,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"primaryPreferred"},"$db":"prod"},"planSummary":"IXSCAN { num: 1, red: 1, green: 1, blue: 1 }","planningTimeMicros":1407,"keysExamined":0,"docsExamined":0,"hasSortStage":true,"cursorExhausted":true,"numYields":0,"nreturned":0,"queryHash":"44AD4417","planCacheKey":"917F67C1","queryFramework":"sbe","reslen":227,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":2}},"Global":{"acquireCount":{"r":2}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"cpuNanos":1815350,"remote":"127.0.0.1:26106","protocol":"op_msg","durationMillis":2}
root@db1# zgrep -i 'prod.test' /var/log/mongodb/mongod.log  | grep -oE '"queryHash":"[a-zA-Z0-9]+"'  | awk '!seen[$0]++' | wc -l
896
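
The same count can also be taken from the plan cache itself rather than the logs; a minimal sketch using the $planCacheStats aggregation stage:

// Count distinct queryHash values currently sitting in the collection's plan cache
db.getSiblingDB("prod").test.aggregate([
  { $planCacheStats: {} },
  { $group: { _id: "$queryHash" } },
  { $count: "distinctQueryHashes" }
])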

Traffic Volume vs Memory Growth

Memory growth strongly correlated with query traffic volume. During Diwali week (low traffic), memory growth slowed significantly; once traffic resumed, memory climbed sharply again, confirming the leak was tied to query execution volume.

Q10: Why were so many query plan hashes generated, and where are they stored?

Checklist:

✅ Counted query plans for specific collections

Outcome: A single collection had 52,891 plan cache entries for what was logically one query shape (the count later grew to 113,702), and memory usage rose in step with planCache growth. The plan cache lives in RAM and is cleared on a mongod restart, which is why our weekly restarts had been giving temporary relief.

> db.test.getPlanCache().list().length
52891 

Why Query Hashes Keep Changing in MongoDB 7.0 SBE

The same query pattern with different $in array values produced different hashes under the 7.0 SBE engine:

  • Query #1 (SBE 7.0): $in: [{a1:-1234567, a2:1234567}, ...]
  • Query #2 (SBE 7.0): $in: [{a1:-1234567, a2:-1234567}, ...]
  • Query #3 (SBE 7.0): $in: [{a1:1234567, a2:-1234567}, ...]

Each variant received its own hash: 896 unique queryHash values in total, amounting to 113,702 plan cache entries. In Classic (6.0) or the fixed SBE (8.0), $in: [<any_values>] maps to the same queryHash, so the result is 1 cache entry instead of 113,702.

Plan Cache Entries Growth

  • Normal: ~50 entries
  • Intermediate: 52,891 entries
  • Critical: 113,702 entries

From 1 expected entry to 113,702 actual entries, spread across 896 unique queryHash values for the same query pattern.

Q11: Did clearing the cache help reclaim memory?

Checklist:

✅ Cleared the plan cache with db.test.getPlanCache().clear() (session shown below)

Outcome: Clearing the plan cache reduced memory usage slightly (~5%) and increased free heap memory significantly. Memory stabilized after releasing the heap back to the OS.

> db.getSiblingDB("test").test.getPlanCache().list().length
113702
> (db.serverStatus().tcmalloc.tcmalloc["pageheap_free_bytes"] / 1024 / 1024 / 1024).toFixed(2)
2.48
> db.test.getPlanCache().clear()
> (db.serverStatus().tcmalloc.tcmalloc["pageheap_free_bytes"] / 1024 / 1024 / 1024).toFixed(2)
28.91
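
Since clearing the cache restored headroom, a periodic sweep like the following could serve as a stopgap while a cluster is still on an affected 7.0.x build (a hypothetical sketch; the database name and threshold are illustrative):

// Hypothetical stopgap: clear the plan cache of any collection that has accumulated too many entries
const THRESHOLD = 10000;                    // illustrative limit
const target = db.getSiblingDB("prod");     // illustrative database name
target.getCollectionNames().forEach(name => {
  const entries = target.getCollection(name).getPlanCache().list().length;
  if (entries > THRESHOLD) {
    print(`Clearing plan cache for ${name} (${entries} entries)`);
    target.getCollection(name).getPlanCache().clear();
  }
});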

Impact of Clearing Plan Cache

Before (cache bloated):

  • Plan cache entries: 113,702
  • Free heap memory: 2.48 GB
  • System status: near OOM, high memory pressure

After db.test.getPlanCache().clear() (cache cleared):

  • Plan cache entries: 0
  • Free heap memory: 28.91 GB
  • System status: healthy, low memory pressure

Memory reclaimed: 26.43 GB

Q12: What caused query shapes to change?

Checklist:

✅ Identified problematic queries involving arrays:

Outcome: Adding or removing elements in arrays triggered new query shapes unnecessarily. This is a critical consideration for MongoDB aggregation pipeline optimization when working with complex array queries.

db.test.aggregate([{"$match":{"num":24,"red":{"$in":[{ a1: Long('-1234567'), a2: Long('1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') }]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],{ allowDiskUse: true })

db.test.aggregate([{"$match":{"num":24,"red":{"$in":[{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') }]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],{ allowDiskUse: true })
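
A quick way to see this shape instability directly is to compare the planner's reported hash for two $in variants (a minimal sketch with illustrative values; our production case used the full aggregation above):

// Compare the reported query hash for two variants of the same $in pattern
const hashFor = vals => db.test.find({ num: 24, red: { $in: vals } }).explain().queryPlanner.queryHash;
print(hashFor([{ a1: 1, a2: 2 }]));
print(hashFor([{ a1: 3, a2: 4 }, { a1: 5, a2: 6 }]));
// On the affected 7.0 builds (SBE) the two hashes differ; the classic engine reports the same hash for both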

Query Engine Architecture Comparison

Classic Engine (v6.0):

  1. Query Parser: normalizes structure
  2. Shape Generator: ignores literal values
  3. Plan Cache: single hash (9CBC6A40), ~50 cache entries

SBE Engine (v7.0, buggy):

  1. Query Parser: normalizes structure
  2. Shape Generator: includes array values!
  3. Plan Cache: 896 different hashes, 113,702 entries

The Fix in v8.0: MongoDB 8.0 computes a query shape hash that properly normalizes array values, treating all variations as the same shape.

MongoDB Version Behavior Comparison

v6.0 (stable):

  • Engine: Classic
  • queryHash: 9CBC6A40, stable across value changes
  • Plan cache: ~50 entries
  • Status: normal

v7.0 to 7.0.16 (bug):

  • Engine: SBE
  • queryHash: 896 different values for the same pattern
  • Plan cache: 113,702 entries
  • Status: memory leak

v8.0 (fixed):

  • Engine: SBE (fixed shape hashing)
  • planCacheShapeHash: 432E6D6C, stable
  • Plan cache: ~65 entries
  • Status: resolved

Q13: Is this issue present in other MongoDB versions?

Checklist:

✅ Tested behavior across versions 6.0, 7.0, and 8.0

Outcome: The issue occurred in MongoDB 7.0 but not in 6.0 or 8.0. It was identified as a bug in the SBE (Slot-Based Execution) engine introduced in 7.0. The issue is tracked in MongoDB JIRA SERVER-96924.

db.test.aggregate([{"$match":{"num":24,"red":{"$in":[{ a1: Long('-1234567'), a2: Long('1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') }]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],{ allowDiskUse: true })

6.0

{"t":{"$date":"2024-12-23T06:51:21.247+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn21","msg":"Slow query","attr":{"type":"command","ns":"prod.test","appName":"mongosh 2.2.5","command":{"aggregate":"test","pipeline":[{"$match":{"num":24,"red":{"$in":[{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567}]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],"allowDiskUse":true,"cursor":{},"lsid":{"id":{"$uuid":"8fccbac3-db39-4a59-93db-486dd7cdb1b0"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1734936619,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"primaryPreferred"},"$db":"prod"},"planSummary":"IXSCAN { num: 1, red: 1, green: 1, blue: 1 }","planningTimeMicros":694,"keysExamined":0,"docsExamined":0,"hasSortStage":true,"cursorExhausted":true,"numYields":0,"nreturned":0,"queryHash":"9CBC6A40","planCacheKey":"A0F72FAE","queryFramework":"classic","reslen":227,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":2}},"Global":{"acquireCount":{"r":2}},"Mutex":{"acquireCount":{"r":2}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"remote":"127.0.0.1:58730","protocol":"op_msg","durationMillis":1}}

7.0

{"t":{"$date":"2024-12-23T06:55:31.715+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn32","msg":"Slow query","attr":{"type":"command","ns":"prod.test","appName":"mongosh 2.2.5","command":{"aggregate":"test","pipeline":[{"$match":{"num":24,"red":{"$in":[{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567}]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],"allowDiskUse":true,"cursor":{},"lsid":{"id":{"$uuid":"a734f5b6-9dd4-44ec-ba89-5fd6a85f57c1"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1734936872,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"primaryPreferred"},"$db":"prod"},"planSummary":"IXSCAN { num: 1, red: 1, green: 1, blue: 1 }","planningTimeMicros":44923,"keysExamined":0,"docsExamined":0,"hasSortStage":true,"cursorExhausted":true,"numYields":1,"nreturned":0,"queryHash":"5D460A72","planCacheKey":"5C28D72D","queryFramework":"sbe","reslen":227,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":3}},"Global":{"acquireCount":{"r":3}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{"data":{"bytesRead":2840,"timeReadingMicros":16}},"cpuNanos":13822765,"remote":"127.0.0.1:26106","protocol":"op_msg","durationMillis":55}}

8.0

{"t":{"$date":"2024-12-23T07:08:43.967+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn20","msg":"Slow query","attr":{"type":"command","isFromUserConnection":true,"ns":"prodtest","collectionType":"normal","appName":"mongosh 2.2.5","queryShapeHash":"52A4EBE4B3770E13CB82532CF55BDB3CAAA5A0AE00D6011A8003B4FF56E36E71","command":{"aggregate":"test","pipeline":[{"$match":{"num":24,"red":{"$in":[{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567}]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],"allowDiskUse":true,"cursor":{},"lsid":{"id":{"$uuid":"74110071-ba35-47b5-abae-6511fb32dce1"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1734937693,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"primaryPreferred"},"$db":"prod"},"planSummary":"IXSCAN { num: 1, red: 1, green: 1, blue: 1 }","planningTimeMicros":12215,"keysExamined":0,"docsExamined":0,"hasSortStage":true,"cursorExhausted":true,"numYields":1,"nreturned":0,"planCacheShapeHash":"432E6D6C","planCacheKey":"F047721F","queryFramework":"sbe","reslen":227,"locks":{"Global":{"acquireCount":{"r":3}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{"data":{"bytesRead":2840,"timeReadingMicros":17}},"cpuNanos":8305201,"remote":"127.0.0.1:38936","protocol":"op_msg","queues":{"ingress":{"admissions":1},"execution":{"admissions":4}},"workingMillis":27,"durationMillis":27}}

Running with fewer array elements:

db.test.aggregate([{"$match":{"num":24,"red":{"$in":[{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') }]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],{ allowDiskUse: true })

6.0

{"t":{"$date":"2024-12-23T06:52:02.128+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn21","msg":"Slow query","attr":{"type":"command","ns":"prod.test","appName":"mongosh 2.2.5","command":{"aggregate":"test","pipeline":[{"$match":{"num":24,"red":{"$in":[{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567}]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],"allowDiskUse":true,"cursor":{},"lsid":{"id":{"$uuid":"8fccbac3-db39-4a59-93db-486dd7cdb1b0"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1734936679,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"primaryPreferred"},"$db":"prod"},"planSummary":"IXSCAN { num: 1, red: 1, green: 1, blue: 1 }","planningTimeMicros":354,"keysExamined":0,"docsExamined":0,"hasSortStage":true,"cursorExhausted":true,"numYields":0,"nreturned":0,"queryHash":"9CBC6A40","planCacheKey":"A0F72FAE","queryFramework":"classic","reslen":227,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":2}},"Global":{"acquireCount":{"r":2}},"Mutex":{"acquireCount":{"r":2}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"remote":"127.0.0.1:58730","protocol":"op_msg","durationMillis":0}}

7.0

{"t":{"$date":"2024-12-23T06:56:01.516+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn32","msg":"Slow query","attr":{"type":"command","ns":"prod.test","appName":"mongosh 2.2.5","command":{"aggregate":"test","pipeline":[{"$match":{"num":24,"red":{"$in":[{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567}]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],"allowDiskUse":true,"cursor":{},"lsid":{"id":{"$uuid":"a734f5b6-9dd4-44ec-ba89-5fd6a85f57c1"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1734936922,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"primaryPreferred"},"$db":"prod"},"planSummary":"IXSCAN { num: 1, red: 1, green: 1, blue: 1 }","planningTimeMicros":1407,"keysExamined":0,"docsExamined":0,"hasSortStage":true,"cursorExhausted":true,"numYields":0,"nreturned":0,"queryHash":"44AD4417","planCacheKey":"917F67C1","queryFramework":"sbe","reslen":227,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":2}},"Global":{"acquireCount":{"r":2}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"cpuNanos":1815350,"remote":"127.0.0.1:26106","protocol":"op_msg","durationMillis":2}}

8.0

{"t":{"$date":"2024-12-23T07:09:19.255+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn20","msg":"Slow query","attr":{"type":"command","isFromUserConnection":true,"ns":"prodtest","collectionType":"normal","appName":"mongosh 2.2.5","queryShapeHash":"52A4EBE4B3770E13CB82532CF55BDB3CAAA5A0AE00D6011A8003B4FF56E36E71","command":{"aggregate":"test","pipeline":[{"$match":{"num":24,"red":{"$in":[{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567}]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],"allowDiskUse":true,"cursor":{},"lsid":{"id":{"$uuid":"74110071-ba35-47b5-abae-6511fb32dce1"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1734937723,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"primaryPreferred"},"$db":"prod"},"planSummary":"IXSCAN { num: 1, red: 1, green: 1, blue: 1 }","planningTimeMicros":956,"keysExamined":0,"docsExamined":0,"hasSortStage":true,"cursorExhausted":true,"numYields":0,"nreturned":0,"planCacheShapeHash":"432E6D6C","planCacheKey":"F047721F","queryFramework":"sbe","reslen":227,"locks":{"Global":{"acquireCount":{"r":2}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"cpuNanos":1703772,"remote":"127.0.0.1:38936","protocol":"op_msg","queues":{"ingress":{"admissions":1},"execution":{"admissions":3}},"workingMillis":1,"durationMillis":1}}

MongoDB 8.0 fixed this issue with improved query shape hashing and plan cache management. If you're considering an upgrade, check out our MongoDB 8.0 migration guide.

Q14: What is the resolution for this issue?

Checklist:

✅ Applied MongoDB support recommendations:

We tested two solutions:

1. Forced the classic query engine:

db.adminCommand({ setParameter: 1, internalQueryFrameworkControl: "forceClassicEngine" });

This server parameter forces MongoDB to use the classic query engine instead of SBE, bypassing the bug entirely.
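
To confirm the setting is active (and remember to also persist it in the mongod configuration so it survives restarts), a quick check:

// Verify the query framework override currently in effect
db.adminCommand({ getParameter: 1, internalQueryFrameworkControl: 1 })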

2. Tested an upgrade to MongoDB 8.0

Outcome: Both solutions worked. Memory usage stabilized with forceClassicEngine, and the issue was fully resolved in MongoDB 8.0. For production deployments, we recommend MongoDB managed services support during such critical changes.

Resolution Path

  • The problem: MongoDB 7.0 SBE bug, 113,702 plan cache entries
  • Option 1 (immediate): force the classic engine with forceClassicEngine
  • Option 2 (permanent): upgrade to MongoDB 8.0
  • The result: memory stabilized, plan cache back to ~50-65 entries

Through persistent investigation and collaboration with MongoDB support, we identified and resolved the root cause of high memory usage in our MongoDB cluster. The culprit was excessive query plan generation due to a bug in the SBE engine in MongoDB 7.0.

Key Takeaways:

  1. Analyze diagnostic logs thoroughly: Use tools like heapProfilingEnabled to uncover hidden memory issues.
  2. Monitor query shapes and cache behavior: Excessive plan generation can lead to memory bloat.
  3. Stay updated on MongoDB versions: Upgrade to leverage fixes and improvements.

Need Help with MongoDB Issues?

Facing memory issues, performance problems, or planning a MongoDB upgrade? Contact Mydbops for expert MongoDB consulting and managed database services.
