When MongoDB PlanCache Goes Rogue: Debugging a Real-World Memory Issue
We recently ran into a puzzling high-memory issue on one of our production MongoDB 7.0 replica sets. What started as occasional OOM kills turned into a weeks-long investigation that eventually traced back to an unexpected interaction between MongoDB's SBE query engine and the plan cache. This post walks through the full investigation, step by step, and how we finally nailed down the root cause - a bug specific to MongoDB 7.0's SBE implementation.
MongoDB 7.0 High Memory Usage: Environment and Initial Symptoms
We operate a 5-node MongoDB replica set where everything initially looked normal: CPU was low, cache behavior was healthy, and no slow-query spikes were visible. Over time, however, resident memory kept climbing until the Linux OOM killer started terminating mongod processes. The growth was gradual—not a sudden spike—which made the root cause harder to pinpoint.
Here’s what our setup looked like:
- MongoDB Version: 7.0.14
- Operating System: Ubuntu 22.04.3 LTS (Jammy Jellyfish)
- RAM: 123GiB
- Storage Engine: WiredTiger (default)
Despite tuning and restarts, MongoDB's memory usage kept increasing beyond what could be explained by the configured WiredTiger cache size plus free heap.
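One quick way to see this mismatch from mongosh is to pull the resident set size, the WiredTiger cache size, and the tcmalloc heap size out of a single serverStatus() call. This is only a rough sketch; the field names (mem.resident, the wiredTiger.cache counters, tcmalloc.generic.heap_size) are as they appear on recent tcmalloc-backed builds and can vary slightly between versions:
// Minimal mongosh sketch: compare resident memory with the WiredTiger cache and the tcmalloc heap
const ss = db.serverStatus();
const residentMiB = ss.mem.resident; // resident set size, in MiB
const wtCacheMiB = ss.wiredTiger.cache["bytes currently in the cache"] / 1024 / 1024;
const heapMiB = ss.tcmalloc.generic.heap_size / 1024 / 1024;
print(`resident: ${residentMiB} MiB, WT cache: ${wtCacheMiB.toFixed(0)} MiB, tcmalloc heap: ${heapMiB.toFixed(0)} MiB`);
In a healthy deployment the resident figure should be roughly explainable by the cache plus heap; in our case it was not.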
Troubleshooting MongoDB Memory Issues: The Investigation
Let’s break down our journey into a checklist of questions and discoveries:
Q1: Did we analyze the MongoDB logs for anomalies or slow queries? Did creating indexes help?
Checklist:
✅ Reviewed logs for anomalies
✅ Verified no COLLSCAN or significant key examination issues
✅ Ensured all queries were optimized with proper indexing strategies
Outcome: Logs were clean, and queries were efficient. Creating indexes didn’t resolve the issue. Memory usage continued to grow.
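We haven't reproduced every check here, but the kind of spot check we ran looks roughly like the explain() call below against a representative query shape. The filter is illustrative, not the full production predicate:
// Illustrative explain() check: the winning plan should be an IXSCAN,
// with keysExamined close to nReturned
db.test.explain("executionStats").aggregate([
  { $match: { num: 24, red: { $in: [ /* array of sub-documents */ ] } } },
  { $group: { _id: { num: "$num", red: "$red", green: "$green", blue: "$blue" }, count: { $sum: 1 } } }
])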
Q2: Was aggressive logging enabled?
Checklist:
✅ Enabled debug-level logging
✅ Verified no abnormalities in logs
Outcome: The logs revealed no clues.
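For reference, per-component verbosity can be raised and lowered at runtime from mongosh; a minimal sketch of the approach (component names and levels per MongoDB's log verbosity settings):
// Raise verbosity for the query component only (0 is the default; higher is noisier)
db.setLogLevel(2, "query")
// ...reproduce the workload and review mongod.log...
db.setLogLevel(0, "query")   // revert once done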
Q3: Could cache activity indicate high data volumes exceeding RAM capacity?
Checklist:
✅ Monitored cache activity using PMM (Percona Monitoring and Management)
✅ Verified normal cache evictions and activity levels
Outcome: WiredTiger Cache activity was not the issue, as there were no significant page faults or eviction events.
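Alongside the PMM dashboards, the same cache counters can be sampled directly from serverStatus(). A minimal sketch; the statistic names may vary slightly across WiredTiger versions:
// Snapshot of WiredTiger cache usage and eviction activity
const wt = db.serverStatus().wiredTiger.cache;
printjson({
  configuredGiB: (wt["maximum bytes configured"] / 1024 ** 3).toFixed(1),
  inCacheGiB: (wt["bytes currently in the cache"] / 1024 ** 3).toFixed(1),
  dirtyGiB: (wt["tracked dirty bytes in the cache"] / 1024 ** 3).toFixed(1),
  pagesReadIntoCache: wt["pages read into cache"],
  unmodifiedPagesEvicted: wt["unmodified pages evicted"]
})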
Q4: Were there temporary measures to prevent MongoDB from crashing?
Checklist:
✅ Scheduled weekly server restarts
Outcome: As a temporary mitigation, we implemented regular weekly restarts of the MongoDB servers. The root cause, however, remained unidentified.
Q5: Which part of MongoDB's memory usage was growing?
Checklist:
✅ Analyzed cache and heap memory patterns
Outcome: Memory usage did not add up: the WiredTiger cache plus the free heap (memory tcmalloc had not yet released back to the OS) could not account for the resident set size. The cache itself grew only as configured. The top, free, and db.serverStatus().tcmalloc outputs below illustrate the gap:
top - 19:14:57 up 80 days, 6:05, 5 users, load average: 0.93, 0.89, 0.99
Tasks: 410 total, 1 running, 405 sleeping, 4 stopped, 0 zombie
%Cpu(s): 7.2 us, 0.3 sy, 0.0 ni, 92.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 55.8/126174.0
MiB Swap: 12.1/51200.0

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 704060 mongodb   20   0   73.6g  65.5g  36660 S 236.7  53.1   1746:34 mongod

user@dbserver04-cds:~$ free
total used free shared buff/cache available
Mem: 129202132 71075224 1408264 2220 56718644 57199256
Swap: 52428796 6333788 46095008
cds_prod_repl [direct: secondary] test> db.serverStatus().tcmalloc
{
generic: {
current_allocated_bytes: Long('72638896664'),
heap_size: Long('77213437952')
},
tcmalloc: {
pageheap_free_bytes: 0,
pageheap_unmapped_bytes: 1744842752,
max_total_thread_cache_bytes: 1073741824,
current_total_thread_cache_bytes: 157297696,
total_free_bytes: Long('2829698536'),
central_cache_free_bytes: Long('2672203592'),
transfer_cache_free_bytes: 197248,
thread_cache_free_bytes: 157297696,
aggressive_memory_decommit: 1,
pageheap_committed_bytes: Long('75468595200'),
pageheap_scavenge_count: 63345,
pageheap_commit_count: 167197448,
pageheap_total_commit_bytes: Long('68210629558272'),
pageheap_decommit_count: 161407512,
pageheap_total_decommit_bytes: Long('68135160963072'),
pageheap_reserve_count: 36834,
pageheap_total_reserve_bytes: Long('77213437952'),
spinlock_total_delay_ns: Long('166869211835'),
release_rate: 10,
formattedString: '------------------------------------------------\\n' +
'MALLOC: 72638897112 (69273.9 MiB) Bytes in use by application\\n' +
'MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist\\n' +
'MALLOC: + 2672203592 ( 2548.4 MiB) Bytes in central cache freelist\\n' +
'MALLOC: + 197248 ( 0.2 MiB) Bytes in transfer cache freelist\\n' +
'MALLOC: + 157297248 ( 150.0 MiB) Bytes in thread cache freelists\\n' +
'MALLOC: + 459014144 ( 437.8 MiB) Bytes in malloc metadata\\n' +
'MALLOC: ------------\\n' +
'MALLOC: = 75927609344 (72410.2 MiB) Actual memory used (physical + swap)\\n' +
'MALLOC: + 1744842752 ( 1664.0 MiB) Bytes released to OS (aka unmapped)\\n' +
'MALLOC: ------------\\n' +
'MALLOC: = 77672452096 (74074.2 MiB) Virtual address space used\\n' +
'MALLOC:\\n' +
'MALLOC: 6117591 Spans in use\\n' +
'MALLOC: 404 Thread heaps in use\\n' +
'MALLOC: 4096 Tcmalloc page size\\n' +
'------------------------------------------------\\n' +
'Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).\\n' +
'Bytes released to the OS take up virtual address space but no physical memory.\\n'
}
}
Q6: Could it be a bug in MongoDB?
Checklist:
✅ Upgraded to the latest stable release in the 7.x series
Outcome: The problem persisted. This prompted us to check MongoDB JIRA for known issues and consider upgrading to MongoDB 8.0.
Q7: Did we attempt to force MongoDB to release memory back to the OS?
Checklist:
✅ Applied tcmallocAggressiveMemoryDecommit=1
✅ Increased tcmallocReleaseRate to 10 (10x the default of 1)
Outcome: Memory growth continued. Free heap memory dropped to 0, and cache size exceeded configured limits.
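For reference, both knobs can be applied at runtime via setParameter. This is only a sketch of what we ran; double-check the parameter names against your MongoDB version's documentation:
// Ask tcmalloc to decommit freed memory back to the OS more aggressively
db.adminCommand({ setParameter: 1, tcmallocAggressiveMemoryDecommit: 1 })
// Raise the release rate (default 1); this surfaces as release_rate in serverStatus().tcmalloc
db.adminCommand({ setParameter: 1, tcmallocReleaseRate: 10 })
Both effects are visible in the tcmalloc output above (aggressive_memory_decommit: 1 and release_rate: 10).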
Q8: Did we escalate the issue to MongoDB community & forums?
Checklist:
✅ Created a support ticket
✅ Provided diagnostic data collected with heapProfilingEnabled=true
Outcome: Analysis revealed:
- Stable WiredTiger cache size
- The memory used by tcmalloc outside of the WiredTiger cache was increasing over time
- The allocation stacks captured by the heap profiler were growing as well.
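For context, heapProfilingEnabled is a startup-only diagnostic parameter, so it has to go into the mongod configuration (or command line) before a restart; something along the lines of the excerpt below, with the caveat that profiling adds overhead and is best enabled only for the duration of an investigation:
# mongod.conf excerpt (sketch)
setParameter:
  heapProfilingEnabled: true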
Q9: Did traffic levels impact the issue?
Checklist:
✅ Monitored memory behavior during low-traffic periods (Diwali week)
Outcome: Memory growth slowed significantly during low traffic, indicating a correlation between traffic and memory consumption.
MongoDB Plan Cache Analysis: Finding the Root Cause
Based on the analysis conducted so far, we observed the following:
- Indexing for the queries is functioning correctly.
- Cache eviction and activity are operating as expected.
- Heap memory is being released back to the operating system properly.
- Memory usage increases in correlation with traffic to the server.
It appears that traffic plays a role, with certain read or write operations contributing to the issue. We began analyzing the read queries, and while reviewing the MongoDB logs with heapProfilingEnabled=true, we observed logs related to memory allocation. Two key findings emerged:
- There was a high number of memory allocation logs for the query planner.
- For the query below, the queryHash and planCacheKey values changed on every execution, even though the query shape remained the same.
{"t":{"$date":"2024-12-23T06:56:01.516+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn32","msg":"Slow query","attr":{"type":"command","ns":"prod.test","appName":"mongosh 2.2.5","command":{"aggregate":"test","pipeline":[{"$match":{"num":24,"red":{"$in":[{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567}]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],"allowDiskUse":true,"cursor":{},"lsid":{"id":{"$uuid":"a734f5b6-9dd4-44ec-ba89-5fd6a85f57c1"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1734936922,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"primaryPreferred"},"$db":"prod"},"planSummary":"IXSCAN { num: 1, red: 1, green: 1, blue: 1 }","planningTimeMicros":1407,"keysExamined":0,"docsExamined":0,"hasSortStage":true,"cursorExhausted":true,"numYields":0,"nreturned":0,"queryHash":"44AD4417","planCacheKey":"917F67C1","queryFramework":"sbe","reslen":227,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":2}},"Global":{"acquireCount":{"r":2}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"cpuNanos":1815350,"remote":"127.0.0.1:26106","protocol":"op_msg","durationMillis":2}}
root@db1# zgrep -i 'prod.test' /var/log/mongodb/mongod.log | grep -oE '"queryHash":"[a-zA-Z0-9]+"' | awk '!seen[$0]++' | wc -l
896
Q10: Why were so many query plan hashes generated, and where are they stored?
Checklist:
✅ Counted query plans for specific collections
Outcome: A single collection had 52,891 plan cache entries for what should have been one query shape, so memory usage rose as the plan cache grew. The plan cache lives in RAM and is cleared on a mongod restart, which is why the weekly restarts temporarily relieved the pressure.
> db.test.getPlanCache().list().length
52891
Q11: Did clearing the cache help reclaim memory?
Checklist:
✅ Cleared the plan cache with db.collection.getPlanCache().clear() (shown below)
Outcome: Clearing the plan cache reduced memory usage slightly (~5%) and increased free heap memory significantly. Memory stabilized after releasing the heap back to the OS.
> db.getSiblingDB("test").test.getPlanCache().list().length
113702
> (db.serverStatus().tcmalloc.tcmalloc["pageheap_free_bytes"] / 1024 / 1024 / 1024).toFixed(2)
2.48
> db.test.getPlanCache().clear()
> (db.serverStatus().tcmalloc.tcmalloc["pageheap_free_bytes"] / 1024 / 1024 / 1024).toFixed(2)
28.91
Q12: What caused query shapes to change?
Checklist:
✅ Identified problematic queries involving arrays:
Outcome: Adding or removing elements in the $in array unnecessarily produced a new query shape (and a new plan cache entry) on MongoDB 7.0. This is a critical consideration for MongoDB aggregation pipeline optimization when working with complex array queries.
db.test.aggregate([{"$match":{"num":24,"red":{"$in":[{ a1: Long('-1234567'), a2: Long('1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') }]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],{ allowDiskUse: true })
db.test.aggregate([{"$match":{"num":24,"red":{"$in":[{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') }]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],{ allowDiskUse: true })
Q13: Is this issue present in other MongoDB versions?
Checklist:
✅ Tested behavior across versions 6.0, 7.0, and 8.0
Outcome: The issue occurred in MongoDB 7.0 but not in 6.0 or 8.0. It was identified as a bug specific to the SBE (Slot-Based Execution) query engine in 7.0, tracked in MongoDB JIRA as SERVER-96924. The logs below show the same aggregation run on 6.0, 7.0, and 8.0, first with the full $in array and then with one element removed; only 7.0 produces a different queryHash/planCacheKey for each variant.
db.test.aggregate([{"$match":{"num":24,"red":{"$in":[{ a1: Long('-1234567'), a2: Long('1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') }]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],{ allowDiskUse: true })
6.0
{"t":{"$date":"2024-12-23T06:51:21.247+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn21","msg":"Slow query","attr":{"type":"command","ns":"prod.test","appName":"mongosh 2.2.5","command":{"aggregate":"test","pipeline":[{"$match":{"num":24,"red":{"$in":[{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567}]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],"allowDiskUse":true,"cursor":{},"lsid":{"id":{"$uuid":"8fccbac3-db39-4a59-93db-486dd7cdb1b0"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1734936619,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"primaryPreferred"},"$db":"prod"},"planSummary":"IXSCAN { num: 1, red: 1, green: 1, blue: 1 }","planningTimeMicros":694,"keysExamined":0,"docsExamined":0,"hasSortStage":true,"cursorExhausted":true,"numYields":0,"nreturned":0,"queryHash":"9CBC6A40","planCacheKey":"A0F72FAE","queryFramework":"classic","reslen":227,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":2}},"Global":{"acquireCount":{"r":2}},"Mutex":{"acquireCount":{"r":2}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"remote":"127.0.0.1:58730","protocol":"op_msg","durationMillis":1}}
7.0
{"t":{"$date":"2024-12-23T06:55:31.715+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn32","msg":"Slow query","attr":{"type":"command","ns":"prod.test","appName":"mongosh 2.2.5","command":{"aggregate":"test","pipeline":[{"$match":{"num":24,"red":{"$in":[{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567}]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],"allowDiskUse":true,"cursor":{},"lsid":{"id":{"$uuid":"a734f5b6-9dd4-44ec-ba89-5fd6a85f57c1"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1734936872,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"primaryPreferred"},"$db":"prod"},"planSummary":"IXSCAN { num: 1, red: 1, green: 1, blue: 1 }","planningTimeMicros":44923,"keysExamined":0,"docsExamined":0,"hasSortStage":true,"cursorExhausted":true,"numYields":1,"nreturned":0,"queryHash":"5D460A72","planCacheKey":"5C28D72D","queryFramework":"sbe","reslen":227,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":3}},"Global":{"acquireCount":{"r":3}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{"data":{"bytesRead":2840,"timeReadingMicros":16}},"cpuNanos":13822765,"remote":"127.0.0.1:26106","protocol":"op_msg","durationMillis":55}}
8.0
{"t":{"$date":"2024-12-23T07:08:43.967+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn20","msg":"Slow query","attr":{"type":"command","isFromUserConnection":true,"ns":"prod.test","collectionType":"normal","appName":"mongosh 2.2.5","queryShapeHash":"52A4EBE4B3770E13CB82532CF55BDB3CAAA5A0AE00D6011A8003B4FF56E36E71","command":{"aggregate":"test","pipeline":[{"$match":{"num":24,"red":{"$in":[{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567}]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],"allowDiskUse":true,"cursor":{},"lsid":{"id":{"$uuid":"74110071-ba35-47b5-abae-6511fb32dce1"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1734937693,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"primaryPreferred"},"$db":"prod"},"planSummary":"IXSCAN { num: 1, red: 1, green: 1, blue: 1 }","planningTimeMicros":12215,"keysExamined":0,"docsExamined":0,"hasSortStage":true,"cursorExhausted":true,"numYields":1,"nreturned":0,"planCacheShapeHash":"432E6D6C","planCacheKey":"F047721F","queryFramework":"sbe","reslen":227,"locks":{"Global":{"acquireCount":{"r":3}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{"data":{"bytesRead":2840,"timeReadingMicros":17}},"cpuNanos":8305201,"remote":"127.0.0.1:38936","protocol":"op_msg","queues":{"ingress":{"admissions":1},"execution":{"admissions":4}},"workingMillis":27,"durationMillis":27}}
Running with fewer array elements:
db.test.aggregate([{"$match":{"num":24,"red":{"$in":[{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') },{ a1: Long('1234567'), a2: Long('1234567') },{ a1: Long('1234567'), a2: Long('-1234567') },{ a1: Long('-1234567'), a2: Long('1234567') },{ a1: Long('-1234567'), a2: Long('-1234567') }]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],{ allowDiskUse: true })
6.0
{"t":{"$date":"2024-12-23T06:52:02.128+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn21","msg":"Slow query","attr":{"type":"command","ns":"prod.test","appName":"mongosh 2.2.5","command":{"aggregate":"test","pipeline":[{"$match":{"num":24,"red":{"$in":[{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567}]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],"allowDiskUse":true,"cursor":{},"lsid":{"id":{"$uuid":"8fccbac3-db39-4a59-93db-486dd7cdb1b0"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1734936679,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"primaryPreferred"},"$db":"prod"},"planSummary":"IXSCAN { num: 1, red: 1, green: 1, blue: 1 }","planningTimeMicros":354,"keysExamined":0,"docsExamined":0,"hasSortStage":true,"cursorExhausted":true,"numYields":0,"nreturned":0,"queryHash":"9CBC6A40","planCacheKey":"A0F72FAE","queryFramework":"classic","reslen":227,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":2}},"Global":{"acquireCount":{"r":2}},"Mutex":{"acquireCount":{"r":2}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"remote":"127.0.0.1:58730","protocol":"op_msg","durationMillis":0}}
7.0
{"t":{"$date":"2024-12-23T06:56:01.516+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn32","msg":"Slow query","attr":{"type":"command","ns":"prod.test","appName":"mongosh 2.2.5","command":{"aggregate":"test","pipeline":[{"$match":{"num":24,"red":{"$in":[{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567}]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],"allowDiskUse":true,"cursor":{},"lsid":{"id":{"$uuid":"a734f5b6-9dd4-44ec-ba89-5fd6a85f57c1"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1734936922,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"primaryPreferred"},"$db":"prod"},"planSummary":"IXSCAN { num: 1, red: 1, green: 1, blue: 1 }","planningTimeMicros":1407,"keysExamined":0,"docsExamined":0,"hasSortStage":true,"cursorExhausted":true,"numYields":0,"nreturned":0,"queryHash":"44AD4417","planCacheKey":"917F67C1","queryFramework":"sbe","reslen":227,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":2}},"Global":{"acquireCount":{"r":2}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"cpuNanos":1815350,"remote":"127.0.0.1:26106","protocol":"op_msg","durationMillis":2}}
8.0
{"t":{"$date":"2024-12-23T07:09:19.255+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn20","msg":"Slow query","attr":{"type":"command","isFromUserConnection":true,"ns":"prod.test","collectionType":"normal","appName":"mongosh 2.2.5","queryShapeHash":"52A4EBE4B3770E13CB82532CF55BDB3CAAA5A0AE00D6011A8003B4FF56E36E71","command":{"aggregate":"test","pipeline":[{"$match":{"num":24,"red":{"$in":[{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":-1234567},{"a1":1234567,"a2":1234567},{"a1":1234567,"a2":-1234567},{"a1":-1234567,"a2":1234567},{"a1":-1234567,"a2":-1234567}]}}},{"$group":{"_id":{"num":"$num","red":"$red","green":"$green","blue":"$blue"},"count":{"$sum":1}}},{"$group":{"_id":{"blue":"$_id.blue","green":"$_id.green"},"count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":3}],"allowDiskUse":true,"cursor":{},"lsid":{"id":{"$uuid":"74110071-ba35-47b5-abae-6511fb32dce1"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1734937723,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"primaryPreferred"},"$db":"prod"},"planSummary":"IXSCAN { num: 1, red: 1, green: 1, blue: 1 }","planningTimeMicros":956,"keysExamined":0,"docsExamined":0,"hasSortStage":true,"cursorExhausted":true,"numYields":0,"nreturned":0,"planCacheShapeHash":"432E6D6C","planCacheKey":"F047721F","queryFramework":"sbe","reslen":227,"locks":{"Global":{"acquireCount":{"r":2}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"cpuNanos":1703772,"remote":"127.0.0.1:38936","protocol":"op_msg","queues":{"ingress":{"admissions":1},"execution":{"admissions":3}},"workingMillis":1,"durationMillis":1}}
MongoDB 8.0 fixed this issue with improved query shape hashing and plan cache management. If you're considering an upgrade, check out our MongoDB 8.0 migration guide.
Q14: What is the resolution for this issue?
Checklist:
✅ Applied MongoDB support recommendations:
We tested two solutions:
1. Forced the classic query engine:
db.adminCommand({ setParameter: 1, internalQueryFrameworkControl: "forceClassicEngine" });
This server parameter forces MongoDB to use the classic query engine instead of SBE, bypassing the bug entirely.
2. Tested an upgrade to MongoDB 8.0
Outcome: Both solutions worked. Memory usage stabilized with forceClassicEngine, and the issue was fully resolved in MongoDB 8.0. For production deployments, we recommend MongoDB managed services support during such critical changes.
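After switching the framework, the active value can be confirmed from mongosh; the same parameter can also be persisted under setParameter in mongod.conf so it survives restarts. A minimal check:
// Confirm which query framework is currently in force
db.adminCommand({ getParameter: 1, internalQueryFrameworkControl: 1 })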
Through persistent investigation and collaboration with MongoDB support, we identified and resolved the root cause of high memory usage in our MongoDB cluster. The culprit was excessive query plan generation due to a bug in the SBE engine in MongoDB 7.0.
Key Takeaways:
- Analyze diagnostic logs thoroughly: Use tools like heapProfilingEnabled to uncover hidden memory issues.
- Monitor query shapes and cache behavior: Excessive plan generation can lead to memory bloat.
- Stay updated on MongoDB versions: Upgrade to leverage fixes and improvements.
Need Help with MongoDB Issues?
Facing memory issues, performance problems, or planning a MongoDB upgrade? Contact Mydbops for expert MongoDB consulting and managed database services.
