Troubleshooting GCP Cloud SQL Restarts: Incident Response Guide

Mydbops
Jun 15, 2026

When a GCP Cloud SQL instance restarts, the first question is usually simple:

Was this expected, or did something go wrong?

The answer determines the entire direction of your investigation. A planned maintenance event might require nothing more than internal documentation, while an unexpected crash could indicate an underlying infrastructure or memory issue that needs immediate attention.

If you come from self-managed MySQL environments, you are likely used to SSHing into a server and tailing a log file. GCP Cloud SQL takes that ability away. Instead, you rely entirely on what Cloud Logging exposes. The catch is that not all critical events end up in the log stream you would naturally check first.

The primary challenge is that most restart events look identical from the application side. Whether the restart was caused by routine maintenance, a manual user action, an out-of-memory (OOM) condition, or a MySQL internal failure, the immediate symptom is always the same: connections drop, the instance becomes temporarily unavailable, and eventually, it comes back online.

The evidence is usually there; the problem is often that the search is pointed at the wrong place. Querying the wrong log stream can easily return no results, even when the actual cause has been recorded elsewhere in the system.

In the sections below, we will walk through a structured approach to determining whether a Cloud SQL restart was planned or unplanned, exactly where to look for the evidence, and how to identify the underlying cause as quickly as possible.

The Incident That Inspired This Investigation

When a client's instance started restarting intermittently, the obvious first stop was mysql.err, because that is where any database administrator expects a crash to show up.

Attempt 1 - MySQL error log, pinned to one stream

resource.type="cloudsql_database"
resource.labels.database_id="PROJECT_ID:INSTANCE_NAME"
log_name="projects/PROJECT_ID/logs/cloudsql.googleapis.com%2Fmysql.err"

Result: No entries returned.

The log_name filter locks the query to a single stream. That stream only contains messages written directly by the MySQL engine itself. If MySQL was terminated by the host operating system before it could flush its logs, there is nothing here to find. The query is perfectly valid; the stream simply does not contain the event.

Rather than assuming nothing happened, the search was widened. The log_name filter was removed entirely and replaced with basic keyword terms. There was no stream pinning, just text matching across every log stream attached to that specific instance.

Attempt 2 - No log_name filter, keyword search across all streams

resource.type="cloudsql_database"
resource.labels.database_id="PROJECT_ID:INSTANCE_NAME"
"OOM" OR "Out of memory" OR "OOM-killer" OR "Killed process"
OR "killed database process" OR "SIGKILL" OR "fatal signal"

Result: INFO 2026-06-02T06:29:25.253260Z Out of memory: killed database process: mysqld

This message was written by the host kernel's OOM killer, not by MySQL. When memory pressure forced the OS to free resources, it terminated mysqld and logged that action to a system-level stream. That stream only shows up when you search without a log_name filter. MySQL never got a chance to write anything because it was gone before it could.

The two queries differ in exactly one way: the log_name filter. The same instance and the same timeframe produced a completely different outcome just by changing the scope.
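The difference in scope can be sketched in a few lines of Python. This is only an illustration: the helper names (`base_filter`, `pinned_filter`, `broad_filter`) and the `my-project` / `my-instance` values are hypothetical, and the functions do nothing more than assemble the two filter strings shown above.

```python
# Search terms taken from Attempt 2.
OOM_TERMS = [
    "OOM", "Out of memory", "OOM-killer", "Killed process",
    "killed database process", "SIGKILL", "fatal signal",
]


def base_filter(project_id: str, instance: str) -> str:
    """Clauses shared by both attempts: resource type and instance id."""
    return (
        'resource.type="cloudsql_database"\n'
        f'resource.labels.database_id="{project_id}:{instance}"'
    )


def pinned_filter(project_id: str, instance: str) -> str:
    """Attempt 1: the query is locked to the mysql.err stream."""
    log_name = f"projects/{project_id}/logs/cloudsql.googleapis.com%2Fmysql.err"
    return f'{base_filter(project_id, instance)}\nlog_name="{log_name}"'


def broad_filter(project_id: str, instance: str, terms=OOM_TERMS) -> str:
    """Attempt 2: no log_name clause, only keyword matching."""
    keywords = " OR ".join(f'"{term}"' for term in terms)
    return f"{base_filter(project_id, instance)}\n({keywords})"


if __name__ == "__main__":
    print(pinned_filter("my-project", "my-instance"))
    print()
    print(broad_filter("my-project", "my-instance"))
```

Keeping the shared clauses in one helper makes it obvious that the only thing that changes between the two attempts is the presence of the log_name line.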

The Two Categories of Shutdown

Every Cloud SQL shutdown falls into one of two buckets. The investigation approach for each is fundamentally different.

Planned Shutdowns

A planned shutdown is any restart that results from a deliberate or scheduled action: GCP maintenance, a manual restart, or an HA failover that someone initiated. The goal here is confirmation and documentation, not a technical investigation.

Unplanned Shutdowns

Unplanned shutdowns occur when MySQL or the underlying infrastructure encounters a problem that forces the instance to stop unexpectedly. These events require deeper troubleshooting because the restart is a symptom of a larger issue.

Planned
  • GCP maintenance window
  • Manual restart from Console or CLI
  • Instance configuration change requiring restart
  • Scheduled failover (HA instances)
  • Storage auto-resize restart
  • Software/patch update by GCP

Unplanned
  • OOM kill (kernel kills mysqld)
  • MySQL internal crash/assertion failure
  • Underlying host failure
  • Disk full causing hard stop
  • Corruption or InnoDB-level failure
  • Connection storm/resource exhaustion

From the application side, both categories look identical. The difference lies in what they leave behind in the logs.

Know Your Log Streams First

Cloud SQL records different events in different streams. Before you start querying Google Cloud Logging, it helps to know which stream captures what.

While mysql.err is often useful, it only captures events generated by the MySQL engine itself. Different restart causes are recorded in different log streams. Understanding which stream captures which event type significantly reduces investigation time.

The following log streams are the primary sources of evidence during Cloud SQL restart investigations:

Log Stream | What It Records | Captures
cloudsql.googleapis.com/mysql.err | MySQL engine messages: startup, shutdown, InnoDB, replication, errors | Clean shutdowns, crash recovery, MySQL-layer failures
cloudaudit.googleapis.com/system_event | GCP platform actions: maintenance, failover, auto-restart | All GCP-managed planned operations
cloudaudit.googleapis.com/activity | User/API-initiated actions: restarts, config changes, deletions | Manual actions via Console, CLI, or API
(No log_name filter, full-text search) | All streams, including OS/kernel-level messages | OOM kills, host-level events not written by MySQL
cloudsql.googleapis.com/mysql-slow.log | Slow queries (if enabled) | Useful for correlating workload spikes before a crash
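The log_name values used in the queries in this guide are the stream names above with the slash percent-encoded (%2F) and prefixed with projects/PROJECT/logs/. A minimal sketch of that mapping, assuming hypothetical short keys (`mysql_error`, `system_event`, and so on) chosen only for this example:

```python
from urllib.parse import quote

# Stream names from the table above; the dictionary keys are
# illustrative labels, not anything defined by Cloud Logging.
STREAMS = {
    "mysql_error": "cloudsql.googleapis.com/mysql.err",
    "system_event": "cloudaudit.googleapis.com/system_event",
    "activity": "cloudaudit.googleapis.com/activity",
    "slow_query": "cloudsql.googleapis.com/mysql-slow.log",
}


def log_name(project_id: str, stream_key: str) -> str:
    """Build the full log_name value, percent-encoding the slash."""
    log_id = quote(STREAMS[stream_key], safe="")
    return f"projects/{project_id}/logs/{log_id}"


if __name__ == "__main__":
    for key in STREAMS:
        print(log_name("my-project", key))
```

There is deliberately no entry for the broad search: that query works precisely because it carries no log_name clause at all.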

Step-by-Step Investigation Workflow

The workflow below is ordered to eliminate planned restart scenarios first before investigating potential failures. In many cases, you will find the answer within the first few steps.

Restart triage flow: (1) GCP event, (2) manual restart, (3) clean exit, (4) crash recovery, (5) broad search.

Step 1: Check for a GCP-managed event

Rules out planned platform operations.

Review the system_event logs for maintenance operations, failovers, and platform-initiated restarts.

resource.type="cloudsql_database"
resource.labels.database_id="PROJECT:INSTANCE"
log_name="projects/PROJECT/logs/cloudaudit.googleapis.com%2Fsystem_event"

Look for: Maintenance started, Maintenance completed, DatabaseInstanceRestart, Failover.
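If you prefer the command line to the Logs Explorer, the same filter can be passed to gcloud logging read. The sketch below only assembles the command; the function names, the `my-project` / `my-instance` values, and the default freshness and limit are assumptions made for this example, so adjust them to your environment before running anything.

```python
import shlex


def system_event_filter(project_id: str, instance: str) -> str:
    """Single-line version of the Step 1 filter, joined with explicit ANDs."""
    return " AND ".join([
        'resource.type="cloudsql_database"',
        f'resource.labels.database_id="{project_id}:{instance}"',
        f'log_name="projects/{project_id}/logs/'
        'cloudaudit.googleapis.com%2Fsystem_event"',
    ])


def gcloud_read_command(project_id: str, log_filter: str,
                        freshness: str = "2d", limit: int = 100) -> list:
    """Return the argv list for `gcloud logging read` without executing it."""
    return [
        "gcloud", "logging", "read", log_filter,
        f"--project={project_id}",
        f"--freshness={freshness}",  # how far back to search
        f"--limit={limit}",          # cap on returned entries
        "--format=json",
    ]


if __name__ == "__main__":
    cmd = gcloud_read_command(
        "my-project", system_event_filter("my-project", "my-instance"))
    # Print a shell-safe command line to copy, or pass cmd to subprocess.run.
    print(shlex.join(cmd))
```

Building the argument list first, rather than a shell string, avoids quoting problems with the double quotes inside the filter.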

Step 2: Check for a manual restart via Console or CLI

Identifies human-initiated planned actions.

If someone on your team restarted the instance manually, it shows up in the activity audit log, not the system event log.

resource.type="cloudsql_database"
resource.labels.database_id="PROJECT:INSTANCE"
log_name="projects/PROJECT/logs/cloudaudit.googleapis.com%2Factivity"

Look for: cloudsql.instances.restart, cloudsql.instances.update, the principal email that triggered it, and the exact timestamp.
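When the activity log returns several entries, it can help to reduce each one to the three facts that matter here: when, what, and who. The sketch below assumes entries exported as JSON (for example from the Logs Explorer or gcloud with JSON output), where audit details sit under protoPayload. The function name and the sample entry are hypothetical.

```python
def summarize_audit_entry(entry: dict) -> dict:
    """Pull timestamp, method, and caller out of one audit log entry."""
    payload = entry.get("protoPayload", {})
    auth = payload.get("authenticationInfo", {})
    return {
        "timestamp": entry.get("timestamp"),
        "method": payload.get("methodName"),
        "principal": auth.get("principalEmail"),
    }


if __name__ == "__main__":
    # Hypothetical entry, trimmed to the fields used above.
    sample = {
        "timestamp": "2026-06-02T06:10:00Z",
        "protoPayload": {
            "methodName": "cloudsql.instances.restart",
            "authenticationInfo": {"principalEmail": "dba@example.com"},
        },
    }
    print(summarize_audit_entry(sample))
```

Using .get() throughout means an entry with missing fields yields None values instead of raising, which is convenient when scanning mixed results.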

Step 3: Check the MySQL error log for a clean shutdown message

Confirms a graceful exit regardless of who triggered it.

Whether the restart came from GCP maintenance or a manual action, MySQL should have written a clean shutdown message to mysql.err if it exited properly.

resource.type="cloudsql_database"
resource.labels.database_id="PROJECT:INSTANCE"
log_name="projects/PROJECT/logs/cloudsql.googleapis.com%2Fmysql.err"
"Shutdown complete" OR "Received SIGTERM" OR "Normal shutdown"

If one of these messages is present, mysqld shut down gracefully rather than being killed.

Step 4: Check for InnoDB crash recovery on the next startup

Catches unclean shutdown indicators and MySQL-level InnoDB crash causes.

Crash recovery indicates that the previous shutdown was not clean. If no planned event is found in Steps 1–3, this confirms an unplanned restart.

resource.type="cloudsql_database"
resource.labels.database_id="PROJECT:INSTANCE"
log_name="projects/PROJECT/logs/cloudsql.googleapis.com%2Fmysql.err"
("Starting crash recovery" OR "Log scan progressed past the checkpoint"
OR "InnoDB: Mutexes and rw_locks use" OR "crash recovery" OR "Assertion failure"
OR "mysqld got signal" OR "InnoDB: Corruption"
OR "Table is marked as crashed" OR "error writing")

If internal failure signatures are found here, they usually point directly to the root cause (e.g., storage, corruption, or engine failure).
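Once mysql.err entries have been exported, the messages from Steps 3 and 4 can be sorted into rough buckets. The sketch below uses a subset of the search terms from those steps. The three labels, the grouping of terms, and the order in which they are checked are assumptions made for this illustration, not something Cloud SQL defines.

```python
# Subset of the Step 3 and Step 4 search terms, grouped for illustration.
CLEAN_SHUTDOWN = ("Shutdown complete", "Received SIGTERM", "Normal shutdown")
CRASH_RECOVERY = (
    "Starting crash recovery",
    "Log scan progressed past the checkpoint",
    "crash recovery",
)
ENGINE_FAILURE = (
    "Assertion failure",
    "mysqld got signal",
    "InnoDB: Corruption",
    "Table is marked as crashed",
    "error writing",
)

# Checked in this order so that the most specific failure wins.
_RULES = (
    ("engine_failure", ENGINE_FAILURE),
    ("crash_recovery", CRASH_RECOVERY),
    ("clean_shutdown", CLEAN_SHUTDOWN),
)


def classify_message(message: str):
    """Return a bucket label for one log message, or None if nothing matches."""
    text = message.lower()
    for label, needles in _RULES:
        if any(needle.lower() in text for needle in needles):
            return label
    return None


if __name__ == "__main__":
    for line in (
        "[Note] InnoDB: Starting crash recovery.",
        "[Note] mysqld: Shutdown complete",
        "InnoDB: Assertion failure in thread 1",
    ):
        print(classify_message(line), "<-", line)
```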

Step 5: Broad text search - no log_name filter

Catches OOM kills, kernel events, and host-layer messages.

Use a broad search without a log_name filter to identify host-level events.

resource.type="cloudsql_database"
resource.labels.database_id="PROJECT:INSTANCE"
("OOM" OR "Out of memory" OR "OOM-killer" OR "Killed process"
OR "killed database process" OR "SIGKILL" OR "fatal signal"
OR "Aborted" OR "host failure" OR "disk full")

CRITICAL: Do not add log_name to this query. Pinning the log name will cause these events to be missed entirely, as they are generated by the host kernel and will not appear in mysql.err or standard audit logs.

Adjust the time window to the period surrounding the restart. If this surfaces a hit, that message tells you the cause category; then look at metrics to understand the underlying pressure.
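The time window can also be expressed inside the filter itself as a pair of timestamp comparisons. A minimal sketch, assuming a hypothetical helper name and default margins of 30 minutes before and 10 minutes after the restart; pick margins that suit your incident.

```python
from datetime import datetime, timedelta, timezone


def time_window_clause(restart_time: datetime,
                       minutes_before: int = 30,
                       minutes_after: int = 10) -> str:
    """Return a timestamp clause bracketing the restart, in UTC."""
    if restart_time.tzinfo is None:
        raise ValueError("restart_time must be timezone-aware")
    start = restart_time - timedelta(minutes=minutes_before)
    end = restart_time + timedelta(minutes=minutes_after)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start_s = start.astimezone(timezone.utc).strftime(fmt)
    end_s = end.astimezone(timezone.utc).strftime(fmt)
    return f'timestamp>="{start_s}" AND timestamp<="{end_s}"'


if __name__ == "__main__":
    # Restart time taken from the OOM result earlier in this guide.
    restart = datetime(2026, 6, 2, 6, 29, 25, tzinfo=timezone.utc)
    print(time_window_clause(restart))
```

The window is wider before the restart than after it because the pressure that leads to a kill usually builds up beforehand.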

What the Evidence Tells You

Log entries rarely tell the full story in isolation. The table below summarizes the most common evidence patterns and the conclusions you can draw from them.

Evidence Found | Conclusion
System event shows maintenance/failover + clean shutdown in mysql.err | The restart was initiated by Google Cloud as part of a planned operation.
Activity log shows manual restart by a principal | The restart was manually triggered through the Console, CLI, API, or automation.
Crash recovery on restart + OOM in broad text search | The instance experienced an unplanned shutdown caused by memory pressure.
mysql.err shows assertion failure or InnoDB error (with or without crash recovery indicators) | The restart originated from a MySQL-level failure.
Crash recovery + disk full or read-only filesystem error | The restart was associated with storage or filesystem-related issues.
Crash recovery but nothing in broad text search or mysql.err | The shutdown was unplanned, but the root cause is not visible in the available logs and may have occurred at the infrastructure layer.
Nothing in any stream; the instance just came back | The available logs are insufficient to determine the cause. Expand the time range and correlate with Cloud Monitoring metrics.
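The table can be read as a small decision procedure. The sketch below encodes it that way. The `Evidence` fields, the `conclude` function, and the order in which the rows are evaluated are assumptions made for this example; real incidents can match more than one row, so treat the output as a starting hypothesis.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """What the five investigation steps turned up (all default to False)."""
    platform_event: bool = False       # Step 1: maintenance or failover
    manual_restart: bool = False       # Step 2: restart in the activity log
    clean_shutdown: bool = False       # Step 3: clean shutdown in mysql.err
    crash_recovery: bool = False       # Step 4: crash recovery on startup
    engine_failure: bool = False       # Step 4: assertion or InnoDB error
    oom_in_broad_search: bool = False  # Step 5: OOM message
    storage_error: bool = False        # Step 5: disk full or read-only fs


def conclude(e: Evidence) -> str:
    """Map evidence to a conclusion, following the table row by row."""
    if e.platform_event and e.clean_shutdown:
        return "planned: initiated by Google Cloud"
    if e.manual_restart:
        return "planned: manually triggered"
    if e.crash_recovery and e.oom_in_broad_search:
        return "unplanned: memory pressure (OOM)"
    if e.engine_failure:
        return "unplanned: MySQL-level failure"
    if e.crash_recovery and e.storage_error:
        return "unplanned: storage or filesystem issue"
    if e.crash_recovery:
        return "unplanned: cause not visible in logs, possibly infrastructure"
    return "undetermined: expand the time range and check metrics"


if __name__ == "__main__":
    print(conclude(Evidence(crash_recovery=True, oom_in_broad_search=True)))
```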

Final Thoughts

Most database restarts leave enough evidence across audit logs, mysql.err, and broad Cloud Logging searches to give you a clear answer. When logs come up short, Cloud Monitoring metrics usually fill the gap, because memory and connection charts do not depend on log propagation.

While these steps won't cover every single edge case, following this structured workflow will help you find the right answers faster than starting from scratch each time. For more tips on managing database anomalies, explore our other database technical blogs.

Need Expert Help Managing Your Cloud SQL Environment?

Unplanned downtime hurts business. If your team is spending too much time investigating database crashes, optimizing slow queries, or managing infrastructure limits, Mydbops can help. We provide 24/7 proactive monitoring, root cause analysis, and tailored scaling strategies for your GCP Cloud SQL environments.
