Troubleshooting Elasticsearch Upgrades #6396
Docs review summary
Focus areas
- Style and clarity: Typos found in two files (`earlier ersion` in `elasticsearch.md`, `due dilligence` in the new file); grammar issue in `discovery-troubleshooting.md`. "Kindly" should be dropped per style guide.
- Jargon: No unexplained jargon introduced.
- Frontmatter and applies_to: `applies_to: stack:` is missing a lifecycle value (e.g., `ga`) in the new file; this will likely fail validation or render incorrectly.
- Content type fit: The new page is declared `type: troubleshooting`, which is appropriate. However, the required Symptoms and Resolution sections are entirely unfilled template placeholders. The Resolution section still contains the literal template stepper block. The page is linked from the upgrade guide, meaning users who follow that tip will land on an incomplete page.
- Parent issue satisfaction: This PR is a follow-up to #5848. The page structure is in place but the core content (Symptoms, Resolution) is not yet written, so the issue is partially satisfied.
Notes
- The empty `##` heading on line 18 of `troubleshooting-upgrades.md` will likely cause a build or rendering failure; this is the highest-priority fix.
- Vale found no findings (eligible-files list was empty in the pre-fetch), so no Vale-sourced nits to report.
Generated by Docs review agent for issue #6396 · ● 325.5K
| description: "Common upgrade issues and resolutions." | ||
| type: troubleshooting | ||
| applies_to: | ||
| stack: |
`applies_to: stack:` is missing a lifecycle value. Refer to the cumulative-docs reference for valid values (e.g., `ga`, `beta`, `coming`). Without a value this will likely render incorrectly or fail validation.
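A rough way to catch this before review is sketched below. It is not the docs build's own validator: the function name `missing_lifecycle_keys` is hypothetical, and the check is a plain-text heuristic that only looks at keys nested one level under `applies_to:`. Deeper nesting would need a real YAML parser.

```python
def missing_lifecycle_keys(frontmatter: str) -> list[str]:
    """Return keys directly under `applies_to:` that have no value.

    Assumption: `frontmatter` is the text between the `---` markers, and
    `applies_to` holds simple `key: value` pairs one level deep.
    """
    missing = []
    inside_applies_to = False
    for line in frontmatter.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and YAML comments
        if line[0] not in " \t":
            # A top-level key starts or ends the applies_to block.
            inside_applies_to = stripped == "applies_to:"
            continue
        if inside_applies_to:
            key, _, value = stripped.partition(":")
            if not value.strip():
                missing.append(key)
    return missing


flagged = 'description: "Common upgrade issues and resolutions."\ntype: troubleshooting\napplies_to:\n  stack:\n'
print(missing_lifecycle_keys(flagged))
```

Run against the frontmatter quoted in this thread, the sketch reports `stack` as the key without a lifecycle value; once a value such as `ga` is added, the list comes back empty.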
|
|
||
| # Troubleshoot upgrades [troubleshooting-upgrades] | ||
|
|
||
| Usually, [{{es}} upgrades](/deploy-manage/upgrade/deployment-or-cluster/elasticsearch.md) proceed smoothly due to [planning](/deploy-manage/upgrade/plan-upgrade.md) and [preparation](/deploy-manage/upgrade/prepare-to-upgrade.md) due dilligence. |
Typo: `due dilligence` → `due diligence`.
| Usually, [{{es}} upgrades](/deploy-manage/upgrade/deployment-or-cluster/elasticsearch.md) proceed smoothly due to [planning](/deploy-manage/upgrade/plan-upgrade.md) and [preparation](/deploy-manage/upgrade/prepare-to-upgrade.md) due dilligence. | |
| Usually, [{{es}} upgrades](/deploy-manage/upgrade/deployment-or-cluster/elasticsearch.md) proceed smoothly due to [planning](/deploy-manage/upgrade/plan-upgrade.md) and [preparation](/deploy-manage/upgrade/prepare-to-upgrade.md) due diligence. |
| * _(Recommended)_ **A rolling restart** | ||
|
|
||
| This option allows you to upgrade your cluster one node at a time without interrupting service. Running multiple versions of {{es}} in the same cluster beyond the duration of an upgrade is not supported, as shards cannot be replicated from upgraded nodes to nodes running the old-version. Running more than two versions of {{es}} in the same cluster is not supported. | ||
| This option allows you to upgrade your cluster one node at a time without interrupting service. Running multiple versions of {{es}} in the same cluster beyond the duration of an upgrade is not supported, as shards cannot be replicated from upgraded nodes to nodes running the earlier ersion. Running more than two versions of {{es}} in the same cluster is not supported. |
Typo: `earlier ersion` should be `earlier version`.
| This option allows you to upgrade your cluster one node at a time without interrupting service. Running multiple versions of {{es}} in the same cluster beyond the duration of an upgrade is not supported, as shards cannot be replicated from upgraded nodes to nodes running the earlier ersion. Running more than two versions of {{es}} in the same cluster is not supported. | |
| This option allows you to upgrade your cluster one node at a time without interrupting service. Running multiple versions of {{es}} in the same cluster beyond the duration of an upgrade is not supported, as shards cannot be replicated from upgraded nodes to nodes running the earlier version. Running more than two versions of {{es}} in the same cluster is not supported. |
| Master elections only involve master-eligible nodes, so focus your attention on the master-eligible nodes in this situation. These nodes' logs indicate the requirements for a master election, such as the discovery of a certain set of nodes. The [Health]({{es-apis}}operation/operation-health-report) API on these nodes also provides useful information about the situation. | ||
| If there is no elected master node and no node can win an election, all nodes repeatedly log messages about the problem using a [logger](/deploy-manage/monitor/logging-configuration.md) called `org.elasticsearch.cluster.coordination.ClusterFormationFailureHelper`. By default, this happens every 10 seconds. | ||
|
|
||
| During this time the {{es}} will induce `MasterNotDiscoveredException` errors and which its API will report like: |
Grammar issue: `the {{es}} will induce ... errors and which its API will report like:` has two problems: the article "the" before `{{es}}` and the spurious "and which".
Suggested rewrite:
| During this time the {{es}} will induce `MasterNotDiscoveredException` errors and which its API will report like: | |
| During this time, {{es}} returns `MasterNotDiscoveredException` errors. Its API reports: |
| This guide outlines {{es}} logs which indicate either upgrade blocking issues or fatal node start-up errors. | ||
|
|
||
|
|
||
| ## |
This H2 heading is empty (`##` with no title). It should either be given a title (e.g., `## Monitor upgrade progress`) or removed. An untitled heading will also likely fail docs build validation.
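Untitled headings like this one are easy to miss in a long diff. A minimal sketch of a local check follows; it assumes ATX-style (`#`) headings, skips fenced code blocks, and is not part of any Elastic tooling. The name `find_empty_headings` is hypothetical.

```python
import re

# A heading marker (one to six '#') followed by nothing but whitespace.
EMPTY_HEADING = re.compile(r"^#{1,6}[ \t]*$")


def find_empty_headings(markdown: str) -> list[int]:
    """Return 1-based line numbers of headings that have no title."""
    hits = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        # Toggle on fence lines so '#' comments inside code are ignored.
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence
            continue
        if not in_fence and EMPTY_HEADING.match(line):
            hits.append(number)
    return hits


sample = "# Troubleshoot upgrades\n\n##\n\nSome text.\n\n## Symptoms\n"
print(find_empty_headings(sample))
```

On the sample above, only the bare `##` on line 3 is reported; `## Symptoms` has a title and is left alone.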
| {{es}} maintains the data in the data paths of the older nodes and will recover the cluster to health using this data after the nodes are fully upgraded. Therefore, to bring these nodes back into the cluster, upgrade them. | ||
|
|
||
| :::{note} :applies_to: { ece:, ess: } | ||
| Usually you can "Reapply" your latest [Deployment activity](/deploy-manage/deploy/elastic-cloud/keep-track-of-deployment-activity.md) {{es}} upgrade to finish upgrading. If the node out of cluster causes [Cluster health](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-health) status of `red`, then plans will be blocked for data safety. If this is the case, kindly [contact us](/troubleshoot/index.md#contact-us) with {{ech}} deployment ID or [{{ece}} diagnostic](/troubleshoot/deployments/cloud-enterprise/run-ece-diagnostics-tool.md) flagged `--deployments` for problematic deployment. |
Avoid "kindly" — the Elastic style guide treats it the same as "please", which should be omitted unless asking users to wait or tolerate inconvenience.
| Usually you can "Reapply" your latest [Deployment activity](/deploy-manage/deploy/elastic-cloud/keep-track-of-deployment-activity.md) {{es}} upgrade to finish upgrading. If the node out of cluster causes [Cluster health](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-health) status of `red`, then plans will be blocked for data safety. If this is the case, kindly [contact us](/troubleshoot/index.md#contact-us) with {{ech}} deployment ID or [{{ece}} diagnostic](/troubleshoot/deployments/cloud-enterprise/run-ece-diagnostics-tool.md) flagged `--deployments` for problematic deployment. | |
| Usually you can "Reapply" your latest [Deployment activity](/deploy-manage/deploy/elastic-cloud/keep-track-of-deployment-activity.md) {{es}} upgrade to finish upgrading. If the node out of cluster causes [Cluster health](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-health) status of `red`, then plans will be blocked for data safety. If this is the case, [contact us](/troubleshoot/index.md#contact-us) with {{ech}} deployment ID or [{{ece}} diagnostic](/troubleshoot/deployments/cloud-enterprise/run-ece-diagnostics-tool.md) flagged `--deployments` for problematic deployment. |
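For context on the `red` condition in this note: the cluster health API response carries a `status` field with the value `green`, `yellow`, or `red`. The sketch below only inspects an already-fetched response body. The helper name `health_is_red` is hypothetical, and how the platform actually decides to block a plan is not shown in this PR.

```python
import json


def health_is_red(health_response: str) -> bool:
    """Return True when a cluster health response body reports status 'red'.

    Assumption: `health_response` is the raw JSON body returned by the
    cluster health API, fetched separately by the caller.
    """
    status = json.loads(health_response).get("status")
    return status == "red"


body = '{"cluster_name": "example", "status": "red", "number_of_nodes": 2}'
print(health_is_red(body))
```

A `True` result matches the case the note describes, where the next step is to contact support rather than reapply the plan.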
| Avoid linking to GitHub issues, pull requests, or internal discussions. Resources should be stable, user-facing documentation. | ||
| --> | ||
|
|
||
| - [Related documentation link] |
Placeholder links should be filled in with real targets or removed before publishing:
- `[Related documentation link]`
- `[Contrib/upstream reference]`
| In a testing or development environment with only one or two master-eligible nodes, you cannot avoid stopping half or more of the master-eligible nodes, so the cluster will always become unavailable at some point during the upgrade. When you restart the master-eligible nodes after this unavailability, the cluster will re-form with a single upgraded node, which is therefore fully-upgraded and will reject older nodes' attempts to re-join the cluster. Upgrade the master-eligible nodes last to avoid these rejections. | ||
|
|
||
|
|
||
| ## Symptoms |
The required Symptoms and Resolution sections (and the optional Diagnosis, Best practices, Resources sections) contain only template placeholder comments. The Resolution section still has the literal stepper code block from the template. These need to be filled in before the page goes live: the page is currently non-functional for users who land on it from the link added in `deploy-manage/upgrade/deployment-or-cluster/elasticsearch.md`.
Summary
Follow-up of #5848 to create separate troubleshooting page.
Generative AI disclosure