Commit e8de717

Add support for moving URLs and setting canonical URL correctly (#4115)
1 parent c61dd55 commit e8de717

4 files changed: 65 additions, 1 deletion

app/_plugins/generators/canonical_url_generator.rb

Lines changed: 30 additions & 1 deletion
@@ -1,9 +1,15 @@
 # frozen_string_literal: true

+require 'yaml'
 module CanonicalUrl
   class Generator < Jekyll::Generator # rubocop:disable Metrics/ClassLength
     priority :low
     def generate(site)
+      # We need to keep track of renamed pages in some cases
+      # Usually when we rework the IA between major releases but want to keep
+      # the SEO juice for a given page
+      @moved_pages = YAML.load_file("#{__dir__}/../../moved_urls.yml")
+
       # Generate the all_pages entres for the Plugin Hub
       all_pages = generate_plugin_hub(site)

@@ -108,6 +114,11 @@ def set_canonical_and_noindex(site, all_pages) # rubocop:disable Metrics/AbcSize
         url = page.url.gsub(url_segments[1], 'VERSION')
         urls_to_check << url

+        # If a page has been renamed between versions, then we need to check
+        # for the new URL in later versions too
+        moved_url = resolve_moved_url(url)
+        urls_to_check << moved_url if moved_url
+
         # As before, legacy endpoints might match newer /gateway/ URLs so
         # we also need to check for the path under the /gateway/ docs too
         legacy_gateway_endpoints = ['/gateway-oss/', '/enterprise/']
@@ -117,11 +128,18 @@ def set_canonical_and_noindex(site, all_pages) # rubocop:disable Metrics/AbcSize

         # There will usually only be one URL to check, but gateway-oss
         # and enterprise URLs will contain two here, so we have to loop
+        latest_version = to_version('0.0.x')
+        canonical_url = nil
+
         urls_to_check.each do |u|
           # Otherwise look up the URL and link to the latest version
           matching_url = all_pages[u]
-          page.data['canonical_url'] = matching_url['url'] if matching_url
+          next unless matching_url && matching_url['version'] > latest_version
+
+          latest_version = matching_url['version']
+          canonical_url = matching_url['url']
         end
+        page.data['canonical_url'] = canonical_url if canonical_url

         # If a page has a canonical URL and is not the /latest/ page,
         # we don't want it in the sitemap or indexable by Google
@@ -138,6 +156,17 @@ def set_canonical_and_noindex(site, all_pages) # rubocop:disable Metrics/AbcSize
       end
     end

+    def resolve_moved_url(url)
+      resolved_url = nil
+      loop do
+        url = @moved_pages[url]
+        break unless url
+
+        resolved_url = url
+      end
+      resolved_url
+    end
+
     def versioned_page?(url_segments)
       /^\d+\.\d+\.x$/.match(url_segments[1]) || url_segments[1] == 'latest'
     end
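
The `resolve_moved_url` helper added above walks the mapping until it reaches a URL that has no further entry. As written it would not terminate if `moved_urls.yml` ever contained a cycle (for example, A mapped to B and B mapped back to A). The following is a minimal standalone sketch of the same walk with a cycle guard. The `seen` set, the extra `moved_pages` parameter, and the `MOVED_CHAIN` data are illustrative assumptions, not part of this commit.

```ruby
require 'set'

# Hypothetical stand-in for the parsed moved_urls.yml (not the real file contents).
MOVED_CHAIN = {
  '/gateway-oss/VERSION/configuration/' => '/foo/bar/',
  '/foo/bar/' => '/gateway/VERSION/reference/configuration/'
}.freeze

# Follow a chain of moves to its final destination.
# Returns nil when the URL was never moved, like the method in the diff.
# The `seen` set is an addition: it stops the walk if the mapping loops back.
def resolve_moved_url(url, moved_pages)
  resolved_url = nil
  seen = Set.new([url])
  loop do
    url = moved_pages[url]
    break if url.nil? || seen.include?(url)

    seen << url
    resolved_url = url
  end
  resolved_url
end
```

Calling `resolve_moved_url('/gateway-oss/VERSION/configuration/', MOVED_CHAIN)` follows both hops and returns the final `/gateway/VERSION/reference/configuration/` URL, which matches the "no canonical chain" behaviour described in `docs/seo.md`.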

app/moved_urls.yml

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+---
+/gateway-oss/VERSION/configuration/: "/gateway/VERSION/reference/configuration/"
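
To illustrate how an entry in this file is matched against a concrete page URL, here is a small sketch that parses the same YAML from a string and looks up a versioned URL after swapping its version segment for the `VERSION` placeholder. The `to_placeholder` helper and its regular expression are assumptions for illustration. The generator itself derives the placeholder from `url_segments[1]`.

```ruby
require 'yaml'

# The same mapping as app/moved_urls.yml, inlined so the sketch is self-contained.
MOVED_URLS_YAML = <<~YML
  ---
  /gateway-oss/VERSION/configuration/: "/gateway/VERSION/reference/configuration/"
YML

# Parses to a plain Hash of old URL => new URL.
MOVED_PAGES = YAML.safe_load(MOVED_URLS_YAML).freeze

# Hypothetical helper: replace the first version-like path segment
# (e.g. 2.5.x or latest) with the VERSION placeholder used as the lookup key.
def to_placeholder(url)
  url.sub(%r{/(?:\d+\.\d+\.x|latest)/}, '/VERSION/')
end

# Look up where a concrete page URL moved to, or nil if it has no entry.
def moved_target(url, moved_pages)
  moved_pages[to_placeholder(url)]
end
```

With this data, `moved_target('/gateway-oss/2.5.x/configuration/', MOVED_PAGES)` yields the `/gateway/VERSION/reference/configuration/` entry, while a URL with no mapping yields `nil`.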

docs/seo.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+# SEO
+
+The canonical URL for each page is automatically set by the `canonical_url_generator` plugin. It works by looking for a URL that matches the current page, but where the version is higher than the current URL. As an example:
+
+1. `/gateway/2.3.x/configuration` would have a canonical URL of `/gateway/2.5.x/configuration`. `2.5.x` is the last version in which this URL existed.
+
+2. In `2.6.x` the page moved to `/gateway/2.6.x/reference/configuration`, which means the canonical URL would normally be `/gateway/2.8.x/reference/configuration`.
+
+3. _However_, we have a special `/latest/` URL which is always the latest version, so the canonical URL for the 2.6.x link above would actually be `/gateway/latest/reference/configuration`.
+
+## Tracking file renames
+
+It is possible to track pages through renames using the `moved_urls.yml` file. This is a key:value file that contains the old URL and the URL that it should be mapped to. e.g.
+
+```yaml
+---
+/gateway-oss/VERSION/configuration/: "/gateway/VERSION/reference/configuration/" # 2.5.x
+```
+
+It is possible for a URL to be forwarded multiple times. In this instance, the final URL will be set as canonical on all pages rather than creating a canonical chain:
+
+```yaml
+---
+/gateway-oss/VERSION/configuration/: "/foo/bar/" # 2.5.x
+/foo/bar/: "/gateway/VERSION/reference/configuration/" # 4.2.x
+```
+
+Each line ends with a comment. This is the latest version that uses the source URL. This is useful to keep track of, as it allows us to remove entries from `moved_urls.yml` when archiving old content. In this example, when `2.5.x` is archived we can remove the `/gateway-oss/VERSION/configuration/` canonical redirect.
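
The selection rule described in `docs/seo.md` (use the candidate that exists in the highest version) corresponds to the reworked loop in `set_canonical_and_noindex`. Below is a standalone sketch of that loop. The `to_version` implementation based on `Gem::Version`, the `pick_canonical` name, and the `ALL_PAGES` data are assumptions for illustration. The plugin's own `to_version` helper is not shown in this commit.

```ruby
# Assumed stand-in for the plugin's to_version helper:
# treat the trailing ".x" as ".0" so versions compare numerically.
def to_version(str)
  Gem::Version.new(str.sub(/\.x\z/, '.0'))
end

# Hypothetical all_pages index: placeholder URL => newest page that matches it.
ALL_PAGES = {
  '/gateway-oss/VERSION/configuration/' => {
    'url' => '/gateway-oss/2.5.x/configuration/',
    'version' => to_version('2.5.x')
  },
  '/gateway/VERSION/reference/configuration/' => {
    'url' => '/gateway/latest/reference/configuration/',
    'version' => to_version('2.8.x')
  }
}.freeze

# Return the URL of the candidate with the highest version, or nil if none match.
def pick_canonical(urls_to_check, all_pages)
  latest_version = to_version('0.0.x')
  canonical_url = nil
  urls_to_check.each do |u|
    matching_url = all_pages[u]
    next unless matching_url && matching_url['version'] > latest_version

    latest_version = matching_url['version']
    canonical_url = matching_url['url']
  end
  canonical_url
end
```

Because the comparison keeps the highest version seen so far, the result does not depend on the order of `urls_to_check`. The line removed in this commit instead let the last matching candidate win.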

tests/seo.test.js

Lines changed: 5 additions & 0 deletions
@@ -40,6 +40,11 @@ test.describe("Canonical links", () => {
       src: "/hub/",
       href: "/hub/",
     },
+    {
+      title: "page using moved_urls.yml to track renamed files",
+      src: "/gateway-oss/2.5.x/configuration/",
+      href: "/gateway/latest/reference/configuration/",
+    },
   ].forEach((t) => {
     test(t.title, async ({ page }) => {
       await page.goto(t.src);
