René's URL Explorer Experiment
Title: GH-103484: Fix broken links reported by linkcheck by rffontenelle · Pull Request #103608 · python/cpython · GitHub
Description: This is another patch required to fix the current state of make linkcheck in Python Docs, see #103484.
This pull request fixes some broken links reported by make linkcheck. The backport to 3.11 has a few differences; I already have a patch ready for it and am just waiting for any changes to this one.
Below are the reported errors and the solution I applied for each in this PR:
distributing/index.rst:128: [broken] https://packaging.python.org/tutorials/packaging-projects/#creating-the-package-files:
distributing/index.rst:127: [broken] https://packaging.python.org/tutorials/packaging-projects/#packaging-python-projects:
distributing/index.rst:129: [broken] https://packaging.python.org/tutorials/packaging-projects/#uploading-the-distribution-archives:
distributing/index.rst:130: [broken] https://packaging.python.org/specifications/pypirc/:
The links are fine, but for some reason a newline in the doc caused linkcheck to report them as broken, even though they are not broken in the documentation. I removed that newline, which made linkcheck happy.
library/stdtypes.rst:1607: [broken] http://www.unicode.org/versions/Unicode15.0.0/ch03.pdf#G53253: 'utf-8' codec can't decode byte 0xe2 in position 10: invalid continuation byte
library/stdtypes.rst:1767: [broken] https://www.unicode.org/versions/Unicode15.0.0/ch04.pdf#G91002: 'utf-8' codec can't decode byte 0xe2 in position 10: invalid continuation byte
library/stdtypes.rst:1906: [broken] https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf#G34078: 'utf-8' codec can't decode byte 0xe2 in position 10: invalid continuation byte
That's sphinx-doc/sphinx#11041. I removed the anchors and added the section name next to its section number, so the reader has no doubt about which section the text is referring to.
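The error message itself can be reproduced offline. A minimal sketch, assuming the checker decodes the fetched PDF as UTF-8 text while searching for the anchor, and assuming the file opens with a typical PDF header (a `%PDF-1.7` version line followed by a comment of high-bit bytes that marks the file as binary). The header bytes below are illustrative, not copied from the Unicode PDFs:

```python
# Illustrative PDF header: version line, then a binary-marker comment line.
pdf_start = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"


def try_decode(data: bytes) -> str:
    """Return the decoded text, or the UnicodeDecodeError message."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)


message = try_decode(pdf_start)
print(message)
# 'utf-8' codec can't decode byte 0xe2 in position 10: invalid continuation byte
```

The first ten bytes are plain ASCII, so decoding fails at byte 10 (0xe2), whose following byte is not a valid UTF-8 continuation byte; this matches the position and byte in the log above.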
whatsnew/changelog.rst:18176: [broken] https://: Invalid URL 'https://': No host supplied
This comes from the code sample urllib.request.urlopen('https://...') in Misc/NEWS.d/3.9.0a1.rst. I added it to the ignored list as 'https:\/\/$' (the $ keeps it from matching any other link).
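A minimal sketch of why the trailing `$` matters, written as a Sphinx `conf.py` excerpt. It assumes, as in current Sphinx releases, that each `linkcheck_ignore` pattern is applied with `re.match` (anchored at the start of the URI); the `is_ignored` helper is hypothetical and only mimics that check:

```python
import re

# conf.py excerpt (illustrative): regexes for URIs that linkcheck should skip.
linkcheck_ignore = [
    r"https:\/\/$",  # the bare 'https://' from the code sample in the changelog
]


def is_ignored(uri: str) -> bool:
    """Hypothetical helper mimicking how the ignore list is applied."""
    # re.match anchors at the start of the URI; '$' anchors the end,
    # so only the bare scheme is skipped.
    return any(re.match(pattern, uri) for pattern in linkcheck_ignore)


print(is_ignored("https://"))                # True
print(is_ignored("https://www.python.org"))  # False
```

Without the `$`, the pattern would match every HTTPS link and silently disable checking for all of them.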
howto/urllib2.rst:457: [broken] http://www.voidspace.org.uk/python/articles/authentication.shtml: HTTPConnectionPool(host='www.voidspace.org.uk', port=80): Max retries exceeded with url: /python/articles/authentication.shtml (Caused by NewConnectionError(': Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
www.voidspace.org.uk is down, so I replaced the link with a Wayback Machine link. A code sample also used this broken URL; I replaced it with a valid one, http://www.python.org, following the previous example in the same file.
whatsnew/2.6.rst:174: [broken] http://www.upfrontsoftware.co.za: HTTPSConnectionPool(host='www.upfrontsoftware.co.za', port=443): Max retries exceeded with url: / (Caused by SSLError(CertificateError("hostname 'www.upfrontsoftware.co.za' doesn't match either of 'agibase.com', 'gazette.co.za', 'icinga.siyavula.com', 'icinga.upfronthosting.co.za', 'lists.agibase.com', 'test.agibase.com', 'upfrontsoftware.co.za', 'www.agibase.com', 'www.gazette.co.za'")))
Fixed by removing 'www', i.e. replacing the link with https://upfrontsoftware.co.za, which is one of the names the certificate covers.
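The certificate error lists every name the certificate is valid for, so the fix can be checked directly against that list. A minimal sketch using exact string comparison, which is enough here only because none of the listed names is a wildcard (real TLS hostname verification also handles `*.` wildcards):

```python
# Names copied from the CertificateError reported by linkcheck above.
cert_names = {
    "agibase.com",
    "gazette.co.za",
    "icinga.siyavula.com",
    "icinga.upfronthosting.co.za",
    "lists.agibase.com",
    "test.agibase.com",
    "upfrontsoftware.co.za",
    "www.agibase.com",
    "www.gazette.co.za",
}

for host in ("www.upfrontsoftware.co.za", "upfrontsoftware.co.za"):
    status = "covered" if host in cert_names else "NOT covered"
    print(f"{host}: {status}")
```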
using/mac.rst:20: [broken] https://developer.apple.com/documentation/macos-release-notes/macos-12_3-release-notes#Python: Anchor 'Python' not found
bugs.rst:39: [broken] https://devguide.python.org/docquality/#helping-with-documentation: Anchor 'helping-with-documentation' not found
whatsnew/3.4.rst:1962: [broken] https://devguide.python.org/coverage/#measuring-coverage-of-c-code-with-gcov-and-lcov: Anchor 'measuring-coverage-of-c-code-with-gcov-and-lcov' not found
bugs.rst:41: [broken] https://devguide.python.org/documenting/#translating: Anchor 'translating' not found
library/gc.rst:101: [broken] https://devguide.python.org/garbage_collector/#collecting-the-oldest-generation: Anchor 'collecting-the-oldest-generation' not found
using/unix.rst:69: [broken] https://devguide.python.org/setup/#get-the-source-code: Anchor 'get-the-source-code' not found
These links lead to the expected anchors without issue, so I added ignore entries for them.
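Ignore entries like these can be kept narrow by matching only URLs that carry a fragment, so anchor-free links to the same sites are still checked. A sketch of a `conf.py` excerpt; the patterns are hypothetical examples rather than the exact entries added in this PR, and `is_ignored` is a hypothetical helper that again assumes `re.match` semantics:

```python
import re

# conf.py excerpt (hypothetical patterns): skip only URLs with a '#fragment',
# because linkcheck reports "Anchor ... not found" for them.
linkcheck_ignore = [
    r"https://developer\.apple\.com/documentation/.+?#.*",
    r"https://devguide\.python\.org/.+?#.*",
]


def is_ignored(uri: str) -> bool:
    """Hypothetical helper mimicking how the ignore list is applied."""
    return any(re.match(pattern, uri) for pattern in linkcheck_ignore)


for uri in (
    "https://devguide.python.org/documenting/#translating",
    "https://devguide.python.org/documenting/",
):
    print(uri, is_ignored(uri))
```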
whatsnew/3.8.rst:77: [broken] https://en.wikipedia.org/wiki/Walrus#/media/File:Pacific_Walrus_-_Bull_(8247646168).jpg: Anchor '/media/File:Pacific_Walrus_-_Bull_(8247646168).jpg' not found
The link works, but linkcheck considers the #/... fragment an invalid anchor. I added it to the ignored anchors.
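Sphinx has a separate `linkcheck_anchors_ignore` option: a list of regexes for anchors (the fragment without `#`) whose existence should not be verified, so the page is still checked but the fragment is not. Its documented default is `['^!']`. A sketch assuming the fragments are matched with `re.match`; the `^/media/` pattern and the `anchor_ignored` helper are hypothetical, not the PR's actual entry:

```python
import re

# conf.py excerpt: fragments (without '#') whose existence is not checked.
linkcheck_anchors_ignore = [
    "^!",         # Sphinx's default: '#!' hashbang-style fragments
    r"^/media/",  # hypothetical: Wikipedia media-viewer fragments like '#/media/File:...'
]


def anchor_ignored(anchor: str) -> bool:
    """Hypothetical helper mimicking how the anchor patterns are applied."""
    return any(re.match(pattern, anchor) for pattern in linkcheck_anchors_ignore)


print(anchor_ignored("/media/File:Pacific_Walrus_-_Bull_(8247646168).jpg"))  # True
print(anchor_ignored("Description"))  # False
```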
whatsnew/changelog.rst:16408: [broken] https://fishshell.com/docs/current/commands.html#source: Anchor 'source' not found
The source command is now on another page, so I updated the URL.
whatsnew/3.11.rst:1320: [broken] https://github.com/faster-cpython/ideas#published-results: Anchor 'published-results' not found
Anchors from Markdown files in GitHub repositories are not recognized by linkcheck, even though they work just fine. Hence I added this case to the ignored links.
whatsnew/changelog.rst:15577: [broken] https://importlib-metadata.readthedocs.io/en/latest/changelog (links).html#v1-5-0: 404 Client Error: Not Found for url: https://importlib-metadata.readthedocs.io/en/latest/changelog (links).html
I updated the URL to the new page containing the version history.
howto/functional.rst:1210: [broken] https://mitpress.mit.edu/sicp/: 404 Client Error: Not Found for url: https://mitpress.mit.edu/sicp/
Removing the trailing '/' solves the 404 Client Error.
However, there is another issue: the book is no longer freely available (the Wayback Machine disagrees), so I updated the text to say "The book can be found at" instead of "Full text at".
whatsnew/2.7.rst:2105: [broken] https://sourceware.org/gdb/current/onlinedocs/gdb/Python.html: 404 Client Error: Not Found for url: https://sourceware.org/gdb/current/onlinedocs/gdb/Python.html
I used the Wayback Machine because the paragraph mentions GDB 7, and linked to the latest archived copy of the GDB online docs from 2011.
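Wayback Machine links follow the pattern `https://web.archive.org/web/<timestamp>/<original URL>`, where the timestamp has the form `YYYYMMDDhhmmss` and may be shortened to a prefix such as a year; the archive generally redirects to a nearby capture. A minimal sketch of building such a link; `wayback_url` is a hypothetical helper, and the exact snapshot used in this PR is not reproduced here:

```python
def wayback_url(url: str, timestamp: str) -> str:
    """Hypothetical helper: build a Wayback Machine URL for `url`.

    `timestamp` is a prefix of YYYYMMDDhhmmss, e.g. "2011" or "20110815".
    """
    if not (timestamp.isascii() and timestamp.isdigit() and 4 <= len(timestamp) <= 14):
        raise ValueError(f"invalid Wayback timestamp: {timestamp!r}")
    return f"https://web.archive.org/web/{timestamp}/{url}"


print(wayback_url(
    "https://sourceware.org/gdb/current/onlinedocs/gdb/Python.html", "2011"))
```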
using/windows.rst:554: [broken] https://support.enthought.com/hc/en-us/articles/360038600051-Canopy-GUI-end-of-life-transition-to-the-Enthought-Deployment-Manager-EDM-and-Visual-Studio-Code: 403 Client Error: Forbidden for url: https://support.enthought.com/hc/en-us/articles/360038600051-Canopy-GUI-end-of-life-transition-to-the-Enthought-Deployment-Manager-EDM-and-Visual-Studio-Code
It looks like crawling this website is not allowed: the link works in a browser, but fails with curl or Sphinx's linkcheck. I added it to the ignored links.
library/readline.rst:20: [broken] https://tiswww.cwru.edu/php/chet/readline/rluserman.html#SEC9: Anchor 'SEC9' not found
The SEC9 anchor pointed to the "Readline Init File" section (per a Wayback Machine copy). I updated the anchor to point to the same subject in the updated documentation.
faq/library.rst:780: [broken] https://twistedmatrix.com/trac/: 404 Client Error: Not Found for url: https://twisted.org/trac/
The old URL redirects to https://twisted.org/trac/, which returns 404, so I updated the URL to https://twisted.org/.
whatsnew/2.5.rst:879: [broken] https://unix.org/version2/whatsnew/lp64_wp.html: HTTPSConnectionPool(host='unix.org', port=443): Max retries exceeded with url: /version2/whatsnew/lp64_wp.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1129)')))
This website is hosted by GoDaddy, and GoDaddy's certificate chain is not installed, causing curl and linkcheck to fail. It works in a web browser, though, so I added it to the ignored links list.
whatsnew/changelog.rst:22339: [broken] https://www.openssl.org/docs/man1.1.0/ssl/SSL_CTX_set_min_proto_version.html: 404 Client Error: Not Found for url: https://www.openssl.org/docs/man1.1.0/ssl/SSL_CTX_set_min_proto_version.html
I updated the URL to the version 1.1.1 docs, the closest published version to the 1.1.0 mentioned in the paragraph.
library/zipfile.rst:10: [broken] https://github.com/python/cpython/tree/main/Lib/zipfile.py: 404 Client Error: Not Found for url: https://github.com/python/cpython/tree/main/Lib/zipfile.py
zipfile has been a package since #98103, so Lib/zipfile.py no longer exists on main. That change landed after 3.11, so the backport must not include this fix, or linkcheck will report another 'broken' entry.
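Whether `zipfile` is a single module or a package depends on the Python version, which is why the correct source link differs between branches. A small sketch that reports the layout of the running interpreter, assuming (per the note above) that the package layout applies from 3.12, the first feature release after 3.11:

```python
import importlib.util
import sys

spec = importlib.util.find_spec("zipfile")
# Packages have a list of submodule search locations; plain modules have None.
is_package = spec.submodule_search_locations is not None
source_path = "Lib/zipfile/" if is_package else "Lib/zipfile.py"
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {source_path}")
```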
Issue: gh-103484
Opengraph URL: https://github.com/python/cpython/pull/103608
Domain: github.com