René's URL Explorer Experiment


Title: validators.url fails any URL whose FQDN includes consecutive hyphens (e.g. IDNA A-labels) · Issue #78 · python-validators/validators · GitHub

Description: As the title implies, validators.url chokes on URLs that contain a domain, hostname, or TLD with two or more consecutive hyphens. The issue is most troublesome when it involves URLs containing valid IDNs in A-label form: In [1]: import v...

Opengraph URL: https://github.com/python-validators/validators/issues/78

Domain: github.com


JSON-LD metadata (schema.org DiscussionForumPosting):

Headline: validators.url fails any URL whose FQDN includes consecutive hyphens (e.g. IDNA A-labels)
Author: nullripper (https://github.com/nullripper)
Published: 2018-04-04T01:51:24.000Z
Comments: 0
URL: https://github.com/python-validators/validators/issues/78

Article body:

As the title implies, validators.url chokes on URLs that contain a domain, hostname, or TLD with two or more consecutive hyphens. The issue is most troublesome when it involves URLs containing valid IDNs in A-label form:

```
In [1]: import validators
In [2]: validators.url('http://xn--j1ail.xn--p1ai')
Out[2]: ValidationFailure(func=url, args={'public': False, 'value': 'http://xn--j1ail.xn--p1ai'})
```

This failure occurs because the regex for validators.url allows hyphens only as part of larger groups within the host and domain name sections. Each of these groups must begin with a non-hyphen character, which rules out sequential hyphens. The TLD section has no such group at all, so hyphens aren't permitted there. The relevant portion of the regex is on lines 36-41 of url.py:

```
# host name
u"(?:(?:[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)"
# domain name
u"(?:\.(?:[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)*"
# TLD identifier
u"(?:\.(?:[a-z\u00a1-\uffff]{2,}))"
```

The issue also occurs with URLs of valid domains that have consecutive hyphens in their names. While such domain names are less common and may be frowned upon by certain registries, they are still technically valid according to the RFC. Here are the dig and whois results for one such domain:

```
; <<>> DiG 9.10.3-P4-Ubuntu <<>> @8.8.8.8 online--trading.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31443
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;online--trading.com.           IN      A

;; ANSWER SECTION:
online--trading.com.    899     IN      A       195.110.124.133

;; Query time: 167 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue Apr 03 15:03:25 PDT 2018
;; MSG SIZE  rcvd: 64
```

```
Domain Name: ONLINE--TRADING.COM
Registry Domain ID: 2171387112_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.register.it
Registrar URL: http://www.register.it
Updated Date: 2017-10-06T18:54:58Z
Creation Date: 2017-10-06T18:54:58Z
Registry Expiry Date: 2018-10-06T18:54:58Z
Registrar: Register.it SPA
Registrar IANA ID: 168
Registrar Abuse Contact Email: abuse@register.it
Registrar Abuse Contact Phone: +39.5520021555
Domain Status: ok https://icann.org/epp#ok
Name Server: NS1.REGISTER.IT
Name Server: NS2.REGISTER.IT
DNSSEC: unsigned
URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/
```

It's arguable whether domains like this _should_ pass validators.url, since they're something of an edge case for everyday users. It may not be worth letting potentially erroneous URLs through just to keep a few oddball domains from failing validation. IDNA A-labels are a different story, though: those should absolutely pass without requiring the user to convert them beforehand. Python's built-in IDNA decoder cannot properly convert IDNA domains that are embedded in URLs, so it's fairly onerous to expect the user to do that before calling validators.url.

Modifying the regex to match anything that follows the IDNA A-label format is not an ideal solution, since invalid A-labels can be built from valid characters (e.g. "xn--aaaa"). Since the existing regex already accepts the Unicode characters used by IDNA U-labels, I think the ideal solution is to isolate and convert possible IDNA hostnames, reassemble the URL, and then match it against the existing regex. I've made a version of url.py that should make this fairly painless; expect my PR shortly.
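To see the failure in isolation, the three regex fragments quoted in the issue can be compiled on their own and compared with a looser letters-digits-hyphens (LDH) label rule. This is a minimal sketch, not code from validators: the helper names `quoted_regex_accepts` and `ldh_accepts` are hypothetical, the LDH pattern (1-63 characters, no leading or trailing hyphen, interior hyphen runs allowed) is an assumed summary of the usual host name rule, and inputs are assumed to be lower case.

```python
import re

# Fragments quoted in the issue (url.py, lines 36-41 at the time), written as
# raw strings; the `re` module itself interprets the \uXXXX escapes.
HOST_NAME = r"(?:(?:[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)"
DOMAIN_NAME = r"(?:\.(?:[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)*"
TLD = r"(?:\.(?:[a-z\u00a1-\uffff]{2,}))"
QUOTED_FQDN = re.compile(HOST_NAME + DOMAIN_NAME + TLD)

# Hypothetical alternative: an LDH label of 1-63 characters that neither
# starts nor ends with a hyphen; runs of interior hyphens such as "--" are allowed.
LDH_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def quoted_regex_accepts(host: str) -> bool:
    """True if the quoted host/domain/TLD fragments match all of `host`."""
    return QUOTED_FQDN.fullmatch(host) is not None


def ldh_accepts(host: str) -> bool:
    """True if every dot-separated label of `host` is an LDH label."""
    return all(LDH_LABEL.fullmatch(label) for label in host.split("."))


if __name__ == "__main__":
    for host in ("example.com", "online--trading.com",
                 "xn--j1ail.xn--p1ai", "xn--aaaa.com"):
        print(f"{host:22} quoted={quoted_regex_accepts(host)} "
              f"ldh={ldh_accepts(host)}")
```

The quoted fragments reject every host containing `--`, A-labels included. The last row shows the catch the issue points out: simply loosening the pattern to allow `--` would also let through `xn--aaaa`, which is not a valid A-label.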

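The issue proposes decoding A-labels to U-labels before matching, since the existing regex already accepts U-label characters. The following is a rough sketch of that idea under stated assumptions, not the code from the author's PR: it uses Python's built-in `idna` codec (IDNA 2003) and `urllib.parse.urlsplit`, it checks only the host against the quoted fragments instead of reassembling and re-matching the full URL, and the names `to_u_labels` and `host_is_valid` are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Host/domain/TLD fragments quoted in the issue (url.py, lines 36-41 at the time).
HOST_NAME = r"(?:(?:[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)"
DOMAIN_NAME = r"(?:\.(?:[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)*"
TLD = r"(?:\.(?:[a-z\u00a1-\uffff]{2,}))"
FQDN = re.compile(HOST_NAME + DOMAIN_NAME + TLD)


def to_u_labels(host: str) -> str:
    """Decode any A-labels ("xn--...") in an ASCII host to U-labels.

    Uses Python's built-in "idna" codec (IDNA 2003). A label that is not a
    valid A-label, such as "xn--aaaa", raises UnicodeError.
    """
    if not host.isascii():
        return host  # already Unicode; leave it for the regex to judge
    return host.encode("ascii").decode("idna")


def host_is_valid(url: str) -> bool:
    """Hypothetical check: decode the URL's host, then match the fragments."""
    try:
        host = urlsplit(url).hostname  # lower-cased, port and userinfo removed
        if not host:
            return False
        host = to_u_labels(host)
    except ValueError:  # UnicodeError is a subclass of ValueError
        return False
    return FQDN.fullmatch(host) is not None


if __name__ == "__main__":
    print(to_u_labels("xn--j1ail.xn--p1ai"))  # кто.рф
    for url in ("http://xn--j1ail.xn--p1ai", "http://xn--aaaa.com",
                "http://online--trading.com", "https://example.com:8080/path"):
        print(url, host_is_valid(url))
```

Because the codec refuses labels that do not decode and round-trip cleanly, `xn--aaaa` is rejected without special-casing it. `online--trading.com` contains no A-label, so it still fails, which matches the issue's view that such domains are a separate judgment call.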

Links:

#245 https://github.com/python-validators/validators/pull/245
validators.url fails any URL whose FQDN includes consecutive hyphens (e.g. IDNA A-labels) https://github.com/python-validators/validators/issues/78#top
bug (Issue: Works not as designed) https://github.com/python-validators/validators/issues?q=state%3Aopen%20label%3A%22bug%22
outdated (Issue/PR: Open for more than 3 months) https://github.com/python-validators/validators/issues?q=state%3Aopen%20label%3A%22outdated%22
nullripper https://github.com/nullripper
nullripper on Apr 4, 2018 https://github.com/python-validators/validators/issues/78#issue-311057259


