René's URL Explorer Experiment


Title: Fuzzing reveals a number of parse errors · Issue #568 · html5lib/html5lib-python · GitHub


Description: I'm the lead developer of Beautiful Soup, which has html5lib as an optional dependency. Over the past couple of years I've gotten a number of notifications from Google's oss-fuzz project about unhandled exceptions that actually turned ou...


Opengraph URL: https://github.com/html5lib/html5lib-python/issues/568

X: @github


Domain: patch-diff.githubusercontent.com


Hey, it has JSON-LD scripts:
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"Fuzzing reveals a number of parse errors","articleBody":"I'm the lead developer of Beautiful Soup, which has html5lib as an optional dependency. Over the past couple of years I've gotten a number of notifications from Google's oss-fuzz project about unhandled exceptions that actually turned out to be problems in html5lib. There wasn't much I could do with these errors, but now that it looks like html5lib maintenance is picking up, I can pass them on to you. (Sorry. :crying_cat_face:)\r\n\r\nI've incorporated the fuzz reports into [the Beautiful Soup test suite](https://git.launchpad.net/beautifulsoup/tree/bs4/tests/test_fuzz.py), and [the test cases themselves are here](https://git.launchpad.net/beautifulsoup/tree/bs4/tests/fuzz), but here's a general picture of what problems I see. In each case, I believe just parsing the bad markup is enough to trigger the error.\r\n\r\n**clusterfuzz-testcase-minimized-bs4_fuzzer-4999465949331456**\r\n\r\nMarkup: `b')\u003ca\u003e\u003cmath\u003e\u003cTR\u003e\u003ca\u003e\u003cmI\u003e\u003ca\u003e\u003cp\u003e\u003ca\u003e'`\r\n\r\nError:\r\n\r\n```\r\nself = \u003chtml\u003e, node = \u003cp\u003e, refNode = None\r\n\r\n    def insertBefore(self, node, refNode):\r\n\u003e       index = self.element.index(refNode.element)\r\nE       AttributeError: 'NoneType' object has no attribute 'element'\r\n```\r\n\r\n**clusterfuzz-testcase-minimized-bs4_fuzzer-5843991618256896**\r\n\r\nMarkup: `b'-\u003cmath\u003e\u003csElect\u003e\u003cmi\u003e\u003csElect\u003e\u003csElect\u003e'`\r\n\r\nError:\r\n\r\n```\r\n    def resetInsertionMode(self):\r\n    ...\r\n            # Check for conditions that should only happen in the innerHTML\r\n            # case\r\n            if nodeName in (\"select\", \"colgroup\", \"head\", \"html\"):\r\n\u003e               assert self.innerHTML\r\nE               
AssertionError\r\n```\r\n\r\n**clusterfuzz-testcase-minimized-bs4_fuzzer-6241471367348224**\r\n\r\nMarkup: `b'ñ\u003ctable\u003e\u003csvg\u003e\u003chtml\u003e'`\r\n\r\nError:\r\n\r\n```\r\nself = \u003chtml5lib.html5parser.getPhases.\u003clocals\u003e.InTablePhase object at 0x7f8f405ad440\u003e\r\n\r\n    def processEOF(self):\r\n        if self.tree.openElements[-1].name != \"html\":\r\n            self.parser.parseError(\"eof-in-table\")\r\n        else:\r\n\u003e           assert self.parser.innerHTML\r\nE           AssertionError\r\n```\r\n\r\n**clusterfuzz-testcase-minimized-bs4_fuzzer-6600557255327744**\r\n\r\nMarkup: `b'\\t\u003cTABLE\u003e\u003c\u003c!\u003e;\u003c!\u003e\u003c\u003c!\u003e.\u003clec\u003e\u003cth\u003ei\u003e\u003ca\u003e\u003cmat\\x00\\x01\u003cmi\\x00a\u003e\u003cmath\u003e\u003e\u003cth\u003e\u003cmI\u003echardeta\\xff\\xff\\xff\\xff\u003c\u003e\u003cth\u003e\u003cmI\u003e\u003c||||||||A\u003cselect\u003e\u003c\u003equ?\\xbemath\u003e\u003cth\u003e\u003cmie\u003equ'`\r\n\r\nError:\r\n\r\n```\r\nself = \u003chtml5lib.html5parser.getPhases.\u003clocals\u003e.InTableBodyPhase object at 0x7f8f4184ce00\u003e\r\n\r\n    def clearStackToTableBodyContext(self):\r\n        while self.tree.openElements[-1].name not in (\"tbody\", \"tfoot\",\r\n                                                      \"thead\", \"html\"):\r\n            # self.parser.parseError(\"unexpected-implied-end-tag-in-table\",\r\n            #  {\"name\": self.tree.openElements[-1].name})\r\n            self.tree.openElements.pop()\r\n        if self.tree.openElements[-1].name == \"html\":\r\n\u003e           assert self.parser.innerHTML\r\nE           AssertionError\r\n```\r\n\r\nAlso reported to me recently was the issue that was reported to you as issue 
#557.","author":{"url":"https://github.com/leonardr","@type":"Person","name":"leonardr"},"datePublished":"2023-03-20T13:40:48.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":2},"url":"https://github.com/html5lib/html5lib-python/issues/568"}
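
The issue body above states that parsing the bad markup is believed to be enough to trigger each error. A minimal sketch of a harness for checking that claim could look like the following. The names `FUZZ_CASES`, `exception_name`, and `run_cases` are hypothetical and not part of html5lib or Beautiful Soup; the long fourth test case is omitted, and the byte encoding of the third case is an assumption because the issue only shows its first character as `ñ`. Whether a given case still raises, and with which exception, depends on the installed html5lib version and the tree builder in use, so no expected output is given.

```python
from typing import Callable, Dict, Optional

# Markup copied from the issue body, keyed by the numeric suffix of each
# clusterfuzz test case name. The fourth (long) case is left out here.
FUZZ_CASES: Dict[str, bytes] = {
    "4999465949331456": b")<a><math><TR><a><mI><a><p><a>",
    "5843991618256896": b"-<math><sElect><mi><sElect><sElect>",
    # Assumption: the issue shows a non-ASCII first character; UTF-8 is a guess.
    "6241471367348224": "ñ<table><svg><html>".encode("utf-8"),
}


def exception_name(parse: Callable[[bytes], object], markup: bytes) -> Optional[str]:
    """Return the class name of the exception raised by parse(markup), or None."""
    try:
        parse(markup)
    except Exception as exc:  # deliberately broad: the goal is to catalogue crashes
        return type(exc).__name__
    return None


def run_cases(parse: Callable[[bytes], object]) -> Dict[str, Optional[str]]:
    """Run every fuzz case through the given parser callable."""
    return {
        case_id: exception_name(parse, markup)
        for case_id, markup in FUZZ_CASES.items()
    }


if __name__ == "__main__":
    try:
        import html5lib  # optional third-party dependency
    except ImportError:
        print("html5lib is not installed; nothing to check")
    else:
        for case_id, name in run_cases(html5lib.parse).items():
            print(case_id, name or "parsed without error")
```

Passing the parser in as a callable keeps the harness independent of any one entry point, so the same cases can be run through `html5lib.parse` directly or through a wrapper that goes via Beautiful Soup.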


Links:

Fuzzing reveals a number of parse errors: https://patch-diff.githubusercontent.com/html5lib/html5lib-python/issues/568#top
leonardr: https://github.com/leonardr
on Mar 20, 2023: https://github.com/html5lib/html5lib-python/issues/568#issue-1632141214
the Beautiful Soup test suite: https://git.launchpad.net/beautifulsoup/tree/bs4/tests/test_fuzz.py
the test cases themselves are here: https://git.launchpad.net/beautifulsoup/tree/bs4/tests/fuzz
#557: https://github.com/html5lib/html5lib-python/issues/557
