Title: Fuzzing reveals a number of parse errors · Issue #568 · html5lib/html5lib-python · GitHub
Description: I'm the lead developer of Beautiful Soup, which has html5lib as an optional dependency. Over the past couple of years I've gotten a number of notifications from Google's oss-fuzz project about unhandled exceptions that actually turned ou...
Opengraph URL: https://github.com/html5lib/html5lib-python/issues/568
Author: leonardr (https://github.com/leonardr)
Date Published: 2023-03-20T13:40:48.000Z
Comments: 2

I'm the lead developer of Beautiful Soup, which has html5lib as an optional dependency. Over the past couple of years I've gotten a number of notifications from Google's oss-fuzz project about unhandled exceptions that actually turned out to be problems in html5lib. There wasn't much I could do with these errors, but now that it looks like html5lib maintenance is picking up, I can pass them on to you. (Sorry. :crying_cat_face:)

I've incorporated the fuzz reports into [the Beautiful Soup test suite](https://git.launchpad.net/beautifulsoup/tree/bs4/tests/test_fuzz.py), and [the test cases themselves are here](https://git.launchpad.net/beautifulsoup/tree/bs4/tests/fuzz), but here's a general picture of what problems I see. In each case, I believe just parsing the bad markup is enough to trigger the error.

**clusterfuzz-testcase-minimized-bs4_fuzzer-4999465949331456**

Markup: `b')<a><math><TR><a><mI><a><p><a>'`

Error:

```
self = <html>, node = <p>, refNode = None

    def insertBefore(self, node, refNode):
>       index = self.element.index(refNode.element)
E       AttributeError: 'NoneType' object has no attribute 'element'
```

**clusterfuzz-testcase-minimized-bs4_fuzzer-5843991618256896**

Markup: `b'-<math><sElect><mi><sElect><sElect>'`

Error:

```
    def resetInsertionMode(self):
        ...
            # Check for conditions that should only happen in the innerHTML
            # case
            if nodeName in ("select", "colgroup", "head", "html"):
>               assert self.innerHTML
E               AssertionError
```

**clusterfuzz-testcase-minimized-bs4_fuzzer-6241471367348224**

Markup: `b'ñ<table><svg><html>'`

Error:

```
self = <html5lib.html5parser.getPhases.<locals>.InTablePhase object at 0x7f8f405ad440>

    def processEOF(self):
        if self.tree.openElements[-1].name != "html":
            self.parser.parseError("eof-in-table")
        else:
>           assert self.parser.innerHTML
E           AssertionError
```

**clusterfuzz-testcase-minimized-bs4_fuzzer-6600557255327744**

Markup: `b'\t<TABLE><<!>;<!><<!>.<lec><th>i><a><mat\x00\x01<mi\x00a><math>><th><mI>chardeta\xff\xff\xff\xff<><th><mI><||||||||A<select><>qu?\xbemath><th><mie>qu'`

Error:

```
self = <html5lib.html5parser.getPhases.<locals>.InTableBodyPhase object at 0x7f8f4184ce00>

    def clearStackToTableBodyContext(self):
        while self.tree.openElements[-1].name not in ("tbody", "tfoot",
                                                      "thead", "html"):
            # self.parser.parseError("unexpected-implied-end-tag-in-table",
            #  {"name": self.tree.openElements[-1].name})
            self.tree.openElements.pop()
        if self.tree.openElements[-1].name == "html":
>           assert self.parser.innerHTML
E           AssertionError
```

Also reported to me recently was the issue that was reported to you as issue #557.
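Since the report says that just parsing the bad markup should be enough to trigger each error, the four cases could be replayed with a small harness along these lines. This is only a sketch under stated assumptions: the names `FUZZ_CASES`, `replay`, and `parse_with_bs4` are made up for illustration, the `ñ` in the third case is assumed to be UTF-8 encoded (the report does not show the raw bytes), and whether a given html5lib or Beautiful Soup version still raises is not something the report settles.

```python
from typing import Callable, Dict, Optional

# Markup copied from the report, keyed by the numeric suffix of each
# clusterfuzz test-case name.
FUZZ_CASES: Dict[str, bytes] = {
    "4999465949331456": b")<a><math><TR><a><mI><a><p><a>",
    "5843991618256896": b"-<math><sElect><mi><sElect><sElect>",
    # Assumption: the report prints a literal "ñ"; the original bytes are
    # unknown, so UTF-8 is used here.
    "6241471367348224": "ñ<table><svg><html>".encode("utf-8"),
    "6600557255327744": (
        b"\t<TABLE><<!>;<!><<!>.<lec><th>i><a><mat\x00\x01<mi\x00a><math>>"
        b"<th><mI>chardeta\xff\xff\xff\xff<><th><mI><||||||||A<select><>"
        b"qu?\xbemath><th><mie>qu"
    ),
}


def replay(
    cases: Dict[str, bytes], parse: Callable[[bytes], object]
) -> Dict[str, Optional[str]]:
    """Call parse(markup) for every case.

    Returns a mapping from case id to the name of the exception class that
    was raised, or None when parsing finished without an exception.
    """
    results: Dict[str, Optional[str]] = {}
    for case_id, markup in cases.items():
        try:
            parse(markup)
        except Exception as exc:  # AssertionError, AttributeError, ...
            results[case_id] = type(exc).__name__
        else:
            results[case_id] = None
    return results


def parse_with_bs4(markup: bytes) -> None:
    """Parse through Beautiful Soup's html5lib tree builder.

    Requires both bs4 and html5lib; imported lazily so that replay() can be
    used with any other parse callable.
    """
    from bs4 import BeautifulSoup

    BeautifulSoup(markup, "html5lib")
```

With both packages installed, `replay(FUZZ_CASES, parse_with_bs4)` gives a per-case summary that can be compared against the tracebacks in the report. Passing the parser in as a callable keeps the harness independent of any one library, so `html5lib.parse` could be swapped in to check which failures occur without Beautiful Soup in the loop.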