Title: Assert and incorrect error message when loading source file containing invalid UTF-8 · Issue #96268 · python/cpython · GitHub
Description: Bug report When loading a file containing invalid UTF-8, without a source encoding marker, an assert is emitted. If assert is removed, the error message is close, but not quite correct and has incorrect position information. To reproduce...
Opengraph URL: https://github.com/python/cpython/issues/96268
Domain: github.com
Author: mdboom (https://github.com/mdboom)
Date Published: 2022-08-25T13:55:51.000Z
Comments: 2
URL: https://github.com/python/cpython/issues/96268

# Bug report

When loading a file containing invalid UTF-8, without a source encoding marker, an assertion fails. If the assert is removed, the error message is close, but not quite correct and has incorrect position information.

To reproduce, create a file with an invalid UTF-8 start byte followed by a valid continuation byte. For example, this somewhat contrived and grammatically incorrect file:

```
open("x.py", "w", encoding="latin-1").write("\n\n\n'HOLÀ¡'\n")
```

results in:

```
$ ./python x.py
python: Parser/pegen_errors.c:335: _PyPegen_raise_error_known_location: Assertion `p->tok->fp == NULL || p->tok->fp == stdin || p->tok->done == E_EOF' failed.
[1] 16448 IOT instruction ./python x.py
```

<details><summary>Backtrace</summary>

```
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
    at ./nptl/pthread_kill.c:44
#1 0x00007ffff7d6389f in __pthread_kill_internal (signo=6, threadid=<optimized out>)
    at ./nptl/pthread_kill.c:78
#2 0x00007ffff7d17a52 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007ffff7d02469 in __GI_abort () at ./stdlib/abort.c:79
#4 0x00007ffff7d02395 in __assert_fail_base (fmt=0x7ffff7e8fc30 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=0x5555558a5a10 "p->tok->fp == NULL || p->tok->fp == stdin || p->tok->done == E_EOF",
    file=0x5555558a5898 "Parser/pegen_errors.c", line=335, function=<optimized out>) at ./assert/assert.c:92
#5 0x00007ffff7d10b02 in __GI___assert_fail (
    assertion=assertion@entry=0x5555558a5a10 "p->tok->fp == NULL || p->tok->fp == stdin || p->tok->done == E_EOF", file=file@entry=0x5555558a5898 "Parser/pegen_errors.c", line=line@entry=335,
    function=function@entry=0x5555558a5c40 <__PRETTY_FUNCTION__.1> "_PyPegen_raise_error_known_location")
    at ./assert/assert.c:101
#6 0x000055555564c7be in _PyPegen_raise_error_known_location (p=0x7ffff7a8c880,
    errtype=0x555555a1cb20 <_PyExc_SyntaxError>, lineno=4, col_offset=8, end_lineno=4, end_col_offset=8,
    errmsg=0x5555558a5946 "(%s) %U", va=0x7fffffffd058) at Parser/pegen_errors.c:335
#7 0x000055555564cc08 in _PyPegen_raise_error (p=p@entry=0x7ffff7a8c880, errtype=<optimized out>,
    errmsg=errmsg@entry=0x5555558a5946 "(%s) %U") at Parser/pegen_errors.c:235
#8 0x000055555564cedb in _Pypegen_raise_decode_error (p=p@entry=0x7ffff7a8c880) at Parser/pegen_errors.c:133
#9 0x000055555564eb1f in _PyPegen_concatenate_strings (p=p@entry=0x7ffff7a8c880,
    strings=strings@entry=0x555555c13290) at Parser/action_helpers.c:957
#10 0x00005555556598cf in strings_rule (p=p@entry=0x7ffff7a8c880) at Parser/parser.c:15481
#11 0x000055555566362f in atom_rule (p=p@entry=0x7ffff7a8c880) at Parser/parser.c:14292
#12 0x000055555567c149 in t_primary_raw (p=p@entry=0x7ffff7a8c880) at Parser/parser.c:18126
#13 0x000055555567c4ac in t_primary_rule (p=p@entry=0x7ffff7a8c880) at Parser/parser.c:17916
#14 0x0000555555680978 in single_subscript_attribute_target_rule (p=p@entry=0x7ffff7a8c880)
    at Parser/parser.c:17805
#15 0x00005555556813e4 in _tmp_12_rule (p=p@entry=0x7ffff7a8c880) at Parser/parser.c:24220
#16 0x00005555556a5bb0 in assignment_rule (p=p@entry=0x7ffff7a8c880) at Parser/parser.c:2245
#17 0x00005555556a9f37 in simple_stmt_rule (p=p@entry=0x7ffff7a8c880) at Parser/parser.c:1652
#18 0x00005555556ab747 in simple_stmts_rule (p=p@entry=0x7ffff7a8c880) at Parser/parser.c:1547
#19 0x00005555556ac6f6 in statement_rule (p=p@entry=0x7ffff7a8c880) at Parser/parser.c:1370
#20 0x00005555556ac969 in _loop1_3_rule (p=p@entry=0x7ffff7a8c880) at Parser/parser.c:23651
#21 0x00005555556acb51 in statements_rule (p=p@entry=0x7ffff7a8c880) at Parser/parser.c:1302
#22 0x00005555556b4ea0 in file_rule (p=0x7ffff7a8c880) at Parser/parser.c:1061
#23 0x00005555556b6953 in _PyPegen_parse (p=p@entry=0x7ffff7a8c880) at Parser/parser.c:38772
#24 0x000055555564bfc6 in _PyPegen_run_parser (p=p@entry=0x7ffff7a8c880) at Parser/pegen.c:811
#25 0x000055555564c1b2 in _PyPegen_run_parser_from_file_pointer (fp=fp@entry=0x555555be4310,
    start_rule=start_rule@entry=257, filename_ob=filename_ob@entry=0x7ffff7aa9150, enc=enc@entry=0x0,
    ps1=ps1@entry=0x0, ps2=ps2@entry=0x0, flags=0x7fffffffd8a8, errcode=0x0, arena=0x7ffff7a939a0)
    at Parser/pegen.c:884
#26 0x00005555556b8c2f in _PyParser_ASTFromFile (fp=fp@entry=0x555555be4310,
    filename_ob=filename_ob@entry=0x7ffff7aa9150, enc=enc@entry=0x0, mode=mode@entry=257, ps1=ps1@entry=0x0,
    ps2=ps2@entry=0x0, flags=0x7fffffffd8a8, errcode=0x0, arena=0x7ffff7a939a0) at Parser/peg_api.c:26
#27 0x0000555555813ca1 in pyrun_file (fp=fp@entry=0x555555be4310, filename=filename@entry=0x7ffff7aa9150,
    start=start@entry=257, globals=globals@entry=0x7ffff7a45010, locals=locals@entry=0x7ffff7a45010,
    closeit=closeit@entry=1, flags=0x7fffffffd8a8) at Python/pythonrun.c:1620
#28 0x0000555555816941 in _PyRun_SimpleFileObject (fp=fp@entry=0x555555be4310,
    filename=filename@entry=0x7ffff7aa9150, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffd8a8)
    at Python/pythonrun.c:439
#29 0x0000555555816af5 in _PyRun_AnyFileObject (fp=fp@entry=0x555555be4310,
    filename=filename@entry=0x7ffff7aa9150, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffd8a8)
    at Python/pythonrun.c:78
#30 0x00005555558332bf in pymain_run_file_obj (program_name=program_name@entry=0x7ffff7aa91c0,
    filename=filename@entry=0x7ffff7aa9150, skip_source_first_line=0) at Modules/main.c:360
#31 0x00005555558333c1 in pymain_run_file (config=config@entry=0x555555b29490 <_PyRuntime+69264>)
    at Modules/main.c:379
#32 0x0000555555833a99 in pymain_run_python (exitcode=exitcode@entry=0x7fffffffd9fc) at Modules/main.c:610
#33 0x0000555555833cea in Py_RunMain () at Modules/main.c:689
#34 0x0000555555833d3f in pymain_main (args=args@entry=0x7fffffffda40) at Modules/main.c:719
#35 0x0000555555833dc4 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:743
#36 0x000055555564a742 in main (argc=<optimized out>, argv=<optimized out>) at ./Programs/python.c:15

```
</details>

In the above assert, `p->tok->done` is `E_OK`.

Bypassing the assert, you get an incorrect position (it should be 7, not 3):

```
 File "x.py", line 4
 'HOL��'
 ^
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xc0 in position 3: invalid start byte
```

# Your environment

Linux, Python 3.11

This bug was found because the coverage-related test added in #94856 failed on some buildbots. For the test to fail, a buildbot needs to both (1) build with `--with-pydebug` and (2) have coredumps enabled in the OS. That test should probably be fixed so it fails consistently across all buildbots, even the ones that aren't set up that way.
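
For readers who want to see where the two positions (7 and 3) could come from, here is a minimal sketch using only the standard `bytes.decode` API and the `start` and `reason` attributes of `UnicodeDecodeError`. It does not touch the parser at all. The helper name `first_bad_offset` is made up for illustration, and reading "3" as an offset relative to the string literal's contents is an assumption, not something the report confirms.

```python
# The bytes that the reproducer writes to x.py (Latin-1 encoding of the text).
data = b"\n\n\n'HOL\xc0\xa1'\n"


def first_bad_offset(raw: bytes):
    """Return (offset, reason) for the first UTF-8 decoding error, or None.

    Hypothetical helper for illustration; not part of CPython.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


# Offset of the bad byte measured from the start of the file.
whole_file = first_bad_offset(data)

# Offset measured from the start of the string literal's contents
# (assumption: this is the span the parser was decoding when it failed).
literal_body = first_bad_offset(data[4:-2])

print("whole file:  ", whole_file)
print("literal body:", literal_body)
```

Decoding the whole file reports offset 7, while decoding only the bytes between the quotes reports offset 3, which matches the two numbers quoted in the report.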