Title: euc_kr char '0x3164' decode('ksx1001') cause UnicodeDecodeError · Issue #101863 · python/cpython · GitHub
Opengraph URL: https://github.com/python/cpython/issues/101863
Author: TakWolf (https://github.com/TakWolf)
Date published: 2023-02-13T05:32:03.000Z
Comments: 14

# Bug report

The character U+3164 can be encoded with `encode('ksx1001')`, but the resulting bytes cannot be decoded with `decode('ksx1001')`.

```python
def main():
    code_point = 0x3164
    c = chr(code_point)
    raw = c.encode('ksx1001')
    c2 = raw.decode('ksx1001')  # <--- this raises UnicodeDecodeError
    print(f'{c} {c2}')


if __name__ == '__main__':
    main()
```

```
Traceback (most recent call last):
  File "/Users/takwolf/Develop/FontDev/fusion-pixel-font/build.py", line 11, in <module>
    main()
  File "/Users/takwolf/Develop/FontDev/fusion-pixel-font/build.py", line 6, in main
    c2 = raw.decode('ksx1001')
         ^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'euc_kr' codec can't decode bytes in position 0-1: incomplete multibyte sequence
```

The character is `Hangul Compatibility Jamo -> Hangul Filler`.

https://unicode-table.com/en/3164/

<img width="676" alt="image" src="https://user-images.githubusercontent.com/6064962/218377969-f6a6dc6e-3464-448a-ae2d-89dea00a8efc.png">

The following code computes the character's zone (row and cell) in KS X 1001 by subtracting 0xA0 from each encoded byte:

```python
def main():
    code_point = 0x3164
    c = chr(code_point)
    raw = c.encode('ksx1001')
    block_offset = 0xA0
    zone_1 = raw[0] - block_offset
    zone_2 = raw[1] - block_offset
    print(f'zone_1 = {zone_1}')
    print(f'zone_2 = {zone_2}')


if __name__ == '__main__':
    main()
```

```
zone_1 = 4
zone_2 = 52
```

https://en.wikipedia.org/wiki/KS_X_1001#Hangul_Filler

<img width="737" alt="image" src="https://user-images.githubusercontent.com/6064962/218380463-1cee6580-9dcf-4804-91f1-fb5ed59ee5e1.png">

Every other character in ksx1001 encodes and decodes correctly; only this one fails.

# Your environment

- CPython versions tested on: Python 3.11.1
- Operating system and architecture: macOS 13.0

### Linked PRs
* gh-102417
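For anyone who wants to check other characters the same way, here is a minimal sketch of the two checks used in this report: deriving the row and cell from a two-byte sequence, and testing whether a character survives an encode/decode round trip. The helper names `ksx1001_row_cell` and `round_trips` are hypothetical, not part of CPython, and the result for U+3164 will depend on which Python version runs it.

```python
def ksx1001_row_cell(raw: bytes) -> tuple[int, int]:
    """Return the (row, cell) position of a two-byte EUC-KR sequence.

    Hypothetical helper: subtracts the 0xA0 offset from each byte,
    as the zone calculation in the report does.
    """
    if len(raw) != 2 or not all(0xA1 <= b <= 0xFE for b in raw):
        raise ValueError(f"not a two-byte EUC-KR sequence: {raw!r}")
    return raw[0] - 0xA0, raw[1] - 0xA0


def round_trips(ch: str, encoding: str = "ksx1001") -> bool:
    """Report whether ch comes back unchanged after encode then decode.

    Hypothetical helper: any UnicodeError (encode or decode) counts
    as a failed round trip.
    """
    try:
        return ch.encode(encoding).decode(encoding) == ch
    except UnicodeError:
        return False


if __name__ == "__main__":
    # 0xA4 0xD4 are the bytes implied by the row 4, cell 52 reported above.
    print(ksx1001_row_cell(b"\xa4\xd4"))
    # U+AC00 is an ordinary precomposed Hangul syllable.
    print(round_trips("\uac00"))
    # U+3164 is the Hangul Filler from this report.
    print(round_trips("\u3164"))
```

Catching `UnicodeError` rather than only `UnicodeDecodeError` lets the same helper also flag characters that cannot be encoded at all.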