Title: Refresh big5hkscs mapping to HKSCS-2016 · Issue #93271 · python/cpython · GitHub
Description: While working on #84508 I noticed that the mapping for big5hkscs codec has not been updated in a while. The current version in CPython reflects the Big-5 mappings for HKSCS-2004. Since then, there have been some updates: HKSCS-2008 adds ...
Opengraph URL: https://github.com/python/cpython/issues/93271
Author: sorcio (https://github.com/sorcio)
Date Published: 2022-05-26T20:13:44.000Z
Comments: 3

While working on #84508 I noticed that the mapping for the `big5hkscs` codec has not been updated in a while. The current version in CPython reflects the Big-5 mappings for *HKSCS-2004*.

Since then, there have been [some updates](https://www.ccli.gov.hk/en/hkscs/what_is_hkscs.html):

- *HKSCS-2008* adds 68 code points to the Big-5 encoding scheme
- *HKSCS-2016* adds no code points to Big-5 (it's Unicode-only), but since new characters have been added to Unicode, the mapping can change
- after 2016, at least one mapped code point has been changed in an amendment

I can update the script and generate the mapping using the latest data available on the [CCLI](https://www.ccli.gov.hk/en/download/) website, since I was already looking into this.

If we care about refreshing `big5hkscs` at all, there are a couple of questions about compatibility. Suppose a Big-5 code X mapped to Unicode code point A in HKSCS-2004 and maps to B in later versions:

1) Should we decode X to A, or to B?

2) Should we encode B to X, A to X, or both?

---

E.g. right now the Big-5 sequence 9D73 round-trips:

```python
>>> x = bytes.fromhex('9D73')
>>> x.decode('big5hkscs') == '\u4ca4'
True
>>> '\u4ca4'.encode('big5hkscs') == x
True
```

If we followed the new HKSCS-2016 mapping with no compatibility provisions, this round-trip would instead go through the newly mapped character `\u9fd0`. This might be fine for some users, but it might break compatibility for others. So the questions are about what kind of compatibility we want to guarantee.

---

A related question, which should not block this issue:

For the web platform, WHATWG defines a [Big5 encoding](https://encoding.spec.whatwg.org/#legacy-multi-byte-chinese-(traditional)-encodings) which includes HKSCS extensions and already overlaps 99% with `big5hkscs`, but is incompatible in some cases. Since one of the users of the CPython CJK codecs is html5lib, this means that html5lib does not comply with the web platform specifications. Should CPython be concerned with this, since it already provides the codec and the mapping tables, and could provide a web-compatible codec with just a few fixups? Or does this belong in third-party libraries?
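
To make the two compatibility questions concrete, here is a minimal sketch of one possible answer: decode X to B, and encode both A and B to X. It is not the proposed implementation. The helper names and the one-entry override table are hypothetical, and the sketch assumes the stock `big5hkscs` codec still carries the HKSCS-2004 mapping described above (9D73 ↔ U+4CA4). A real table would be generated from the CCLI data.

```python
# Hypothetical compatibility shim layered on CPython's existing big5hkscs codec.
# Only the single remapped code point mentioned in the issue is listed here.

# Unicode code point under HKSCS-2004 (A) -> code point under HKSCS-2016 (B)
OLD_TO_NEW = {0x4CA4: 0x9FD0}
NEW_TO_OLD = {new: old for old, new in OLD_TO_NEW.items()}


def decode_hkscs2016(data: bytes, errors: str = "strict") -> str:
    """Decode X to B: use the stock codec, then upgrade A to B."""
    return data.decode("big5hkscs", errors).translate(OLD_TO_NEW)


def encode_hkscs_compat(text: str, errors: str = "strict") -> bytes:
    """Encode both A and B to X: downgrade B to A, then use the stock codec."""
    return text.translate(NEW_TO_OLD).encode("big5hkscs", errors)


x = bytes.fromhex("9D73")
new_char = decode_hkscs2016(x)           # the HKSCS-2016 character
from_new = encode_hkscs_compat("\u9fd0")  # new character still reaches 9D73
from_old = encode_hkscs_compat("\u4ca4")  # old character is still accepted
```

Under this policy, bytes written by older programs stay readable and strings holding the old code point stay encodable, but a string containing `\u4ca4` no longer survives an encode/decode round trip unchanged. Whether that trade-off is acceptable is exactly what the questions above ask.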