Title: Constant hash value for None to aid reproducibility · Issue #99540 · python/cpython · GitHub
Open Graph Title: Constant hash value for None to aid reproducibility · Issue #99540 · python/cpython
X Title: Constant hash value for None to aid reproducibility · Issue #99540 · python/cpython
Description: Feature or enhancement Fix hash(None) to a constant value. Pitch (Updated 2022.11.18) Under current behavior, the runtime leaks the ASLR offset, since the original address of the None singleton is fixed and _Py_HashPointerRaw is reversib...
Open Graph Description: Feature or enhancement Fix hash(None) to a constant value. Pitch (Updated 2022.11.18) Under current behavior, the runtime leaks the ASLR offset, since the original address of the None singleton is ...
X Description: Feature or enhancement Fix hash(None) to a constant value. Pitch (Updated 2022.11.18) Under current behavior, the runtime leaks the ASLR offset, since the original address of the None singleton is ...
Opengraph URL: https://github.com/python/cpython/issues/99540
X: @github
Domain: github.com
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"Constant hash value for None to aid reproducibility","articleBody":"# Feature or enhancement\r\n\r\nFix `hash(None)` to a constant value.\r\n\r\n# Pitch\r\n\r\n(Updated 2022.11.18)\r\n- Under current behavior, the runtime leaks the ASLR offset, since the original address of the `None` singleton is fixed and `_Py_HashPointerRaw` is reversible. Admittedly, there are other similar objects, like `NotImplemented` or `Ellipsis` that also have this problem, and need to be similarly fixed.\r\n\r\n- Because of ASLR, `hash(None)` changes every run; that consequently means the hash of many useful \"key\" types changes every run, particularly tuples, NamedTuples and frozen dataclasses that have `Optional` fields. \r\n\r\n- The other source of hash value instability across runs in common \"key\" types like str or Enum, can be fixed using the `PYTHONHASHSEED` environment var.\r\n\r\n- other singletons commonly used as (or as part of) mapping keys, `True` and `False` already have fixed hash values.\r\n\r\nCPython's builtin set classes, as do all other non-concurrent hash-tables, either open or closed, AFAIK, grant the user a certain stability property. Given a specific sequence of initialization and subsequent mutation (if any), and given specific inputs with certain hash values, if one were to \"replay\" it, the result set will be in the same observable state every time: not only have the same items (correctness), but also they would be retrieved from the set in the same order when iterated. \r\n\r\nThis property means that code that starts out with identical data, performs computations and makes decisions based on the results will behave identically between runs. For example, if based on some mathematical properties of the input, we have computed a set of N valid choices, they are given integer scores, then we pick the first choice that has maximal score. If the set guarantees the property described above, we are also guaranteed that the exact same choice will be made every time this code runs, even in case of ties. This is very helpful for reproducibility, especially in complex algorithmic code that makes a lot of combinatorial decisions of that kind.\r\n\r\n\r\nThere is a counterargument that we should simply just offer `StableSet` and `StableFrozenSet` that guarantee a specific order, the same way that `dict` does.\r\nA few things to note about that:\r\n- I've written such set classes as an adapter over `dict[T, None]`, there is a substantial perf overhead to that\r\n- Is it worth the extra \"weight\" in code inside the core? That's suspect - why hasn't it been added all those years?\r\n- In a large codebase, it requires automated code inspection and editing tools to enforce this. It's all too easy, and natural, to add a seemingly harmless set comprehension somewhere and defeat the whole effort\r\n- The insertion-order-as-iteration-order guarantee is stronger than what we actually require, in order to have the \"reproducability\" property I've described, so we're paying extra for something we don't really need.\r\n\r\n\r\nMy PR makes a small change to CPython, in `objects.c`, that sets the `tp_hash` descriptor of `NoneType` to a function that simply returns a constant value.\r\n\r\nAdmittedly, determinism between runs isn't a concern that most users/programs care about. It is rather niche. However, I argue that still, there is no externalized cost to this change.\r\n\r\n# Previous discussion\r\nhttps://discuss.python.org/t/constant-hash-for-none/21110\r\n\r\n\r\n\u003c!-- gh-linked-prs --\u003e\r\n### Linked PRs\r\n* gh-99541\r\n\u003c!-- /gh-linked-prs --\u003e\r\n","author":{"url":"https://github.com/yonillasky","@type":"Person","name":"yonillasky"},"datePublished":"2022-11-16T18:55:45.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":18},"url":"https://github.com/99540/cpython/issues/99540"}
| route-pattern | /_view_fragments/issues/show/:user_id/:repository/:id/issue_layout(.:format) |
| route-controller | voltron_issues_fragments |
| route-action | issue_layout |
| fetch-nonce | v2:5d325bc6-6e17-de92-2711-d9b9e42d1305 |
| current-catalog-service-hash | 81bb79d38c15960b92d99bca9288a9108c7a47b18f2423d0f6438c5b7bcd2114 |
| request-id | A53C:345462:C3ED81:109CB94:6969373F |
| html-safe-nonce | 93b66b2fb85b406d6c6227b98154e139a25fbb29e7cc60a37bfa59edffb69e8d |
| visitor-payload | eyJyZWZlcnJlciI6IiIsInJlcXVlc3RfaWQiOiJBNTNDOjM0NTQ2MjpDM0VEODE6MTA5Q0I5NDo2OTY5MzczRiIsInZpc2l0b3JfaWQiOiI1NTYxNjIzODE4NDU1ODI0MTkxIiwicmVnaW9uX2VkZ2UiOiJpYWQiLCJyZWdpb25fcmVuZGVyIjoiaWFkIn0= |
| visitor-hmac | a8fbedb652deddff7458b1799346c61aa527959bbd8d6a5c7015ae30b6377393 |
| hovercard-subject-tag | issue:1452114471 |
| github-keyboard-shortcuts | repository,issues,copilot |
| google-site-verification | Apib7-x98H0j5cPqHWwSMm6dNU4GmODRoqxLiDzdx9I |
| octolytics-url | https://collector.github.com/github/collect |
| analytics-location | / |
| fb:app_id | 1401488693436528 |
| apple-itunes-app | app-id=1477376905, app-argument=https://github.com/_view_fragments/issues/show/python/cpython/99540/issue_layout |
| twitter:image | https://opengraph.githubassets.com/35d17404d419a1624dcd1d9b9a1ab94d1bb67164dd486306692cfa8f30cbcba3/python/cpython/issues/99540 |
| twitter:card | summary_large_image |
| og:image | https://opengraph.githubassets.com/35d17404d419a1624dcd1d9b9a1ab94d1bb67164dd486306692cfa8f30cbcba3/python/cpython/issues/99540 |
| og:image:alt | Feature or enhancement Fix hash(None) to a constant value. Pitch (Updated 2022.11.18) Under current behavior, the runtime leaks the ASLR offset, since the original address of the None singleton is ... |
| og:image:width | 1200 |
| og:image:height | 600 |
| og:site_name | GitHub |
| og:type | object |
| og:author:username | yonillasky |
| hostname | github.com |
| expected-hostname | github.com |
| None | 54182691a21263b584d2e600b758e081b0ff1d10ffc0d2eefa51cf754b43b51d |
| turbo-cache-control | no-preview |
| go-import | github.com/python/cpython git https://github.com/python/cpython.git |
| octolytics-dimension-user_id | 1525981 |
| octolytics-dimension-user_login | python |
| octolytics-dimension-repository_id | 81598961 |
| octolytics-dimension-repository_nwo | python/cpython |
| octolytics-dimension-repository_public | true |
| octolytics-dimension-repository_is_fork | false |
| octolytics-dimension-repository_network_root_id | 81598961 |
| octolytics-dimension-repository_network_root_nwo | python/cpython |
| turbo-body-classes | logged-out env-production page-responsive |
| disable-turbo | false |
| browser-stats-url | https://api.github.com/_private/browser/stats |
| browser-errors-url | https://api.github.com/_private/browser/errors |
| release | d69ac0477df0f87da03b8b06cebd187012d7a930 |
| ui-target | full |
| theme-color | #1e2327 |
| color-scheme | light dark |
Links:
Viewport: width=device-width