Title: Call design for Tier 2 (uops) interpreter · Issue #106581 · python/cpython · GitHub
URL: https://github.com/python/cpython/issues/106581
Author: gvanrossum
Opened: 2023-07-10

(Maybe this is tentative enough that it still belongs in the faster-cpython/ideas tracker, but I hope we're close enough that we can hash it out here. CC @markshannon, @brandtbucher)

(This is a WIP until I have looked a bit deeper into this.)

First order of business is splitting some of the `CALL` specializations into multiple ops that satisfy the uop requirement: either use `oparg` and no cache entries, or don't use `oparg` and use at most one cache entry. For example, one of the more important ones, `CALL_PY_EXACT_ARGS`, uses both `oparg` (the number of arguments) and a cache entry (`func_version`). Splitting it into a guard op and an action op is problematic: even discounting the possibility of encountering a bound method (i.e., assuming `method` is `NULL`), it contains the following `DEOPT_IF` checks:
```
    // The callable sits below the oparg arguments:
    // PyObject *callable = stack_pointer[-1-oparg];
    DEOPT_IF(tstate->interp->eval_frame, CALL);  // PEP 523 frame evaluator installed
    int argcount = oparg;
    DEOPT_IF(!PyFunction_Check(callable), CALL);
    PyFunctionObject *func = (PyFunctionObject *)callable;
    DEOPT_IF(func->func_version != func_version, CALL);  // func_version: cache entry
    PyCodeObject *code = (PyCodeObject *)func->func_code;
    DEOPT_IF(code->co_argcount != argcount, CALL);
    DEOPT_IF(!_PyThreadState_HasStackSpace(tstate, code->co_framesize), CALL);
```
If we wanted to combine all this in a single guard op, that guard would need access to both `oparg` (to dig out `callable`) and `func_version`. The fundamental problem is that the callable, which needs to be prodded and poked for the guard to pass, is buried under the arguments, and we need `oparg` to know how deep it is buried.

What if we somehow reversed this so that the callable is _on top of the stack_, after the arguments? We could arrange this by adding a `COPY n+1` instruction just before the `CALL` instruction (or its specializations). In fact, this could even be a blessing in disguise: we would no longer need to push a `NULL` before the callable to reserve space for `self`. Instead, if the callable turns out to be a bound method, its `self` can overwrite the original callable (below the arguments), and the function extracted from the bound method can overwrite the copy of the callable _above_ the arguments. This has the advantage that several other opcodes no longer need a "push `NULL`" bit (the `LOAD_GLOBAL` and `LOAD_ATTR` families; we'll have to review the logic in `LOAD_ATTR` a bit more closely to make sure this can work).

(Note that the callable is buried below the arguments because of an evaluation-order requirement: the language reference requires that in the expression `F(X)`, where `F` and `X` may themselves be complex expressions, `F` is evaluated before `X`.)

Comparing before and after: currently, when `CALL n` or any of its specializations is reached, the stack is arranged as follows (top of stack last):
```
    NULL
    callable
    arg[0]
    arg[1]
    ...
    arg[n-1]
```
This is obtained by, e.g.,
```
    PUSH_NULL
    LOAD_FAST callable
    <load n args>
    CALL n
```
or
```
    LOAD_GLOBAL (NULL + callable)
    <load n args>
    CALL n
```
or
```
    LOAD_ATTR (NULL|self + callable)
    <load n args>
    CALL n
```
Under my proposal the arrangement would change to
```
    callable
    arg[0]
    arg[1]
    ...
    arg[n-1]
    callable
```
and it would be obtained by
```
    LOAD_FAST callable / LOAD_GLOBAL callable / LOAD_ATTR callable
    <load n args>
    COPY n+1
    CALL n
```
It would (perhaps) even be permissible for the guard to overwrite both copies of the callable if a method is detected, changing
```
    self.func
    <n args>
    self.func
```
to
```
    self
    <n args>
    func
```
where we would be assured that `func` has type `PyFunctionObject *`. (However, I think we ought to have separate specializations for the two cases, since the transformation would also require bumping `oparg`.)

The runtime cost would be an extra `COPY` instruction before each `CALL`; however, I think this might actually be simpler than the dynamic check for bound methods, at least when using copy-and-patch.

Another cost would be the extra specializations needed for cases that currently decide dynamically between function and method; but again, with copy-and-patch I think that is probably worth it, given that we expect that dynamic check to always go the same way at any specific location.

### Linked PRs
* gh-106707
* gh-107760
* gh-107793
* gh-108067
* gh-108380
* gh-108462
* gh-108493
* gh-108895
* gh-109338
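To make the stack arithmetic of the proposal concrete, here is a toy C model of the two layouts. It is not CPython code: stack entries are integer tags standing in for `PyObject *`, and `Stack`, `copy_op`, `callable_current`, `callable_proposed`, and `unpack_bound_method` are hypothetical names. `copy_op` follows the documented semantics of `COPY i` (push the i-th item from the top, leaving the original in place), which is why `COPY n+1` reaches the callable sitting under `n` arguments.

```c
#include <assert.h>

/* Toy model only: integer tags stand in for PyObject pointers. */
enum { NULL_TAG = 0, CALLABLE = 100, BOUND_SELF = 200, BOUND_FUNC = 300 };

typedef struct {
    int items[32];
    int top;               /* number of live entries; items[top-1] is TOS */
} Stack;

static void push(Stack *s, int v) { s->items[s->top++] = v; }

/* COPY i (i >= 1): push a copy of the i-th entry from the top,
 * leaving the original in place; COPY 1 duplicates TOS. */
static void copy_op(Stack *s, int i) {
    assert(i >= 1 && i <= s->top);
    push(s, s->items[s->top - i]);
}

/* Current sequence: PUSH_NULL; LOAD_FAST callable; <load n args>.
 * Arguments are tagged 1..n. */
static void build_current(Stack *s, int n) {
    s->top = 0;
    push(s, NULL_TAG);
    push(s, CALLABLE);
    for (int k = 1; k <= n; k++) push(s, k);
}

/* Proposed sequence: LOAD_FAST callable; <load n args>; COPY n+1. */
static void build_proposed(Stack *s, int n) {
    s->top = 0;
    push(s, CALLABLE);
    for (int k = 1; k <= n; k++) push(s, k);
    copy_op(s, n + 1);
}

/* Current layout: locating the callable requires oparg (== n). */
static int callable_current(const Stack *s, int oparg) {
    return s->items[s->top - 1 - oparg];
}

/* Proposed layout: a guard finds the callable at TOS, no oparg needed. */
static int callable_proposed(const Stack *s) {
    return s->items[s->top - 1];
}

/* Hypothetical method rewrite, to be used only after a guard has
 * established that the callable is a bound method: the lower copy
 * becomes self, the upper copy becomes the underlying function, and
 * the call now passes oparg + 1 arguments (self plus the originals). */
static int unpack_bound_method(Stack *s, int oparg) {
    s->items[s->top - 2 - oparg] = BOUND_SELF;
    s->items[s->top - 1] = BOUND_FUNC;
    return oparg + 1;
}
```

With `n = 3`, `callable_current` has to be told `oparg` to reach `items[1]`, while `callable_proposed` simply reads TOS. After `unpack_bound_method`, the call would proceed with `oparg + 1` arguments, which is the `oparg` bump that motivates keeping separate specializations for the function and method cases.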
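The uop requirement stated at the start of the issue (use `oparg` and no cache entries, or no `oparg` and at most one cache entry) can be written as a small predicate. This is an illustrative sketch only: `OpShape`, `is_valid_uop`, and every entry except `CALL_PY_EXACT_ARGS` are hypothetical, and the guard/action split listed is one possible reading of the proposal, not a settled design. The point it shows is that a `func_version` guard breaks the rule in the current layout (it needs `oparg` to find the callable plus one cache entry) but satisfies it once the callable is on top of the stack; the `co_argcount` check still compares against `oparg`, so in this sketch it is assumed to live in a separate cache-free op.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of which operands an op reads. */
typedef struct {
    const char *name;
    bool uses_oparg;
    int cache_entries;     /* inline cache entries read by the op */
} OpShape;

/* The uop requirement: either use oparg and no cache entries,
 * or don't use oparg and use at most one cache entry. */
static bool is_valid_uop(OpShape op) {
    return op.uses_oparg ? op.cache_entries == 0
                         : op.cache_entries <= 1;
}

/* Illustrative shapes; every name except CALL_PY_EXACT_ARGS is hypothetical. */
static const OpShape examples[] = {
    {"CALL_PY_EXACT_ARGS", true, 1},                    /* oparg + func_version */
    {"func_version guard, current layout", true, 1},    /* oparg locates callable */
    {"func_version guard, proposed layout", false, 1},  /* callable is at TOS */
    {"co_argcount/stack-space guard", true, 0},         /* compares against oparg */
    {"frame-push action", true, 0},
};
```

Under these assumptions, only the first two shapes fail `is_valid_uop`, which matches the issue's argument that burying the callable under the arguments is what forces a single guard to need both `oparg` and `func_version`.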