Title: Isolate PyModuleDef to Each Interpreter for Extension/Builtin Modules · Issue #101758 · python/cpython · GitHub
Description: Typically each PyModuleDef for a builtin/extension module is a static global variable. Currently it's shared between all interpreters, whereas we are working toward interpreter isolation (for a variety of reasons). Isolating each PyModul...
Opengraph URL: https://github.com/python/cpython/issues/101758
Author: ericsnowcurrently (https://github.com/ericsnowcurrently)
Date Published: 2023-02-09T20:54:48.000Z
Comments: 10
URL: https://github.com/python/cpython/issues/101758

Typically each `PyModuleDef` for a builtin/extension module is a static global variable. Currently it's shared between all interpreters, whereas we are working toward interpreter isolation (for a variety of reasons). Isolating each `PyModuleDef` is worth doing, especially if you consider we've already run into problems[^1] because of `m_copy`.

The main focus here is on `PyModuleDef.m_base.m_copy`[^2] specifically. It's the state that facilitates importing legacy (single-phase init) extension/builtin modules that do not support repeated initialization[^3] (likely the vast majority).

<details>
<summary>(expand for more context)</summary>

----
`PyModuleDef` for an extension/builtin module is usually stored in a static variable and (with immortal objects, see gh-101755) is mostly immutable. The exception is `m_copy`, which is problematic in some cases for modules imported in multiple interpreters.

Note that `m_copy` is only relevant for legacy (single-phase init) modules, whether builtin or extension, and only if the module does *not* support repeated initialization[^3]. It is never relevant for multi-phase init (PEP 489) modules.

* initialization
  * `m_copy` is only set by `_PyImport_FixupExtensionObject()` (and thus indirectly `_PyImport_FixupBuiltin()` and `_imp.create_builtin()`)
  * `_PyImport_FixupExtensionObject()` is called by `_PyImport_LoadDynamicModuleWithSpec()` when a legacy (single-phase init) extension module is loaded
* usage
  * `m_copy` is only used in `import_find_extension()`, which is only called by `_imp.create_builtin()` and `_imp.create_dynamic()` (via the respective importers)

When such a legacy module is imported for the first time, `m_copy` is set to a new copy of the just-imported module's `__dict__`, which is "owned" by the current interpreter (the one importing the module). Whenever the module is loaded again (e.g. reloaded or deleted from `sys.modules` and then imported), a new empty module is created and `m_copy` is [shallow] copied into that object's `__dict__`.

When `m_copy` is originally initialized, normally that will be the first time the module is imported. However, that code can be triggered multiple times for that module if it is imported under a different name (an unlikely case but apparently a real one). In that case the `m_copy` from the previous import is replaced with the new one right after it is released (decref'ed). This isn't the ideal approach but it's also been the behavior for [quite a while](https://peps.python.org/pep-3121/).

The tricky problem here is that the same code is triggered for each interpreter that imports the legacy module. Things are fine when a module is imported for the first time in any interpreter. However, currently, any subsequent import of that module in another interpreter will trigger that replacing code. The second interpreter decrefs the old `m_copy`, but that object is "owned" by the first interpreter. This is a problem[^1].

Furthermore, even if the decref-in-the-wrong-interpreter problem were gone, a second problem would remain: when `m_copy` is copied into the new module's `__dict__` on subsequent imports, it's only a shallow copy. Thus such a legacy module, imported in interpreters other than the first one, would end up with its `__dict__` filled with objects not owned by the correct interpreter.

----
</details>

Here are some possible approaches to isolating each module's `PyModuleDef` to the interpreter that imports it:

1. keep a copy of `PyModuleDef` for each interpreter (would `_PyRuntimeState.imports.extensions` need to move to the interpreter?)
2. keep just `m_copy` for/on each interpreter
3. fix `_PyImport_FixupExtensionObject()` some other way...


[^1]: see https://github.com/python/cpython/pull/101660#issuecomment-1424507393
[^2]: We should probably consider isolating `PyModuleDef.m_base.m_index`, but for now we simply sync the `modules_by_index` list of each interpreter. (Also, `modules_by_index` and `m_index` are only used for single-phase init modules.)
[^3]: specifically `def->m_size == -1`; multi-phase init modules always have `def->m_size >= 0`; single-phase init modules can also have a non-negative `m_size`

<!-- gh-linked-prs -->
### Linked PRs
* gh-101891
* gh-101919
* gh-101920
* gh-101943
* gh-101956
* gh-101969
<!-- /gh-linked-prs -->
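
The shallow-copy concern described in the issue can be modeled in plain Python. This is only a toy sketch, not CPython's import machinery: `ModuleDef`, `Interp`, `legacy_init`, `load_shared`, and `load_isolated` are hypothetical names invented for illustration, and the decref-in-the-wrong-interpreter problem is not modeled at all. `load_shared` mimics a single `m_copy` stored on a def shared by every interpreter, while `load_isolated` mimics approach 2 under the assumption that each interpreter would run the init function itself and keep its own copy.

```python
import types


class ModuleDef:
    """Hypothetical stand-in for a static, process-wide PyModuleDef."""

    def __init__(self, name):
        self.name = name
        self.m_copy = None  # models PyModuleDef.m_base.m_copy


class Interp:
    """Hypothetical stand-in for one interpreter."""

    def __init__(self, label):
        self.label = label
        self.m_copies = {}  # per-interpreter storage, used only by load_isolated


def legacy_init(interp):
    """Models a single-phase init function; every object it creates
    is treated as "owned" by the interpreter that ran it."""
    return {"owner": interp.label, "cache": []}


def load_shared(defn, interp):
    """m_copy lives on the shared def, as described in the issue."""
    if defn.m_copy is None:
        defn.m_copy = dict(legacy_init(interp))  # first import in any interpreter
    mod = types.ModuleType(defn.name)
    mod.__dict__.update(defn.m_copy)  # shallow copy into a new empty module
    return mod


def load_isolated(defn, interp):
    """Approach 2 (assumed design): one m_copy per interpreter."""
    if defn.name not in interp.m_copies:
        interp.m_copies[defn.name] = dict(legacy_init(interp))
    mod = types.ModuleType(defn.name)
    mod.__dict__.update(interp.m_copies[defn.name])
    return mod


main, sub = Interp("main"), Interp("sub")

# Shared m_copy: the second interpreter receives the first one's objects.
shared_def = ModuleDef("legacy")
a = load_shared(shared_def, main)
b = load_shared(shared_def, sub)
shared_leaks = a.cache is b.cache
shared_owner_in_sub = b.owner

# Per-interpreter m_copy: each interpreter gets objects it created itself.
isolated_def = ModuleDef("legacy")
c = load_isolated(isolated_def, main)
d = load_isolated(isolated_def, sub)
isolated_leaks = c.cache is d.cache
isolated_owner_in_sub = d.owner
```

In this model the shared variant hands the `sub` interpreter a `cache` list and an `owner` value created by `main`, which mirrors the "objects not owned by the correct interpreter" outcome. The isolated variant avoids that only because it re-runs initialization per interpreter; whether that is acceptable for modules with `m_size == -1` is exactly the kind of question the issue leaves open.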