Title: Dataclasses - Improve the performance of asdict/astuple for common types and default values · Issue #103000 · python/cpython · GitHub
Description: Feature or enhancement Improve the performance of asdict/astuple in common cases by making a shortcut for common types that are unaffected by deepcopy in the inner loop. Also special casing for the default dict_factory=dict to construct ...
Opengraph URL: https://github.com/python/cpython/issues/103000
Domain: github.com
Author: DavidCEllis (https://github.com/DavidCEllis)
Date Published: 2023-03-24T12:09:49.000Z
Comments: 5

# Feature or enhancement

Improve the performance of `asdict`/`astuple` in common cases by adding a shortcut in the inner loop for common types that are unaffected by `deepcopy`. Also special-case the default `dict_factory=dict` to construct the dictionary directly.

The goal is to improve performance in common cases without significantly impacting less common cases, and without changing the API or output in any way.

# Pitch

When a dataclass contains a lot of data of common Python types (e.g. bool/str/int/float), the inner loops of `asdict` and `astuple` currently check each value to see whether it is a dataclass, a namedtuple, a list, a tuple, and then a dictionary before passing it to `deepcopy`. This proposes to special-case and shortcut objects of types for which `deepcopy` returns the object unchanged.

It is much faster for these cases to check for them at the first opportunity and return them immediately, skipping the recursive call and all of the other comparisons. When this is used to prepare an object for serialization to JSON the gain can be quite significant, as this covers most of the remaining types handled by the stdlib `json` module.

Note: Anything that skips `deepcopy` with this alteration is already unchanged, as `deepcopy(obj) is obj` is always True for these types.

Currently, when constructing the `dict` for a dataclass, a list of tuples is created and passed to the `dict_factory` constructor. When the `dict_factory` constructor is the default - `dict` - it is faster to construct the dictionary directly.

# Previous discussion

Discussed here with a few more details and earlier examples: https://discuss.python.org/t/dataclasses-make-asdict-astuple-faster-by-skipping-deepcopy-for-objects-where-deepcopy-obj-is-obj/24662

# Code Details
## Types to skip deepcopy

This is the current set of types to be checked for and returned early, ordered in a way that I think makes more sense for `dataclasses` than the original ordering copied from the `copy` module. These are known to be safe to skip because the `copy` module sends all of them to `_deepcopy_atomic`, which returns the original object.

```python
import types  # already imported by dataclasses; shown so the snippet stands alone

# Types for which deepcopy(obj) is known to return obj unmodified
# Used to skip deepcopy in asdict and astuple for performance
_ATOMIC_TYPES = {
    # Common JSON Serializable types
    types.NoneType,
    bool,
    int,
    float,
    complex,
    bytes,
    str,
    # Other types that are also unaffected by deepcopy
    types.EllipsisType,
    types.NotImplementedType,
    types.CodeType,
    types.BuiltinFunctionType,
    types.FunctionType,
    type,
    range,
    property,
    # weakref.ref,  # weakref is not currently imported by dataclasses directly
}
```

## Function changes

With that added, the change is essentially replacing each instance of

```python
_asdict_inner(v, dict_factory)
```

inside `_asdict_inner` with

```python
v if type(v) in _ATOMIC_TYPES else _asdict_inner(v, dict_factory)
```

Instances of subclasses of these types are not guaranteed to have `deepcopy(obj) is obj`, so this checks specifically for instances of the base types.

# Performance tests

Test file: https://gist.github.com/DavidCEllis/a2c2ceeeeda2d1ac509fb8877e5fb60d

Results on my development machine (not a perfectly stable test machine, but these differences are large enough).

## Main

Current Main python branch:
```
Dataclasses asdict/astuple speed tests
--------------------------------------
Python v3.12.0alpha6
GIT branch: main
Test Iterations: 10000
List of Int case asdict: 5.80s

Test Iterations: 1000
List of Decimal case asdict: 0.65s

Test Iterations: 1000000
Basic types case asdict: 3.76s
Basic types astuple: 3.48s

Test Iterations: 100000
Opaque types asdict: 2.15s
Opaque types astuple: 2.11s

Test Iterations: 100
Mixed containers asdict: 3.66s
Mixed containers astuple: 3.28s
```

## Modified

[Modified Branch](https://github.com/DavidCEllis/cpython/blob/faster_dataclasses_serialize/Lib/dataclasses.py):

```
Dataclasses asdict/astuple speed tests
--------------------------------------
Python v3.12.0alpha6
GIT branch: faster_dataclasses_serialize
Test Iterations: 10000
List of Int case asdict: 0.53s

Test Iterations: 1000
List of Decimal case asdict: 0.68s

Test Iterations: 1000000
Basic types case asdict: 1.33s
Basic types astuple: 1.28s

Test Iterations: 100000
Opaque types asdict: 2.14s
Opaque types astuple: 2.13s

Test Iterations: 100
Mixed containers asdict: 1.99s
Mixed containers astuple: 1.84s
```

### Linked PRs
* gh-103005
* gh-104364