Title: `base64` module: Link against SIMD library for 10x performance. · Issue #124951 · python/cpython · GitHub
Description: Performance enhancement Proposal: https://pypi.org/project/pybase64/ aka https://github.com/mayeut/pybase64 (BSD licensed) exists. On top of some of its own SIMD code for base64 module extra features (character translation)^, it links ag...
Opengraph URL: https://github.com/python/cpython/issues/124951
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"`base64` module: Link against SIMD library for 10x performance.","articleBody":"# Performance enhancement\r\n\r\n### Proposal:\r\n\r\nhttps://pypi.org/project/pybase64/ aka https://github.com/mayeut/pybase64 (BSD licensed) exists. On top of some of its own SIMD code for base64 module extra features (character translation)^, it links against https://github.com/aklomp/base64, a BSD licensed C99 library with SIMD acceleration giving 5-20x the performance on base64 encoding and decoding operations vs our existing generic byte based base64 C code.\r\n\r\nWe could adopt a bunch of the pybase64 code to make the default base64 module experience better - it is relatively straightforward extension module code (as one would expect). On the other hand, I expect pybase64 to still be where new development and further improvements in this space continue to happen as people who care strongly about performance need the latest and greatest from PyPI regardless of their current CPython version. (looping in @mayeut for thoughts on that)\r\n\r\n**Practicalities**: Library availability? We'd vendor a libbase64 build for use on our binary distributions. I don't think it is currently widely available (? I only did a quick search on Ubuntu) as a package on Linux distributions though so we'd currently need to vendor our own copy in tree to be fair and match the good performance there (yuck, but ideally only temporary until distros pick it up as a package of its own, consider it similar to a Modules/_decimal/libmpdec/ situation - our configure.ac finds an installed one \u0026 distros link against that)\r\n\r\n**Risks**: It is a new C library dependency. Security concerns within it thus become our own, since `base64` is frequently used to process untrusted input, but its surface of possible problems is limited (very simple data format). 
We should ensure the library gets proper [oss-fuzz](https://github.com/google/oss-fuzz) test coverage before adoption (@aklomp for visibility).\r\n\r\n---\r\n\r\n^ `bytes.translate`, `bytearray.translate`, or `str.translate` might benefit from similar SIMD treatment - which would be better from a CPython perspective than only doing that within this module? If so, let's file a new issue just for that bit.\r\n\r\n---\r\n\r\n```\r\n❯ python -m pybase64 benchmark `which python`\r\npybase64 1.4.0 (C extension active - NEON) # running on my Apple M3\r\nbench: altchars=None, validate=False\r\npybase64._pybase64.encodebytes: 4776.815 MB/s (5,936,128 bytes -\u003e 8,018,983 bytes)\r\npybase64._pybase64.b64encode: 11989.872 MB/s (5,936,128 bytes -\u003e 7,914,840 bytes)\r\npybase64._pybase64.b64decode: 3039.329 MB/s (7,914,840 bytes -\u003e 5,936,128 bytes)\r\nbase64.encodebytes: 292.876 MB/s (5,936,128 bytes -\u003e 8,018,983 bytes)\r\nbase64.b64encode: 601.307 MB/s (5,936,128 bytes -\u003e 7,914,840 bytes)\r\nbase64.b64decode: 492.088 MB/s (7,914,840 bytes -\u003e 5,936,128 bytes)\r\nbench: altchars=None, validate=True\r\npybase64._pybase64.b64encode: 12327.286 MB/s (5,936,128 bytes -\u003e 7,914,840 bytes)\r\npybase64._pybase64.b64decode: 8611.733 MB/s (7,914,840 bytes -\u003e 5,936,128 bytes)\r\nbase64.b64encode: 597.389 MB/s (5,936,128 bytes -\u003e 7,914,840 bytes)\r\nbase64.b64decode: 472.430 MB/s (7,914,840 bytes -\u003e 5,936,128 bytes)\r\nbench: altchars=b'-_', validate=False\r\npybase64._pybase64.b64encode: 1287.615 MB/s (5,936,128 bytes -\u003e 7,914,840 bytes)\r\npybase64._pybase64.b64decode: 2524.966 MB/s (7,914,840 bytes -\u003e 5,936,128 bytes)\r\nbase64.b64encode: 473.320 MB/s (5,936,128 bytes -\u003e 7,914,840 bytes)\r\nbase64.b64decode: 406.411 MB/s (7,914,840 bytes -\u003e 5,936,128 bytes)\r\nbench: altchars=b'-_', validate=True\r\npybase64._pybase64.b64encode: 1283.111 MB/s (5,936,128 bytes -\u003e 7,914,840 bytes)\r\npybase64._pybase64.b64decode: 
6745.809 MB/s (7,914,840 bytes -\u003e 5,936,128 bytes)\r\nbase64.b64encode: 464.526 MB/s (5,936,128 bytes -\u003e 7,914,840 bytes)\r\nbase64.b64decode: 391.959 MB/s (7,914,840 bytes -\u003e 5,936,128 bytes)\r\n```\r\n\r\n### Has this already been discussed elsewhere?\r\n\r\nNo response given\r\n\r\n### Links to previous discussion of this feature:\r\n\r\nIf we spawn Discuss threads around this, let's edit and drop links here.\n\n\u003c!-- gh-linked-prs --\u003e\n### Linked PRs\n* gh-143262\n\u003c!-- /gh-linked-prs --\u003e\n","author":{"url":"https://github.com/gpshead","@type":"Person","name":"gpshead"},"datePublished":"2024-10-03T21:02:46.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":6},"url":"https://github.com/python/cpython/issues/124951"}
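For anyone who wants to reproduce the comparison in the issue body without the `python -m pybase64 benchmark` command, the following is a rough sketch of a throughput comparison. It is not the pybase64 benchmark itself: the helper names `throughput_mb_s` and `compare`, the payload size, and the definition of MB/s (input bytes per second divided by 1e6) are assumptions made for illustration, and the pybase64 rows only appear if that package happens to be installed.

```python
import base64
import os
import time

try:
    import pybase64  # optional third-party package from PyPI
except ImportError:
    pybase64 = None


def throughput_mb_s(func, payload, repeat=5):
    """Best observed throughput of func(payload), in input MB per second.

    Hypothetical helper: "MB/s" here means len(payload) / seconds / 1e6,
    which may not match how pybase64's own benchmark reports it.
    """
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(payload)
        best = min(best, time.perf_counter() - start)
    return len(payload) / best / 1e6 if best > 0 else float("inf")


def compare(size=1 << 20):
    """Time stdlib base64 and, if importable, pybase64 on random data."""
    data = os.urandom(size)
    encoded = base64.b64encode(data)
    results = {
        "base64.b64encode": throughput_mb_s(base64.b64encode, data),
        "base64.b64decode": throughput_mb_s(base64.b64decode, encoded),
    }
    if pybase64 is not None:
        # Sanity check: both implementations should agree on the output.
        assert pybase64.b64encode(data) == encoded
        results["pybase64.b64encode"] = throughput_mb_s(pybase64.b64encode, data)
        results["pybase64.b64decode"] = throughput_mb_s(pybase64.b64decode, encoded)
    return results


if __name__ == "__main__":
    for name, rate in compare().items():
        print(f"{name}: {rate:.3f} MB/s")
```

Numbers from a sketch like this will vary by machine, build, and payload size, so they are only useful for a relative comparison on the same host.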