Title: gh-124951: Optimize base64 encode & decode for an easy 2-3x speedup [no SIMD] by gpshead · Pull Request #143262 · python/cpython · GitHub
Description: Optimize base64 encoding/decoding by eliminating loop-carried dependencies.

Key changes:

- Add base64_encode_trio() and base64_decode_quad() helper functions that process complete groups independently
- Add base64_encode_fast() and base64_decode_fast() wrappers
- Update b2a_base64 and a2b_base64 to use the fast path for complete groups

The binasciibench I used to measure base64 encoding/decoding throughput is included in the commit history, but I pulled it out of the PR in favor of adding it to pyperformance.

Performance gains (encode/decode speedup vs main, PGO builds):

| Machine | 64 bytes | 64K | 1M | Notes |
| --- | --- | --- | --- | --- |
| Zen2 | 1.2x/1.8x | 1.7x/2.8x | 1.5x/2.8x | |
| Zen4 | 1.2x/1.7x | 1.6x/3.0x | 1.5x/3.0x | old data, likely faster |
| M4 | 1.3x/1.9x | 2.3x/2.8x | 2.4x/2.9x | old data, likely faster |
| RPi5-32 | 1.2x/1.2x | 2.4x/2.4x | 2.0x/2.1x | |
| RPi4-64 | 1.3x/2.0x | 2.4x/5.0x | 1.8x/5.0x | |

Additional SIMD implementations (NEON, AVX-512 VBMI) can achieve +50% (M4) to +1500% (!! Zen4) further gains and are planned for follow-on work if deemed simple to maintain. Widely used third-party libraries contain industry-canonical SIMD-accelerated variants such as simdutf (C++ based, unfortunately), so the decision of how to link and use those, and when, is best kept separate. The wins in this PR come purely from making better use of modern CPU functional-unit pipelining, so they make sense regardless of that decision.

Based on my exploratory work done in main...gpshead:cpython:claude/vectorize-base64-c-S7Hku

Issue: gh-124951
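To make the "complete groups independently" idea concrete, here is a minimal Python sketch of the data flow only. It is not the PR's C code: the function names `encode_carried`, `encode_trio`, `encode_grouped`, and `decode_quad` are hypothetical analogues chosen for illustration, and the accumulator-style loop is an assumption about the general shape of a streaming encoder rather than a transcription of binascii. Python will not show the speedup; the point is that each 3-byte group maps to 4 output characters using nothing from the previous group, which is what lets a CPU overlap work across iterations in C.

```python
import base64

B64_ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
DECODE = {ch: i for i, ch in enumerate(B64_ALPHABET)}


def encode_carried(data: bytes) -> bytes:
    """Streaming style: a bit accumulator is carried from one byte to the next."""
    out = bytearray()
    leftchar = 0  # bits read but not yet emitted
    leftbits = 0  # number of valid bits in leftchar
    for byte in data:
        leftchar = (leftchar << 8) | byte
        leftbits += 8
        while leftbits >= 6:
            leftbits -= 6
            out.append(B64_ALPHABET[(leftchar >> leftbits) & 0x3F])
        leftchar &= (1 << leftbits) - 1  # keep only the unemitted bits
    if leftbits == 2:    # one trailing byte
        out.append(B64_ALPHABET[(leftchar & 0x03) << 4])
        out += b"=="
    elif leftbits == 4:  # two trailing bytes
        out.append(B64_ALPHABET[(leftchar & 0x0F) << 2])
        out += b"="
    return bytes(out)


def encode_trio(a: int, b: int, c: int) -> bytes:
    """Encode one complete 3-byte group; depends on no earlier state."""
    return bytes((
        B64_ALPHABET[a >> 2],
        B64_ALPHABET[((a & 0x03) << 4) | (b >> 4)],
        B64_ALPHABET[((b & 0x0F) << 2) | (c >> 6)],
        B64_ALPHABET[c & 0x3F],
    ))


def encode_grouped(data: bytes) -> bytes:
    """Fast path for complete groups, streaming path for the 0-2 byte tail."""
    out = bytearray()
    full = len(data) - len(data) % 3
    for i in range(0, full, 3):
        out += encode_trio(data[i], data[i + 1], data[i + 2])
    out += encode_carried(data[full:])
    return bytes(out)


def decode_quad(quad: bytes) -> bytes:
    """Decode one complete, unpadded 4-character group into 3 bytes."""
    v0, v1, v2, v3 = (DECODE[ch] for ch in quad)
    return bytes((
        (v0 << 2) | (v1 >> 4),
        ((v1 & 0x0F) << 4) | (v2 >> 2),
        ((v2 & 0x03) << 6) | v3,
    ))
```

Both encoders should agree with the standard library's `base64.b64encode` for any input length; the grouped version simply falls back to the carried loop for the final partial group.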
Opengraph URL: https://github.com/python/cpython/pull/143262