Title: gh-101178: refactor base64.b85encode to be memory friendly by romuald · Pull Request #112248 · python/cpython · GitHub
Description:

Current description

Rewrote the base64._85encode logic in C by plugging it into the binascii module (which already handles the base64 methods).

By using C and a single buffer, memory use is reduced to a minimum, addressing the initial issue.

It also greatly improves performance as a bonus:

main

    SMALL (11 bytes): 1575427 iterations (1.27 µs per call, 115.41 ns per byte)
    MEDIUM (200 bytes): 204909 iterations (9.76 µs per call, 48.80 ns per byte)
    BIG (5000 bytes): 8623 iterations (231.94 µs per call, 46.39 ns per byte)
    VERYBIG (500000 bytes): 81 iterations (24.69 ms per call, 49.38 ns per byte)

branch

    SMALL (11 bytes): 11230718 iterations (178.08 ns per call, 16.19 ns per byte)
    MEDIUM (200 bytes): 6004721 iterations (333.07 ns per call, 1.67 ns per byte)
    BIG (5000 bytes): 458005 iterations (4.37 µs per call, 873.35 ps per byte)
    VERYBIG (500000 bytes): 4772 iterations (419.11 µs per call, 838.22 ps per byte)

Script used to test: https://gist.github.com/romuald/7aeba5f40693bb351da4abe62ad7321d

Previous description (Python refactor), not up to date with the current PR

Refactor the code to use generators instead of allocating 2 potentially huge lists for large datasets.

Memory gain was only measured on macOS with a 5 MB input.

Using main:

    Before encoding
    Physical footprint:         16.3M
    Physical footprint (peak):  21.3M
    After encoding
    Physical footprint:         45.0M
    Physical footprint (peak):  244.1M

With refactor:

    Before encoding
    Physical footprint:         14.6M
    Physical footprint (peak):  19.6M
    After encoding
    Physical footprint:         28.5M
    Physical footprint (peak):  34.4M

The execution time is more than doubled, which may not be acceptable. However, the memory used is reduced by more than 90%.

edit: changed the algorithm to be more efficient; the performance decrease now seems to be negligible.

I also have no idea how (and if) I should test this.

Here is the script I've used to measure the execution time. The memdebug function can probably be adapted to read /proc/{pid} on Linux.

edit: updated to work on Linux too

    import os
    import sys
    import random
    import hashlib
    import platform
    import subprocess
    from time import time

    from base64 import b85encode


    def memdebug():
        # Print the peak memory footprint of the current process.
        if platform.system() == "Darwin":
            # malloc_history only works when MallocStackLogging is set
            if not os.environ.get("MallocStackLogging"):
                return
            res = subprocess.check_output(["malloc_history", str(os.getpid()), "-highWaterMark", "-allBySize"])
            for line in res.splitlines():
                if line.startswith(b"Physical"):
                    print(line.decode())
        elif platform.system() == "Linux":
            with open(f"/proc/{os.getpid()}/status") as reader:
                for line in reader:
                    if line.startswith("VmPeak:"):
                        print(line, end="")


    def main():
        # use a stable input
        rnd = random.Random()
        rnd.seed(42)
        data = rnd.randbytes(5_000_000)

        memdebug()
        start = time()
        import pdb
        try:
            res = b85encode(data)
        except Exception:
            # pdb.post_mortem()
            raise
        end = time()
        memdebug()

        print("Data length:", len(data))
        print("Output length:", len(res))
        print(f"Encode time: {end-start:.3f}s")

        h = hashlib.md5(res).hexdigest()
        print("Hashed result", h)
        assert h == "ad97e45ba085865e70f7aa05c9a31388"


    if __name__ == '__main__':
        main()

Issue: gh-101178
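The generator idea from the previous description can be illustrated with a minimal sketch. This is not the code from the PR: the helper name iter_b85encode and its chunk_size parameter are hypothetical, and it only relies on the public base64.b85encode function. It assumes that Base85 maps each 4-byte group to 5 output characters and pads only the final group, so encoding slices whose length is a multiple of 4 and concatenating the results should give the same output as encoding everything at once.

```python
import base64
from typing import Iterator


def iter_b85encode(data: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield the Base85 encoding of ``data`` one piece at a time.

    Hypothetical helper for illustration only, not part of the PR.
    ``chunk_size`` must be a multiple of 4 so that only the last
    slice can need padding.
    """
    if chunk_size <= 0 or chunk_size % 4:
        raise ValueError("chunk_size must be a positive multiple of 4")
    # A memoryview lets us slice the input without copying it up front.
    view = memoryview(data)
    for start in range(0, len(view), chunk_size):
        yield base64.b85encode(view[start:start + chunk_size])
```

A caller can write each yielded piece straight to a file or socket, so only one chunk of encoded output needs to be alive at a time. Joining the pieces with b"".join(...) rebuilds the full result, at which point the memory benefit is mostly lost.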
Opengraph URL: https://github.com/python/cpython/pull/112248