René's URL Explorer Experiment


Title: gh-95534: Improve gzip reading speed by 10% by rhpvorderman · Pull Request #97664 · python/cpython · GitHub

Description: For motivation, see the GitHub issue #95534.

Performance figures (the gzip file used is the cpython-3.10.7.tar.gz source tarball from GitHub):

Before:
    ./python -m pyperf timeit -s "import gzip; g=gzip.open('cpython-3.10.7.tar.gz', 'rb'); it=iter(lambda:g.read(128*1024), b'');" "for _ in it: pass"
    Mean +- std dev: 301 ms +- 2 ms

After:
    ./python -m pyperf timeit -s "import gzip; g=gzip.open('cpython-3.10.7.tar.gz', 'rb'); it=iter(lambda:g.read(128*1024), b'');" "for _ in it: pass"
    Mean +- std dev: 270 ms +- 1 ms

The performance tests were run with all optimizations enabled. I found that --enable-optimizations did not influence the result, so the numbers can be verified without a PGO build.

Change summary:
- There is now a gzip.READ_BUFFER_SIZE constant of 128 KB. Other programs that read in 128 KB chunks include pigz and cat, so this seems to be best practice among well-optimized programs. It is also faster than reading in 8 KB chunks.
- A zlib._ZlibDecompressor was added. It is _bz2.BZ2Decompressor ported to zlib.
  - Since the zlib.Decompress object is better suited to in-memory decompression, _ZlibDecompressor is kept private. It only makes sense for file decompression, which the gzip library now already implements, so there is no need to burden users with it.
  - _ZlibDecompressor uses the older CPython arrange_output_buffer functions, as those are faster and more appropriate for this use case.
- GzipFile.read has been optimized.
  - There is no longer an unconsumed_tail to write back to the padded file. _ZlibDecompressor handles this itself, using an internal buffer.
  - _add_read_data has been inlined, as it consisted of just two calls.

EDIT: While adding improvements anyway, I added another one-line optimization to the python -m gzip application. It previously read in io.DEFAULT_BUFFER_SIZE chunks and now reads in READ_BUFFER_SIZE chunks.
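The real _ZlibDecompressor is private C code. The sketch below illustrates only the idea behind it. It swaps in the public zlib.decompressobj API, so it is not the PR's implementation. On top of that API it emulates a BZ2Decompressor-style interface (needs_input, eof, unused_data) in which input that could not be consumed yet stays inside the object, so the caller never writes an unconsumed_tail back. It then reads a gzip stream in 128 KiB chunks.

The names BufferedGzipDecompressor and iter_gzip_chunks are hypothetical. The sketch also assumes a single gzip member.

```python
import gzip
import io
import zlib

# 128 KiB, the chunk size described for gzip.READ_BUFFER_SIZE.
READ_BUFFER_SIZE = 128 * 1024


class BufferedGzipDecompressor:
    """Hypothetical pure-Python sketch of a BZ2Decompressor-style
    decompressor for a single gzip member.

    Input that zlib could not consume yet (because the output limit was
    reached) is kept in an internal buffer, so callers never have to push
    an unconsumed_tail back into their file object.
    """

    def __init__(self):
        # wbits=31 (16 + 15) makes zlib parse the gzip header and trailer.
        self._zobj = zlib.decompressobj(wbits=31)
        self._pending = b""  # accepted input not yet consumed by zlib
        self.needs_input = True
        self.eof = False
        self.unused_data = b""

    def decompress(self, data, max_length=-1):
        if self.eof:
            raise EOFError("End of stream already reached")
        # zlib uses 0 to mean "no limit"; this sketch maps any
        # non-positive max_length to that.
        limit = max_length if max_length > 0 else 0
        out = self._zobj.decompress(self._pending + data, limit)
        # Keep leftover input internally instead of handing it back.
        self._pending = self._zobj.unconsumed_tail
        if self._zobj.eof:
            self.eof = True
            self.needs_input = False
            self.unused_data = self._zobj.unused_data
        elif self._pending or (limit and len(out) == limit):
            # Buffered input remains, or the limit was hit and zlib may
            # still hold output: the caller should call again with b"".
            self.needs_input = False
        else:
            self.needs_input = True
        return out


def iter_gzip_chunks(fileobj, chunk_size=READ_BUFFER_SIZE):
    """Yield decompressed chunks of at most chunk_size bytes from a
    binary file object holding one gzip member."""
    dec = BufferedGzipDecompressor()
    while not dec.eof:
        if dec.needs_input:
            raw = fileobj.read(chunk_size)
            if not raw:
                raise EOFError("Compressed file ended before the "
                               "end-of-stream marker was reached")
        else:
            raw = b""  # drain buffered input / pending output first
        out = dec.decompress(raw, chunk_size)
        if out:
            yield out


if __name__ == "__main__":
    payload = bytes(range(256)) * 4000
    blob = gzip.compress(payload)
    chunks = list(iter_gzip_chunks(io.BytesIO(blob)))
    assert b"".join(chunks) == payload
    print(len(chunks), "chunks, largest", max(map(len, chunks)), "bytes")
```

Concatenated gzip members would need a fresh decompressor per member, seeded with unused_data. The gzip module takes care of that; this sketch does not.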
Results for the python -m gzip application:

Before:
    Benchmark 1: cat cpython-3.10.7.tar.gz | ./python -m gzip -d > /dev/null
      Time (mean ± σ): 389.1 ms ± 12.0 ms [User: 372.7 ms, System: 19.9 ms]
      Range (min … max): 370.9 ms … 410.2 ms 20 runs

After:
    Benchmark 1: cat cpython-3.10.7.tar.gz | ./python -m gzip -d > /dev/null
      Time (mean ± σ): 320.5 ms ± 12.1 ms [User: 306.4 ms, System: 17.6 ms]
      Range (min … max): 300.0 ms … 339.1 ms 20 runs

For comparison, here is pigz, the fastest zlib-based gzip decompressor, on a single thread. (igzip is faster, but it uses ISA-L rather than zlib.)
    Benchmark 1: cat cpython-3.10.7.tar.gz | pigz -p 1 -d > /dev/null
      Time (mean ± σ): 293.8 ms ± 8.4 ms [User: 288.5 ms, System: 17.0 ms]
      Range (min … max): 277.5 ms … 302.7 ms 20 runs

Taking the pure-C pigz program as the baseline, the Python overhead is reduced drastically, from roughly 30% to roughly 10%.

Issue: gh-95534
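Below is a quick check of the rounded percentages quoted above. "Overhead" is assumed here to mean extra mean wall-clock time relative to single-threaded pigz; the description does not define it explicitly. The helper names are illustrative only.

```python
# Mean wall-clock times in milliseconds, taken from the results above.
BEFORE_MS = 389.1  # ./python -m gzip -d before this change
AFTER_MS = 320.5   # ./python -m gzip -d after this change
PIGZ_MS = 293.8    # pigz -p 1 -d, single-threaded pure-C baseline


def overhead_pct(python_ms, baseline_ms):
    """Extra wall-clock time relative to the baseline, in percent."""
    return (python_ms / baseline_ms - 1.0) * 100.0


def time_reduction_pct(before_ms, after_ms):
    """How much shorter the 'after' time is than the 'before' time, in percent."""
    return (1.0 - after_ms / before_ms) * 100.0


if __name__ == "__main__":
    print(f"overhead before: {overhead_pct(BEFORE_MS, PIGZ_MS):.1f}%")
    print(f"overhead after:  {overhead_pct(AFTER_MS, PIGZ_MS):.1f}%")
    # pyperf read benchmark from the description: 301 ms -> 270 ms
    print(f"read time reduced by {time_reduction_pct(301, 270):.1f}%")
```

With these definitions, the overhead is about 32% before and 9% after, and the pyperf read time drops by about 10%. These figures are consistent with the rounded numbers in the description.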

Opengraph URL: https://github.com/python/cpython/pull/97664

Domain: github.com

Links:

gpshead https://github.com/gpshead
python:main https://github.com/python/cpython/tree/main
rhpvorderman:gh-95534 https://github.com/rhpvorderman/cpython/tree/gh-95534
Conversation 27 https://github.com/python/cpython/pull/97664
Commits 33 https://github.com/python/cpython/pull/97664/commits
Checks 0 https://github.com/python/cpython/pull/97664/checks
Files changed https://github.com/python/cpython/pull/97664/files