René's URL Explorer Experiment


Title: performance: Update io.DEFAULT_BUFFER_SIZE to make python IO faster? · Issue #117151 · python/cpython · GitHub

Open Graph Title: performance: Update io.DEFAULT_BUFFER_SIZE to make python IO faster? · Issue #117151 · python/cpython

X Title: performance: Update io.DEFAULT_BUFFER_SIZE to make python IO faster? · Issue #117151 · python/cpython

Description: Bug report Bug description: Hello, I was doing some benchmarking of python and package installation. That got me down a rabbit hole of buffering optimizations between pip, requests, urllib and the cpython interpreter. TL;DR I wou...

Open Graph Description: Bug report Bug description: Hello, I was doing some benchmarking of python and package installation. That got me down a rabbit hole of buffering optimizations between pip, requests, urllib ...

X Description: Bug report Bug description: Hello, I was doing some benchmarking of python and package installation. That got me down a rabbit hole of buffering optimizations between pip, requests, urllib ...

Opengraph URL: https://github.com/python/cpython/issues/117151

X: @github

direct link

Domain: github.com


Hey, it has JSON-LD scripts:
{
  "@context": "https://schema.org",
  "@type": "DiscussionForumPosting",
  "headline": "performance: Update io.DEFAULT_BUFFER_SIZE to make python IO faster?",
  "articleBody": "# Bug report\n\n### Bug description:\n\nHello, \n\nI was doing some benchmarking of python and package installation.\nThat got me down a rabbit hole of buffering optimizations between pip, requests, urllib and the cpython interpreter.\n\nTL;DR I would like to discuss updating the value of io.DEFAULT_BUFFER_SIZE. It has been set to 8192 for 16 years.\noriginal commit: https://github.com/python/cpython/blame/main/Lib/_pyio.py#L27\n\nIt was a reasonable size given the hardware and OSes of the time. It's far from optimal today.\nRemember, in 2008 you'd run a 32-bit operating system with less than 2 GB of memory available, shared between all running applications. \nBuffers had to be small, a few kB; it wasn't conceivable to have buffers measured in whole MB.\n\nI will attach benchmarks in the next messages showing a 3 to 5 times write performance improvement when adjusting the buffer size.\n\nI think the python interpreter can adopt a default buffer size somewhere between 64k and 256k.\nI think 64k is the minimum for python and it should be safe to adjust to.\nHigher is better for performance in most cases, though there may be some cases where it's unwanted \n(seeks and small reads/writes, unwanted triggering of write-ahead, slow devices with throughput measured in kB/s where you don't want to block for long)\n\nIn addition, I think there is a bug in open() on Linux.\nopen() sets the buffer size to the device block size on Linux when available (st_blksize, 4k on most disks), instead of io.DEFAULT_BUFFER_SIZE=8k.\nI believe this is unwanted behavior: the block size is the minimal size for IO operations on the IO device, it's not the optimal size and it should not be preferred.\nI think open() on Linux should be corrected to use a default buffer size of `max(st_blksize, io.DEFAULT_BUFFER_SIZE)` instead of `st_blksize`?\n\nRelatedly, the doc might be misleading in saying st_blksize is the preferred size for efficient I/O. https://github.com/python/cpython/blob/main/Doc/library/os.rst#L3181\nThe GNU doc was updated to clarify: \"This is not guaranteed to give optimum performance\" https://www.gnu.org/software/gnulib/manual/html_node/stat_002dsize.html\n\nThoughts?\n\n\n\nAnnex: some historical context and technical considerations around buffering.\n\nOn the hardware side:\n* HDDs historically had 512-byte blocks, then moved to 4096-byte blocks in the 2010s.\n* SSDs have 4096-byte blocks as far as I know.\n\nOn filesystems:\n* buffer size should never be smaller than the device and filesystem block size\n* I think ext3, ext4, xfs, ntfs, etc. follow the device block size of 4k as default, though they can be configured for any block size.\n* NTFS is capped at a 16TB maximum disk size with 4k blocks.\n* Microsoft recommends a 64k block size for Windows Server 2019+ and larger disks https://learn.microsoft.com/en-us/windows-server/storage/file-server/ntfs-overview\n* RAID setups and similar ones with zfs/btrfs/xfs can have a custom block size, I think anywhere from 4kB to 1MB. I don't know if there is any consensus, I think anything 16k-32k-64k-128k can be seen in the wild.\n\nOn network filesystems:\n* shared network home directories are common on linux (NFS share) and windows (SMB share).\n* enterprise storage vendors like Pure/Vast/NetApp recommend 524288 or 1048576 bytes for IO. \n* see rsize wsize in mount settings: \n* `host:path on path type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,acregmin=60,acdirmin=60,hard,proto=tcp,nconnect=8,mountproto=tcp, ...)`\n* for windows I cannot find documentation for network clients, though the windows server should have the NTFS filesystem with at least a 64k block size as per the Microsoft recommendation above.\n\nOn pipes:\n* buffering is used by pipes and for interprocess communications. see subprocess.py\n* posix guarantees that writes to pipes are atomic up to PIPE_BUF, which is 4096 bytes on the Linux kernel and guaranteed to be at least 512 bytes by posix.\n* Python had a default of io.DEFAULT_BUFFER_SIZE=8192 so it never benefited from that atomic property :D\n\nOn compression code, they probably all need to be adjusted:\n* the buffer size is used by compression code in cpython: gzip.py lzma.py bz2.py\n* I think lzma and bz2 are using the default size.\n* gzip is using a 128kB read buffer, somebody realized it was very slow 2 years ago and rewrote the buffering to 128k.\n* then somebody else realized last year it was still very slow to write and added an arbitrary write buffer of 4*io.DEFAULT_BUFFER_SIZE.\n* https://github.com/python/cpython/commit/eae7dad40255bad42e4abce53ff8143dcbc66af5\n* https://github.com/python/cpython/issues/89550\n* base64 is reading in chunks of 76 characters???\n* https://github.com/python/cpython/blob/main/Lib/base64.py#L532\n\nOn network IO:\n* On Linux, TCP read and write buffers were a minimum of 16k historically. The read buffer was increased to 64k in kernel v4.20, year 2018\n* the buffer is resized dynamically with the TCP window, up to 4MB for writes and 6MB for reads; let's not get into TCP. see sysctl_tcp_rmem sysctl_tcp_wmem\n* linux code: https://github.com/torvalds/linux/blame/master/net/ipv4/tcp.c#L4775\n* commit Sep 2018: https://github.com/torvalds/linux/commit/a337531b942bd8a03e7052444d7e36972aac2d92\n* I think socket buffers are managed separately by the kernel; io.DEFAULT_BUFFER_SIZE matters when you read a file and write to the network, or read from the network and write to a file.\n\n\nOn HTTP, a large subset of networking:\n* HTTP is often used for large file transfers and would benefit from a much larger buffer, but that's probably more of a concern for urllib/requests. \n* requests.content reads in 10k chunks by default.\n* requests iter_lines(chunk_size=512, decode_unicode=False, delimiter=None) uses 512-byte chunks by default.\n* requests iter_content(chunk_size=1, decode_unicode=False) uses 1-byte chunks by default\n* source: set in 2012 https://github.com/psf/requests/blame/8dd3b26bf59808de24fd654699f592abf6de581e/src/requests/models.py#L80\n\nnote to self: remember to publish code and result in next message\n\n\n### CPython versions tested on:\n\n3.11\n\n### Operating systems tested on:\n\nOther\n\n\u003c!-- gh-linked-prs --\u003e\n### Linked PRs\n* gh-118037\n* gh-118144\n* gh-119783\n* gh-131052\n\u003c!-- /gh-linked-prs --\u003e\n",
  "author": {
    "url": "https://github.com/morotti",
    "@type": "Person",
    "name": "morotti"
  },
  "datePublished": "2024-03-22T11:41:23.000Z",
  "interactionStatistic": {
    "@type": "InteractionCounter",
    "interactionType": "https://schema.org/CommentAction",
    "userInteractionCount": 11
  },
  "url": "https://github.com/python/cpython/issues/117151"
}
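The issue body above describes two mechanisms: open() picking its default buffer size from st_blksize when it is available, and write throughput depending on how large the buffer is. The sketch below illustrates both in Python under stated assumptions. The helper names `default_buffer_size_as_described`, `proposed_buffer_size` and `time_writes` are hypothetical; the first helper only restates the open() behavior as the issue describes it (it is not CPython's code), and the timing loop is not the author's benchmark, whose code was to be published in a later message. Timings depend heavily on the OS, filesystem and page cache.

```python
import io
import os
import tempfile
import time


def default_buffer_size_as_described(fd: int) -> int:
    """Hypothetical helper: the default the issue says open() uses.

    st_blksize when it is available and > 1, otherwise io.DEFAULT_BUFFER_SIZE.
    st_blksize does not exist on every platform (e.g. Windows), hence getattr.
    """
    blksize = getattr(os.fstat(fd), "st_blksize", 0)
    return blksize if blksize > 1 else io.DEFAULT_BUFFER_SIZE


def proposed_buffer_size(fd: int) -> int:
    """Hypothetical helper: the issue's proposal, max(st_blksize, DEFAULT_BUFFER_SIZE)."""
    blksize = getattr(os.fstat(fd), "st_blksize", 0)
    return max(blksize, io.DEFAULT_BUFFER_SIZE)


def time_writes(buffer_size: int, total_bytes: int = 16 * 1024 * 1024,
                chunk: int = 512) -> float:
    """Write total_bytes in small chunks through a buffer of buffer_size bytes.

    Returns elapsed wall-clock seconds. With small chunks, a larger buffer
    means fewer flushes, and therefore fewer write() system calls.
    """
    payload = b"x" * chunk
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        start = time.perf_counter()
        with open(path, "wb", buffering=buffer_size) as f:
            for _ in range(total_bytes // chunk):
                f.write(payload)
        return time.perf_counter() - start
    finally:
        os.remove(path)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        print("io.DEFAULT_BUFFER_SIZE:", io.DEFAULT_BUFFER_SIZE)
        print("as described in the issue:", default_buffer_size_as_described(fd))
        print("proposed:", proposed_buffer_size(fd))
    for size in (4 * 1024, 8 * 1024, 64 * 1024, 128 * 1024, 256 * 1024):
        print(f"{size // 1024:>4} KiB buffer: {time_writes(size):.3f} s")
```

With 512-byte writes, a 4 KiB buffer flushes every 8 writes while a 256 KiB buffer flushes every 512 writes, so the second configuration issues far fewer system calls for the same data; whether that translates into the 3 to 5 times improvement reported in the issue depends on the machine.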


Links:

performance: Update io.DEFAULT_BUFFER_SIZE to make python IO faster? https://github.com/python/cpython/issues/117151#top
https://github.com/gpshead
performance (Performance or resource usage) https://github.com/python/cpython/issues?q=state%3Aopen%20label%3A%22performance%22
stdlib (Standard Library Python modules in the Lib/ directory) https://github.com/python/cpython/issues?q=state%3Aopen%20label%3A%22stdlib%22
topic-IO https://github.com/python/cpython/issues?q=state%3Aopen%20label%3A%22topic-IO%22
morotti https://github.com/morotti
on Mar 22, 2024 https://github.com/python/cpython/issues/117151#issue-2202305662
https://github.com/python/cpython/blame/main/Lib/_pyio.py#L27
https://github.com/python/cpython/blob/main/Doc/library/os.rst#L3181
https://www.gnu.org/software/gnulib/manual/html_node/stat_002dsize.html
https://learn.microsoft.com/en-us/windows-server/storage/file-server/ntfs-overview
eae7dad https://github.com/python/cpython/commit/eae7dad40255bad42e4abce53ff8143dcbc66af5
GzipFile.write should be buffered #89550 https://github.com/python/cpython/issues/89550
https://github.com/python/cpython/blob/main/Lib/base64.py#L532
https://github.com/torvalds/linux/blame/master/net/ipv4/tcp.c#L4775
torvalds/linux@a337531 https://github.com/torvalds/linux/commit/a337531b942bd8a03e7052444d7e36972aac2d92
https://github.com/psf/requests/blame/8dd3b26bf59808de24fd654699f592abf6de581e/src/requests/models.py#L80
gh-117151: optimize BufferedWriter(), do not buffer writes that are the buffer size #118037 https://github.com/python/cpython/pull/118037
gh-117151: IO performance improvement, increase io.DEFAULT_BUFFER_SIZE to 128k #118144 https://github.com/python/cpython/pull/118144
gh-117151: increase default buffer size of shutil.copyfileobj() to 256k. #119783 https://github.com/python/cpython/pull/119783
gh-117151: optimize algorithm to grow the buffer size for readall() on files #131052 https://github.com/python/cpython/pull/131052
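Linked PR gh-119783 in the list above concerns the default chunk size of shutil.copyfileobj(). Independent of whatever default a given Python version ships, the function already accepts an explicit length argument, so a caller copying large files or network streams can choose its own chunk size. A minimal sketch, assuming 256 KiB (the figure in that PR's title) suits the storage involved; `copy_stream` and `COPY_CHUNK` are hypothetical names introduced for this example.

```python
import io
import shutil

# Assumption: 256 KiB, taken from the title of gh-119783; tune for your storage.
COPY_CHUNK = 256 * 1024


def copy_stream(src, dst, chunk: int = COPY_CHUNK) -> None:
    """Copy file-like src into dst, reading at most `chunk` bytes per read() call."""
    shutil.copyfileobj(src, dst, length=chunk)


if __name__ == "__main__":
    data = bytes(range(256)) * 4096  # 1 MiB of sample data
    src, dst = io.BytesIO(data), io.BytesIO()
    copy_stream(src, dst)
    print(dst.getvalue() == data)  # True
```

Passing the chunk size explicitly keeps the behavior the same across Python versions, rather than depending on whichever stdlib default is in effect.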

Viewport: width=device-width


URLs of crawlers that visited me.