René's URL Explorer Experiment


Title: GH-102613: Fast recursive globbing in `pathlib.Path.glob()` by barneygale · Pull Request #104512 · python/cpython · GitHub

Description: This PR introduces a 'walk-and-match' strategy for handling glob patterns that include a non-terminal ** wildcard, such as **/*.py. For this example, the previous implementation recursively walked directories using os.scandir() when it expanded the ** component, and then scanned those same directories again when it expanded the *.py component. This is wasteful.

In the new implementation, any components following a ** wildcard are used to build a re.Pattern object, which is used to filter the results of the recursive walk. A pattern like **/*.py uses half the number of os.scandir() calls; a pattern like **/*/*.py a third, etc.

This new algorithm does not apply if either:

  - The follow_symlinks argument is set to None (its default), or
  - The pattern contains .. components.

In these cases we fall back to the old implementation.

This PR also replaces selector classes with selector functions. These generators directly yield results rather than calling through to their successors. A new internal Path._glob() method takes care to chain these generators together, which simplifies the lazy algorithm and slightly improves performance. It should also be easier to understand and maintain.

Performance for the original #102613 repro case, with 400 nested a/ directories, and matching treatment of symlinks and hidden files:

  $ ../python -m timeit -s 'import glob' 'print(glob.glob("**/*", recursive=True, include_hidden=True))'
  5 loops, best of 5: 66.2 msec per loop

  $ ../python -m timeit -s 'from pathlib import Path' 'print(list(Path(".").rglob("**/*", follow_symlinks=True)))'
  10 loops, best of 5: 22.7 msec per loop  # before this PR
  10 loops, best of 5: 16.5 msec per loop  # after this PR

These results were from an SSD. The improvement will be greater for slow storage (e.g. network-mounted volumes).

Issue: gh-102613
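The walk-and-match idea described above can be illustrated with a minimal sketch. This is not the code from the PR: the names compile_tail, walk and walk_and_match are hypothetical, only the * wildcard is handled, symlinks are never followed, and .. components are not supported. It walks the tree once (one os.scandir() call per directory) and filters every relative path through a single regex built from the components that follow **.

```python
import os
import re
import tempfile


def compile_tail(parts):
    """Turn the components after '**' into one regex (hypothetical helper).

    Simplifying assumption: '*' is the only wildcard that is translated.
    """
    translated = [re.escape(part).replace(r"\*", "[^/]*") for part in parts]
    # Zero or more leading directories stand in for the '**' component itself.
    # re.DOTALL lets '.' match newlines, which may appear in file names.
    return re.compile("(?:.*/)?" + "/".join(translated), re.DOTALL)


def walk(top, calls):
    """Yield POSIX-style relative paths under top, scanning each directory once."""
    stack = [""]
    while stack:
        rel = stack.pop()
        calls[0] += 1  # count os.scandir() calls for illustration
        with os.scandir(os.path.join(top, rel)) as it:
            entries = list(it)
        for entry in entries:
            child = f"{rel}/{entry.name}" if rel else entry.name
            yield child
            if entry.is_dir(follow_symlinks=False):
                stack.append(child)


def walk_and_match(top, tail_parts):
    """Return (sorted matches, number of scandir calls) for '**/<tail_parts>'."""
    regex = compile_tail(tail_parts)
    calls = [0]
    matches = sorted(p for p in walk(top, calls) if regex.fullmatch(p))
    return matches, calls[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "pkg", "sub"))
        for rel in ("a.py", "pkg/b.py", "pkg/sub/c.py", "pkg/notes.txt"):
            open(os.path.join(top, *rel.split("/")), "w").close()
        # Equivalent in spirit to Path(top).glob("**/*.py")
        print(walk_and_match(top, ["*.py"]))
```

In this sketch the number of os.scandir() calls equals the number of directories in the tree, regardless of how many components follow **. That is the saving the description attributes to the new strategy, since a second pass over the same directories is not needed.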

Opengraph URL: https://github.com/python/cpython/pull/104512


Links:

barneygalehttps://github.com/barneygale
python:mainhttps://github.com/python/cpython/tree/main
barneygale:gh-102613-remove-selector-classeshttps://github.com/barneygale/cpython/tree/gh-102613-remove-selector-classes
Conversation 13 https://github.com/python/cpython/pull/104512
Commits 17 https://github.com/python/cpython/pull/104512/commits
Checks 0 https://github.com/python/cpython/pull/104512/checks
Files changed https://github.com/python/cpython/pull/104512/files
7a58251 GH-102613: Simplify implementation of `pathlib.Path.glob()` barneygale May 15, 2023 https://github.com/python/cpython/pull/104512/commits/7a582518ba3c9965622bc1b19199b0d3b857ae9e
d5b1836 Speed up matching `*` barneygale May 15, 2023 https://github.com/python/cpython/pull/104512/commits/d5b1836733e8400068d441f02dfe10c5dcbcb3e4
d5c86c6 Add comments, docstrings. barneygale May 15, 2023 https://github.com/python/cpython/pull/104512/commits/d5c86c6e003b529d9c367815615c7644d2f7d5f1
53dcb79 Merge branch 'main' into gh-102613-remove-selector-classes barneygale May 18, 2023 https://github.com/python/cpython/pull/104512/commits/53dcb795582a2295ce4e6db7c290239f6ad8ffba
6da6a83 Merge branch 'main' into gh-102613-remove-selector-classes barneygale May 29, 2023 https://github.com/python/cpython/pull/104512/commits/6da6a83f779126fdfa20072097b736feb7c6d50a
73ed81d Merge branch 'main' into gh-102613-remove-selector-classes barneygale May 30, 2023 https://github.com/python/cpython/pull/104512/commits/73ed81d2fa8aac0950218ea124772e81bbf6be05
4005619 Add support for matching files recursively. barneygale May 31, 2023 https://github.com/python/cpython/pull/104512/commits/40056194ebaac26f30c917bf32b34eacfee99d4a
9401d36 Implement walk-and-match algorithm. barneygale May 31, 2023 https://github.com/python/cpython/pull/104512/commits/9401d36e63ae1741fd3b03d0c40e8b92995d4d88
b217587 Fix up docs, news blurb. barneygale May 31, 2023 https://github.com/python/cpython/pull/104512/commits/b217587651fd53d2783ca918979f2165377ffb07
de7e857 Fix comment barneygale May 31, 2023 https://github.com/python/cpython/pull/104512/commits/de7e8570ae501759d7dabe7b3c3b0e4dd6447759
d1023c7 Fix handling of newlines in filenames. barneygale May 31, 2023 https://github.com/python/cpython/pull/104512/commits/d1023c7c324d43e3a655b7748f76d9bd23680a17
ad33eec Speed up recursive selection barneygale Jun 1, 2023 https://github.com/python/cpython/pull/104512/commits/ad33eece664095b3b990687d29da101a04499bfe
064efdb Exclude `self` from walk-and-match matching. barneygale Jun 1, 2023 https://github.com/python/cpython/pull/104512/commits/064efdb452f1e0db040d1bda2b2faa9ef57df8c2
14c6a58 Optimize walk-and-match logic. barneygale Jun 1, 2023 https://github.com/python/cpython/pull/104512/commits/14c6a587a22b2492438d1b5dbe90189bd8ee0c24
04720bd Consume adjacent '**' segments before considering use of matching. barneygale Jun 1, 2023 https://github.com/python/cpython/pull/104512/commits/04720bdfb8e0df75991e0bfeb8da19bb8840eb62
9c6b44f Add some more tests for complex patterns. barneygale Jun 1, 2023 https://github.com/python/cpython/pull/104512/commits/9c6b44f54e56aa07edd1b03c29cc5c4e1de0681a
4cfb836 Drop test case that doesn't work on Windows. barneygale Jun 1, 2023 https://github.com/python/cpython/pull/104512/commits/4cfb83670d43ed62956c0c3b6c0473803a25993d
pathlib.rst https://github.com/python/cpython/pull/104512/files#diff-1134e36a94ecfde1df43bee5efd285de5d67426fbca086425201ecc753f9139c
pathlib.py https://github.com/python/cpython/pull/104512/files#diff-fa525485738fc33d05b06c159172ff1f319c26e88d8c6bb39f7dbaae4dc4105c
test_pathlib.py https://github.com/python/cpython/pull/104512/files#diff-3dd97d2dc8816848d0b0c442e8fdeec9650b3de77935289a93f37ac6396ee17f
2023-05-15-18-57-42.gh-issue-102613.YD9yx-.rst https://github.com/python/cpython/pull/104512/files#diff-2c5dc3ec6b96f80cd5734406d27a7ec5044b37490a623f574b3606876b8b11bc
Doc/library/pathlib.rsthttps://github.com/python/cpython/pull/104512/files#diff-1134e36a94ecfde1df43bee5efd285de5d67426fbca086425201ecc753f9139c
View file https://github.com/barneygale/cpython/blob/4cfb83670d43ed62956c0c3b6c0473803a25993d/Doc/library/pathlib.rst
Lib/pathlib.pyhttps://github.com/python/cpython/pull/104512/files#diff-fa525485738fc33d05b06c159172ff1f319c26e88d8c6bb39f7dbaae4dc4105c
View file https://github.com/barneygale/cpython/blob/4cfb83670d43ed62956c0c3b6c0473803a25993d/Lib/pathlib.py
Lib/test/test_pathlib.pyhttps://github.com/python/cpython/pull/104512/files#diff-3dd97d2dc8816848d0b0c442e8fdeec9650b3de77935289a93f37ac6396ee17f
View file https://github.com/barneygale/cpython/blob/4cfb83670d43ed62956c0c3b6c0473803a25993d/Lib/test/test_pathlib.py