Title: Speed up `pathlib.Path.glob()` by removing redundant regex matching · Issue #115060 · python/cpython · GitHub
Opengraph URL: https://github.com/python/cpython/issues/115060
Author: barneygale (https://github.com/barneygale)
Date published: 2024-02-06T03:53:20.000Z
Comments: 4

In #104512 we made `pathlib.Path.glob()` use a "walk-and-filter" strategy for expanding `**` wildcards in patterns: when we encounter a `**` segment, we immediately consume subsequent segments and use them to build a regex that is used to filter results. This saves a bunch of `scandir()` calls.

However! We actually build a regex for the _entire_ pattern given to `glob()`, rather than just the segments following `**` wildcards. And so when evaluating a pattern like `dir*/**/file*`, the `dir*` part is needlessly matched twice against each path. @zooba noted this in a [review comment](https://github.com/python/cpython/pull/104512#discussion_r1212825322) at the time.

We should be able to improve performance by building an `re.Pattern` only for segments following `**` wildcards, and not the entire `glob()` pattern.

### Linked PRs
* gh-115061
* gh-116152
* gh-117732
* gh-117831
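
To make the proposal concrete, here is a minimal sketch of the idea, not CPython's actual implementation. It assumes an in-memory directory tree (nested dicts, with `None` for files), supports only the `*` wildcard, and handles only patterns of the form `head/**/tail`. The names `segment_regex`, `compile_tail`, `walk`, `glob_sketch`, and `EXAMPLE_TREE` are hypothetical. The leading segments are expanded one directory level at a time, and the compiled regex covers only the segments after `**`, so `dir*` is never matched a second time.

```python
import re

# Hypothetical in-memory tree: dicts are directories, None marks a file.
EXAMPLE_TREE = {
    "dir1": {"file1": None, "sub": {"file2": None, "other": None}},
    "dir2": {"x": {"y": {"file3": None}}},
    "etc": {"file4": None},
}


def segment_regex(segment):
    # Translate one pattern segment; only '*' is supported in this sketch.
    return re.escape(segment).replace(r"\*", "[^/]*")


def compile_tail(segments):
    # Regex for the segments that follow '**': any number of intermediate
    # directories, then the remaining segments in order.
    tail = "/".join(segment_regex(s) for s in segments)
    return re.compile(rf"(?:[^/]+/)*{tail}")


def walk(node, prefix=""):
    # Yield every path below `node`, relative to it.
    for name, child in node.items():
        yield prefix + name
        if isinstance(child, dict):
            yield from walk(child, prefix + name + "/")


def glob_sketch(tree, pattern):
    # Assumption: the pattern contains exactly one '/**/' separator.
    head, _, tail = pattern.partition("/**/")
    head_res = [re.compile(segment_regex(s)) for s in head.split("/")]
    tail_re = compile_tail(tail.split("/"))

    # Step 1: expand the leading segments level by level (no big regex).
    bases = [("", tree)]
    for seg_re in head_res:
        bases = [
            (f"{path}{name}/", child)
            for path, node in bases
            for name, child in node.items()
            if isinstance(child, dict) and seg_re.fullmatch(name)
        ]

    # Step 2: walk below each base and filter the *relative* paths with a
    # regex built only from the segments after '**'.
    results = []
    for base, node in bases:
        for rel in walk(node):
            if tail_re.fullmatch(rel):
                results.append(base + rel)
    return sorted(results)


print(glob_sketch(EXAMPLE_TREE, "dir*/**/file*"))
```

Because the filter in step 2 sees paths relative to each already-selected base directory, the head segments never need to appear in the regex at all.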