Title: Complementary re patterns such as [\s\S] or [\w\W] are much slower than . with DOTALL · Issue #111259 · python/cpython · GitHub
Opengraph URL: https://github.com/python/cpython/issues/111259
Author: pan324 (https://github.com/pan324)
Date Published: 2023-10-24T11:10:09.000Z
Comments: 3

# Bug report

### Bug description:

```python
import re
from time import perf_counter as time

p1 = re.compile(r"[\s\S]*")       # complementary character class
p2 = re.compile(".*", re.DOTALL)  # dot that also matches newlines

s = "a" * 10000
for p in (p1, p2):
    t0 = time()
    for i in range(10000):
        _ = p.match(s)
    print(time() - t0)
```
Runtimes are 0.44 s vs 0.0016 s on my system. Instead of being simplified to a match-anything check, `[\s\S]` is evaluated one member at a time: `\s` does not match, so `\S` is checked next (the order `[\S\s]` is twice as fast for the string here). This is not solely an issue for larger matches. A 40-character string takes about twice as long to process with `[\s\S]`, and even 10 characters take about 25% longer.

I'm not completely sure whether this qualifies as a bug or as a documentation issue. Some other languages don't have a DOTALL option and always rely on the first form. Plenty of posts on SO and elsewhere thus advocate `[\s\S]` as an all-matching regex pattern. Unsuspecting Python programmers such as @barneygale may expect `[\s\S]` to be identical to a dot with DOTALL, as seen below.

@serhiy-storchaka

https://github.com/python/cpython/blob/9bb202a1a90ef0edce20c495c9426d9766df11bb/Lib/pathlib.py#L126-L133

### CPython versions tested on:

3.11, 3.13

### Operating systems tested on:

Linux, Windows

### Linked PRs
* gh-111303
* gh-120742
* gh-120745
* gh-120813
* gh-120814
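The report compares three spellings of "match any character" (`[\s\S]`, `[\S\s]`, and `.` with DOTALL) at string lengths of 10, 40, and 10000. A minimal sketch of that comparison follows. The helper names `PATTERNS`, `same_match`, and `bench` are made up for illustration and are not part of the issue. It first checks that the three patterns match the same span, then times them with `timeit`. No timings are shown, since they depend on the machine and on the Python version (the issue lists linked PRs, so newer builds may behave differently).

```python
import re
import timeit

# Three spellings of "match any character, including newlines".
# The dictionary and helper names are hypothetical, chosen for this sketch.
PATTERNS = {
    r"[\s\S]*": re.compile(r"[\s\S]*"),
    r"[\S\s]*": re.compile(r"[\S\s]*"),
    r".* (DOTALL)": re.compile(r".*", re.DOTALL),
}


def same_match(text):
    """Return True if every pattern matches the same span of `text`."""
    spans = {p.match(text).span() for p in PATTERNS.values()}
    return len(spans) == 1


def bench(text, number=1000):
    """Return {label: seconds taken by `number` calls of match(text)}."""
    return {
        label: timeit.timeit(lambda p=p: p.match(text), number=number)
        for label, p in PATTERNS.items()
    }


if __name__ == "__main__":
    # String lengths mentioned in the report: 10, 40, and 10000 characters.
    for n in (10, 40, 10_000):
        text = "a" * n
        assert same_match(text)
        for label, seconds in bench(text).items():
            print(f"len={n:>6}  {label:<12} {seconds:.6f} s")
```

Checking equivalence before timing guards against benchmarking patterns that do not actually match the same text. On a build affected by this issue, the `[\s\S]*` row would be expected to be the slowest.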