Title: Improve speed of `textwrap.dedent` · Issue #131791 · python/cpython · GitHub

Description: Feature or enhancement Proposal: Current code: _whitespace_only_re = re.compile('^[ \t]+$', re.MULTILINE) _leading_whitespace_re = re.compile('(^[ \t]*)(?:[^ \t\n])', re.MULTILINE) def dedent(text): """Remove any common leading whitespac...

Opengraph URL: https://github.com/python/cpython/issues/131791

Domain: github.com


Hey, it has json ld scripts:
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"Improve speed of `textwrap.dedent`","articleBody":"# Feature or enhancement\n\n### Proposal:\n\nCurrent code:\n\n```python\n_whitespace_only_re = re.compile('^[ \\t]+$', re.MULTILINE)\n_leading_whitespace_re = re.compile('(^[ \\t]*)(?:[^ \\t\\n])', re.MULTILINE)\n\ndef dedent(text):\n    \"\"\"Remove any common leading whitespace from every line in `text`.\n\n    This can be used to make triple-quoted strings line up with the left\n    edge of the display, while still presenting them in the source code\n    in indented form.\n\n    Note that tabs and spaces are both treated as whitespace, but they\n    are not equal: the lines \"  hello\" and \"\\\\thello\" are\n    considered to have no common leading whitespace.\n\n    Entirely blank lines are normalized to a newline character.\n    \"\"\"\n    # Look for the longest leading string of spaces and tabs common to\n    # all lines.\n    margin = None\n    text = _whitespace_only_re.sub('', text)\n    indents = _leading_whitespace_re.findall(text)\n    for indent in indents:\n        if margin is None:\n            margin = indent\n\n        # Current line more deeply indented than previous winner:\n        # no change (previous winner is still on top).\n        elif indent.startswith(margin):\n            pass\n\n        # Current line consistent with and no deeper than previous winner:\n        # it's the new winner.\n        elif margin.startswith(indent):\n            margin = indent\n\n        # Find the largest common whitespace between current line and previous\n        # winner.\n        else:\n            for i, (x, y) in enumerate(zip(margin, indent)):\n                if x != y:\n                    margin = margin[:i]\n                    break\n\n    # sanity check (testing/debugging only)\n    if 0 and margin:\n        for line in text.split(\"\\n\"):\n            assert not line or line.startswith(margin), \\\n                 
  \"line = %r, margin = %r\" % (line, margin)\n\n    if margin:\n        text = re.sub(r'(?m)^' + margin, '', text)\n    return text\n```\n\nCan speed up the process for large files up to 4x the speed:\n\n```python\ndef dedent_faster(text: str, only_whitespace: bool = True) -\u003e str:\n    \"\"\"Remove any common leading whitespace from every line in `text`.\n\n    This can be used to make triple-quoted strings line up with the left\n    edge of the display, while still presenting them in the source code\n    in indented form.\n\n    Note that tabs and spaces are both treated as whitespace, but they\n    are not equal: the lines \"  hello\" and \"\\\\thello\" are\n    considered to have no common leading whitespace.\n    \n    If `only_whitespace` is `True`, the leading whitespaces are removed from the text. Otherwise, all the common leading text is removed.\n\n    Entirely blank lines are normalized to a newline character.\n    \"\"\"\n    # Early return for empty input\n    if not text:\n        return text\n\n    # Split into lines\n    lines = text.splitlines(True)\n\n    # Fast path for single line - but make sure we still dedent!\n    if len(lines) == 1:\n        line = lines[0]\n        stripped = line.strip()\n        if not stripped:  # Blank line\n            return \"\\n\" if line.endswith(\"\\n\") else \"\"\n\n        # Find leading whitespace for a single line\n        if only_whitespace:\n            i = 0\n            while i \u003c len(line) and line[i] in \" \\t\":\n                i += 1\n            if i \u003e 0:  # Has leading whitespace to remove\n                return line[i:]\n        else:\n            lead_size = len(line) - len(line.lstrip())\n            if lead_size \u003e 0:  # Has leading whitespace to remove\n                return line[lead_size:]\n        return line  # No whitespace to remove\n\n    # Cache method lookups for faster access\n    _strip = str.strip\n    _startswith = str.startswith\n    _endswith = 
str.endswith\n\n    # Find first two non-blank lines\n    non_blank = []\n    for line in lines:\n        if _strip(line):\n            non_blank.append(line)\n            if len(non_blank) == 2:\n                break\n\n    # All lines are blank\n    if not non_blank:\n        result = []\n        append = result.append\n        for line in lines:\n            append(\"\\n\" if _endswith(line, \"\\n\") else \"\")\n        return \"\".join(result)\n\n    # Calculate margin length efficiently\n    if len(non_blank) == 1:\n        # Single non-blank line\n        line = non_blank[0]\n        if only_whitespace:\n            # Manually find leading whitespace (faster than regex)\n            i = 0\n            line_len = len(line)\n            while i \u003c line_len and line[i] in \" \\t\":\n                i += 1\n            margin_len = i\n        else:\n            # Use built-in lstrip for non-whitespace case\n            margin_len = len(line) - len(line.lstrip())\n    else:\n        # Find common prefix of first two non-blank lines\n        a, b = non_blank\n        min_len = min(len(a), len(b))\n        i = 0\n\n        if only_whitespace:\n            # Manual loop is faster than character-by-character comparison\n            while i \u003c min_len and a[i] == b[i] and a[i] in \" \\t\":\n                i += 1\n        else:\n            while i \u003c min_len and a[i] == b[i]:\n                i += 1\n\n        margin_len = i\n\n    # No margin to remove - return original with blank line normalization\n    if margin_len == 0:\n        result = []\n        append = result.append\n        for line in lines:\n            if _strip(line):  # Non-blank line\n                append(line)\n            else:  # Blank line\n                append(\"\\n\" if _endswith(line, \"\\n\") else \"\")\n        return \"\".join(result)\n\n    # Get margin string once for repeated comparison\n    margin = non_blank[0][:margin_len]\n\n    # Pre-allocate result list with a size 
hint for better memory efficiency\n    result = []\n    append = result.append\n\n    # Process all lines with optimized operations\n    for line in lines:\n        if not _strip(line):  # Blank line (including whitespace-only lines)\n            append(\"\\n\" if _endswith(line, \"\\n\") else \"\")\n        elif _startswith(line, margin):  # Has margin\n            # Slice operation is very fast in Python\n            append(line[margin_len:])\n        else:  # No matching margin\n            append(line)\n\n    # Single join is faster than incremental string building\n    return \"\".join(result)\n```\n\nwhich has the following speed outputs:\n\n```python\nif __name__ == '__main__':\n    with open(\"../Objects/unicodeobject.c\") as f:\n        raw_text = f.read()\n\n\n    tests = []\n    test_names = []\n    test_names = ['', \"    \", \"\\t\", \"abc  \\t\", ' \\t  abc', ' \\t  abc  \\t  ']\n\n    index = 0\n\n    temp = dedent(raw_text)\n\n    for indent_v in test_names:\n        text = indent(raw_text, indent_v)\n\n        tests.append(text)\n\n\n        output = dedent_faster(text, only_whitespace=False)\n\n        print(\"Validating large text with not only whitespace indentation:\", indent_v.encode(\"ascii\"), output == temp)\n\n\n    # Basic indented text with empty lines\n    text = \"\"\"\n        def hello():\n            print(\"Hello, world!\")\n\n\n        \"\"\"\n\n    tests.append(text)\n    test_names.append(\"Basic indented text with empty lines\")\n\n    # Text with mixed indentation and blank lines\n    text = \"\"\"\n        This is a test.\n\n        Another line.\n\n    \"\"\"\n\n    tests.append(text)\n    test_names.append(\"Text with mixed indentation and blank lines\")\n\n    # No indentation (edge case)\n    text = \"\"\"No indents here.\n    Just normal text.\n\n    With empty lines.\"\"\"\n\n    tests.append(text)\n    test_names.append(\"No indentation (edge case)\")\n\n    # Only blank lines (should preserve them)\n    text = 
\"\"\"\n\n\n    \"\"\"\n\n    tests.append(text)\n    test_names.append(\"Only blank lines\")\n\n    # Edge case: No common prefix to remove\n    text = \"\"\"hello\n        world\n    \"\"\"\n\n    tests.append(text)\n    test_names.append(\"Edge case: No common prefix to remove\")\n\n    # Edge case: Single indented line\n    text = \"\"\"    Only one indented line\"\"\"\n\n    tests.append(text)\n    test_names.append(\"Edge case: Single indented line\")\n\n    # Edge case: Single indented line\n    text = \"\"\"    \"\"\"\n\n    tests.append(text)\n    test_names.append(\"Edge case: Single indented line only\")\n\n    # Edge case: Single indented line\n    text = \"\"\"\"\"\"\n\n    tests.append(text)\n    test_names.append(\"Edge case: Empty text\")\n\n    for text, name in zip(tests, test_names):\n        print(f\"========= Case: {name.encode('ascii')} =========\")\n\n        a = dedent(text)\n\n        for func in [dedent_faster, dedent]:\n            single_test = func(text)\n\n            print(func.__name__, a == single_test)\n\n            it = timeit.Timer(lambda: func(text))\n            result = it.repeat(number=10_000)\n            result.sort()\n            print(f\"{func.__name__} Min: {result[0]:.4f}msec\")\n```\n\nReturning the following:\n\n```\nValidating large text with not only whitespace indentation: b'' True\nValidating large text with not only whitespace indentation: b'    ' True\nValidating large text with not only whitespace indentation: b'\\t' True\nValidating large text with not only whitespace indentation: b'abc  \\t' True\nValidating large text with not only whitespace indentation: b' \\t  abc' True\nValidating large text with not only whitespace indentation: b' \\t  abc  \\t  ' True\n========= Case: b'' =========\ndedent_faster True\ndedent_faster Min: 1.5848msec\ndedent True\ndedent Min: 6.6143msec\n========= Case: b'    ' =========\ndedent_faster True\ndedent_faster Min: 2.5275msec\ndedent True\ndedent Min: 10.6884msec\n========= 
Case: b'\\t' =========\ndedent_faster True\ndedent_faster Min: 2.3215msec\ndedent True\ndedent Min: 9.9588msec\n========= Case: b'abc  \\t' =========\ndedent_faster True\ndedent_faster Min: 1.5026msec\ndedent True\ndedent Min: 6.7075msec\n========= Case: b' \\t  abc' =========\ndedent_faster True\ndedent_faster Min: 2.5997msec\ndedent True\ndedent Min: 10.6693msec\n========= Case: b' \\t  abc  \\t  ' =========\ndedent_faster True\ndedent_faster Min: 2.6204msec\ndedent True\ndedent Min: 11.7285msec\n========= Case: b'Basic indented text with empty lines' =========\ndedent_faster True\ndedent_faster Min: 0.0016msec\ndedent True\ndedent Min: 0.0022msec\n========= Case: b'Text with mixed indentation and blank lines' =========\ndedent_faster True\ndedent_faster Min: 0.0016msec\ndedent True\ndedent Min: 0.0020msec\n========= Case: b'No indentation (edge case)' =========\ndedent_faster True\ndedent_faster Min: 0.0008msec\ndedent True\ndedent Min: 0.0013msec\n========= Case: b'Only blank lines' =========\ndedent_faster True\ndedent_faster Min: 0.0006msec\ndedent True\ndedent Min: 0.0005msec\n========= Case: b'Edge case: No common prefix to remove' =========\ndedent_faster True\ndedent_faster Min: 0.0007msec\ndedent True\ndedent Min: 0.0008msec\n========= Case: b'Edge case: Single indented line' =========\ndedent_faster True\ndedent_faster Min: 0.0004msec\ndedent True\ndedent Min: 0.0013msec\n========= Case: b'Edge case: Single indented line only' =========\ndedent_faster True\ndedent_faster Min: 0.0002msec\ndedent True\ndedent Min: 0.0003msec\n========= Case: b'Edge case: Empty text' =========\ndedent_faster True\ndedent_faster Min: 0.0001msec\ndedent True\ndedent Min: 0.0002msec\n```\n\n\nWhich thus returns a 4x performance boost for large files. 
Another advantage is that this method lets people optionally remove any common leading text from the file, not just whitespace.\n\n( This was optimized iteratively using Claude + ChatGPT models )\n\n### Has this already been discussed elsewhere?\n\nNo response given\n\n### Links to previous discussion of this feature:\n\n_No response_\n\n\u003c!-- gh-linked-prs --\u003e\n### Linked PRs\n* gh-131792\n* gh-131919\n\u003c!-- /gh-linked-prs --\u003e\n","author":{"url":"https://github.com/Marius-Juston","@type":"Person","name":"Marius-Juston"},"datePublished":"2025-03-27T09:13:18.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":19},"url":"https://github.com/python/cpython/issues/131791"}
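
A note on the margin computation in `dedent_faster` above: as posted, it takes the common prefix of the first two non-blank lines only, so a later, less-indented line would not lower the margin. The following is a minimal sketch of one possible alternative, assuming whitespace-only dedenting and `"\n"` line endings. It relies on the fact that every line sorts between the lexicographic minimum and maximum, so a prefix shared by those two lines is shared by all of them. The names `common_margin` and `dedent_sketch` are hypothetical and are not taken from the issue or from CPython.

```python
def common_margin(text: str) -> str:
    """Return the leading run of spaces/tabs shared by all non-blank lines.

    Hypothetical helper for illustration; not the code from the issue.
    """
    # Lines made only of spaces and tabs do not take part in the margin.
    lines = [ln for ln in text.split("\n") if ln.strip(" \t")]
    if not lines:
        return ""
    # Every line sorts between lo and hi, so their common prefix
    # is a prefix of every non-blank line.
    lo, hi = min(lines), max(lines)
    i = 0
    while i < len(lo) and i < len(hi) and lo[i] == hi[i] and lo[i] in " \t":
        i += 1
    return lo[:i]


def dedent_sketch(text: str) -> str:
    """Remove the common margin; whitespace-only lines become empty."""
    margin_len = len(common_margin(text))
    out = []
    for ln in text.split("\n"):
        if ln.strip(" \t"):
            out.append(ln[margin_len:])  # every non-blank line has the margin
        else:
            out.append("")               # normalize blank lines
    return "\n".join(out)


if __name__ == "__main__":
    # The third line is less indented than the first two.
    print(repr(dedent_sketch("    a\n    b\n  c\n")))  # '  a\n  b\nc\n'
```

The sketch makes no claim about speed; it only illustrates a margin computation that looks at all lines, which any timing comparison against `textwrap.dedent` would need to be checked against.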


Links:

Improve speed of textwrap.dedent: https://github.com/python/cpython/issues/131791#top
performance (Performance or resource usage): https://github.com/python/cpython/issues?q=state%3Aopen%20label%3A%22performance%22
stdlib (Standard Library Python modules in the Lib/ directory): https://github.com/python/cpython/issues?q=state%3Aopen%20label%3A%22stdlib%22
type-feature (A feature request or enhancement): https://github.com/python/cpython/issues?q=state%3Aopen%20label%3A%22type-feature%22
Marius-Juston: https://github.com/Marius-Juston
on Mar 27, 2025: https://github.com/python/cpython/issues/131791#issue-2952185138
gh-131791: Improve speed of textwrap.dedent by replacing re #131792: https://github.com/python/cpython/pull/131792
gh-130167: Optimise textwrap.dedent() #131919: https://github.com/python/cpython/pull/131919
