Title: Improve speed of `textwrap.dedent` · Issue #131791 · python/cpython · GitHub
Description: Feature or enhancement Proposal: Current code: _whitespace_only_re = re.compile('^[ \t]+$', re.MULTILINE) _leading_whitespace_re = re.compile('(^[ \t]*)(?:[^ \t\n])', re.MULTILINE) def dedent(text): """Remove any common leading whitespac...
Opengraph URL: https://github.com/python/cpython/issues/131791
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"Improve speed of `textwrap.dedent`","articleBody":"# Feature or enhancement\n\n### Proposal:\n\nCurrent code:\n\n```python\n_whitespace_only_re = re.compile('^[ \\t]+$', re.MULTILINE)\n_leading_whitespace_re = re.compile('(^[ \\t]*)(?:[^ \\t\\n])', re.MULTILINE)\n\ndef dedent(text):\n \"\"\"Remove any common leading whitespace from every line in `text`.\n\n This can be used to make triple-quoted strings line up with the left\n edge of the display, while still presenting them in the source code\n in indented form.\n\n Note that tabs and spaces are both treated as whitespace, but they\n are not equal: the lines \" hello\" and \"\\\\thello\" are\n considered to have no common leading whitespace.\n\n Entirely blank lines are normalized to a newline character.\n \"\"\"\n # Look for the longest leading string of spaces and tabs common to\n # all lines.\n margin = None\n text = _whitespace_only_re.sub('', text)\n indents = _leading_whitespace_re.findall(text)\n for indent in indents:\n if margin is None:\n margin = indent\n\n # Current line more deeply indented than previous winner:\n # no change (previous winner is still on top).\n elif indent.startswith(margin):\n pass\n\n # Current line consistent with and no deeper than previous winner:\n # it's the new winner.\n elif margin.startswith(indent):\n margin = indent\n\n # Find the largest common whitespace between current line and previous\n # winner.\n else:\n for i, (x, y) in enumerate(zip(margin, indent)):\n if x != y:\n margin = margin[:i]\n break\n\n # sanity check (testing/debugging only)\n if 0 and margin:\n for line in text.split(\"\\n\"):\n assert not line or line.startswith(margin), \\\n \"line = %r, margin = %r\" % (line, margin)\n\n if margin:\n text = re.sub(r'(?m)^' + margin, '', text)\n return text\n```\n\nCan speed up the process for large files up to 4x the speed:\n\n```python\ndef dedent_faster(text: str, only_whitespace: bool 
= True) -\u003e str:\n \"\"\"Remove any common leading whitespace from every line in `text`.\n\n This can be used to make triple-quoted strings line up with the left\n edge of the display, while still presenting them in the source code\n in indented form.\n\n Note that tabs and spaces are both treated as whitespace, but they\n are not equal: the lines \" hello\" and \"\\\\thello\" are\n considered to have no common leading whitespace.\n \n If `only_whitespace` is `True`, the leading whitespaces are removed from the text. Otherwise, all the common leading text is removed.\n\n Entirely blank lines are normalized to a newline character.\n \"\"\"\n # Early return for empty input\n if not text:\n return text\n\n # Split into lines\n lines = text.splitlines(True)\n\n # Fast path for single line - but make sure we still dedent!\n if len(lines) == 1:\n line = lines[0]\n stripped = line.strip()\n if not stripped: # Blank line\n return \"\\n\" if line.endswith(\"\\n\") else \"\"\n\n # Find leading whitespace for a single line\n if only_whitespace:\n i = 0\n while i \u003c len(line) and line[i] in \" \\t\":\n i += 1\n if i \u003e 0: # Has leading whitespace to remove\n return line[i:]\n else:\n lead_size = len(line) - len(line.lstrip())\n if lead_size \u003e 0: # Has leading whitespace to remove\n return line[lead_size:]\n return line # No whitespace to remove\n\n # Cache method lookups for faster access\n _strip = str.strip\n _startswith = str.startswith\n _endswith = str.endswith\n\n # Find first two non-blank lines\n non_blank = []\n for line in lines:\n if _strip(line):\n non_blank.append(line)\n if len(non_blank) == 2:\n break\n\n # All lines are blank\n if not non_blank:\n result = []\n append = result.append\n for line in lines:\n append(\"\\n\" if _endswith(line, \"\\n\") else \"\")\n return \"\".join(result)\n\n # Calculate margin length efficiently\n if len(non_blank) == 1:\n # Single non-blank line\n line = non_blank[0]\n if only_whitespace:\n # Manually find 
leading whitespace (faster than regex)\n i = 0\n line_len = len(line)\n while i \u003c line_len and line[i] in \" \\t\":\n i += 1\n margin_len = i\n else:\n # Use built-in lstrip for non-whitespace case\n margin_len = len(line) - len(line.lstrip())\n else:\n # Find common prefix of first two non-blank lines\n a, b = non_blank\n min_len = min(len(a), len(b))\n i = 0\n\n if only_whitespace:\n # Manual loop is faster than character-by-character comparison\n while i \u003c min_len and a[i] == b[i] and a[i] in \" \\t\":\n i += 1\n else:\n while i \u003c min_len and a[i] == b[i]:\n i += 1\n\n margin_len = i\n\n # No margin to remove - return original with blank line normalization\n if margin_len == 0:\n result = []\n append = result.append\n for line in lines:\n if _strip(line): # Non-blank line\n append(line)\n else: # Blank line\n append(\"\\n\" if _endswith(line, \"\\n\") else \"\")\n return \"\".join(result)\n\n # Get margin string once for repeated comparison\n margin = non_blank[0][:margin_len]\n\n # Pre-allocate result list with a size hint for better memory efficiency\n result = []\n append = result.append\n\n # Process all lines with optimized operations\n for line in lines:\n if not _strip(line): # Blank line (including whitespace-only lines)\n append(\"\\n\" if _endswith(line, \"\\n\") else \"\")\n elif _startswith(line, margin): # Has margin\n # Slice operation is very fast in Python\n append(line[margin_len:])\n else: # No matching margin\n append(line)\n\n # Single join is faster than incremental string building\n return \"\".join(result)\n```\n\nwhich has the following speed outputs:\n\n```python\nif __name__ == '__main__':\n with open(\"../Objects/unicodeobject.c\") as f:\n raw_text = f.read()\n\n\n tests = []\n test_names = []\n test_names = ['', \" \", \"\\t\", \"abc \\t\", ' \\t abc', ' \\t abc \\t ']\n\n index = 0\n\n temp = dedent(raw_text)\n\n for indent_v in test_names:\n text = indent(raw_text, indent_v)\n\n tests.append(text)\n\n\n output = 
dedent_faster(text, only_whitespace=False)\n\n print(\"Validating large text with not only whitespace indentation:\", indent_v.encode(\"ascii\"), output == temp)\n\n\n # Basic indented text with empty lines\n text = \"\"\"\n def hello():\n print(\"Hello, world!\")\n\n\n \"\"\"\n\n tests.append(text)\n test_names.append(\"Basic indented text with empty lines\")\n\n # Text with mixed indentation and blank lines\n text = \"\"\"\n This is a test.\n\n Another line.\n\n \"\"\"\n\n tests.append(text)\n test_names.append(\"Text with mixed indentation and blank lines\")\n\n # No indentation (edge case)\n text = \"\"\"No indents here.\n Just normal text.\n\n With empty lines.\"\"\"\n\n tests.append(text)\n test_names.append(\"No indentation (edge case)\")\n\n # Only blank lines (should preserve them)\n text = \"\"\"\n\n\n \"\"\"\n\n tests.append(text)\n test_names.append(\"Only blank lines\")\n\n # Edge case: No common prefix to remove\n text = \"\"\"hello\n world\n \"\"\"\n\n tests.append(text)\n test_names.append(\"Edge case: No common prefix to remove\")\n\n # Edge case: Single indented line\n text = \"\"\" Only one indented line\"\"\"\n\n tests.append(text)\n test_names.append(\"Edge case: Single indented line\")\n\n # Edge case: Single indented line\n text = \"\"\" \"\"\"\n\n tests.append(text)\n test_names.append(\"Edge case: Single indented line only\")\n\n # Edge case: Single indented line\n text = \"\"\"\"\"\"\n\n tests.append(text)\n test_names.append(\"Edge case: Empty text\")\n\n for text, name in zip(tests, test_names):\n print(f\"========= Case: {name.encode('ascii')} =========\")\n\n a = dedent(text)\n\n for func in [dedent_faster, dedent]:\n single_test = func(text)\n\n print(func.__name__, a == single_test)\n\n it = timeit.Timer(lambda: func(text))\n result = it.repeat(number=10_000)\n result.sort()\n print(f\"{func.__name__} Min: {result[0]:.4f}msec\")\n```\n\nReturning the following:\n\n```\nValidating large text with not only whitespace indentation: b'' 
True\nValidating large text with not only whitespace indentation: b' ' True\nValidating large text with not only whitespace indentation: b'\\t' True\nValidating large text with not only whitespace indentation: b'abc \\t' True\nValidating large text with not only whitespace indentation: b' \\t abc' True\nValidating large text with not only whitespace indentation: b' \\t abc \\t ' True\n========= Case: b'' =========\ndedent_faster True\ndedent_faster Min: 1.5848msec\ndedent True\ndedent Min: 6.6143msec\n========= Case: b' ' =========\ndedent_faster True\ndedent_faster Min: 2.5275msec\ndedent True\ndedent Min: 10.6884msec\n========= Case: b'\\t' =========\ndedent_faster True\ndedent_faster Min: 2.3215msec\ndedent True\ndedent Min: 9.9588msec\n========= Case: b'abc \\t' =========\ndedent_faster True\ndedent_faster Min: 1.5026msec\ndedent True\ndedent Min: 6.7075msec\n========= Case: b' \\t abc' =========\ndedent_faster True\ndedent_faster Min: 2.5997msec\ndedent True\ndedent Min: 10.6693msec\n========= Case: b' \\t abc \\t ' =========\ndedent_faster True\ndedent_faster Min: 2.6204msec\ndedent True\ndedent Min: 11.7285msec\n========= Case: b'Basic indented text with empty lines' =========\ndedent_faster True\ndedent_faster Min: 0.0016msec\ndedent True\ndedent Min: 0.0022msec\n========= Case: b'Text with mixed indentation and blank lines' =========\ndedent_faster True\ndedent_faster Min: 0.0016msec\ndedent True\ndedent Min: 0.0020msec\n========= Case: b'No indentation (edge case)' =========\ndedent_faster True\ndedent_faster Min: 0.0008msec\ndedent True\ndedent Min: 0.0013msec\n========= Case: b'Only blank lines' =========\ndedent_faster True\ndedent_faster Min: 0.0006msec\ndedent True\ndedent Min: 0.0005msec\n========= Case: b'Edge case: No common prefix to remove' =========\ndedent_faster True\ndedent_faster Min: 0.0007msec\ndedent True\ndedent Min: 0.0008msec\n========= Case: b'Edge case: Single indented line' =========\ndedent_faster True\ndedent_faster Min: 
0.0004msec\ndedent True\ndedent Min: 0.0013msec\n========= Case: b'Edge case: Single indented line only' =========\ndedent_faster True\ndedent_faster Min: 0.0002msec\ndedent True\ndedent Min: 0.0003msec\n========= Case: b'Edge case: Empty text' =========\ndedent_faster True\ndedent_faster Min: 0.0001msec\ndedent True\ndedent Min: 0.0002msec\n```\n\n\nThis gives roughly a 4x speedup for large files. Another advantage is that this method can optionally remove any common prefix from the lines, not just whitespace.\n\n(This was optimized iteratively using Claude and ChatGPT models.)\n\n### Has this already been discussed elsewhere?\n\nNo response given\n\n### Links to previous discussion of this feature:\n\n_No response_\n\n\u003c!-- gh-linked-prs --\u003e\n### Linked PRs\n* gh-131792\n* gh-131919\n\u003c!-- /gh-linked-prs --\u003e\n","author":{"url":"https://github.com/Marius-Juston","@type":"Person","name":"Marius-Juston"},"datePublished":"2025-03-27T09:13:18.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":19},"url":"https://github.com/python/cpython/issues/131791"}
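The proposed `dedent_faster` in the issue body derives the margin from only the first two non-blank lines, so a later line with less indentation does not shrink the margin, and on such input its result would differ from `textwrap.dedent`. The following is a minimal sketch, not the author's code and not necessarily what the linked PRs do, of one way to compute the whitespace margin across all non-blank lines without a regex. The name `dedent_sketch` is hypothetical. It assumes lines are separated by `"\n"` and, unlike the regex in the current code, treats any line for which `str.isspace()` is true as blank.

```python
import textwrap


def dedent_sketch(text: str) -> str:
    """Illustrative only: strip the whitespace margin shared by all non-blank lines."""
    lines = text.split("\n")
    # Empty and whitespace-only lines do not take part in the margin computation.
    non_blank = [line for line in lines if line and not line.isspace()]
    # The lexicographically smallest and largest lines bound every other line,
    # so their common prefix is also the common prefix of all non-blank lines.
    lo = min(non_blank, default="")
    hi = max(non_blank, default="")
    margin = 0
    while (
        margin < len(lo)
        and margin < len(hi)
        and lo[margin] == hi[margin]
        and lo[margin] in " \t"
    ):
        margin += 1
    # Blank lines are normalized to empty strings; other lines lose the margin.
    return "\n".join(
        line[margin:] if line and not line.isspace() else "" for line in lines
    )


# The third line is less indented than the first two, so the margin is 2, not 4.
sample = "    a\n    b\n  c\n"
assert dedent_sketch(sample) == textwrap.dedent(sample)
```

Comparing only the minimum and maximum lines keeps the margin search to a single pass over two strings, at the cost of one `min` and one `max` scan over the non-blank lines.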