Title: Use double caching for re._compile() · Issue #96346 · python/cpython · GitHub
Open Graph Title: Use double caching for re._compile() · Issue #96346 · python/cpython
X Title: Use double caching for re._compile() · Issue #96346 · python/cpython
Description: The caching algorithm for re._compile() is one of the hottest sites in the stdlib. It was rewritten many times (it is not all changes): #76519 #72480 #66700 #60593 #57436 d9e8cc6 #53642 5a63183 The patch for using lru_cache() was repeate...
Open Graph Description: The caching algorithm for re._compile() is one of the hottest sites in the stdlib. It was rewritten many times (it is not all changes): #76519 #72480 #66700 #60593 #57436 d9e8cc6 #53642 5a63183 The...
X Description: The caching algorithm for re._compile() is one of the hottest sites in the stdlib. It was rewritten many times (it is not all changes): #76519 #72480 #66700 #60593 #57436 d9e8cc6 #53642 5a63183 The...
Opengraph URL: https://github.com/python/cpython/issues/96346
X: @github
Domain: github.com
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"Use double caching for re._compile()","articleBody":"The caching algorithm for `re._compile()` is one of the hottest sites in the stdlib. It was rewritten many times (it is not all changes):\r\n* https://github.com/python/cpython/issues/76519\r\n* https://github.com/python/cpython/issues/72480\r\n* https://github.com/python/cpython/issues/66700\r\n* https://github.com/python/cpython/issues/60593\r\n* https://github.com/python/cpython/issues/57436\r\n* https://github.com/python/cpython/commit/d9e8cc6249c7ff4ceeff3217a7671bee623d88a7\r\n* https://github.com/python/cpython/issues/53642\r\n* https://github.com/python/cpython/commit/5a63183a8b8a9e177f97feac975850df5e6f98aa\r\n\r\nThe patch for using `lru_cache()` was repeatedly applied and reverted 3 times! Eventually it turned out to be slower than a simple dict lookup. It does not fit very well in this case, because some results should be not cached, and additional checks should be added before calling the cached function, adding significant overhead.\r\n\r\nBut the LRU caching algorithm can have advantage over the current simple algorithm if not all compiler patterns fit in the cache. It removes entries from the cache more smarty.\r\n\r\nI tested with random keys with exponential distribution. With the cache size 512, the largest difference is for lambda between 1/70 and 1/80. The LRU caching algorithm has 3 times less misses: 0.16% to 0.33% against 0.5 to 1.1% misses in the current algorithm. For significantly larger (almost no misses) or smaller (tens percent of misses) lambda both algorithms gives almost the same result.\r\n\r\nDirect implementation of the LRU caching algorithm using OrderedDict or just dict would be slower, because every hit requires moving the found entry to the end. But we can use double caching. The primary smaller and faster cache does not reorder entries. But if the key is not found in the primary cache, we look it up in the secondary LRU cache. This algorithm has the same number of misses as the LRU caching algorithm, but it has faster hits.","author":{"url":"https://github.com/serhiy-storchaka","@type":"Person","name":"serhiy-storchaka"},"datePublished":"2022-08-27T20:14:29.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":3},"url":"https://github.com/96346/cpython/issues/96346"}
| route-pattern | /_view_fragments/issues/show/:user_id/:repository/:id/issue_layout(.:format) |
| route-controller | voltron_issues_fragments |
| route-action | issue_layout |
| fetch-nonce | v2:769c6a4f-8c60-2383-77d6-8fcd719a4155 |
| current-catalog-service-hash | 81bb79d38c15960b92d99bca9288a9108c7a47b18f2423d0f6438c5b7bcd2114 |
| request-id | A508:14A586:26B5F70:3605613:69694CAC |
| html-safe-nonce | dd0a92d459288732376ffc9f3cb3c3bcd1102eed58d46cfce6704e071359f60c |
| visitor-payload | eyJyZWZlcnJlciI6IiIsInJlcXVlc3RfaWQiOiJBNTA4OjE0QTU4NjoyNkI1RjcwOjM2MDU2MTM6Njk2OTRDQUMiLCJ2aXNpdG9yX2lkIjoiNTQyMTAxMjY4ODAzODM0OTk5NiIsInJlZ2lvbl9lZGdlIjoiaWFkIiwicmVnaW9uX3JlbmRlciI6ImlhZCJ9 |
| visitor-hmac | 309113c71291f15963a4544e278d7b9185577cf7991883ad86e4f27804682832 |
| hovercard-subject-tag | issue:1353134810 |
| github-keyboard-shortcuts | repository,issues,copilot |
| google-site-verification | Apib7-x98H0j5cPqHWwSMm6dNU4GmODRoqxLiDzdx9I |
| octolytics-url | https://collector.github.com/github/collect |
| analytics-location | / |
| fb:app_id | 1401488693436528 |
| apple-itunes-app | app-id=1477376905, app-argument=https://github.com/_view_fragments/issues/show/python/cpython/96346/issue_layout |
| twitter:image | https://opengraph.githubassets.com/9f9bd8b5642f8b3a1a2c6c160f8e61e8419c6f3db122ba264a81b88da9f9f12e/python/cpython/issues/96346 |
| twitter:card | summary_large_image |
| og:image | https://opengraph.githubassets.com/9f9bd8b5642f8b3a1a2c6c160f8e61e8419c6f3db122ba264a81b88da9f9f12e/python/cpython/issues/96346 |
| og:image:alt | The caching algorithm for re._compile() is one of the hottest sites in the stdlib. It was rewritten many times (it is not all changes): #76519 #72480 #66700 #60593 #57436 d9e8cc6 #53642 5a63183 The... |
| og:image:width | 1200 |
| og:image:height | 600 |
| og:site_name | GitHub |
| og:type | object |
| og:author:username | serhiy-storchaka |
| hostname | github.com |
| expected-hostname | github.com |
| None | 54182691a21263b584d2e600b758e081b0ff1d10ffc0d2eefa51cf754b43b51d |
| turbo-cache-control | no-preview |
| go-import | github.com/python/cpython git https://github.com/python/cpython.git |
| octolytics-dimension-user_id | 1525981 |
| octolytics-dimension-user_login | python |
| octolytics-dimension-repository_id | 81598961 |
| octolytics-dimension-repository_nwo | python/cpython |
| octolytics-dimension-repository_public | true |
| octolytics-dimension-repository_is_fork | false |
| octolytics-dimension-repository_network_root_id | 81598961 |
| octolytics-dimension-repository_network_root_nwo | python/cpython |
| turbo-body-classes | logged-out env-production page-responsive |
| disable-turbo | false |
| browser-stats-url | https://api.github.com/_private/browser/stats |
| browser-errors-url | https://api.github.com/_private/browser/errors |
| release | d69ac0477df0f87da03b8b06cebd187012d7a930 |
| ui-target | full |
| theme-color | #1e2327 |
| color-scheme | light dark |
Links:
Viewport: width=device-width