Title: Issue with Python 3.11 and dask[distributed] with high number of threads · Issue #116969 · python/cpython · GitHub
URL: https://github.com/python/cpython/issues/116969
Author: diegorusso
Published: 2024-03-18T18:58:52Z

# Bug report

### Bug description:

I have noticed that the dask benchmark in pyperformance hangs when run with Python 3.11 on machines with a "high" number of cores. I have seen the issue with 191 and 384 cores.

I started investigating the problem and saw that it manifests itself on machines with a high number of cores.
The benchmark that hangs is https://github.com/python/pyperformance/blob/main/pyperformance/data-files/benchmarks/bm_dask/run_benchmark.py

When the Worker class gets instantiated, it sets nthreads to the number of CPUs present on the system ([here is the code](https://github.com/dask/distributed/blob/2022.02.0/distributed/worker.py#L840)).

When this number is relatively high, Python 3.11 hangs and all the underlying threads deadlock on the GIL.

To replicate the issue:
* make a copy of the dask [benchmark file](https://github.com/python/pyperformance/blob/main/pyperformance/data-files/benchmarks/bm_dask/run_benchmark.py)
* set the nthreads of the Worker class to a relatively high number (e.g. 1000):
```
async with Worker(scheduler.address, nthreads=1000):
    ...
```
* create/activate a venv with Python 3.11 and install the dependencies:
```
pip install "dask[distributed]==2022.2.0" pyperf
```
* run a quick stress test:
```
while true; do python run_benchmark.py; done
```
and wait for it to hang. The hang happens at random times.

With the process hung, gdb shows the following on one of the (hundreds of) threads:
```
(gdb) thread 4
[Switching to thread 4 (Thread 0x7f5aeffff640 (LWP 402351))]
#0 __futex_abstimed_wait_common64 (private=-1457409528, cancel=true, abstime=0x7f5aefffde20, op=137, expected=0, futex_word=0x5640a959d354 <_PyRuntime+436>) at ./nptl/futex-internal.c:57
57 in ./nptl/futex-internal.c
(gdb) py-bt
Traceback (most recent call first):
  Waiting for the GIL
  File "/home/ent-user/venv/cpython3.11-324490c70469-compat-2d3356be745c/lib/python3.11/site-packages/psutil/_common.py", line 788, in open_binary
    return open(fname, "rb", buffering=FILE_READ_BUFFER_SIZE)
  File "/home/ent-user/venv/cpython3.11-324490c70469-compat-2d3356be745c/lib/python3.11/site-packages/psutil/_pslinux.py", line 1967, in memory_info
    with open_binary("%s/%s/statm" % (self._procfs_path, self.pid)) as f:
  File "/home/ent-user/venv/cpython3.11-324490c70469-compat-2d3356be745c/lib/python3.11/site-packages/psutil/_pslinux.py", line 1714, in wrapper
    return fun(self, *args, **kwargs)
  File "/home/ent-user/venv/cpython3.11-324490c70469-compat-2d3356be745c/lib/python3.11/site-packages/psutil/__init__.py", line 1102, in memory_info
    return self._proc.memory_info()
  File "/home/ent-user/venv/cpython3.11-324490c70469-compat-2d3356be745c/lib/python3.11/site-packages/psutil/_common.py", line 495, in wrapper
    return fun(self)
  File "/home/ent-user/venv/cpython3.11-324490c70469-compat-2d3356be745c/lib/python3.11/site-packages/distributed/utils_perf.py", line 188, in _gc_callback
    rss = self._proc.memory_info().rss
  <built-in method _current_frames of module object at remote 0x7f5dc0a32ca0>
  File "/home/ent-user/venv/cpython3.11-324490c70469-compat-2d3356be745c/lib/python3.11/site-packages/distributed/profile.py", line 270, in _watch
    frame = sys._current_frames()[thread_id]
  File "/home/ent-user/ci-scripts/tmpdir/prefix/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ent-user/ci-scripts/tmpdir/prefix/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/home/ent-user/ci-scripts/tmpdir/prefix/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
```

An strace of one of the threads continuously shows:
```
...
futex(0x55707e87f350, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=6498067, tv_nsec=468031783}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0x55707e87f358, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x55707e87f350, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=6498067, tv_nsec=473122144}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0x55707e87f358, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x55707e87f350, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=6498067, tv_nsec=478228035}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0x55707e87f358, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x55707e87f350, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=6498067, tv_nsec=483319687}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0x55707e87f358, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x55707e87f350, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=6498067, tv_nsec=488417438}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0x55707e87f358, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x55707e87f350, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=6498067, tv_nsec=493521779}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0x55707e87f358, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x55707e87f350, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=6498067, tv_nsec=498608771}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0x55707e87f358, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x55707e87f350, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=6498067, tv_nsec=503711922}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0x55707e87f358, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x55707e87f350, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=6498067, tv_nsec=508813993}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0x55707e87f358, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x55707e87f350, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=6498067, tv_nsec=513919325}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0x55707e87f358, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x55707e87f350, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=6498067, tv_nsec=519022166}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
...
```

I tried upgrading dask[distributed] to the latest version, but I get the same result. I think something is going on in Python 3.11: this happens only with Python 3.11, while 3.9 and 3.12 work as expected.

I've seen it on x86; aarch64 is still to be tested.

### CPython versions tested on:

3.11

### Operating systems tested on:

Linux
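The shell loop in the reproduction steps never stops by itself, so you have to notice the hang and attach gdb by hand. Below is a minimal sketch of a driver that reruns a command with a timeout. On the first run that exceeds the timeout, it sends SIGABRT. If the child was started with `-X faulthandler`, that signal should make it print the Python stack of every thread to stderr, giving a gdb-free counterpart to `py-bt`. The helper names `run_once` and `stress` and all timeout values are illustrative assumptions. They are not part of pyperformance, dask, or CPython.

```python
import signal
import subprocess
import sys
from typing import Optional, Sequence, Tuple


def run_once(cmd: Sequence[str], timeout_s: float) -> Optional[str]:
    """Run cmd once.

    Return None if it exits (successfully or not) within timeout_s.
    Otherwise send SIGABRT and return everything the child wrote to stderr.
    """
    proc = subprocess.Popen(
        list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True
    )
    try:
        proc.communicate(timeout=timeout_s)
        return None
    except subprocess.TimeoutExpired:
        # Treat the run as hung. With -X faulthandler, SIGABRT makes the child
        # dump all thread stacks before it dies.
        proc.send_signal(signal.SIGABRT)
        try:
            # Retrying communicate() after TimeoutExpired keeps earlier output.
            _, err = proc.communicate(timeout=30)
        except subprocess.TimeoutExpired:
            # The child did not die on SIGABRT; force it.
            proc.kill()
            _, err = proc.communicate()
        return err


def stress(
    cmd: Sequence[str], timeout_s: float, max_runs: int
) -> Optional[Tuple[int, str]]:
    """Rerun cmd up to max_runs times.

    Return (run_number, stderr) for the first run that exceeds timeout_s,
    or None if every run finished in time.
    """
    for run in range(1, max_runs + 1):
        err = run_once(cmd, timeout_s)
        if err is not None:
            return run, err
    return None
```

For example, `stress([sys.executable, "-X", "faulthandler", "run_benchmark.py"], timeout_s=300, max_runs=1000)` reruns the benchmark until one run takes longer than five minutes. Choose a timeout well above a normal run on your machine. SIGABRT is used because it is one of the fatal signals that faulthandler handles. Depending on the system's core-dump settings, it may also leave a core file that gdb can inspect.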
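A little arithmetic on the strace excerpt's timestamps is informative. The sketch below takes the `tv_nsec` deadlines of the successive `FUTEX_WAIT_BITSET_PRIVATE` calls (all with `tv_sec=6498067`) and computes the gaps between them. The gaps come out at about 5.1 ms, close to CPython's default thread switch interval of 5 ms as reported by `sys.getswitchinterval()`. This is an interpretation, not something the trace proves: the pattern is consistent with a thread that waits for the GIL in switch-interval slices and times out every time, because the current holder never releases the GIL.

```python
import statistics
import sys

# tv_nsec of each FUTEX_WAIT_BITSET_PRIVATE deadline in the strace excerpt;
# tv_sec stays at 6498067 for all of them.
TV_NSEC = [
    468031783, 473122144, 478228035, 483319687, 488417438, 493521779,
    498608771, 503711922, 508813993, 513919325, 519022166,
]


def deadline_gaps_ms(nsec):
    """Gaps between consecutive absolute futex deadlines, in milliseconds."""
    return [(b - a) / 1e6 for a, b in zip(nsec, nsec[1:])]


gaps = deadline_gaps_ms(TV_NSEC)
mean_gap_s = statistics.mean(gaps) / 1e3

print(f"mean gap: {mean_gap_s * 1e3:.2f} ms")  # prints: mean gap: 5.10 ms
# With the default switch interval this prints: switch interval: 5.00 ms
print(f"switch interval: {sys.getswitchinterval() * 1e3:.2f} ms")
```

The extra ~0.1 ms per cycle is plausibly the time spent between one timeout and the next wait, since each new deadline is an absolute time computed after waking.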