# ProcessPoolExecutor shutdown hangs after future cancel was requested · Issue #94440 · python/cpython

https://github.com/python/cpython/issues/94440 · opened by yonatanp on 2022-06-30
**Bug report**

With a ProcessPoolExecutor, after submitting and quickly canceling a future, a call to `shutdown(wait=True)` hangs indefinitely. This happens on virtually all platforms and all recent Python versions.

Here is a minimal reproduction:

```py
import concurrent.futures

ppe = concurrent.futures.ProcessPoolExecutor(1)
ppe.submit(int).result()
ppe.submit(int).cancel()
ppe.shutdown(wait=True)
```

The first submission gets the executor going and creates its internal `queue_management_thread`. The second submission appears to get that thread to loop, enter a wait state, and never receive a wakeup event.

Introducing a tiny sleep between the second submit and its cancel request makes the issue disappear. From my initial observation, it looks like the way the `queue_management_worker` internal loop is structured doesn't handle this edge case well.

Shutting down with `wait=False` returns immediately as expected, but the `queue_management_thread` then dies with an unhandled `OSError: handle is closed` exception.

**Environment**

* Discovered on macOS 12.2.1 with CPython 3.8.5.
* Reproduced on Ubuntu and Windows (x64) as well, and in CPython versions 3.7 to 3.11.0-beta.3.
* Reproduced in PyPy 3.8 as well, but not consistently. Seen for example on Ubuntu with Python 3.8.13 (PyPy 7.3.9).

**Additional info**

When tested with `pytest-timeout` under Ubuntu and CPython 3.8.13, these are the tracebacks at the moment of timing out:

<details>

```pytb
_____________________________________ test _____________________________________

    @pytest.mark.timeout(10)
    def test():
        ppe = concurrent.futures.ProcessPoolExecutor(1)
        ppe.submit(int).result()
        ppe.submit(int).cancel()
>       ppe.shutdown(wait=True)

test_reproduce_python_bug.py:14:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/concurrent/futures/process.py:686: in shutdown
    self._queue_management_thread.join()
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py:1011: in join
    self._wait_for_tstate_lock()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Thread(QueueManagerThread, started daemon 140003176535808)>
block = True, timeout = -1

    def _wait_for_tstate_lock(self, block=True, timeout=-1):
        # Issue #18808: wait for the thread state to be gone.
        # At the end of the thread's life, after all knowledge of the thread
        # is removed from C data structures, C code releases our _tstate_lock.
        # This method passes its arguments to _tstate_lock.acquire().
        # If the lock is acquired, the C code is done, and self._stop() is
        # called.  That sets ._is_stopped to True, and ._tstate_lock to None.
        lock = self._tstate_lock
        if lock is None:  # already determined that the C code is done
            assert self._is_stopped
>       elif lock.acquire(block, timeout):
E       Failed: Timeout >10.0s

/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py:1027: Failed
----------------------------- Captured stderr call -----------------------------
+++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++

~~~~~~~~~~~~~~~~~ Stack of QueueFeederThread (140003159754496) ~~~~~~~~~~~~~~~~~
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/queues.py", line 227, in _feed
    nwait()
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 302, in wait
    waiter.acquire()

~~~~~~~~~~~~~~~~ Stack of QueueManagerThread (140003176535808) ~~~~~~~~~~~~~~~~~
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/concurrent/futures/process.py", line 362, in _queue_management_worker
    ready = mp.connection.wait(readers + worker_sentinels)
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
+++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
```

</details>

Tracebacks in PyPy are similar at the `concurrent.futures.process` level. Tracebacks on Windows differ in the lower-level areas, but are again similar at the `concurrent.futures.process` level.

### Linked PRs
* gh-94468
* gh-102746
* gh-102747
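The report notes that a tiny sleep between the second submit and the cancel request makes the hang disappear. A minimal sketch of that observation is below; the function name `reproduce_with_delay` and the 0.1 s delay are arbitrary choices for illustration, not part of the original report. The likely reason the delay helps is that by the time `cancel()` runs, the work item has already left the call queue, so `cancel()` simply returns `False` instead of tripping the edge case:

```python
import concurrent.futures
import time


def reproduce_with_delay():
    ppe = concurrent.futures.ProcessPoolExecutor(1)
    ppe.submit(int).result()  # warm-up: starts the worker process and management thread

    fut = ppe.submit(int)
    time.sleep(0.1)           # the tiny sleep from the report; lets the item leave the queue
    cancelled = fut.cancel()  # usually False by now: the task is already running or done
    ppe.shutdown(wait=True)   # completes instead of hanging in this variant
    return cancelled, fut


if __name__ == "__main__":
    # the __main__ guard keeps this safe under the "spawn" start method as well
    cancelled, fut = reproduce_with_delay()
    print("cancelled:", cancelled, "done:", fut.done())
```

Note this only illustrates why the timing window matters; it is not a reliable workaround, since a sufficiently loaded machine could still hit the race.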