Title: TarFile filters fail in non-UTF-8 locales · Issue #133890 · python/cpython · GitHub
Description: Bug report test_tarfile fails in non-UTF-8 locales. For example: $ LC_ALL=uk_UA ./python -m test -vuall test_tarfile -m 'NoneInfoExtractTests_*' -m test_data_filter -m test_tar_filter Details =============================================...
Opengraph URL: https://github.com/python/cpython/issues/133890
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"TarFile filters fail in non-UTF-8 locales","articleBody":"# Bug report\n\n`test_tarfile` fails in non-UTF-8 locales. For example:\n```\n$ LC_ALL=uk_UA ./python -m test -vuall test_tarfile -m 'NoneInfoExtractTests_*' -m test_data_filter -m test_tar_filter\n```\n\u003cdetails\u003e\n\n```\n======================================================================\nERROR: setUpClass (test.test_tarfile.NoneInfoExtractTests_Data)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/home/serhiy/py/cpython/Lib/test/test_tarfile.py\", line 3264, in setUpClass\n tar.extractall(cls.control_dir, filter=cls.extraction_filter)\n ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 2389, in extractall\n tarinfo = self._get_extract_tarinfo(member, filter_function, path)\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 2441, in _get_extract_tarinfo\n tarinfo = filter_function(tarinfo, path)\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 842, in data_filter\n new_attrs = _get_filtered_attrs(member, dest_path, True)\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 782, in _get_filtered_attrs\n target_path = os.path.realpath(os.path.join(dest_path, name))\n File \"/home/serhiy/py/cpython/Lib/posixpath.py\", line 405, in realpath\n return _realpath(filename, strict, sep, curdir, pardir, getcwd)\n File \"/home/serhiy/py/cpython/Lib/posixpath.py\", line 452, in _realpath\n st_mode = lstat(newpath).st_mode\n ~~~~~^^^^^^^^^\n File \"/home/serhiy/py/cpython/Lib/encodings/koi8_u.py\", line 12, in encode\n return codecs.charmap_encode(input,errors,encoding_table)\n ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nUnicodeEncodeError: 'charmap' codec can't encode characters in position 112-118: character maps to \u003cundefined\u003e\nencoding with 'koi8-u' codec 
failed\n\n======================================================================\nERROR: setUpClass (test.test_tarfile.NoneInfoExtractTests_Default)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/home/serhiy/py/cpython/Lib/test/test_tarfile.py\", line 3264, in setUpClass\n tar.extractall(cls.control_dir, filter=cls.extraction_filter)\n ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 2389, in extractall\n tarinfo = self._get_extract_tarinfo(member, filter_function, path)\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 2441, in _get_extract_tarinfo\n tarinfo = filter_function(tarinfo, path)\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 842, in data_filter\n new_attrs = _get_filtered_attrs(member, dest_path, True)\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 782, in _get_filtered_attrs\n target_path = os.path.realpath(os.path.join(dest_path, name))\n File \"/home/serhiy/py/cpython/Lib/posixpath.py\", line 405, in realpath\n return _realpath(filename, strict, sep, curdir, pardir, getcwd)\n File \"/home/serhiy/py/cpython/Lib/posixpath.py\", line 452, in _realpath\n st_mode = lstat(newpath).st_mode\n ~~~~~^^^^^^^^^\n File \"/home/serhiy/py/cpython/Lib/encodings/koi8_u.py\", line 12, in encode\n return codecs.charmap_encode(input,errors,encoding_table)\n ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nUnicodeEncodeError: 'charmap' codec can't encode characters in position 112-118: character maps to \u003cundefined\u003e\nencoding with 'koi8-u' codec failed\n\n======================================================================\nERROR: setUpClass (test.test_tarfile.NoneInfoExtractTests_FullyTrusted)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/home/serhiy/py/cpython/Lib/test/test_tarfile.py\", line 3264, in setUpClass\n 
tar.extractall(cls.control_dir, filter=cls.extraction_filter)\n ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 2397, in extractall\n self._extract_one(tarinfo, path, set_attrs=not tarinfo.isdir(),\n ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n numeric_owner=numeric_owner)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 2460, in _extract_one\n self._extract_member(tarinfo, os.path.join(path, tarinfo.name),\n ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n set_attrs=set_attrs,\n ^^^^^^^^^^^^^^^^^^^^\n numeric_owner=numeric_owner)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 2543, in _extract_member\n self.makefile(tarinfo, targetpath)\n ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 2589, in makefile\n with bltn_open(targetpath, \"wb\") as target:\n ~~~~~~~~~^^^^^^^^^^^^^^^^^^\n File \"/home/serhiy/py/cpython/Lib/encodings/koi8_u.py\", line 12, in encode\n return codecs.charmap_encode(input,errors,encoding_table)\n ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nUnicodeEncodeError: 'charmap' codec can't encode characters in position 112-118: character maps to \u003cundefined\u003e\nencoding with 'koi8-u' codec failed\n\n======================================================================\nERROR: setUpClass (test.test_tarfile.NoneInfoExtractTests_Tar)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/home/serhiy/py/cpython/Lib/test/test_tarfile.py\", line 3264, in setUpClass\n tar.extractall(cls.control_dir, filter=cls.extraction_filter)\n ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 2389, in extractall\n tarinfo = self._get_extract_tarinfo(member, filter_function, path)\n File 
\"/home/serhiy/py/cpython/Lib/tarfile.py\", line 2441, in _get_extract_tarinfo\n tarinfo = filter_function(tarinfo, path)\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 836, in tar_filter\n new_attrs = _get_filtered_attrs(member, dest_path, False)\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 782, in _get_filtered_attrs\n target_path = os.path.realpath(os.path.join(dest_path, name))\n File \"/home/serhiy/py/cpython/Lib/posixpath.py\", line 405, in realpath\n return _realpath(filename, strict, sep, curdir, pardir, getcwd)\n File \"/home/serhiy/py/cpython/Lib/posixpath.py\", line 452, in _realpath\n st_mode = lstat(newpath).st_mode\n ~~~~~^^^^^^^^^\n File \"/home/serhiy/py/cpython/Lib/encodings/koi8_u.py\", line 12, in encode\n return codecs.charmap_encode(input,errors,encoding_table)\n ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nUnicodeEncodeError: 'charmap' codec can't encode characters in position 112-118: character maps to \u003cundefined\u003e\nencoding with 'koi8-u' codec failed\n\n======================================================================\nERROR: test_data_filter (test.test_tarfile.TestExtractionFilters.test_data_filter)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/home/serhiy/py/cpython/Lib/test/test_tarfile.py\", line 4086, in test_data_filter\n filtered = tarfile.data_filter(tarinfo, '')\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 842, in data_filter\n new_attrs = _get_filtered_attrs(member, dest_path, True)\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 782, in _get_filtered_attrs\n target_path = os.path.realpath(os.path.join(dest_path, name))\n File \"/home/serhiy/py/cpython/Lib/posixpath.py\", line 405, in realpath\n return _realpath(filename, strict, sep, curdir, pardir, getcwd)\n File \"/home/serhiy/py/cpython/Lib/posixpath.py\", line 452, in _realpath\n st_mode = lstat(newpath).st_mode\n ~~~~~^^^^^^^^^\n File 
\"/home/serhiy/py/cpython/Lib/encodings/koi8_u.py\", line 12, in encode\n return codecs.charmap_encode(input,errors,encoding_table)\n ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nUnicodeEncodeError: 'charmap' codec can't encode characters in position 69-75: character maps to \u003cundefined\u003e\nencoding with 'koi8-u' codec failed\n\n======================================================================\nERROR: test_tar_filter (test.test_tarfile.TestExtractionFilters.test_tar_filter)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/home/serhiy/py/cpython/Lib/test/test_tarfile.py\", line 4076, in test_tar_filter\n filtered = tarfile.tar_filter(tarinfo, '')\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 836, in tar_filter\n new_attrs = _get_filtered_attrs(member, dest_path, False)\n File \"/home/serhiy/py/cpython/Lib/tarfile.py\", line 782, in _get_filtered_attrs\n target_path = os.path.realpath(os.path.join(dest_path, name))\n File \"/home/serhiy/py/cpython/Lib/posixpath.py\", line 405, in realpath\n return _realpath(filename, strict, sep, curdir, pardir, getcwd)\n File \"/home/serhiy/py/cpython/Lib/posixpath.py\", line 452, in _realpath\n st_mode = lstat(newpath).st_mode\n ~~~~~^^^^^^^^^\n File \"/home/serhiy/py/cpython/Lib/encodings/koi8_u.py\", line 12, in encode\n return codecs.charmap_encode(input,errors,encoding_table)\n ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nUnicodeEncodeError: 'charmap' codec can't encode characters in position 69-75: character maps to \u003cundefined\u003e\nencoding with 'koi8-u' codec failed\n\n----------------------------------------------------------------------\n```\n\n\u003c/details\u003e\n\nThis happens because they use `os.path.realpath()` for paths in a tar archive, which uses `os.stat()`, which fails with unexpected `UnicodeEncodeError` if the path in a tar archive can't be encoded in the current filesystem encoding. 
This error should be handled at some level, either in `os.path.realpath()` or in `tarfile`. `os.stat()` can also raise `ValueError` if the path contains null bytes. I don't know whether this is relevant here; we should test.\n\n\n\u003c!-- gh-linked-prs --\u003e\n### Linked PRs\n* gh-134147\n* gh-134195\n* gh-134196\n\u003c!-- /gh-linked-prs --\u003e\n","author":{"url":"https://github.com/serhiy-storchaka","@type":"Person","name":"serhiy-storchaka"},"datePublished":"2025-05-11T11:23:42.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":0},"url":"https://github.com/python/cpython/issues/133890"}
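The failure mode described in the report can be illustrated outside the test suite. The sketch below is only an illustration under stated assumptions, not the fix from the linked PRs: `encodable_for_fs` and `realpath_or_none` are hypothetical helper names, and the encoding is passed explicitly (here `koi8-u`, as in the tracebacks) so the result does not depend on the locale the snippet runs under.

```python
import os


def encodable_for_fs(name, encoding):
    """Return True if `name` can be encoded with `encoding`.

    Hypothetical helper. It uses the "surrogateescape" error handler,
    which is what os.fsencode() uses on POSIX systems.
    """
    try:
        name.encode(encoding, "surrogateescape")
    except UnicodeEncodeError:
        return False
    return True


def realpath_or_none(path):
    """Hypothetical wrapper: return None if the path cannot be passed to the OS."""
    try:
        return os.path.realpath(path)
    except ValueError:  # UnicodeEncodeError is a subclass of ValueError
        return None


# Cyrillic letters exist in KOI8-U; CJK characters do not.
print(encodable_for_fs("файл.txt", "koi8-u"))
print(encodable_for_fs("日本語.txt", "koi8-u"))
```

Catching `ValueError` covers both cases mentioned in the report, because `UnicodeEncodeError` derives from it. Whether such handling belongs in `os.path.realpath()` or in `tarfile` is the open question the issue raises.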