Title: Bug: `binascii.a2b_uu` incorrectly assumes padded bytes are always whitespace · Issue #100308 · python/cpython · GitHub
Description: Bug Description I was decoding some UUEncoded data when I encountered a 'Trailing Garbage' error from the binascii.a2b_uu function. After digging into Linux's uu decode implementation(L248) and other resources (linked below) I'm decently...
Opengraph URL: https://github.com/python/cpython/issues/100308
Author: ajmedeio (https://github.com/ajmedeio)
Date published: 2022-12-16T20:52:31.000Z
Comments: 5

### Bug Description
I was decoding some uuencoded data when I encountered a `Trailing garbage` error from the `binascii.a2b_uu` function. After digging into [Linux's uudecode implementation](https://fossies.org/linux/uuencode/uudecode.c) (L248) and other resources (linked below), I'm fairly certain the Python implementation has a bug.

### The following is what I tried:
```python
from binascii import a2b_uu
s = '%-@     !'
decoded = a2b_uu(s)
```
### The expected output is:
```python
print(decoded) # b'6\x00\x00\x00\x00'
```
### The actual output is:
```text
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
binascii.Error: Trailing garbage
```

Notice there are 5 bytes in the expected output (`b'6\x00\x00\x00\x00'`) because the `%` (the first byte of the input string `s`) means 5 bytes of data follow (ASCII code 37 - 32 = 5). Uuencoding emits characters in groups of four, each group encoding 3 bytes, so the 5 bytes of data are padded out to 6 and the last character of the line carries no data. In this case that padding character is `!`.

The Python implementation assumes the padding is always whitespace. Different uuencoders use different characters for padding, though. I've seen three so far: ` `, `` ` ``, and `!`.

[The following several lines of code are the issue](https://github.com/python/cpython/blob/main/Modules/binascii.c#L280)

### Proposed fix
Simply remove the following lines (279 - 296). Or, if we really want to verify the padding, we can add `!` to the set of valid padding characters. (The linked Linux implementation does not verify padding, however.) Based on my research, there isn't a well-defined padding character, so extending the list risks the same false conclusion we have now: believing we've accounted for every padding character that exists in the wild.
```c
/*
** Finally, check that if there's anything left on the line
** that it's whitespace only.
*/
while( ascii_len-- > 0 ) {
    this_ch = *ascii_data++;
    /* Extra '`' may be written as padding in some cases */
    if ( this_ch != ' ' && this_ch != ' '+64 &&
         this_ch != '\n' && this_ch != '\r' ) {
        state = get_binascii_state(module);
        if (state == NULL) {
            return NULL;
        }
        PyErr_SetString(state->Error, "Trailing garbage");
        Py_DECREF(rv);
        return NULL;
    }
}
```

Problematically, this bug has propagated up to the `uu_codec` decode implementation as well. [See the following code](https://github.com/python/cpython/blob/main/Lib/encodings/uu_codec.py#L60)

A comment there says the caught exception and "workaround" are needed because of broken uuencoders. According to what I've read, it's the broken Python `binascii.a2b_uu` that incorrectly assumes any padding bytes are ` ` or `` ` ``.

Here are the sources for my understanding of uuencoding:
[Examples of non whitespace padding](https://www.herongyang.com/Encoding/UUEncode-Algorithm.html)
[Wikipedia uuencoding](https://en.wikipedia.org/wiki/Uuencoding)
[Busybox uudecode implementation](https://elixir.bootlin.com/busybox/0.45/source/uudecode.c#L67)

The following illustration helped me understand the encoding:

[Illustration not preserved.]

[1] I couldn't find an RFC or other standards document, so I looked for the earliest implementation I could find (the implementation linked above, dated 1983) along with the Wikipedia entry.

### In the meantime
If others encounter this issue, I'm using the following workaround:
```python
import binascii
from binascii import a2b_uu
from io import BytesIO

my_bytes = BytesIO()
line_bytes = b'%-@     !'
line = line_bytes.decode(encoding='ascii')
try:
    my_bytes.write(a2b_uu(line))
except binascii.Error as err:
    if 'trailing garbage' in str(err).lower():
        n_bytes = line_bytes[0] - 32
        assert n_bytes <= 45 and n_bytes <= len(line[1:])
        workaround_line = f'M{line[1:]}'  # replace first byte of UUEncoded line with max length specifier (M)
        data = a2b_uu(workaround_line)[:n_bytes]
        my_bytes.write(data)
    else:
        raise
```
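
A different way to sidestep the check, sketched here as an assumption rather than as the fix CPython should adopt: instead of rewriting the length character to `M`, trim the line to the characters that actually carry payload bits before handing it to `binascii.a2b_uu`. The helper name `a2b_uu_lenient` is hypothetical and not part of `binascii`. The sketch also rebuilds the failing line from `binascii.b2a_uu` output, which shows that the only difference from a line Python itself would produce is the final padding character.

```python
import binascii


def a2b_uu_lenient(line):
    """Decode one uuencoded line, ignoring anything after the declared payload.

    Hypothetical helper for illustration; not part of the binascii module.
    """
    if isinstance(line, str):
        line = line.encode("ascii")
    line = line.rstrip(b"\r\n")
    if not line:
        return b""  # design choice of this sketch: an empty line carries no data
    # The first character encodes the payload length (0..63 after masking).
    n_bytes = (line[0] - 32) & 0x3F
    # Each character carries 6 bits, so this many characters hold payload bits.
    n_chars = (n_bytes * 8 + 5) // 6
    # Keep the length character plus the payload characters; drop the padding.
    return binascii.a2b_uu(line[: 1 + n_chars])


# A line as binascii itself encodes it: the padding character is a space.
good = binascii.b2a_uu(b"6\x00\x00\x00\x00").rstrip(b"\n")
# The same line as some other encoders write it: padded with '!'.
odd = good[:-1] + b"!"

try:
    strict = binascii.a2b_uu(odd)
except binascii.Error as err:
    strict = f"binascii.Error: {err}"

print("line:   ", odd)
print("strict: ", strict)
print("lenient:", a2b_uu_lenient(odd))
```

Because the trimmed line ends exactly where the payload ends, the trailing-character check has nothing left to inspect, so the padding character no longer matters. The trade-off is that genuinely corrupt trailing data is silently dropped as well.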