René's URL Explorer Experiment


Title: Don't choke on (legitimately) invalidly encoded Unicode paths by nvie · Pull Request #467 · gitpython-developers/GitPython · GitHub

Description: We've come across path names that contain bytes that are invalid in UTF-8, although they're very rare. My assumption is that these commits were created by an old (buggy?) version of Git, and the data now lives in the tree objects. Since we return only unicode strings for the a_path and b_path properties, we can't decode such a path and thus choke when asking for the diff. This PR fixes that by using "replace" semantics when decoding, which effectively replaces the illegal byte \200 (or \x80) with \ufffd (= �).

Follow-up discussion

However, this also means that if you want to git-blame this file, there's no good way of referencing the path, since it's inherently a bytes path. Normally, when we pass unicode paths to git-blame via GitPython's blame API, the paths are encoded to UTF-8 right before issuing the external command. But there's no way of getting the original bytes back after the "replace" operation has happened.

Example:

1. The input path is b'illegal-\x80.txt' (containing the illegal byte \x80).
2. When decoded as UTF-8 with invalid bytes replaced, we get the unicode string u'illegal-\ufffd.txt' (= "illegal-�.txt").
3. When encoding that in UTF-8, we get b'illegal-\xef\xbf\xbd.txt'.
4. When we next pass illegal-\xef\xbf\xbd.txt to git-blame, it will not be able to find this path.

Perhaps it would be a good idea to not only return the decoded path strings, but also provide access to the raw bytes found, i.e. by exposing a_rawpath and b_rawpath, which would always be bytes? That way, you could still have the friendly "unicode paths" for most use cases, but use bytes if you need to speak the language of Git more accurately.
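The lossy round trip described in the example can be reproduced with Python's built-in codecs alone. The following is a minimal sketch, not GitPython's actual implementation: the helper decode_path is hypothetical, and the names a_path and a_rawpath are borrowed from the proposal in the description purely for illustration.

```python
# Sketch only: plain Python codecs, not GitPython's implementation.

raw_path = b"illegal-\x80.txt"  # a lone 0x80 byte is not valid UTF-8

# Decode with "replace" semantics: the invalid byte becomes U+FFFD.
text_path = raw_path.decode("utf-8", "replace")

# Encoding the text again yields the UTF-8 form of U+FFFD (EF BF BD),
# not the original byte, so the original path cannot be recovered.
reencoded = text_path.encode("utf-8")


def decode_path(raw):
    """Hypothetical helper: return (display text, untouched raw bytes)."""
    return raw.decode("utf-8", "replace"), raw


# Keeping the raw bytes next to the decoded text preserves an exact
# handle on the path, which is the idea behind a_rawpath / b_rawpath.
a_path, a_rawpath = decode_path(raw_path)

print(reencoded)              # b'illegal-\xef\xbf\xbd.txt'
print(reencoded == raw_path)  # False
print(a_rawpath == raw_path)  # True
```

The decoded string stays convenient for display, while the raw bytes remain the only form that is guaranteed to match what Git stored.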

Opengraph URL: https://github.com/gitpython-developers/GitPython/pull/467

Domain: github.com

Links:

gitpython-developers https://github.com/gitpython-developers
GitPython https://github.com/gitpython-developers/GitPython
Byron https://github.com/Byron
master https://github.com/gitpython-developers/GitPython/tree/master
fix-dont-choke-on-invalid-unicode-paths https://github.com/gitpython-developers/GitPython/tree/fix-dont-choke-on-invalid-unicode-paths
Conversation 3 https://github.com/gitpython-developers/GitPython/pull/467
Commits 1 https://github.com/gitpython-developers/GitPython/pull/467/commits
Checks 0 https://github.com/gitpython-developers/GitPython/pull/467/checks
Files changed https://github.com/gitpython-developers/GitPython/pull/467/files
Don't choke on (legitimately) invalidly encoded Unicode paths https://github.com/gitpython-developers/GitPython/pull/467/files#top
200d3c6 Don't choke on (legitimately) invalidly encoded Unicode paths nvie Jun 6, 2016 https://github.com/gitpython-developers/GitPython/pull/467/commits/200d3c6cb436097eaee7c951a0c9921bfcb75c7f
git/diff.py https://github.com/gitpython-developers/GitPython/pull/467/files#diff-300633890a1b325dfed86bb5120d89465f0687f4f6b8d5701c44c02f0eee723a
View file https://github.com/gitpython-developers/GitPython/blob/200d3c6cb436097eaee7c951a0c9921bfcb75c7f/git/diff.py
git/test/fixtures/diff_patch_unsafe_paths https://github.com/gitpython-developers/GitPython/pull/467/files#diff-31adfe8fe3bbf86b47ca60cebb0e80f3d9907becc7edafcff2170265027338d2
View file https://github.com/gitpython-developers/GitPython/blob/200d3c6cb436097eaee7c951a0c9921bfcb75c7f/git/test/fixtures/diff_patch_unsafe_paths
git/test/test_diff.py https://github.com/gitpython-developers/GitPython/pull/467/files#diff-9f0c451bbfc178190924558e91ae5eb6c19beef1d12201e2f9a86cb9cc50f049
View file https://github.com/gitpython-developers/GitPython/blob/200d3c6cb436097eaee7c951a0c9921bfcb75c7f/git/test/test_diff.py
