Title: Add UTF-8 validity checking to schema · Issue #151 · singer-io/singer-python · GitHub
Description: For data-type "string", the _transform function just attempts to do str(data) and catches an exception to determine if the string is valid. Binary strings with null bytes or other invalid UTF-8 character sequences will pass through this ...
Opengraph URL: https://github.com/singer-io/singer-python/issues/151
Author: KBorders01 (https://github.com/KBorders01)
Date published: 2021-09-08T13:42:27.000Z
Comments: 0

For data-type `"string"`, the `_transform` function just attempts to do `str(data)` and catches an exception to determine if the string is valid. Binary strings with null bytes or other invalid UTF-8 character sequences will pass through this function as valid strings. However, targets may expect strings to be valid encoded text, such as UTF-8.

UTF-8 encoding validation can be enforced with a pre_hook when calling transform, but this doesn't inform the target about the type of string. It'd be helpful to somehow include character encoding as part of the schema so that downstream targets can know what to expect and choose the appropriate data type. For example, MySQL has `TEXT` and `BLOB` types to separately handle text and binary strings. One natural place to put this could be the `"format"` parameter, though it'd be tedious to have to explicitly specify UTF-8 for every string when that is the default. It'd be convenient to have a way to make UTF-8 the default for all strings in a schema and override it with binary (the current behavior) explicitly for binary fields.
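A minimal sketch of the pre_hook workaround mentioned above, not part of singer-python itself. It assumes the hook is called as `pre_hook(data, typ, schema)` with `schema` as a plain dict; the function name `utf8_pre_hook` and the `"encoding"` schema key are hypothetical, chosen only to illustrate the "UTF-8 by default, binary by explicit override" idea from the issue.

```python
def utf8_pre_hook(data, typ, schema):
    """Hypothetical pre_hook: enforce UTF-8 on string fields.

    Assumptions (not confirmed by the issue): the hook receives
    (data, typ, schema), and a made-up "encoding" key in the schema
    marks fields that should stay binary.
    """
    # Only string-typed values are checked; everything else passes through.
    if typ != "string" or data is None:
        return data

    # Explicit opt-out: keep the current pass-through behavior for binary fields.
    if schema.get("encoding", "utf-8") == "binary":
        return data

    if isinstance(data, (bytes, bytearray)):
        # Decode instead of calling str(), which would yield "b'...'" text.
        try:
            return bytes(data).decode("utf-8")
        except UnicodeDecodeError as exc:
            raise ValueError(f"invalid UTF-8 byte sequence: {exc}") from exc

    if isinstance(data, str):
        # A Python str can hold lone surrogates that cannot be encoded as UTF-8.
        try:
            data.encode("utf-8")
        except UnicodeEncodeError as exc:
            raise ValueError(f"string is not encodable as UTF-8: {exc}") from exc

    return data
```

A hook like this would be handed to the transform call as its pre_hook, as the issue describes. It still leaves the target uninformed about which fields are text and which are binary, which is the gap the issue asks the schema to close. Note that a null byte on its own decodes as valid UTF-8, so a tap that wants to reject null bytes would need an additional check.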