René's URL Explorer Experiment


Title: Clarity on units of string · Issue #65 · WebAssembly/stringref · GitHub

Description: I'll start by saying that I'm not well-versed in WebAssembly specifications or proposals, but I happened to come across this proposal, and I'm quite interested in Unicode strings and how they're represented in different programming langu...

Opengraph URL: https://github.com/WebAssembly/stringref/issues/65

Domain: github.com


Hey, it has JSON-LD scripts:
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"Clarity on units of string","articleBody":"I'll start by saying that I'm not well-versed in WebAssembly specifications or proposals, but I happened to come across this proposal, and I'm quite interested in Unicode strings and how they're represented in different programming languages and VMs.\r\n\r\nThis might be just a wording issue, but I think the current description of [\"What's a string?\"](https://github.com/WebAssembly/stringref/blob/a64917cd5346f8704e614c4825ebf05737ac5e64/proposals/stringref/Overview.md?plain=1#L50) could be problematic:\r\n\r\n\u003e Therefore **we define a string to be a sequence of unicode scalar values and isolated surrogates**. The code units of a Java or JavaScript string can be interpreted to encode such a sequence, in the [WTF-16](https://simonsapin.github.io/wtf-8/) encoding form.\r\n\r\nIf this is taken to mean that a string is a sequence of Unicode code points (ie, \"Unicode scalar values\" and \"isolated surrogates\", basically any integer from `0x0` to `0x10FFFF`), this does not correspond with \"WTF-16\" or JavaScript strings, since there are sequences of isolated surrogates that can't be distinctly encoded in WTF-16.\r\n\r\neg, the sequence `[U+D83D, U+DCA9]` (high surrogate, low surrogate) doesn't have an encoding form in WTF-16. Interpreting the WTF-16/UTF-16 code unit sequence `\u003cD83D DCA9\u003e` produces the sequence `[U+1F4A9]` (a Unicode scalar value, `💩`). 
There are 1048576 occurrences of such sequences (one for every code point outside of the BMP), where the \"obvious\" encoding is already used to encode a USV.\r\n\r\n```js\r\n\u003e \"\\uD83D\\uDCA9\"\r\n'💩'\r\n\u003e \"\\u{1F4A9}\"\r\n'💩'\r\n\u003e [...\"\\uD83D\\uDCA9\"].length\r\n1\r\n\u003e [...\"\\u{1F4A9}\"].length\r\n1\r\n```\r\n\r\nNote that this is different to how strings work in Python 3, where a string is indeed a sequence of any Unicode code points:\r\n```python\r\n\u003e\u003e\u003e \"\\U0000D83D\\U0000DCA9\"\r\n'\\ud83d\\udca9'\r\n\u003e\u003e\u003e \"\\U0001F4A9\"\r\n'💩'\r\n\u003e\u003e\u003e len(\"\\U0000D83D\\U0000DCA9\")\r\n2\r\n\u003e\u003e\u003e len(\"\\U0001F4A9\")\r\n1\r\n```\r\n\r\nIf this proposal is suggesting that strings work the same way as in Python 3, I think implementations will likely[0] resort to using UTF-32 in some cases, as I believe Python implementations do (I think Python implementations usually use UTF-32[1] essentially for all strings, though they will switch between using 8-bit, 16-bit and 32-bit arrays depending on the range of code points used). Other than Python, I'm not actually sure what language implementations would benefit from such a string representation.\r\n\r\nAs a side note, the section in question also lumps together Python (presumably 3) and Rust, though this might result from a misunderstanding that should hopefully be explained above. 
Rust strings are meant to be[2] valid UTF-8, hence they correspond to sequences of Unicode scalar values, but as explained above, Python strings can be any distinct sequence of Unicode code points.\r\n\r\n---\r\n\r\n[0] Another alternative would be using a variation of WTF-8 that preserves UTF-16 surrogates instead of normalising them to USVs on concatenation, though this seems a bit crazy.\r\n\r\n[1] Technically this is an extension of UTF-32, since UTF-32 itself doesn't allow encoding of code points in the surrogate range.\r\n\r\n[2] This is at least true at some API level, though technically the representation of strings in Rust is allowed to contain arbitrary bytes, where it is up to libraries to avoid emitting invalid UTF-8 to safe code: https://github.com/rust-lang/rust/issues/71033","author":{"url":"https://github.com/Maxdamantus","@type":"Person","name":"Maxdamantus"},"datePublished":"2023-10-21T02:36:31.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":16},"url":"https://github.com/WebAssembly/stringref/issues/65"}
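
The ambiguity described in the issue body can be illustrated with a minimal Python sketch. The helper names `wtf16_encode` and `wtf16_decode` are hypothetical (they come from neither the stringref proposal nor any library), and the sketch assumes the standard UTF-16 surrogate arithmetic, with unpaired surrogates passed through unchanged as WTF-16 allows. It shows that the code point sequence `[U+D83D, U+DCA9]` does not survive a round trip through 16-bit code units, because the "obvious" encoding is already the encoding of `U+1F4A9`.

```python
def wtf16_encode(code_points):
    """Encode code points (0x0..0x10FFFF) as 16-bit code units.

    Surrogate code points are emitted as-is, which is what makes the
    encoding ambiguous for an adjacent high/low surrogate pair.
    """
    units = []
    for cp in code_points:
        if cp >= 0x10000:
            # Supplementary code point: split into a surrogate pair.
            offset = cp - 0x10000
            units.append(0xD800 + (offset >> 10))
            units.append(0xDC00 + (offset & 0x3FF))
        else:
            # BMP code point, including any isolated surrogate.
            units.append(cp)
    return units


def wtf16_decode(units):
    """Interpret 16-bit code units as code points, pairing surrogates."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # A high surrogate followed by a low surrogate is always
            # read as one supplementary code point.
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            code_points.append(unit)
            i += 1
    return code_points


# Two distinct code point sequences map to the same code units.
pair_as_code_points = [0xD83D, 0xDCA9]
scalar_value = [0x1F4A9]
assert wtf16_encode(pair_as_code_points) == wtf16_encode(scalar_value)

# So the surrogate sequence does not round-trip.
round_trip = wtf16_decode(wtf16_encode(pair_as_code_points))
print([hex(cp) for cp in round_trip])  # ['0x1f4a9']

# One colliding sequence per code point outside the BMP.
print(1024 * 1024 == 0x10FFFF - 0x10000 + 1)  # True
```

By contrast, a Python 3 `str` stores the two surrogates as two separate code points, which is the behaviour the issue contrasts with JavaScript strings.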


Links:

Clarity on units of string https://github.com/WebAssembly/stringref/issues/65#top
Maxdamantus https://github.com/Maxdamantus
on Oct 21, 2023 https://github.com/WebAssembly/stringref/issues/65#issue-1955235460
"What's a string?" https://github.com/WebAssembly/stringref/blob/a64917cd5346f8704e614c4825ebf05737ac5e64/proposals/stringref/Overview.md?plain=1#L50
WTF-16 https://simonsapin.github.io/wtf-8/
rust-lang/rust#71033 https://github.com/rust-lang/rust/issues/71033
