René's URL Explorer Experiment


Title: Substrings: slices vs. copies · Issue #63 · WebAssembly/stringref · GitHub

Description: The stringview_wtf16.slice operation creates a string S2 that's a substring of another string S1. V8 originally implemented this by reusing its existing machinery for creating string slices, i.e. S2 would only store a pointer to S1, a st...

Opengraph URL: https://github.com/WebAssembly/stringref/issues/63

Domain: github.com


Hey, it has json ld scripts:
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"Substrings: slices vs. copies","articleBody":"The `stringview_wtf16.slice` operation creates a string S2 that's a substring of another string S1. V8 originally implemented this by reusing its existing machinery for creating string slices, i.e. S2 would only store a pointer to S1, a start offset, and a length, making the `slice` operation itself very cheap because it doesn't need to copy any characters, and making S2 very memory-efficient (especially when it's long, so that _many_ characters are shared with S1).\r\n\r\nLike almost any engineering technique, this approach is a tradeoff with pros and cons. In this case, its drawback is that a relatively short string (S2) can be keeping an otherwise-unreachable other string (S1) alive, which might be much bigger. We have just encountered a real-world use case where this happens a lot and has detrimental effects on memory consumption: user-space JSON parsing. Any string values in the decoded data keep the entire JSON source string in memory when they are represented as string slices. 
(Note that the specifics of the JSON format are irrelevant, the same would happen for any other string-based serialization format where string values occur as substrings of the serialized data, JSON just happens to be a very common case of this.)\r\nFor comparison, JavaScript doesn't run into this problem, because the implementation of `JSON.parse` never creates sliced strings, because it can make a very reasonable assumption that the JSON source string is meant to be thrown away when the parsing operation is complete.\r\n\r\nAs a short-term mitigation, we have [just changed](https://chromium-review.googlesource.com/c/v8/v8/+/4749320) V8's implementation of `stringview_wtf16.slice` to make full copies when the substring is at most half as long as the original string.\r\nA possible longer-term engine-side mitigation might be to build a tracking mechanism across the entire managed heap that figures out which long strings are being used by which slices, and can then decide whether converting these slices to copies would be overall beneficial or not. That's fairly heavyweight machinery for solving a rather limited problem though, so while I'd be excited to see this prototyped, I'm not convinced that it would actually be a reasonable choice for engines to spend the engineering hours to build such machinery and, more importantly, the runtime CPU cycles to operate it.\r\n\r\nAs a proper solution, it would seem reasonable to me to add a way to let Wasm modules explicitly control the desired behavior:\r\n- One option for this would be to have two variants of the slice operation: `slice_allow_shallow` and `slice_force_copy` (illustrative names, to be bikeshedded).\r\n- Another option would be to have an explicit \"make this a simple/flat/direct string\" operation. That could be more generic/flexible (e.g. 
it could also be used to flatten ropes resulting from concatenation), but also seems harder to spec/name, because it would only affect implementation internals, with no change to observable values. It might also be somewhat harder to implement, because for maximum efficiency engines would have to fuse it with the preceding operation (to avoid allocating a short-lived intermediate string). For optimizing compilers that's not a big deal, but not every engine wants to implement an optimizing compiler, and not every function always runs in optimized mode.\r\n- There may well be additional options to accomplish the same goal.\r\n\r\nI don't feel strongly about how this will be solved, I'm just saying that I think it would make sense to have _some_ improvement over the status quo.\r\n\r\nWithout new instructions, copies of string slices can be forced by converting the string to an array and then back to a string (e.g. using `string.encode_wtf16_array` + `string.new_wtf16_array`, or their wtf8 equivalents); engines could conceivably learn to recognize this pattern and shortcut it to avoid the double copy (but not shortcut it too much: if they avoided both copies, that would defeat the point). IMHO this would not be a pretty solution though; it feels similar to dubious JavaScript performance tricks like \"temporarily install this object as the prototype of some other dummy object just to make the JS engine switch its internal representation into a certain mode\".\r\n\r\n(Same discussion for imported strings: https://github.com/WebAssembly/js-string-builtins/issues/1)","author":{"url":"https://github.com/jakobkummerow","@type":"Person","name":"jakobkummerow"},"datePublished":"2023-08-04T10:44:07.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":1},"url":"https://github.com/WebAssembly/stringref/issues/63"}
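
The trade-off described in the issue body can be modelled in a few lines. The Python sketch below is only an illustration, not V8's code: the names FlatString, SlicedString, slice_string and retained_chars are hypothetical, and the only rule taken from the issue is the short-term mitigation of copying when the substring is at most half as long as the original string. Any other conditions a real engine applies are not modelled here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class FlatString:
    """Hypothetical model of a string that owns its characters."""
    chars: str

    def __len__(self) -> int:
        return len(self.chars)

    def retained_chars(self) -> int:
        # A flat string keeps only its own characters alive.
        return len(self.chars)


@dataclass(frozen=True)
class SlicedString:
    """Hypothetical model of a slice: parent reference, start offset, length."""
    parent: FlatString
    start: int
    length: int

    def __len__(self) -> int:
        return self.length

    def retained_chars(self) -> int:
        # The slice keeps the entire parent alive, however short the slice is.
        return len(self.parent)


def slice_string(s: FlatString, start: int, end: int) -> Union[FlatString, SlicedString]:
    """Return s[start:end], copying when the result is at most half of s."""
    if not 0 <= start <= end <= len(s):
        raise ValueError("slice bounds out of range")
    length = end - start
    if length * 2 <= len(s):
        # Short substring: copy, so the parent can be collected independently.
        return FlatString(s.chars[start:end])
    # Long substring: share the parent's characters instead of copying them.
    return SlicedString(s, start, length)


source = FlatString("x" * 100)
short = slice_string(source, 0, 10)   # copied: retains 10 characters
long_ = slice_string(source, 0, 80)   # sliced: retains all 100 characters
```

In this model a 10-character value extracted from a 100-character source retains only its own 10 characters, while an 80-character value still pins the whole source, which is the case where sharing is cheap relative to what it keeps alive.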


Links:

Substrings: slices vs. copies https://github.com/WebAssembly/stringref/issues/63#top
jakobkummerow https://github.com/jakobkummerow
on Aug 4, 2023 https://github.com/WebAssembly/stringref/issues/63#issue-1836504976
just changed https://chromium-review.googlesource.com/c/v8/v8/+/4749320
WebAssembly/js-string-builtins#1 https://github.com/WebAssembly/js-string-builtins/issues/1
