René's URL Explorer Experiment


Title: Resolve tokenization issues causing BitFunnel parser crashes by hausdorff · Pull Request #6 · BitFunnel/Workbench · GitHub

Open Graph Title: Resolve tokenization issues causing BitFunnel parser crashes by hausdorff · Pull Request #6 · BitFunnel/Workbench

Description: Java and Lucene based tools for BitFunnel corpus preparation - Resolve tokenization issues causing BitFunnel parser crashes by hausdorff · Pull Request #6 · BitFunnel/Workbench

Open Graph Description: The corpus as processed by the current version of Workbench contains characters (mostly punctuation) that cause the BitFunnel parser to crash. This commit will cause Workbench to handle these cases...

Opengraph URL: https://github.com/BitFunnel/Workbench/pull/6

X: @github

Domain: patch-diff.githubusercontent.com

twitter:image: https://opengraph.githubassets.com/debe6a1c5ec661b72dec565701eef0909ade2814819cf3958812d6a502ca12cb/BitFunnel/Workbench/pull/6
twitter:card: summary_large_image
og:image: https://opengraph.githubassets.com/debe6a1c5ec661b72dec565701eef0909ade2814819cf3958812d6a502ca12cb/BitFunnel/Workbench/pull/6
og:image:width: 1200
og:image:height: 600
og:site_name: GitHub
og:type: object
og:author:username: hausdorff
hostname: github.com
expected-hostname: github.com
go-import: github.com/BitFunnel/Workbench git https://github.com/BitFunnel/Workbench.git
octolytics-dimension-user_id: 18270860
octolytics-dimension-user_login: BitFunnel
octolytics-dimension-repository_id: 58910970
octolytics-dimension-repository_nwo: BitFunnel/Workbench
octolytics-dimension-repository_public: true
octolytics-dimension-repository_is_fork: false
octolytics-dimension-repository_network_root_id: 58910970
octolytics-dimension-repository_network_root_nwo: BitFunnel/Workbench

Links:

BitFunnel https://patch-diff.githubusercontent.com/BitFunnel
Workbench https://patch-diff.githubusercontent.com/BitFunnel/Workbench
Fork 4 https://patch-diff.githubusercontent.com/login?return_to=%2FBitFunnel%2FWorkbench
Star 20 https://patch-diff.githubusercontent.com/login?return_to=%2FBitFunnel%2FWorkbench
Code https://patch-diff.githubusercontent.com/BitFunnel/Workbench
Issues 10 https://patch-diff.githubusercontent.com/BitFunnel/Workbench/issues
Pull requests 1 https://patch-diff.githubusercontent.com/BitFunnel/Workbench/pulls
Actions https://patch-diff.githubusercontent.com/BitFunnel/Workbench/actions
Projects 0 https://patch-diff.githubusercontent.com/BitFunnel/Workbench/projects
Security 0 https://patch-diff.githubusercontent.com/BitFunnel/Workbench/security
Insights https://patch-diff.githubusercontent.com/BitFunnel/Workbench/pulse
hausdorff https://patch-diff.githubusercontent.com/hausdorff
BitFunnel:master https://patch-diff.githubusercontent.com/BitFunnel/Workbench/tree/master
hausdorff:tokenizer_fixes https://patch-diff.githubusercontent.com/hausdorff/Workbench/tree/tokenizer_fixes
Conversation https://patch-diff.githubusercontent.com/BitFunnel/Workbench/pull/6
Commits (1) https://patch-diff.githubusercontent.com/BitFunnel/Workbench/pull/6/commits
Checks https://patch-diff.githubusercontent.com/BitFunnel/Workbench/pull/6/checks
Files changed https://patch-diff.githubusercontent.com/BitFunnel/Workbench/pull/6/files
hausdorff https://patch-diff.githubusercontent.com/hausdorff
Oct 28, 2016 https://patch-diff.githubusercontent.com/BitFunnel/Workbench/pull/6#issue-186012916
Resolve tokenization issues causing BitFunnel parser crashes https://patch-diff.githubusercontent.com/BitFunnel/Workbench/pull/6/commits/87caeed5d45b2c134d6ab9770666218243576174
87caeed https://patch-diff.githubusercontent.com/BitFunnel/Workbench/pull/6/commits/87caeed5d45b2c134d6ab9770666218243576174
MikeHopcroft https://patch-diff.githubusercontent.com/MikeHopcroft
Nov 23, 2016 https://patch-diff.githubusercontent.com/BitFunnel/Workbench/pull/6#ref-issue-173610254
Wikipedia extraction seems to be giving bigrams #3 https://patch-diff.githubusercontent.com/BitFunnel/Workbench/issues/3

Viewport: width=device-width


URLs of crawlers that visited me.