René's URL Explorer Experiment


Title: GitHub - giganticode/codeprep: A toolkit for pre-processing large source code corpora

Open Graph Title: GitHub - giganticode/codeprep: A toolkit for pre-processing large source code corpora

X Title: GitHub - giganticode/codeprep: A toolkit for pre-processing large source code corpora

Description: A toolkit for pre-processing large source code corpora - giganticode/codeprep

Open Graph Description: A toolkit for pre-processing large source code corpora - giganticode/codeprep

X Description: A toolkit for pre-processing large source code corpora - giganticode/codeprep

Open Graph URL: https://github.com/giganticode/codeprep

X: @github

direct link

Domain: patch-diff.githubusercontent.com

route-pattern: /:user_id/:repository
route-controller: files
route-action: disambiguate
hovercard-subject-tag: repository:179685171
github-keyboard-shortcuts: repository,copilot
google-site-verification: Apib7-x98H0j5cPqHWwSMm6dNU4GmODRoqxLiDzdx9I
octolytics-url: https://collector.github.com/github/collect
analytics-location: //
fb:app_id: 1401488693436528
apple-itunes-app: app-id=1477376905, app-argument=https://github.com/giganticode/codeprep
twitter:image: https://opengraph.githubassets.com/73df08590576c6508e5f4e880029b8f1eece375c38c08061156eeef0dd7f145f/giganticode/codeprep
twitter:card: summary_large_image
og:image: https://opengraph.githubassets.com/73df08590576c6508e5f4e880029b8f1eece375c38c08061156eeef0dd7f145f/giganticode/codeprep
og:image:alt: A toolkit for pre-processing large source code corpora - giganticode/codeprep
og:image:width: 1200
og:image:height: 600
og:site_name: GitHub
og:type: object
hostname: github.com
expected-hostname: github.com
turbo-cache-control: no-preview
go-import: github.com/giganticode/codeprep git https://github.com/giganticode/codeprep.git
octolytics-dimension-user_id: 49310057
octolytics-dimension-user_login: giganticode
octolytics-dimension-repository_id: 179685171
octolytics-dimension-repository_nwo: giganticode/codeprep
octolytics-dimension-repository_public: true
octolytics-dimension-repository_is_fork: false
octolytics-dimension-repository_network_root_id: 179685171
octolytics-dimension-repository_network_root_nwo: giganticode/codeprep
turbo-body-classes: logged-out env-production page-responsive
disable-turbo: false
browser-stats-url: https://api.github.com/_private/browser/stats
browser-errors-url: https://api.github.com/_private/browser/errors
release: 3b33c5aedc9808f45bc5fcf0b1e4404cf749dac7
ui-target: full
theme-color: #1e2327
color-scheme: light dark

Links:

giganticode https://patch-diff.githubusercontent.com/giganticode
codeprep https://patch-diff.githubusercontent.com/giganticode/codeprep
45 stars https://patch-diff.githubusercontent.com/giganticode/codeprep/stargazers
10 forks https://patch-diff.githubusercontent.com/giganticode/codeprep/forks
Branches https://patch-diff.githubusercontent.com/giganticode/codeprep/branches
Tags https://patch-diff.githubusercontent.com/giganticode/codeprep/tags
Activity https://patch-diff.githubusercontent.com/giganticode/codeprep/activity
Code https://patch-diff.githubusercontent.com/giganticode/codeprep
Issues 6 https://patch-diff.githubusercontent.com/giganticode/codeprep/issues
Pull requests 2 https://patch-diff.githubusercontent.com/giganticode/codeprep/pulls
Actions https://patch-diff.githubusercontent.com/giganticode/codeprep/actions
Projects 0 https://patch-diff.githubusercontent.com/giganticode/codeprep/projects
Security 0 https://patch-diff.githubusercontent.com/giganticode/codeprep/security
Insights https://patch-diff.githubusercontent.com/giganticode/codeprep/pulse
319 Commits https://patch-diff.githubusercontent.com/giganticode/codeprep/commits/master/
.idea https://patch-diff.githubusercontent.com/giganticode/codeprep/tree/master/.idea
.reuse https://patch-diff.githubusercontent.com/giganticode/codeprep/tree/master/.reuse
LICENSES https://patch-diff.githubusercontent.com/giganticode/codeprep/tree/master/LICENSES
codeprep https://patch-diff.githubusercontent.com/giganticode/codeprep/tree/master/codeprep
reports/bpe/wild-bpe https://patch-diff.githubusercontent.com/giganticode/codeprep/tree/master/reports/bpe/wild-bpe
test-data/test-corpus https://patch-diff.githubusercontent.com/giganticode/codeprep/tree/master/test-data/test-corpus
tests https://patch-diff.githubusercontent.com/giganticode/codeprep/tree/master/tests
.gitignore https://patch-diff.githubusercontent.com/giganticode/codeprep/blob/master/.gitignore
.travis.yml https://patch-diff.githubusercontent.com/giganticode/codeprep/blob/master/.travis.yml
MANIFEST.in https://patch-diff.githubusercontent.com/giganticode/codeprep/blob/master/MANIFEST.in
README.md https://patch-diff.githubusercontent.com/giganticode/codeprep/blob/master/README.md
requirements-dev.txt https://patch-diff.githubusercontent.com/giganticode/codeprep/blob/master/requirements-dev.txt
requirements.txt https://patch-diff.githubusercontent.com/giganticode/codeprep/blob/master/requirements.txt
setup.py https://patch-diff.githubusercontent.com/giganticode/codeprep/blob/master/setup.py
tox.ini https://patch-diff.githubusercontent.com/giganticode/codeprep/blob/master/tox.ini
README https://patch-diff.githubusercontent.com/giganticode/codeprep
https://patch-diff.githubusercontent.com/giganticode/codeprep#codeprep
https://travis-ci.org/giganticode/codeprep
https://codeclimate.com/github/giganticode/codeprep/maintainability
https://codeclimate.com/github/giganticode/codeprep/test_coverage
https://pypi.python.org/pypi/codeprep/
http://joss.theoj.org/papers/10.21105/joss.00653
https://patch-diff.githubusercontent.com/giganticode/codeprep#getting-started
https://github.com/casics/spiral/
here https://patch-diff.githubusercontent.com/giganticode/codeprep/blob/master/codeprep/cli/spec.py
https://patch-diff.githubusercontent.com/giganticode/codeprep#usage-examples
https://patch-diff.githubusercontent.com/giganticode/codeprep#basic-splitting
https://patch-diff.githubusercontent.com/giganticode/codeprep#tokenize-but-dont-split-identifiers
https://patch-diff.githubusercontent.com/giganticode/codeprep#bpe-byte-pair-encoding
the GitHub Java Corpus http://groups.inf.ed.ac.uk/cup/javaGithub/
Learning custom BPE codes https://patch-diff.githubusercontent.com/giganticode/codeprep#Learning-custom-BPE-codes
docs https://patch-diff.githubusercontent.com/giganticode/codeprep/blob/master/codeprep/cli/spec.py
https://patch-diff.githubusercontent.com/giganticode/codeprep#calculate-vocabulary
https://patch-diff.githubusercontent.com/giganticode/codeprep#learning-custom-bpe-codes
basic preprocessing https://patch-diff.githubusercontent.com/giganticode/codeprep#basic-splitting
Tweaking preprocessing https://patch-diff.githubusercontent.com/giganticode/codeprep#tweaking-preprocessing
https://patch-diff.githubusercontent.com/giganticode/codeprep#additional-options
https://patch-diff.githubusercontent.com/giganticode/codeprep#tweaking-preprocessing
https://patch-diff.githubusercontent.com/giganticode/codeprep#specifying-the-language
https://patch-diff.githubusercontent.com/giganticode/codeprep#miscellaneous
https://patch-diff.githubusercontent.com/giganticode/codeprep#getting-help
https://patch-diff.githubusercontent.com/giganticode/codeprep#paper
Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code https://arxiv.org/pdf/2003.07914.pdf
https://patch-diff.githubusercontent.com/giganticode/codeprep#advanced
https://patch-diff.githubusercontent.com/giganticode/codeprep#caching
https://patch-diff.githubusercontent.com/giganticode/codeprep#releases
https://patch-diff.githubusercontent.com/giganticode/codeprep#103
https://patch-diff.githubusercontent.com/giganticode/codeprep#101
https://patch-diff.githubusercontent.com/giganticode/codeprep#100
https://patch-diff.githubusercontent.com/giganticode/codeprep#100-alpha12
https://patch-diff.githubusercontent.com/giganticode/codeprep#100-alpha11-not-backward-compatible-with-100-alpha10
https://patch-diff.githubusercontent.com/giganticode/codeprep#100-alpha10-not-backward-compatible-with-100-alpha9
https://patch-diff.githubusercontent.com/giganticode/codeprep#100-alpha9-not-backward-compatible-with-100-alpha7
https://patch-diff.githubusercontent.com/giganticode/codeprep#100-alpha7-not-backward-compatible-with-100-alpha6
https://patch-diff.githubusercontent.com/giganticode/codeprep#100-alpha6
natural-language-processing https://patch-diff.githubusercontent.com/topics/natural-language-processing
language-modeling https://patch-diff.githubusercontent.com/topics/language-modeling
word-segmentation https://patch-diff.githubusercontent.com/topics/word-segmentation
mining-software-repositories https://patch-diff.githubusercontent.com/topics/mining-software-repositories
source-code-analysis https://patch-diff.githubusercontent.com/topics/source-code-analysis
Readme https://patch-diff.githubusercontent.com/giganticode/codeprep#readme-ov-file
Custom properties https://patch-diff.githubusercontent.com/giganticode/codeprep/custom-properties
3 watching https://patch-diff.githubusercontent.com/giganticode/codeprep/watchers
Releases 3 https://patch-diff.githubusercontent.com/giganticode/codeprep/releases
v1.0.5 Latest Apr 21, 2021 https://patch-diff.githubusercontent.com/giganticode/codeprep/releases/tag/v1.0.5
+ 2 releases https://patch-diff.githubusercontent.com/giganticode/codeprep/releases
Packages 0 https://patch-diff.githubusercontent.com/orgs/giganticode/packages?repo_name=codeprep
Contributors 4 https://patch-diff.githubusercontent.com/giganticode/codeprep/graphs/contributors
Python 91.5% https://patch-diff.githubusercontent.com/giganticode/codeprep/search?l=python
Java 8.5% https://patch-diff.githubusercontent.com/giganticode/codeprep/search?l=java

Viewport: width=device-width


URLs of crawlers that visited me.