René's URL Explorer Experiment


Title: GitHub - CodeForget/olmocr: Toolkit for linearizing PDFs for LLM datasets/training

Description: Toolkit for linearizing PDFs for LLM datasets/training - CodeForget/olmocr

Opengraph URL: https://github.com/CodeForget/olmocr

X: @github

Domain: patch-diff.githubusercontent.com

route-pattern: /:user_id/:repository
route-controller: files
route-action: disambiguate
hovercard-subject-tag: repository:1061354792
github-keyboard-shortcuts: repository,copilot
google-site-verification: Apib7-x98H0j5cPqHWwSMm6dNU4GmODRoqxLiDzdx9I
octolytics-url: https://collector.github.com/github/collect
analytics-location: //
fb:app_id: 1401488693436528
apple-itunes-app: app-id=1477376905, app-argument=https://github.com/CodeForget/olmocr
twitter:image: https://opengraph.githubassets.com/f2d7195700a8163f884e04cb0f5907380f8955ec50aad1ffffd14c911c19c13e/CodeForget/olmocr
twitter:card: summary_large_image
og:image: https://opengraph.githubassets.com/f2d7195700a8163f884e04cb0f5907380f8955ec50aad1ffffd14c911c19c13e/CodeForget/olmocr
og:image:alt: Toolkit for linearizing PDFs for LLM datasets/training - CodeForget/olmocr
og:image:width: 1200
og:image:height: 600
og:site_name: GitHub
og:type: object
hostname: github.com
expected-hostname: github.com
turbo-cache-control: no-preview
go-import: github.com/CodeForget/olmocr git https://github.com/CodeForget/olmocr.git
octolytics-dimension-user_id: 21257221
octolytics-dimension-user_login: CodeForget
octolytics-dimension-repository_id: 1061354792
octolytics-dimension-repository_nwo: CodeForget/olmocr
octolytics-dimension-repository_public: true
octolytics-dimension-repository_is_fork: true
octolytics-dimension-repository_parent_id: 858798469
octolytics-dimension-repository_parent_nwo: allenai/olmocr
octolytics-dimension-repository_network_root_id: 858798469
octolytics-dimension-repository_network_root_nwo: allenai/olmocr
turbo-body-classes: logged-out env-production page-responsive
disable-turbo: false
browser-stats-url: https://api.github.com/_private/browser/stats
browser-errors-url: https://api.github.com/_private/browser/errors
release: 520b65a872113b919c1bbdb03834a50af15859fd
ui-target: full
theme-color: #1e2327
color-scheme: light dark

Links:

CodeForget https://patch-diff.githubusercontent.com/CodeForget
olmocr https://patch-diff.githubusercontent.com/CodeForget/olmocr
allenai/olmocr https://patch-diff.githubusercontent.com/allenai/olmocr
Apache-2.0 license https://patch-diff.githubusercontent.com/CodeForget/olmocr/blob/main/LICENSE
0 stars https://patch-diff.githubusercontent.com/CodeForget/olmocr/stargazers
1.3k forks https://patch-diff.githubusercontent.com/CodeForget/olmocr/forks
Branches https://patch-diff.githubusercontent.com/CodeForget/olmocr/branches
Tags https://patch-diff.githubusercontent.com/CodeForget/olmocr/tags
Activity https://patch-diff.githubusercontent.com/CodeForget/olmocr/activity
Code https://patch-diff.githubusercontent.com/CodeForget/olmocr
Pull requests 0 https://patch-diff.githubusercontent.com/CodeForget/olmocr/pulls
Actions https://patch-diff.githubusercontent.com/CodeForget/olmocr/actions
Projects 0 https://patch-diff.githubusercontent.com/CodeForget/olmocr/projects
Security 0 https://patch-diff.githubusercontent.com/CodeForget/olmocr/security
Insights https://patch-diff.githubusercontent.com/CodeForget/olmocr/pulse
1,498 Commits https://patch-diff.githubusercontent.com/CodeForget/olmocr/commits/main/
.github https://patch-diff.githubusercontent.com/CodeForget/olmocr/tree/main/.github
docs https://patch-diff.githubusercontent.com/CodeForget/olmocr/tree/main/docs
olmocr https://patch-diff.githubusercontent.com/CodeForget/olmocr/tree/main/olmocr
scripts https://patch-diff.githubusercontent.com/CodeForget/olmocr/tree/main/scripts
tests https://patch-diff.githubusercontent.com/CodeForget/olmocr/tree/main/tests
.dockerignore https://patch-diff.githubusercontent.com/CodeForget/olmocr/blob/main/.dockerignore
.gitignore https://patch-diff.githubusercontent.com/CodeForget/olmocr/blob/main/.gitignore
.readthedocs.yaml https://patch-diff.githubusercontent.com/CodeForget/olmocr/blob/main/.readthedocs.yaml
CHANGELOG.md https://patch-diff.githubusercontent.com/CodeForget/olmocr/blob/main/CHANGELOG.md
Dockerfile https://patch-diff.githubusercontent.com/CodeForget/olmocr/blob/main/Dockerfile
LICENSE https://patch-diff.githubusercontent.com/CodeForget/olmocr/blob/main/LICENSE
Makefile https://patch-diff.githubusercontent.com/CodeForget/olmocr/blob/main/Makefile
README.md https://patch-diff.githubusercontent.com/CodeForget/olmocr/blob/main/README.md
RELEASE_PROCESS.md https://patch-diff.githubusercontent.com/CodeForget/olmocr/blob/main/RELEASE_PROCESS.md
pyproject.toml https://patch-diff.githubusercontent.com/CodeForget/olmocr/blob/main/pyproject.toml
https://github.com/allenai/OLMo/blob/main/LICENSE
https://github.com/allenai/olmocr/releases
https://olmocr.allenai.org/papers/olmocr.pdf
https://olmocr.allenai.org
https://discord.gg/sZq3jTNVNG
https://olmocr.allenai.org/
https://patch-diff.githubusercontent.com/CodeForget/olmocr#news
New model release https://huggingface.co/allenai/olmOCR-7B-0825-FP8
New model release https://huggingface.co/allenai/olmOCR-7B-0725-FP8
olmOCR-Bench https://github.com/allenai/olmocr/tree/main/olmocr/bench
trainer code https://github.com/allenai/olmocr/tree/main/olmocr/train
See Docker usage https://patch-diff.githubusercontent.com/CodeForget/olmocr#using-docker
https://patch-diff.githubusercontent.com/CodeForget/olmocr#benchmark
https://patch-diff.githubusercontent.com/CodeForget/olmocr#installation
https://patch-diff.githubusercontent.com/CodeForget/olmocr#local-usage-example
web demo https://olmocr.allen.ai/
sglang https://github.com/sgl-project/sglang
https://patch-diff.githubusercontent.com/CodeForget/olmocr#using-external-vllm-server
https://patch-diff.githubusercontent.com/CodeForget/olmocr#viewing-results
Dolma https://github.com/allenai/dolma
https://patch-diff.githubusercontent.com/CodeForget/olmocr#multi-node--cluster-usage
beaker https://www.beaker.org
https://patch-diff.githubusercontent.com/CodeForget/olmocr#using-docker
Docker Hub https://hub.docker.com/r/alleninstituteforai/olmocr
https://patch-diff.githubusercontent.com/CodeForget/olmocr#full-documentation-for-the-pipeline
https://patch-diff.githubusercontent.com/CodeForget/olmocr#code-overview
buildsilver.py https://github.com/allenai/olmocr/blob/main/olmocr/data/buildsilver.py
runeval.py https://github.com/allenai/olmocr/blob/main/olmocr/eval/runeval.py
filter.py https://github.com/allenai/olmocr/blob/main/olmocr/filter/filter.py
train.py https://github.com/allenai/olmocr/blob/main/olmocr/train/train.py
pipeline.py https://github.com/allenai/olmocr/blob/main/olmocr/pipeline.py
Dolma docs https://github.com/allenai/dolma
dolmaviewer.py https://github.com/allenai/olmocr/blob/main/olmocr/viewer/dolmaviewer.py
https://patch-diff.githubusercontent.com/CodeForget/olmocr#team
the Allen Institute for Artificial Intelligence (AI2) https://allenai.org/
our contributors https://github.com/allenai/olmocr/graphs/contributors
https://patch-diff.githubusercontent.com/CodeForget/olmocr#license
Apache 2.0 https://www.apache.org/licenses/LICENSE-2.0
on GitHub https://github.com/allenai/olmocr/blob/main/LICENSE
https://patch-diff.githubusercontent.com/CodeForget/olmocr#citing
Readme https://patch-diff.githubusercontent.com/CodeForget/olmocr#readme-ov-file
Apache-2.0 license https://patch-diff.githubusercontent.com/CodeForget/olmocr#Apache-2.0-1-ov-file
0 watching https://patch-diff.githubusercontent.com/CodeForget/olmocr/watchers
0 forks https://patch-diff.githubusercontent.com/CodeForget/olmocr/forks
Releases https://patch-diff.githubusercontent.com/CodeForget/olmocr/releases
Packages 0 https://patch-diff.githubusercontent.com/users/CodeForget/packages?repo_name=olmocr

Viewport: width=device-width

