René's URL Explorer Experiment


Title: GitHub - deependujha/litdata: Streamline data pipelines for AI. Process datasets across 1000s of machines, and optimize data for blazing fast model training.

Description: Streamline data pipelines for AI. Process datasets across 1000s of machines, and optimize data for blazing fast model training. - deependujha/litdata

Opengraph URL: https://github.com/deependujha/litdata

X: @github

Domain: patch-diff.githubusercontent.com

route-pattern: /:user_id/:repository
route-controller: files
route-action: disambiguate
hovercard-subject-tag: repository:809607184
github-keyboard-shortcuts: repository,copilot
google-site-verification: Apib7-x98H0j5cPqHWwSMm6dNU4GmODRoqxLiDzdx9I
octolytics-url: https://collector.github.com/github/collect
analytics-location: //
fb:app_id: 1401488693436528
apple-itunes-app: app-id=1477376905, app-argument=https://github.com/deependujha/litdata
twitter:image: https://opengraph.githubassets.com/801a1e0bfb86537e23f3ee8b31157bfc222b2892902a5d6a6f62e1cd079ba6e9/deependujha/litdata
twitter:card: summary_large_image
og:image: https://opengraph.githubassets.com/801a1e0bfb86537e23f3ee8b31157bfc222b2892902a5d6a6f62e1cd079ba6e9/deependujha/litdata
og:image:alt: Streamline data pipelines for AI. Process datasets across 1000s of machines, and optimize data for blazing fast model training. - deependujha/litdata
og:image:width: 1200
og:image:height: 600
og:site_name: GitHub
og:type: object
hostname: github.com
expected-hostname: github.com
turbo-cache-control: no-preview
go-import: github.com/deependujha/litdata git https://github.com/deependujha/litdata.git
octolytics-dimension-user_id: 76887609
octolytics-dimension-user_login: deependujha
octolytics-dimension-repository_id: 809607184
octolytics-dimension-repository_nwo: deependujha/litdata
octolytics-dimension-repository_public: true
octolytics-dimension-repository_is_fork: true
octolytics-dimension-repository_parent_id: 758163683
octolytics-dimension-repository_parent_nwo: Lightning-AI/litData
octolytics-dimension-repository_network_root_id: 758163683
octolytics-dimension-repository_network_root_nwo: Lightning-AI/litData
turbo-body-classes: logged-out env-production page-responsive
disable-turbo: false
browser-stats-url: https://api.github.com/_private/browser/stats
browser-errors-url: https://api.github.com/_private/browser/errors
release: 848bc6032dcc93a9a7301dcc3f379a72ba13b96e
ui-target: full
theme-color: #1e2327
color-scheme: light dark

Links:

deependujha https://patch-diff.githubusercontent.com/deependujha
litdatahttps://patch-diff.githubusercontent.com/deependujha/litdata
Lightning-AI/litDatahttps://patch-diff.githubusercontent.com/Lightning-AI/litData
Apache-2.0 license https://patch-diff.githubusercontent.com/deependujha/litdata/blob/main/LICENSE
0 stars https://patch-diff.githubusercontent.com/deependujha/litdata/stargazers
87 forks https://patch-diff.githubusercontent.com/deependujha/litdata/forks
Branches https://patch-diff.githubusercontent.com/deependujha/litdata/branches
Tags https://patch-diff.githubusercontent.com/deependujha/litdata/tags
Activity https://patch-diff.githubusercontent.com/deependujha/litdata/activity
Code https://patch-diff.githubusercontent.com/deependujha/litdata
Pull requests 0 https://patch-diff.githubusercontent.com/deependujha/litdata/pulls
Actions https://patch-diff.githubusercontent.com/deependujha/litdata/actions
Projects 0 https://patch-diff.githubusercontent.com/deependujha/litdata/projects
Security 0 https://patch-diff.githubusercontent.com/deependujha/litdata/security
Insights https://patch-diff.githubusercontent.com/deependujha/litdata/pulse
628 Commitshttps://patch-diff.githubusercontent.com/deependujha/litdata/commits/main/
.githubhttps://patch-diff.githubusercontent.com/deependujha/litdata/tree/main/.github
benchmarkshttps://patch-diff.githubusercontent.com/deependujha/litdata/tree/main/benchmarks
docshttps://patch-diff.githubusercontent.com/deependujha/litdata/tree/main/docs
exampleshttps://patch-diff.githubusercontent.com/deependujha/litdata/tree/main/examples
requirementshttps://patch-diff.githubusercontent.com/deependujha/litdata/tree/main/requirements
src/litdatahttps://patch-diff.githubusercontent.com/deependujha/litdata/tree/main/src/litdata
testshttps://patch-diff.githubusercontent.com/deependujha/litdata/tree/main/tests
.codecov.ymlhttps://patch-diff.githubusercontent.com/deependujha/litdata/blob/main/.codecov.yml
.gitignorehttps://patch-diff.githubusercontent.com/deependujha/litdata/blob/main/.gitignore
.pre-commit-config.yamlhttps://patch-diff.githubusercontent.com/deependujha/litdata/blob/main/.pre-commit-config.yaml
CONTRIBUTING.mdhttps://patch-diff.githubusercontent.com/deependujha/litdata/blob/main/CONTRIBUTING.md
LICENSEhttps://patch-diff.githubusercontent.com/deependujha/litdata/blob/main/LICENSE
MANIFEST.inhttps://patch-diff.githubusercontent.com/deependujha/litdata/blob/main/MANIFEST.in
Makefilehttps://patch-diff.githubusercontent.com/deependujha/litdata/blob/main/Makefile
README.mdhttps://patch-diff.githubusercontent.com/deependujha/litdata/blob/main/README.md
pyproject.tomlhttps://patch-diff.githubusercontent.com/deependujha/litdata/blob/main/pyproject.toml
requirements.txthttps://patch-diff.githubusercontent.com/deependujha/litdata/blob/main/requirements.txt
setup.pyhttps://patch-diff.githubusercontent.com/deependujha/litdata/blob/main/setup.py
https://patch-diff.githubusercontent.com/deependujha/litdata#--speed-up-model-training-by-fixing-data-loading
https://camo.githubusercontent.com/39ff93ac5b725c48926842066606d343a3062838b9ecc1ce044ddc03e0b6ad2a/68747470733a2f2f706c2d666c6173682d646174612e73332e616d617a6f6e6177732e636f6d2f6c69745f646174615f6c6f676f2e77656270
https://camo.githubusercontent.com/c392e265c0564ced78496eb97f97bde39a21a2cd866909cc9fe504bc62558658/68747470733a2f2f696d672e736869656c64732e696f2f707970692f762f6c697464617461
https://camo.githubusercontent.com/b9303fe70c26e650d1b9a18e19e9ededdbcc44078696e84acbeda18c6ee8a596/68747470733a2f2f696d672e736869656c64732e696f2f707970692f646d2f6c697464617461
https://camo.githubusercontent.com/08c5206cacac68b79eaf4a4a572518baad12cc703d5ab4bec1a464087d38481f/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6963656e73652f4c696768746e696e672d41492f6c697464617461
https://discord.gg/VptPCZkGNa
Lightning AIhttps://lightning.ai/
Quick starthttps://patch-diff.githubusercontent.com/deependujha/litdata#quick-start
Optimize datahttps://patch-diff.githubusercontent.com/deependujha/litdata#speed-up-model-training
Transform datahttps://patch-diff.githubusercontent.com/deependujha/litdata#transform-datasets
Featureshttps://patch-diff.githubusercontent.com/deependujha/litdata#key-features
Benchmarkshttps://patch-diff.githubusercontent.com/deependujha/litdata#benchmarks
Templateshttps://patch-diff.githubusercontent.com/deependujha/litdata#start-from-a-template
Communityhttps://patch-diff.githubusercontent.com/deependujha/litdata#community
https://lightning.ai/docs/overview/optimize-data/optimize-datasets
https://patch-diff.githubusercontent.com/deependujha/litdata#why-litdata
https://patch-diff.githubusercontent.com/deependujha/litdata#looking-for-gpus
Lightning Cloudhttps://lightning.ai/?utm_source=litdata&utm_medium=referral&utm_campaign=litdata
GPUshttps://lightning.ai/pricing?utm_source=litdata&utm_medium=referral&utm_campaign=litdata
Clustershttps://lightning.ai/clusters?utm_source=litdata&utm_medium=referral&utm_campaign=litdata
AI Studio (vibe train)https://lightning.ai/studios?utm_source=litdata&utm_medium=referral&utm_campaign=litdata
AI Studio (vibe deploy)https://lightning.ai/studios?utm_source=litdata&utm_medium=referral&utm_campaign=litdata
Notebookshttps://lightning.ai/notebooks?utm_source=litdata&utm_medium=referral&utm_campaign=litdata
Inferencehttps://lightning.ai/deploy?utm_source=litdata&utm_medium=referral&utm_campaign=litdata
https://patch-diff.githubusercontent.com/deependujha/litdata#quick-start
Speed up model traininghttps://patch-diff.githubusercontent.com/deependujha/litdata#speed-up-model-training
Transform datasetshttps://patch-diff.githubusercontent.com/deependujha/litdata#transform-datasets
https://patch-diff.githubusercontent.com/deependujha/litdata#speed-up-model-training
https://patch-diff.githubusercontent.com/deependujha/litdata#option-1-start-immediately-with-existing-data-
https://patch-diff.githubusercontent.com/deependujha/litdata#option-2-optimize-for-maximum-performance-
Lightning Studiohttps://lightning.ai
https://patch-diff.githubusercontent.com/deependujha/litdata#transform-datasets
https://patch-diff.githubusercontent.com/deependujha/litdata#key-features
https://patch-diff.githubusercontent.com/deependujha/litdata#features-for-optimizing-and-streaming-datasets-for-model-training
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#stream-raw
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#stream-large
S3https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html#boto3.session.Session.client
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#stream-hf
https://patch-diff.githubusercontent.com/deependujha/litdata#indexing-the-hf-dataset-optional
https://patch-diff.githubusercontent.com/deependujha/litdata#full-workflow-for-hugging-face-datasets
https://patch-diff.githubusercontent.com/deependujha/litdata#litdata-optimize-vs-parquet
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#multi-gpu
PyTorch Lightninghttps://lightning.ai/docs/pytorch/stable/
Lightning Fabrichttps://lightning.ai/docs/fabric/stable/
PyTorchhttps://pytorch.org/docs/stable/index.html
https://camo.githubusercontent.com/50ebb2f4380415afa9a55f400a953be6ded91b1615514c9ec1d2cc0a568eb05f/68747470733a2f2f706c2d666c6173682d646174612e73332e616d617a6f6e6177732e636f6d2f73747265616d696e675f646174617365742e676966
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#cloud-providers
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#pause-resume
Lit-GPThttps://github.com/Lightning-AI/litgpt/blob/main/tutorials/pretrain_tinyllama.md
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#shared-queue
https://patch-diff.githubusercontent.com/deependujha/litdata#performance-difference-between-using-a-shared-queue-and-not-using-it
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#queue-input
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#llm-training
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#filter-data
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#combine-datasets
Slimpajamahttps://huggingface.co/datasets/cerebras/SlimPajama-627B
StarCoderhttps://huggingface.co/datasets/bigcode/starcoderdata
TinyLLAMAhttps://github.com/jzhang38/TinyLlama
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#parallel-streaming
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#cycle-datasets
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#merge-datasets
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#transform-streaming
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#split-datasets
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#load-subset
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#upsample-datasets
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#modify-datasets
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#stream-parquet
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#compression
zstdhttps://github.com/facebook/zstd
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#access-samples
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#data-transforms
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#profile-loading
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#reduce-memory
parquet fileshttps://en.wikipedia.org/wiki/Apache_Parquet
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#limit-cache
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#cache-directory
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#networked-drives
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#distributed-optimization
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#encrypt-decrypt
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#debug-profile
Litracerhttps://github.com/deependujha/litracer/
LitRacer GitHub Releaseshttps://github.com/deependujha/litracer/releases
Perfetto documentationhttps://perfetto.dev/docs/visualization/large-traces
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#lightning-connections
Lightning Studioshttps://lightning.ai/
https://patch-diff.githubusercontent.com/deependujha/litdata#features-for-transforming-datasets
🔗https://patch-diff.githubusercontent.com/deependujha/litdata#map
https://patch-diff.githubusercontent.com/deependujha/litdata#benchmarks
Reproduce the benchmarkhttps://lightning.ai/lightning-ai/studios/benchmark-cloud-data-loading-libraries
https://patch-diff.githubusercontent.com/deependujha/litdata#streaming-speed
https://patch-diff.githubusercontent.com/deependujha/litdata#litdata-chunks
Imagenet-1.2M datasethttps://www.image-net.org/
AWS S3https://aws.amazon.com/s3/
https://patch-diff.githubusercontent.com/deependujha/litdata#raw-dataset
https://patch-diff.githubusercontent.com/deependujha/litdata#time-to-optimize-data
https://patch-diff.githubusercontent.com/deependujha/litdata#parallelize-transforms-and-data-optimization-on-cloud-machines
https://camo.githubusercontent.com/426b52965f8cbc123bc2ec5a8e6bb437817feb54c84c4bd80cc2a52046e8e853/68747470733a2f2f706c2d666c6173682d646174612e73332e616d617a6f6e6177732e636f6d2f646174612d707265702e6a7067
https://patch-diff.githubusercontent.com/deependujha/litdata#parallelize-data-transforms
Lightning Studioshttps://lightning.ai/
https://patch-diff.githubusercontent.com/deependujha/litdata#parallelize-data-optimization
Lightning Studioshttps://lightning.ai/
Process the LAION 400 million image dataset in 2 hours on 32 machines, each with 32 CPUshttps://lightning.ai/lightning-ai/studios/use-or-explore-laion-400million-dataset
https://patch-diff.githubusercontent.com/deependujha/litdata#start-from-a-template
https://patch-diff.githubusercontent.com/deependujha/litdata#templates-transform-datasets
Download LAION-400MILLION datasethttps://lightning.ai/lightning-ai/studios/use-or-explore-laion-400million-dataset
LAION-400Mhttps://laion.ai/blog/laion-400-open-dataset/
Tokenize 2M Swedish Wikipedia Articleshttps://lightning.ai/lightning-ai/studios/tokenize-2m-swedish-wikipedia-articles
Swedish Wikipediahttps://huggingface.co/datasets/wikipedia
Embed English Wikipedia under 5 dollarshttps://lightning.ai/lightning-ai/studios/embed-english-wikipedia-under-5-dollars
English Wikipediahttps://huggingface.co/datasets/wikipedia
https://patch-diff.githubusercontent.com/deependujha/litdata#templates-optimize--stream-data
Benchmark cloud data-loading librarieshttps://lightning.ai/lightning-ai/studios/benchmark-cloud-data-loading-libraries
Imagenet 1Mhttps://paperswithcode.com/sota/image-classification-on-imagenet?tag_filter=171
Optimize GeoSpatial data for model traininghttps://lightning.ai/lightning-ai/studios/convert-spatial-data-to-lightning-streaming
Chesapeake Roads Spatial Contexthttps://github.com/isaaccorley/chesapeakersc
Optimize TinyLlama 1T dataset for traininghttps://lightning.ai/lightning-ai/studios/prepare-the-tinyllama-1t-token-dataset
SlimPajamahttps://huggingface.co/datasets/cerebras/SlimPajama-627B
StarCoderhttps://huggingface.co/datasets/bigcode/starcoderdata
Optimize parquet files for model traininghttps://lightning.ai/lightning-ai/studios/convert-parquets-to-lightning-streaming
https://patch-diff.githubusercontent.com/deependujha/litdata#community
Get help on Discordhttps://discord.com/invite/XncpTy7DSt
License: Apache 2.0https://github.com/Lightning-AI/litdata/blob/main/LICENSE
https://patch-diff.githubusercontent.com/deependujha/litdata#citation
https://patch-diff.githubusercontent.com/deependujha/litdata#papers-with-litdata
Towards Interpretable Protein Structure Prediction with Sparse Autoencodershttps://arxiv.org/pdf/2503.08764
Githubhttps://github.com/johnyang101/reticular-sae
https://patch-diff.githubusercontent.com/deependujha/litdata#governance
https://patch-diff.githubusercontent.com/deependujha/litdata#maintainers
tchatonhttps://github.com/tchaton
bhimrazyhttps://github.com/bhimrazy
deependujhahttps://github.com/deependujha
https://patch-diff.githubusercontent.com/deependujha/litdata#emeritus-maintainers
lantigahttps://github.com/lantiga
justusschockhttps://github.com/justusschock
Bordahttps://github.com/Borda
awaelchlihttps://github.com/awaelchli
Readme https://patch-diff.githubusercontent.com/deependujha/litdata#readme-ov-file
Apache-2.0 license https://patch-diff.githubusercontent.com/deependujha/litdata#Apache-2.0-1-ov-file
Contributing https://patch-diff.githubusercontent.com/deependujha/litdata#contributing-ov-file
0 watchinghttps://patch-diff.githubusercontent.com/deependujha/litdata/watchers
0 forkshttps://patch-diff.githubusercontent.com/deependujha/litdata/forks
Releaseshttps://patch-diff.githubusercontent.com/deependujha/litdata/releases
Packages 0https://patch-diff.githubusercontent.com/users/deependujha/packages?repo_name=litdata

Viewport: width=device-width