Title: ETA calculation is inaccurate · Issue #55 · PPPLDeepLearning/plasma-python · GitHub

Description: Example of the current per-step (iteration) diagnostic output provided by FRNN around epoch 22 of the D3D 0D model (run on 4 V100 GPUs of Traverse): [0] step: 0 [ETA: 468568011.02s] [0.00/1789], loss: 1.05701 [1.05701] | walltime: 5.7374...

Opengraph URL: https://github.com/PPPLDeepLearning/plasma-python/issues/55

Issue: ETA calculation is inaccurate
Author: felker (https://github.com/felker)
Published: 2020-01-07T02:15:58.000Z
Comments: 0
URL: https://github.com/PPPLDeepLearning/plasma-python/issues/55

Example of the current per-step (iteration) diagnostic output provided by FRNN around epoch 22 of the D3D 0D model (run on 4 V100 GPUs of Traverse):
```
[0] step: 0 [ETA: 468568011.02s] [0.00/1789], loss: 1.05701 [1.05701] | walltime: 5.7374 | 8.47E+02 Examples/sec | 6.04E-01 sec/batch [92.3% calc., 7.7% sync.][batch = 512 = 128*4] [lr = 7.30E-05 = 1.83E-05*4]
```
The ETA provided in this example is clearly inaccurate (each epoch takes around 60s). Specifically, there are two types of issues:
1. The ETA computed in the first step of any epoch is always inaccurate.
2. For later epochs within a session, the ETA increases nearly monotonically for many steps before starting to decrease nearly monotonically.

### First step

For the first epoch in a given session, it gives a huge ETA since `MPI_Model.num_so_far` is zero, resulting in a `work_so_far` of 0 being passed to:
https://github.com/PPPLDeepLearning/plasma-python/blob/c82ba61e339882a5af10b1052edc0348e16119f4/plasma/models/mpi_runner.py#L613-L616
causing `total_time` to explode.
- [ ] Probably should just refuse to give an ETA for the first step (or steps) of the first epoch

For later epochs within a session, it gives a minuscule ETA:
```
step: 0 [ETA: 0.55s] [1819.00/1789], loss: 0.98688 [0.98688] | walltime: 174.4240 | 8.93E+02 Examples/sec | 5.73E-01 sec/batch [96.1% calc., 3.9% sync.][batch = 512 = 128*4] [lr = 7.08E-05 = 1.77E-05*4]
```
- [ ] I think an error was introduced when I changed the 0-based indexing of the epochs 1-2 months ago.

### Later steps in later epochs

For example, here are the ETAs reported over the course of one later epoch:
```
ETA: 0.55s
ETA: 22.14
ETA: 27.98
ETA: 31.63
ETA: 35.88
ETA: 38.45
ETA: 34.89
ETA: 36.21
ETA: 35.35
ETA: 35.56
ETA: 36.04
ETA: 35.88
ETA: 35.33
ETA: 34.49
ETA: 34.73
ETA: 34.29
ETA: 34.13
ETA: 33.51
ETA: 33.16
…
ETA: 1.35s
ETA: 1.06s
ETA: 0.67s
ETA: 0.11s
ETA: -0.45
```
- [ ] Consider using the measured runtimes of the previous epochs within this session to inform the ETA in later epochs.
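
The checklist items in the issue (give no ETA until some work has been done, and use the runtimes of earlier epochs) could be sketched roughly as follows. This is only an illustration and not the code in `mpi_runner.py`: the function `estimate_eta` and every parameter name are hypothetical, and the linear extrapolation from elapsed time is an assumed model of how the ETA is derived.

```python
from statistics import mean
from typing import Optional, Sequence


def estimate_eta(
    elapsed_in_epoch: float,
    work_done_in_epoch: float,
    work_per_epoch: float,
    previous_epoch_times: Sequence[float] = (),
    min_fraction: float = 0.01,
) -> Optional[float]:
    """Estimate the seconds remaining in the current epoch.

    Returns None when there is not enough information for a usable
    estimate. All names here are hypothetical, not taken from FRNN.
    """
    if work_per_epoch <= 0:
        raise ValueError("work_per_epoch must be positive")

    # If earlier epochs in this session were timed, use their mean
    # duration as the expected length of the current epoch.
    if previous_epoch_times:
        expected_total = mean(previous_epoch_times)
        return max(expected_total - elapsed_in_epoch, 0.0)

    # Otherwise extrapolate linearly from progress within this epoch.
    fraction = work_done_in_epoch / work_per_epoch
    if fraction < min_fraction:
        # Too little work done (e.g. step 0): dividing by a tiny
        # fraction would make the projected total time explode.
        return None

    # Clamp so that overshooting the epoch size cannot yield a
    # negative ETA.
    fraction = min(fraction, 1.0)
    projected_total = elapsed_in_epoch / fraction
    return max(projected_total - elapsed_in_epoch, 0.0)
```

With the numbers from the first log line, `estimate_eta(5.7374, 0.0, 1789.0)` returns `None` instead of a huge value, and the final `max(..., 0.0)` keeps the estimate from going negative at the end of an epoch. The sketch expects `work_done_in_epoch` to count from zero in each epoch, so it does not address the `[1819.00/1789]` reading in the second log line, which the issue attributes to an epoch indexing change.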

Links:

ETA calculation is inaccurate: https://github.com/PPPLDeepLearning/plasma-python/issues/55#top
felker: https://github.com/felker
on Jan 7, 2020: https://github.com/PPPLDeepLearning/plasma-python/issues/55#issue-546039666
plasma-python/plasma/models/mpi_runner.py: https://github.com/PPPLDeepLearning/plasma-python/blob/c82ba61e339882a5af10b1052edc0348e16119f4/plasma/models/mpi_runner.py#L613-L616
c82ba61: https://github.com/PPPLDeepLearning/plasma-python/commit/c82ba61e339882a5af10b1052edc0348e16119f4