Title: ETA calculation is inaccurate · Issue #55 · PPPLDeepLearning/plasma-python · GitHub

Description: Example of the current per-step (iteration) diagnostic output provided by FRNN around epoch 22 of the D3D 0D model (run on 4 V100 GPUs of Traverse): [0] step: 0 [ETA: 468568011.02s] [0.00/1789], loss: 1.05701 [1.05701] | walltime: 5.7374...

Opengraph URL: https://github.com/PPPLDeepLearning/plasma-python/issues/55

Domain: github.com

Hey, it has json ld scripts:

{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"ETA calculation is inaccurate","articleBody":"Example of the current per-step (iteration) diagnostic output provided by FRNN around epoch 22 of the D3D 0D model (run on 4 V100 GPUs of Traverse):\r\n```\r\n[0] step: 0 [ETA: 468568011.02s] [0.00/1789], loss: 1.05701 [1.05701] | walltime: 5.7374 | 8.47E+02 Examples/sec | 6.04E-01 sec/batch [92.3% calc., 7.7% sync.][batch = 512 = 128*4] [lr = 7.30E-05 = 1.83E-05*4]\r\n```\r\nThe ETA provided in this example is clearly inaccurate (each epoch takes around 60s). Specifically, there are two types of issues:\r\n1. The ETA computed in the first step of any epoch is always inaccurate. \r\n2. For later epochs within a session, the ETA increases nearly monotonically for many steps before starting to decrease nearly monotonically. \r\n\r\n### First step\r\n\r\nFor the first epoch in a given session, it gives a huge ETA since `MPI_Model.num_so_far` is zero, resulting in `work_so_far` of 0 being passed to:\r\nhttps://github.com/PPPLDeepLearning/plasma-python/blob/c82ba61e339882a5af10b1052edc0348e16119f4/plasma/models/mpi_runner.py#L613-L616\r\ncausing `total_time` to explode. \r\n- [ ] Probably should just refuse to give an ETA for the first step (or steps) of the first epoch\r\n\r\nFor later epochs within a session, it gives a minuscule ETA:\r\n```\r\nstep: 0 [ETA: 0.55s] [1819.00/1789], loss: 0.98688 [0.98688] | walltime: 174.4240 | 8.93E+02 Examples/sec | 5.73E-01 sec/batch [96.1% calc., 3.9% sync.][batch = 512 = 128*4] [lr = 7.08E-05 = 1.77E-05*4]\r\n``` \r\n- [ ] I think an error was introduced when I changed the 0-based indexing of the epochs 1-2 months ago.\r\n\r\n### Later steps in later epochs\r\n\r\nE.g. here are the ETAs for some later epoch:\r\n```\r\n\r\nETA: 0.55s\r\nETA: 22.14\r\nETA: 27.98\r\nETA: 31.63\r\nETA: 35.88\r\nETA: 38.45\r\nETA: 34.89\r\nETA: 36.21\r\nETA: 35.35\r\nETA: 35.56\r\nETA: 36.04\r\nETA: 35.88\r\nETA: 35.33\r\nETA: 34.49\r\nETA: 34.73\r\nETA: 34.29\r\nETA: 34.13\r\nETA: 33.51\r\nETA: 33.16\r\n…\r\nETA: 1.35s\r\nETA: 1.06s\r\nETA: 0.67s\r\nETA: 0.11s\r\nETA: -0.45\r\n```\r\n- [ ] Consider using the measured runtimes of the previous epochs within this session to inform the ETA in later epochs. ","author":{"url":"https://github.com/felker","@type":"Person","name":"felker"},"datePublished":"2020-01-07T02:15:58.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":0},"url":"https://github.com/PPPLDeepLearning/plasma-python/issues/55"}
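
The three checklist items in the issue body (refuse an ETA when no work has been done, avoid tiny or negative ETAs in later epochs, and use measured runtimes of earlier epochs) can be illustrated with a minimal sketch. This is not the code at `mpi_runner.py#L613-L616`, which is not reproduced in the issue. The function name `estimate_eta`, its parameters, and the linear-extrapolation formula are assumptions made for illustration, and `work_so_far` is assumed to be counted from the start of the current epoch rather than cumulatively across the session.

```python
from typing import Optional, Sequence


def estimate_eta(
    elapsed_s: float,
    work_so_far: float,
    work_total: float,
    previous_epoch_times_s: Sequence[float] = (),
) -> Optional[float]:
    """Estimate the seconds left in the current epoch, or return None.

    Hypothetical helper for illustration only; not the FRNN implementation.
    elapsed_s and work_so_far are assumed to be measured from the start
    of the current epoch.
    """
    if work_total <= 0:
        return None

    # Clamp to [0, 1] so a counter that overshoots work_total
    # (e.g. 1819/1789) cannot yield a negative ETA.
    fraction_done = min(max(work_so_far / work_total, 0.0), 1.0)

    if previous_epoch_times_s:
        # Later epochs: scale the mean measured epoch time from this
        # session by the fraction of work still outstanding.
        mean_epoch_s = sum(previous_epoch_times_s) / len(previous_epoch_times_s)
        return max(mean_epoch_s * (1.0 - fraction_done), 0.0)

    if fraction_done == 0.0:
        # First step of the first epoch: nothing to extrapolate from,
        # so refuse to report an ETA instead of dividing by zero.
        return None

    # First epoch, after some work: linear extrapolation.
    total_time = elapsed_s / fraction_done
    return max(total_time - elapsed_s, 0.0)
```

A caller could print a placeholder such as `ETA: n/a` whenever the function returns `None`. Clamping hides an overshooting counter rather than fixing it, so the indexing error suspected in the issue would still need to be corrected at its source.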


