Title: Improve performance on V100s · Issue #52 · PPPLDeepLearning/plasma-python · GitHub
URL: https://github.com/PPPLDeepLearning/plasma-python/issues/52
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"Improve performance on V100s","articleBody":"Mostly repeating private email and in-person communication on this topic for reference notes and posterity. \r\n\r\nFRNN performance on V100s on the 2x IBM AC922 systems, OLCF Summit and Princeton's Traverse cluster, is **about 3x slower** than on the P100s on Princeton's TigerGPU cluster. See the below table, which tests the performance for `d3d_0D` training on both machines as a function of batch size (as suggested by @jnkh). I have run these tests with 1, 2, 8 GPUs as well, and several datasets. \r\n\r\n\u003ctable border=\"2\" cellspacing=\"0\" cellpadding=\"6\" rules=\"groups\" frame=\"hsides\"\u003e\r\n\r\n\r\n\u003ccolgroup\u003e\r\n\u003ccol class=\"org-left\" /\u003e\r\n\r\n\u003ccol class=\"org-right\" /\u003e\r\n\r\n\u003ccol class=\"org-right\" /\u003e\r\n\r\n\u003ccol class=\"org-right\" /\u003e\r\n\r\n\u003ccol class=\"org-right\" /\u003e\r\n\r\n\u003ccol class=\"org-right\" /\u003e\r\n\u003c/colgroup\u003e\r\n\u003cthead\u003e\r\n\u003ctr\u003e\r\n\u003cth scope=\"col\" class=\"org-left\"\u003eMachine (GPU Model)\u003c/th\u003e\r\n\u003cth scope=\"col\" class=\"org-right\"\u003eN_node\u003c/th\u003e\r\n\u003cth scope=\"col\" class=\"org-right\"\u003eN_{GPU}\u003c/th\u003e\r\n\u003cth scope=\"col\" class=\"org-right\"\u003eExamples/sec\u003c/th\u003e\r\n\u003cth scope=\"col\" class=\"org-right\"\u003eSec/batch\u003c/th\u003e\r\n\u003cth scope=\"col\" class=\"org-right\"\u003eBatch size\u003c/th\u003e\r\n\u003c/tr\u003e\r\n\u003c/thead\u003e\r\n\r\n\u003ctbody\u003e\r\n\u003ctr\u003e\r\n\u003ctd class=\"org-left\"\u003eTraverse (V100)\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e1\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e4\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e1.35e3\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e0.75\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e1024\u003c/td\u003e\r\n\u003c/tr\u003e\r\n\r\n\r\n\u003ctr\u003e\r\n\u003ctd class=\"org-left\"\u003e\u0026#xa0;\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e\u0026#xa0;\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e\u0026#xa0;\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e2.53e3\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e0.80\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e2048\u003c/td\u003e\r\n\u003c/tr\u003e\r\n\r\n\r\n\u003ctr\u003e\r\n\u003ctd class=\"org-left\"\u003e\u0026#xa0;\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e\u0026#xa0;\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e\u0026#xa0;\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e5.20e3\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e0.80\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e4096\u003c/td\u003e\r\n\u003c/tr\u003e\r\n\u003c/tbody\u003e\r\n\r\n\u003ctbody\u003e\r\n\u003ctr\u003e\r\n\u003ctd class=\"org-left\"\u003eTigerGPU (P100)\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e1\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e4\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e4.30e3\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e0.24\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e1024\u003c/td\u003e\r\n\u003c/tr\u003e\r\n\r\n\r\n\u003ctr\u003e\r\n\u003ctd class=\"org-left\"\u003e\u0026#xa0;\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e\u0026#xa0;\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e\u0026#xa0;\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e7.70e3\u003c/td\u003e\r\n\u003ctd 
class=\"org-right\"\u003e0.26\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e2048\u003c/td\u003e\r\n\u003c/tr\u003e\r\n\r\n\r\n\u003ctr\u003e\r\n\u003ctd class=\"org-left\"\u003e\u0026#xa0;\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e\u0026#xa0;\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e\u0026#xa0;\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e1.38e4\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e0.30\u003c/td\u003e\r\n\u003ctd class=\"org-right\"\u003e4096\u003c/td\u003e\r\n\u003c/tr\u003e\r\n\u003c/tbody\u003e\r\n\u003c/table\u003e\r\n\r\n\r\nAt first, I suspected some issue with my Conda / MPI environment on the Power 9 architecture. However, @ge-dong and I compared figures, and we confirmed that we are both independently observing this behavior. In fact, the original modules on Traverse produced about even slower performance (20%). \r\n\r\n@ASvyatkovskiy identified the primary issue being that the TensorFlow backend for`tf.keras` or external Keras does not run the cuDNN autotuner unlike vanilla TensorFlow architecture definitions. See my notes about the autotuner in #51. The default implementations of our layers might be slower on V100 than on P100.\r\n\r\nHe opened issues about this when he first ran on Summit over 1.5 years ago:\r\nhttps://github.com/tensorflow/tensorflow/issues/18913, https://github.com/keras-team/keras/issues/9825. Related: https://github.com/keras-team/keras/issues/9321\r\n\r\nAnd proposed the following optimizations especially for V100s:\r\n- Use https://github.com/NVIDIA/nccl library to perform all-reduce directly on the GPU\r\n- Use https://github.com/NVIDIA/apex mixed precision optimizers\r\n\r\n\u003e All these things are easier to enable/add in PyTorch, which now also support distributed training natively and through Horovod.\r\n\r\nAlso, I am systematically benchmarking the `LSTM` Keras layer definition vs. `CuDNNLSTM`, which seems to be at least an order of magnitude faster. \r\n\r\n\r\n**IBM AC922 \"Traverse\" architecture details:**\r\n- Processor is 16-core Power 9 running at 2.7 GHz\r\n- Host memory 256 GB DDR4\r\n- 4 X V100 with 32 GB HBM2\r\n\r\n ","author":{"url":"https://github.com/felker","@type":"Person","name":"felker"},"datePublished":"2019-12-17T23:03:56.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":0},"url":"https://github.com/52/plasma-python/issues/52"}
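For the second proposal, a sketch of NVIDIA Apex automatic mixed precision in PyTorch; the tiny LSTM and tensor shapes are stand-ins, and `O1` is just one of several `opt_level` choices:

```python
import torch
from apex import amp

model = torch.nn.LSTM(input_size=14, hidden_size=200).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# O1 patches eligible ops to fp16 so the V100 tensor cores are exercised.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(128, 32, 14, device="cuda")  # (seq_len, batch, features)
out, _ = model(x)
loss = out.pow(2).mean()

# Apex scales the loss to avoid fp16 gradient underflow.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```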
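And the `LSTM` vs. `CuDNNLSTM` benchmark mentioned above, as a rough sketch: the shapes and sizes are arbitrary stand-ins for the real `d3d_0D` signals, and it uses the external-Keras (2.x) spellings of the layers:

```python
import time
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, CuDNNLSTM

def bench(layer_cls, batch=1024, steps=128, features=14, units=200):
    """Time one epoch of fitting a single recurrent layer on random data."""
    model = Sequential([layer_cls(units, input_shape=(steps, features))])
    model.compile(loss="mse", optimizer="adam")
    x = np.random.rand(4 * batch, steps, features)
    y = np.random.rand(4 * batch, units)
    model.fit(x, y, batch_size=batch, epochs=1, verbose=0)  # warm-up pass
    t0 = time.time()
    model.fit(x, y, batch_size=batch, epochs=1, verbose=0)
    return time.time() - t0

for cls in (LSTM, CuDNNLSTM):
    print(cls.__name__, bench(cls), "sec/epoch")
```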