René's URL Explorer Experiment


Title: Improve performance on V100s · Issue #52 · PPPLDeepLearning/plasma-python · GitHub

Description: Mostly repeating private email and in-person communication on this topic for reference notes and posterity. FRNN performance on V100s on the 2x IBM AC922 systems, OLCF Summit and Princeton's Traverse cluster, is about 3x slower than on t...

Opengraph URL: https://github.com/PPPLDeepLearning/plasma-python/issues/52

Domain: github.com


JSON-LD metadata (schema.org DiscussionForumPosting):

Headline: Improve performance on V100s
Author: felker (https://github.com/felker)
Date published: 2019-12-17T23:03:56.000Z
Comment count: 0
URL: https://github.com/PPPLDeepLearning/plasma-python/issues/52

Article body:

Mostly repeating private email and in-person communication on this topic for reference notes and posterity.

FRNN performance on V100s on the two IBM AC922 systems, OLCF Summit and Princeton's Traverse cluster, is **about 3x slower** than on the P100s on Princeton's TigerGPU cluster. See the table below, which tests the performance for `d3d_0D` training on both machines as a function of batch size (as suggested by @jnkh). I have run these tests with 1, 2, and 8 GPUs as well, and on several datasets.

| Machine (GPU Model) | N_node | N_GPU | Examples/sec | Sec/batch | Batch size |
|:--------------------|-------:|------:|-------------:|----------:|-----------:|
| Traverse (V100)     |      1 |     4 |       1.35e3 |      0.75 |       1024 |
|                     |        |       |       2.53e3 |      0.80 |       2048 |
|                     |        |       |       5.20e3 |      0.80 |       4096 |
| TigerGPU (P100)     |      1 |     4 |       4.30e3 |      0.24 |       1024 |
|                     |        |       |       7.70e3 |      0.26 |       2048 |
|                     |        |       |       1.38e4 |      0.30 |       4096 |

At first, I suspected some issue with my Conda / MPI environment on the Power 9 architecture. However, @ge-dong and I compared figures, and we confirmed that we are both independently observing this behavior. In fact, the original modules on Traverse produced even slower performance (by about 20%).

@ASvyatkovskiy identified the primary issue: the TensorFlow backend for `tf.keras` or external Keras does not run the cuDNN autotuner, unlike vanilla TensorFlow architecture definitions. See my notes about the autotuner in #51. The default implementations of our layers might be slower on V100 than on P100.

He opened issues about this when he first ran on Summit over 1.5 years ago:
https://github.com/tensorflow/tensorflow/issues/18913, https://github.com/keras-team/keras/issues/9825. Related: https://github.com/keras-team/keras/issues/9321

He also proposed the following optimizations, especially for V100s:
- Use the https://github.com/NVIDIA/nccl library to perform all-reduce directly on the GPU
- Use the https://github.com/NVIDIA/apex mixed precision optimizers

> All these things are easier to enable/add in PyTorch, which now also support distributed training natively and through Horovod.

Also, I am systematically benchmarking the `LSTM` Keras layer definition vs. `CuDNNLSTM`, which seems to be at least an order of magnitude faster.

**IBM AC922 "Traverse" architecture details:**
- Processor is 16-core Power 9 running at 2.7 GHz
- Host memory 256 GB DDR4
- 4x V100 with 32 GB HBM2
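The "about 3x slower" figure can be cross-checked against the benchmark table in the issue body. The sketch below is not part of the original issue; it copies the six table rows by hand and assumes that the "Batch size" column is the total number of examples processed per reported batch time, so that examples/sec should equal batch size divided by sec/batch. The names `ROWS`, `implied_sec_per_batch`, and `p100_over_v100` are illustrative only.

```python
# Rows copied by hand from the table in the issue:
# (machine, batch size, examples/sec, reported sec/batch)
ROWS = [
    ("Traverse (V100)", 1024, 1.35e3, 0.75),
    ("Traverse (V100)", 2048, 2.53e3, 0.80),
    ("Traverse (V100)", 4096, 5.20e3, 0.80),
    ("TigerGPU (P100)", 1024, 4.30e3, 0.24),
    ("TigerGPU (P100)", 2048, 7.70e3, 0.26),
    ("TigerGPU (P100)", 4096, 1.38e4, 0.30),
]


def implied_sec_per_batch(batch_size, examples_per_sec):
    """Seconds per batch implied by the throughput column (assumption: global batch)."""
    return batch_size / examples_per_sec


def p100_over_v100(rows):
    """Return {batch size: P100 throughput / V100 throughput}."""
    v100 = {b: eps for machine, b, eps, _ in rows if "V100" in machine}
    p100 = {b: eps for machine, b, eps, _ in rows if "P100" in machine}
    return {b: p100[b] / v100[b] for b in sorted(v100)}


if __name__ == "__main__":
    for machine, batch, eps, reported in ROWS:
        implied = implied_sec_per_batch(batch, eps)
        print(f"{machine:16s} batch={batch:5d} "
              f"implied={implied:.3f}s reported={reported:.2f}s")
    for batch, ratio in p100_over_v100(ROWS).items():
        print(f"batch={batch:5d}: P100 throughput is {ratio:.2f}x the V100 throughput")
```

Under that assumption the throughput and sec/batch columns agree to within rounding, and the P100-to-V100 throughput ratio falls between roughly 2.6 and 3.2 across the three batch sizes, which is consistent with the "about 3x" summary.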


Links:

Improve performance on V100s https://github.com/PPPLDeepLearning/plasma-python/issues/52#top
felker https://github.com/felker
on Dec 17, 2019 https://github.com/PPPLDeepLearning/plasma-python/issues/52#issue-539358048
@jnkh https://github.com/jnkh
@ge-dong https://github.com/ge-dong
@ASvyatkovskiy https://github.com/ASvyatkovskiy
#51 https://github.com/PPPLDeepLearning/plasma-python/issues/51
tensorflow/tensorflow#18913 https://github.com/tensorflow/tensorflow/issues/18913
keras-team/keras#9825 https://github.com/keras-team/keras/issues/9825
keras-team/keras#9321 https://github.com/keras-team/keras/issues/9321
https://github.com/NVIDIA/nccl
https://github.com/NVIDIA/apex