Title: Speed up multiple inference steps at the end of each epoch · Issue #61 · PPPLDeepLearning/plasma-python · GitHub
Description: Presently, at the end of every epoch, the trained weights are reloaded via a call to Keras.Models.load_weights() 3x separate times in order to evaluate the accuracy on the shots in the training, validation, and testing sets: plasma-pytho...
Opengraph URL: https://github.com/PPPLDeepLearning/plasma-python/issues/61
Author: felker (https://github.com/felker)
Date published: 2020-01-07T21:00:41.000Z
Comments: 0

Presently, at the end of every epoch, the trained weights are reloaded via a call to `Keras.Models.load_weights()` 3x separate times in order to evaluate the accuracy on the shots in the training, validation, and testing sets:

https://github.com/PPPLDeepLearning/plasma-python/blob/c82ba61e339882a5af10b1052edc0348e16119f4/plasma/models/mpi_runner.py#L932-L965

Depending on the size of the datasets (number of shots, pulse length, number of signals per shot), network architecture, and hardware, this process might take a significant amount of time. This is especially noticeable if the epoch walltimes are relatively short due to small batch sizes, etc.

For example, for a recent test with `d3d_0D` on Traverse 4x V100s:
```
Finished training epoch 3.01 during this session (1.00 epochs passed) in 87.65 seconds
Finished training of epoch 6.01/1000
Begin evaluation of epoch 6.01/1000
[2] loading from epoch 6
[1] loading from epoch 6
[0] loading from epoch 6
[3] loading from epoch 6

128/894 [===>..........................] - ETA: 1:53
640/894 [====================>.........] - ETA: 13s
896/894 [==============================] - 35s 39ms/step
[0] loading from epoch 6
[3] loading from epoch 6
[1] loading from epoch 6
[2] loading from epoch 6

128/894 [===>..........................] - ETA: 1:53
640/894 [====================>.........] - ETA: 13s
896/894 [==============================] - 35s 39ms/step
epoch 6, val_roc_30 = 0.85346611872694 val_roc_70 = 0.8345022047574768 val_roc_200 = 0.7913309535951044 val_roc_500 = 0.6638869724330323 val_roc_1000 = 0.5480697123316435
[3] loading from epoch 6
[2] loading from epoch 6
[0] loading from epoch 6
[1] loading from epoch 6
128/894 [===>..........................] - ETA: 1:53
640/894 [====================>.........] - ETA: 12s
896/894 [==============================] - 35s 39ms/step
epoch 6, test_roc_30 = 0.8400389140546622 test_roc_70 = 0.8236098866020126 test_roc_200 = 0.7792357036451524 test_roc_500 = 0.6798285349466453 test_roc_1000 = 0.5699692943787431
```

It seems straightforward to deduplicate the 3x 1:53 load times via a new combined function instead of 2x calls to `mpi_make_predictions_and_evaluate_multiple_times()` + 1x call to `mpi_make_predictions_and_evaluate()`.
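
The combined function proposed in this issue could look roughly like the sketch below: load the weights once, then evaluate every split against the same in-memory model. This is only an illustration under stated assumptions, not the code in `mpi_runner.py`. The names `evaluate_all_splits` and `CountingModel` are hypothetical, the stand-in model only mimics the `load_weights()` and `predict()` methods of a Keras model, and the metric is a placeholder rather than the ROC computation used in the repository.

```python
from typing import Callable, Dict, List, Sequence


class CountingModel:
    """Hypothetical stand-in for a Keras model that counts weight loads."""

    def __init__(self) -> None:
        self.loads = 0
        self.weights_path = ""

    def load_weights(self, path: str) -> None:
        # A real model would read the checkpoint from disk here.
        self.loads += 1
        self.weights_path = path

    def predict(self, shots: Sequence[float]) -> List[float]:
        # Placeholder inference: one constant score per shot.
        return [0.5 for _ in shots]


def evaluate_all_splits(
    model: CountingModel,
    weights_path: str,
    splits: Dict[str, Sequence[float]],
    metric: Callable[[Sequence[float], Sequence[float]], float],
) -> Dict[str, float]:
    """Load the weights once, then score every split with the same model."""
    model.load_weights(weights_path)  # single load shared by all splits
    results = {}
    for name, shots in splits.items():
        predictions = model.predict(shots)
        results[name] = metric(shots, predictions)
    return results


# Usage with made-up data; the file name and shot values are illustrative only.
model = CountingModel()
results = evaluate_all_splits(
    model,
    "weights_epoch_6.h5",
    {"train": [1.0, 2.0], "validation": [3.0], "test": [4.0, 5.0, 6.0]},
    metric=lambda shots, preds: sum(preds) / len(preds),
)
print(model.loads, results)
```

A design like this removes only the repeated weight loading. The per-split inference passes still run separately, so how much of the time reported in the log it saves depends on how that time divides between loading and prediction.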