Title: Resolve differences in shot counts from Nature paper; improve storage of details of the shot and signal sets · Issue #60 · PPPLDeepLearning/plasma-python · GitHub
Description: Details reproduced from email correspondence in November 2019. There are slight discrepancies between the output of guarantee_preprocessed.py from the current version of the code and the figures from Kates-Harbeck et al. (2019) when applied t...
Opengraph URL: https://github.com/PPPLDeepLearning/plasma-python/issues/60
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"Resolve differences in shot counts from Nature paper; improve storage of details of the shot and signal sets","articleBody":"Details reproduced from email correspondence in November 2019.\r\n\r\nThere are slight discrepancies between the output of `guarantee_preprocessed.py` from the current version of the code and the figures from Kates-Harbeck *et al.* (2019) when applied to the JET dataset.\r\n\r\nWhen specifying the following definition of the `all_signals` dictionary in `data/signals.py` for the 0D FRNN model for JET carbon wall (CW) training -\u003e ITER-like wall (ILW) testing (`jet_data_0D`), I end up with 5502 processed shots out of the 5524 raw input files, and 479 of the 488 raw disruptive shots:\r\n```\r\nall_signals = {\r\n 'q95': q95, 'li': li, 'ip': ip, 'betan': betan, 'energy': energy, 'lm': lm,\r\n 'dens': dens, 'pradcore': pradcore,\r\n 'pradedge': pradedge, 'pradtot': pradtot, 'pin': pin,\r\n 'torquein': torquein,\r\n 'energydt': energydt, 'ipdirect': ipdirect, 'iptarget': iptarget,\r\n 'iperr': iperr,\r\n}\r\n```\r\n\r\n\u003cimg width=\"506\" alt=\"PastedGraphic-5\" src=\"https://user-images.githubusercontent.com/1410981/71926895-e0b48a00-3159-11ea-8768-0f8895939c1a.png\"\u003e\r\n\r\nThis amounts to 8 fewer overall shots than the 5510 published in Extended Data Table 2 (see below). Specifically, there are 3 more CW disruptive shots in the (train+validate) set in my results, which means that there are actually 4 fewer nondisruptive CW and 7 fewer nondisruptive ILW shots, too.\r\n\r\nOne of @ge-dong's `.npz` files of processed JET shots from October 2019 exactly matches my numbers. I have looked through the Git history of the relevant preprocessing files, and cannot find a change to the “omit” criteria that would account for this difference. I assume that the raw input shot lists and data have not changed for the 5524 JET candidates since the paper was published. 
\r\n\r\nI have long suspected that the shot counts and some of the early JET results in the paper predate the addition of the two \u003cvar\u003eP\u003csub\u003erad,core\u003c/sub\u003e\u003c/var\u003e, \u003cvar\u003eP\u003csub\u003erad,edge\u003c/sub\u003e\u003c/var\u003e signals to the JET datasets in the code (even though they do appear in Extended Data Table 1). They were not in our original 8-9 signal set, and the files in `/tigress/jk7/best_performance/deep_jet/` do not list them. \r\n\r\nIf you remove `pradedge` and `pradcore` from the `all_signals` dictionary and run `guarantee_preprocessed.py`, you end up with 5514 total processed shots, and the ILW set counts exactly match the test set numbers from the paper, 1191 (174):\r\n\r\n\u003cimg width=\"506\" alt=\"PastedGraphic-6\" src=\"https://user-images.githubusercontent.com/1410981/71927176-87992600-315a-11ea-81a6-610ee8b1d185.png\"\u003e\r\nIn this case there are only 4 extra disruptive CW shots vs. the numbers from the paper, and the split of disruptive shots between the train and validate sets is different, but this could be due to a change in the random number sequence.\r\nSo perhaps the Nature paper numbers were from before the addition of the two extra radiated power signals on JET, and there has been some change in the preprocessing since publication that allows an extra 4 disruptive shots to not be omitted. 
\r\n\r\n\u003cimg width=\"941\" alt=\"PastedGraphic-4\" src=\"https://user-images.githubusercontent.com/1410981/71927212-9bdd2300-315a-11ea-9f66-8d81102230e1.png\"\u003e\r\n\r\nTo-do\r\n=======\r\n\r\nThe lesson from this is that `guarantee_preprocessed.py` should always write the details of the omitted shots to another `.txt` file (or add them to the existing `processed_shotlists/d3d_0D/shot_lists_signal_group_X.npz`):\r\n- omitted shot numbers (so that the set of all input/raw shot numbers could be reconstructed in conjunction with the included shot numbers)\r\n- criterion for omission\r\n\r\nPlaintext would be good for reproducibility; see #41. Even though this info is contained in the `.npz` file and/or codebase, it would be good to dump the following to `.txt`:\r\n- Exact shot numbers for included shots in each of the train/validate/test sets\r\n- Precise signal path (in the MDSplus database) for all signals in this signal group\r\n- Details about resampling, clipping, causal shifting, etc. \r\n\r\nThis will be useful for the real-time inference model, @mdboyer\r\n","author":{"url":"https://github.com/felker","@type":"Person","name":"felker"},"datePublished":"2020-01-07T20:45:34.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":0},"url":"https://github.com/PPPLDeepLearning/plasma-python/issues/60"}
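
The plaintext record of omitted shots proposed in the to-do list could look roughly like the sketch below. This is only an illustration: `write_omitted_shots` and `read_omitted_shots` are hypothetical helper names that do not exist in plasma-python, the tab-separated layout is an assumed format, and the shot numbers in the usage example are made up rather than taken from the JET shot lists.

```python
from pathlib import Path


def write_omitted_shots(raw_shots, included_shots, reasons, out_path):
    """Write one line per omitted shot: shot number, tab, omission criterion.

    Hypothetical helper (not part of plasma-python). `reasons` maps a shot
    number to a short description of why it was omitted; shots without an
    entry are recorded as 'unknown'.
    """
    included = set(included_shots)
    # Omitted shots are the raw candidates that did not survive preprocessing.
    omitted = sorted(s for s in set(raw_shots) if s not in included)
    lines = ["# shot_number\tcriterion"]
    for shot in omitted:
        lines.append(f"{shot}\t{reasons.get(shot, 'unknown')}")
    Path(out_path).write_text("\n".join(lines) + "\n")
    return omitted


def read_omitted_shots(path):
    """Read the file back into a {shot_number: criterion} dictionary."""
    result = {}
    for line in Path(path).read_text().splitlines():
        if not line or line.startswith("#"):
            continue  # skip the header comment and blank lines
        shot, criterion = line.split("\t", 1)
        result[int(shot)] = criterion
    return result


if __name__ == "__main__":
    # Made-up shot numbers, for illustration only.
    raw = [101, 102, 103, 104]
    included = [101, 103]
    reasons = {102: "signal missing"}
    print(write_omitted_shots(raw, included, reasons, "omitted_shots.txt"))
    print(read_omitted_shots("omitted_shots.txt"))
```

Combining the omitted shot numbers read back from such a file with the included shot numbers would reconstruct the full set of raw input shots, which is the property the first to-do bullet asks for.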