Title: Performance improvements · Issue #163 · feos-org/feos · GitHub
Description: Following some guidelines of the Rust Performance Book here are some things we can try to improve performance: Add codegen-units = 1 to release build Use a faster allocator. E.g. mimalloc works on all operating systems Not so easy: prope...
Opengraph URL: https://github.com/feos-org/feos/issues/163
Author: g-bauer
Date published: 2023-06-27T13:17:41.000Z
Comments: 1

Following some guidelines of the [Rust Performance Book](https://nnethercote.github.io/perf-book/title-page.html), here are some things we can try to improve performance:

- Add `codegen-units = 1` to the release build
- Use a faster allocator, e.g. [mimalloc](https://github.com/microsoft/mimalloc), which works on all operating systems

Not so easy:
- Properly profile to identify hot parts
- Remove clones/allocations where not needed
- Use profile-guided optimization (e.g. via [cargo-pgo](https://github.com/Kobzol/cargo-pgo))
  - Unfortunately, this currently does not work with LTO, and the PGO version is 10-20% slower than LTO
  - It might become available in `maturin` directly in the future, see [here](https://github.com/PyO3/maturin/issues/1840)

---

Quick tests with `codegen-units = 1` added to `release-lto` (see [here](https://doc.rust-lang.org/rustc/codegen-options/index.html#codegen-units)) show benchmark improvements of up to 12% (mean about 7%), while for `dual_number` the changes are a bit smaller (see below).

Proper benchmarks (across all benchmarks) compared against the current release workflow are still needed, but this might be an easy-to-get improvement if it turns out to be faster in all cases.

- Benchmark: dual_numbers
- System: methane/CO2
- `main`: main branch + lto
- `main_codegen`: main branch + lto + codegen-units = 1
- `develop`, `develop_codegen`: same options as `main` and `main_codegen`, applied to the develop branch

**Execution times in µs**
| name | f64 | dual | dual2 | hyperdual | dual3 |
|:----------------|-------:|-------:|--------:|------------:|--------:|
| main | 1.1382 | 1.2325 | 1.4539 | 1.6267 | 1.7563 |
| main_codegen | 1.0229 | 1.1741 | 1.3708 | 1.5777 | 1.6316 |
| develop | 1.0138 | 1.1989 | 1.4465 | 1.589 | 1.7549 |
| develop_codegen | 0.9761 | 1.1681 | 1.4195 | 1.5446 | 1.6304 |

**Slowdown t_d / t_f64 for each branch/option**
| name | f64 | dual | dual2 | hyperdual | dual3 |
|:----------------|------:|--------:|--------:|------------:|--------:|
| main | 1 | 1.08285 | 1.27737 | 1.42919 | 1.54305 |
| main_codegen | 1 | 1.14782 | 1.34011 | 1.54238 | 1.59507 |
| develop | 1 | 1.18258 | 1.42681 | 1.56737 | 1.73101 |
| develop_codegen | 1 | 1.1967 | 1.45426 | 1.58242 | 1.67032 |

**Relative difference in % w.r.t. main + lto for each dual number: (t_d_branch - t_d_main) / t_d_main * 100**
| name | f64 | dual | dual2 | hyperdual | dual3 |
|:----------------|---------:|---------:|----------:|------------:|----------:|
| main_codegen | -10.13 | -4.74 | -5.72 | -3.01 | -7.10 |
| develop | -10.93 | -2.73 | -0.51 | -2.32 | -0.08 |
| develop_codegen | -14.24 | -5.23 | -2.37 | -5.05 | -7.17 |
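The two derived tables can be recomputed from the raw execution times. The following is a minimal sketch, not code from the feos repository: the function names `slowdown` and `rel_diff_percent` are hypothetical, and the slowdown is taken as t_d / t_f64, which is the ratio the tabulated values correspond to (all entries are at least 1).

```rust
/// Slowdown of a dual-number evaluation relative to plain f64
/// within the same branch/option (hypothetical helper).
fn slowdown(t_d: f64, t_f64: f64) -> f64 {
    t_d / t_f64
}

/// Relative difference in percent with respect to the `main` timing
/// for the same number type (hypothetical helper).
fn rel_diff_percent(t_branch: f64, t_main: f64) -> f64 {
    (t_branch - t_main) / t_main * 100.0
}

fn main() {
    let columns = ["f64", "dual", "dual2", "hyperdual", "dual3"];
    // Execution times in µs, copied from the first table of the issue.
    let rows: [(&str, [f64; 5]); 4] = [
        ("main", [1.1382, 1.2325, 1.4539, 1.6267, 1.7563]),
        ("main_codegen", [1.0229, 1.1741, 1.3708, 1.5777, 1.6316]),
        ("develop", [1.0138, 1.1989, 1.4465, 1.589, 1.7549]),
        ("develop_codegen", [0.9761, 1.1681, 1.4195, 1.5446, 1.6304]),
    ];
    let main_times = rows[0].1;

    for (name, times) in rows.iter() {
        for (i, col) in columns.iter().enumerate() {
            println!(
                "{:<16} {:<10} slowdown {:.5}   rel. diff {:+.2} %",
                name,
                col,
                slowdown(times[i], times[0]),
                rel_diff_percent(times[i], main_times[i]),
            );
        }
    }
}
```

Negative relative differences mean the configuration is faster than `main` + lto for that number type.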