Title: To Pin or Not to Pin: Asserting the Scalability of QEMU Parallel Implementation - Archive ouverte HAL
Open Graph Title: To Pin or Not to Pin: Asserting the Scalability of QEMU Parallel Implementation
Description:
Due to its speed in cross-executing sequential code, dynamic binary translation is the unchallenged technology for full system-level simulation. Among the translators, QEMU has become the de facto solution. It introduced parallel host execution of the target cores a few years ago for the ARM instruction set architecture, and this support is now also available for RISC-V, among others. Given the popularity of these instruction sets in multi- and many-core systems, assessing the scalability of their parallel implementation makes sense. In this paper, we use a subset of the PARSEC benchmark to measure the execution time of QEMU's parallel implementation, to which we added the ability to pin a target processor to a host core or hardware thread. We report the results of a wealth of experiments we performed on a 16-core/32-thread x86-64 SMP machine. They show that the support of parallelism in QEMU scales well, and that, somewhat counterintuitively, pinning does not improve performance.
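As an illustration of what "pinning a target processor to a host core or hardware thread" can mean in practice, here is a minimal sketch in Python. It is not the paper's QEMU modification: the function names `plan_pinning` and `pin_caller` are hypothetical, the round-robin placement policy is an assumption, and `os.sched_setaffinity` is only available on Linux.

```python
import os


def plan_pinning(n_vcpus, host_cpus):
    """Map each target vCPU index to a host CPU id, round-robin.

    Hypothetical helper: the round-robin policy is an assumption made
    for illustration, not the placement used in the paper.
    """
    cpus = sorted(host_cpus)
    if not cpus:
        raise ValueError("no host CPUs available")
    return {vcpu: cpus[vcpu % len(cpus)] for vcpu in range(n_vcpus)}


def pin_caller(cpu):
    """Restrict the caller to a single host CPU (Linux only).

    Passing 0 as the first argument targets the caller itself.
    Returns the affinity set reported by the kernel afterwards.
    """
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Plan a placement for 4 target cores on whatever CPUs we may use.
    plan = plan_pinning(4, os.sched_getaffinity(0))
    for vcpu, cpu in plan.items():
        print(f"target core {vcpu} -> host CPU {cpu}")
```

In a simulator, each thread that executes a target core would call something like `pin_caller` with its planned CPU. Without such a call, the host scheduler is free to migrate the threads, which is the unpinned configuration the abstract compares against.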
Opengraph URL: https://hal.science/hal-03417343/document
Domain: hal.archives-ouvertes.fr
| citation_language | en |
| DC.language | en |
| DC.type | proceedings |
| og:type | proceedings |
| citation_title | To Pin or Not to Pin: Asserting the Scalability of QEMU Parallel Implementation |
| DC.title | To Pin or Not to Pin: Asserting the Scalability of QEMU Parallel Implementation |
| DC.identifier | https://hal.science/hal-03417343/document |
| citation_author | Frédéric Pétrot |
| citation_author_institution | System Level Synthesis |
| DC.creator | Frédéric Pétrot |
| citation_author_hal_id | frederic-petrot |
| citation_keywords | parallel simulation;dynamic binary translation;discrete event simulation;performance evaluation |
| DC.subject | parallel simulation;dynamic binary translation;discrete event simulation;performance evaluation |
| citation_pdf_url | https://hal.science/hal-03417343/document |
| citation_online_date | 2021/12/01 |
| citation_publication_date | 2021/09/01 |
| DC.date | 2021/09/01 |
| DC.issued | 2021/09/01 |
| citation_firstpage | 238-245 |
| citation_conference_title | 24th Euromicro Conference on Digital System Design (Euromicro DSD/SEAA 2021) |
| DC.relation.ispartof | 24th Euromicro Conference on Digital System Design (Euromicro DSD/SEAA 2021) |
| DC.publisher | IEEE |
| citation_doi | 10.1109/DSD53832.2021.00045 |
| citation_funding_source | citation_funder=Agence Nationale de la Recherche; citation_funder_id=10.13039/501100001665; citation_grant_number=ANR-18-CE25-0017; |