* -fprofile-update=atomic vs. 32-bit architectures
From: Sebastian Huber @ 2022-11-04 8:27 UTC
To: GCC Development

Hello,

even recent 32-bit architectures such as RISC-V do not support 64-bit atomic operations. Using -fprofile-update=atomic for the 32-bit RISC-V RV32GC ISA yields:

    warning: target does not support atomic profile update, single mode is selected

For multi-threaded applications it is quite important to use atomic counter increments to get valid coverage data. I think this fallback is not really good. Maybe we should consider using this approach from Jakub Jelinek for 32-bit architectures lacking 64-bit atomic operations:

    if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) == 0)
      __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);

https://patchwork.ozlabs.org/project/gcc/patch/19c4a81d-6ecd-8c6e-b641-e257c1959baf@suse.cz/#1447334

Last year I added the TARGET_GCOV_TYPE_SIZE target hook to optionally reduce the gcov type size to 32 bits. I am not really sure if this was a good idea. Longer running executables may observe counter overflows leading to invalid coverage data. If someone wants atomic updates, then the updates should be atomic even if this means using a library implementation (libatomic).

What about the following approach if -fprofile-update=atomic is given:

1. Use 64-bit atomics if available.

2. Use

       if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) == 0)
         __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);

   if 32-bit atomics are available.

3. Else use a library call (libatomic).

--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.huber@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax: +49-89-18 94 741 - 08

Court of registration: Amtsgericht München
Registration number: HRB 157899
Managing directors authorized to represent the company: Peter Rasmussen, Thomas Dörfler
Our privacy policy can be found here:
https://embedded-brains.de/datenschutzerklaerung/
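As a minimal sketch of step 2 in the proposal above, the following C fragment increments a 64-bit counter that is stored as two 32-bit halves, using only 32-bit atomic operations. The `struct split_counter` type and the function names are hypothetical and chosen for illustration. The generic `__atomic_add_fetch`, `__atomic_fetch_add` and `__atomic_load_n` built-ins of GCC and Clang stand in for the `_4` variants quoted in the message. Keeping the halves as separate struct members sidesteps the question of which half of a real 64-bit object is the low word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 64-bit counter kept as two 32-bit halves. */
struct split_counter {
    uint32_t low;   /* least-significant 32 bits */
    uint32_t high;  /* most-significant 32 bits */
};

/* Increment the counter with 32-bit atomics only.  The high half is
   touched only when the low half wraps around to zero. */
static inline void split_counter_increment(struct split_counter *c)
{
    assert(c != NULL);
    if (__atomic_add_fetch(&c->low, 1u, __ATOMIC_RELAXED) == 0)
        __atomic_fetch_add(&c->high, 1u, __ATOMIC_RELAXED);
}

/* Combine both halves into one 64-bit value.  The result is only
   reliable while no increment is in flight. */
static inline uint64_t split_counter_value(struct split_counter *c)
{
    assert(c != NULL);
    uint64_t high = __atomic_load_n(&c->high, __ATOMIC_RELAXED);
    uint64_t low = __atomic_load_n(&c->low, __ATOMIC_RELAXED);
    return (high << 32) | low;
}
```

The two halves are updated by separate operations, so a concurrent reader can briefly observe a wrapped low half together with a not yet incremented high half. For coverage counters that are read after the threads have finished this is presumably acceptable, but that is an assumption of this sketch.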
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Gabriel Paubert @ 2022-11-04 9:53 UTC
To: Sebastian Huber; +Cc: GCC Development

On Fri, Nov 04, 2022 at 09:27:34AM +0100, Sebastian Huber wrote:
> [...]
> What about the following approach if -fprofile-update=atomic is given:
>
> 1. Use 64-bit atomics if available.
>
> 2. Use
>
>     if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) == 0)
>       __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);
>
> if 32-bit atomics are available.

This assumes little-endian byte order.

Cheers,
Gabriel

> 3. Else use a library call (libatomic).
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Sebastian Huber @ 2022-11-04 10:02 UTC
To: Gabriel Paubert; +Cc: GCC Development

On 04/11/2022 10:53, Gabriel Paubert wrote:
>> 2. Use
>>
>>     if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) == 0)
>>       __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);
>>
>> if 32-bit atomics are available.
> This assumes little-endian byte order.

Yes, but this approach would also work on big-endian architectures. You just have to swap the two addresses, so that the first operation targets the low word and the second the high word. I guess the compiler knows for which endianness it generates code.
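To illustrate the point about byte order, here is a hedged sketch that picks the word index from the compiler's predefined `__BYTE_ORDER__` macro (available in GCC and Clang). The `union counter64` type, the `LOW_WORD`/`HIGH_WORD` constants and `counter64_increment` are illustrative names, not part of GCC. Anything that is not big-endian is treated as little-endian here, which is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the low and high 32-bit word inside a 64-bit object. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
enum { LOW_WORD = 1, HIGH_WORD = 0 };
#else
enum { LOW_WORD = 0, HIGH_WORD = 1 };  /* assumed little-endian */
#endif

/* A 64-bit counter that can also be addressed as two 32-bit words. */
union counter64 {
    uint64_t whole;
    uint32_t word[2];
};

/* Same split increment as in the proposal, but with the word
   addresses chosen according to the byte order. */
static inline void counter64_increment(union counter64 *c)
{
    assert(c != NULL);
    if (__atomic_add_fetch(&c->word[LOW_WORD], 1u, __ATOMIC_RELAXED) == 0)
        __atomic_fetch_add(&c->word[HIGH_WORD], 1u, __ATOMIC_RELAXED);
}
```

With the indices selected this way, the value read back through `whole` is the same on both byte orders, so only the address computation differs between the two cases.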
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Richard Biener @ 2022-11-05 11:18 UTC
To: Sebastian Huber; +Cc: GCC Development

On Fri, Nov 4, 2022 at 9:28 AM Sebastian Huber <sebastian.huber@embedded-brains.de> wrote:
> [...]
> What about the following approach if -fprofile-update=atomic is given:
>
> 1. Use 64-bit atomics if available.
>
> 2. Use
>
>     if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) == 0)
>       __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);
>
> if 32-bit atomics are available.
>
> 3. Else use a library call (libatomic).

Sounds good, though wouldn't a library call be prohibitively expensive?
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Sebastian Huber @ 2022-11-08 6:22 UTC
To: Richard Biener; +Cc: GCC Development

On 05.11.22 12:18, Richard Biener wrote:
> On Fri, Nov 4, 2022 at 9:28 AM Sebastian Huber
> <sebastian.huber@embedded-brains.de> wrote:
>> [...]
>> 3. Else use a library call (libatomic).
> Sounds good, though wouldn't a library call be prohibitively expensive?

If someone wants to profile a multi-threaded application and selects -fprofile-update=atomic, then probably a correct result is preferred. You still have the option to select -fprofile-update=prefer-atomic.

For 2. I have to modify:

    void
    gimple_gen_edge_profiler (int edgeno, edge e)
    {
      tree one;

      one = build_int_cst (gcov_type_node, 1);

      if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
        {
          /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
          tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno);
          tree f = builtin_decl_explicit (TYPE_PRECISION (gcov_type_node) > 32
                                          ? BUILT_IN_ATOMIC_FETCH_ADD_8:
                                          BUILT_IN_ATOMIC_FETCH_ADD_4);
          gcall *stmt = gimple_build_call (f, 3, addr, one,
                                           build_int_cst (integer_type_node,
                                                          MEMMODEL_RELAXED));
          gsi_insert_on_edge (e, stmt);
        }

Is

    if (WORDS_BIG_ENDIAN)

the right way to check for big/little endian?

How do I get ((unsigned int *) &val) + 1 from tree addr?

It would be great to have a code example for the construction of the "if (f()) f();".
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Richard Biener @ 2022-11-08 10:25 UTC
To: Sebastian Huber; +Cc: GCC Development

On Tue, Nov 8, 2022 at 7:22 AM Sebastian Huber <sebastian.huber@embedded-brains.de> wrote:
> [...]
> For 2. I have to modify:
>
> void
> gimple_gen_edge_profiler (int edgeno, edge e)
> [...]
>
> Is
>
> if (WORDS_BIG_ENDIAN)
>
> the right way to check for big/little endian?

Yes.

> How do I get ((unsigned int *) &val) + 1 from tree addr?
>
> It would be great to have a code example for the construction of the "if
> (f()) f();".

I think for the function above we need to emit __atomic_fetch_add_8, not the emulated form, because we cannot insert the required control flow (if (f()) f()) on an edge. The __atomic_fetch_add_8 should then be lowered after the instrumentation took place.

There's currently no helper to create a diamond, so the canonical way is to create a GIMPLE_COND, split the block after this stmt, split the outgoing edge and then redirect edges to form a half-diamond. move_sese_in_condition has most of that CFG manipulation (but it does something different).

Richard.
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Sebastian Huber @ 2022-11-08 12:00 UTC
To: Richard Biener; +Cc: GCC Development

On 08.11.22 11:25, Richard Biener wrote:
>> How do I get ((unsigned int *) &val) + 1 from tree addr?
>>
>> It would be great to have a code example for the construction of the "if
>> (f()) f();".
> I think for the function above we need to emit __atomic_fetch_add_8,
> not the emulated form because we cannot insert the required control
> flow (if (f()) f()) on an edge. The __atomic_fetch_add_8 should then be
> lowered after the instrumentation took place.

Ok, I am not a compiler expert, so I have only a vague feeling for how this works. I am not sure which piece is responsible for the "lowering" of this particular __atomic_fetch_add_8. I guess we don't want to split all __atomic_fetch_add_8 into this if (f()) f(); form?

> There's currently no helper to create a diamond so the canonical
> way is to create a GIMPLE_COND, split the block after this stmt,
> split the outgoing edge and then redirect edges to form a half-diamond.
> move_sese_in_condition has most of that CFG manipulation (but
> it does something different)

Thanks, I will probably be able to do this with a bit of trial and error.
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Richard Biener @ 2022-11-08 13:52 UTC
To: Sebastian Huber; +Cc: GCC Development

On Tue, Nov 8, 2022 at 1:00 PM Sebastian Huber <sebastian.huber@embedded-brains.de> wrote:
> On 08.11.22 11:25, Richard Biener wrote:
> > I think for the function above we need to emit __atomic_fetch_add_8,
> > not the emulated form because we cannot insert the required control
> > flow (if (f()) f()) on an edge. The __atomic_fetch_add_8 should then be
> > lowered after the instrumentation took place.
>
> Ok, I am not a compiler expert, so I have only a vague feeling for how this
> works. I am not sure which piece is responsible for the "lowering" of
> this particular __atomic_fetch_add_8. I guess we don't want to split all
> __atomic_fetch_add_8 into this if (f()) f(); form?

I think we should do it right after the profile instrumentation commits the edge insertions. And yes, we only want to lower those that are not natively supported (but have native support for fetch_add_4). Not sure how to determine this.

> > There's currently no helper to create a diamond so the canonical
> > way is to create a GIMPLE_COND, split the block after this stmt,
> > split the outgoing edge and then redirect edges to form a half-diamond.
> > move_sese_in_condition has most of that CFG manipulation (but
> > it does something different)
>
> Thanks, I will probably be able to do this with a bit of trial and error.
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Sebastian Huber @ 2022-12-05 7:26 UTC
To: Richard Biener; +Cc: GCC Development

On 08/11/2022 11:25, Richard Biener wrote:
>> It would be great to have a code example for the construction of the "if
>> (f()) f();".
> I think for the function above we need to emit __atomic_fetch_add_8,
> not the emulated form because we cannot insert the required control
> flow (if (f()) f()) on an edge. The __atomic_fetch_add_8 should then be
> lowered after the instrumentation took place.

Would it help to change the

    if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) == 0)
      __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);

into

    unsigned int v = __atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED);
    v = (unsigned int)(v == 0);
    __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);

to get rid of the inserted control flow?

On riscv this is optimized to:

    li       a4,1
    amoadd.w a5,a4,0(a0)
    addi     a5,a5,1
    seqz     a5,a5
    addi     a4,a0,4
    amoadd.w zero,a5,0(a4)
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Richard Biener @ 2022-12-05 7:44 UTC
To: Sebastian Huber; +Cc: GCC Development

On Mon, Dec 5, 2022 at 8:26 AM Sebastian Huber <sebastian.huber@embedded-brains.de> wrote:
> Would it help to change the
>
>     if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) == 0)
>       __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);
>
> into
>
>     unsigned int v = __atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED);
>     v = (unsigned int)(v == 0);
>     __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);

That's supposed to add 'v' instead of 1? Possibly use uint32_t here (aka uint32_type_node).

> to get rid of the inserted control flow?

That for sure wouldn't require any changes to how the profile instrumentation works, so yes, it would be simpler.

Richard.

> On riscv this is optimized to:
>
>     li       a4,1
>     amoadd.w a5,a4,0(a0)
>     addi     a5,a5,1
>     seqz     a5,a5
>     addi     a4,a0,4
>     amoadd.w zero,a5,0(a4)
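With the correction raised in the reply (add `v` rather than 1, and use `uint32_t`), the branch-free variant could look roughly like the sketch below. The function name and the separate `low`/`high` pointer parameters are assumptions made for illustration, and the generic `__atomic_*` built-ins of GCC and Clang are used in place of the `_4` forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-free split increment: the second atomic operation is always
   executed, but it adds 0 unless the low word just wrapped around. */
static inline void split_increment_branchless(uint32_t *low, uint32_t *high)
{
    assert(low != NULL && high != NULL);

    /* New value of the low word after adding 1. */
    uint32_t v = __atomic_add_fetch(low, 1u, __ATOMIC_RELAXED);

    /* 1 if the low word wrapped to zero (carry out), otherwise 0. */
    uint32_t carry = (uint32_t)(v == 0);

    /* Propagate the carry into the high word. */
    __atomic_fetch_add(high, carry, __ATOMIC_RELAXED);
}
```

The trade-off is that every counter update now performs two atomic read-modify-write operations instead of one in the common case, in exchange for straight-line code that can be inserted on an edge without creating new basic blocks.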
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Sebastian Huber @ 2022-12-06 13:11 UTC
To: Richard Biener; +Cc: GCC Development

On 05/12/2022 08:44, Richard Biener wrote:
> On Mon, Dec 5, 2022 at 8:26 AM Sebastian Huber
> <sebastian.huber@embedded-brains.de> wrote:
>> [...]
>> to get rid of the inserted control flow?
> That for sure wouldn't require any changes to how the profile
> instrumentation works,
> so yes it would be simpler.

Yes, this seems to work. After a bit of trial and error I ended up with something in gimple_gen_edge_profiler() like this (endian support is missing):

    else if (flag_profile_update == PROFILE_UPDATE_SPLIT_ATOMIC)
      {
        tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno);
        tree f = builtin_decl_explicit (BUILT_IN_ATOMIC_ADD_FETCH_4);
        gcall *stmt1 = gimple_build_call (f, 3, addr, one,
                                          build_int_cst (integer_type_node,
                                                         MEMMODEL_RELAXED));
        tree low = create_tmp_var (uint32_type_node);
        gimple_call_set_lhs (stmt1, low);
        tree is_zero = create_tmp_var (boolean_type_node);
        gassign *stmt2 = gimple_build_assign (is_zero, EQ_EXPR, low,
                                              build_zero_cst (uint32_type_node));
        tree high_inc = create_tmp_var (uint32_type_node);
        gassign *stmt3 = gimple_build_assign (high_inc, COND_EXPR, is_zero,
                                              build_one_cst (uint32_type_node),
                                              build_zero_cst (uint32_type_node));
        tree addr_high = create_tmp_var (TREE_TYPE (addr));
        gassign *stmt4 = gimple_build_assign (addr_high, addr);
        gassign *stmt5 = gimple_build_assign (addr_high, POINTER_PLUS_EXPR,
                                              addr_high,
                                              build_int_cst (size_type_node, 4));
        gcall *stmt6 = gimple_build_call (f, 3, addr_high, high_inc,
                                          build_int_cst (integer_type_node,
                                                         MEMMODEL_RELAXED));
        gsi_insert_on_edge (e, stmt1);
        gsi_insert_on_edge (e, stmt2);
        gsi_insert_on_edge (e, stmt3);
        gsi_insert_on_edge (e, stmt4);
        gsi_insert_on_edge (e, stmt5);
        gsi_insert_on_edge (e, stmt6);
      }

It can probably be simplified. The generated code:

            .type   f, @function
    f:
            lui      a4,%hi(__gcov0.f)
            li       a3,1
            addi     a4,a4,%lo(__gcov0.f)
            amoadd.w a5,a3,0(a4)
            lui      a4,%hi(__gcov0.f+4)
            addi     a5,a5,1
            seqz     a5,a5
            addi     a4,a4,%lo(__gcov0.f+4)
            amoadd.w zero,a5,0(a4)
            li       a0,3
            ret

looks good for this code:

    int f(void)
    {
      return 3;
    }

The loading of the high address could probably be optimized from

    lui  a4,%hi(__gcov0.f+4)
    addi a4,a4,%lo(__gcov0.f+4)

to

    addi a4,a4,4

I wasn't able to figure out how to do this.
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Richard Biener @ 2022-12-06 16:08 UTC
To: Sebastian Huber; +Cc: GCC Development

On Tue, Dec 6, 2022 at 2:11 PM Sebastian Huber
<sebastian.huber@embedded-brains.de> wrote:
>
> On 05/12/2022 08:44, Richard Biener wrote:
> > On Mon, Dec 5, 2022 at 8:26 AM Sebastian Huber
> > <sebastian.huber@embedded-brains.de> wrote:
> >> On 08/11/2022 11:25, Richard Biener wrote:
> >>>> It would be great to have a code example for the construction of the "if
> >>>> (f()) f();".
> >>> I think for the function above we need to emit __atomic_fetch_add_8,
> >>> not the emulated form because we cannot insert the required control
> >>> flow (if (f()) f()) on an edge.  The __atomic_fetch_add_8 should then be
> >>> lowered after the instrumentation took place.
> >> Would it help to change the
> >>
> >>         if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED)
> >> == 0)
> >>           __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
> >> __ATOMIC_RELAXED);
> >>
> >> into
> >>
> >>         unsigned int v = __atomic_add_fetch_4 ((unsigned int *) &val, 1,
> >> __ATOMIC_RELAXED)
> >> == 0)
> >>         v = (unsigned int)(v == 0);
> >>         __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
> >> __ATOMIC_RELAXED);
> > that's supposed to add 'v' instead of 1?  Possibly use uint32_t here
> > (aka uint32_type_node).
> >
> >> to get rid of an inserted control flow?
> > That for sure wouldn't require any changes to how the profile
> > instrumentation works,
> > so yes it would be simpler.
>
> Yes, this seems to work. After a bit of trial and error I ended up with
> something in gimple_gen_edge_profiler() like this (endian support is
> missing):
>
>    else if (flag_profile_update == PROFILE_UPDATE_SPLIT_ATOMIC)
>      {
>        tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno);
>        tree f = builtin_decl_explicit (BUILT_IN_ATOMIC_ADD_FETCH_4);
>        gcall *stmt1 = gimple_build_call (f, 3, addr, one,
>                                          build_int_cst (integer_type_node,
>                                                         MEMMODEL_RELAXED));
>        tree low = create_tmp_var (uint32_type_node);
>        gimple_call_set_lhs (stmt1, low);
>        tree is_zero = create_tmp_var (boolean_type_node);
>        gassign *stmt2 = gimple_build_assign (is_zero, EQ_EXPR, low,
>                                              build_zero_cst (uint32_type_node));
>        tree high_inc = create_tmp_var (uint32_type_node);
>        gassign *stmt3 = gimple_build_assign (high_inc, COND_EXPR, is_zero,
>                                              build_one_cst (uint32_type_node),
>                                              build_zero_cst (uint32_type_node));
>        tree addr_high = create_tmp_var (TREE_TYPE (addr));
>        gassign *stmt4 = gimple_build_assign (addr_high, addr);
>        gassign *stmt5 = gimple_build_assign (addr_high, POINTER_PLUS_EXPR,
>                                              addr_high,
>                                              build_int_cst (size_type_node, 4));
>        gcall *stmt6 = gimple_build_call (f, 3, addr_high, high_inc,
>                                          build_int_cst (integer_type_node,
>                                                         MEMMODEL_RELAXED));
>        gsi_insert_on_edge (e, stmt1);
>        gsi_insert_on_edge (e, stmt2);
>        gsi_insert_on_edge (e, stmt3);
>        gsi_insert_on_edge (e, stmt4);
>        gsi_insert_on_edge (e, stmt5);
>        gsi_insert_on_edge (e, stmt6);
>      }
>
> It can be probably simplified.

Likely.  I'd use the gimple_build () API from gimple-fold.h which
builds the expression(s) to a gimple_seq creating necessary temporaries
on-the-fly and then insert that sequence on the edge.

But even the above should work.

> The generated code:
>
>          .type   f, @function
> f:
>          lui     a4,%hi(__gcov0.f)
>          li      a3,1
>          addi    a4,a4,%lo(__gcov0.f)
>          amoadd.w a5,a3,0(a4)
>          lui     a4,%hi(__gcov0.f+4)
>          addi    a5,a5,1
>          seqz    a5,a5
>          addi    a4,a4,%lo(__gcov0.f+4)
>          amoadd.w zero,a5,0(a4)
>          li      a0,3
>          ret
>
> looks good for this code:
>
> int f(void)
> {
>    return 3;
> }
>
> The loading of the high address could be probably optimized from
>
>          lui     a4,%hi(__gcov0.f+4)
>          addi    a4,a4,%lo(__gcov0.f+4)
>
> to
>
>          addi    a4,a4,4
>
> I wasn't able to figure out how to do this.

I think that's something for the backend - we're not good at CSEing
parts of an "invariant" address, and the above might be the form
required when relocations are needed.

Richard.
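Note: the branch-free update discussed in this message can be written out in plain C roughly as follows. This is only a minimal sketch, not GCC's instrumentation code. The names split_counter, split_counter_increment and split_counter_value are made up for illustration, the generic __atomic_add_fetch/__atomic_fetch_add builtins stand in for the _4 variants, and the low word is assumed to come first in memory (the little-endian layout; the message notes that endian support is still missing).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit counter stored as two 32-bit words, low word first.  */
struct split_counter
{
  uint32_t low;
  uint32_t high;
};

/* Increment the counter without a branch.  The carry into the high word is
   computed as (low == 0) after the increment and is added unconditionally,
   so no control flow has to be inserted.  */
void
split_counter_increment (struct split_counter *c)
{
  assert (c != NULL);
  uint32_t low = __atomic_add_fetch (&c->low, 1, __ATOMIC_RELAXED);
  uint32_t carry = (uint32_t) (low == 0);
  __atomic_fetch_add (&c->high, carry, __ATOMIC_RELAXED);
}

/* Read the counter while no other thread updates it, for example when the
   coverage data is dumped at program exit.  */
uint64_t
split_counter_value (const struct split_counter *c)
{
  assert (c != NULL);
  return ((uint64_t) c->high << 32) | c->low;
}
```

Starting from low = 0xfffffffe and high = 7, two calls to split_counter_increment() first give 0x7_ffffffff and then carry into the high word. The two words are updated by separate atomic operations, so only the final value is meaningful, not an intermediate read taken while other threads are still counting.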
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Sebastian Huber @ 2022-12-07 8:51 UTC
To: Richard Biener; +Cc: GCC Development

On 06.12.22 17:08, Richard Biener wrote:
> Likely.  I'd use the gimple_build () API from gimple-fold.h which
> builds the expression(s) to a gimple_seq creating necessary temporaries
> on-the-fly and then insert that sequence on the edge.

Thanks, I will have a look at this.

I am struggling to convert a uint32_type_node node to a gcov_type_node
(64-bit). I tried to use this:

   if (result != NULL_TREE)
     {
       tree tmp1 = make_temp_ssa_name (gcov_type_node, NULL, name);
       gassign *stmt7 = gimple_build_assign (result, VIEW_CONVERT_EXPR,
                                             build1 (VIEW_CONVERT_EXPR,
                                                     gcov_type_node, high));
       tree tmp2 = make_temp_ssa_name (gcov_type_node, NULL, name);
       gassign *stmt8 = gimple_build_assign (tmp2, LSHIFT_EXPR, tmp1,
                                             build_int_cst (integer_type_node, 32));
       gassign *stmt9 = gimple_build_assign (result, BIT_IOR_EXPR, tmp2, tmp1);
       gsi_insert_after (gsi, stmt7, GSI_NEW_STMT);
       gsi_insert_after (gsi, stmt8, GSI_NEW_STMT);
       gsi_insert_after (gsi, stmt9, GSI_NEW_STMT);
     }

This ends up in:

../test.c: In function 'f':
../test.c:4:1: error: conversion of register to a different size in 'view_convert_expr'
    4 | }
      | ^
VIEW_CONVERT_EXPR<long long int>(PROF_time_profiler_15);

PROF_time_profile_9 = VIEW_CONVERT_EXPR<long long int>(PROF_time_profiler_15);
during IPA pass: profile
../test.c:4:1: internal compiler error: verify_gimple failed
0xdddc95 verify_gimple_in_cfg(function*, bool, bool)
        /home/EB/sebastian_h/src/gcc/gcc/tree-cfg.cc:5647
0xc20221 execute_function_todo
        /home/EB/sebastian_h/src/gcc/gcc/passes.cc:2091
0xc1efd6 do_per_function
        /home/EB/sebastian_h/src/gcc/gcc/passes.cc:1701
0xc20416 execute_todo
        /home/EB/sebastian_h/src/gcc/gcc/passes.cc:2145
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Richard Biener @ 2022-12-07 9:09 UTC
To: Sebastian Huber; +Cc: GCC Development

On Wed, Dec 7, 2022 at 9:51 AM Sebastian Huber
<sebastian.huber@embedded-brains.de> wrote:
>
> On 06.12.22 17:08, Richard Biener wrote:
> > Likely.  I'd use the gimple_build () API from gimple-fold.h which
> > builds the expression(s) to a gimple_seq creating necessary temporaries
> > on-the-fly and then insert that sequence on the edge.
>
> Thanks, I will have a look at this.
>
> I am struggling to convert a uint32_type_node node to a gcov_type_node
> (64-bit). I tried to use this:
>
>    if (result != NULL_TREE)
>      {
>        tree tmp1 = make_temp_ssa_name (gcov_type_node, NULL, name);
>        gassign *stmt7 = gimple_build_assign (result, VIEW_CONVERT_EXPR,
>                                              build1 (VIEW_CONVERT_EXPR, gcov_type_node,
>                                                      high));

You want

   gimple_build_assign (result, NOP_EXPR, high);

here (a conversion, from unsigned it will zero-extend)

>        tree tmp2 = make_temp_ssa_name (gcov_type_node, NULL, name);
>        gassign *stmt8 = gimple_build_assign (tmp2, LSHIFT_EXPR, tmp1,
>                                              build_int_cst (integer_type_node, 32));
>        gassign *stmt9 = gimple_build_assign (result, BIT_IOR_EXPR, tmp2, tmp1);
>        gsi_insert_after (gsi, stmt7, GSI_NEW_STMT);
>        gsi_insert_after (gsi, stmt8, GSI_NEW_STMT);
>        gsi_insert_after (gsi, stmt9, GSI_NEW_STMT);
>      }
>
> This ends up in:
>
> ../test.c: In function 'f':
> ../test.c:4:1: error: conversion of register to a different size in
> 'view_convert_expr'
>      4 | }
>        | ^
> VIEW_CONVERT_EXPR<long long int>(PROF_time_profiler_15);
>
> PROF_time_profile_9 = VIEW_CONVERT_EXPR<long long
> int>(PROF_time_profiler_15);
> during IPA pass: profile
> ../test.c:4:1: internal compiler error: verify_gimple failed
> 0xdddc95 verify_gimple_in_cfg(function*, bool, bool)
>          /home/EB/sebastian_h/src/gcc/gcc/tree-cfg.cc:5647
> 0xc20221 execute_function_todo
>          /home/EB/sebastian_h/src/gcc/gcc/passes.cc:2091
> 0xc1efd6 do_per_function
>          /home/EB/sebastian_h/src/gcc/gcc/passes.cc:1701
> 0xc20416 execute_todo
>          /home/EB/sebastian_h/src/gcc/gcc/passes.cc:2145
> Please submit a full bug report, with preprocessed source (by using
> -freport-bug).
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.
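Note: as a loose C analogy for the advice in this message (this is plain C, not GIMPLE), a value conversion such as NOP_EXPR behaves like an ordinary cast, and a cast from an unsigned 32-bit value to a 64-bit type zero-extends. A bit reinterpretation in the style of VIEW_CONVERT_EXPR only makes sense between types of the same size, which matches the verifier error quoted in the message. The function names widen_u32 and combine_halves below are hypothetical and chosen only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Value conversion from an unsigned 32-bit word to a signed 64-bit type.
   Because the source type is unsigned, the upper 32 bits of the result
   are zero (zero extension).  */
int64_t
widen_u32 (uint32_t word)
{
  return (int64_t) word;
}

/* Combine a high and a low 32-bit word into one 64-bit value.  The shift is
   done on an unsigned 64-bit type to avoid shifting into the sign bit.  */
int64_t
combine_halves (uint32_t high, uint32_t low)
{
  uint64_t wide = ((uint64_t) high << 32) | low;
  assert ((wide >> 32) == high);
  return (int64_t) wide;
}
```

For example, widen_u32 (0xffffffff) yields 4294967295 rather than -1, and combine_halves (7, 0xffffffff) yields 0x7_ffffffff.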
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Sebastian Huber @ 2022-12-07 9:24 UTC
To: Richard Biener; +Cc: GCC Development

On 07.12.22 10:09, Richard Biener wrote:
> On Wed, Dec 7, 2022 at 9:51 AM Sebastian Huber
> <sebastian.huber@embedded-brains.de> wrote:
>> On 06.12.22 17:08, Richard Biener wrote:
>>> Likely.  I'd use the gimple_build () API from gimple-fold.h which
>>> builds the expression(s) to a gimple_seq creating necessary temporaries
>>> on-the-fly and then insert that sequence on the edge.
>> Thanks, I will have a look at this.
>>
>> I am struggling to convert a uint32_type_node node to a gcov_type_node
>> (64-bit). I tried to use this:
>>
>>     if (result != NULL_TREE)
>>       {
>>         tree tmp1 = make_temp_ssa_name (gcov_type_node, NULL, name);
>>         gassign *stmt7 = gimple_build_assign (result, VIEW_CONVERT_EXPR,
>>                                               build1 (VIEW_CONVERT_EXPR, gcov_type_node,
>>                                                       high));
> You want
>
>     gimple_build_assign (result, NOP_EXPR, high);
>
> here (a conversion, from unsigned it will zero-extend)

Thanks, with this NOP_EXPR it did work. I have now a proof of concept
ready. Should I wait for the GCC 14 development cycle or can I post a
patch set now?
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Richard Biener @ 2022-12-07 11:49 UTC
To: Sebastian Huber; +Cc: GCC Development

On Wed, Dec 7, 2022 at 10:24 AM Sebastian Huber
<sebastian.huber@embedded-brains.de> wrote:
>
> On 07.12.22 10:09, Richard Biener wrote:
> > On Wed, Dec 7, 2022 at 9:51 AM Sebastian Huber
> > <sebastian.huber@embedded-brains.de> wrote:
> >> On 06.12.22 17:08, Richard Biener wrote:
> >>> Likely.  I'd use the gimple_build () API from gimple-fold.h which
> >>> builds the expression(s) to a gimple_seq creating necessary temporaries
> >>> on-the-fly and then insert that sequence on the edge.
> >> Thanks, I will have a look at this.
> >>
> >> I am struggling to convert a uint32_type_node node to a gcov_type_node
> >> (64-bit). I tried to use this:
> >>
> >>     if (result != NULL_TREE)
> >>       {
> >>         tree tmp1 = make_temp_ssa_name (gcov_type_node, NULL, name);
> >>         gassign *stmt7 = gimple_build_assign (result, VIEW_CONVERT_EXPR,
> >>                                               build1 (VIEW_CONVERT_EXPR, gcov_type_node,
> >>                                                       high));
> > You want
> >
> >     gimple_build_assign (result, NOP_EXPR, high);
> >
> > here (a conversion, from unsigned it will zero-extend)
>
> Thanks, with this NOP_EXPR it did work. I have now a proof of concept
> ready. Should I wait for the GCC 14 development cycle or can I post a
> patch set now?

You can surely post a patch now; if it addresses a bug, it could even
be considered.

Richard.
* Re: -fprofile-update=atomic vs. 32-bit architectures
From: Sebastian Huber @ 2022-12-07 9:55 UTC
To: GCC Development

On 04.11.22 09:27, Sebastian Huber wrote:
> Hello,
>
> even recent 32-bit architectures such as RISC-V do not support 64-bit
> atomic operations. Using -fprofile-update=atomic for the 32-bit RISC-V
> RV32GC ISA yields:
>
> warning: target does not support atomic profile update, single mode is
> selected
>
> For multi-threaded applications it is quite important to use atomic
> counter increments to get valid coverage data. I think this fall back is
> not really good. Maybe we should consider using this approach from Jakub
> Jelinek for 32-bit architectures lacking 64-bit atomic operations:
>
>         if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED)
> == 0)
>           __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
> __ATOMIC_RELAXED);
>
> https://patchwork.ozlabs.org/project/gcc/patch/19c4a81d-6ecd-8c6e-b641-e257c1959baf@suse.cz/#1447334
>
> Last year I added the TARGET_GCOV_TYPE_SIZE target hook to optionally
> reduce the gcov type size to 32 bits. I am not really sure if this was a
> good idea. Longer running executables may observe counter overflows
> leading to invalid coverage data. If someone wants atomic updates, then
> the updates should be atomic even if this means to use a library
> implementation (libatomic).
>
> What about the following approach if -fprofile-update=atomic is given:
>
> 1. Use 64-bit atomics if available.
>
> 2. Use
>
>         if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED)
> == 0)
>           __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
> __ATOMIC_RELAXED);
>
> if 32-bit atomics are available.

This approach works fine for the edge counters in
gimple_gen_edge_profiler() because we don't have to read the counter
value. We just have to do an increment. In gimple_gen_time_profiler() we
have to do this:

   /* Emit: counters[0] = ++__gcov_time_profiler_counter.  */

So here we have to do an atomic increment and fetch the resulting value.
The split approach above cannot provide this. For example, let thread A
increment the lower part from 0xfffffffe to 0xffffffff, then let thread B
increment the lower part from 0xffffffff to 0x0 and the higher part from
0x7 to 0x8, then let thread A read the higher part, which is now 0x8.
Thread A would then get 0x8_ffffffff instead of the correct 0x7_ffffffff.

>
> 3. Else use a library call (libatomic).
>
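Note: the interleaving described in this message can be replayed deterministically in a single thread. The sketch below is only an illustration under stated assumptions: split_counter, step_low, step_high and replay_interleaving are made-up names, and the "threads" A and B are just the order in which the two steps of each increment are called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit counter stored as two 32-bit words.  */
struct split_counter
{
  uint32_t low;
  uint32_t high;
};

/* Step 1 of a split increment: bump the low word, return its new value.  */
uint32_t
step_low (struct split_counter *c)
{
  assert (c != NULL);
  return __atomic_add_fetch (&c->low, 1, __ATOMIC_RELAXED);
}

/* Step 2: add the carry to the high word, return the new high word.  */
uint32_t
step_high (struct split_counter *c, uint32_t new_low)
{
  assert (c != NULL);
  uint32_t carry = (uint32_t) (new_low == 0);
  return __atomic_add_fetch (&c->high, carry, __ATOMIC_RELAXED);
}

/* Replay the problematic ordering and return the value that thread A
   would assemble from its two steps.  */
uint64_t
replay_interleaving (void)
{
  struct split_counter c = { 0xfffffffeu, 0x7 };

  uint32_t a_low = step_low (&c);           /* A: low 0xfffffffe -> 0xffffffff */
  uint32_t b_low = step_low (&c);           /* B: low 0xffffffff -> 0x0 */
  (void) step_high (&c, b_low);             /* B: high 0x7 -> 0x8 */
  uint32_t a_high = step_high (&c, a_low);  /* A: adds 0, sees high 0x8 */

  return ((uint64_t) a_high << 32) | a_low;
}
```

With this ordering thread A assembles 0x8_ffffffff although its own increment produced 0x7_ffffffff. The counter itself still ends up correct; only the fetched value is wrong, which is why the scheme is acceptable for pure increments but not for an increment-and-fetch.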
Thread overview: 17+ messages
2022-11-04  8:27 -fprofile-update=atomic vs. 32-bit architectures Sebastian Huber
2022-11-04  9:53 ` Gabriel Paubert
2022-11-04 10:02 ` Sebastian Huber
2022-11-05 11:18 ` Richard Biener
2022-11-08  6:22 ` Sebastian Huber
2022-11-08 10:25 ` Richard Biener
2022-11-08 12:00 ` Sebastian Huber
2022-11-08 13:52 ` Richard Biener
2022-12-05  7:26 ` Sebastian Huber
2022-12-05  7:44 ` Richard Biener
2022-12-06 13:11 ` Sebastian Huber
2022-12-06 16:08 ` Richard Biener
2022-12-07  8:51 ` Sebastian Huber
2022-12-07  9:09 ` Richard Biener
2022-12-07  9:24 ` Sebastian Huber
2022-12-07 11:49 ` Richard Biener
2022-12-07  9:55 ` Sebastian Huber