-fprofile-update=atomic vs. 32-bit architectures

public inbox for gcc@gcc.gnu.org

* -fprofile-update=atomic vs. 32-bit architectures
@ 2022-11-04  8:27 Sebastian Huber
  2022-11-04  9:53 ` Gabriel Paubert
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Sebastian Huber @ 2022-11-04  8:27 UTC (permalink / raw)
  To: GCC Development

Hello,

Even recent 32-bit architectures such as RISC-V do not support 64-bit
atomic operations.  Using -fprofile-update=atomic for the 32-bit RISC-V
RV32GC ISA yields:

warning: target does not support atomic profile update, single mode is 
selected

For multi-threaded applications it is quite important to use atomic
counter increments to get valid coverage data. I think this fallback is
not a good choice. Maybe we should consider using this approach from Jakub
Jelinek for 32-bit architectures lacking 64-bit atomic operations:

   if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) == 0)
     __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);

https://patchwork.ozlabs.org/project/gcc/patch/19c4a81d-6ecd-8c6e-b641-e257c1959baf@suse.cz/#1447334

Last year I added the TARGET_GCOV_TYPE_SIZE target hook to optionally
reduce the gcov type size to 32 bits. I am not really sure if this was a
good idea. Longer running executables may observe counter overflows
leading to invalid coverage data. If someone wants atomic updates, then
the updates should be atomic even if this means using a library
implementation (libatomic).
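
To put the overflow concern in numbers, here is a minimal sketch. The function names are hypothetical and chosen only for illustration, and the rate of one billion increments per second is an assumed figure, not a measurement.

```c
#include <stdint.h>

/* Hypothetical helper: one increment of a 32-bit counter.  Unsigned
   arithmetic wraps modulo 2^32, so the count silently restarts at 0.  */
static uint32_t
increment_32 (uint32_t counter)
{
  return counter + 1u;
}

/* Hypothetical helper: seconds until a 32-bit counter wraps when it is
   incremented at a constant rate.  */
static double
seconds_until_32bit_wrap (double increments_per_second)
{
  return 4294967296.0 / increments_per_second;   /* 2^32 increments */
}
```

At an assumed 1e9 increments per second on a hot edge, the 32-bit counter wraps after roughly 4.3 seconds.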

What about the following approach if -fprofile-update=atomic is given:

1. Use 64-bit atomics if available.

2. Use

   if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) == 0)
     __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);

if 32-bit atomics are available.

3. Else use a library call (libatomic).
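
The three-step selection above can be sketched in user-level C as follows. This is only an illustration under stated assumptions, not the compiler implementation: counter_t, counter_increment and counter_increment_split are hypothetical names, the layout is assumed to be little-endian as in the snippet above, and the choice is made with the predefined macros __GCC_ATOMIC_LLONG_LOCK_FREE and __GCC_ATOMIC_INT_LOCK_FREE instead of whatever check the compiler would use internally.

```c
#include <stdint.h>

/* Hypothetical counter type: a 64-bit value that can also be addressed
   as two 32-bit halves.  Little-endian layout assumed: half[0] is the
   low word, half[1] is the high word.  */
typedef union
{
  uint64_t whole;
  uint32_t half[2];
} counter_t;

/* Step 2: increment the low word atomically and carry into the high
   word when the low word wraps around to zero.  */
static inline void
counter_increment_split (counter_t *counter)
{
  if (__atomic_add_fetch (&counter->half[0], 1, __ATOMIC_RELAXED) == 0)
    __atomic_fetch_add (&counter->half[1], 1, __ATOMIC_RELAXED);
}

/* Pick one of the three strategies at compile time.  */
static inline void
counter_increment (counter_t *counter)
{
#if __GCC_ATOMIC_LLONG_LOCK_FREE == 2
  /* 1. Native 64-bit atomics are available.  */
  __atomic_fetch_add (&counter->whole, 1, __ATOMIC_RELAXED);
#elif __GCC_ATOMIC_INT_LOCK_FREE == 2
  /* 2. Only 32-bit atomics are available.  */
  counter_increment_split (counter);
#else
  /* 3. Generic form; the compiler may emit a libatomic call here.  */
  __atomic_fetch_add (&counter->whole, 1, __ATOMIC_RELAXED);
#endif
}
```

Note that in the split form a reader sampling the counter between the two increments can observe a value that is too low by 2^32. The final count is exact once all updating threads have finished, which is what matters for coverage data.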

-- 
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.huber@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Court of registration: Amtsgericht München
Registration number: HRB 157899
Managing directors authorized to represent the company: Peter Rasmussen, Thomas Dörfler
Our privacy policy can be found here:
https://embedded-brains.de/datenschutzerklaerung/

* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-11-04  8:27 -fprofile-update=atomic vs. 32-bit architectures Sebastian Huber
@ 2022-11-04  9:53 ` Gabriel Paubert
  2022-11-04 10:02   ` Sebastian Huber
  2022-11-05 11:18 ` Richard Biener
  2022-12-07  9:55 ` Sebastian Huber
  2 siblings, 1 reply; 17+ messages in thread
From: Gabriel Paubert @ 2022-11-04  9:53 UTC (permalink / raw)
  To: Sebastian Huber; +Cc: GCC Development

On Fri, Nov 04, 2022 at 09:27:34AM +0100, Sebastian Huber wrote:
> Hello,
> 
> even recent 32-bit architectures such as RISC-V do not support 64-bit atomic
> operations.  Using -fprofile-update=atomic for the 32-bit RISC-V RV32GC ISA
> yields:
> 
> warning: target does not support atomic profile update, single mode is
> selected
> 
> For multi-threaded applications it is quite important to use atomic counter
> increments to get valid coverage data. I think this fall back is not really
> good. Maybe we should consider using this approach from Jakub Jelinek for
> 32-bit architectures lacking 64-bit atomic operations:
> 
>   if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) ==
> 0)
>     __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);
> 
> https://patchwork.ozlabs.org/project/gcc/patch/19c4a81d-6ecd-8c6e-b641-e257c1959baf@suse.cz/#1447334
> 
> Last year I added the TARGET_GCOV_TYPE_SIZE target hook to optionally reduce
> the gcov type size to 32 bits. I am not really sure if this was a good idea.
> Longer running executables may observe counter overflows leading to invalid
> coverage data. If someone wants atomic updates, then the updates should be
> atomic even if this means to use a library implementation (libatomic).
> 
> What about the following approach if -fprofile-update=atomic is given:
> 
> 1. Use 64-bit atomics if available.
> 
> 2. Use
> 
>   if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) ==
> 0)
>     __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);
> 
> if 32-bit atomics are available.

This assumes little-endian byte order.

	Cheers,
	Gabriel

> 
> 3. Else use a library call (libatomic).

* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-11-04  9:53 ` Gabriel Paubert
@ 2022-11-04 10:02   ` Sebastian Huber
  0 siblings, 0 replies; 17+ messages in thread
From: Sebastian Huber @ 2022-11-04 10:02 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: GCC Development

On 04/11/2022 10:53, Gabriel Paubert wrote:
>> 2. Use
>>
>>    if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) ==
>> 0)
>>      __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);
>>
>> if 32-bit atomics are available.
> This assumes little-endian byte order.

Yes, but this approach would also work on big-endian architectures. You
just have to swap the addresses of the low and high words. I guess the
compiler knows for which endianness it generates code.
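
A minimal sketch of the endian-aware variant in user-level C. The names split_counter, LOW_WORD, HIGH_WORD and split_counter_increment are hypothetical, and the byte order is taken from the predefined __BYTE_ORDER__ macro; inside the compiler a target macro such as WORDS_BIG_ENDIAN would play this role instead.

```c
#include <stdint.h>

/* Index of the low and high 32-bit word inside the 64-bit counter,
   selected at compile time from the target byte order.  */
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
enum { LOW_WORD = 1, HIGH_WORD = 0 };
#else
enum { LOW_WORD = 0, HIGH_WORD = 1 };
#endif

/* Hypothetical counter type with both views of the same storage.  */
typedef union
{
  uint64_t whole;
  uint32_t half[2];
} split_counter;

/* Same two-step increment as before; only the word indices depend on
   the byte order.  */
static inline void
split_counter_increment (split_counter *c)
{
  if (__atomic_add_fetch (&c->half[LOW_WORD], 1, __ATOMIC_RELAXED) == 0)
    __atomic_fetch_add (&c->half[HIGH_WORD], 1, __ATOMIC_RELAXED);
}
```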

* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-11-04  8:27 -fprofile-update=atomic vs. 32-bit architectures Sebastian Huber
  2022-11-04  9:53 ` Gabriel Paubert
@ 2022-11-05 11:18 ` Richard Biener
  2022-11-08  6:22   ` Sebastian Huber
  2022-12-07  9:55 ` Sebastian Huber
  2 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2022-11-05 11:18 UTC (permalink / raw)
  To: Sebastian Huber; +Cc: GCC Development

On Fri, Nov 4, 2022 at 9:28 AM Sebastian Huber
<sebastian.huber@embedded-brains.de> wrote:
>
> Hello,
>
> even recent 32-bit architectures such as RISC-V do not support 64-bit
> atomic operations.  Using -fprofile-update=atomic for the 32-bit RISC-V
> RV32GC ISA yields:
>
> warning: target does not support atomic profile update, single mode is
> selected
>
> For multi-threaded applications it is quite important to use atomic
> counter increments to get valid coverage data. I think this fall back is
> not really good. Maybe we should consider using this approach from Jakub
> Jelinek for 32-bit architectures lacking 64-bit atomic operations:
>
>    if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED)
> == 0)
>      __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
> __ATOMIC_RELAXED);
>
> https://patchwork.ozlabs.org/project/gcc/patch/19c4a81d-6ecd-8c6e-b641-e257c1959baf@suse.cz/#1447334
>
> Last year I added the TARGET_GCOV_TYPE_SIZE target hook to optionally
> reduce the gcov type size to 32 bits. I am not really sure if this was a
> good idea. Longer running executables may observe counter overflows
> leading to invalid coverage data. If someone wants atomic updates, then
> the updates should be atomic even if this means to use a library
> implementation (libatomic).
>
> What about the following approach if -fprofile-update=atomic is given:
>
> 1. Use 64-bit atomics if available.
>
> 2. Use
>
>    if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED)
> == 0)
>      __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
> __ATOMIC_RELAXED);
>
> if 32-bit atomics are available.
>
> 3. Else use a library call (libatomic).

Sounds good, though wouldn't a library call be prohibitively expensive?

* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-11-05 11:18 ` Richard Biener
@ 2022-11-08  6:22   ` Sebastian Huber
  2022-11-08 10:25     ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Sebastian Huber @ 2022-11-08  6:22 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development

On 05.11.22 12:18, Richard Biener wrote:
> On Fri, Nov 4, 2022 at 9:28 AM Sebastian Huber
> <sebastian.huber@embedded-brains.de>  wrote:
>> Hello,
>>
>> even recent 32-bit architectures such as RISC-V do not support 64-bit
>> atomic operations.  Using -fprofile-update=atomic for the 32-bit RISC-V
>> RV32GC ISA yields:
>>
>> warning: target does not support atomic profile update, single mode is
>> selected
>>
>> For multi-threaded applications it is quite important to use atomic
>> counter increments to get valid coverage data. I think this fall back is
>> not really good. Maybe we should consider using this approach from Jakub
>> Jelinek for 32-bit architectures lacking 64-bit atomic operations:
>>
>>     if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED)
>> == 0)
>>       __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
>> __ATOMIC_RELAXED);
>>
>> https://patchwork.ozlabs.org/project/gcc/patch/19c4a81d-6ecd-8c6e-b641-e257c1959baf@suse.cz/#1447334
>>
>> Last year I added the TARGET_GCOV_TYPE_SIZE target hook to optionally
>> reduce the gcov type size to 32 bits. I am not really sure if this was a
>> good idea. Longer running executables may observe counter overflows
>> leading to invalid coverage data. If someone wants atomic updates, then
>> the updates should be atomic even if this means to use a library
>> implementation (libatomic).
>>
>> What about the following approach if -fprofile-update=atomic is given:
>>
>> 1. Use 64-bit atomics if available.
>>
>> 2. Use
>>
>>     if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED)
>> == 0)
>>       __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
>> __ATOMIC_RELAXED);
>>
>> if 32-bit atomics are available.
>>
>> 3. Else use a library call (libatomic).
> Sounds good, though wouldn't a library call be prohibitively expensive?

If someone wants to profile a multi-threaded application and selects
-fprofile-update=atomic, then a correct result is probably preferred.
You still have the option to select -fprofile-update=prefer-atomic.

For 2. I have to modify:

void
gimple_gen_edge_profiler (int edgeno, edge e)
{
   tree one;

   one = build_int_cst (gcov_type_node, 1);

   if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
     {
       /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
       tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno);
       tree f = builtin_decl_explicit (TYPE_PRECISION (gcov_type_node) > 32
				      ? BUILT_IN_ATOMIC_FETCH_ADD_8:
				      BUILT_IN_ATOMIC_FETCH_ADD_4);
       gcall *stmt = gimple_build_call (f, 3, addr, one,
				       build_int_cst (integer_type_node,
						      MEMMODEL_RELAXED));
       gsi_insert_on_edge (e, stmt);
     }

Is

if (WORDS_BIG_ENDIAN)

the right way to check for big/little endian?

How do I get ((unsigned int *) &val) + 1 from tree addr?

It would be great to have a code example for the construction of the "if 
(f()) f();".

* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-11-08  6:22   ` Sebastian Huber
@ 2022-11-08 10:25     ` Richard Biener
  2022-11-08 12:00       ` Sebastian Huber
  2022-12-05  7:26       ` Sebastian Huber
  0 siblings, 2 replies; 17+ messages in thread
From: Richard Biener @ 2022-11-08 10:25 UTC (permalink / raw)
  To: Sebastian Huber; +Cc: GCC Development

On Tue, Nov 8, 2022 at 7:22 AM Sebastian Huber
<sebastian.huber@embedded-brains.de> wrote:
>
> On 05.11.22 12:18, Richard Biener wrote:
> > On Fri, Nov 4, 2022 at 9:28 AM Sebastian Huber
> > <sebastian.huber@embedded-brains.de>  wrote:
> >> Hello,
> >>
> >> even recent 32-bit architectures such as RISC-V do not support 64-bit
> >> atomic operations.  Using -fprofile-update=atomic for the 32-bit RISC-V
> >> RV32GC ISA yields:
> >>
> >> warning: target does not support atomic profile update, single mode is
> >> selected
> >>
> >> For multi-threaded applications it is quite important to use atomic
> >> counter increments to get valid coverage data. I think this fall back is
> >> not really good. Maybe we should consider using this approach from Jakub
> >> Jelinek for 32-bit architectures lacking 64-bit atomic operations:
> >>
> >>     if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED)
> >> == 0)
> >>       __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
> >> __ATOMIC_RELAXED);
> >>
> >> https://patchwork.ozlabs.org/project/gcc/patch/19c4a81d-6ecd-8c6e-b641-e257c1959baf@suse.cz/#1447334
> >>
> >> Last year I added the TARGET_GCOV_TYPE_SIZE target hook to optionally
> >> reduce the gcov type size to 32 bits. I am not really sure if this was a
> >> good idea. Longer running executables may observe counter overflows
> >> leading to invalid coverage data. If someone wants atomic updates, then
> >> the updates should be atomic even if this means to use a library
> >> implementation (libatomic).
> >>
> >> What about the following approach if -fprofile-update=atomic is given:
> >>
> >> 1. Use 64-bit atomics if available.
> >>
> >> 2. Use
> >>
> >>     if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED)
> >> == 0)
> >>       __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
> >> __ATOMIC_RELAXED);
> >>
> >> if 32-bit atomics are available.
> >>
> >> 3. Else use a library call (libatomic).
> > Sounds good, though wouldn't a library call be prohibitively expensive?
>
> If someone wants to profile a multi-threaded application and selects
> -fprofile-update=atomic, then probably a correct result is preferred.
> You still have the option to select -fprofile-update=prefer-atomic.
>
> For 2. I have to modify:
>
> void
> gimple_gen_edge_profiler (int edgeno, edge e)
> {
>    tree one;
>
>    one = build_int_cst (gcov_type_node, 1);
>
>    if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
>      {
>        /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
>        tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno);
>        tree f = builtin_decl_explicit (TYPE_PRECISION (gcov_type_node) > 32
>                                       ? BUILT_IN_ATOMIC_FETCH_ADD_8:
>                                       BUILT_IN_ATOMIC_FETCH_ADD_4);
>        gcall *stmt = gimple_build_call (f, 3, addr, one,
>                                        build_int_cst (integer_type_node,
>                                                       MEMMODEL_RELAXED));
>        gsi_insert_on_edge (e, stmt);
>      }
>
> Is
>
> if (WORDS_BIG_ENDIAN)
>
> the right way to check for big/little endian?

Yes.

> How do I get ((unsigned int *) &val) + 1 from tree addr?
>
> It would be great to have a code example for the construction of the "if
> (f()) f();".

I think for the function above we need to emit __atomic_fetch_add_8,
not the emulated form because we cannot insert the required control
flow (if (f()) f()) on an edge.  The __atomic_fetch_add_8 should then be
lowered after the instrumentation took place.

There's currently no helper to create a diamond so the canonical
way is to create a GIMPLE_COND, split the block after this stmt,
split the outgoing edge and then redirect edges to form a half-diamond.
move_sese_in_condition has most of that CFG manipulation (but
it performs something different).

Richard.

* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-11-08 10:25     ` Richard Biener
@ 2022-11-08 12:00       ` Sebastian Huber
  2022-11-08 13:52         ` Richard Biener
  2022-12-05  7:26       ` Sebastian Huber
  1 sibling, 1 reply; 17+ messages in thread
From: Sebastian Huber @ 2022-11-08 12:00 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development

On 08.11.22 11:25, Richard Biener wrote:
>> How do I get ((unsigned int *) &val) + 1 from tree addr?
>>
>> It would be great to have a code example for the construction of the "if
>> (f()) f();".
> I think for the function above we need to emit __atomic_fetch_add_8,
> not the emulated form because we cannot insert the required control
> flow (if (f()) f()) on an edge.  The __atomic_fetch_add_8 should then be
> lowered after the instrumentation took place.

Ok, I am not a compiler expert, so I have only a vague idea of how this
works. I am not sure which piece is responsible for the "lowering" of
this particular __atomic_fetch_add_8. I guess we don't want to split all
__atomic_fetch_add_8 into this if (f()) f(); form?

> 
> There's currently no helper to create a diamond so the canonical
> way is to create a GIMPLE_COND, split the block after this stmt,
> split the outgoing edge and then redirect edges to form a half-diamond.
> move_sese_in_condition has most of that CFG manipulation (but
> it performs something different).

Thanks, I will probably be able to do this with a bit of trial and error.

* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-11-08 12:00       ` Sebastian Huber
@ 2022-11-08 13:52         ` Richard Biener
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Biener @ 2022-11-08 13:52 UTC (permalink / raw)
  To: Sebastian Huber; +Cc: GCC Development

On Tue, Nov 8, 2022 at 1:00 PM Sebastian Huber
<sebastian.huber@embedded-brains.de> wrote:
>
> On 08.11.22 11:25, Richard Biener wrote:
> >> How do I get ((unsigned int *) &val) + 1 from tree addr?
> >>
> >> It would be great to have a code example for the construction of the "if
> >> (f()) f();".
> > I think for the function above we need to emit __atomic_fetch_add_8,
> > not the emulated form because we cannot insert the required control
> > flow (if (f()) f()) on an edge.  The __atomic_fetch_add_8 should then be
> > lowered after the instrumentation took place.
>
> Ok, I am not a compiler expert, so I have just a vague feeling how this
> works. I am not sure which piece is responsible for the "lowering" of
> this particular __atomic_fetch_add_8. I guess we don't want to split all
> __atomic_fetch_add_8 into this if (f()) f(); form?

I think we should do it right after the profile instrumentation commits
the edge insertions.  And yes, we only want to lower those that
are not natively supported (but have native support for fetch_add_4).
Not sure how to determine this.

>
> >
> > There's currently no helper to create a diamond so the canonical
> > way is to create a GIMPLE_COND, split the block after this stmt,
> > split the outgoing edge and then redirect edges to form a half-diamond.
> > move_sese_in_condition has most of that CFG manipulation (but
> > it performs something different).
>
> Thanks, I will probably be able to do this with a bit of trial and error.

* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-11-08 10:25     ` Richard Biener
  2022-11-08 12:00       ` Sebastian Huber
@ 2022-12-05  7:26       ` Sebastian Huber
  2022-12-05  7:44         ` Richard Biener
  1 sibling, 1 reply; 17+ messages in thread
From: Sebastian Huber @ 2022-12-05  7:26 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development

On 08/11/2022 11:25, Richard Biener wrote:
>> It would be great to have a code example for the construction of the "if
>> (f()) f();".
> I think for the function above we need to emit __atomic_fetch_add_8,
> not the emulated form because we cannot insert the required control
> flow (if (f()) f()) on an edge.  The __atomic_fetch_add_8 should then be
> lowered after the instrumentation took place.

Would it help to change the

     if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) == 0)
       __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);

into

     unsigned int v = __atomic_add_fetch_4 ((unsigned int *) &val, 1,
                                            __ATOMIC_RELAXED);
     v = (unsigned int)(v == 0);
     __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
                           __ATOMIC_RELAXED);

to get rid of an inserted control flow?

On riscv this is optimized to:

         li      a4,1
         amoadd.w a5,a4,0(a0)
         addi    a5,a5,1
         seqz    a5,a5
         addi    a4,a0,4
         amoadd.w zero,a5,0(a4)
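
For reference, a self-contained sketch of the branch-free form in C. It assumes the intent is to add the carry (0 or 1) to the high word, which is what the RISC-V code above does with its second amoadd.w. The names gcov_like_counter and increment_branch_free are hypothetical, and a little-endian layout is assumed.

```c
#include <stdint.h>

/* Hypothetical counter type; half[0] is the low word (little-endian).  */
typedef union
{
  uint64_t whole;
  uint32_t half[2];
} gcov_like_counter;

/* Increment without a conditional branch: the high word is always
   updated, but the added value is 1 only when the low word wrapped.  */
static inline void
increment_branch_free (gcov_like_counter *c)
{
  uint32_t low = __atomic_add_fetch (&c->half[0], 1, __ATOMIC_RELAXED);
  uint32_t carry = (uint32_t) (low == 0);
  __atomic_fetch_add (&c->half[1], carry, __ATOMIC_RELAXED);
}
```

The trade-off is a second atomic operation on every increment in exchange for straight-line code that can be inserted on an edge without changing the control flow graph.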


* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-12-05  7:26       ` Sebastian Huber
@ 2022-12-05  7:44         ` Richard Biener
  2022-12-06 13:11           ` Sebastian Huber
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2022-12-05  7:44 UTC (permalink / raw)
  To: Sebastian Huber; +Cc: GCC Development

On Mon, Dec 5, 2022 at 8:26 AM Sebastian Huber
<sebastian.huber@embedded-brains.de> wrote:
>
> On 08/11/2022 11:25, Richard Biener wrote:
> >> It would be great to have a code example for the construction of the "if
> >> (f()) f();".
> > I think for the function above we need to emit __atomic_fetch_add_8,
> > not the emulated form because we cannot insert the required control
> > flow (if (f()) f()) on an edge.  The __atomic_fetch_add_8 should then be
> > lowered after the instrumentation took place.
>
> Would it help to change the
>
>      if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED)
> == 0)
>        __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
> __ATOMIC_RELAXED);
>
> into
>
>      unsigned int v = __atomic_add_fetch_4 ((unsigned int *) &val, 1,
>                                             __ATOMIC_RELAXED);
>      v = (unsigned int)(v == 0);
>      __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
>                            __ATOMIC_RELAXED);

that's supposed to add 'v' instead of 1?  Possibly use uint32_t here
(aka uint32_type_node).

>
> to get rid of an inserted control flow?

That for sure wouldn't require any changes to how the profile
instrumentation works, so yes, it would be simpler.

Richard.

> On riscv this is optimized to:
>
>          li      a4,1
>          amoadd.w a5,a4,0(a0)
>          addi    a5,a5,1
>          seqz    a5,a5
>          addi    a4,a0,4
>          amoadd.w zero,a5,0(a4)

* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-12-05  7:44         ` Richard Biener
@ 2022-12-06 13:11           ` Sebastian Huber
  2022-12-06 16:08             ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Sebastian Huber @ 2022-12-06 13:11 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development

On 05/12/2022 08:44, Richard Biener wrote:
> On Mon, Dec 5, 2022 at 8:26 AM Sebastian Huber
> <sebastian.huber@embedded-brains.de>  wrote:
>> On 08/11/2022 11:25, Richard Biener wrote:
>>>> It would be great to have a code example for the construction of the "if
>>>> (f()) f();".
>>> I think for the function above we need to emit __atomic_fetch_add_8,
>>> not the emulated form because we cannot insert the required control
>>> flow (if (f()) f()) on an edge.  The __atomic_fetch_add_8 should then be
>>> lowered after the instrumentation took place.
>> Would it help to change the
>>
>>       if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED)
>> == 0)
>>         __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
>> __ATOMIC_RELAXED);
>>
>> into
>>
>>       unsigned int v = __atomic_add_fetch_4 ((unsigned int *) &val, 1,
>>                                              __ATOMIC_RELAXED);
>>       v = (unsigned int)(v == 0);
>>       __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
>>                             __ATOMIC_RELAXED);
> that's supposed to add 'v' instead of 1?  Possibly use uint32_t here
> (aka uint32_type_node).
> 
>> to get rid of an inserted control flow?
> That for sure wouldn't require any changes to how the profile
> instrumentation works,
> so yes it would be simpler.

Yes, this seems to work. After a bit of trial and error I ended up with 
something in gimple_gen_edge_profiler() like this (endian support is 
missing):

   else if (flag_profile_update == PROFILE_UPDATE_SPLIT_ATOMIC)
     {
       tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno);
       tree f = builtin_decl_explicit (BUILT_IN_ATOMIC_ADD_FETCH_4);
       /* low = __atomic_add_fetch_4 (&counter, 1, MEMMODEL_RELAXED); */
       gcall *stmt1 = gimple_build_call (f, 3, addr, one,
                                         build_int_cst (integer_type_node,
                                                        MEMMODEL_RELAXED));
       tree low = create_tmp_var (uint32_type_node);
       gimple_call_set_lhs (stmt1, low);
       /* is_zero = (low == 0); */
       tree is_zero = create_tmp_var (boolean_type_node);
       gassign *stmt2 = gimple_build_assign (is_zero, EQ_EXPR, low,
                                             build_zero_cst (uint32_type_node));
       /* high_inc = is_zero ? 1 : 0; */
       tree high_inc = create_tmp_var (uint32_type_node);
       gassign *stmt3 = gimple_build_assign (high_inc, COND_EXPR, is_zero,
                                             build_one_cst (uint32_type_node),
                                             build_zero_cst (uint32_type_node));
       /* addr_high = addr + 4; (little-endian layout only) */
       tree addr_high = create_tmp_var (TREE_TYPE (addr));
       gassign *stmt4 = gimple_build_assign (addr_high, addr);
       gassign *stmt5 = gimple_build_assign (addr_high, POINTER_PLUS_EXPR,
                                             addr_high,
                                             build_int_cst (size_type_node, 4));
       /* __atomic_add_fetch_4 (addr_high, high_inc, MEMMODEL_RELAXED); */
       gcall *stmt6 = gimple_build_call (f, 3, addr_high, high_inc,
                                         build_int_cst (integer_type_node,
                                                        MEMMODEL_RELAXED));
       /* Queue the statements on the edge in execution order.  */
       gsi_insert_on_edge (e, stmt1);
       gsi_insert_on_edge (e, stmt2);
       gsi_insert_on_edge (e, stmt3);
       gsi_insert_on_edge (e, stmt4);
       gsi_insert_on_edge (e, stmt5);
       gsi_insert_on_edge (e, stmt6);
     }

It can probably be simplified. The generated code:

         .type   f, @function
f:
         lui     a4,%hi(__gcov0.f)
         li      a3,1
         addi    a4,a4,%lo(__gcov0.f)
         amoadd.w a5,a3,0(a4)
         lui     a4,%hi(__gcov0.f+4)
         addi    a5,a5,1
         seqz    a5,a5
         addi    a4,a4,%lo(__gcov0.f+4)
         amoadd.w zero,a5,0(a4)
         li      a0,3
         ret

looks good for this code:

int f(void)
{
   return 3;
}

The loading of the high address could probably be optimized from

         lui     a4,%hi(__gcov0.f+4)
         addi    a4,a4,%lo(__gcov0.f+4)

to

         addi    a4,a4,4

I wasn't able to figure out how to do this.

* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-12-06 13:11           ` Sebastian Huber
@ 2022-12-06 16:08             ` Richard Biener
  2022-12-07  8:51               ` Sebastian Huber
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2022-12-06 16:08 UTC (permalink / raw)
  To: Sebastian Huber; +Cc: GCC Development

On Tue, Dec 6, 2022 at 2:11 PM Sebastian Huber
<sebastian.huber@embedded-brains.de> wrote:
>
> On 05/12/2022 08:44, Richard Biener wrote:
> > On Mon, Dec 5, 2022 at 8:26 AM Sebastian Huber
> > <sebastian.huber@embedded-brains.de>  wrote:
> >> On 08/11/2022 11:25, Richard Biener wrote:
> >>>> It would be great to have a code example for the construction of the "if
> >>>> (f()) f();".
> >>> I think for the function above we need to emit __atomic_fetch_add_8,
> >>> not the emulated form because we cannot insert the required control
> >>> flow (if (f()) f()) on an edge.  The __atomic_fetch_add_8 should then be
> >>> lowered after the instrumentation took place.
> >> Would it help to change the
> >>
> >>       if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED)
> >> == 0)
> >>         __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
> >> __ATOMIC_RELAXED);
> >>
> >> into
> >>
> >>       unsigned int v = __atomic_add_fetch_4 ((unsigned int *) &val, 1,
> >>                                              __ATOMIC_RELAXED);
> >>       v = (unsigned int)(v == 0);
> >>       __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1,
> >>                             __ATOMIC_RELAXED);
> > that's supposed to add 'v' instead of 1?  Possibly use uint32_t here
> > (aka uint32_type_node).
> >
> >> to get rid of an inserted control flow?
> > That for sure wouldn't require any changes to how the profile
> > instrumentation works,
> > so yes it would be simpler.
>
> Yes, this seems to work. After a bit of trial and error I ended up with
> something in gimple_gen_edge_profiler() like this (endian support is
> missing):
>
>    else if (flag_profile_update == PROFILE_UPDATE_SPLIT_ATOMIC)
>      {
>        tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno);
>        tree f = builtin_decl_explicit (BUILT_IN_ATOMIC_ADD_FETCH_4);
>        gcall *stmt1 = gimple_build_call (f, 3, addr, one,
>                                         build_int_cst (integer_type_node,
>                                                       MEMMODEL_RELAXED));
>        tree low = create_tmp_var (uint32_type_node);
>        gimple_call_set_lhs (stmt1, low);
>        tree is_zero = create_tmp_var (boolean_type_node);
>        gassign *stmt2 = gimple_build_assign (is_zero, EQ_EXPR, low,
>                                             build_zero_cst (uint32_type_node));
>        tree high_inc = create_tmp_var (uint32_type_node);
>        gassign *stmt3 = gimple_build_assign (high_inc, COND_EXPR, is_zero,
>                                             build_one_cst (uint32_type_node),
>                                             build_zero_cst (uint32_type_node));
>        tree addr_high = create_tmp_var (TREE_TYPE (addr));
>        gassign *stmt4 = gimple_build_assign (addr_high, addr);
>        gassign *stmt5 = gimple_build_assign (addr_high, POINTER_PLUS_EXPR,
>                                             addr_high,
>                                             build_int_cst (size_type_node, 4));
>        gcall *stmt6 = gimple_build_call (f, 3, addr_high, high_inc,
>                                         build_int_cst (integer_type_node,
>                                                        MEMMODEL_RELAXED));
>        gsi_insert_on_edge (e, stmt1);
>        gsi_insert_on_edge (e, stmt2);
>        gsi_insert_on_edge (e, stmt3);
>        gsi_insert_on_edge (e, stmt4);
>        gsi_insert_on_edge (e, stmt5);
>        gsi_insert_on_edge (e, stmt6);
>      }
>
> It can be probably simplified.

Likely.  I'd use the gimple_build () API from gimple-fold.h which
builds the expression(s) to a gimple_seq creating necessary temporaries
on-the-fly and then insert that sequence on the edge.

But even the above should work.
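
For readers following along, here is a minimal C sketch of what the
quoted statement sequence computes: the carry into the high half is
produced as a value (0 or 1) and always added, so no branch is needed
on the edge.  The struct and function names are made up for
illustration and are not part of GCC; a real gcov counter is a single
64-bit object, so which half comes first in memory depends on
endianness (the quoted code does not handle that yet either).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one 64-bit counter split into two halves.  */
struct split_counter
{
  uint32_t low;
  uint32_t high;
};

/* Branch-free increment: add 1 to the low half, then add (low == 0)
   to the high half.  Each half is updated atomically on its own.  */
static void
split_counter_inc (struct split_counter *c)
{
  uint32_t low = __atomic_add_fetch (&c->low, 1, __ATOMIC_RELAXED);
  uint32_t high_inc = (uint32_t) (low == 0);
  __atomic_fetch_add (&c->high, high_inc, __ATOMIC_RELAXED);
}

/* Combine the halves; only meaningful once all updaters are done.  */
static uint64_t
split_counter_value (const struct split_counter *c)
{
  return ((uint64_t) c->high << 32) | c->low;
}

/* Usage example: step across the 32-bit wrap-around.  */
void
split_counter_demo (void)
{
  struct split_counter c = { 0xfffffffeu, 0x7 };
  split_counter_inc (&c);
  assert (split_counter_value (&c) == 0x7ffffffffULL);
  split_counter_inc (&c);
  assert (split_counter_value (&c) == 0x800000000ULL);
}
```

The price of dropping the branch is a second atomic operation on every
increment, even though it adds zero almost all of the time.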

> The generated code:
>
>          .type   f, @function
> f:
>          lui     a4,%hi(__gcov0.f)
>          li      a3,1
>          addi    a4,a4,%lo(__gcov0.f)
>          amoadd.w a5,a3,0(a4)
>          lui     a4,%hi(__gcov0.f+4)
>          addi    a5,a5,1
>          seqz    a5,a5
>          addi    a4,a4,%lo(__gcov0.f+4)
>          amoadd.w zero,a5,0(a4)
>          li      a0,3
>          ret
>
> looks good for this code:
>
> int f(void)
> {
>    return 3;
> }
>
> The loading of the high address could be probably optimized from
>
>          lui     a4,%hi(__gcov0.f+4)
>          addi    a4,a4,%lo(__gcov0.f+4)
>
> to
>
>          addi    a4,a4,4
>
> I wasn't able to figure out how to do this.

I think that's something for the backend - we're not good at
CSEing parts of an "invariant" address, and the above might
be the form required when relocations are needed.

Richard.


* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-12-06 16:08             ` Richard Biener
@ 2022-12-07  8:51               ` Sebastian Huber
  2022-12-07  9:09                 ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Sebastian Huber @ 2022-12-07  8:51 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development

On 06.12.22 17:08, Richard Biener wrote:
> Likely.  I'd use the gimple_build () API from gimple-fold.h which
> builds the expression(s) to a gimple_seq creating necessary temporaries
> on-the-fly and then insert that sequence on the edge.

Thanks, I will have a look at this.

I am struggling to convert a uint32_type_node value to gcov_type_node
(64-bit). I tried this:

      if (result != NULL_TREE)
        {
          tree tmp1 = make_temp_ssa_name (gcov_type_node, NULL, name);
          gassign *stmt7 = gimple_build_assign (result, VIEW_CONVERT_EXPR,
                                                build1 (VIEW_CONVERT_EXPR,
                                                        gcov_type_node, high));
          tree tmp2 = make_temp_ssa_name (gcov_type_node, NULL, name);
          gassign *stmt8 = gimple_build_assign (tmp2, LSHIFT_EXPR, tmp1,
                                                build_int_cst (integer_type_node,
                                                               32));
          gassign *stmt9 = gimple_build_assign (result, BIT_IOR_EXPR, tmp2, tmp1);
          gsi_insert_after (gsi, stmt7, GSI_NEW_STMT);
          gsi_insert_after (gsi, stmt8, GSI_NEW_STMT);
          gsi_insert_after (gsi, stmt9, GSI_NEW_STMT);
        }

This ends up in:

../test.c: In function 'f':
../test.c:4:1: error: conversion of register to a different size in 'view_convert_expr'
    4 | }
      | ^
VIEW_CONVERT_EXPR<long long int>(PROF_time_profiler_15);

PROF_time_profile_9 = VIEW_CONVERT_EXPR<long long int>(PROF_time_profiler_15);
during IPA pass: profile
../test.c:4:1: internal compiler error: verify_gimple failed
0xdddc95 verify_gimple_in_cfg(function*, bool, bool)
         /home/EB/sebastian_h/src/gcc/gcc/tree-cfg.cc:5647
0xc20221 execute_function_todo
         /home/EB/sebastian_h/src/gcc/gcc/passes.cc:2091
0xc1efd6 do_per_function
         /home/EB/sebastian_h/src/gcc/gcc/passes.cc:1701
0xc20416 execute_todo
         /home/EB/sebastian_h/src/gcc/gcc/passes.cc:2145
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.



* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-12-07  8:51               ` Sebastian Huber
@ 2022-12-07  9:09                 ` Richard Biener
  2022-12-07  9:24                   ` Sebastian Huber
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2022-12-07  9:09 UTC (permalink / raw)
  To: Sebastian Huber; +Cc: GCC Development

On Wed, Dec 7, 2022 at 9:51 AM Sebastian Huber
<sebastian.huber@embedded-brains.de> wrote:
>
> On 06.12.22 17:08, Richard Biener wrote:
> > Likely.  I'd use the gimple_build () API from gimple-fold.h which
> > builds the expression(s) to a gimple_seq creating necessary temporaries
> > on-the-fly and then insert that sequence on the edge.
>
> Thanks, I will have a look at this.
>
> I am struggling to convert a uint32_type_node node to a gcov_type_node
> (64-bit). I tried to use this:
>
>        if (result != NULL_TREE)
>         {
>            tree tmp1 = make_temp_ssa_name (gcov_type_node, NULL, name);
>           gassign *stmt7 = gimple_build_assign (result, VIEW_CONVERT_EXPR,
> build1 (VIEW_CONVERT_EXPR, gcov_type_node,
>                                            high));

You want

  gimple_build_assign (result, NOP_EXPR, high);

here (NOP_EXPR is a plain value conversion; converting from an unsigned
type zero-extends)
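
As a rough C analogy (not GIMPLE, and with made-up names): a
VIEW_CONVERT_EXPR reinterprets the bits of an object and therefore
needs source and destination of the same size, which is what the
verifier complained about, while a conversion behaves like a C cast
and widens the value.  The sketch below shows the zero-extension and
one way to combine the two 32-bit halves into a signed 64-bit result,
assuming the shift and OR are done on unsigned 64-bit values first.

```c
#include <assert.h>
#include <stdint.h>

/* Widen both unsigned halves (zero-extension), then shift and OR.
   Doing the shift on an unsigned 64-bit value avoids signed overflow.  */
static int64_t
combine_halves (uint32_t high, uint32_t low)
{
  uint64_t wide_high = high;
  uint64_t wide_low = low;
  return (int64_t) ((wide_high << 32) | wide_low);
}

/* Usage example contrasting zero- and sign-extension.  */
void
combine_halves_demo (void)
{
  uint32_t all_ones = 0xffffffffu;
  /* Unsigned source: the upper 32 bits of the result are zero.  */
  assert ((int64_t) all_ones == 4294967295LL);
  /* Signed source with the same bit pattern: sign-extended instead.  */
  assert ((int64_t) (int32_t) -1 == -1);
  assert (combine_halves (0x7, 0xffffffffu) == 0x7ffffffffLL);
}
```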


>            tree tmp2 = make_temp_ssa_name (gcov_type_node, NULL, name);
>           gassign *stmt8 = gimple_build_assign (tmp2, LSHIFT_EXPR, tmp1,
> build_int_cst (integer_type_node, 32));
>           gassign *stmt9 = gimple_build_assign (result, BIT_IOR_EXPR, tmp2, tmp1);
>           gsi_insert_after (gsi, stmt7, GSI_NEW_STMT);
>           gsi_insert_after (gsi, stmt8, GSI_NEW_STMT);
>           gsi_insert_after (gsi, stmt9, GSI_NEW_STMT);
>         }
>
> This ends up in:
>
> ../test.c: In function 'f':
> ../test.c:4:1: error: conversion of register to a different size in
> 'view_convert_expr'
>      4 | }
>        | ^
> VIEW_CONVERT_EXPR<long long int>(PROF_time_profiler_15);
>
> PROF_time_profile_9 = VIEW_CONVERT_EXPR<long long
> int>(PROF_time_profiler_15);
> during IPA pass: profile
> ../test.c:4:1: internal compiler error: verify_gimple failed
> 0xdddc95 verify_gimple_in_cfg(function*, bool, bool)
>          /home/EB/sebastian_h/src/gcc/gcc/tree-cfg.cc:5647
> 0xc20221 execute_function_todo
>          /home/EB/sebastian_h/src/gcc/gcc/passes.cc:2091
> 0xc1efd6 do_per_function
>          /home/EB/sebastian_h/src/gcc/gcc/passes.cc:1701
> 0xc20416 execute_todo
>          /home/EB/sebastian_h/src/gcc/gcc/passes.cc:2145
> Please submit a full bug report, with preprocessed source (by using
> -freport-bug).
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.

* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-12-07  9:09                 ` Richard Biener
@ 2022-12-07  9:24                   ` Sebastian Huber
  2022-12-07 11:49                     ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Sebastian Huber @ 2022-12-07  9:24 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development



On 07.12.22 10:09, Richard Biener wrote:
> On Wed, Dec 7, 2022 at 9:51 AM Sebastian Huber
> <sebastian.huber@embedded-brains.de>  wrote:
>> On 06.12.22 17:08, Richard Biener wrote:
>>> Likely.  I'd use the gimple_build () API from gimple-fold.h which
>>> builds the expression(s) to a gimple_seq creating necessary temporaries
>>> on-the-fly and then insert that sequence on the edge.
>> Thanks, I will have a look at this.
>>
>> I am struggling to convert a uint32_type_node node to a gcov_type_node
>> (64-bit). I tried to use this:
>>
>>         if (result != NULL_TREE)
>>          {
>>             tree tmp1 = make_temp_ssa_name (gcov_type_node, NULL, name);
>>            gassign *stmt7 = gimple_build_assign (result, VIEW_CONVERT_EXPR,
>> build1 (VIEW_CONVERT_EXPR, gcov_type_node,
>>                                             high));
> You want
> 
>    gimple_build_assign (result, NOP_EXPR, high);
> 
> here (a conversion, from unsigned it will zero-extend)

Thanks, it works with NOP_EXPR. I now have a proof of concept
ready. Should I wait for the GCC 14 development cycle or can I post a
patch set now?


* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-11-04  8:27 -fprofile-update=atomic vs. 32-bit architectures Sebastian Huber
  2022-11-04  9:53 ` Gabriel Paubert
  2022-11-05 11:18 ` Richard Biener
@ 2022-12-07  9:55 ` Sebastian Huber
  2 siblings, 0 replies; 17+ messages in thread
From: Sebastian Huber @ 2022-12-07  9:55 UTC (permalink / raw)
  To: GCC Development



On 04.11.22 09:27, Sebastian Huber wrote:
> Hello,
> 
> even recent 32-bit architectures such as RISC-V do not support 64-bit 
> atomic operations.  Using -fprofile-update=atomic for the 32-bit RISC-V 
> RV32GC ISA yields:
> 
> warning: target does not support atomic profile update, single mode is 
> selected
> 
> For multi-threaded applications it is quite important to use atomic 
> counter increments to get valid coverage data. I think this fall back is 
> not really good. Maybe we should consider using this approach from Jakub 
> Jelinek for 32-bit architectures lacking 64-bit atomic operations:
> 
>    if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) 
> == 0)
>      __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, 
> __ATOMIC_RELAXED);
> 
> https://patchwork.ozlabs.org/project/gcc/patch/19c4a81d-6ecd-8c6e-b641-e257c1959baf@suse.cz/#1447334
> 
> Last year I added the TARGET_GCOV_TYPE_SIZE target hook to optionally 
> reduce the gcov type size to 32 bits. I am not really sure if this was a 
> good idea. Longer running executables may observe counter overflows 
> leading to invalid coverage data. If someone wants atomic updates, then 
> the updates should be atomic even if this means to use a library 
> implementation (libatomic).
> 
> What about the following approach if -fprofile-update=atomic is given:
> 
> 1. Use 64-bit atomics if available.
> 
> 2. Use
> 
>    if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) 
> == 0)
>      __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, 
> __ATOMIC_RELAXED);
> 
> if 32-bit atomics are available.

This approach works fine for the edge counters in
gimple_gen_edge_profiler() because we never have to read the counter
value there; we only increment it. In gimple_gen_time_profiler(),
however, we have to do this:

/* Emit: counters[0] = ++__gcov_time_profiler_counter.  */

So here we have to atomically increment the counter and fetch the new
value. This doesn't work with the approach above, because the two
halves are updated and read in separate steps. For example, let thread
A increment the lower part from 0xfffffffe to 0xffffffff. Then let
thread B increment the lower part from 0xffffffff to 0x0 and the higher
part from 0x7 to 0x8. If thread A now reads the higher part, it gets
0x8 and ends up with 0x8_ffffffff instead of the correct 0x7_ffffffff.
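
The interleaving can be replayed deterministically in plain C.  This is
only a sketch with made-up names: the two "threads" are simulated by
calling the two steps of the split increment-and-fetch in the order
described, starting from a counter value of 0x7_fffffffe.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one 64-bit counter split into two halves.  */
struct split_counter
{
  uint32_t low;
  uint32_t high;
};

/* Step 1: increment the low half and return its new value.  */
static uint32_t
bump_low (struct split_counter *c)
{
  return __atomic_add_fetch (&c->low, 1, __ATOMIC_RELAXED);
}

/* Step 2: add the carry to the high half and return its new value.  */
static uint32_t
carry_and_read_high (struct split_counter *c, uint32_t low)
{
  uint32_t carry = (uint32_t) (low == 0);
  return __atomic_add_fetch (&c->high, carry, __ATOMIC_RELAXED);
}

/* Replay the schedule A1, B1, B2, A2 and return what thread A sees.  */
uint64_t
replay_interleaving (void)
{
  struct split_counter c = { 0xfffffffeu, 0x7 };
  uint32_t a_low = bump_low (&c);                     /* A: low -> 0xffffffff */
  uint32_t b_low = bump_low (&c);                     /* B: low -> 0x0 */
  uint32_t b_high = carry_and_read_high (&c, b_low);  /* B: high -> 0x8 */
  uint32_t a_high = carry_and_read_high (&c, a_low);  /* A: reads 0x8 */

  /* Thread B happens to get a consistent value.  */
  assert ((((uint64_t) b_high << 32) | b_low) == 0x800000000ULL);
  return ((uint64_t) a_high << 32) | a_low;
}
```

With this schedule, replay_interleaving() returns 0x8ffffffff, a value
the counter never held.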

> 
> 3. Else use a library call (libatomic).
> 
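
To make the three-step proposal concrete for plain increments, here is
a user-level sketch.  It picks a strategy with the predefined macros
__GCC_ATOMIC_LLONG_LOCK_FREE and __GCC_ATOMIC_INT_LOCK_FREE, whereas
the compiler would make this choice while instrumenting; it also
assumes that long long is 64 bits wide.  The union type and the
function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter type: one 64-bit value, also viewable as halves.  */
typedef union
{
  uint64_t whole;
  uint32_t half[2];
} counter_t;

/* Index of the low 32-bit half within the 64-bit object.  */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define LOW_HALF 0
#else
#define LOW_HALF 1
#endif

static void
counter_inc (counter_t *c)
{
#if __GCC_ATOMIC_LLONG_LOCK_FREE == 2
  /* 1. Native 64-bit atomics are available.  */
  __atomic_fetch_add (&c->whole, 1, __ATOMIC_RELAXED);
#elif __GCC_ATOMIC_INT_LOCK_FREE == 2
  /* 2. Two 32-bit atomics; carry into the high half on wrap-around.  */
  if (__atomic_add_fetch (&c->half[LOW_HALF], 1, __ATOMIC_RELAXED) == 0)
    __atomic_fetch_add (&c->half[1 - LOW_HALF], 1, __ATOMIC_RELAXED);
#else
  /* 3. Not lock-free: the builtin turns into a libatomic call.  */
  __atomic_fetch_add (&c->whole, 1, __ATOMIC_RELAXED);
#endif
}

/* Usage example: increment across the 32-bit boundary.  */
void
counter_demo (void)
{
  counter_t c = { .whole = 0xffffffffu };
  counter_inc (&c);
  assert (c.whole == 0x100000000ULL);
}
```

When the third branch is selected, the program has to be linked with
-latomic.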


* Re: -fprofile-update=atomic vs. 32-bit architectures
  2022-12-07  9:24                   ` Sebastian Huber
@ 2022-12-07 11:49                     ` Richard Biener
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Biener @ 2022-12-07 11:49 UTC (permalink / raw)
  To: Sebastian Huber; +Cc: GCC Development

On Wed, Dec 7, 2022 at 10:24 AM Sebastian Huber
<sebastian.huber@embedded-brains.de> wrote:
>
>
>
> On 07.12.22 10:09, Richard Biener wrote:
> > On Wed, Dec 7, 2022 at 9:51 AM Sebastian Huber
> > <sebastian.huber@embedded-brains.de>  wrote:
> >> On 06.12.22 17:08, Richard Biener wrote:
> >>> Likely.  I'd use the gimple_build () API from gimple-fold.h which
> >>> builds the expression(s) to a gimple_seq creating necessary temporaries
> >>> on-the-fly and then insert that sequence on the edge.
> >> Thanks, I will have a look at this.
> >>
> >> I am struggling to convert a uint32_type_node node to a gcov_type_node
> >> (64-bit). I tried to use this:
> >>
> >>         if (result != NULL_TREE)
> >>          {
> >>             tree tmp1 = make_temp_ssa_name (gcov_type_node, NULL, name);
> >>            gassign *stmt7 = gimple_build_assign (result, VIEW_CONVERT_EXPR,
> >> build1 (VIEW_CONVERT_EXPR, gcov_type_node,
> >>                                             high));
> > You want
> >
> >    gimple_build_assign (result, NOP_EXPR, high);
> >
> > here (a conversion, from unsigned it will zero-extend)
>
> Thanks, with this NOP_EXPR it did work. I have now a proof of concept
> ready. Should I wait for the GCC 14 development cycle or can I post a
> patch set now?

You can surely post a patch now; if it addresses a bug, it could even
be considered.

Richard.


end of thread, other threads:[~2022-12-07 11:49 UTC | newest]

Thread overview: 17+ messages
2022-11-04  8:27 -fprofile-update=atomic vs. 32-bit architectures Sebastian Huber
2022-11-04  9:53 ` Gabriel Paubert
2022-11-04 10:02   ` Sebastian Huber
2022-11-05 11:18 ` Richard Biener
2022-11-08  6:22   ` Sebastian Huber
2022-11-08 10:25     ` Richard Biener
2022-11-08 12:00       ` Sebastian Huber
2022-11-08 13:52         ` Richard Biener
2022-12-05  7:26       ` Sebastian Huber
2022-12-05  7:44         ` Richard Biener
2022-12-06 13:11           ` Sebastian Huber
2022-12-06 16:08             ` Richard Biener
2022-12-07  8:51               ` Sebastian Huber
2022-12-07  9:09                 ` Richard Biener
2022-12-07  9:24                   ` Sebastian Huber
2022-12-07 11:49                     ` Richard Biener
2022-12-07  9:55 ` Sebastian Huber
