Libatomic 16B

public inbox for gcc-help@gcc.gnu.org

* Libatomic 16B
@ 2022-02-23 16:42 Satish Vasudeva
  2022-02-24 16:42 ` Satish Vasudeva
  2022-02-24 19:09 ` Xi Ruoyao
  0 siblings, 2 replies; 19+ messages in thread
From: Satish Vasudeva @ 2022-02-23 16:42 UTC (permalink / raw)
  To: gcc-help

Hi Team,

I was looking at the hotspots in our software stack and, interestingly,
libat_load_16_i1 is near the top of the list.

I am trying to understand why that is the case. My suspicion is some kind
of lock usage for 16B atomic accesses.

I came across this discussion but frankly I am still confused.
https://gcc.gnu.org/legacy-ml/gcc-patches/2017-01/msg02344.html

Do you think the overhead of libat_load_16_i1 is due to spinlock usage?
Also, from reading some other Intel CPU docs, it seems like the CPU does
support loading 16B in a single access. In that case, can we optimize this
for performance?

Thanks and appreciate your help.

Satish

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-23 16:42 Libatomic 16B Satish Vasudeva
@ 2022-02-24 16:42 ` Satish Vasudeva
  2022-02-25 13:53   ` Florian Weimer
  2022-02-24 19:09 ` Xi Ruoyao
  1 sibling, 1 reply; 19+ messages in thread
From: Satish Vasudeva @ 2022-02-24 16:42 UTC (permalink / raw)
  To: gcc-help

I looked into this further. It seems that libat_load_16_i1 implements the
16B load as "lock cmpxchg16b (%rdi)".
This assumes that the CPU doesn't support 16B loads in a single
transaction. How can I compile libatomic to use intrinsics for the 16B load
instead of LOCK cmpxchg?
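
For readers unfamiliar with the trick, here is a minimal sketch of how a
16B load can be built from a compare-and-swap on x86-64. It is not the
libatomic source: the type u128 and the function load16_via_cas are
hypothetical names, and the non-x86-64 branch is only a non-atomic
placeholder so that the sketch builds on other targets.

```c
#include <stdint.h>

/* 16-byte value; cmpxchg16b requires a 16-byte aligned memory operand. */
typedef struct { _Alignas(16) uint64_t lo; uint64_t hi; } u128;

/* Load *p by attempting to swap 0 -> 0.
   If *p is 0, the instruction stores 0 again (no visible change).
   Otherwise the compare fails and the current value is returned in RDX:RAX.
   Either way this is a locked read-modify-write, so it needs write access
   to *p. */
u128 load16_via_cas(u128 *p)
{
    u128 out = { 0, 0 };            /* "expected" value, overwritten on mismatch */
#if defined(__x86_64__)
    __asm__ __volatile__("lock cmpxchg16b %0"
                         : "+m"(*p), "+a"(out.lo), "+d"(out.hi)
                         : "b"((uint64_t)0), "c"((uint64_t)0)
                         : "cc", "memory");
#else
    /* Placeholder for other targets: NOT atomic, only keeps the sketch building. */
    out = *p;
#endif
    return out;
}
```

Because the instruction is a locked read-modify-write, it takes the cache
line exclusively even though the caller only wants to read, which is one
plausible reason for it to show up in a profile when many threads read the
same location.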

Appreciate your response.

Satish

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-23 16:42 Libatomic 16B Satish Vasudeva
  2022-02-24 16:42 ` Satish Vasudeva
@ 2022-02-24 19:09 ` Xi Ruoyao
  2022-02-24 19:35   ` Satish Vasudeva
  1 sibling, 1 reply; 19+ messages in thread
From: Xi Ruoyao @ 2022-02-24 19:09 UTC (permalink / raw)
  To: Satish Vasudeva, gcc-help

On Wed, 2022-02-23 at 08:42 -0800, Satish Vasudeva via Gcc-help wrote:
> Hi Team,
> 
> I was looking at the hotspots in our software stack and interestingly I see
> libat_load_16_i1 seems to be one of the top in the list.
> 
> I am trying to understand why that is the case. My suspicion is some kind
> of lock usage for 16B atomic accesses.
> 
> I came across this discussion but frankly I am still confused.
> https://gcc.gnu.org/legacy-ml/gcc-patches/2017-01/msg02344.html
> 
> Do you think the overhead of libat_load_16_i1 is due to spinlock usage?
> Also reading some other Intel CPU docs, it seems like the CPU does support
> loading 16B in single access. In that case can we optimize this for
> performance?

Open an issue at https://gcc.gnu.org/bugzilla, with a reference to the
Intel CPU doc proving that some specific models support atomic 128-bit
loads.

Don't use "it seems like"; nobody wants to write some nasty SSE code and
then find it doesn't work on any CPU.
-- 
Xi Ruoyao <xry111@mengyan1223.wang>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-24 19:09 ` Xi Ruoyao
@ 2022-02-24 19:35   ` Satish Vasudeva
  2022-02-24 20:05     ` Xi Ruoyao
  0 siblings, 1 reply; 19+ messages in thread
From: Satish Vasudeva @ 2022-02-24 19:35 UTC (permalink / raw)
  To: Xi Ruoyao; +Cc: gcc-help

Thanks for the response.

Looking further into the libatomic code, I do see 16B move instructions
used in the atomic_exchange code, like the one below. I am just wondering
why __atomic_load_16 is not implemented using this instruction.

movdqa 0x0(%rbp),%xmm0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-24 19:35   ` Satish Vasudeva
@ 2022-02-24 20:05     ` Xi Ruoyao
  2022-02-24 20:13       ` Segher Boessenkool
  0 siblings, 1 reply; 19+ messages in thread
From: Xi Ruoyao @ 2022-02-24 20:05 UTC (permalink / raw)
  To: Satish Vasudeva; +Cc: gcc-help

On Thu, 2022-02-24 at 11:35 -0800, Satish Vasudeva wrote:
> Thanks for the response.
> 
> Looking further into libatomic library code, I do see 16B move
> instructions have been used for atomic_exchange code like below. Just
> wondering why it is not generating a intrinsic __atomic_load_16 using
> this instruction.
> 
> movdqa 0x0(%rbp),%xmm0

Because both Intel and AMD have not claimed "this is atomic".   In
__atomic_exchange movdqa is used as a normal data move instruction
(actually, GCC optimized memcpy calls in libatomic code to this).
-- 
Xi Ruoyao <xry111@mengyan1223.wang>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-24 20:05     ` Xi Ruoyao
@ 2022-02-24 20:13       ` Segher Boessenkool
  2022-02-24 20:38         ` Satish Vasudeva
  0 siblings, 1 reply; 19+ messages in thread
From: Segher Boessenkool @ 2022-02-24 20:13 UTC (permalink / raw)
  To: Xi Ruoyao; +Cc: Satish Vasudeva, gcc-help

On Fri, Feb 25, 2022 at 04:05:28AM +0800, Xi Ruoyao via Gcc-help wrote:
> On Thu, 2022-02-24 at 11:35 -0800, Satish Vasudeva wrote:
> > Thanks for the response.
> > 
> > Looking further into libatomic library code, I do see 16B move
> > instructions have been used for atomic_exchange code like below. Just
> > wondering why it is not generating a intrinsic __atomic_load_16 using
> > this instruction.
> > 
> > movdqa 0x0(%rbp),%xmm0
> 
> Because both Intel and AMD have not claimed "this is atomic".   In
> __atomic_exchange movdqa is used as a normal data move instruction
> (actually, GCC optimized memcpy calls in libatomic code to this).

Yup.  Even on cores where this is atomic internally it is not atomic
when used on a system with a 64-bit (or 72-bit) memory bus.


Segher

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-24 20:13       ` Segher Boessenkool
@ 2022-02-24 20:38         ` Satish Vasudeva
  2022-02-25  8:35           ` Stefan Ring
  0 siblings, 1 reply; 19+ messages in thread
From: Satish Vasudeva @ 2022-02-24 20:38 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Xi Ruoyao, gcc-help

Thanks for the comments.

Please look at section 8.1.1 of this Intel architecture manual:

https://cdrdv2.intel.com/v1/dl/getContent/671190

I think Intel claims 16B operations are atomic, unless I am missing
something.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-24 20:38         ` Satish Vasudeva
@ 2022-02-25  8:35           ` Stefan Ring
  2022-02-25  8:48             ` Xi Ruoyao
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Ring @ 2022-02-25  8:35 UTC (permalink / raw)
  To: gcc-help

On Thu, Feb 24, 2022 at 9:39 PM Satish Vasudeva via Gcc-help
<gcc-help@gcc.gnu.org> wrote:
>
> Please let into this intel architecture manual , section 8.1.1
>
> https://cdrdv2.intel.com/v1/dl/getContent/671190
>
> I think Intel claims 16B operations are atomic , unless I am missing
> something.

Interesting. This seems to be a somewhat recent addition, and the
mailing list discussion linked to above predates it. Coincidentally, I
pulled a copy of the Intel manuals at almost exactly the same time as
this discussion, and sure enough, it does not yet contain the
paragraph about 16 byte operations.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-25  8:35           ` Stefan Ring
@ 2022-02-25  8:48             ` Xi Ruoyao
  2022-02-25 14:01               ` Florian Weimer
  0 siblings, 1 reply; 19+ messages in thread
From: Xi Ruoyao @ 2022-02-25  8:48 UTC (permalink / raw)
  To: Stefan Ring, gcc-help

On Fri, 2022-02-25 at 09:35 +0100, Stefan Ring via Gcc-help wrote:
> On Thu, Feb 24, 2022 at 9:39 PM Satish Vasudeva via Gcc-help
> <gcc-help@gcc.gnu.org> wrote:
> > 
> > Please let into this intel architecture manual , section 8.1.1
> > 
> > https://cdrdv2.intel.com/v1/dl/getContent/671190
> > 
> > I think Intel claims 16B operations are atomic , unless I am missing
> > something.
> 
> Interesting. This seems to be a somewhat recent addition, and the
> mailing list discussion linked to above predates it. Coincidentally, I
> pulled a copy of the Intel manuals at almost exactly the same time as
> this discussion, and sure enough, it does not yet contain the
> paragraph about 16 byte operations.

It seems to be an addition in the Dec 2021 revision:
https://cdrdv2.intel.com/v1/dl/getContent/671294

Create an issue in bugzilla then?
-- 
Xi Ruoyao <xry111@mengyan1223.wang>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-24 16:42 ` Satish Vasudeva
@ 2022-02-25 13:53   ` Florian Weimer
  0 siblings, 0 replies; 19+ messages in thread
From: Florian Weimer @ 2022-02-25 13:53 UTC (permalink / raw)
  To: Satish Vasudeva via Gcc-help; +Cc: Satish Vasudeva

* Satish Vasudeva via Gcc-help:

> I looked into this further. Seems like libat_load_16_i1 is implementing the
> load 16B as "lock cmpxchg16b (%rdi)"
> This is assuming that the CPU doesn't support 16B loads in a single
> transaction. How can I compile libatomics to use intrinsics for load 16B
> instead of LOCK cmpxchg?

As far as I know, it's the only reliable way to implement a 16B load on
x86-64.  The Intel SDM explicitly says this:

| An x87 instruction or an SSE instructions that accesses data larger
| than a quadword may be implemented using multiple memory accesses.

(Section 8.1.1 in Volume 3A in my copy.)

I wish we had a plain 128-bit atomic load instruction, but we don't.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-25  8:48             ` Xi Ruoyao
@ 2022-02-25 14:01               ` Florian Weimer
  2022-02-25 14:10                 ` Alexander Monakov
  2022-02-25 14:25                 ` Xi Ruoyao
  0 siblings, 2 replies; 19+ messages in thread
From: Florian Weimer @ 2022-02-25 14:01 UTC (permalink / raw)
  To: Xi Ruoyao via Gcc-help

* Xi Ruoyao via Gcc-help:

> On Fri, 2022-02-25 at 09:35 +0100, Stefan Ring via Gcc-help wrote:
>> On Thu, Feb 24, 2022 at 9:39 PM Satish Vasudeva via Gcc-help
>> <gcc-help@gcc.gnu.org> wrote:
>> > 
>> > Please let into this intel architecture manual , section 8.1.1
>> > 
>> > https://cdrdv2.intel.com/v1/dl/getContent/671190
>> > 
>> > I think Intel claims 16B operations are atomic , unless I am missing
>> > something.
>> 
>> Interesting. This seems to be a somewhat recent addition, and the
>> mailing list discussion linked to above predates it. Coincidentally, I
>> pulled a copy of the Intel manuals at almost exactly the same time as
>> this discussion, and sure enough, it does not yet contain the
>> paragraph about 16 byte operations.
>
> It seems an addition in Dec 2021 revision:
> https://cdrdv2.intel.com/v1/dl/getContent/671294
>
> Create an issue in bugzilla then?

Yes please.  I should have read the whole thread first. 8-)

The AMD manual doesn't say this yet, so any optimization needs to be
restricted to Intel CPUs for now.  I'll reach out to AMD to get
clarification.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-25 14:01               ` Florian Weimer
@ 2022-02-25 14:10                 ` Alexander Monakov
  2022-02-25 14:16                   ` Xi Ruoyao
  2022-02-25 14:25                 ` Xi Ruoyao
  1 sibling, 1 reply; 19+ messages in thread
From: Alexander Monakov @ 2022-02-25 14:10 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Xi Ruoyao via Gcc-help

On Fri, 25 Feb 2022, Florian Weimer via Gcc-help wrote:

> * Xi Ruoyao via Gcc-help:
> 
> > On Fri, 2022-02-25 at 09:35 +0100, Stefan Ring via Gcc-help wrote:
> >> On Thu, Feb 24, 2022 at 9:39 PM Satish Vasudeva via Gcc-help
> >> <gcc-help@gcc.gnu.org> wrote:
> >> > 
> >> > Please let into this intel architecture manual , section 8.1.1
> >> > 
> >> > https://cdrdv2.intel.com/v1/dl/getContent/671190
> >> > 
> >> > I think Intel claims 16B operations are atomic , unless I am missing
> >> > something.
> >> 
> >> Interesting. This seems to be a somewhat recent addition, and the
> >> mailing list discussion linked to above predates it. Coincidentally, I
> >> pulled a copy of the Intel manuals at almost exactly the same time as
> >> this discussion, and sure enough, it does not yet contain the
> >> paragraph about 16 byte operations.
> >
> > It seems an addition in Dec 2021 revision:
> > https://cdrdv2.intel.com/v1/dl/getContent/671294
> >
> > Create an issue in bugzilla then?
> 
> Yes please.  I should have read the whole thread first. 8-)
> 
> The AMD manual doesn't say this yet, so any optimization needs to be
> restricted to Intel CPUs for now.  I'll reach out to AMD to get
> clarification.

This StackOverflow question has evidence that both Intel (Core Duo) and
AMD (Opteron 2435) can tear 128-bit loads. So neither manufacturer can
give a retroactive guarantee.

https://stackoverflow.com/questions/7646018/sse-instructions-which-cpus-can-do-atomic-16b-memory-operations

Alexander

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-25 14:10                 ` Alexander Monakov
@ 2022-02-25 14:16                   ` Xi Ruoyao
  0 siblings, 0 replies; 19+ messages in thread
From: Xi Ruoyao @ 2022-02-25 14:16 UTC (permalink / raw)
  To: Alexander Monakov, Florian Weimer; +Cc: Xi Ruoyao via Gcc-help

On Fri, 2022-02-25 at 17:10 +0300, Alexander Monakov via Gcc-help wrote:

> > > https://cdrdv2.intel.com/v1/dl/getContent/671294

TL;DR: Intel says that on their CPUs with AVX, 128-bit loads (with movdqa)
are atomic; see page 393 of this doc.  This was added in Dec 2021, so you
may need to re-download the Intel SDM to get the latest copy.
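
A minimal sketch of a runtime check matching that condition (an Intel CPU
that reports AVX), using the x86 CPU-detection builtins provided by GCC.
The function name is hypothetical, and this is not the check used by any
posted patch.

```c
#include <stdbool.h>

/* Returns true only when the running CPU identifies as Intel and reports
   AVX, i.e. the condition described above for atomic 128-bit movdqa loads.
   On non-x86 targets (or compilers without these builtins) it
   conservatively returns false. */
bool movdqa_load_is_documented_atomic(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();   /* harmless if the CPU model is already initialized */
    return __builtin_cpu_is("intel") && __builtin_cpu_supports("avx");
#else
    return false;
#endif
}
```

The vendor test matters here: an AMD CPU with AVX would pass a bare AVX
check, but AMD has not documented the same guarantee.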

> > > Create an issue in bugzilla then?
> > 
> > Yes please.  I should have read the whole thread first. 8-)
> > 
> > The AMD manual doesn't say this yet, so any optimization needs to be
> > restricted to Intel CPUs for now.  I'll reach out to AMD to get
> > clarification.
> 
> This StackOverflow question has evidence that both Intel (Core Duo)
> and AMD (Opteron 2435) can tear 128-bit loads.

Core Duo does not have AVX, and AMD has not made any guarantee about the
atomicity of 128-bit loads.  So we can't use movdqa for 128-bit atomics
on those old Intel and (old or new) AMD models.

> So neither manufacturer can give a retroactive guarantee.
> 
> https://stackoverflow.com/questions/7646018/sse-instructions-which-cpus-can-do-atomic-16b-memory-operations
> 
> Alexander

-- 
Xi Ruoyao <xry111@mengyan1223.wang>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-25 14:01               ` Florian Weimer
  2022-02-25 14:10                 ` Alexander Monakov
@ 2022-02-25 14:25                 ` Xi Ruoyao
  2022-02-25 17:05                   ` Satish Vasudeva
  1 sibling, 1 reply; 19+ messages in thread
From: Xi Ruoyao @ 2022-02-25 14:25 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Stefan Ring, Satish Vasudeva, Xi Ruoyao via Gcc-help

On Fri, 2022-02-25 at 15:01 +0100, Florian Weimer wrote:

> > It seems an addition in Dec 2021 revision:
> > https://cdrdv2.intel.com/v1/dl/getContent/671294
> > 
> > Create an issue in bugzilla then?
> 
> Yes please.  I should have read the whole thread first. 8-)

Opened as https://gcc.gnu.org/PR104688
-- 
Xi Ruoyao <xry111@mengyan1223.wang>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-25 14:25                 ` Xi Ruoyao
@ 2022-02-25 17:05                   ` Satish Vasudeva
  2022-02-25 17:16                     ` Xi Ruoyao
  0 siblings, 1 reply; 19+ messages in thread
From: Satish Vasudeva @ 2022-02-25 17:05 UTC (permalink / raw)
  To: Xi Ruoyao; +Cc: Florian Weimer, Stefan Ring, Xi Ruoyao via Gcc-help

Thanks for the quick action on this.

I see that a patch has been posted.

I am new to this. Can you please clarify what the build options are for
newer and older Intel CPUs?

Satish

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-25 17:05                   ` Satish Vasudeva
@ 2022-02-25 17:16                     ` Xi Ruoyao
  2022-02-25 17:25                       ` Satish Vasudeva
  0 siblings, 1 reply; 19+ messages in thread
From: Xi Ruoyao @ 2022-02-25 17:16 UTC (permalink / raw)
  To: Satish Vasudeva; +Cc: Florian Weimer, Stefan Ring, Xi Ruoyao via Gcc-help

On Fri, 2022-02-25 at 09:05 -0800, Satish Vasudeva wrote:
> Thanks for a quick action on this.
> 
> I see that a patch has been posted. 
> 
> I am new to this, can you please clarify what is the build option for
> new and older Intel CPUs?

You don't need to add any build option if you use the posted patch.
The patch uses the ifunc (https://sourceware.org/glibc/wiki/GNU_IFUNC)
feature.  This means libatomic will automatically select the best variant
of the 16B atomic load for the CPU when it is loaded at runtime.
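
As a rough illustration of the idea, the sketch below picks an
implementation once at runtime and then calls through a function pointer.
A real ifunc resolver is run by the dynamic loader instead, so this is only
an approximation. The names u128, load16, resolve_load16, load16_cas,
load16_movdqa and load16_plain are hypothetical, and this is not the code
from the patch.

```c
#include <stdint.h>

typedef struct { _Alignas(16) uint64_t lo; uint64_t hi; } u128;
typedef u128 (*load16_fn)(u128 *);

#if defined(__x86_64__)
#include <emmintrin.h>

/* Fallback: locked compare-and-swap of 0 -> 0, returns the current value. */
u128 load16_cas(u128 *p)
{
    u128 out = { 0, 0 };
    __asm__ __volatile__("lock cmpxchg16b %0"
                         : "+m"(*p), "+a"(out.lo), "+d"(out.hi)
                         : "b"((uint64_t)0), "c"((uint64_t)0)
                         : "cc", "memory");
    return out;
}

/* Fast path: one aligned 16-byte SSE load. Only use it where the vendor
   documents such a load as atomic. */
u128 load16_movdqa(u128 *p)
{
    __m128i v;
    u128 out;
    __asm__ __volatile__("movdqa %1, %0" : "=x"(v) : "m"(*p) : "memory");
    _mm_store_si128((__m128i *)&out, v);
    return out;
}

/* Plays the role of the ifunc resolver: inspect the CPU, return a variant. */
load16_fn resolve_load16(void)
{
    __builtin_cpu_init();
    if (__builtin_cpu_is("intel") && __builtin_cpu_supports("avx"))
        return load16_movdqa;
    return load16_cas;
}
#else
/* Other targets: placeholder only, NOT atomic. */
u128 load16_plain(u128 *p) { return *p; }
load16_fn resolve_load16(void) { return load16_plain; }
#endif

/* Public entry point: resolve on first use, then reuse the cached pointer.
   Every thread resolves to the same function, so a relaxed load/store of
   the cached pointer is enough. */
u128 load16(u128 *p)
{
    static load16_fn cached;
    load16_fn fn = __atomic_load_n(&cached, __ATOMIC_RELAXED);
    if (!fn) {
        fn = resolve_load16();
        __atomic_store_n(&cached, fn, __ATOMIC_RELAXED);
    }
    return fn(p);
}
```

With ifunc the selection happens once at symbol resolution time, so callers
do not pay for the pointer check on each call as they do in this sketch.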

> > Opened as https://gcc.gnu.org/PR104688
-- 
Xi Ruoyao <xry111@mengyan1223.wang>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-25 17:16                     ` Xi Ruoyao
@ 2022-02-25 17:25                       ` Satish Vasudeva
  2022-03-02  0:16                         ` Satish Vasudeva
  0 siblings, 1 reply; 19+ messages in thread
From: Satish Vasudeva @ 2022-02-25 17:25 UTC (permalink / raw)
  To: Xi Ruoyao; +Cc: Florian Weimer, Stefan Ring, Xi Ruoyao via Gcc-help

That's a great answer. Thank you

Have a nice weekend.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-02-25 17:25                       ` Satish Vasudeva
@ 2022-03-02  0:16                         ` Satish Vasudeva
  2022-03-02  5:55                           ` Florian Weimer
  0 siblings, 1 reply; 19+ messages in thread
From: Satish Vasudeva @ 2022-03-02  0:16 UTC (permalink / raw)
  To: Xi Ruoyao; +Cc: Florian Weimer, Stefan Ring, Xi Ruoyao via Gcc-help

Hi,

Just a quick clarification.

Looking back at the description in
https://gcc.gnu.org/legacy-ml/gcc-patches/2017-01/msg02344.html,
it sounds like the CAS-based implementation is a problem for volatile
atomic loads. Can anyone please elaborate on what the issue with volatile
atomic loads is? I am trying to do a risk analysis of our code.

Thanks
Satish


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Libatomic 16B
  2022-03-02  0:16                         ` Satish Vasudeva
@ 2022-03-02  5:55                           ` Florian Weimer
  0 siblings, 0 replies; 19+ messages in thread
From: Florian Weimer @ 2022-03-02  5:55 UTC (permalink / raw)
  To: Satish Vasudeva; +Cc: Xi Ruoyao, Stefan Ring, Xi Ruoyao via Gcc-help

* Satish Vasudeva:

> Looking back at the description in
> https://gcc.gnu.org/legacy-ml/gcc-patches/2017-01/msg02344.html It
> sounds like CAS based implementation is a problem for volatile atomic
> loads.  Can any one please elaborate what is the issue with volatile
> atomic loads. I am trying to do risk analysis in our code.

The page could be mapped read-only (say if it's in memory shared across
processes).  Reading such values using CAS will fault, so CAS is not a
full replacement.
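
A minimal sketch that demonstrates the fault on Linux. It uses an 8-byte
CAS so that it builds without -mcx16 or libatomic (the same reasoning
applies to cmpxchg16b), and it runs the CAS in a child process so that the
parent can observe the signal. The function name is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a CAS on a read-only page kills the child with a fault
   signal, 0 if the child survives, -1 if the setup fails. */
int cas_on_readonly_page_faults(void)
{
    uint64_t *p = mmap(NULL, 4096, PROT_READ,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* A plain atomic load works fine on a read-only mapping. */
    uint64_t seen = __atomic_load_n(p, __ATOMIC_SEQ_CST);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, 4096);
        return -1;
    }
    if (pid == 0) {
        /* The expected value matches memory, so the CAS has to store,
           and the store needs write permission on the page. */
        uint64_t expected = seen;
        __atomic_compare_exchange_n(p, &expected, seen, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        _exit(0);               /* reached only if the CAS did not fault */
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    munmap(p, 4096);
    if (waited != pid)
        return -1;
    return WIFSIGNALED(status) &&
           (WTERMSIG(status) == SIGSEGV || WTERMSIG(status) == SIGBUS);
}
```

The plain load in the parent succeeds while the CAS in the child is killed,
which is the difference that matters for data in read-only shared memory.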

Thanks,
Florian


^ permalink raw reply	[flat|nested] 19+ messages in thread
