public inbox for libc-alpha@sourceware.org
* use of fma
From: Paul Zimmermann @ 2021-04-28  7:23 UTC (permalink / raw)
  To: libc-alpha

       Hi,

I noticed that on recent x86_64 machines with hardware fma (fused multiply-add),
as reported by /proc/cpuinfo, glibc does not use fma by default.

The reason seems to be that with the default flags (-O2 -g), gcc does not
define __FP_FAST_FMA:

tomate$ gcc -O2 -g -dM -E -xc /dev/null | grep -q __FP_FAST_FMA
tomate$ echo $?
1

but it is defined with -march=native:

tomate$ gcc -O2 -g -march=native -dM -E -xc /dev/null | grep -q __FP_FAST_FMA
tomate$ echo $?
0

However, on PowerPC __FP_FAST_FMA is defined without -march=native:

pzimmermann@drac-12:~$ gcc -O2 -g -dM -E -xc /dev/null | grep -q __FP_FAST_FMA
pzimmermann@drac-12:~$ echo $?
0
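
The same macro can drive a compile-time choice in C. A minimal sketch, assuming GCC (the helper `muladd` is hypothetical, not a glibc function): when `__FP_FAST_FMA` is defined, `__builtin_fma` is expected to expand to a single fused instruction rather than a libm call; otherwise the code falls back to a separate multiply and add, which rounds twice and can differ from a true fma in the last bit.

```c
/* Hypothetical helper: computes a*b + c, fused when the compiler
   reports fast hardware FMA for the target (__FP_FAST_FMA).  */
static double muladd(double a, double b, double c)
{
#ifdef __FP_FAST_FMA
    /* e.g. x86_64 with -march=native on an FMA CPU, or powerpc64le:
       one rounding, expanded inline rather than calling libm fma().  */
    return __builtin_fma(a, b, c);
#else
    /* Baseline x86_64 (-O2 -g): multiply, round, add, round again.  */
    return a * b + c;
#endif
}

/* Reports which branch this translation unit was compiled with.  */
static int muladd_is_fused(void)
{
#ifdef __FP_FAST_FMA
    return 1;
#else
    return 0;
#endif
}
```

Building this with and without -march=native on the x86_64 machine above should flip what `muladd_is_fused()` returns.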

Would it make sense to add -march=native to CFLAGS, or to add an option
like --enable-fma to configure?

Paul






^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: use of fma
From: Florian Weimer @ 2021-04-28  8:25 UTC (permalink / raw)
  To: Paul Zimmermann; +Cc: libc-alpha

* Paul Zimmermann:

> However, on PowerPC __FP_FAST_FMA is defined without -march=native:
>
> pzimmermann@drac-12:~$ gcc -O2 -g -dM -E -xc /dev/null | grep -q __FP_FAST_FMA
> pzimmermann@drac-12:~$ echo $?
> 0

I assume that this is powerpc64le, which has a POWER8 baseline.  The
switch to little endian neatly eliminated requirements for backwards
compatibility.

> Would it make sense to add -march=native to CFLAGS, or to add an option
> like --enable-fma to configure?

This is already covered by the existing mechanisms for compiler flag
changes.  The problem is that not all x86 CPUs currently in production
support FMA.  We aren't even at the stage yet where we could discuss
phasing out support for old CPUs.  So building everything to require FMA
by default would break things for users.

We already have some FMA-using function variants selected by IFUNC
resolvers.  Search for “ifunc-fma” in the source tree for examples.
More could be added if beneficial.
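
For reference, the general IFUNC pattern looks roughly like this. It is a minimal sketch assuming GCC (or Clang) on x86_64 GNU/Linux; it uses the compiler's `__builtin_cpu_supports` rather than glibc's internal CPU-feature data, so it is not how the actual `ifunc-fma` variants are written, and `muladd`, `muladd_fma` and `muladd_generic` are hypothetical names.

```c
/* Sketch: GCC-style IFUNC selection of an FMA variant on x86_64
   GNU/Linux.  Not glibc's code: glibc's ifunc-fma selectors use its
   internal CPU-feature data instead of __builtin_cpu_supports.  */

/* Baseline variant: separate multiply and add (two roundings).  */
static double muladd_generic(double a, double b, double c)
{
    return a * b + c;
}

#if defined(__x86_64__) && defined(__linux__)
/* FMA variant: target("fma") lets GCC emit vfmadd for this one
   function while the rest of the file stays baseline x86_64.  */
static __attribute__((target("fma"))) double
muladd_fma(double a, double b, double c)
{
    return __builtin_fma(a, b, c);
}

/* Resolver: called once when muladd is bound, possibly before
   constructors have run, hence the explicit __builtin_cpu_init().  */
static double (*resolve_muladd(void))(double, double, double)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("fma") ? muladd_fma : muladd_generic;
}

/* Callers see one ordinary function; the choice is made at bind time.  */
double muladd(double a, double b, double c)
    __attribute__((ifunc("resolve_muladd")));
#else
/* Other targets in this sketch: no IFUNC, always the baseline.  */
double muladd(double a, double b, double c)
{
    return muladd_generic(a, b, c);
}
#endif
```

Because the resolver runs once at symbol binding time, a single binary uses the fused variant where the CPU has FMA and the baseline variant everywhere else, without requiring FMA of every user.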

If you have FMA-capable hardware and want to build glibc to take
advantage of it unconditionally (without IFUNCs), use GCC 11 and
-march=x86-64-v3.  It should be compatible with all such CPUs, and the
build will also use additional CPU features not present in the x86-64
baseline specification.
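
To check that a translation unit really was built for that level, one can test GCC's predefined target macros. A minimal compile-time sketch; the helper name is hypothetical and only a subset of the x86-64-v3 features (AVX2, FMA, BMI2, F16C) is checked:

```c
/* Hypothetical helper: returns 1 if this translation unit was compiled
   with the x86-64-v3 features checked below enabled (e.g. via
   -march=x86-64-v3 with GCC 11+, or -march=native on a CPU that has
   them), 0 otherwise.  Only a subset of the v3 feature set is tested.  */
static int built_for_x86_64_v3(void)
{
#if defined(__AVX2__) && defined(__FMA__) && defined(__BMI2__) \
    && defined(__F16C__)
    return 1;
#else
    return 0;
#endif
}
```

With -march=x86-64-v3 this should return 1; with the default x86-64 baseline flags it should return 0.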

Thanks,
Florian



* Re: use of fma
From: Paul Zimmermann @ 2021-04-28  8:42 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

       Dear Florian,

> > However, on PowerPC __FP_FAST_FMA is defined without -march=native:
> >
> > pzimmermann@drac-12:~$ gcc -O2 -g -dM -E -xc /dev/null | grep -q __FP_FAST_FMA
> > pzimmermann@drac-12:~$ echo $?
> > 0
> 
> I assume that this is powerpc64le, which has a POWER8 baseline.  The
> switch to little endian neatly eliminated requirements for backwards
> compatibility.

/proc/cpuinfo reports POWER8NVL (altivec supported), and config.log shows
uname -r = 4.19.0-16-powerpc64le.

> > Would it make sense to add -march=native to CFLAGS, or to add an option
> > like --enable-fma to configure?
> 
> This is already covered by the existing mechanisms for compiler flag
> changes.  The problem is that not all x86 CPUs currently in production
> support FMA.  We aren't even at the stage yet where we could discuss
> phasing out support for old CPUs.  So building everything to require FMA
> by default would break things for users.
> 
> We already have some FMA-using function variants selected by IFUNC
> resolvers.  Search for “ifunc-fma” in the source tree for examples.
> More could be added if beneficial.
> 
> If you have FMA-capable hardware and want to build glibc to take
> advantage of it unconditionally (without IFUNCs), use GCC 11 and
> -march=x86-64-v3.  It should be compatible with all such CPUs, and the
> build will also use additional CPU features not present in the x86-64
> baseline specification.

thanks Florian. However this is quite technical and not easily accessible
to the average user. I understand that with default configure, the binary
produced should run on any x86_64. What I suggest is a configure option
which would take advantage of all nice features available on the processor
where we compile, with no guarantee whatsoever that it runs on any other
processor. In the above case the -march=x86-64-v3 option would be added
by configure automatically.

Paul



* Re: use of fma
From: Florian Weimer @ 2021-04-28  9:06 UTC (permalink / raw)
  To: Paul Zimmermann; +Cc: libc-alpha

* Paul Zimmermann:

>> This is already covered by the existing mechanisms for compiler flag
>> changes.  The problem is that not all x86 CPUs currently in production
>> support FMA.  We aren't even at the stage yet where we could discuss
>> phasing out support for old CPUs.  So building everything to require FMA
>> by default would break things for users.
>> 
>> We already have some FMA-using function variants selected by IFUNC
>> resolvers.  Search for “ifunc-fma” in the source tree for examples.
>> More could be added if beneficial.
>> 
>> If you have FMA-capable hardware and want to build glibc to take
>> advantage of it unconditionally (without IFUNCs), use GCC 11 and
>> -march=x86-64-v3.  It should be compatible with all such CPUs, and the
>> build will also use additional CPU features not present in the x86-64
>> baseline specification.
>
> thanks Florian. However this is quite technical and not easily accessible
> to the average user. I understand that with default configure, the binary
> produced should run on any x86_64. What I suggest is a configure option
> which would take advantage of all nice features available on the processor
> where we compile, with no guarantee whatsoever that it runs on any other
> processor. In the above case the -march=x86-64-v3 option would be added
> by configure automatically.

But there are inherent complexities here.  We have some statically
linked code in libc_nonshared.a that will be used even if an application
is run on top of a glibc that was built with different compiler settings
(e.g., -march=native on a different machine).  So building with
-march=native (or other -march= options) could have unexpected side
effects, and the impact needs to be considered carefully.  That's why
I'm hesitant to offer configure-level support for this.
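
The underlying hazard is that object code compiled with -march= flags silently assumes the corresponding instructions exist wherever it eventually runs. A minimal sketch of a defensive runtime check, assuming GCC or Clang on x86 (the helper is hypothetical; this is not something glibc or libc_nonshared.a does):

```c
/* Best-effort runtime check (hypothetical helper): returns 1 if the
   running CPU provides the x86 extensions this object file was compiled
   to assume (only FMA and AVX2 are checked here), 0 otherwise.  Other
   architectures trivially return 1.  */
static int cpu_matches_build(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();   /* harmless if already initialized */
# ifdef __FMA__
    if (!__builtin_cpu_supports("fma"))
        return 0;
# endif
# ifdef __AVX2__
    if (!__builtin_cpu_supports("avx2"))
        return 0;
# endif
#endif
    return 1;
}
```

Such a check is only best-effort: code built with, say, -march=x86-64-v3 may execute AVX2 or FMA instructions before (or even inside) this function, so on a CPU lacking them the program can die with SIGILL first.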

Thanks,
Florian



* Re: use of fma
From: Tulio Magno Quites Machado Filho @ 2021-04-28 15:00 UTC (permalink / raw)
  To: Florian Weimer, Paul Zimmermann; +Cc: libc-alpha

Florian Weimer via Libc-alpha <libc-alpha@sourceware.org> writes:

> * Paul Zimmermann:
>
>> However, on PowerPC __FP_FAST_FMA is defined without -march=native:
>>
>> pzimmermann@drac-12:~$ gcc -O2 -g -dM -E -xc /dev/null | grep -q __FP_FAST_FMA
>> pzimmermann@drac-12:~$ echo $?
>> 0
>
> I assume that this is powerpc64le, which has a POWER8 baseline.  The
> switch to little endian neatly eliminated requirements for backwards
> compatibility.

Side note: single and double-precision multiply-add have been available since
the PPC ISA and POWER1 ISA respectively.
So, they're also available by default on powerpc64 and powerpc builds
supporting hw fpu.

-- 
Tulio Magno


* Re: use of fma
From: Paul Zimmermann @ 2021-04-28 15:18 UTC (permalink / raw)
  To: Tulio Magno Quites Machado Filho; +Cc: fweimer, libc-alpha

> Side note: single and double-precision multiply-add have been available since
> the PPC ISA and POWER1 ISA respectively.
> So, they're also available by default on powerpc64 and powerpc builds
> supporting hw fpu.

thank you Tulio Magno. By the way, does anyone know of any hardware with
extended double and/or quadruple precision multiply-add?

Paul


* Re: use of fma
From: Tulio Magno Quites Machado Filho @ 2021-04-28 15:45 UTC (permalink / raw)
  To: Paul Zimmermann; +Cc: fweimer, libc-alpha

Paul Zimmermann <Paul.Zimmermann@inria.fr> writes:

>> Side note: single and double-precision multiply-add have been available since
>> the PPC ISA and POWER1 ISA respectively.
>> So, they're also available by default on powerpc64 and powerpc builds
>> supporting hw fpu.
>
> thank you Tulio Magno. By the way, does anyone know of any hardware with
> extended double and/or quadruple precision multiply-add?

That's a good example of FMA that is not available by default on any of the
ABIs because it has been added to the POWER ISA 3.0 (e.g. POWER9).  :-D

You need a new enough compiler (GCC >= 7.4), but you can use gcc135 (POWER9)
from the GCC Compile Farm project:

[tuliom@gcc135 ~]$ /opt/at12.0/bin/gcc -O2 -mcpu=power9 -g -dM -E -xc /dev/null | grep __FP_FAST_FMA
#define __FP_FAST_FMAF128 1
#define __FP_FAST_FMAF 1
#define __FP_FAST_FMAF64x 1
#define __FP_FAST_FMAF32 1
#define __FP_FAST_FMAF64 1
#define __FP_FAST_FMAF32x 1
#define __FP_FAST_FMA 1

Notice I used -mcpu instead of -march.
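
A minimal C sketch of how code could opt into that binary128 FMA only when the compiler reports it as fast. It assumes GCC 7 or later with `_Float128` support; `muladd_f128` is a hypothetical name. Without `__FP_FAST_FMAF128` it falls back to an unfused binary128 multiply and add, which rounds twice but needs only libgcc's software routines rather than libm's `fmaf128`.

```c
/* Hypothetical helper: binary128 a*b + c.  Requires _Float128 support
   (GCC 7 or later, C mode).  */
static _Float128 muladd_f128(_Float128 a, _Float128 b, _Float128 c)
{
#ifdef __FP_FAST_FMAF128
    /* e.g. -mcpu=power9 (POWER ISA 3.0): a single fused hardware
       instruction with one rounding.  */
    return __builtin_fmaf128(a, b, c);
#else
    /* No hardware binary128 FMA (e.g. x86_64): an unfused multiply and
       add, each rounded, using libgcc's software binary128 routines
       rather than libm's fmaf128.  */
    return a * b + c;
#endif
}
```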

-- 
Tulio Magno


* Re: use of fma
From: Joseph Myers @ 2021-04-28 17:29 UTC (permalink / raw)
  To: Paul Zimmermann; +Cc: Florian Weimer, libc-alpha

On Wed, 28 Apr 2021, Paul Zimmermann wrote:

> produced should run on any x86_64. What I suggest is a configure option
> which would take advantage of all nice features available on the processor
> where we compile, with no guarantee whatsoever that it runs on any other
> processor. In the above case the -march=x86-64-v3 option would be added

That option is CC="gcc -march=native": you can pass CC on the configure
command line, so I don't see any need for a separate option for this.  Note
that we'd like to phase out the existing --with-cpu option (which affects
both sysdeps directories and compiler options) and instead rely purely on
the setting of CC.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: use of fma
From: Joseph Myers @ 2021-04-28 17:46 UTC (permalink / raw)
  To: Tulio Magno Quites Machado Filho; +Cc: Paul Zimmermann, fweimer, libc-alpha

On Wed, 28 Apr 2021, Tulio Magno Quites Machado Filho via Libc-alpha wrote:

> Paul Zimmermann <Paul.Zimmermann@inria.fr> writes:
> 
> >> Side note: single and double-precision multiply-add have been available since
> >> the PPC ISA and POWER1 ISA respectively.
> >> So, they're also available by default on powerpc64 and powerpc builds
> >> supporting hw fpu.
> >
> > thank you Tulio Magno. By the way, does anyone know of any hardware with
> > extended double and/or quadruple precision multiply-add?
> 
> That's a good example of FMA that is not available by default on any of the
> ABIs because it has been added to the POWER ISA 3.0 (e.g. POWER9).  :-D

And ia64 can do fma for the ldbl-96 format (with an optional narrowing of 
the infinite-precision result to binary32 or binary64 included in the 
operation).  RISC-V defines a 'Q' extension for binary128 arithmetic which 
includes fma, but I don't know of any implementations of that 'Q' 
extension (and it's not supported in GCC, for example).

-- 
Joseph S. Myers
joseph@codesourcery.com
