* [Bug target/102783] New: [powerpc] FPSCR manipulations cannot be relied upon
@ 2021-10-15 16:15 pc at us dot ibm.com
  2021-10-15 16:37 ` [Bug target/102783] " pinskia at gcc dot gnu.org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: pc at us dot ibm.com @ 2021-10-15 16:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

            Bug ID: 102783
           Summary: [powerpc] FPSCR manipulations cannot be relied upon
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pc at us dot ibm.com
  Target Milestone: ---

On all Power targets which support hardware floating-point, there are a few
manipulations of the Floating-Point Status and Control Register (FPSCR) that
have side-effects for subsequent floating-point computation. For example,
changing the floating-point rounding mode, or changing whether floating-point
exceptions are enabled.

There are many ways to effect those manipulations:
- The set of fenv(3) calls
- A handful of builtins:
  __builtin_fpscr_set_rn
  __builtin_mtfsf
  __builtin_mtfsb{0,1}
- Inline asm using the appropriate instructions (mffsce, mffscdrn{i},
mffscrn{i}, mtfsf{i}, mtfsb{0,1})

The problem is that unless these manipulations are performed in an
out-of-line function, there is at present no way to restrict instruction
scheduling so that nearby floating-point computations are prevented from
moving before or after the FPSCR changes. (Such movement can result in
computation using the wrong rounding mode, or in unexpected FP exceptions.)

With asm statements, one could add artificial read and write dependencies to
the input or output (if any) of the FPSCR manipulations and
previous/subsequent FP computations, but this is not always practicable.
(Current glibc is an example.)

* [Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon
From: pinskia at gcc dot gnu.org @ 2021-10-15 16:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
There are a few other bugs which are very similar to this one. GCC not
implementing a pragma is one of them.

* [Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon
From: pthaugen at gcc dot gnu.org @ 2021-10-15 19:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

pthaugen at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pthaugen at gcc dot gnu.org

--- Comment #2 from pthaugen at gcc dot gnu.org ---
I’ll note that an inline asm stmt appears to be a barrier for the scheduler,
but apparently not for other parts of the compiler. For example on the
following code:

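double d;
void foo(double *dp, double c)
{
  double e;

  e = c + d;            /* first addition, written before the asm */
  asm volatile ("");    /* empty asm, no operands or clobbers */
  *dp = e + d;          /* second addition, written after the asm */
  return;
}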

The scheduling dumps show that the asm volatile has dependencies on all insns
before and after it. But that doesn’t really help because the first addition
stmt gets moved past the asm volatile at expand time.
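
The empty asm in the example above has no operands, so nothing connects it
to e. A hedged variant is sketched below: it names e as a read-write operand
of the asm, so the first addition has to be complete before the asm and the
second cannot use e until after it. The function name foo_tied is invented
for this sketch, and the "+m" constraint is just one portable choice; this
illustrates the artificial-dependency idea and is not a fix confirmed in
this thread.

```c
double d;

/* Hypothetical variant of foo: the asm now reads and writes e. */
void foo_tied(double *dp, double c)
{
  double e;

  e = c + d;
  /* Empty asm with e as an in/out memory operand: the value of e must be
     computed before this point and reloaded after it. */
  __asm__ volatile ("" : "+m" (e));
  *dp = e + d;
}
```

This only orders the computations that flow through e; any FP operation not
tied to an operand of the asm remains free to move.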

* [Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon
From: segher at gcc dot gnu.org @ 2021-10-15 19:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-10-15
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #3 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Confirmed.

This is about the control part.  The status part has similar issues as well
but needs opposite ordering; we do not have any ordering right now, that is
the problem.

We have the same issues for vectors with the VSCR.  That one has only one
status bit: SAT, for saturation, and we set that explicitly in all insns that
do set it.  All of those are unusual, done via builtins, etc.  We model a VSCR
register just for this.  It also has one control bit: NJ, "non-java", it
disables strict IEEE arithmetic, which was useful for improved performance on
old cores.  We do not actually order setting that relative to insns that use
that control bit, but I have never actually seen anything set that bit, so the
issue does not practically exist there.

But for FP we need to order setting the control bits relative to any FP
computational insn, and reading the status bits as well.  There currently is
no way in GCC to say this.  It might be best to have a hook to say what
control bits there are, what insns care about which control bits, and what
insns set each of those bits.  And similar for status bits.

Does this sound generic enough, does it serve the needs of all targets?
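
To make the hook idea a little more concrete, here is a toy model in plain
C. It is not GCC internals: every type, field, constant and function name
below is invented for illustration. Each insn is described by the
control/status bits it reads and the bits it writes, and two insns must keep
their order when one writes a bit the other reads. Write/write conflicts are
deliberately left out of this sketch.

```c
#include <stdbool.h>

/* Hypothetical bit names; a real target would list its own. */
enum {
  CTL_ROUNDING   = 1u << 0,  /* rounding-mode control field */
  CTL_EXC_ENABLE = 1u << 1,  /* exception-enable control bits */
  STS_EXC_FLAGS  = 1u << 2   /* exception status bits */
};

/* Hypothetical per-insn description that such a hook might provide. */
struct fp_env_effect {
  unsigned reads;   /* bits the insn depends on */
  unsigned writes;  /* bits the insn sets or changes */
};

/* Two insns may not be swapped if one writes a bit that the other reads. */
bool must_stay_ordered(struct fp_env_effect a, struct fp_env_effect b)
{
  return (a.writes & b.reads) != 0 || (b.writes & a.reads) != 0;
}

/* Example descriptions (assumed, not taken from any real target). */
const struct fp_env_effect set_rounding = { 0, CTL_ROUNDING };
const struct fp_env_effect fp_add =
  { CTL_ROUNDING | CTL_EXC_ENABLE, STS_EXC_FLAGS };
const struct fp_env_effect read_flags = { STS_EXC_FLAGS, 0 };
const struct fp_env_effect int_add = { 0, 0 };
```

In this model a rounding-mode change is ordered against every FP add, and
every FP add is ordered against a later read of the status bits, while two
FP adds, or an integer add and anything else, stay free to move.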

* [Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon
From: pinskia at gcc dot gnu.org @ 2021-10-15 19:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=20785,
                   |                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=34678

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
PR 20785 and bug 34678 come to mind for the generic issue on the gimple and rtl
levels.  There are many other linked bugs on those two too.

* [Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon
From: rguenth at gcc dot gnu.org @ 2021-10-18  6:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Even out-of-line does not help if there are visible CSE/association
opportunities across such call.  A workaround is to make the out-of-line
function __attribute__((returns_twice)) which should insert artificial
control flow preventing such transforms.

* [Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon
From: joseph at codesourcery dot com @ 2021-10-18 21:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

--- Comment #6 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
Generically (and if the command-line options are such that floating-point 
control / status bits are to be respected by optimizations), *any* 
function call might access or modify floating-point control and status 
bits, subject to e.g. const functions not being able to access them, pure 
functions not being able to modify them, functions whose body is known 
having properties based on analysis of that body, built-in functions 
having semantics based on what the compiler knows about those functions.  
And then a subset of asms may similarly access or modify them (based on 
inputs / outputs / clobbers, but maybe on some architectures existing 
practice doesn't provide a register name that inputs / outputs / clobbers 
can use to refer to floating-point state).

Then you'd need something like Marc Glisse's -ffenv-access patches (August 
2020) to represent the other side of things, how floating-point operations 
also access / modify such bits.

* [Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon
From: segher at gcc dot gnu.org @ 2021-10-19 10:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

--- Comment #7 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #5)
> Even out-of-line does not help if there are visible CSE/association
> opportunities across such call.

Yeah, good point.

> A workaround is to make the out-of-line
> function __attribute__((returns_twice)) which should insert artificial
> control flow preventing such transforms.

Is there anything that guarantees that to work (other than our actual
current implementation)?  It is much more stringent / expensive than we
would want, but if it is the best we can do...

* [Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon
From: segher at gcc dot gnu.org @ 2021-10-19 10:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

--- Comment #8 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to joseph@codesourcery.com from comment #6)
> Generically (and if the command-line options are such that floating-point 
> control / status bits are to be respected by optimizations), *any* 
> function call might access or modify floating-point control and status 
> bits, subject to e.g. const functions not being able to access them, pure 
> functions not being able to modify them, functions whose body is known 
> having properties based on analysis of that body, built-in functions 
> having semantics based on what the compiler knows about those functions.  

If FENV_ACCESS is OFF most of those things can be ignored as well.  But
FENV_ACCESS is much too blunt a hammer for most of our uses.

> And then a subset of asms may similarly access or modify them (based on 
> inputs / outputs / clobbers, but maybe on some architectures existing 
> practice doesn't provide a register name that inputs / outputs / clobbers 
> can use to refer to floating-point state).

Like PowerPC.  But we *do* model vscr (vector status and control register).
It won't be hard to add fpscr.

> Then you'd need something like Marc Glisse's -ffenv-access patches (August 
> 2020) to represent the other side of things, how floating-point operations 
> also access / modify such bits.

Yeah, we need something for normal computational FP insns to clobber (on
PowerPC load/store insns never change the fpscr / fenv, but I bet that is
different on other archs).

Thanks for the pointer, I'll find Marc's work.

* [Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon
From: joseph at codesourcery dot com @ 2021-10-19 15:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

--- Comment #9 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
On Tue, 19 Oct 2021, segher at gcc dot gnu.org via Gcc-bugs wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783
> 
> --- Comment #8 from Segher Boessenkool <segher at gcc dot gnu.org> ---
> (In reply to joseph@codesourcery.com from comment #6)
> > Generically (and if the command-line options are such that floating-point 
> > control / status bits are to be respected by optimizations), *any* 
> > function call might access or modify floating-point control and status 
> > bits, subject to e.g. const functions not being able to access them, pure 
> > functions not being able to modify them, functions whose body is known 
> > having properties based on analysis of that body, built-in functions 
> > having semantics based on what the compiler knows about those functions.  
> 
> If FENV_ACCESS is OFF most of those things can be ignored as well.  But
> FENV_ACCESS is much too blunt a hammer for most of our uses.

My recent discussions with Roger Sayle 
<https://gcc.gnu.org/pipermail/gcc-patches/2021-September/thread.html#580252>, 
and bug 54192 as referenced therein, may be helpful for more details of 
how FENV_ACCESS could be split up.  (At present we have -ftrapping-math, 
on by default, and -frounding-math, off by default.  I suspect that if 
-ftrapping-math really restricted optimizations enough to avoid all 
problematic code reordering / removal in the presence of function calls 
possibly reading and writing exception flags, it would actually inhibit 
optimization more than a full implementation of -frounding-math would: a 
full -frounding-math only means that arithmetic *reads* the rounding mode, 
whereas a full -ftrapping-math means that arithmetic *writes* to the 
exception flags.)

* [Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon
From: rguenth at gcc dot gnu.org @ 2021-10-28 14:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to joseph@codesourcery.com from comment #9)
> On Tue, 19 Oct 2021, segher at gcc dot gnu.org via Gcc-bugs wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783
> > 
> > --- Comment #8 from Segher Boessenkool <segher at gcc dot gnu.org> ---
> > (In reply to joseph@codesourcery.com from comment #6)
> > > Generically (and if the command-line options are such that floating-point 
> > > control / status bits are to be respected by optimizations), *any* 
> > > function call might access or modify floating-point control and status 
> > > bits, subject to e.g. const functions not being able to access them, pure 
> > > functions not being able to modify them, functions whose body is known 
> > > having properties based on analysis of that body, built-in functions 
> > > having semantics based on what the compiler knows about those functions.  
> > 
> > If FENV_ACCESS is OFF most of those things can be ignored as well.  But
> > FENV_ACCESS is much too blunt a hammer for most of our uses.
> 
> My recent discussions with Roger Sayle 
> <https://gcc.gnu.org/pipermail/gcc-patches/2021-September/thread.html#580252>, 
> and bug 54192 as referenced therein, may be helpful for more details of 
> how FENV_ACCESS could be split up.  (At present we have -ftrapping-math, 
> on by default, and -frounding-math, off by default.  I suspect that if 
> -ftrapping-math really restricted optimizations enough to avoid all 
> problematic code reordering / removal in the presence of function calls 
> possibly reading and writing exception flags, it would actually inhibit 
> optimization more than a full implementation of -frounding-math would: a 
> full -frounding-math only means that arithmetic *reads* the rounding mode, 
> whereas a full -ftrapping-math means that arithmetic *writes* to the 
> exception flags.)

But one interesting detail is that those writes can be re-ordered when
they are not (synchronously) observed since the exception flags as
produced by arithmetic are "sticky".  That makes mapping their dataflow
to SSA not very precise, you'd have to make arithmetic produce flags
and merge them at use points.

Anyway, I think we can reach a good enough implementation without actually
implementing any data flow by simply restricting what we do to stmts.
It will of course require manual intervention in passes that can break
things rather than having the restriction being visible by data flow that's
checked anyway.
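
The "sticky" point can be pictured with a small model, which is only a
sketch and not a proposal for GCC's IL: each arithmetic operation yields its
value together with the flags it raised, and the flags are merged with a
bitwise OR at the point where they are observed. The struct, the flag names
and merge_flags are all invented for this example.

```c
/* Hypothetical flag bits standing in for a target's exception flags. */
enum {
  FLAG_INEXACT  = 1u << 0,
  FLAG_OVERFLOW = 1u << 1,
  FLAG_DIVZERO  = 1u << 2
};

/* Hypothetical result of one arithmetic operation: the value plus the
   flags that the operation raised. */
struct fp_result {
  double value;
  unsigned flags;
};

/* Merge at a use point: sticky flags accumulate by bitwise OR, so the
   order in which individual results are merged does not matter. */
unsigned merge_flags(unsigned seen, struct fp_result r)
{
  return seen | r.flags;
}
```

Because OR is commutative and associative, two operations that only raise
flags can be swapped freely as long as both are merged before the flags are
read, which is what makes a precise SSA-style chain of flag writes awkward.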

* [Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon
From: glisse at gcc dot gnu.org @ 2022-08-26 12:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

--- Comment #11 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #8)
> Thanks for the pointer, I'll find Marc's work.

Since I had forgotten where it was, let me write here that it is git branch
/users/glisse/fenv

* [Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon
From: glisse at gcc dot gnu.org @ 2023-01-07 21:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

--- Comment #12 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Marc Glisse from comment #11)
> Since I had forgotten where it was, let me write here that it is git branch
> /users/glisse/fenv

Since it became impossible (hooks) to push to that branch a while ago, I should
post somewhere the FIXME file I couldn't push last year:

Looking at LLVM, I notice that my design in the gcc fenv branch seems to be
missing a fundamental piece: it has nothing preventing "normal" operations from
outside from migrating towards the protected region, where they may end up
using an unexpected rounding mode (unprotected doesn't mean any rounding mode,
it means the default one), or setting flags that we will observe.
One idea to prevent this would be to make sure that there are no normal FP
operations in functions that have protected operations (does that mean we
should mark functions? Just checking if there is a protected FP op doesn't work
if we call a function that does the op).
This means that we should turn all FP operations of the function into protected
ones (possibly with more relaxed flags if they are not in the protected
region), and we should also do that whenever inlining mixed functions. And
cross my fingers that the compiler doesn't start using FP ops out of thin air.
Would that be sufficient?
