public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Segher Boessenkool <segher@kernel.crashing.org>
To: "Paul A. Clarke" <pc@us.ibm.com>
Cc: gcc-patches@gcc.gnu.org, wschmidt@linux.ibm.com
Subject: Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics
Date: Fri, 8 Oct 2021 17:31:11 -0500	[thread overview]
Message-ID: <20211008223111.GU10333@gate.crashing.org> (raw)
In-Reply-To: <20211008192728.GF243632@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com>

On Fri, Oct 08, 2021 at 02:27:28PM -0500, Paul A. Clarke wrote:
> On Fri, Oct 08, 2021 at 12:39:15PM -0500, Segher Boessenkool wrote:
> > This is not a basic asm (it contains a ":"; that is not just an easy way
> > to see it, it is the *definition* of basic vs. extended asm).
> 
> Ah, basic vs extended. I learned something today... thanks for your
> patience!

To expand a little: any asm with operands is extended asm.  And without
operands can be either:  asm("eieio");  is basic, while  asm("eieio" : );
is extended.  This matters because semantics are a bit different.

> I see. Thanks for the reference. If I understand correctly, volatile
> prevents some optimizations based on the defined inputs/outputs, but
> the asm could still be subject to reordering.

"asm volatile" means there is a side effect in the asm.  This means that
it has to be executed on the real machine the same as on the abstract
machine, with the side effects in the same order.

It can still be reordered, modulo those restrictions.  It can be merged
with an identical asm as well.  And the compiler can split this into two
identical asms on two paths.

In this case you might want a side effect (the instructions writes to
the FPSCR after all).  But you need this to be tied to the FP code that
you want the flags to be changed for, and to the restore of the flags,
and finally you need to prevent other FP code from being scheduled in
between.

You need more for that than just volatile, and the solution may well
make volatile not wanted: tying the insns together somehow will
naturally make the flags restored to a sane situation again, so the
whole group can be removed if you want, etc.

> In this particular case, I don't think it's an issue with respect to
> reordering.  The code in question is:
> +      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> 
> The output (__fpscr_save) is a source for the following assignment,
> so the order should be respected, no?

Other FP code can be interleaved, and then do the wrong thing.

> With respect to volatile, I worry about removing it, because I do
> indeed need that instruction to execute in order to clear the FPSCR
> exception enable bits. That side-effect is not otherwise known to the
> compiler.

Yes.  But as said above, volatile isn't enough to get this to behave
correctly.

The easiest way out is to write this all in one piece of (inline) asm.

> > > > You do not need any of that __ either.
> > > 
> > > I'm surprised that I don't. A .h file needs to be concerned about the
> > > namespace it inherits, no?
> > 
> > These are local variables in a function though.  You get such
> > complexities in macros, but never in functions, where everything is
> > scoped.  Local variables are a great thing.  And macros are a bad thing!
> 
> They are local variables in a function *in an include file*, though.
> If a user's preprocessor macro just happens to match a local variable name
> there could be problems, right?

Of course.  This is why traditionally macro names are ALL_CAPS :-)  So
in practice it doesn't matter, and in practice many users use __ names
themselves as well.

But you are right.  I just don't see it will help practically :-(


Segher

  reply	other threads:[~2021-10-08 22:34 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-23 19:03 [PATCH v3 0/6] rs6000: Support more SSE4 intrinsics Paul A. Clarke
2021-08-23 19:03 ` [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics Paul A. Clarke
2021-08-27 13:44   ` Bill Schmidt
2021-08-27 13:47     ` Bill Schmidt
2021-08-30 21:16     ` Paul A. Clarke
2021-08-30 21:24       ` Bill Schmidt
2021-10-07 23:08       ` Segher Boessenkool
2021-10-07 23:39   ` Segher Boessenkool
2021-10-08  1:04     ` Paul A. Clarke
2021-10-08 17:39       ` Segher Boessenkool
2021-10-08 19:27         ` Paul A. Clarke
2021-10-08 22:31           ` Segher Boessenkool [this message]
2021-10-11 13:46             ` Paul A. Clarke
2021-10-11 16:28               ` Segher Boessenkool
2021-10-11 17:31                 ` Paul A. Clarke
2021-10-11 22:04                   ` Segher Boessenkool
2021-10-12 19:35                     ` Paul A. Clarke
2021-10-12 22:25                       ` Segher Boessenkool
2021-10-19  0:36                         ` Paul A. Clarke
2021-08-23 19:03 ` [PATCH v3 2/6] rs6000: Support SSE4.1 "min" and "max" intrinsics Paul A. Clarke
2021-08-27 13:47   ` Bill Schmidt
2021-10-11 19:28   ` Segher Boessenkool
2021-10-12  1:42     ` [COMMITTED v4 " Paul A. Clarke
2021-08-23 19:03 ` [PATCH v3 3/6] rs6000: Simplify some SSE4.1 "test" intrinsics Paul A. Clarke
2021-08-27 13:48   ` Bill Schmidt
2021-10-11 20:50   ` Segher Boessenkool
2021-10-12  1:47     ` [COMMITTED v4 " Paul A. Clarke
2021-08-23 19:03 ` [PATCH v3 4/6] rs6000: Support SSE4.1 "cvt" intrinsics Paul A. Clarke
2021-08-27 13:49   ` Bill Schmidt
2021-10-11 21:52   ` Segher Boessenkool
2021-10-12  1:51     ` [COMMITTED v4 " Paul A. Clarke
2021-08-23 19:03 ` [PATCH v3 5/6] rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics Paul A. Clarke
2021-08-27 15:21   ` Bill Schmidt
2021-08-27 18:52     ` Paul A. Clarke
2021-10-11 23:07   ` Segher Boessenkool
2021-10-12  1:55     ` [COMMITTED v4 " Paul A. Clarke
2021-08-23 19:03 ` [PATCH v3 6/6] rs6000: Guard some x86 intrinsics implementations Paul A. Clarke
2021-08-27 15:25   ` Bill Schmidt
2021-10-12  0:11   ` Segher Boessenkool
2021-10-13 17:04     ` Paul A. Clarke
2021-10-13 23:47       ` Segher Boessenkool
2021-10-19  0:26         ` Paul A. Clarke
2021-09-16 14:59 ` [PATCH v3 0/6] rs6000: Support more SSE4 intrinsics Paul A. Clarke
2021-10-04 18:26   ` Paul A. Clarke
2021-10-07 22:25 ` Segher Boessenkool
2021-10-08  0:29   ` Paul A. Clarke
2021-10-12  0:15     ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211008223111.GU10333@gate.crashing.org \
    --to=segher@kernel.crashing.org \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=pc@us.ibm.com \
    --cc=wschmidt@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).