public inbox for gcc-patches@gcc.gnu.org
From: "Paul A. Clarke" <pc@us.ibm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: gcc-patches@gcc.gnu.org, wschmidt@linux.ibm.com
Subject: Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics
Date: Mon, 18 Oct 2021 19:36:20 -0500
Message-ID: <20211019003620.GB10303@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com>
In-Reply-To: <20211012222532.GL10333@gate.crashing.org>

On Tue, Oct 12, 2021 at 05:25:32PM -0500, Segher Boessenkool wrote:
> On Tue, Oct 12, 2021 at 02:35:57PM -0500, Paul A. Clarke wrote:
> > static __inline __attribute__ ((__always_inline__)) void
> > libc_feholdsetround_ppc_ctx (struct rm_ctx *ctx, int r)
> > {
> >   fenv_union_t old;
> >   register fenv_union_t __fr;
> >   __asm__ __volatile__ ("mffscrni %0,%1" : "=f" (__fr.fenv) : "i" (r));
> >   ctx->env = old.fenv = __fr.fenv; 
> >   ctx->updated_status = (r != (old.l & 3));
> > }
> 
> (Should use "n", not "i", only numbers are allowed, not e.g. the address
> of something.  This actually can matter, in unusual cases.)

Noted, will submit a change to glibc when I get a chance. Thanks!
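
To illustrate the difference, here is a minimal sketch (not the glibc
code).  The names set_rn_imm and demo_set_rn_imm are hypothetical, and
the asm template is left empty so that the snippet builds on any GCC
target; only the operand constraint is of interest:

```c
/* Hypothetical stand-in for an mffscrni wrapper.  The template is empty,
   so no instruction is emitted; "n" still requires the operand to be an
   integer constant with a known numeric value.  */
#define set_rn_imm(r) \
  __asm__ __volatile__ ("" : : "n" (r))

static int
demo_set_rn_imm (void)
{
  set_rn_imm (0);       /* OK: integer literal.  */
  set_rn_imm (1 + 2);   /* OK: constant expression, folded to 3.  */
  /* An operand such as the address of an object is a symbolic constant:
     "i" may accept it, "n" does not.  */
  return 0;
}
```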

> This orders the updating of RN before the store to __fr.fenv .  There is
> no other ordering ensured here.
> 
> The store to __fr.fenv obviously has to stay in order with anything that
> can alias it, if that store isn't optimised away completely later.
> 
> > static __inline __attribute__ ((__always_inline__)) void
> > libc_feresetround_ppc (fenv_t *envp)
> > { 
> >   fenv_union_t new = { .fenv = *envp };
> >   register fenv_union_t __fr;
> >   __fr.l = new.l & 3;
> >   __asm__ __volatile__ ("mffscrn %0,%1" : "=f" (__fr.fenv) : "f" (__fr.fenv));
> > }
> 
> This both reads from and stores to __fr.fenv, the asm has to stay
> between those two accesses (in the machine code).  If the code that
> actually depends on the modified RN depends on that __fr.fenv some way,
> all will be fine.
> 
> > double
> > __sin (double x)
> > {
> >   struct rm_ctx ctx __attribute__ ((cleanup (libc_feresetround_ppc_ctx)));
> >   libc_feholdsetround_ppc_ctx (&ctx, (0));
> >   /* floating point intensive code.  */
> >   return retval;
> > }
> 
> ... but there is no such dependency.  The cleanup attribute does not
> give any such ordering either afaik.
> 
> > There's not much to it, really.  "mffscrni" on the way in to save and set
> > a required rounding mode, and "mffscrn" on the way out to restore it.
> 
> Yes.  But the code making use of the modified RN needs to have some
> artificial dependencies with the RN setters, perhaps via __fr.fenv .
> 
> > > Calling a real function (that does not even need a stack frame, just a
> > > blr) is not terribly expensive, either.
> > 
> > Not ideal, better would be better.
> 
> Yes.  But at least it *works* :-)  I'll take a stupid, simple, stupidly
> simple, *robust* solution over some nice, faster, nicely faster way of
> doing the wrong thing.

Understood, and agreed.

> > > > > > Would creating a __builtin_mffsce be another solution?
> > > > > 
> > > > > Yes.  And not a bad idea in the first place.
> > > > 
> > > > The previous "Nope" and this "Yes" seem in contradiction. If there is no
> > > > difference between "asm" and builtin, how does using a builtin solve the
> > > > problem?
> > > 
> > > You will have to make the builtin solve it.  What a builtin can do is
> > > virtually unlimited.  What an asm can do is not: it just outputs some
> > > assembler language, and does in/out/clobber constraints.  You can do a
> > > *lot* with that, but it is much more limited than everything you can do
> > > in the compiler!  :-)
> > > 
> > > The fact remains that there is no way in RTL (or Gimple for that matter)
> > > to express things like rounding mode changes.  You will need to
> > > artificially make some barriers.
> > 
> > I know there is __builtin_set_fpscr_rn that generates mffscrn.
> 
> Or some mtfsb[01]'s, or nasty mffs/mtfsf code, yeah.  And it does not
> provide the ordering either.  It *cannot*: you need to cooperate with
> whatever you are ordering against.  There is no way in GCC to say "this
> is an FP insn and has to stay in order with all FP control writes and FP
> status reads".
> 
> Maybe now you see why I like external functions for this :-)
> 
> > This
> > is not used in the code above because I believe it first appears in
> > GCC 9.1 or so, and glibc still supports GCC 6.2 (and it doesn't define
> > a return value, which would be handy in this case).  Does the
> > implementation of that builtin meet the requirements needed here,
> > to prevent reordering of FP computation across instantiations of the
> > builtin?  If not, is there a model on which to base an implementation
> > of __builtin_mffsce (or some preferred name)?
> 
> It depends on what you are actually ordering, unfortunately.

What I hear is that for the specific requirements and restrictions here,
there is nothing special that another builtin, like a theoretical
__builtin_mffsce implemented like __builtin_set_fpscr_rn, can provide
to solve the issue under discussion.  The dependencies need to be expressed
such that the compiler understands them, and there is no way to do so
with the current implementation of __builtin_set_fpscr_rn.

With some effort, and proper visibility, the dependencies can be expressed
using "asm". I believe that's the case here, and will submit a v2 for
review shortly.
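
A rough sketch of the kind of artificial dependency I have in mind
follows.  It is not the patch itself: fp_pin, enter_mode, leave_mode and
div_in_mode are hypothetical names, the two mode switches are empty
placeholders standing where mffscrni/mffscrn would go, and "+m" is used
only to keep the sketch target-independent (a register constraint would
be the natural choice on Power):

```c
/* Hypothetical helper: an empty asm that claims to read and modify X,
   so the compiler must treat the value as produced at this point.  */
static inline double
fp_pin (double x)
{
  __asm__ __volatile__ ("" : "+m" (x));
  return x;
}

/* Placeholders for the real rounding-mode switches.  They emit no
   instructions here; they only mark where the switches would sit.  */
static inline void
enter_mode (void)
{
  __asm__ __volatile__ ("" : : : "memory");
}

static inline void
leave_mode (void)
{
  __asm__ __volatile__ ("" : : : "memory");
}

double
div_in_mode (double a, double b)
{
  double q;

  enter_mode ();
  a = fp_pin (a);   /* Inputs become available only after enter_mode.  */
  b = fp_pin (b);
  q = a / b;        /* Stands in for the FP-intensive code.  */
  q = fp_pin (q);   /* Result is consumed before leave_mode.  */
  leave_mode ();
  return q;
}
```

The division consumes the pinned inputs, so it cannot start before the
first pins, and the returned value is produced by the last pin, so the
division cannot sink below it.  As I understand it, GCC keeps volatile
asms in order relative to one another, which is what ties the pins to
the mode switches.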

For the general case of inlines, builtins, or asm without visibility,
I've opened an issue for GCC to consider accommodation:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

Thanks so much for your help!

PC
