Re: [x86 PATCH] Fun with flags: Adding stc/clc instructions to i386.md.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Uros Bizjak <ubizjak@gmail.com>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	 Segher Boessenkool <segher@kernel.crashing.org>
Subject: Re: [x86 PATCH] Fun with flags: Adding stc/clc instructions to i386.md.
Date: Fri, 8 Jul 2022 09:23:45 +0200	[thread overview]
Message-ID: <CAFULd4YkAL2zfK1=N_nFkmj7aF64bkthUwrwU3jcZ7m9_0WYkw@mail.gmail.com> (raw)
In-Reply-To: <003b01d8929a$70be2cc0$523a8640$@nextmovesoftware.com>

On Fri, Jul 8, 2022 at 9:15 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch adds support for x86's single-byte encoded stc (set carry flag)
> and clc (clear carry flag) instructions to i386.md.
>
> The motivating example is the simple code snippet:
>
> unsigned int foo (unsigned int a, unsigned int b, unsigned int *c)
> {
>   return __builtin_ia32_addcarryx_u32 (1, a, b, c);
> }
>
> which uses the target built-in to generate an adc instruction, adding
> together A and B with the incoming carry flag already set.  Currently
> for this mainline GCC generates (with -O2):
>
>         movl    $1, %eax
>         addb    $-1, %al
>         adcl    %esi, %edi
>         setc    %al
>         movl    %edi, (%rdx)
>         movzbl  %al, %eax
>         ret
>
> where the first two instructions (to load 1 into a byte register and
> then add 255 to it) are the idiom used to set the carry flag.  This
> is a little inefficient as x86 has a "stc" instruction for precisely
> this purpose.  With the attached patch we now generate:
>
>         stc
>         adcl    %esi, %edi
>         setc    %al
>         movl    %edi, (%rdx)
>         movzbl  %al, %eax
>         ret
>
> The central part of the patch is the addition of x86_stc and x86_clc
> define_insns, represented as "(set (reg:CCC FLAGS_REG) (const_int 1))"
> and "(set (reg:CCC FLAGS_REG) (const_int 0))" respectively, then using
> x86_stc appropriately in the ix86_expand_builtin.
>
> Alas this change exposes two latent bugs/issues in the compiler.
> The first is that there are several peephole2s in i386.md that propagate
> the flags register, but take its mode from the SET_SRC rather than
> preserve the mode of the original SET_DEST.  The other, which is
> being discussed with Segher, is that the middle-end's simplify-rtx
> inappropriately tries to interpret/optimize MODE_CC comparisons,
> converting the above adc into an add, as it mistakenly believes
> (ltu:SI (const_int 1) (const_int 0))" is always const0_rtx even when
> the mode of the comparison is MODE_CCC.
>
> I believe Segher will review (and hopefully approve) the middle-end
> chunk of this patch independently, but hopefully this backend patch
> provides the necessary context to explain why that change is needed.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32} with
> no new failures.  Ok for mainline?
>
>
> 2022-07-08  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-expand.cc (ix86_expand_builtin) <handlecarry>:
>         Use new x86_stc or negqi_ccc_1 instructions to set the carry flag.
>         * config/i386/i386.md (x86_clc): New define_insn.
>         (x86_stc): Likewise, new define_insn to set the carry flag.
>         (*setcc_qi_negqi_ccc_1_<mode>): New define_insn_and_split to
>         recognize (and eliminate) the carry flag being copied to itself.
>         (neg<mode>_ccc_1): Renamed from *neg<mode>_ccc_1 for gen function.
>         (define_peephole2): Use match_operand of flags_reg_operand to
>         capture and preserve the mode of FLAGS_REG.
>         (define_peephole2): Likewise.
>
>         * simplify-rtx.cc (simplify_const_relational_operation): Handle
>         case where both operands of a MODE_CC comparison have been
>         simplified to constant integers.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/stc-1.c: New test case.

Please split out the peephole2 part of the patch. This part is
pre-approved and should be committed independently of the main part of
the patch.

Thanks,
Uros.

>
>
> Thanks in advance (both Uros and Segher),
> Roger
> --
>
> > -----Original Message-----
> > From: Segher Boessenkool <segher@kernel.crashing.org>
> > Sent: 07 July 2022 23:39
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] Be careful with MODE_CC in
> > simplify_const_relational_operation.
> >
> > Hi!
> >
> > On Thu, Jul 07, 2022 at 10:08:04PM +0100, Roger Sayle wrote:
> > > I think it's fair to describe RTL's representation of condition flags
> > > using MODE_CC as a little counter-intuitive.
> >
> > "A little challenging", and you should see that as a good thing, as a
> puzzle to
> > crack :-)
> >
> > > For example, the i386
> > > backend represents the carry flag (in adc instructions) using RTL of
> > > the form "(ltu:SI (reg:CCC) (const_int 0))", where great care needs to
> > > be taken not to treat this like a normal RTX expression, after all LTU
> > > (less-than-unsigned) against const0_rtx would normally always be
> > > false.
> >
> > A comparison of a MODE_CC thing against 0 means the result of a
> > *previous* comparison (or other cc setter) is looked at.  Usually it
> simply looks
> > at some condition bits in a flags register.  It does not do any actual
> comparison:
> > that has been done before (if at all even).
> >
> > > Hence, MODE_CC comparisons need to be treated with caution, and
> > > simplify_const_relational_operation returns early (to avoid
> > > problems) when GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC.
> >
> > Not just to avoid problems: there simply isn't enough information to do a
> > correct job.
> >
> > > However, consider the (currently) hypothetical situation, where the
> > > RTL optimizers determine that a previous instruction unconditionally
> > > sets or clears the carry flag, and this gets propagated by combine
> > > into the above expression, we'd end up with something that looks like
> > > (ltu:SI (const_int 1) (const_int 0)), which doesn't mean what it says.
> > > Fortunately, simplify_const_relational_operation is passed the
> > > original mode of the comparison (cmp_mode, the original mode of op0)
> > > which can be checked for MODE_CC, even when op0 is now VOIDmode
> > > (const_int) after the substitution.  Defending against this is clearly
> > > the right thing to do.
> > >
> > > More controversially, rather than just abort
> > > simplification/optimization in this case, we can use the comparison
> > > operator to infer/select the semantics of the CC_MODE flag.
> > > Hopefully, whenever a backend uses LTU, it represents the (set) carry
> > > flag (and behaves like i386.md), in which case the result of the
> simplified
> > expression is the first operand.
> > > [If there's no standardization of semantics across backends, then we
> > > should always just return 0; but then miss potential optimizations].
> >
> > On PowerPC, ltu means the result of an unsigned comparison (we have
> > instructions for that, cmpl[wd][i] mainly) was "smaller than".  It does
> not mean
> > anything is unsigned smaller than zero.  It also has nothing to do with
> carries,
> > which are done via a different register (the XER).
> >
> > > +  /* Handle MODE_CC comparisons that have been simplified to
> > > +     constants.  */
> > > +  if (GET_MODE_CLASS (mode) == MODE_CC
> > > +      && op1 == const0_rtx
> > > +      && CONST_INT_P (op0))
> > > +    {
> > > +      /* LTU represents the carry flag.  */
> > > +      if (code == LTU)
> > > +   return op0 == const0_rtx ? const0_rtx : const_true_rtx;
> > > +      return 0;
> > > +    }
> > > +
> > >    /* We can't simplify MODE_CC values since we don't know what the
> > >       actual comparison is.  */
> >
> > ^^^
> > This comment is 100% true.  We cannot simplify any MODE_CC comparison
> > without having more context.  The combiner does have that context when it
> > tries to combine the CC setter with the CC consumer, for example.
> >
> > Do you have some piece of motivating example code?
> >
> >
> > Segher

next prev parent reply	other threads:[~2022-07-08  7:23 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-08  7:14 Roger Sayle
2022-07-08  7:23 ` Uros Bizjak [this message]
2022-07-08  7:59 ` Uros Bizjak
2022-07-08 21:08 ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFULd4YkAL2zfK1=N_nFkmj7aF64bkthUwrwU3jcZ7m9_0WYkw@mail.gmail.com' \
    --to=ubizjak@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=roger@nextmovesoftware.com \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).