From: "Roger Sayle" <roger@nextmovesoftware.com>
To: <gcc-patches@gcc.gnu.org>
Cc: "'Segher Boessenkool'" <segher@kernel.crashing.org>,
"'Uros Bizjak'" <ubizjak@gmail.com>
Subject: [x86 PATCH] Fun with flags: Adding stc/clc instructions to i386.md.
Date: Fri, 8 Jul 2022 08:14:58 +0100 [thread overview]
Message-ID: <003b01d8929a$70be2cc0$523a8640$@nextmovesoftware.com> (raw)
[-- Attachment #1: Type: text/plain, Size: 7070 bytes --]
This patch adds support for x86's single-byte encoded stc (set carry flag)
and clc (clear carry flag) instructions to i386.md.
The motivating example is the simple code snippet:
unsigned int foo (unsigned int a, unsigned int b, unsigned int *c)
{
return __builtin_ia32_addcarryx_u32 (1, a, b, c);
}
which uses the target built-in to generate an adc instruction, adding
together A and B with the incoming carry flag already set. Currently
for this mainline GCC generates (with -O2):
movl $1, %eax
addb $-1, %al
adcl %esi, %edi
setc %al
movl %edi, (%rdx)
movzbl %al, %eax
ret
where the first two instructions (to load 1 into a byte register and
then add 255 to it) are the idiom used to set the carry flag. This
is a little inefficient as x86 has a "stc" instruction for precisely
this purpose. With the attached patch we now generate:
stc
adcl %esi, %edi
setc %al
movl %edi, (%rdx)
movzbl %al, %eax
ret
The central part of the patch is the addition of x86_stc and x86_clc
define_insns, represented as "(set (reg:CCC FLAGS_REG) (const_int 1))"
and "(set (reg:CCC FLAGS_REG) (const_int 0))" respectively, then using
x86_stc appropriately in the ix86_expand_builtin.
Alas this change exposes two latent bugs/issues in the compiler.
The first is that there are several peephole2s in i386.md that propagate
the flags register, but take its mode from the SET_SRC rather than
preserve the mode of the original SET_DEST. The other, which is
being discussed with Segher, is that the middle-end's simplify-rtx
inappropriately tries to interpret/optimize MODE_CC comparisons,
converting the above adc into an add, as it mistakenly believes
(ltu:SI (const_int 1) (const_int 0))" is always const0_rtx even when
the mode of the comparison is MODE_CCC.
I believe Segher will review (and hopefully approve) the middle-end
chunk of this patch independently, but hopefully this backend patch
provides the necessary context to explain why that change is needed.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32} with
no new failures. Ok for mainline?
2022-07-08 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_builtin) <handlecarry>:
Use new x86_stc or negqi_ccc_1 instructions to set the carry flag.
* config/i386/i386.md (x86_clc): New define_insn.
(x86_stc): Likewise, new define_insn to set the carry flag.
(*setcc_qi_negqi_ccc_1_<mode>): New define_insn_and_split to
recognize (and eliminate) the carry flag being copied to itself.
(neg<mode>_ccc_1): Renamed from *neg<mode>_ccc_1 for gen function.
(define_peephole2): Use match_operand of flags_reg_operand to
capture and preserve the mode of FLAGS_REG.
(define_peephole2): Likewise.
* simplify-rtx.cc (simplify_const_relational_operation): Handle
case where both operands of a MODE_CC comparison have been
simplified to constant integers.
gcc/testsuite/ChangeLog
* gcc.target/i386/stc-1.c: New test case.
Thanks in advance (both Uros and Segher),
Roger
--
> -----Original Message-----
> From: Segher Boessenkool <segher@kernel.crashing.org>
> Sent: 07 July 2022 23:39
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Be careful with MODE_CC in
> simplify_const_relational_operation.
>
> Hi!
>
> On Thu, Jul 07, 2022 at 10:08:04PM +0100, Roger Sayle wrote:
> > I think it's fair to describe RTL's representation of condition flags
> > using MODE_CC as a little counter-intuitive.
>
> "A little challenging", and you should see that as a good thing, as a
puzzle to
> crack :-)
>
> > For example, the i386
> > backend represents the carry flag (in adc instructions) using RTL of
> > the form "(ltu:SI (reg:CCC) (const_int 0))", where great care needs to
> > be taken not to treat this like a normal RTX expression, after all LTU
> > (less-than-unsigned) against const0_rtx would normally always be
> > false.
>
> A comparison of a MODE_CC thing against 0 means the result of a
> *previous* comparison (or other cc setter) is looked at. Usually it
simply looks
> at some condition bits in a flags register. It does not do any actual
comparison:
> that has been done before (if at all even).
>
> > Hence, MODE_CC comparisons need to be treated with caution, and
> > simplify_const_relational_operation returns early (to avoid
> > problems) when GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC.
>
> Not just to avoid problems: there simply isn't enough information to do a
> correct job.
>
> > However, consider the (currently) hypothetical situation, where the
> > RTL optimizers determine that a previous instruction unconditionally
> > sets or clears the carry flag, and this gets propagated by combine
> > into the above expression, we'd end up with something that looks like
> > (ltu:SI (const_int 1) (const_int 0)), which doesn't mean what it says.
> > Fortunately, simplify_const_relational_operation is passed the
> > original mode of the comparison (cmp_mode, the original mode of op0)
> > which can be checked for MODE_CC, even when op0 is now VOIDmode
> > (const_int) after the substitution. Defending against this is clearly
> > the right thing to do.
> >
> > More controversially, rather than just abort
> > simplification/optimization in this case, we can use the comparison
> > operator to infer/select the semantics of the CC_MODE flag.
> > Hopefully, whenever a backend uses LTU, it represents the (set) carry
> > flag (and behaves like i386.md), in which case the result of the
simplified
> expression is the first operand.
> > [If there's no standardization of semantics across backends, then we
> > should always just return 0; but then miss potential optimizations].
>
> On PowerPC, ltu means the result of an unsigned comparison (we have
> instructions for that, cmpl[wd][i] mainly) was "smaller than". It does
not mean
> anything is unsigned smaller than zero. It also has nothing to do with
carries,
> which are done via a different register (the XER).
>
> > + /* Handle MODE_CC comparisons that have been simplified to
> > + constants. */
> > + if (GET_MODE_CLASS (mode) == MODE_CC
> > + && op1 == const0_rtx
> > + && CONST_INT_P (op0))
> > + {
> > + /* LTU represents the carry flag. */
> > + if (code == LTU)
> > + return op0 == const0_rtx ? const0_rtx : const_true_rtx;
> > + return 0;
> > + }
> > +
> > /* We can't simplify MODE_CC values since we don't know what the
> > actual comparison is. */
>
> ^^^
> This comment is 100% true. We cannot simplify any MODE_CC comparison
> without having more context. The combiner does have that context when it
> tries to combine the CC setter with the CC consumer, for example.
>
> Do you have some piece of motivating example code?
>
>
> Segher
[-- Attachment #2: patchcy4.txt --]
[-- Type: text/plain, Size: 7797 bytes --]
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 8bc5430..2b6d24d 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -13531,8 +13531,6 @@ rdseed_step:
arg3 = CALL_EXPR_ARG (exp, 3); /* unsigned int *sum_out. */
op1 = expand_normal (arg0);
- if (!integer_zerop (arg0))
- op1 = copy_to_mode_reg (QImode, convert_to_mode (QImode, op1, 1));
op2 = expand_normal (arg1);
if (!register_operand (op2, mode0))
@@ -13550,7 +13548,7 @@ rdseed_step:
}
op0 = gen_reg_rtx (mode0);
- if (integer_zerop (arg0))
+ if (op1 == const0_rtx)
{
/* If arg0 is 0, optimize right away into add or sub
instruction that sets CCCmode flags. */
@@ -13560,7 +13558,14 @@ rdseed_step:
else
{
/* Generate CF from input operand. */
- emit_insn (gen_addqi3_cconly_overflow (op1, constm1_rtx));
+ if (!CONST_INT_P (op1))
+ {
+ op1 = convert_to_mode (QImode, op1, 1);
+ op1 = copy_to_mode_reg (QImode, op1);
+ emit_insn (gen_negqi_ccc_1 (op1, op1));
+ }
+ else
+ emit_insn (gen_x86_stc ());
/* Generate instruction that consumes CF. */
op1 = gen_rtx_REG (CCCmode, FLAGS_REG);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5b53841..1cc3989 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1765,6 +1765,22 @@
(set_attr "bdver1_decode" "direct")
(set_attr "mode" "SI")])
+(define_insn "x86_clc"
+ [(set (reg:CCC FLAGS_REG) (const_int 0))]
+ ""
+ "stc"
+ [(set_attr "length" "1")
+ (set_attr "length_immediate" "0")
+ (set_attr "modrm" "0")])
+
+(define_insn "x86_stc"
+ [(set (reg:CCC FLAGS_REG) (const_int 1))]
+ ""
+ "stc"
+ [(set_attr "length" "1")
+ (set_attr "length_immediate" "0")
+ (set_attr "modrm" "0")])
+
;; Pentium Pro can do both steps in one go.
;; (these instructions set flags directly)
@@ -7735,6 +7751,14 @@
"#"
"&& 1"
[(const_int 0)])
+
+(define_insn_and_split "*setcc_qi_negqi_ccc_1_<mode>"
+ [(set (reg:CCC FLAGS_REG)
+ (ltu:CCC (reg:CC_CCC FLAGS_REG) (const_int 0)))]
+ "ix86_pre_reload_split ()"
+ "#"
+ "&& 1"
+ [(const_int 0)])
\f
;; Overflow setting add instructions
@@ -11218,7 +11242,7 @@
[(set_attr "type" "negnot")
(set_attr "mode" "SI")])
-(define_insn "*neg<mode>_ccc_1"
+(define_insn "neg<mode>_ccc_1"
[(set (reg:CCC FLAGS_REG)
(ne:CCC
(match_operand:SWI 1 "nonimmediate_operand" "0")
@@ -15072,7 +15096,7 @@
;; Convert setcc + movzbl to xor + setcc if operands don't overlap.
(define_peephole2
- [(set (reg FLAGS_REG) (match_operand 0))
+ [(set (match_operand 4 "flags_reg_operand") (match_operand 0))
(set (match_operand:QI 1 "register_operand")
(match_operator:QI 2 "ix86_comparison_operator"
[(reg FLAGS_REG) (const_int 0)]))
@@ -15086,13 +15110,12 @@
(set (strict_low_part (match_dup 5))
(match_dup 2))]
{
- operands[4] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);
operands[5] = gen_lowpart (QImode, operands[3]);
ix86_expand_clear (operands[3]);
})
(define_peephole2
- [(parallel [(set (reg FLAGS_REG) (match_operand 0))
+ [(parallel [(set (match_operand 5 "flags_reg_operand") (match_operand 0))
(match_operand 4)])
(set (match_operand:QI 1 "register_operand")
(match_operator:QI 2 "ix86_comparison_operator"
@@ -15110,14 +15133,13 @@
(set (strict_low_part (match_dup 6))
(match_dup 2))]
{
- operands[5] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);
operands[6] = gen_lowpart (QImode, operands[3]);
ix86_expand_clear (operands[3]);
})
(define_peephole2
- [(set (reg FLAGS_REG) (match_operand 0))
- (parallel [(set (reg FLAGS_REG) (match_operand 1))
+ [(set (match_operand 6 "flags_reg_operand") (match_operand 0))
+ (parallel [(set (match_operand 7 "flags_reg_operand") (match_operand 1))
(match_operand 5)])
(set (match_operand:QI 2 "register_operand")
(match_operator:QI 3 "ix86_comparison_operator"
@@ -15138,8 +15160,6 @@
(set (strict_low_part (match_dup 8))
(match_dup 3))]
{
- operands[6] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);
- operands[7] = gen_rtx_REG (GET_MODE (operands[1]), FLAGS_REG);
operands[8] = gen_lowpart (QImode, operands[4]);
ix86_expand_clear (operands[4]);
})
@@ -15147,7 +15167,7 @@
;; Similar, but match zero extend with andsi3.
(define_peephole2
- [(set (reg FLAGS_REG) (match_operand 0))
+ [(set (match_operand 4 "flags_reg_operand") (match_operand 0))
(set (match_operand:QI 1 "register_operand")
(match_operator:QI 2 "ix86_comparison_operator"
[(reg FLAGS_REG) (const_int 0)]))
@@ -15161,13 +15181,12 @@
(set (strict_low_part (match_dup 5))
(match_dup 2))]
{
- operands[4] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);
operands[5] = gen_lowpart (QImode, operands[3]);
ix86_expand_clear (operands[3]);
})
(define_peephole2
- [(parallel [(set (reg FLAGS_REG) (match_operand 0))
+ [(parallel [(set (match_operand 5 "flags_reg_operand") (match_operand 0))
(match_operand 4)])
(set (match_operand:QI 1 "register_operand")
(match_operator:QI 2 "ix86_comparison_operator"
@@ -15186,14 +15205,13 @@
(set (strict_low_part (match_dup 6))
(match_dup 2))]
{
- operands[5] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);
operands[6] = gen_lowpart (QImode, operands[3]);
ix86_expand_clear (operands[3]);
})
(define_peephole2
- [(set (reg FLAGS_REG) (match_operand 0))
- (parallel [(set (reg FLAGS_REG) (match_operand 1))
+ [(set (match_operand 6 "flags_reg_operand") (match_operand 0))
+ (parallel [(set (match_operand 7 "flags_reg_operand") (match_operand 1))
(match_operand 5)])
(set (match_operand:QI 2 "register_operand")
(match_operator:QI 3 "ix86_comparison_operator"
@@ -15215,8 +15233,6 @@
(set (strict_low_part (match_dup 8))
(match_dup 3))]
{
- operands[6] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);
- operands[7] = gen_rtx_REG (GET_MODE (operands[1]), FLAGS_REG);
operands[8] = gen_lowpart (QImode, operands[4]);
ix86_expand_clear (operands[4]);
})
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index fa20665..73ec5c7 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -6026,6 +6026,18 @@ simplify_const_relational_operation (enum rtx_code code,
return 0;
}
+ /* Handle MODE_CC comparisons that have been simplified to
+ constants. */
+ if (GET_MODE_CLASS (mode) == MODE_CC
+ && op1 == const0_rtx
+ && CONST_INT_P (op0))
+ {
+ /* LTU represents the carry flag. */
+ if (code == LTU)
+ return op0 == const0_rtx ? const0_rtx : const_true_rtx;
+ return 0;
+ }
+
/* We can't simplify MODE_CC values since we don't know what the
actual comparison is. */
if (GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC)
diff --git a/gcc/testsuite/gcc.target/i386/stc-1.c b/gcc/testsuite/gcc.target/i386/stc-1.c
new file mode 100644
index 0000000..857c939
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/stc-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned int u32;
+
+unsigned int foo (unsigned int a, unsigned int b, unsigned int *c)
+{
+ return __builtin_ia32_addcarryx_u32 (1, a, b, c);
+}
+
+unsigned int bar (unsigned int b, unsigned int *c)
+{
+ return __builtin_ia32_addcarryx_u32 (1, 2, b, c);
+}
+
+unsigned int baz (unsigned int a, unsigned int *c)
+{
+ return __builtin_ia32_addcarryx_u32 (1, a, 3, c);
+}
+
+/* { dg-final { scan-assembler "stc" } } */
next reply other threads:[~2022-07-08 7:15 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-08 7:14 Roger Sayle [this message]
2022-07-08 7:23 ` Uros Bizjak
2022-07-08 7:59 ` Uros Bizjak
2022-07-08 21:08 ` Segher Boessenkool
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='003b01d8929a$70be2cc0$523a8640$@nextmovesoftware.com' \
--to=roger@nextmovesoftware.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=segher@kernel.crashing.org \
--cc=ubizjak@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).