* [x86_64 PATCH] Avoid andb %dil when optimizing for size.
@ 2022-04-12 15:42 Roger Sayle
2022-04-12 16:36 ` Uros Bizjak
0 siblings, 1 reply; 4+ messages in thread
From: Roger Sayle @ 2022-04-12 15:42 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 898 bytes --]
This stand-alone patch avoids (-Os) regressions from a fix for PR 70321.
The simple test case below has the unfortunate property that on x86_64
it is larger when compiled with -Os than when compiled with -O2.
int foo(char x)
{
return (x & 123) != 0;
}
The issue is x86's complex instruction encoding, where andb $XX,%dil
requires more bytes than andl $XX,%edi. This patch provides a peephole2
to convert *and_qi_2_maybe_si into *andsi_2 for the problematic cases.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures. Ok for mainline?
2022-04-12 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (peephole2): Transform *and_qi_maybe_si into
*andsi_2 with -Os when the instruction encoding is shorter.
gcc/testsuite/ChangeLog
* gcc.target/i386/and-1.c: New test case.
Thanks in advance,
Roger
--
[-- Attachment #2: patchos3.txt --]
[-- Type: text/plain, Size: 1945 bytes --]
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c74edd1..3522785 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10140,6 +10140,37 @@
[(set_attr "type" "alu")
(set_attr "mode" "<MODE>")])
+;; On TARGET_64BIT, andb $XX,%dil is larger than andl $XX,%edi so
+;; convert *and_qi_2_maybe_si into *andsi_2 when optimizing for size.
+
+(define_peephole2
+ [(parallel
+ [(set (match_operand 0 "flags_reg_operand")
+ (match_operator 1 "compare_operator"
+ [(and:QI (match_operand:QI 2 "register_operand")
+ (match_operand:QI 3 "immediate_operand"))
+ (const_int 0)]))
+ (set (match_dup 2)
+ (and:QI (match_dup 2) (match_dup 3)))])]
+ "TARGET_64BIT
+ && !TARGET_PARTIAL_REG_STALL
+ && optimize_insn_for_size_p ()
+ && ix86_match_ccmode (insn, CCNOmode)
+ && (INTVAL (operands[3]) & 0x80) == 0
+ /* NON_Q_REG_P (operands[2]) : %sil, %dil, %bpl, %spl. */
+ && LEGACY_INT_REG_P (operands[2]) && !QI_REGNO_P (REGNO (operands[2]))
+ && peep2_reg_dead_p (1, operands[2])"
+ [(parallel
+ [(set (match_dup 0)
+ (match_op_dup 1 [(and:SI (match_dup 2) (match_dup 3))
+ (const_int 0)]))
+ (set (match_dup 2)
+ (and:SI (match_dup 2) (match_dup 3)))])]
+{
+ operands[2] = gen_rtx_REG (SImode, REGNO (operands[2]));
+ operands[3] = gen_int_mode (INTVAL (operands[3]) & 0xff, SImode);
+})
+
(define_expand "andqi_ext_1"
[(parallel
[(set (zero_extract:HI (match_operand:HI 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/and-1.c b/gcc/testsuite/gcc.target/i386/and-1.c
new file mode 100644
index 0000000..11890d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/and-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-Os" } */
+
+int foo(char x)
+{
+ return (x & 123) != 0;
+}
+
+/* { dg-final { scan-assembler-not "%dil" } } */
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [x86_64 PATCH] Avoid andb %dil when optimizing for size.
2022-04-12 15:42 [x86_64 PATCH] Avoid andb %dil when optimizing for size Roger Sayle
@ 2022-04-12 16:36 ` Uros Bizjak
2022-04-13 7:08 ` Roger Sayle
0 siblings, 1 reply; 4+ messages in thread
From: Uros Bizjak @ 2022-04-12 16:36 UTC (permalink / raw)
To: Roger Sayle; +Cc: gcc-patches
On Tue, Apr 12, 2022 at 5:42 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This stand-alone patch avoids (-Os) regressions from a fix for PR 70321.
>
> The simple test case below has the unfortunate property that on x86_64
> it is larger when compiled with -Os than when compiled with -O2.
>
> int foo(char x)
> {
> return (x & 123) != 0;
> }
>
> The issue is x86's complex instruction encoding, where andb $XX,%dil
> requires more bytes than andl $XX,%edi. This patch provides a peephole2
> to convert *and_qi_2_maybe_si into *andsi_2 for the problematic cases.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures. Ok for mainline?
>
>
> 2022-04-12 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> * config/i386/i386.md (peephole2): Transform *and_qi_maybe_si into
> *andsi_2 with -Os when the instruction encoding is shorter.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/and-1.c: New test case.
+(define_peephole2
+ [(parallel
+ [(set (match_operand 0 "flags_reg_operand")
+ (match_operator 1 "compare_operator"
+ [(and:QI (match_operand:QI 2 "register_operand")
+ (match_operand:QI 3 "immediate_operand"))
const_int_operand here, you have INTVAL accessor for this operand below.
+ (const_int 0)]))
+ (set (match_dup 2)
+ (and:QI (match_dup 2) (match_dup 3)))])]
+ "TARGET_64BIT
+ && !TARGET_PARTIAL_REG_STALL
+ && optimize_insn_for_size_p ()
+ && ix86_match_ccmode (insn, CCNOmode)
+ && (INTVAL (operands[3]) & 0x80) == 0
val_signbit_known_clear_p (QImode, INTVAL (operands[3]))
+ /* NON_Q_REG_P (operands[2]) : %sil, %dil, %bpl, %spl. */
+ && LEGACY_INT_REG_P (operands[2]) && !QI_REGNO_P (REGNO (operands[2]))
+ && peep2_reg_dead_p (1, operands[2])"
Do we really need the above condition? Operand 2 is effectively inout
operand, and when ANDed in SImode with zero-extended immediate, it
will result in the same QImode value. The value outside QImode is not
guaranteed without using strict_low_part.
+ [(parallel
+ [(set (match_dup 0)
+ (match_op_dup 1 [(and:SI (match_dup 2) (match_dup 3))
+ (const_int 0)]))
+ (set (match_dup 2)
+ (and:SI (match_dup 2) (match_dup 3)))])]
+{
+ operands[2] = gen_rtx_REG (SImode, REGNO (operands[2]));
gen_lowpart (Simode, operands[2]) should work here.
+ operands[3] = gen_int_mode (INTVAL (operands[3]) & 0xff, SImode);
No need to convert the immediate to SImode. The insn condition also
guarantees it to be unsigned.
+})
Uros.
> Thanks in advance,
> Roger
> --
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* RE: [x86_64 PATCH] Avoid andb %dil when optimizing for size.
2022-04-12 16:36 ` Uros Bizjak
@ 2022-04-13 7:08 ` Roger Sayle
2022-04-13 9:26 ` Uros Bizjak
0 siblings, 1 reply; 4+ messages in thread
From: Roger Sayle @ 2022-04-13 7:08 UTC (permalink / raw)
To: 'Uros Bizjak'; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 3723 bytes --]
Hi Uros,
Many thanks for the speedy review. Here's the revised version of the
patch incorporating all of your suggested fixes and improvements.
This has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures. Ok for mainline?
2022-04-13 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386.md (peephole2): Transform *and_qi_maybe_si into
*andsi_2 with -Os when the instruction encoding is shorter.
gcc/testsuite/ChangeLog
* gcc.target/i386/and-1.c: New test case.
Thanks again,
Roger
--
> -----Original Message-----
> From: Uros Bizjak <ubizjak@gmail.com>
> Sent: 12 April 2022 17:37
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86_64 PATCH] Avoid andb %dil when optimizing for size.
>
> On Tue, Apr 12, 2022 at 5:42 PM Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> >
> >
> > This stand-alone patch avoids (-Os) regressions from a fix for PR 70321.
> >
> > The simple test case below has the unfortunate property that on x86_64
> > it is larger when compiled with -Os than when compiled with -O2.
> >
> > int foo(char x)
> > {
> > return (x & 123) != 0;
> > }
> >
> > The issue is x86's complex instruction encoding, where andb $XX,%dil
> > requires more bytes than andl $XX,%edi. This patch provides a
> > peephole2 to convert *and_qi_2_maybe_si into *andsi_2 for the problematic
> cases.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures. Ok for mainline?
> >
> >
> > 2022-04-12 Roger Sayle <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> > * config/i386/i386.md (peephole2): Transform *and_qi_maybe_si into
> > *andsi_2 with -Os when the instruction encoding is shorter.
> >
> > gcc/testsuite/ChangeLog
> > * gcc.target/i386/and-1.c: New test case.
>
> +(define_peephole2
> + [(parallel
> + [(set (match_operand 0 "flags_reg_operand")
> + (match_operator 1 "compare_operator"
> + [(and:QI (match_operand:QI 2 "register_operand")
> + (match_operand:QI 3 "immediate_operand"))
>
> const_int_operand here, you have INTVAL accessor for this operand below.
>
> + (const_int 0)]))
> + (set (match_dup 2)
> + (and:QI (match_dup 2) (match_dup 3)))])] "TARGET_64BIT
> + && !TARGET_PARTIAL_REG_STALL
> + && optimize_insn_for_size_p ()
> + && ix86_match_ccmode (insn, CCNOmode)
> + && (INTVAL (operands[3]) & 0x80) == 0
>
> val_signbit_known_clear_p (QImode, INTVAL (operands[3]))
>
> + /* NON_Q_REG_P (operands[2]) : %sil, %dil, %bpl, %spl. */
> + && LEGACY_INT_REG_P (operands[2]) && !QI_REGNO_P (REGNO
> (operands[2]))
> + && peep2_reg_dead_p (1, operands[2])"
>
> Do we really need the above condition? Operand 2 is effectively inout operand,
> and when ANDed in SImode with zero-extended immediate, it will result in the
> same QImode value. The value outside QImode is not guaranteed without using
> strict_low_part.
>
> + [(parallel
> + [(set (match_dup 0)
> + (match_op_dup 1 [(and:SI (match_dup 2) (match_dup 3))
> + (const_int 0)]))
> + (set (match_dup 2)
> + (and:SI (match_dup 2) (match_dup 3)))])] {
> + operands[2] = gen_rtx_REG (SImode, REGNO (operands[2]));
>
> gen_lowpart (Simode, operands[2]) should work here.
>
> + operands[3] = gen_int_mode (INTVAL (operands[3]) & 0xff, SImode);
>
> No need to convert the immediate to SImode. The insn condition also
> guarantees it to be unsigned.
>
> +})
>
> Uros.
>
> > Thanks in advance,
> > Roger
> > --
> >
[-- Attachment #2: patchos4.txt --]
[-- Type: text/plain, Size: 1847 bytes --]
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c74edd1..4ad3223 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10140,6 +10140,35 @@
[(set_attr "type" "alu")
(set_attr "mode" "<MODE>")])
+;; On TARGET_64BIT, andb $XX,%dil is larger than andl $XX,%edi so
+;; convert *and_qi_2_maybe_si into *andsi_2 when optimizing for size.
+
+(define_peephole2
+ [(parallel
+ [(set (match_operand 0 "flags_reg_operand")
+ (match_operator 1 "compare_operator"
+ [(and:QI (match_operand:QI 2 "register_operand")
+ (match_operand:QI 3 "const_int_operand"))
+ (const_int 0)]))
+ (set (match_dup 2)
+ (and:QI (match_dup 2) (match_dup 3)))])]
+ "TARGET_64BIT
+ && !TARGET_PARTIAL_REG_STALL
+ && optimize_insn_for_size_p ()
+ && ix86_match_ccmode (insn, CCNOmode)
+ && val_signbit_known_clear_p (QImode, INTVAL (operands[3]))
+ /* NON_Q_REG_P (operands[2]) : %sil, %dil, %bpl, %spl. */
+ && LEGACY_INT_REG_P (operands[2]) && !QI_REGNO_P (REGNO (operands[2]))"
+ [(parallel
+ [(set (match_dup 0)
+ (match_op_dup 1 [(and:SI (match_dup 2) (match_dup 3))
+ (const_int 0)]))
+ (set (match_dup 2)
+ (and:SI (match_dup 2) (match_dup 3)))])]
+{
+ operands[2] = gen_lowpart (SImode, operands[2]);
+})
+
(define_expand "andqi_ext_1"
[(parallel
[(set (zero_extract:HI (match_operand:HI 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/and-1.c b/gcc/testsuite/gcc.target/i386/and-1.c
new file mode 100644
index 0000000..11890d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/and-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-Os" } */
+
+int foo(char x)
+{
+ return (x & 123) != 0;
+}
+
+/* { dg-final { scan-assembler-not "%dil" } } */
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [x86_64 PATCH] Avoid andb %dil when optimizing for size.
2022-04-13 7:08 ` Roger Sayle
@ 2022-04-13 9:26 ` Uros Bizjak
0 siblings, 0 replies; 4+ messages in thread
From: Uros Bizjak @ 2022-04-13 9:26 UTC (permalink / raw)
To: Roger Sayle; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 4370 bytes --]
On Wed, Apr 13, 2022 at 9:08 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
> Many thanks for the speedy review. Here's the revised version of the
> patch incorporating all of your suggested fixes and improvements.
How about we go with an alternative route and enhance the *_maybe_si
instructions with your findings, like the attached patch?
Please note there is no dependency on TARGET_PARTIAL_REG_STALL. I
think that when -Os is in effect, we don't care for speed, but size
gets priority, disregarding runtime slowdowns.
Uros.
>
> This has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures. Ok for mainline?
>
> 2022-04-13 Roger Sayle <roger@nextmovesoftware.com>
> Uroš Bizjak <ubizjak@gmail.com>
>
> gcc/ChangeLog
> * config/i386/i386.md (peephole2): Transform *and_qi_maybe_si into
> *andsi_2 with -Os when the instruction encoding is shorter.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/and-1.c: New test case.
>
>
> Thanks again,
> Roger
> --
>
> > -----Original Message-----
> > From: Uros Bizjak <ubizjak@gmail.com>
> > Sent: 12 April 2022 17:37
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [x86_64 PATCH] Avoid andb %dil when optimizing for size.
> >
> > On Tue, Apr 12, 2022 at 5:42 PM Roger Sayle <roger@nextmovesoftware.com>
> > wrote:
> > >
> > >
> > > This stand-alone patch avoids (-Os) regressions from a fix for PR 70321.
> > >
> > > The simple test case below has the unfortunate property that on x86_64
> > > it is larger when compiled with -Os than when compiled with -O2.
> > >
> > > int foo(char x)
> > > {
> > > return (x & 123) != 0;
> > > }
> > >
> > > The issue is x86's complex instruction encoding, where andb $XX,%dil
> > > requires more bytes than andl $XX,%edi. This patch provides a
> > > peephole2 to convert *and_qi_2_maybe_si into *andsi_2 for the problematic
> > cases.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check with no new failures. Ok for mainline?
> > >
> > >
> > > 2022-04-12 Roger Sayle <roger@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > > * config/i386/i386.md (peephole2): Transform *and_qi_maybe_si into
> > > *andsi_2 with -Os when the instruction encoding is shorter.
> > >
> > > gcc/testsuite/ChangeLog
> > > * gcc.target/i386/and-1.c: New test case.
> >
> > +(define_peephole2
> > + [(parallel
> > + [(set (match_operand 0 "flags_reg_operand")
> > + (match_operator 1 "compare_operator"
> > + [(and:QI (match_operand:QI 2 "register_operand")
> > + (match_operand:QI 3 "immediate_operand"))
> >
> > const_int_operand here, you have INTVAL accessor for this operand below.
> >
> > + (const_int 0)]))
> > + (set (match_dup 2)
> > + (and:QI (match_dup 2) (match_dup 3)))])] "TARGET_64BIT
> > + && !TARGET_PARTIAL_REG_STALL
> > + && optimize_insn_for_size_p ()
> > + && ix86_match_ccmode (insn, CCNOmode)
> > + && (INTVAL (operands[3]) & 0x80) == 0
> >
> > val_signbit_known_clear_p (QImode, INTVAL (operands[3]))
> >
> > + /* NON_Q_REG_P (operands[2]) : %sil, %dil, %bpl, %spl. */
> > + && LEGACY_INT_REG_P (operands[2]) && !QI_REGNO_P (REGNO
> > (operands[2]))
> > + && peep2_reg_dead_p (1, operands[2])"
> >
> > Do we really need the above condition? Operand 2 is effectively inout operand,
> > and when ANDed in SImode with zero-extended immediate, it will result in the
> > same QImode value. The value outside QImode is not guaranteed without using
> > strict_low_part.
> >
> > + [(parallel
> > + [(set (match_dup 0)
> > + (match_op_dup 1 [(and:SI (match_dup 2) (match_dup 3))
> > + (const_int 0)]))
> > + (set (match_dup 2)
> > + (and:SI (match_dup 2) (match_dup 3)))])] {
> > + operands[2] = gen_rtx_REG (SImode, REGNO (operands[2]));
> >
> > gen_lowpart (Simode, operands[2]) should work here.
> >
> > + operands[3] = gen_int_mode (INTVAL (operands[3]) & 0xff, SImode);
> >
> > No need to convert the immediate to SImode. The insn condition also
> > guarantees it to be unsigned.
> >
> > +})
> >
> > Uros.
> >
> > > Thanks in advance,
> > > Roger
> > > --
> > >
[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 2834 bytes --]
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c74edd1aaef..adad211c617 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9502,14 +9502,14 @@
[(set (reg FLAGS_REG)
(compare
(and:QI
- (match_operand:QI 0 "nonimmediate_operand" "%qm,*a,qm,r")
- (match_operand:QI 1 "nonmemory_operand" "q,n,n,n"))
+ (match_operand:QI 0 "nonimmediate_operand" "%qm,qm,r")
+ (match_operand:QI 1 "nonmemory_operand" "q,n,n"))
(const_int 0)))]
"ix86_match_ccmode (insn,
CONST_INT_P (operands[1])
&& INTVAL (operands[1]) >= 0 ? CCNOmode : CCZmode)"
{
- if (which_alternative == 3)
+ if (get_attr_mode (insn) == MODE_SI)
{
if (CONST_INT_P (operands[1]) && INTVAL (operands[1]) < 0)
operands[1] = GEN_INT (INTVAL (operands[1]) & 0xff);
@@ -9518,8 +9518,16 @@
return "test{b}\t{%1, %0|%0, %1}";
}
[(set_attr "type" "test")
- (set_attr "mode" "QI,QI,QI,SI")
- (set_attr "pent_pair" "uv,uv,np,np")])
+ (set (attr "mode")
+ (cond [(eq_attr "alternative" "2")
+ (const_string "SI")
+ (and (match_test "optimize_insn_for_size_p ()")
+ (and (match_operand 0 "ext_QIreg_operand")
+ (match_operand 1 "const_0_to_127_operand")))
+ (const_string "SI")
+ ]
+ (const_string "QI")))
+ (set_attr "pent_pair" "uv,np,np")])
(define_insn "*test<mode>_1"
[(set (reg FLAGS_REG)
@@ -10110,7 +10118,7 @@
CONST_INT_P (operands[2])
&& INTVAL (operands[2]) >= 0 ? CCNOmode : CCZmode)"
{
- if (which_alternative == 2)
+ if (get_attr_mode (insn) == MODE_SI)
{
if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) < 0)
operands[2] = GEN_INT (INTVAL (operands[2]) & 0xff);
@@ -10119,7 +10127,15 @@
return "and{b}\t{%2, %0|%0, %2}";
}
[(set_attr "type" "alu")
- (set_attr "mode" "QI,QI,SI")
+ (set (attr "mode")
+ (cond [(eq_attr "alternative" "2")
+ (const_string "SI")
+ (and (match_test "optimize_insn_for_size_p ()")
+ (and (match_operand 0 "ext_QIreg_operand")
+ (match_operand 2 "const_0_to_127_operand")))
+ (const_string "SI")
+ ]
+ (const_string "QI")))
;; Potential partial reg stall on alternative 2.
(set (attr "preferred_for_speed")
(cond [(eq_attr "alternative" "2")
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index a8cc17a054d..848a79a8d16 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -906,6 +906,11 @@
(and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 0, 63)")))
+;; Match 0 to 127.
+(define_predicate "const_0_to_127_operand"
+ (and (match_code "const_int")
+ (match_test "IN_RANGE (INTVAL (op), 0, 127)")))
+
;; Match 0 to 255.
(define_predicate "const_0_to_255_operand"
(and (match_code "const_int")
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-04-13 9:26 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-12 15:42 [x86_64 PATCH] Avoid andb %dil when optimizing for size Roger Sayle
2022-04-12 16:36 ` Uros Bizjak
2022-04-13 7:08 ` Roger Sayle
2022-04-13 9:26 ` Uros Bizjak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).