* [PATCH] i386: Add peephole for lea and zero extend [PR 101716]
@ 2021-08-13 0:48 Hongyu Wang
2021-08-13 0:57 ` Hongyu Wang
2021-08-13 7:21 ` Uros Bizjak
0 siblings, 2 replies; 8+ messages in thread
From: Hongyu Wang @ 2021-08-13 0:48 UTC (permalink / raw)
To: ubizjak; +Cc: gcc-patches
Hi,
For lea + zero_extendsidi insn pairs, if the destination of the lea and
the source of the zext are the same register, combine them into a single
leal on 64-bit targets, since writing a 32-bit register automatically
zero-extends the result into the upper 32 bits.
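As a rough illustration of why the two forms agree, here is a small C
model. It is only a sketch: the function names are made up for this
example, and the comments describe the intended instruction sequences
rather than what any particular compiler version emits.

```c
#include <stdint.h>

/* Models the two-insn sequence: a 64-bit lea followed by a
   zero-extension of the low 32 bits of its result.  */
static uint64_t lea_then_zext(uint64_t m)
{
    uint64_t t = m * 2;            /* 64-bit lea computing m * 2 */
    return (uint64_t)(uint32_t)t;  /* zero_extend:DI of the SImode low part */
}

/* Models a single 32-bit lea: the arithmetic is done modulo 2^32 and
   the upper half of the destination register ends up zero.  */
static uint64_t leal_only(uint64_t m)
{
    uint32_t t = (uint32_t)m * 2u; /* 32-bit lea computing m * 2 */
    return t;                      /* implicit zero-extension to 64 bits */
}
```

Both helpers return the same value for every input, because truncating
to 32 bits before or after the multiplication gives the same low half.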
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for master?
gcc/ChangeLog:
PR target/101716
* config/i386/i386.md (*lea<mode>_zext): New define_insn.
(define_peephole2): New peephole2 to combine zero_extend
with lea.
gcc/testsuite/ChangeLog:
PR target/101716
* gcc.target/i386/pr101716.c: New test.
---
gcc/config/i386/i386.md | 20 ++++++++++++++++++++
gcc/testsuite/gcc.target/i386/pr101716.c | 11 +++++++++++
2 files changed, 31 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/i386/pr101716.c
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4a8e8fea290..6739dbd799b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -5187,6 +5187,26 @@
(const_string "SI")
(const_string "<MODE>")))])
+;; combine zero_extendsidi with lea to use leal.
+(define_insn "*lea<mode>_zext"
+ [(set (match_operand:DI 0 "register_operand" "=r")
+ (zero_extend:DI
+ (match_operand:SWI48 1 "address_no_seg_operand" "Ts")))]
+ "TARGET_64BIT"
+ "lea{l}\t{%E1, %k0|%k0,%E1}")
+
+(define_peephole2
+ [(set (match_operand:SWI48 0 "general_reg_operand")
+ (match_operand:SWI48 1 "address_no_seg_operand"))
+ (set (match_operand:DI 2 "general_reg_operand")
+ (zero_extend:DI (match_operand:SI 3 "general_reg_operand")))]
+ "TARGET_64BIT && ix86_hardreg_mov_ok (operands[2], operands[1])
+ && REGNO (operands[0]) == REGNO (operands[3])
+ && (REGNO (operands[2]) == REGNO (operands[3])
+ || peep2_reg_dead_p (2, operands[3]))"
+ [(set (match_dup 2)
+ (zero_extend:DI (match_dup 1)))])
+
(define_peephole2
[(set (match_operand:SWI48 0 "register_operand")
(match_operand:SWI48 1 "address_no_seg_operand"))]
diff --git a/gcc/testsuite/gcc.target/i386/pr101716.c b/gcc/testsuite/gcc.target/i386/pr101716.c
new file mode 100644
index 00000000000..0b684755c2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101716.c
@@ -0,0 +1,11 @@
+/* PR target/101716 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+/* { dg-final { scan-assembler "leal\[\\t \]\*eax" } } */
+/* { dg-final { scan-assembler-not "movl\[\\t \]\*eax" } } */
+
+unsigned long long sample1(unsigned long long m) {
+ unsigned int t = -1;
+ return (m << 1) & t;
+}
--
2.18.1
* Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]
2021-08-13 0:48 [PATCH] i386: Add peephole for lea and zero extend [PR 101716] Hongyu Wang
@ 2021-08-13 0:57 ` Hongyu Wang
2021-08-13 7:21 ` Uros Bizjak
1 sibling, 0 replies; 8+ messages in thread
From: Hongyu Wang @ 2021-08-13 0:57 UTC (permalink / raw)
To: Hongyu Wang; +Cc: Uros Bizjak, GCC Patches
Sorry for the typo, scan-assembler should be
+/* { dg-final { scan-assembler "leal\[\\t \]\[^\\n\]*eax" } } */
+/* { dg-final { scan-assembler-not "movl\[\\t \]\[^\\n\]*eax" } } */
* Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]
2021-08-13 0:48 [PATCH] i386: Add peephole for lea and zero extend [PR 101716] Hongyu Wang
2021-08-13 0:57 ` Hongyu Wang
@ 2021-08-13 7:21 ` Uros Bizjak
2021-08-16 8:11 ` Uros Bizjak
1 sibling, 1 reply; 8+ messages in thread
From: Uros Bizjak @ 2021-08-13 7:21 UTC (permalink / raw)
To: Hongyu Wang; +Cc: gcc-patches
On Fri, Aug 13, 2021 at 2:48 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> Hi,
>
> For lea + zero_extendsidi insns, if dest of lea and src of zext are the
> same, combine them with single leal under 64bit target since 32bit
> register will be automatically zero-extended.
>
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> Ok for master?
>
> gcc/ChangeLog:
>
> PR target/101716
> * config/i386/i386.md (*lea<mode>_zext): New define_insn.
> (define_peephole2): New peephole2 to combine zero_extend
> with lea.
>
> gcc/testsuite/ChangeLog:
>
> PR target/101716
> * gcc.target/i386/pr101716.c: New test.
This form should be covered by ix86_decompose_address via
address_no_seg_operand predicate. Combine creates:
Trying 6 -> 7:
6: {r86:DI=r87:DI<<0x1;clobber flags:CC;}
REG_DEAD r87:DI
REG_UNUSED flags:CC
7: r85:DI=zero_extend(r86:DI#0)
REG_DEAD r86:DI
Failed to match this instruction:
(set (reg:DI 85)
(and:DI (ashift:DI (reg:DI 87)
(const_int 1 [0x1]))
(const_int 4294967294 [0xfffffffe])))
which does not fit:
else if (GET_CODE (addr) == AND
&& const_32bit_mask (XEXP (addr, 1), DImode))
After reload, we lose SUBREG, so REE does not trigger on:
(insn 17 3 7 2 (set (reg:DI 0 ax [86])
(mult:DI (reg:DI 5 di [87])
(const_int 2 [0x2]))) "pr101716.c":4:13 204 {*leadi}
(nil))
(insn 7 17 13 2 (set (reg:DI 0 ax [85])
(zero_extend:DI (reg:SI 0 ax [86]))) "pr101716.c":4:19 136
{*zero_extendsidi2}
(nil))
So the question is whether the combine pass really needs to zero-extend
with 0xfffffffe. The left shift << 1 already guarantees a zero in the
LSB, so 0xffffffff would be better and in line with the canonical
zero-extension RTX.
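The equivalence of the two masks after the shift can be checked with a
short C sketch. The function names are invented for this illustration
and do not correspond to anything in GCC.

```c
#include <stdint.h>

/* The form combine produces: AND with 0xfffffffe after the shift.  */
static uint64_t masked_as_combine_emits(uint64_t x)
{
    return (x << 1) & UINT64_C(0xfffffffe);
}

/* The canonical zero-extension form: AND with 0xffffffff.  */
static uint64_t masked_canonical(uint64_t x)
{
    return (x << 1) & UINT64_C(0xffffffff);
}
```

Since bit 0 of (x << 1) is always zero, clearing it again with the
narrower mask changes nothing, so both functions agree for all x.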
> ---
> gcc/config/i386/i386.md | 20 ++++++++++++++++++++
> gcc/testsuite/gcc.target/i386/pr101716.c | 11 +++++++++++
> 2 files changed, 31 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/i386/pr101716.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 4a8e8fea290..6739dbd799b 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -5187,6 +5187,26 @@
> (const_string "SI")
> (const_string "<MODE>")))])
>
> +;; combine zero_extendsidi with lea to use leal.
> +(define_insn "*lea<mode>_zext"
> + [(set (match_operand:DI 0 "register_operand" "=r")
> + (zero_extend:DI
> + (match_operand:SWI48 1 "address_no_seg_operand" "Ts")))]
> + "TARGET_64BIT"
> + "lea{l}\t{%E1, %k0|%k0,%E1}")
The above can lead to invalid RTX: (zero_extend:DI (... DImode RTX)).
Uros.
* Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]
2021-08-13 7:21 ` Uros Bizjak
@ 2021-08-16 8:11 ` Uros Bizjak
2021-08-16 9:13 ` Hongyu Wang
0 siblings, 1 reply; 8+ messages in thread
From: Uros Bizjak @ 2021-08-16 8:11 UTC (permalink / raw)
To: Hongyu Wang; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 2368 bytes --]
On Fri, Aug 13, 2021 at 9:21 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Fri, Aug 13, 2021 at 2:48 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
> >
> > Hi,
> >
> > For lea + zero_extendsidi insns, if dest of lea and src of zext are the
> > same, combine them with single leal under 64bit target since 32bit
> > register will be automatically zero-extended.
> >
> > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > Ok for master?
> >
> > gcc/ChangeLog:
> >
> > PR target/101716
> > * config/i386/i386.md (*lea<mode>_zext): New define_insn.
> > (define_peephole2): New peephole2 to combine zero_extend
> > with lea.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/101716
> > * gcc.target/i386/pr101716.c: New test.
>
> This form should be covered by ix86_decompose_address via
> address_no_seg_operand predicate. Combine creates:
>
> Trying 6 -> 7:
> 6: {r86:DI=r87:DI<<0x1;clobber flags:CC;}
> REG_DEAD r87:DI
> REG_UNUSED flags:CC
> 7: r85:DI=zero_extend(r86:DI#0)
> REG_DEAD r86:DI
> Failed to match this instruction:
> (set (reg:DI 85)
> (and:DI (ashift:DI (reg:DI 87)
> (const_int 1 [0x1]))
> (const_int 4294967294 [0xfffffffe])))
>
> which does not fit:
>
> else if (GET_CODE (addr) == AND
> && const_32bit_mask (XEXP (addr, 1), DImode))
>
> After reload, we lose SUBREG, so REE does not trigger on:
>
> (insn 17 3 7 2 (set (reg:DI 0 ax [86])
> (mult:DI (reg:DI 5 di [87])
> (const_int 2 [0x2]))) "pr101716.c":4:13 204 {*leadi}
> (nil))
> (insn 7 17 13 2 (set (reg:DI 0 ax [85])
> (zero_extend:DI (reg:SI 0 ax [86]))) "pr101716.c":4:19 136
> {*zero_extendsidi2}
> (nil))
>
> So, the question is if the combine pass really needs to zero-extend
> with 0xfffffffe, the left shift << 1 guarantees zero in the LSB, so
> 0xffffffff should be better and in line with canonical zero-extension
> RTX.
Also, ix86_decompose_address accepts ASHIFT RTX when ASHIFT is
embedded in the PLUS chain, but naked ASHIFT is rejected (cf. the
call in ix86_legitimate_address_p) for some (historic?) reason. It
looks to me that this restriction is not necessary, since
ix86_legitimize_address can canonicalize ASHIFT RTXes without
problems. The attached patch, which survives bootstrap and regtest,
may help in your case.
Uros.
[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 1604 bytes --]
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4d4ab6a03d6..9395716dd60 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10018,8 +10018,7 @@ ix86_live_on_entry (bitmap regs)
\f
/* Extract the parts of an RTL expression that is a valid memory address
for an instruction. Return 0 if the structure of the address is
- grossly off. Return -1 if the address contains ASHIFT, so it is not
- strictly valid, but still used for computing length of lea instruction. */
+ grossly off. */
int
ix86_decompose_address (rtx addr, struct ix86_address *out)
@@ -10029,7 +10028,6 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
HOST_WIDE_INT scale = 1;
rtx scale_rtx = NULL_RTX;
rtx tmp;
- int retval = 1;
addr_space_t seg = ADDR_SPACE_GENERIC;
/* Allow zero-extended SImode addresses,
@@ -10179,7 +10177,6 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
if ((unsigned HOST_WIDE_INT) scale > 3)
return 0;
scale = 1 << scale;
- retval = -1;
}
else
disp = addr; /* displacement */
@@ -10252,7 +10249,7 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
out->scale = scale;
out->seg = seg;
- return retval;
+ return 1;
}
\f
/* Return cost of the memory address x.
@@ -10765,7 +10762,7 @@ ix86_legitimate_address_p (machine_mode, rtx addr, bool strict)
HOST_WIDE_INT scale;
addr_space_t seg;
- if (ix86_decompose_address (addr, &parts) <= 0)
+ if (ix86_decompose_address (addr, &parts) == 0)
/* Decomposition failed. */
return false;
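The effect of the return-value change in the diff above can be modelled
with a toy C sketch. Everything here is a hypothetical stand-in: the
enum and function names are invented, and the real
ix86_legitimate_address_p performs many further checks that this model
leaves out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the shape of an address RTX.  */
enum addr_shape { ADDR_MALFORMED, ADDR_PLAIN, ADDR_NAKED_ASHIFT };

/* Before the patch: 0 = grossly off, -1 = naked ASHIFT, 1 = fine.  */
static int decompose_before(enum addr_shape a)
{
    if (a == ADDR_MALFORMED)
        return 0;
    return a == ADDR_NAKED_ASHIFT ? -1 : 1;
}

/* After the patch: only 0 (grossly off) or 1.  */
static int decompose_after(enum addr_shape a)
{
    return a == ADDR_MALFORMED ? 0 : 1;
}

/* Before: the "<= 0" test treats the -1 ASHIFT case as a failure.  */
static bool legitimate_before(enum addr_shape a)
{
    return !(decompose_before(a) <= 0);
}

/* After: only a 0 return counts as a failed decomposition.  */
static bool legitimate_after(enum addr_shape a)
{
    return !(decompose_after(a) == 0);
}
```

In this model a naked ASHIFT address is rejected before the change and
accepted after it, while a malformed address is rejected in both cases.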
* Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]
2021-08-16 8:11 ` Uros Bizjak
@ 2021-08-16 9:13 ` Hongyu Wang
2021-08-16 9:26 ` Uros Bizjak
0 siblings, 1 reply; 8+ messages in thread
From: Hongyu Wang @ 2021-08-16 9:13 UTC (permalink / raw)
To: Uros Bizjak; +Cc: Hongyu Wang, gcc-patches
> So, the question is if the combine pass really needs to zero-extend
> with 0xfffffffe, the left shift << 1 guarantees zero in the LSB, so
> 0xffffffff should be better and in line with canonical zero-extension
> RTX.
The shift mask is generated in simplify_shift_const_1:
mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
int_result_mode);
rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
mask_rtx
= simplify_const_binary_operation (code, int_result_mode,
mask_rtx, count_rtx);
Can we adjust the count for ashift if nonzero_bits overlaps it?
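To make the arithmetic concrete, here is a minimal C sketch of the mask
computation for the ASHIFT case. The helper name is made up, and the
input value 0x7fffffff is an assumption chosen to reproduce the mask
seen in the dump above; it is not taken from a combine trace.

```c
#include <stdint.h>

/* Shift a "possibly nonzero bits" mask left by COUNT, mirroring what
   applying the ASHIFT code to mask_rtx and count_rtx would compute.
   COUNT is assumed to be in the range 0..63.  */
static uint64_t shifted_mask(uint64_t nonzero, unsigned count)
{
    return nonzero << count;
}
```

With nonzero = 0x7fffffff and count = 1 this gives 0xfffffffe, which is
the non-canonical mask in the failed combine attempt.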
> Also, ix86_decompose_address accepts ASHIFT RTX when ASHIFT is
> embedded in the PLUS chain, but naked ASHIFT is rejected (c.f. the
> call in ix86_legitimate_address_p) for some (historic?) reason. It
> looks to me that this restriction is not necessary, since
> ix86_legitimize_address can canonicalize ASHIFT RTXes without
> problems. The attached patch that survives bootstrap and regtest can
> help in your case.
We have a split that transforms ashift to mult, so I'm afraid it would
not help with this issue.
* Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]
2021-08-16 9:13 ` Hongyu Wang
@ 2021-08-16 9:26 ` Uros Bizjak
2021-08-24 15:22 ` Hongyu Wang
0 siblings, 1 reply; 8+ messages in thread
From: Uros Bizjak @ 2021-08-16 9:26 UTC (permalink / raw)
To: Hongyu Wang; +Cc: Hongyu Wang, gcc-patches
On Mon, Aug 16, 2021 at 11:18 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
>
> > So, the question is if the combine pass really needs to zero-extend
> > with 0xfffffffe, the left shift << 1 guarantees zero in the LSB, so
> > 0xffffffff should be better and in line with canonical zero-extension
> > RTX.
>
> The shift mask is generated in simplify_shift_const_1:
>
> mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
> int_result_mode);
> rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
> mask_rtx
> = simplify_const_binary_operation (code, int_result_mode,
> mask_rtx, count_rtx);
>
> Can we adjust the count for ashift if nonzero_bits overlaps it?
>
> > Also, ix86_decompose_address accepts ASHIFT RTX when ASHIFT is
> > embedded in the PLUS chain, but naked ASHIFT is rejected (c.f. the
> > call in ix86_legitimate_address_p) for some (historic?) reason. It
> > looks to me that this restriction is not necessary, since
> > ix86_legitimize_address can canonicalize ASHIFT RTXes without
> > problems. The attached patch that survives bootstrap and regtest can
> > help in your case.
>
> We have a split to transform ashift to mult, I'm afraid it could not
> help this issue.
If you want existing *lea<mode> to accept ASHIFT RTX, it uses
address_no_seg_operand predicate which uses address_operand predicate,
which calls ix86_legitimate_address_p, which ATM rejects ASHIFT RTXes.
Uros.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]
2021-08-16 9:26 ` Uros Bizjak
@ 2021-08-24 15:22 ` Hongyu Wang
2021-08-25 7:45 ` Uros Bizjak
0 siblings, 1 reply; 8+ messages in thread
From: Hongyu Wang @ 2021-08-24 15:22 UTC (permalink / raw)
To: Uros Bizjak; +Cc: Hongyu Wang, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 2177 bytes --]
Hi Uros,
Sorry for the late update. I tried adjusting the combine pass but
found it is not easy to modify the shift constant, so I came up with an
alternative solution based on your patch. It matches the non-canonical
zero-extend in ix86_decompose_address and adjusts ix86_rtx_costs so
that the pattern below can be combined:
(set (reg:DI 85)
(and:DI (ashift:DI (reg:DI 87)
(const_int 1 [0x1]))
(const_int 4294967294 [0xfffffffe])))
Survived bootstrap and regtest on x86-64-linux. Ok for master?
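The idea of accepting a non-canonical zero-extend when the AND mask
covers the ASHIFT count can be sketched in C as below. This is only an
illustration of the condition, not the code from the attached patch;
the function name is hypothetical, and the limit of 3 on the shift
count follows the scale check visible in ix86_decompose_address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if (x << COUNT) & MASK behaves like a 32-bit
   zero-extension of (x << COUNT).  The COUNT low bits are already zero
   after the shift, so MASK may leave them clear.  */
static bool mask_acts_as_zext32(uint64_t mask, unsigned count)
{
    if (count > 3)                 /* lea only scales by 1, 2, 4 or 8 */
        return false;
    uint64_t low = (UINT64_C(1) << count) - 1;
    return (mask | low) == UINT64_C(0xffffffff);
}
```

For the pattern above, mask 0xfffffffe with shift count 1 satisfies the
check, while the same mask with count 0 would not.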
[-- Attachment #2: 0001-i386-Optimize-lea-with-zero-extend.-PR-101716.patch --]
[-- Type: application/x-patch, Size: 4492 bytes --]
* Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]
2021-08-24 15:22 ` Hongyu Wang
@ 2021-08-25 7:45 ` Uros Bizjak
0 siblings, 0 replies; 8+ messages in thread
From: Uros Bizjak @ 2021-08-25 7:45 UTC (permalink / raw)
To: Hongyu Wang; +Cc: Hongyu Wang, gcc-patches
On Tue, Aug 24, 2021 at 5:22 PM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
>
> Hi Uros,
>
> Sorry for the late update. I have tried adjusting the combine pass but
> found it is not easy to modify shift const, so I came up with an
> alternative solution with your patch. It matches the non-canonical
> zero-extend in ix86_decompose_address and adjust ix86_rtx_cost to
> combine below pattern
>
> (set (reg:DI 85)
> (and:DI (ashift:DI (reg:DI 87)
> (const_int 1 [0x1]))
> (const_int 4294967294 [0xfffffffe])))
>
> Survived bootstrap and regtest on x86-64-linux. Ok for master?
gcc/ChangeLog:
PR target/101716
* config/i386/i386.c (ix86_live_on_entry): Adjust comment.
(ix86_decompose_address): Remove retval check for ASHIFT,
allow non-canonical zero extend if AND mask covers ASHIFT
count.
(ix86_legitimate_address_p): Adjust condition for decompose.
(ix86_rtx_costs): Adjust cost for lea with non-canonical
zero-extend.
OK.
Thanks,
Uros.