public inbox for gcc-patches@gcc.gnu.org
* [PATCH] x86_64: Use peephole2 to eliminate redundant moves.
@ 2020-08-11  7:34 Roger Sayle
  2020-08-11 10:14 ` Uros Bizjak
  0 siblings, 1 reply; 3+ messages in thread
From: Roger Sayle @ 2020-08-11  7:34 UTC (permalink / raw)
  To: 'GCC Patches'; +Cc: 'Uros Bizjak'

[-- Attachment #1: Type: text/plain, Size: 2323 bytes --]


The recent fix for mul_widen_cost revealed an interesting
quirk of ira/reload register allocation on x86_64.  As shown in
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551648.html
for gcc.target/i386/pr71321.c we generate the following code that
performs unnecessary register shuffling.

        movl    $-51, %edx
        movl    %edx, %eax
        mulb    %dil

which is caused by reload generating the following instructions
(notice the set of the first register is dead in the 2nd insn):

(insn 7 4 36 2 (set (reg:QI 1 dx [94])
        (const_int -51 [0xffffffffffffffcd])) {*movqi_internal}
     (expr_list:REG_EQUIV (const_int -51 [0xffffffffffffffcd])
        (nil)))
(insn 36 7 8 2 (set (reg:QI 0 ax [93])
        (reg:QI 1 dx [94])) {*movqi_internal}
     (expr_list:REG_DEAD (reg:QI 1 dx [94])
        (nil)))

Various discussions in bugzilla seem to point to reload preferring
not to load constants directly into CLASS_LIKELY_SPILLED_P registers.
Whatever the cause, one solution (a workaround, really) that doesn't
involve rewriting the register allocator is to use peephole2 to spot
this weirdness and eliminate it.  In fact, this use case is (probably)
the reason peephole optimizers were originally developed, but it's a
little disappointing that this application of them is still required
today.  On a positive note, this clean-up is cheap, as we're already
traversing the instruction stream with liveness (REG_DEAD notes)
calculated.

With this peephole2 the above three instructions (from pr71321.c)
are replaced with:

        movl    $-51, %eax
        mulb    %dil

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.  This peephole triggers
1435 times during stage2 and stage3 of a bootstrap, and a further 1274
times during "make check".  The most common case is DX_REG->AX_REG
(as above), which occurs 421 times.  I've restricted this pattern to
immediate constant loads into general operand registers, which fixes
this particular problem, but broader predicates may help similar cases.
Ok for mainline?

2020-08-11  Roger Sayle  <roger@nextmovesoftware.com>

	* config/i386/i386.md (peephole2): Reduce unnecessary
	register shuffling produced by register allocation.

Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK


[-- Attachment #2: patchr2.txt --]
[-- Type: text/plain, Size: 730 bytes --]

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4e916bf..34a8946 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -18946,6 +18946,16 @@
   operands[2] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);
   ix86_expand_clear (operands[1]);
 })
+
+;; Reload dislikes loading constants directly into class_likely_spilled
+;; hard registers.  Try to tidy things up here.
+(define_peephole2
+  [(set (match_operand:SWI 0 "general_reg_operand")
+	(match_operand:SWI 1 "x86_64_immediate_operand"))
+   (set (match_operand:SWI 2 "general_reg_operand")
+	(match_dup 0))]
+  "peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 2) (match_dup 1))])
 \f
 ;; Misc patterns (?)
 


* Re: [PATCH] x86_64: Use peephole2 to eliminate redundant moves.
  2020-08-11  7:34 [PATCH] x86_64: Use peephole2 to eliminate redundant moves Roger Sayle
@ 2020-08-11 10:14 ` Uros Bizjak
  2020-08-12  7:55   ` Roger Sayle
  0 siblings, 1 reply; 3+ messages in thread
From: Uros Bizjak @ 2020-08-11 10:14 UTC (permalink / raw)
  To: Roger Sayle; +Cc: GCC Patches

On Tue, Aug 11, 2020 at 9:34 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> The recent fix for mul_widen_cost revealed an interesting
> quirk of ira/reload register allocation on x86_64.  As shown in
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551648.html
> for gcc.target/i386/pr71321.c we generate the following code that
> performs unnecessary register shuffling.
>
>         movl    $-51, %edx
>         movl    %edx, %eax
>         mulb    %dil
>
> which is caused by reload generating the following instructions
> (notice the set of the first register is dead in the 2nd insn):
>
> (insn 7 4 36 2 (set (reg:QI 1 dx [94])
>         (const_int -51 [0xffffffffffffffcd])) {*movqi_internal}
>      (expr_list:REG_EQUIV (const_int -51 [0xffffffffffffffcd])
>         (nil)))
> (insn 36 7 8 2 (set (reg:QI 0 ax [93])
>         (reg:QI 1 dx [94])) {*movqi_internal}
>      (expr_list:REG_DEAD (reg:QI 1 dx [94])
>         (nil)))
>
> Various discussions in bugzilla seem to point to reload preferring
> not to load constants directly into CLASS_LIKELY_SPILLED_P registers.

This can extend the lifetime of a register over the instruction that
needs one of the CLASS_LIKELY_SPILLED_P registers. Various MUL, DIV
and even shift insns were able to choke the allocator for x86 targets,
so this is a small price to pay to avoid regalloc failure.

> Whatever the cause, one solution (a workaround, really) that doesn't
> involve rewriting the register allocator is to use peephole2 to spot
> this weirdness and eliminate it.  In fact, this use case is (probably)
> the reason peephole optimizers were originally developed, but it's a
> little disappointing that this application of them is still required
> today.  On a positive note, this clean-up is cheap, as we're already
> traversing the instruction stream with liveness (REG_DEAD notes)
> calculated.
>
> With this peephole2 the above three instructions (from pr71321.c)
> are replaced with:
>
>         movl    $-51, %eax
>         mulb    %dil
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.  This peephole triggers
> 1435 times during stage2 and stage3 of a bootstrap, and a further 1274
> times during "make check".  The most common case is DX_REG->AX_REG
> (as above), which occurs 421 times.  I've restricted this pattern to
> immediate constant loads into general operand registers, which fixes
> this particular problem, but broader predicates may help similar cases.
> Ok for mainline?
>
> 2020-08-11  Roger Sayle  <roger@nextmovesoftware.com>
>
>         * config/i386/i386.md (peephole2): Reduce unnecessary
>         register shuffling produced by register allocation.

LGTM, but I wonder if the allocator is also too conservative with
memory operands. Perhaps x86_64_general_operand can be used here.

Uros.
>
> Thanks in advance,
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>


* RE: [PATCH] x86_64: Use peephole2 to eliminate redundant moves.
  2020-08-11 10:14 ` Uros Bizjak
@ 2020-08-12  7:55   ` Roger Sayle
  0 siblings, 0 replies; 3+ messages in thread
From: Roger Sayle @ 2020-08-12  7:55 UTC (permalink / raw)
  To: 'Uros Bizjak'; +Cc: 'GCC Patches'

[-- Attachment #1: Type: text/plain, Size: 4059 bytes --]

Hi Uros,

Many thanks for the review, and your help/explanation around constant
materialization on x86_64/i386.

Your suggestion to try x86_64_general_operand as a predicate was awesome!
Not only does this improvement survive "make bootstrap" and "make -k check"
on x86_64-pc-linux-gnu and fix the previous test case, but it now triggers
15070 times during stage2 and stage3, and an additional 8926 times
during "make check".  Many thanks for the impressive improvement.

Here's the version as committed.

2020-08-12  Roger Sayle  <roger@nextmovesoftware.com>
            Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
	* config/i386/i386.md (peephole2): Reduce unnecessary
	register shuffling produced by register allocation.

Cheers,
Roger
--

-----Original Message-----
From: Uros Bizjak <ubizjak@gmail.com> 
Sent: 11 August 2020 11:14
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] x86_64: Use peephole2 to eliminate redundant moves.

On Tue, Aug 11, 2020 at 9:34 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> The recent fix for mul_widen_cost revealed an interesting quirk of 
> ira/reload register allocation on x86_64.  As shown in 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551648.html
> for gcc.target/i386/pr71321.c we generate the following code that 
> performs unnecessary register shuffling.
>
>         movl    $-51, %edx
>         movl    %edx, %eax
>         mulb    %dil
>
> which is caused by reload generating the following instructions 
> (notice the set of the first register is dead in the 2nd insn):
>
> (insn 7 4 36 2 (set (reg:QI 1 dx [94])
>         (const_int -51 [0xffffffffffffffcd])) {*movqi_internal}
>      (expr_list:REG_EQUIV (const_int -51 [0xffffffffffffffcd])
>         (nil)))
> (insn 36 7 8 2 (set (reg:QI 0 ax [93])
>         (reg:QI 1 dx [94])) {*movqi_internal}
>      (expr_list:REG_DEAD (reg:QI 1 dx [94])
>         (nil)))
>
> Various discussions in bugzilla seem to point to reload preferring not 
> to load constants directly into CLASS_LIKELY_SPILLED_P registers.

This can extend the lifetime of a register over the instruction that
needs one of the CLASS_LIKELY_SPILLED_P registers. Various MUL, DIV
and even shift insns were able to choke the allocator for x86 targets,
so this is a small price to pay to avoid regalloc failure.

> Whatever the cause, one solution (a workaround, really) that doesn't
> involve rewriting the register allocator is to use peephole2 to spot
> this weirdness and eliminate it.  In fact, this use case is (probably)
> the reason peephole optimizers were originally developed, but it's a
> little disappointing that this application of them is still required
> today.  On a positive note, this clean-up is cheap, as we're already
> traversing the instruction stream with liveness (REG_DEAD notes)
> calculated.
>
> With this peephole2 the above three instructions (from pr71321.c) are 
> replaced with:
>
>         movl    $-51, %eax
>         mulb    %dil
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.  This peephole triggers 1435
> times during stage2 and stage3 of a bootstrap, and a further 1274 times
> during "make check".  The most common case is DX_REG->AX_REG (as
> above), which occurs 421 times.  I've restricted this pattern to
> immediate constant loads into general operand registers, which fixes 
> this particular problem, but broader predicates may help similar cases.
> Ok for mainline?
>
> 2020-08-11  Roger Sayle  <roger@nextmovesoftware.com>
>
>         * config/i386/i386.md (peephole2): Reduce unnecessary
>         register shuffling produced by register allocation.

LGTM, but I wonder if the allocator is also too conservative with
memory operands. Perhaps x86_64_general_operand can be used here.

Uros.
>
> Thanks in advance,
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>

[-- Attachment #2: patchr3.txt --]
[-- Type: text/plain, Size: 728 bytes --]

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4e916bf..f3799ac 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -18946,6 +18946,16 @@
   operands[2] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);
   ix86_expand_clear (operands[1]);
 })
+
+;; Reload dislikes loading constants directly into class_likely_spilled
+;; hard registers.  Try to tidy things up here.
+(define_peephole2
+  [(set (match_operand:SWI 0 "general_reg_operand")
+	(match_operand:SWI 1 "x86_64_general_operand"))
+   (set (match_operand:SWI 2 "general_reg_operand")
+	(match_dup 0))]
+  "peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 2) (match_dup 1))])
 \f
 ;; Misc patterns (?)
 

