* [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593) @ 2019-01-07 22:51 Jakub Jelinek 2019-01-08 7:29 ` Uros Bizjak 2019-01-11 19:03 ` Jeff Law 0 siblings, 2 replies; 7+ messages in thread From: Jakub Jelinek @ 2019-01-07 22:51 UTC (permalink / raw) To: Uros Bizjak, Jeff Law; +Cc: gcc-patches Hi! As mentioned in that PR, we have a SI->DImode zero extension and RA happens to choose to zero extend from a SImode memory slot which is the low part of the DImode memory slot into which the zero extension is to be stored. Unfortunately, the RTL DSE part really doesn't have infrastructure to remember and, if needed, invalidate loads, it just remembers stores, so handling this generically is quite unlikely at least for GCC9. This patch just handles that through a peephole2 (other option would be to handle it in the define_split for the zero extension, but the peephole2 is likely to catch more things). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2019-01-07 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/79593 * config/i386/i386.md (reg = mem; mem = reg): New define_peephole2. --- gcc/config/i386/i386.md.jj 2019-01-01 12:37:31.564738571 +0100 +++ gcc/config/i386/i386.md 2019-01-07 17:11:21.056392168 +0100 @@ -18740,6 +18740,21 @@ (define_peephole2 const0_rtx); }) +;; Attempt to optimize away memory stores of values the memory already +;; has. See PR79593. +(define_peephole2 + [(set (match_operand 0 "register_operand") + (match_operand 1 "memory_operand")) + (set (match_dup 1) (match_dup 0))] + "REG_P (operands[0]) + && !STACK_REGNO_P (operands[0]) + && !MEM_VOLATILE_P (operands[1])" + [(set (match_dup 0) (match_dup 1))] +{ + if (peep2_reg_dead_p (1, operands[0])) + DONE; +}) + ;; Attempt to always use XOR for zeroing registers (including FP modes). (define_peephole2 [(set (match_operand 0 "general_reg_operand") Jakub ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593) 2019-01-07 22:51 [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593) Jakub Jelinek @ 2019-01-08 7:29 ` Uros Bizjak 2019-01-08 9:27 ` Jakub Jelinek 2019-01-11 19:03 ` Jeff Law 1 sibling, 1 reply; 7+ messages in thread From: Uros Bizjak @ 2019-01-08 7:29 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Jeff Law, gcc-patches On Mon, Jan 7, 2019 at 11:51 PM Jakub Jelinek <jakub@redhat.com> wrote: > > Hi! > > As mentioned in that PR, we have a SI->DImode zero extension and RA happens > to choose to zero extend from a SImode memory slot which is the low part of > the DImode memory slot into which the zero extension is to be stored. > Unfortunately, the RTL DSE part really doesn't have infrastructure to > remember and, if needed, invalidate loads, it just remembers stores, so > handling this generically is quite unlikely at least for GCC9. > > This patch just handles that through a peephole2 (other option would be to > handle it in the define_split for the zero extension, but the peephole2 is > likely to catch more things). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Is there a reason stack registers are excluded? Before stackreg pass, these registers are just like other hard registers. Other that that, there is no need for REG_P predicate; after reload we don't have subregs and register_operand will match only hard regs. Also, please put peep2_reg_dead_p predicate in the pattern predicate. Uros. > 2019-01-07 Jakub Jelinek <jakub@redhat.com> > > PR rtl-optimization/79593 > * config/i386/i386.md (reg = mem; mem = reg): New define_peephole2. > > --- gcc/config/i386/i386.md.jj 2019-01-01 12:37:31.564738571 +0100 > +++ gcc/config/i386/i386.md 2019-01-07 17:11:21.056392168 +0100 > @@ -18740,6 +18740,21 @@ (define_peephole2 > const0_rtx); > }) > > +;; Attempt to optimize away memory stores of values the memory already > +;; has. See PR79593. > +(define_peephole2 > + [(set (match_operand 0 "register_operand") > + (match_operand 1 "memory_operand")) > + (set (match_dup 1) (match_dup 0))] > + "REG_P (operands[0]) > + && !STACK_REGNO_P (operands[0]) > + && !MEM_VOLATILE_P (operands[1])" > + [(set (match_dup 0) (match_dup 1))] > +{ > + if (peep2_reg_dead_p (1, operands[0])) > + DONE; > +}) > + > ;; Attempt to always use XOR for zeroing registers (including FP modes). > (define_peephole2 > [(set (match_operand 0 "general_reg_operand") > > Jakub ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593) 2019-01-08 7:29 ` Uros Bizjak @ 2019-01-08 9:27 ` Jakub Jelinek 2019-01-08 10:49 ` Uros Bizjak 0 siblings, 1 reply; 7+ messages in thread From: Jakub Jelinek @ 2019-01-08 9:27 UTC (permalink / raw) To: Uros Bizjak; +Cc: Jeff Law, gcc-patches On Tue, Jan 08, 2019 at 08:29:03AM +0100, Uros Bizjak wrote: > Is there a reason stack registers are excluded? Before stackreg pass, > these registers are just like other hard registers. I was just afraid of those, after stack pass removing a store but not load would result in the stack getting out of sync, and I wasn't sure if the stack pass will tolerate a RA chosen stack reg not being touched anymore. Also, doesn't loading a float/double/long double into a stack reg and storing again canonicalize it in any way? Say NaNs? Or unnormal or pseudo-denormal long doubles etc.? Tried that now with a sNaN and the load/store didn't change it, but haven't tried with exceptions enabled to see if it would raise one. If that is fine, I'll leave that out. Looking around, I was using a wrong macro anyway and am surprised nothing complained, it should have been STACK_REG_P rather than STACK_REGNO_P. > Other that that, there is no need for REG_P predicate; after reload we > don't have subregs and register_operand will match only hard regs. Ok, I wasn't sure because "register_operand" allows (subreg (reg)) even if reload_completed, just disallows subregs of mem after that. > Also, please put peep2_reg_dead_p predicate in the pattern predicate. I don't see how, that would mean I'd have to write two peephole2s instead of one. It tries to deal with two different cases, one is where the temporary reg is dead, in that case we can optimize away both the load or store, the second case is where the temporary reg isn't dead, in that case we can optimize away the store, but not the load. With the optimizing away of both load and store I was just trying to do a cheap DCE there. Looking around more, I actually think I need to replace (match_dup 1) with (match_operand 2 "memory_operand"), add rtx_equal_p (operands[1], operands[2]) and !MEM_VOLATILE_P (operands[2]), because apparently rtx_equal_p doesn't check the MEM_VOLATILE_P bit. > > 2019-01-07 Jakub Jelinek <jakub@redhat.com> > > > > PR rtl-optimization/79593 > > * config/i386/i386.md (reg = mem; mem = reg): New define_peephole2. > > > > --- gcc/config/i386/i386.md.jj 2019-01-01 12:37:31.564738571 +0100 > > +++ gcc/config/i386/i386.md 2019-01-07 17:11:21.056392168 +0100 > > @@ -18740,6 +18740,21 @@ (define_peephole2 > > const0_rtx); > > }) > > > > +;; Attempt to optimize away memory stores of values the memory already > > +;; has. See PR79593. > > +(define_peephole2 > > + [(set (match_operand 0 "register_operand") > > + (match_operand 1 "memory_operand")) > > + (set (match_dup 1) (match_dup 0))] > > + "REG_P (operands[0]) > > + && !STACK_REGNO_P (operands[0]) > > + && !MEM_VOLATILE_P (operands[1])" > > + [(set (match_dup 0) (match_dup 1))] > > +{ > > + if (peep2_reg_dead_p (1, operands[0])) > > + DONE; > > +}) > > + > > ;; Attempt to always use XOR for zeroing registers (including FP modes). > > (define_peephole2 > > [(set (match_operand 0 "general_reg_operand") Jakub ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593) 2019-01-08 9:27 ` Jakub Jelinek @ 2019-01-08 10:49 ` Uros Bizjak 2019-01-08 11:43 ` Jakub Jelinek 0 siblings, 1 reply; 7+ messages in thread From: Uros Bizjak @ 2019-01-08 10:49 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Jeff Law, gcc-patches On Tue, Jan 8, 2019 at 10:27 AM Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Jan 08, 2019 at 08:29:03AM +0100, Uros Bizjak wrote: > > Is there a reason stack registers are excluded? Before stackreg pass, > > these registers are just like other hard registers. > > I was just afraid of those, after stack pass removing a store but not > load would result in the stack getting out of sync, and I wasn't sure if the > stack pass will tolerate a RA chosen stack reg not being touched anymore. It does. There is a stack compensating code that handles these situations. > Also, doesn't loading a float/double/long double into a stack reg and > storing again canonicalize it in any way? Say NaNs? Or unnormal or > pseudo-denormal long doubles etc.? Tried that now with a sNaN and the > load/store didn't change it, but haven't tried with exceptions enabled to > see if it would raise one. FLD from memory in SF and DFmode is considered a conversion, and converts sNaN to NaN (and emits #IA exception). But sNaN handling is already busted in the compiler as RA is free to spill the register in non-XFmode. IMO, the peephole2 pattern is no worse than the current situation. > If that is fine, I'll leave that out. I think we are good to do so. > Looking around, I was using a wrong macro anyway and am surprised nothing > complained, it should have been STACK_REG_P rather than STACK_REGNO_P. > > > Other that that, there is no need for REG_P predicate; after reload we > > don't have subregs and register_operand will match only hard regs. > > Ok, I wasn't sure because "register_operand" allows (subreg (reg)) even if > reload_completed, just disallows subregs of mem after that. At least for x86, there are no SUBREGs after reload, otherwise other parts of the compiler would break. > > Also, please put peep2_reg_dead_p predicate in the pattern predicate. > > I don't see how, that would mean I'd have to write two peephole2s instead of > one. It tries to deal with two different cases, one is where the temporary > reg is dead, in that case we can optimize away both the load or store, the > second case is where the temporary reg isn't dead, in that case we can > optimize away the store, but not the load. With the optimizing away of both > load and store I was just trying to do a cheap DCE there. I didn't realize this is an optimization, a comment would be welcome here. Uros. > Looking around more, I actually think I need to replace > (match_dup 1) with (match_operand 2 "memory_operand"), add rtx_equal_p > (operands[1], operands[2]) and !MEM_VOLATILE_P (operands[2]), because > apparently rtx_equal_p doesn't check the MEM_VOLATILE_P bit. > > > > 2019-01-07 Jakub Jelinek <jakub@redhat.com> > > > > > > PR rtl-optimization/79593 > > > * config/i386/i386.md (reg = mem; mem = reg): New define_peephole2. > > > > > > --- gcc/config/i386/i386.md.jj 2019-01-01 12:37:31.564738571 +0100 > > > +++ gcc/config/i386/i386.md 2019-01-07 17:11:21.056392168 +0100 > > > @@ -18740,6 +18740,21 @@ (define_peephole2 > > > const0_rtx); > > > }) > > > > > > +;; Attempt to optimize away memory stores of values the memory already > > > +;; has. See PR79593. > > > +(define_peephole2 > > > + [(set (match_operand 0 "register_operand") > > > + (match_operand 1 "memory_operand")) > > > + (set (match_dup 1) (match_dup 0))] > > > + "REG_P (operands[0]) > > > + && !STACK_REGNO_P (operands[0]) > > > + && !MEM_VOLATILE_P (operands[1])" > > > + [(set (match_dup 0) (match_dup 1))] > > > +{ > > > + if (peep2_reg_dead_p (1, operands[0])) > > > + DONE; > > > +}) > > > + > > > ;; Attempt to always use XOR for zeroing registers (including FP modes). > > > (define_peephole2 > > > [(set (match_operand 0 "general_reg_operand") > > Jakub ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593) 2019-01-08 10:49 ` Uros Bizjak @ 2019-01-08 11:43 ` Jakub Jelinek 2019-01-08 17:40 ` Uros Bizjak 0 siblings, 1 reply; 7+ messages in thread From: Jakub Jelinek @ 2019-01-08 11:43 UTC (permalink / raw) To: Uros Bizjak; +Cc: Jeff Law, gcc-patches On Tue, Jan 08, 2019 at 11:49:10AM +0100, Uros Bizjak wrote: > FLD from memory in SF and DFmode is considered a conversion, and > converts sNaN to NaN (and emits #IA exception). But sNaN handling is > already busted in the compiler as RA is free to spill the register in > non-XFmode. IMO, the peephole2 pattern is no worse than the current > situation. Ok. > At least for x86, there are no SUBREGs after reload, otherwise other > parts of the compiler would break. The new patch would really handle even a SUBREG there... > > I don't see how, that would mean I'd have to write two peephole2s instead of > > one. It tries to deal with two different cases, one is where the temporary > > reg is dead, in that case we can optimize away both the load or store, the > > second case is where the temporary reg isn't dead, in that case we can > > optimize away the store, but not the load. With the optimizing away of both > > load and store I was just trying to do a cheap DCE there. > > I didn't realize this is an optimization, a comment would be welcome here. Ugh, except that it doesn't work. peep2_reg_dead_p (1, operands[0]) is not what I meant, that is always false, as the register must be live in between the first and second instruction. I meant peep2_reg_dead_p (2, operands[0]), the register dead at the end of the second instruction, except we don't really support define_split/define_peephole2 splitting into zero instructions, DONE; in that case returns NULL like FAIL; does. So, let's just wait for DCE to finish it up. Here is what I'll bootstrap/regtest then. Added also reg_overlap_mentioned_p, in case there is e.g. movl (%eax,%edx), %eax movl %eax, (%eax,%edx) or similar and as I said earlier, explicit match_operand so that I can check MEM_VOLATILE_P on both MEMs. 2019-01-08 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/79593 * config/i386/i386.md (reg = mem; mem = reg): New define_peephole2. --- gcc/config/i386/i386.md.jj 2019-01-07 23:54:54.494800693 +0100 +++ gcc/config/i386/i386.md 2019-01-08 12:34:18.916832780 +0100 @@ -18740,6 +18740,18 @@ (define_peephole2 const0_rtx); }) +;; Attempt to optimize away memory stores of values the memory already +;; has. See PR79593. +(define_peephole2 + [(set (match_operand 0 "register_operand") + (match_operand 1 "memory_operand")) + (set (match_operand 2 "memory_operand") (match_dup 0))] + "!MEM_VOLATILE_P (operands[1]) + && !MEM_VOLATILE_P (operands[2]) + && rtx_equal_p (operands[1], operands[2]) + && !reg_overlap_mentioned_p (operands[0], operands[2])" + [(set (match_dup 0) (match_dup 1))]) + ;; Attempt to always use XOR for zeroing registers (including FP modes). (define_peephole2 [(set (match_operand 0 "general_reg_operand") Jakub ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593) 2019-01-08 11:43 ` Jakub Jelinek @ 2019-01-08 17:40 ` Uros Bizjak 0 siblings, 0 replies; 7+ messages in thread From: Uros Bizjak @ 2019-01-08 17:40 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Jeff Law, gcc-patches On Tue, Jan 8, 2019 at 12:43 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Jan 08, 2019 at 11:49:10AM +0100, Uros Bizjak wrote: > > FLD from memory in SF and DFmode is considered a conversion, and > > converts sNaN to NaN (and emits #IA exception). But sNaN handling is > > already busted in the compiler as RA is free to spill the register in > > non-XFmode. IMO, the peephole2 pattern is no worse than the current > > situation. > > Ok. > > > At least for x86, there are no SUBREGs after reload, otherwise other > > parts of the compiler would break. > > The new patch would really handle even a SUBREG there... > > > > I don't see how, that would mean I'd have to write two peephole2s instead of > > > one. It tries to deal with two different cases, one is where the temporary > > > reg is dead, in that case we can optimize away both the load or store, the > > > second case is where the temporary reg isn't dead, in that case we can > > > optimize away the store, but not the load. With the optimizing away of both > > > load and store I was just trying to do a cheap DCE there. > > > > I didn't realize this is an optimization, a comment would be welcome here. > > Ugh, except that it doesn't work. peep2_reg_dead_p (1, operands[0]) > is not what I meant, that is always false, as the register must be live in > between the first and second instruction. I meant > peep2_reg_dead_p (2, operands[0]), the register dead at the end of the > second instruction, except we don't really support > define_split/define_peephole2 splitting into zero instructions, DONE; in > that case returns NULL like FAIL; does. So, let's just wait for DCE to > finish it up. > > Here is what I'll bootstrap/regtest then. Added also > reg_overlap_mentioned_p, in case there is e.g. > movl (%eax,%edx), %eax > movl %eax, (%eax,%edx) I doubt this would *ever* happen, but ... OK. > or similar and as I said earlier, explicit match_operand so that I can > check MEM_VOLATILE_P on both MEMs. > > 2019-01-08 Jakub Jelinek <jakub@redhat.com> > > PR rtl-optimization/79593 > * config/i386/i386.md (reg = mem; mem = reg): New define_peephole2. OK for mainline. Thanks, Uros. > --- gcc/config/i386/i386.md.jj 2019-01-07 23:54:54.494800693 +0100 > +++ gcc/config/i386/i386.md 2019-01-08 12:34:18.916832780 +0100 > @@ -18740,6 +18740,18 @@ (define_peephole2 > const0_rtx); > }) > > +;; Attempt to optimize away memory stores of values the memory already > +;; has. See PR79593. > +(define_peephole2 > + [(set (match_operand 0 "register_operand") > + (match_operand 1 "memory_operand")) > + (set (match_operand 2 "memory_operand") (match_dup 0))] > + "!MEM_VOLATILE_P (operands[1]) > + && !MEM_VOLATILE_P (operands[2]) > + && rtx_equal_p (operands[1], operands[2]) > + && !reg_overlap_mentioned_p (operands[0], operands[2])" > + [(set (match_dup 0) (match_dup 1))]) > + > ;; Attempt to always use XOR for zeroing registers (including FP modes). > (define_peephole2 > [(set (match_operand 0 "general_reg_operand") > > > Jakub ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593) 2019-01-07 22:51 [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593) Jakub Jelinek 2019-01-08 7:29 ` Uros Bizjak @ 2019-01-11 19:03 ` Jeff Law 1 sibling, 0 replies; 7+ messages in thread From: Jeff Law @ 2019-01-11 19:03 UTC (permalink / raw) To: Jakub Jelinek, Uros Bizjak; +Cc: gcc-patches On 1/7/19 3:51 PM, Jakub Jelinek wrote: > Hi! > > As mentioned in that PR, we have a SI->DImode zero extension and RA happens > to choose to zero extend from a SImode memory slot which is the low part of > the DImode memory slot into which the zero extension is to be stored. > Unfortunately, the RTL DSE part really doesn't have infrastructure to > remember and, if needed, invalidate loads, it just remembers stores, so > handling this generically is quite unlikely at least for GCC9. > > This patch just handles that through a peephole2 (other option would be to > handle it in the define_split for the zero extension, but the peephole2 is > likely to catch more things). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2019-01-07 Jakub Jelinek <jakub@redhat.com> > > PR rtl-optimization/79593 > * config/i386/i386.md (reg = mem; mem = reg): New define_peephole2. Note I'd hacked up a POC in DSE and it caught something like 10 instances across a bootstrap. In the end I don't think this matters much in practice (peephole vs optimizing at the split point). Jeff ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2019-01-11 19:03 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-01-07 22:51 [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593) Jakub Jelinek 2019-01-08 7:29 ` Uros Bizjak 2019-01-08 9:27 ` Jakub Jelinek 2019-01-08 10:49 ` Uros Bizjak 2019-01-08 11:43 ` Jakub Jelinek 2019-01-08 17:40 ` Uros Bizjak 2019-01-11 19:03 ` Jeff Law
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).