public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH v2] ix86: Don't use the 'm' constraint for x86_64_general_operand
@ 2021-12-20 22:22 H.J. Lu
  2021-12-23 14:41 ` H.J. Lu
  0 siblings, 1 reply; 4+ messages in thread
From: H.J. Lu @ 2021-12-20 22:22 UTC (permalink / raw)
  To: Jakub Jelinek, Uros Bizjak, Hongtao Liu, liwei.xu; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3362 bytes --]

On Mon, Dec 20, 2021 at 12:38 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Dec 20, 2021 at 11:44:08AM -0800, H.J. Lu wrote:
> > The problem is in
> >
> > (define_memory_constraint "TARGET_MEM_CONSTRAINT"
> >   "Matches any valid memory."
> >   (and (match_code "mem")
> >        (match_test "memory_address_addr_space_p (GET_MODE (op), XEXP (op, 0),
> >                                                  MEM_ADDR_SPACE (op))")))
> >
> > define_register_constraint allows LRA to convert the operand to the form
> > '(mem (reg X))', where X is a base register.  I am testing the v2 patch with
>
> If you mean replacing an immediate with a MEM containing that immediate,
> isn't that often the right thing though?
> I mean, if the register pressure is high and options are either spill some
> register, set it to immediate, use it in one instruction and then fill the
> spilled register (i.e. 2 memory loads), compared to a MEM use on the
> arithmetic instruction one MEM seems cheaper to me.  With -fPIC and the
> cst needing runtime relocation slightly less so of course.
>

We will check the performance impact on SPEC CPU 2017.
Here is the v2 patch.  Liwei,  can you help collect SPEC CPU 2017
impact of the enclosed patch?  Thanks.

> The code due to ivopts is trying to have something like
>   size_t a = (size_t) &tunable_list;
>   size_t b = 0xffffffffffffffa8 - a;
>   size_t c = x + b;
> and for that cst - &symbol one needs actually 2 registers, one to hold the
> constant and one to hold the (%rip) based address.
> (insn 790 789 791 111 (set (reg:DI 292)
>         (const_int -88 [0xffffffffffffffa8])) "dl-tunables.c":304:15 76 {*movdi_internal}
>      (nil))
> (insn 791 790 792 111 (set (reg:DI 293)
>         (symbol_ref:DI ("tunable_list") [flags 0x2]  <var_decl 0x7f3460aa9cf0 tunable_list>)) "dl-tunables.c":304:15 76 {*movdi_internal}
>      (nil))
> (insn 792 791 793 111 (parallel [
>             (set (reg:DI 291)
>                 (minus:DI (reg:DI 292)
>                     (reg:DI 293)))
>             (clobber (reg:CC 17 flags))
>         ]) "dl-tunables.c":304:15 299 {*subdi_1}
>      (nil))
> (insn 793 792 794 111 (parallel [
>             (set (reg:DI 294)
>                 (plus:DI (reg:DI 291)
>                     (reg:DI 198 [ ivtmp.176 ])))
>             (clobber (reg:CC 17 flags))
>         ]) "dl-tunables.c":304:15 226 {*adddi_1}
>      (nil))
> It would be smarter to rewrite the above into a lea 88+tunable_list(%rip), %temp1
> and use a subtraction instead of addition in the last insn above, or of
> course in the particular case even consider the following 2 instructions
> that do:
> (insn 794 793 795 111 (set (reg:DI 296)
>         (symbol_ref:DI ("tunable_list") [flags 0x2]  <var_decl 0x7f3460aa9cf0 tunable_list>)) "dl-tunables.c":304:15 76 {*movdi_internal}
>      (nil))
> (insn 795 794 796 111 (parallel [
>             (set (reg:DI 295 [ cur ])
>                 (plus:DI (reg:DI 294)
>                     (reg:DI 296)))
>             (clobber (reg:CC 17 flags))
>         ]) "dl-tunables.c":304:15 226 {*adddi_1}
>      (nil))
> and find out that &tuneble_list - &tuneble_list is 0 and we don't need it at
> all.  Guess we don't figure that out due to the cast of one of those
> addresses to size_t and the other one used in POINTER_PLUS_EXPR as normal
> pointer.
>
>         Jakub
>


--
H.J.

[-- Attachment #2: v2-0001-ix86-Don-t-use-the-m-constraint-for-x86_64_genera.patch --]
[-- Type: application/x-patch, Size: 33232 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] ix86: Don't use the 'm' constraint for x86_64_general_operand
  2021-12-20 22:22 [PATCH v2] ix86: Don't use the 'm' constraint for x86_64_general_operand H.J. Lu
@ 2021-12-23 14:41 ` H.J. Lu
  2021-12-24  8:19   ` Uros Bizjak
  0 siblings, 1 reply; 4+ messages in thread
From: H.J. Lu @ 2021-12-23 14:41 UTC (permalink / raw)
  To: Jakub Jelinek, Uros Bizjak, Hongtao Liu, liwei.xu; +Cc: GCC Patches

On Mon, Dec 20, 2021 at 2:22 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Dec 20, 2021 at 12:38 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Mon, Dec 20, 2021 at 11:44:08AM -0800, H.J. Lu wrote:
> > > The problem is in
> > >
> > > (define_memory_constraint "TARGET_MEM_CONSTRAINT"
> > >   "Matches any valid memory."
> > >   (and (match_code "mem")
> > >        (match_test "memory_address_addr_space_p (GET_MODE (op), XEXP (op, 0),
> > >                                                  MEM_ADDR_SPACE (op))")))
> > >
> > > define_register_constraint allows LRA to convert the operand to the form
> > > '(mem (reg X))', where X is a base register.  I am testing the v2 patch with
> >
> > If you mean replacing an immediate with a MEM containing that immediate,
> > isn't that often the right thing though?
> > I mean, if the register pressure is high and options are either spill some
> > register, set it to immediate, use it in one instruction and then fill the
> > spilled register (i.e. 2 memory loads), compared to a MEM use on the
> > arithmetic instruction one MEM seems cheaper to me.  With -fPIC and the
> > cst needing runtime relocation slightly less so of course.
> >
>
> We will check the performance impact on SPEC CPU 2017.
> Here is the v2 patch.  Liwei,  can you help collect SPEC CPU 2017
> impact of the enclosed patch?  Thanks.

We checked SPEC CPU 2017 performance with -O2 and -Ofast.
There is no performance regression.   OK for master?

> > The code due to ivopts is trying to have something like
> >   size_t a = (size_t) &tunable_list;
> >   size_t b = 0xffffffffffffffa8 - a;
> >   size_t c = x + b;
> > and for that cst - &symbol one needs actually 2 registers, one to hold the
> > constant and one to hold the (%rip) based address.
> > (insn 790 789 791 111 (set (reg:DI 292)
> >         (const_int -88 [0xffffffffffffffa8])) "dl-tunables.c":304:15 76 {*movdi_internal}
> >      (nil))
> > (insn 791 790 792 111 (set (reg:DI 293)
> >         (symbol_ref:DI ("tunable_list") [flags 0x2]  <var_decl 0x7f3460aa9cf0 tunable_list>)) "dl-tunables.c":304:15 76 {*movdi_internal}
> >      (nil))
> > (insn 792 791 793 111 (parallel [
> >             (set (reg:DI 291)
> >                 (minus:DI (reg:DI 292)
> >                     (reg:DI 293)))
> >             (clobber (reg:CC 17 flags))
> >         ]) "dl-tunables.c":304:15 299 {*subdi_1}
> >      (nil))
> > (insn 793 792 794 111 (parallel [
> >             (set (reg:DI 294)
> >                 (plus:DI (reg:DI 291)
> >                     (reg:DI 198 [ ivtmp.176 ])))
> >             (clobber (reg:CC 17 flags))
> >         ]) "dl-tunables.c":304:15 226 {*adddi_1}
> >      (nil))
> > It would be smarter to rewrite the above into a lea 88+tunable_list(%rip), %temp1
> > and use a subtraction instead of addition in the last insn above, or of
> > course in the particular case even consider the following 2 instructions
> > that do:
> > (insn 794 793 795 111 (set (reg:DI 296)
> >         (symbol_ref:DI ("tunable_list") [flags 0x2]  <var_decl 0x7f3460aa9cf0 tunable_list>)) "dl-tunables.c":304:15 76 {*movdi_internal}
> >      (nil))
> > (insn 795 794 796 111 (parallel [
> >             (set (reg:DI 295 [ cur ])
> >                 (plus:DI (reg:DI 294)
> >                     (reg:DI 296)))
> >             (clobber (reg:CC 17 flags))
> >         ]) "dl-tunables.c":304:15 226 {*adddi_1}
> >      (nil))
> > and find out that &tuneble_list - &tuneble_list is 0 and we don't need it at
> > all.  Guess we don't figure that out due to the cast of one of those
> > addresses to size_t and the other one used in POINTER_PLUS_EXPR as normal
> > pointer.
> >
> >         Jakub
> >
>
>
> --
> H.J.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] ix86: Don't use the 'm' constraint for x86_64_general_operand
  2021-12-23 14:41 ` H.J. Lu
@ 2021-12-24  8:19   ` Uros Bizjak
  2021-12-26 17:02     ` H.J. Lu
  0 siblings, 1 reply; 4+ messages in thread
From: Uros Bizjak @ 2021-12-24  8:19 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jakub Jelinek, Hongtao Liu, liwei.xu, GCC Patches

On Thu, Dec 23, 2021 at 3:42 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Dec 20, 2021 at 2:22 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Mon, Dec 20, 2021 at 12:38 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Mon, Dec 20, 2021 at 11:44:08AM -0800, H.J. Lu wrote:
> > > > The problem is in
> > > >
> > > > (define_memory_constraint "TARGET_MEM_CONSTRAINT"
> > > >   "Matches any valid memory."
> > > >   (and (match_code "mem")
> > > >        (match_test "memory_address_addr_space_p (GET_MODE (op), XEXP (op, 0),
> > > >                                                  MEM_ADDR_SPACE (op))")))
> > > >
> > > > define_register_constraint allows LRA to convert the operand to the form
> > > > '(mem (reg X))', where X is a base register.  I am testing the v2 patch with
> > >
> > > If you mean replacing an immediate with a MEM containing that immediate,
> > > isn't that often the right thing though?
> > > I mean, if the register pressure is high and options are either spill some
> > > register, set it to immediate, use it in one instruction and then fill the
> > > spilled register (i.e. 2 memory loads), compared to a MEM use on the
> > > arithmetic instruction one MEM seems cheaper to me.  With -fPIC and the
> > > cst needing runtime relocation slightly less so of course.
> > >
> >
> > We will check the performance impact on SPEC CPU 2017.
> > Here is the v2 patch.  Liwei,  can you help collect SPEC CPU 2017
> > impact of the enclosed patch?  Thanks.
>
> We checked SPEC CPU 2017 performance with -O2 and -Ofast.
> There is no performance regression.   OK for master?

OK if there are no further comments from Jakub.

Thanks,
Uros.

> > > The code due to ivopts is trying to have something like
> > >   size_t a = (size_t) &tunable_list;
> > >   size_t b = 0xffffffffffffffa8 - a;
> > >   size_t c = x + b;
> > > and for that cst - &symbol one needs actually 2 registers, one to hold the
> > > constant and one to hold the (%rip) based address.
> > > (insn 790 789 791 111 (set (reg:DI 292)
> > >         (const_int -88 [0xffffffffffffffa8])) "dl-tunables.c":304:15 76 {*movdi_internal}
> > >      (nil))
> > > (insn 791 790 792 111 (set (reg:DI 293)
> > >         (symbol_ref:DI ("tunable_list") [flags 0x2]  <var_decl 0x7f3460aa9cf0 tunable_list>)) "dl-tunables.c":304:15 76 {*movdi_internal}
> > >      (nil))
> > > (insn 792 791 793 111 (parallel [
> > >             (set (reg:DI 291)
> > >                 (minus:DI (reg:DI 292)
> > >                     (reg:DI 293)))
> > >             (clobber (reg:CC 17 flags))
> > >         ]) "dl-tunables.c":304:15 299 {*subdi_1}
> > >      (nil))
> > > (insn 793 792 794 111 (parallel [
> > >             (set (reg:DI 294)
> > >                 (plus:DI (reg:DI 291)
> > >                     (reg:DI 198 [ ivtmp.176 ])))
> > >             (clobber (reg:CC 17 flags))
> > >         ]) "dl-tunables.c":304:15 226 {*adddi_1}
> > >      (nil))
> > > It would be smarter to rewrite the above into a lea 88+tunable_list(%rip), %temp1
> > > and use a subtraction instead of addition in the last insn above, or of
> > > course in the particular case even consider the following 2 instructions
> > > that do:
> > > (insn 794 793 795 111 (set (reg:DI 296)
> > >         (symbol_ref:DI ("tunable_list") [flags 0x2]  <var_decl 0x7f3460aa9cf0 tunable_list>)) "dl-tunables.c":304:15 76 {*movdi_internal}
> > >      (nil))
> > > (insn 795 794 796 111 (parallel [
> > >             (set (reg:DI 295 [ cur ])
> > >                 (plus:DI (reg:DI 294)
> > >                     (reg:DI 296)))
> > >             (clobber (reg:CC 17 flags))
> > >         ]) "dl-tunables.c":304:15 226 {*adddi_1}
> > >      (nil))
> > > and find out that &tuneble_list - &tuneble_list is 0 and we don't need it at
> > > all.  Guess we don't figure that out due to the cast of one of those
> > > addresses to size_t and the other one used in POINTER_PLUS_EXPR as normal
> > > pointer.
> > >
> > >         Jakub
> > >
> >
> >
> > --
> > H.J.
>
> Thanks.
>
> --
> H.J.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] ix86: Don't use the 'm' constraint for x86_64_general_operand
  2021-12-24  8:19   ` Uros Bizjak
@ 2021-12-26 17:02     ` H.J. Lu
  0 siblings, 0 replies; 4+ messages in thread
From: H.J. Lu @ 2021-12-26 17:02 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Jakub Jelinek, Hongtao Liu, liwei.xu, GCC Patches

On Fri, Dec 24, 2021 at 12:19 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, Dec 23, 2021 at 3:42 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Mon, Dec 20, 2021 at 2:22 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Mon, Dec 20, 2021 at 12:38 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > > >
> > > > On Mon, Dec 20, 2021 at 11:44:08AM -0800, H.J. Lu wrote:
> > > > > The problem is in
> > > > >
> > > > > (define_memory_constraint "TARGET_MEM_CONSTRAINT"
> > > > >   "Matches any valid memory."
> > > > >   (and (match_code "mem")
> > > > >        (match_test "memory_address_addr_space_p (GET_MODE (op), XEXP (op, 0),
> > > > >                                                  MEM_ADDR_SPACE (op))")))
> > > > >
> > > > > define_register_constraint allows LRA to convert the operand to the form
> > > > > '(mem (reg X))', where X is a base register.  I am testing the v2 patch with
> > > >
> > > > If you mean replacing an immediate with a MEM containing that immediate,
> > > > isn't that often the right thing though?
> > > > I mean, if the register pressure is high and options are either spill some
> > > > register, set it to immediate, use it in one instruction and then fill the
> > > > spilled register (i.e. 2 memory loads), compared to a MEM use on the
> > > > arithmetic instruction one MEM seems cheaper to me.  With -fPIC and the
> > > > cst needing runtime relocation slightly less so of course.
> > > >
> > >
> > > We will check the performance impact on SPEC CPU 2017.
> > > Here is the v2 patch.  Liwei,  can you help collect SPEC CPU 2017
> > > impact of the enclosed patch?  Thanks.
> >
> > We checked SPEC CPU 2017 performance with -O2 and -Ofast.
> > There is no performance regression.   OK for master?
>
> OK if there are no further comments from Jakub.

I will check it tomorrow if there are further comments from Jakub.

Thanks.

> Thanks,
> Uros.
>
> > > > The code due to ivopts is trying to have something like
> > > >   size_t a = (size_t) &tunable_list;
> > > >   size_t b = 0xffffffffffffffa8 - a;
> > > >   size_t c = x + b;
> > > > and for that cst - &symbol one needs actually 2 registers, one to hold the
> > > > constant and one to hold the (%rip) based address.
> > > > (insn 790 789 791 111 (set (reg:DI 292)
> > > >         (const_int -88 [0xffffffffffffffa8])) "dl-tunables.c":304:15 76 {*movdi_internal}
> > > >      (nil))
> > > > (insn 791 790 792 111 (set (reg:DI 293)
> > > >         (symbol_ref:DI ("tunable_list") [flags 0x2]  <var_decl 0x7f3460aa9cf0 tunable_list>)) "dl-tunables.c":304:15 76 {*movdi_internal}
> > > >      (nil))
> > > > (insn 792 791 793 111 (parallel [
> > > >             (set (reg:DI 291)
> > > >                 (minus:DI (reg:DI 292)
> > > >                     (reg:DI 293)))
> > > >             (clobber (reg:CC 17 flags))
> > > >         ]) "dl-tunables.c":304:15 299 {*subdi_1}
> > > >      (nil))
> > > > (insn 793 792 794 111 (parallel [
> > > >             (set (reg:DI 294)
> > > >                 (plus:DI (reg:DI 291)
> > > >                     (reg:DI 198 [ ivtmp.176 ])))
> > > >             (clobber (reg:CC 17 flags))
> > > >         ]) "dl-tunables.c":304:15 226 {*adddi_1}
> > > >      (nil))
> > > > It would be smarter to rewrite the above into a lea 88+tunable_list(%rip), %temp1
> > > > and use a subtraction instead of addition in the last insn above, or of
> > > > course in the particular case even consider the following 2 instructions
> > > > that do:
> > > > (insn 794 793 795 111 (set (reg:DI 296)
> > > >         (symbol_ref:DI ("tunable_list") [flags 0x2]  <var_decl 0x7f3460aa9cf0 tunable_list>)) "dl-tunables.c":304:15 76 {*movdi_internal}
> > > >      (nil))
> > > > (insn 795 794 796 111 (parallel [
> > > >             (set (reg:DI 295 [ cur ])
> > > >                 (plus:DI (reg:DI 294)
> > > >                     (reg:DI 296)))
> > > >             (clobber (reg:CC 17 flags))
> > > >         ]) "dl-tunables.c":304:15 226 {*adddi_1}
> > > >      (nil))
> > > > and find out that &tuneble_list - &tuneble_list is 0 and we don't need it at
> > > > all.  Guess we don't figure that out due to the cast of one of those
> > > > addresses to size_t and the other one used in POINTER_PLUS_EXPR as normal
> > > > pointer.
> > > >
> > > >         Jakub
> > > >
> > >
> > >
> > > --
> > > H.J.
> >
> > Thanks.
> >
> > --
> > H.J.



-- 
H.J.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-12-26 17:02 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-20 22:22 [PATCH v2] ix86: Don't use the 'm' constraint for x86_64_general_operand H.J. Lu
2021-12-23 14:41 ` H.J. Lu
2021-12-24  8:19   ` Uros Bizjak
2021-12-26 17:02     ` H.J. Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).