public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <richard.guenther@gmail.com>
To: Jeff Law <jeffreyalaw@gmail.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [RFA] Improve strcmp expansion when one input is a constant string.
Date: Mon, 5 Jun 2023 08:29:36 +0200	[thread overview]
Message-ID: <CAFiYyc3H=+Li+ejJuCHsDWBbS_YtDYe4uP0XTpsOQedEkwKYkg@mail.gmail.com> (raw)
In-Reply-To: <a4d5b600-2e21-3fea-17f8-4c2764880409@gmail.com>

On Sun, Jun 4, 2023 at 11:41 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> While investigating a RISC-V backend patch from Jivan I noticed a
> regression in terms of dynamic instruction counts for the omnetpp
> benchmark in spec2017.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620577.html
>
> The code we we with Jivan's patch at expansion time looks like this for
> each character in the input string:
>
>
>
> (insn 6 5 7 (set (reg:SI 137)
>          (zero_extend:SI (mem:QI (reg/v/f:DI 135 [ x ]) [0 MEM
> <char[1:12]> [(void *)x_2(D)]+0 S1 A8]))) "j.c":5:11 -1
>       (nil))
>
> (insn 7 6 8 (set (reg:DI 138)
>          (sign_extend:DI (plus:SI (reg:SI 137)
>                  (const_int -108 [0xffffffffffffff94])))) "j.c":5:11 -1
>       (nil))
>
> (insn 8 7 9 (set (reg:SI 136)
>          (subreg/s/u:SI (reg:DI 138) 0)) "j.c":5:11 -1
>       (expr_list:REG_EQUAL (plus:SI (reg:SI 137)
>              (const_int -108 [0xffffffffffffff94]))
>          (nil)))
>
> (insn 9 8 10 (set (reg:DI 139)
>          (sign_extend:DI (reg:SI 136))) "j.c":5:11 -1
>       (nil))
>
> (jump_insn 10 9 11 (set (pc)
>          (if_then_else (ne (reg:DI 139)
>                  (const_int 0 [0]))
>              (label_ref 64)
>              (pc))) "j.c":5:11 -1
>       (nil))
>
>
> Ignore insn 9.  fwprop will turn it into a trivial copy from r138->r139
> which will ultimately propagate away.
>
>
> All the paths eventually transfer to control to the label in question,
> either by jumping or falling thru on the last character.  After a bit of
> cleanup by fwprop & friends we have:
>
>
>
> > (insn 6 3 7 2 (set (reg:SI 137 [ MEM <char[1:12]> [(void *)x_2(D)] ])
> >         (zero_extend:SI (mem:QI (reg/v/f:DI 135 [ x ]) [0 MEM <char[1:12]> [(void *)x_2(D)]+0 S1 A8]))) "j.c":5:11 114 {zero_extendqisi2}
> >      (nil))
> > (insn 7 6 8 2 (set (reg:DI 138)
> >         (sign_extend:DI (plus:SI (reg:SI 137 [ MEM <char[1:12]> [(void *)x_2(D)] ])
> >                 (const_int -108 [0xffffffffffffff94])))) "j.c":5:11 6 {addsi3_extended}
> >      (expr_list:REG_DEAD (reg:SI 137 [ MEM <char[1:12]> [(void *)x_2(D)] ])
> >         (nil)))
> > (insn 8 7 10 2 (set (reg:SI 136 [ MEM <char[1:12]> [(void *)x_2(D)]+11 ])
> >         (subreg/s/u:SI (reg:DI 138) 0)) "j.c":5:11 180 {*movsi_internal}
> >      (nil))
> > (jump_insn 10 8 73 2 (set (pc)
> >         (if_then_else (ne (reg:DI 138)
> >                 (const_int 0 [0]))
> >             (label_ref 64)
> >             (pc))) "j.c":5:11 243 {*branchdi}
> >      (expr_list:REG_DEAD (reg:DI 138)
> >         (int_list:REG_BR_PROB 536870916 (nil)))
> >  -> 64)
>
>
> insn 8 is the result of wanting the ultimate result of the strcmp to be
> an "int" type (SImode).    Note that (reg 136) is the result of the
> strcmp.  It gets set in each fragment of code that compares one element
> in the string.  It's also live after the strcmp sequence.   As a result
> combine isn't going to be able to clean this up.
>
> Note how (reg 136) births while (reg 138) is live and even though (reg
> 136) is a copy of (reg 138), IRA doesn't have the necessary code to
> determine that the regs do not conflict.  As a result (reg 136) and (reg
> 138) must be allocated different hard registers and we get code like this:
>
> >         lbu     a5,0(a0)        # 6     [c=28 l=4]  zero_extendqisi2/1
> >         addiw   a5,a5,-108      # 7     [c=8 l=4]  addsi3_extended/1
> >         mv      a4,a5   # 8     [c=4 l=4]  *movsi_internal/0
> >         bne     a5,zero,.L2     # 10    [c=4 l=4]  *branchdi
>
> Note the annoying "mv".
>
>
> Rather than do a conversion for each character, we could do each step in
> word_mode and do the conversion once at the end of the whole sequence.
>
> So for each character we expand to:
>
> > (insn 6 5 7 (set (reg:DI 138)
> >         (zero_extend:DI (mem:QI (reg/v/f:DI 135 [ x ]) [0 MEM <char[1:12]> [(void *)x_2(D)]+0 S1 A8]))) "j.c":5:11 -1
> >      (nil))
> >
> > (insn 7 6 8 (set (reg:DI 137)
> >         (plus:DI (reg:DI 138)
> >             (const_int -108 [0xffffffffffffff94]))) "j.c":5:11 -1
> >      (nil))
> >
> > (jump_insn 8 7 9 (set (pc)
> >         (if_then_else (ne (reg:DI 137)
> >                 (const_int 0 [0]))
> >             (label_ref 41)
> >             (pc))) "j.c":5:11 -1
> >      (nil))
>
> Good.  Then at the end of the sequence we have:
> > (code_label 41 40 42 2 (nil) [0 uses])
> >
> > (insn 42 41 43 (set (reg:SI 136)
> >         (subreg:SI (reg:DI 137) 0)) "j.c":5:11 -1
> >      (nil))
>
> Which seems like exactly what we want.  At the assembly level we get:
>          lbu     a5,0(a0)        # 6     [c=28 l=4]  zero_extendqidi2/1
>          addi    a0,a5,-108      # 7     [c=4 l=4]  adddi3/1
>          bne     a0,zero,.L2     # 8     [c=4 l=4]  *branchdi
> [ ... ]
>
> At the end of the sequence we realize the narrowing subreg followed by
> an extnesion isn't necessary and just remove them.
>
> The ultimate result is omnetpp goes from a small regression to a small
> overall improvement with Jivan's patch.
>
> Bootstrapped and regression tested on x86.  Also built and run spec2017
> on riscv64.
>
> OK for the trunk?

But then for example x86 has smaller encoding for byte ops and while
widening is easily done later, truncation is not.

Btw, you failed to update the overall function comment which lists
the conversions applied.

Note I would have expected to use the mode of the load so we truly
elide some extensions, using word_mode looks like just another
mode here?  The key to note is probably

      op0 = convert_modes (mode, unit_mode, op0, 1);
      op1 = convert_modes (mode, unit_mode, op1, 1);
      rtx diff = expand_simple_binop (mode, MINUS, op0, op1,
                                      result, 1, OPTAB_WIDEN);

which uses OPTAB_WIDEN - wouldn't it be better to pass in the
unconverted modes and leave the decision which mode to use
to OPTAB_WIDEN?  Should we somehow query the target for
the smallest mode from unit_mode it can do both the MINUS
and the compare?

Thanks,
Richard.

>
> Jeff

  reply	other threads:[~2023-06-05  6:29 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-04 21:41 Jeff Law
2023-06-05  6:29 ` Richard Biener [this message]
2023-06-05 13:37   ` Jeff Law
2023-06-05 18:41   ` Jeff Law
2023-06-06  6:47     ` Richard Biener
2023-06-07  3:02       ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFiYyc3H=+Li+ejJuCHsDWBbS_YtDYe4uP0XTpsOQedEkwKYkg@mail.gmail.com' \
    --to=richard.guenther@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).