public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RFA] Improve strcmp expansion when one input is a constant string.
@ 2023-06-04 21:41 Jeff Law
  2023-06-05  6:29 ` Richard Biener
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff Law @ 2023-06-04 21:41 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4861 bytes --]

While investigating a RISC-V backend patch from Jivan I noticed a 
regression in terms of dynamic instruction counts for the omnetpp 
benchmark in spec2017.

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620577.html

The code we we with Jivan's patch at expansion time looks like this for 
each character in the input string:



(insn 6 5 7 (set (reg:SI 137)
         (zero_extend:SI (mem:QI (reg/v/f:DI 135 [ x ]) [0 MEM 
<char[1:12]> [(void *)x_2(D)]+0 S1 A8]))) "j.c":5:11 -1
      (nil))

(insn 7 6 8 (set (reg:DI 138)
         (sign_extend:DI (plus:SI (reg:SI 137)
                 (const_int -108 [0xffffffffffffff94])))) "j.c":5:11 -1
      (nil))

(insn 8 7 9 (set (reg:SI 136)
         (subreg/s/u:SI (reg:DI 138) 0)) "j.c":5:11 -1
      (expr_list:REG_EQUAL (plus:SI (reg:SI 137)
             (const_int -108 [0xffffffffffffff94]))
         (nil)))

(insn 9 8 10 (set (reg:DI 139)
         (sign_extend:DI (reg:SI 136))) "j.c":5:11 -1
      (nil))

(jump_insn 10 9 11 (set (pc)
         (if_then_else (ne (reg:DI 139)
                 (const_int 0 [0]))
             (label_ref 64)
             (pc))) "j.c":5:11 -1
      (nil))


Ignore insn 9.  fwprop will turn it into a trivial copy from r138->r139 
which will ultimately propagate away.


All the paths eventually transfer to control to the label in question, 
either by jumping or falling thru on the last character.  After a bit of 
cleanup by fwprop & friends we have:



> (insn 6 3 7 2 (set (reg:SI 137 [ MEM <char[1:12]> [(void *)x_2(D)] ])
>         (zero_extend:SI (mem:QI (reg/v/f:DI 135 [ x ]) [0 MEM <char[1:12]> [(void *)x_2(D)]+0 S1 A8]))) "j.c":5:11 114 {zero_extendqisi2}
>      (nil)) 
> (insn 7 6 8 2 (set (reg:DI 138)
>         (sign_extend:DI (plus:SI (reg:SI 137 [ MEM <char[1:12]> [(void *)x_2(D)] ])
>                 (const_int -108 [0xffffffffffffff94])))) "j.c":5:11 6 {addsi3_extended}
>      (expr_list:REG_DEAD (reg:SI 137 [ MEM <char[1:12]> [(void *)x_2(D)] ])
>         (nil)))
> (insn 8 7 10 2 (set (reg:SI 136 [ MEM <char[1:12]> [(void *)x_2(D)]+11 ])
>         (subreg/s/u:SI (reg:DI 138) 0)) "j.c":5:11 180 {*movsi_internal}
>      (nil))
> (jump_insn 10 8 73 2 (set (pc)
>         (if_then_else (ne (reg:DI 138)
>                 (const_int 0 [0]))
>             (label_ref 64)
>             (pc))) "j.c":5:11 243 {*branchdi}
>      (expr_list:REG_DEAD (reg:DI 138)
>         (int_list:REG_BR_PROB 536870916 (nil)))
>  -> 64)


insn 8 is the result of wanting the ultimate result of the strcmp to be 
an "int" type (SImode).    Note that (reg 136) is the result of the 
strcmp.  It gets set in each fragment of code that compares one element 
in the string.  It's also live after the strcmp sequence.   As a result 
combine isn't going to be able to clean this up.

Note how (reg 136) births while (reg 138) is live and even though (reg 
136) is a copy of (reg 138), IRA doesn't have the necessary code to 
determine that the regs do not conflict.  As a result (reg 136) and (reg 
138) must be allocated different hard registers and we get code like this:

>         lbu     a5,0(a0)        # 6     [c=28 l=4]  zero_extendqisi2/1
>         addiw   a5,a5,-108      # 7     [c=8 l=4]  addsi3_extended/1
>         mv      a4,a5   # 8     [c=4 l=4]  *movsi_internal/0
>         bne     a5,zero,.L2     # 10    [c=4 l=4]  *branchdi

Note the annoying "mv".


Rather than do a conversion for each character, we could do each step in 
word_mode and do the conversion once at the end of the whole sequence.

So for each character we expand to:

> (insn 6 5 7 (set (reg:DI 138)
>         (zero_extend:DI (mem:QI (reg/v/f:DI 135 [ x ]) [0 MEM <char[1:12]> [(void *)x_2(D)]+0 S1 A8]))) "j.c":5:11 -1
>      (nil))
> 
> (insn 7 6 8 (set (reg:DI 137)
>         (plus:DI (reg:DI 138)
>             (const_int -108 [0xffffffffffffff94]))) "j.c":5:11 -1
>      (nil))
> 
> (jump_insn 8 7 9 (set (pc)
>         (if_then_else (ne (reg:DI 137)
>                 (const_int 0 [0]))
>             (label_ref 41)
>             (pc))) "j.c":5:11 -1
>      (nil))

Good.  Then at the end of the sequence we have:
> (code_label 41 40 42 2 (nil) [0 uses])
> 
> (insn 42 41 43 (set (reg:SI 136)
>         (subreg:SI (reg:DI 137) 0)) "j.c":5:11 -1
>      (nil))

Which seems like exactly what we want.  At the assembly level we get:
         lbu     a5,0(a0)        # 6     [c=28 l=4]  zero_extendqidi2/1
         addi    a0,a5,-108      # 7     [c=4 l=4]  adddi3/1
         bne     a0,zero,.L2     # 8     [c=4 l=4]  *branchdi
[ ... ]

At the end of the sequence we realize the narrowing subreg followed by 
an extnesion isn't necessary and just remove them.

The ultimate result is omnetpp goes from a small regression to a small 
overall improvement with Jivan's patch.

Bootstrapped and regression tested on x86.  Also built and run spec2017 
on riscv64.

OK for the trunk?

Jeff

[-- Attachment #2: P --]
[-- Type: text/plain, Size: 2406 bytes --]

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 8400adaf5b4..f2e0d3b7d7f 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -7135,6 +7135,9 @@ inline_string_cmp (rtx target, tree var_str, const char *const_str,
   scalar_int_mode unit_mode
     = as_a <scalar_int_mode> TYPE_MODE (unit_type_node);
 
+  /* We do the intermediate steps in WORD_MODE, then convert
+     to mode at the end of the sequence.  */
+  rtx intermediate_result = gen_reg_rtx (word_mode);
   start_sequence ();
 
   for (unsigned HOST_WIDE_INT i = 0; i < length; i++)
@@ -7145,25 +7148,27 @@ inline_string_cmp (rtx target, tree var_str, const char *const_str,
       rtx op0 = (const_str_n == 1) ? const_rtx : var_rtx;
       rtx op1 = (const_str_n == 1) ? var_rtx : const_rtx;
 
-      op0 = convert_modes (mode, unit_mode, op0, 1);
-      op1 = convert_modes (mode, unit_mode, op1, 1);
-      rtx diff = expand_simple_binop (mode, MINUS, op0, op1,
-				      result, 1, OPTAB_WIDEN);
+      op0 = convert_modes (word_mode, unit_mode, op0, 1);
+      op1 = convert_modes (word_mode, unit_mode, op1, 1);
+      rtx diff = expand_simple_binop (word_mode, MINUS, op0, op1,
+				      intermediate_result, 1, OPTAB_WIDEN);
 
-      /* Force the difference into result register.  We cannot reassign
-	 result here ("result = diff") or we may end up returning
-	 uninitialized result when expand_simple_binop allocates a new
-	 pseudo-register for returning.  */
-      if (diff != result)
-	emit_move_insn (result, diff);
+      /* Force the difference into intermediate result register.  We cannot
+	 reassign the intermediate result here ("intermediate_result = diff")
+	 or we may end up returning uninitialized result when
+	 expand_simple_binop allocates a new pseudo-register for returning.  */
+      if (diff != intermediate_result)
+	emit_move_insn (intermediate_result, diff);
 
       if (i < length - 1)
-	emit_cmp_and_jump_insns (result, CONST0_RTX (mode), NE, NULL_RTX,
-	    			 mode, true, ne_label);
+	emit_cmp_and_jump_insns (intermediate_result, CONST0_RTX (word_mode),
+				 NE, NULL_RTX, word_mode, true, ne_label);
       offset += GET_MODE_SIZE (unit_mode);
     }
 
   emit_label (ne_label);
+  /* Now convert the intermediate result to the final result.  */
+  emit_move_insn (result, gen_lowpart (mode, intermediate_result));
   rtx_insn *insns = get_insns ();
   end_sequence ();
   emit_insn (insns);

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-06-07  3:02 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-04 21:41 [RFA] Improve strcmp expansion when one input is a constant string Jeff Law
2023-06-05  6:29 ` Richard Biener
2023-06-05 13:37   ` Jeff Law
2023-06-05 18:41   ` Jeff Law
2023-06-06  6:47     ` Richard Biener
2023-06-07  3:02       ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).