public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns
@ 2023-03-10  7:36 liwei at loongson dot cn
  2023-03-10  7:38 ` [Bug target/109086] " liwei at loongson dot cn
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: liwei at loongson dot cn @ 2023-03-10  7:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109086

            Bug ID: 109086
           Summary: Bug of builtin_strcmp in the case of using the adddi3
                    instruction patterns
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liwei at loongson dot cn
  Target Milestone: ---

Created attachment 54631
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54631&action=edit
cause builtin_strcmp bug file

When we try to force add<mode>3 insn patterns only match DI mode for some
reason (this is safe because Loongarch defines the macro PROMOTE_MODE to
promote all modes narrower than BITS_PER_WORD to BITS_PER_WORD), it fails in
the following test case.

extern const char* deal(const char* filename);
const char* foo (filename) 
    const char *filename;
{
    if (filename == NULL || !strcmp (filename, "-"))
        filename = ""; 
    return deal(filename); 
}

We use glibc's script build-many-glibcs.py to cross build loongarch64 gcc, and
use -O2 command to compile above test case.

Part of the assembly code is as follows:

.LFB0 = .
    .cfi_startproc
    beqz    $r4,.L5
    ld.bu   $r12,$r4,0
    addi.w  $r12,$r12,-45
    beqz    $r12,.L7
    slli.w  $r12,$r13,0
    bnez    $r12,.L2
...

The r12 register uses the uninitialized r13 register, which makes subsequent
bnez insn wrong.

Through gdb debugging we found that the problem occurs in
builtins.cc:inline_string_cmp, below is the problem code.

rtx result = target ? target : gen_reg_rtx (mode);
...
for (unsigned HOST_WIDE_INT i = 0; i < length; i++) 
{
...
op0 = convert_modes (mode, unit_mode, op0, 1);
op1 = convert_modes (mode, unit_mode, op1, 1);
result = expand_simple_binop (mode, MINUS, op0, op1,
               result, 1, OPTAB_WIDEN);
...
}

The result is empty at the beginning, and a new temporary register is used to
save the result of the first strcmp comparison, there is no problem in the
first loop.
From the second loop, the expand_simple_binop function actually calls the
expand_binop function. Since there is no addsi3 insn pattern, gcc will promote
the machine mode of the two operands op0 and op1 to DI, then match it to
adddi3, the result is stored in the temporary register and returned (because
the mode is MODE_INT and loongarch supports TRUNCATION). The previous result is
discarded, which resulting in error.

Attached are the diff file used to reproduce the bugs on loongarch and the
corresponding fix patches given.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/109086] Bug of builtin_strcmp in the case of using the adddi3 instruction patterns
  2023-03-10  7:36 [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns liwei at loongson dot cn
@ 2023-03-10  7:38 ` liwei at loongson dot cn
  2023-03-10  8:18 ` [Bug rtl-optimization/109086] " rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: liwei at loongson dot cn @ 2023-03-10  7:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109086

--- Comment #1 from liwei at loongson dot cn ---
Created attachment 54632
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54632&action=edit
try to fix builtin_strcmp bug patch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug rtl-optimization/109086] Bug of builtin_strcmp in the case of using the adddi3 instruction patterns
  2023-03-10  7:36 [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns liwei at loongson dot cn
  2023-03-10  7:38 ` [Bug target/109086] " liwei at loongson dot cn
@ 2023-03-10  8:18 ` rguenth at gcc dot gnu.org
  2023-03-10  9:31 ` liwei at loongson dot cn
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-03-10  8:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109086

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |loongarch
           Keywords|                            |wrong-code

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
But the previous result is irrelevant?  Each and every result is independent
and immediately checked with a compare-and-jump?  To me it looks like the
issue is in expand_simple_binop not honoring something not using the provided
'target'.  But I can't see how, so I can wonder if you can pin-point better
where it goes wrong.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug rtl-optimization/109086] Bug of builtin_strcmp in the case of using the adddi3 instruction patterns
  2023-03-10  7:36 [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns liwei at loongson dot cn
  2023-03-10  7:38 ` [Bug target/109086] " liwei at loongson dot cn
  2023-03-10  8:18 ` [Bug rtl-optimization/109086] " rguenth at gcc dot gnu.org
@ 2023-03-10  9:31 ` liwei at loongson dot cn
  2023-03-10 13:12 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: liwei at loongson dot cn @ 2023-03-10  9:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109086

--- Comment #3 from liwei at loongson dot cn ---
Thank you for your quick reply!

The final call of the expand_simple_binop function returns part of the code as
follows (optabs.cc):

1671  xop0 = widen_operand (xop0, wider_mode, mode, unsignedp, no_extend);
1672  
1673  /* The second operand of a shift must always be extended.  */
1674  xop1 = widen_operand (xop1, wider_mode, mode, unsignedp,
1675                      no_extend && binoptab != ashl_optab);
1676  
1677  temp = expand_binop (wider_mode, binoptab, xop0, xop1, NULL_RTX,
1678                     unsignedp, OPTAB_DIRECT);
1679  if (temp)
1680    {
1681      if (mclass != MODE_INT
1682          || !TRULY_NOOP_TRUNCATION_MODES_P (mode, wider_mode))
1683        {
1684          if (target == 0)
1685            target = gen_reg_rtx (mode);
1686          convert_move (target, temp, 0);
1687          return target;
1688        }
1689      else
1690        return gen_lowpart (mode, temp);
1691    }

In the above code, target = result, when promote xop0 and xop1 machine mode to
DI, then match adddi3 insn pattern and save the new rtx to temp, then because
the mclass equal MODE_INT (also equal true for TRULY_NOOP_TRUNCATION_MODES_P
function), the code enter else branch and return lowpart of temp which is
irrelevant with target (i.e. result).

Maybe the expand_binop function does not consider the case of dependency with
`target` when generating rtx for the case of promote MODE_INT mode, and maybe
theoretically it does not need to be considered, except that builtin_strcmp
happens to meet such cases, so I think it is enough to modify the processing
logic of builtin_strcmp (my humble opinion).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug rtl-optimization/109086] Bug of builtin_strcmp in the case of using the adddi3 instruction patterns
  2023-03-10  7:36 [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns liwei at loongson dot cn
                   ` (2 preceding siblings ...)
  2023-03-10  9:31 ` liwei at loongson dot cn
@ 2023-03-10 13:12 ` rguenth at gcc dot gnu.org
  2023-03-14 14:59 ` xry111 at gcc dot gnu.org
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-03-10 13:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109086

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to liwei from comment #3)
> Thank you for your quick reply!
> 
> The final call of the expand_simple_binop function returns part of the code
> as follows (optabs.cc):
> 
> 1671  xop0 = widen_operand (xop0, wider_mode, mode, unsignedp, no_extend);
> 1672  
> 1673  /* The second operand of a shift must always be extended.  */
> 1674  xop1 = widen_operand (xop1, wider_mode, mode, unsignedp,
> 1675      		  no_extend && binoptab != ashl_optab);
> 1676  
> 1677  temp = expand_binop (wider_mode, binoptab, xop0, xop1, NULL_RTX,
> 1678      		 unsignedp, OPTAB_DIRECT);
> 1679  if (temp)
> 1680    {
> 1681      if (mclass != MODE_INT
> 1682          || !TRULY_NOOP_TRUNCATION_MODES_P (mode, wider_mode))
> 1683        {
> 1684          if (target == 0)
> 1685            target = gen_reg_rtx (mode);
> 1686          convert_move (target, temp, 0);
> 1687          return target;
> 1688        }
> 1689      else
> 1690        return gen_lowpart (mode, temp);
> 1691    }
> 
> In the above code, target = result, when promote xop0 and xop1 machine mode
> to DI, then match adddi3 insn pattern and save the new rtx to temp, then
> because the mclass equal MODE_INT (also equal true for
> TRULY_NOOP_TRUNCATION_MODES_P function), the code enter else branch and
> return lowpart of temp which is irrelevant with target (i.e. result).

Sure, but expand_simple_binop doesn't need to return 'target'

> Maybe the expand_binop function does not consider the case of dependency
> with `target` when generating rtx for the case of promote MODE_INT mode, and
> maybe theoretically it does not need to be considered, except that
> builtin_strcmp happens to meet such cases, so I think it is enough to modify
> the processing logic of builtin_strcmp (my humble opinion).

I'm still missing the obvious?  We assign the result to 'result', so it
should be all fine?  I see that emit_cmp_and_jump_insns eventually forces
the lowpart to a register but still it should work.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug rtl-optimization/109086] Bug of builtin_strcmp in the case of using the adddi3 instruction patterns
  2023-03-10  7:36 [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns liwei at loongson dot cn
                   ` (3 preceding siblings ...)
  2023-03-10 13:12 ` rguenth at gcc dot gnu.org
@ 2023-03-14 14:59 ` xry111 at gcc dot gnu.org
  2023-03-15  2:00 ` xry111 at gcc dot gnu.org
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-03-14 14:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109086

Xi Ruoyao <xry111 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xry111 at gcc dot gnu.org

--- Comment #5 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
Definitely not a __builtin_strlen expansion issue.  Things start to go wrong in
318r.bbro pass.  In 317r.rtl.dce:

note 42 17 18 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(note 18 42 19 4 NOTE_INSN_DELETED)
(insn 19 18 20 4 (set (reg:DI 13 $r13 [orig:92 MEM <char[1:2]> [(void
*)filename_3(D)]+1 ] [92])
        (zero_extend:DI (mem:QI (plus:DI (reg/v/f:DI 4 $r4 [orig:82 filename ]
[82])
                    (const_int 1 [0x1])) [0 MEM <char[1:2]> [(void
*)filename_3(D)]+1 S1 A8]))) "t.c":6:30 discrim 1 107 {zero_extendqidi2}
     (nil))
(code_label 20 19 43 5 3 (nil) [1 uses])
(note 43 20 21 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(insn 21 43 22 5 (set (reg:DI 12 $r12 [orig:80 _1 ] [80])
        (sign_extend:DI (reg:SI 13 $r13 [orig:92 MEM <char[1:2]> [(void
*)filename_3(D)]+1 ] [92]))) "t.c":6:30 discrim 1 116 {extendsidi2}
     (expr_list:REG_DEAD (reg:SI 13 $r13 [orig:92 MEM <char[1:2]> [(void
*)filename_3(D)]+1 ] [92])
        (nil)))

But in 318r.bbro:

Reordered sequence:
 2 bb 2
 3 bb 3
 4 bb 5
 5 bb 7
 6 bb 8
 7 bb 4
 8 duplicate of 5

and...

(code_label 20 19 43 4 3 (nil) [1 uses])
(note 43 20 21 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 21 43 22 4 (set (reg:DI 12 $r12 [orig:80 _1 ] [80])
        (sign_extend:DI (reg:SI 13 $r13 [orig:92 MEM <char[1:2]> [(void
*)filename_3(D)]+1 ] [92]))) "t.c":6:30 discrim 1 116 {extendsidi2}
     (expr_list:REG_DEAD (reg:SI 13 $r13 [orig:92 MEM <char[1:2]> [(void
*)filename_3(D)]+1 ] [92])
        (nil)))

Apparently the BB reorder pass believe BB 5 does not depends on BB 4.  Any idea
why this could even happen?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug rtl-optimization/109086] Bug of builtin_strcmp in the case of using the adddi3 instruction patterns
  2023-03-10  7:36 [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns liwei at loongson dot cn
                   ` (4 preceding siblings ...)
  2023-03-14 14:59 ` xry111 at gcc dot gnu.org
@ 2023-03-15  2:00 ` xry111 at gcc dot gnu.org
  2023-03-15  2:04 ` xry111 at gcc dot gnu.org
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-03-15  2:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109086

--- Comment #6 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
(In reply to Xi Ruoyao from comment #5)
> Definitely not a __builtin_strlen expansion issue.  Things start to go wrong
> in 318r.bbro pass.  In 317r.rtl.dce:
> 
> note 42 17 18 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> (note 18 42 19 4 NOTE_INSN_DELETED)
> (insn 19 18 20 4 (set (reg:DI 13 $r13 [orig:92 MEM <char[1:2]> [(void
> *)filename_3(D)]+1 ] [92])
>         (zero_extend:DI (mem:QI (plus:DI (reg/v/f:DI 4 $r4 [orig:82 filename
> ] [82])
>                     (const_int 1 [0x1])) [0 MEM <char[1:2]> [(void
> *)filename_3(D)]+1 S1 A8]))) "t.c":6:30 discrim 1 107 {zero_extendqidi2}
>      (nil))
> (code_label 20 19 43 5 3 (nil) [1 uses])
> (note 43 20 21 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
> (insn 21 43 22 5 (set (reg:DI 12 $r12 [orig:80 _1 ] [80])
>         (sign_extend:DI (reg:SI 13 $r13 [orig:92 MEM <char[1:2]> [(void
> *)filename_3(D)]+1 ] [92]))) "t.c":6:30 discrim 1 116 {extendsidi2}
>      (expr_list:REG_DEAD (reg:SI 13 $r13 [orig:92 MEM <char[1:2]> [(void
> *)filename_3(D)]+1 ] [92])
>         (nil)))
> 
> But in 318r.bbro:
> 
> Reordered sequence:
>  2 bb 2
>  3 bb 3
>  4 bb 5
>  5 bb 7
>  6 bb 8
>  7 bb 4
>  8 duplicate of 5
> 
> and...
> 
> (code_label 20 19 43 4 3 (nil) [1 uses])
> (note 43 20 21 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> (insn 21 43 22 4 (set (reg:DI 12 $r12 [orig:80 _1 ] [80])
>         (sign_extend:DI (reg:SI 13 $r13 [orig:92 MEM <char[1:2]> [(void
> *)filename_3(D)]+1 ] [92]))) "t.c":6:30 discrim 1 116 {extendsidi2}
>      (expr_list:REG_DEAD (reg:SI 13 $r13 [orig:92 MEM <char[1:2]> [(void
> *)filename_3(D)]+1 ] [92])
>         (nil)))
> 
> Apparently the BB reorder pass believe BB 5 does not depends on BB 4.  Any
> idea why this could even happen?

Wrong analysis, looks like I was too sleepy yesterday.  I'll try to post a new
one now...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug rtl-optimization/109086] Bug of builtin_strcmp in the case of using the adddi3 instruction patterns
  2023-03-10  7:36 [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns liwei at loongson dot cn
                   ` (5 preceding siblings ...)
  2023-03-15  2:00 ` xry111 at gcc dot gnu.org
@ 2023-03-15  2:04 ` xry111 at gcc dot gnu.org
  2023-03-15  2:17 ` liwei at loongson dot cn
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-03-15  2:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109086

--- Comment #7 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
Things are already wrong in 255r:

(jump_insn 17 16 42 4 (set (pc)
        (if_then_else (ne (reg:DI 90)
                (const_int 0 [0]))
            (label_ref 20)
            (pc))) "t.c":4:23 discrim 1 -1
     (int_list:REG_BR_PROB 536870916 (nil))
 -> 20)
(note 42 17 18 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(insn 18 42 19 5 (set (reg:SI 91)
        (zero_extend:SI (mem:QI (plus:DI (reg/v/f:DI 82 [ filename ])
                    (const_int 1 [0x1])) [0 MEM <char[1:2]> [(void
*)filename_3(D)]+1 S1 A8]))) "t.c":4:23 discrim 1 -1
     (nil))
(insn 19 18 20 5 (set (reg:DI 92)
        (plus:DI (subreg:DI (reg:SI 91) 0)
            (const_int 0 [0]))) "t.c":4:23 discrim 1 -1
     (nil))
(code_label 20 19 43 6 3 (nil) [1 uses])
(note 43 20 21 6 [bb 6] NOTE_INSN_BASIC_BLOCK)
(insn 21 43 22 6 (set (reg:DI 80 [ _1 ])
        (sign_extend:DI (subreg:SI (reg:DI 92) 0))) "t.c":4:23 discrim 1 -1
     (nil))

Note that the jump_insn jumps over insn 19 which initializes the vreg 92 (which
would become r13).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug rtl-optimization/109086] Bug of builtin_strcmp in the case of using the adddi3 instruction patterns
  2023-03-10  7:36 [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns liwei at loongson dot cn
                   ` (6 preceding siblings ...)
  2023-03-15  2:04 ` xry111 at gcc dot gnu.org
@ 2023-03-15  2:17 ` liwei at loongson dot cn
  2023-03-15  3:02 ` xry111 at gcc dot gnu.org
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: liwei at loongson dot cn @ 2023-03-15  2:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109086

--- Comment #8 from liwei at loongson dot cn ---
(In reply to Xi Ruoyao from comment #7)
> Things are already wrong in 255r:
> 
> (jump_insn 17 16 42 4 (set (pc)
>         (if_then_else (ne (reg:DI 90)
>                 (const_int 0 [0]))
>             (label_ref 20)
>             (pc))) "t.c":4:23 discrim 1 -1
>      (int_list:REG_BR_PROB 536870916 (nil))
>  -> 20)
> (note 42 17 18 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
> (insn 18 42 19 5 (set (reg:SI 91)
>         (zero_extend:SI (mem:QI (plus:DI (reg/v/f:DI 82 [ filename ])
>                     (const_int 1 [0x1])) [0 MEM <char[1:2]> [(void
> *)filename_3(D)]+1 S1 A8]))) "t.c":4:23 discrim 1 -1
>      (nil))
> (insn 19 18 20 5 (set (reg:DI 92)
>         (plus:DI (subreg:DI (reg:SI 91) 0)
>             (const_int 0 [0]))) "t.c":4:23 discrim 1 -1
>      (nil))
> (code_label 20 19 43 6 3 (nil) [1 uses])
> (note 43 20 21 6 [bb 6] NOTE_INSN_BASIC_BLOCK)
> (insn 21 43 22 6 (set (reg:DI 80 [ _1 ])
>         (sign_extend:DI (subreg:SI (reg:DI 92) 0))) "t.c":4:23 discrim 1 -1
>      (nil))
> 
> Note that the jump_insn jumps over insn 19 which initializes the vreg 92
> (which would become r13).

Thanks for the reply, i am trying to debug inline_string_cmp again to explain
the problem in more detail.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug rtl-optimization/109086] Bug of builtin_strcmp in the case of using the adddi3 instruction patterns
  2023-03-10  7:36 [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns liwei at loongson dot cn
                   ` (7 preceding siblings ...)
  2023-03-15  2:17 ` liwei at loongson dot cn
@ 2023-03-15  3:02 ` xry111 at gcc dot gnu.org
  2023-03-15  4:52 ` liwei at loongson dot cn
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-03-15  3:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109086

--- Comment #9 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)

> > Maybe the expand_binop function does not consider the case of dependency
> > with `target` when generating rtx for the case of promote MODE_INT mode, and
> > maybe theoretically it does not need to be considered, except that
> > builtin_strcmp happens to meet such cases, so I think it is enough to modify
> > the processing logic of builtin_strcmp (my humble opinion).
> 
> I'm still missing the obvious?  We assign the result to 'result', so it
> should be all fine?  I see that emit_cmp_and_jump_insns eventually forces
> the lowpart to a register but still it should work.

The problem here is we are assigning the result (i. e. virtual register number)
to 'result', not *moving* the result (i. e. the difference between s[i] and
t[i]) into 'result'.

When expand_simple_binop is allowed to assign new vreg in the loop, it's
possible that loop iteration 1 uses vreg1, iteration 2 use vreg2, ...,
iteration N use vregN.  When we move the result into 'result', the "moving"
will happen only if the execution really reaches the move insn, such a behavior
is correct.

But we we assign the result into 'result', the assignment will "always" happen
(because assignment is a logic in the compiler itself, unlike moving which is a
part of target code).  I. e. we always return vregN as the result, which is not
correct if the loop jumps out early.

I'd suggest

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 305c65c29be..0c14a3b52b3 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -7142,8 +7142,11 @@ inline_string_cmp (rtx target, tree var_str, const char
*const_str,

       op0 = convert_modes (mode, unit_mode, op0, 1);
       op1 = convert_modes (mode, unit_mode, op1, 1);
-      result = expand_simple_binop (mode, MINUS, op0, op1,
-                                   result, 1, OPTAB_WIDEN);
+      rtx diff = expand_simple_binop (mode, MINUS, op0, op1,
+                                     result, 1, OPTAB_WIDEN);
+      if (diff != result)
+       emit_move_insn (result, diff);
+
       if (i < length - 1)
        emit_cmp_and_jump_insns (result, CONST0_RTX (mode), NE, NULL_RTX,
                                 mode, true, ne_label);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug rtl-optimization/109086] Bug of builtin_strcmp in the case of using the adddi3 instruction patterns
  2023-03-10  7:36 [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns liwei at loongson dot cn
                   ` (8 preceding siblings ...)
  2023-03-15  3:02 ` xry111 at gcc dot gnu.org
@ 2023-03-15  4:52 ` liwei at loongson dot cn
  2023-03-15  7:33 ` [Bug other/109086] __builtin_strcmp generates wrong code if expand_simple_binop assigns new pseudo-register for result xry111 at gcc dot gnu.org
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: liwei at loongson dot cn @ 2023-03-15  4:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109086

--- Comment #10 from liwei at loongson dot cn ---
(In reply to Richard Biener from comment #4)
> (In reply to liwei from comment #3)
> > Thank you for your quick reply!
> > 
> > The final call of the expand_simple_binop function returns part of the code
> > as follows (optabs.cc):
> > 
> > 1671  xop0 = widen_operand (xop0, wider_mode, mode, unsignedp, no_extend);
> > 1672  
> > 1673  /* The second operand of a shift must always be extended.  */
> > 1674  xop1 = widen_operand (xop1, wider_mode, mode, unsignedp,
> > 1675      		  no_extend && binoptab != ashl_optab);
> > 1676  
> > 1677  temp = expand_binop (wider_mode, binoptab, xop0, xop1, NULL_RTX,
> > 1678      		 unsignedp, OPTAB_DIRECT);
> > 1679  if (temp)
> > 1680    {
> > 1681      if (mclass != MODE_INT
> > 1682          || !TRULY_NOOP_TRUNCATION_MODES_P (mode, wider_mode))
> > 1683        {
> > 1684          if (target == 0)
> > 1685            target = gen_reg_rtx (mode);
> > 1686          convert_move (target, temp, 0);
> > 1687          return target;
> > 1688        }
> > 1689      else
> > 1690        return gen_lowpart (mode, temp);
> > 1691    }
> > 
> > In the above code, target = result, when promote xop0 and xop1 machine mode
> > to DI, then match adddi3 insn pattern and save the new rtx to temp, then
> > because the mclass equal MODE_INT (also equal true for
> > TRULY_NOOP_TRUNCATION_MODES_P function), the code enter else branch and
> > return lowpart of temp which is irrelevant with target (i.e. result).
> 
> Sure, but expand_simple_binop doesn't need to return 'target'
> 
> > Maybe the expand_binop function does not consider the case of dependency
> > with `target` when generating rtx for the case of promote MODE_INT mode, and
> > maybe theoretically it does not need to be considered, except that
> > builtin_strcmp happens to meet such cases, so I think it is enough to modify
> > the processing logic of builtin_strcmp (my humble opinion).
> 
> I'm still missing the obvious?  We assign the result to 'result', so it
> should be all fine?  I see that emit_cmp_and_jump_insns eventually forces
> the lowpart to a register but still it should work.

Thanks for Xi Ruoyao's explanation, here I will add a specific diff example.
Following is the expand pass result of add above bug patch:

;; expand_simple_binop in loop where i = 0
(insn 13 12 14 (set (reg:DI 87)
        (plus:DI (subreg:DI (reg:SI 86) 0)
                (const_int -45 [0xffffffffffffffd3]))))

;; emit_cmp_and_jump_insns in loop where i = 0
(insn 14 13 15 (set (reg:DI 88)
        (sign_extend:DI (subreg:SI (reg:DI 87) 0))))
(jump_insn 15 14 16 (set (pc)
        (if_then_else (ne (reg:DI 88)
                        (const_int 0 [0]))
                (label_ref 18)
                (pc))))
...
;; expand_simple_binop in loop where i = 1
(insn 17 16 18 (set (reg:DI 90)
        (plus:DI (subreg:DI (reg:SI 89) 0)
                (const_int 0 [0]))))
(code_label 18 17 19 3 (nil))

;; finish and return result -> subreg:SI (reg:DI 90) 0)
;; next insn
(insn 19 18 0 (set (reg:DI 80 [ _1 ])
        (sign_extend:DI (subreg:SI (reg:DI 90) 0))))

The problem is in insn 17, because the compare result is set in reg:DI 90,
which  
 is diff with compare result(loop0) set in reg:DI 87. The reason for this is
because after loop0 the rtx result = subreg:SI (reg:DI 87) 0), then in
expand_simple_binop function, due to no addsi3 insn pattern, gcc promote mode
and match adddi3 and finally return temporary register (reg:DI 90), and lowpart
it to match mode so after loop1 the result = (subreg:SI (reg:DI 90)).

The way to solve this kind of problem is to record the result of the previous
loop comparison in loop1 and later, and when the comparison result in the
current loop is not the same as the previous one, correct it by `move`, so that
no matter which branch it jumps from, the result is correct.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug other/109086] __builtin_strcmp generates wrong code if expand_simple_binop assigns new pseudo-register for result
  2023-03-10  7:36 [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns liwei at loongson dot cn
                   ` (9 preceding siblings ...)
  2023-03-15  4:52 ` liwei at loongson dot cn
@ 2023-03-15  7:33 ` xry111 at gcc dot gnu.org
  2023-03-15  7:36 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-03-15  7:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109086

Xi Ruoyao <xry111 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |xry111 at gcc dot gnu.org
             Target|loongarch                   |
            Summary|Bug of builtin_strcmp in    |__builtin_strcmp generates
                   |the case of using the       |wrong code if
                   |adddi3 instruction patterns |expand_simple_binop assigns
                   |                            |new pseudo-register for
                   |                            |result
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2023-03-15
     Ever confirmed|0                           |1
          Component|rtl-optimization            |other

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug other/109086] __builtin_strcmp generates wrong code if expand_simple_binop assigns new pseudo-register for result
  2023-03-10  7:36 [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns liwei at loongson dot cn
                   ` (10 preceding siblings ...)
  2023-03-15  7:33 ` [Bug other/109086] __builtin_strcmp generates wrong code if expand_simple_binop assigns new pseudo-register for result xry111 at gcc dot gnu.org
@ 2023-03-15  7:36 ` rguenth at gcc dot gnu.org
  2023-03-15  9:28 ` cvs-commit at gcc dot gnu.org
  2023-03-15  9:29 ` xry111 at gcc dot gnu.org
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-03-15  7:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109086

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so the issue is that expand_simple_binop will not necessarily return a
register and emit_cmp_and_jump_insns eventually mangles things.

The proposed fix looks OK, but please add a comment like

 /* Force the result into a register.  */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug other/109086] __builtin_strcmp generates wrong code if expand_simple_binop assigns new pseudo-register for result
  2023-03-10  7:36 [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns liwei at loongson dot cn
                   ` (11 preceding siblings ...)
  2023-03-15  7:36 ` rguenth at gcc dot gnu.org
@ 2023-03-15  9:28 ` cvs-commit at gcc dot gnu.org
  2023-03-15  9:29 ` xry111 at gcc dot gnu.org
  13 siblings, 0 replies; 15+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-03-15  9:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109086

--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Xi Ruoyao <xry111@gcc.gnu.org>:

https://gcc.gnu.org/g:45641f3a99281bb0a429649741a29c2aace4c63e

commit r13-6690-g45641f3a99281bb0a429649741a29c2aace4c63e
Author: Xi Ruoyao <xry111@xry111.site>
Date:   Wed Mar 15 15:34:52 2023 +0800

    builtins: Move the character difference into result instead of reassigning
result [PR109086]

    expand_simple_binop() is allowed to allocate a new pseudo-register and
    return it, instead of forcing the result into the provided
    pseudo-register.  This can cause a problem when we expand the unrolled
    loop for __builtin_strcmp: the compiler always generates code for all n
    iterations of the loop, so "result" will be an alias of the
    pseudo-register allocated and used in the last iteration; but at runtime
    the loop can break early, causing this pseudo-register uninitialized.

    Emit a move instruction in the iteration to force the difference into
    one register which has been allocated before the loop, to avoid this
    issue.

    gcc/ChangeLog:

            PR other/109086
            * builtins.cc (inline_string_cmp): Force the character
            difference into "result" pseudo-register, instead of reassign
            the pseudo-register.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug other/109086] __builtin_strcmp generates wrong code if expand_simple_binop assigns new pseudo-register for result
  2023-03-10  7:36 [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns liwei at loongson dot cn
                   ` (12 preceding siblings ...)
  2023-03-15  9:28 ` cvs-commit at gcc dot gnu.org
@ 2023-03-15  9:29 ` xry111 at gcc dot gnu.org
  13 siblings, 0 replies; 15+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-03-15  9:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109086

Xi Ruoyao <xry111 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
Pushed the patch with a comment explaining the if (result != diff) block.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-03-15  9:29 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-10  7:36 [Bug target/109086] New: Bug of builtin_strcmp in the case of using the adddi3 instruction patterns liwei at loongson dot cn
2023-03-10  7:38 ` [Bug target/109086] " liwei at loongson dot cn
2023-03-10  8:18 ` [Bug rtl-optimization/109086] " rguenth at gcc dot gnu.org
2023-03-10  9:31 ` liwei at loongson dot cn
2023-03-10 13:12 ` rguenth at gcc dot gnu.org
2023-03-14 14:59 ` xry111 at gcc dot gnu.org
2023-03-15  2:00 ` xry111 at gcc dot gnu.org
2023-03-15  2:04 ` xry111 at gcc dot gnu.org
2023-03-15  2:17 ` liwei at loongson dot cn
2023-03-15  3:02 ` xry111 at gcc dot gnu.org
2023-03-15  4:52 ` liwei at loongson dot cn
2023-03-15  7:33 ` [Bug other/109086] __builtin_strcmp generates wrong code if expand_simple_binop assigns new pseudo-register for result xry111 at gcc dot gnu.org
2023-03-15  7:36 ` rguenth at gcc dot gnu.org
2023-03-15  9:28 ` cvs-commit at gcc dot gnu.org
2023-03-15  9:29 ` xry111 at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).