public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [committed] [RISC-V] Avoid sub-word mode comparisons with Zicond
@ 2023-08-02  5:41 Jeff Law
  2023-08-05  7:46 ` Xiao Zeng
  0 siblings, 1 reply; 3+ messages in thread
From: Jeff Law @ 2023-08-02  5:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: Xiao Zeng

[-- Attachment #1: Type: text/plain, Size: 4466 bytes --]


c-torture/execute/pr59014-2.c fails with the Zicond work on rv64.  We 
miscompile the "foo" routine because we have eliminated a required sign 
extension.

The key routine looks like this:

foo (long long int x, long long int y)
{
   if (((int) x | (int) y) != 0)
     return 6;
   return x + y;
}

So we kindof do the expected thing at expansion time.  We IOR X and Y, 
sign extend the result from 32 to 64 bits (note how the values in the 
source are casted from long long to ints), then emit a suitable 
conditional branch.  ie:

> (insn 10 4 12 2 (set (reg:DI 142)
>         (ior:DI (reg/v:DI 138 [ x ])
>             (reg/v:DI 139 [ y ]))) "j.c":6:16 99 {iordi3}
>      (nil))
> (insn 12 10 13 2 (set (reg:DI 144)
>         (sign_extend:DI (subreg:SI (reg:DI 142) 0))) "j.c":6:6 116 {extendsidi2}
>      (nil))
> (jump_insn 13 12 14 2 (set (pc)
>         (if_then_else (ne (reg:DI 144)
>                 (const_int 0 [0]))
>             (label_ref:DI 27)
>             (pc))) "j.c":6:6 243 {*branchdi}
>      (expr_list:REG_DEAD (reg:DI 144)
>         (int_list:REG_BR_PROB 233216732 (nil)))

When we if-convert that we generate this sequence:

> (insn 10 4 12 2 (set (reg:DI 142)
>         (ior:DI (reg/v:DI 138 [ x ])
>             (reg/v:DI 139 [ y ]))) "j.c":6:16 99 {iordi3}
>      (nil))
> (insn 12 10 30 2 (set (reg:DI 144)
>         (sign_extend:DI (subreg:SI (reg:DI 142) 0))) "j.c":6:6 116 {extendsidi2}
>      (nil))
> (insn 30 12 31 2 (set (reg:DI 147)
>         (const_int 6 [0x6])) "j.c":8:12 179 {*movdi_64bit}
>      (nil))     
> (insn 31 30 33 2 (set (reg:DI 146)
>         (plus:DI (reg/v:DI 138 [ x ])
>             (reg/v:DI 139 [ y ]))) "j.c":8:12 5 {adddi3} 
>      (nil))      
> (insn 33 31 34 2 (set (reg:DI 149)
>         (if_then_else:DI (ne:DI (reg:DI 144)
>                 (const_int 0 [0]))
>             (const_int 0 [0])
>             (reg:DI 146))) "j.c":8:12 11368 {*czero.nez.didi}
>      (nil)) 
> (insn 34 33 35 2 (set (reg:DI 148) 
>         (if_then_else:DI (eq:DI (reg:DI 144)
>                 (const_int 0 [0]))
>             (const_int 0 [0])
>             (reg:DI 147))) "j.c":8:12 11367 {*czero.eqz.didi}
>      (nil))
> (insn 35 34 21 2 (set (reg:DI 137 [ <retval> ])
>         (ior:DI (reg:DI 148)
>             (reg:DI 149))) "j.c":8:12 99 {iordi3}
>      (nil))

Which looks basically OK.  The sign extended subreg is a bit worrisome 
though.  And sure enough when we get into combine:

> Failed to match this instruction:
> (parallel [
>         (set (reg:DI 149)
>             (if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0)
>                     (const_int 0 [0]))
>                 (reg:DI 146)
>                 (const_int 0 [0])))
>         (set (reg:DI 144)
>             (sign_extend:DI (subreg:SI (reg:DI 142) 0)))
>     ])
> Successfully matched this instruction:
> (set (reg:DI 144)
>     (sign_extend:DI (subreg:SI (reg:DI 142) 0)))
> Successfully matched this instruction:
> (set (reg:DI 149)
>     (if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0)
>             (const_int 0 [0]))
>         (reg:DI 146)
>         (const_int 0 [0])))
> allowing combination of insns 12 and 33

Since we need the side effect we first try the PARALLEL with two sets. 
That, as expected, fails.  Generic combine code then tries to pull apart 
the two sets as distinct insns resulting in this conditional move:

> (insn 33 31 34 2 (set (reg:DI 149)
>         (if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0)
>                 (const_int 0 [0]))
>             (reg:DI 146)
>             (const_int 0 [0]))) "j.c":8:12 11347 {*czero.nez.disi}
>      (expr_list:REG_DEAD (reg:DI 146)
>         (nil)))

Bzzt.  We can't actually implement this RTL in the hardware.  Basically 
it's asking to do 32bit comparison on rv64, ignoring the upper 32 bits 
of the input register.  That's not actually how zicond works.

The operands to the comparison need to be in DImode for rv64 and SImode 
for rv32.  That's the X iterator.  Note the mode of the comparison 
operands may be different than the mode of the destination.  ie, we 
might have a 64bit comparison and produce a 32bit sign extended result 
much like the setcc insns support.

This patch changes the 6 zicond patterns to use the X iterator on the 
comparison inputs.  That at least makes the patterns correct and fixes 
this particular testcase.   There's a few other lurking problems that 
I'll address in additional patches.

Committed to the trunk,
Jeff





[-- Attachment #2: P --]
[-- Type: text/plain, Size: 8890 bytes --]

commit 2d73f2eb80caf328bc4dd1324d475e7bf6b56837
Author: Jeff Law <jlaw@ventanamicro.com>
Date:   Tue Aug 1 23:12:16 2023 -0600

    [committed] [RISC-V] Avoid sub-word mode comparisons with Zicond
    
    c-torture/execute/pr59014-2.c fails with the Zicond work on rv64.  We
    miscompile the "foo" routine because we have eliminated a required sign
    extension.
    
    The key routine looks like this:
    
    foo (long long int x, long long int y)
    {
      if (((int) x | (int) y) != 0)
        return 6;
      return x + y;
    }
    
    So we kindof do the expected thing.  We IOR X and Y, sign extend the result
    from 32 to 64 bits, then emit a suitable conditional branch.  ie:
    
    > (insn 10 4 12 2 (set (reg:DI 142)
    >         (ior:DI (reg/v:DI 138 [ x ])
    >             (reg/v:DI 139 [ y ]))) "j.c":6:16 99 {iordi3}
    >      (nil))
    > (insn 12 10 13 2 (set (reg:DI 144)
    >         (sign_extend:DI (subreg:SI (reg:DI 142) 0))) "j.c":6:6 116 {extendsidi2}
    >      (nil))
    > (jump_insn 13 12 14 2 (set (pc)
    >         (if_then_else (ne (reg:DI 144)
    >                 (const_int 0 [0]))
    >             (label_ref:DI 27)
    >             (pc))) "j.c":6:6 243 {*branchdi}
    >      (expr_list:REG_DEAD (reg:DI 144)
    >         (int_list:REG_BR_PROB 233216732 (nil)))
    When we if-convert that we generate this sequence:
    
    > (insn 10 4 12 2 (set (reg:DI 142)
    >         (ior:DI (reg/v:DI 138 [ x ])
    >             (reg/v:DI 139 [ y ]))) "j.c":6:16 99 {iordi3}
    >      (nil))
    > (insn 12 10 30 2 (set (reg:DI 144)
    >         (sign_extend:DI (subreg:SI (reg:DI 142) 0))) "j.c":6:6 116 {extendsidi2}
    >      (nil))
    > (insn 30 12 31 2 (set (reg:DI 147)
    >         (const_int 6 [0x6])) "j.c":8:12 179 {*movdi_64bit}
    >      (nil))
    > (insn 31 30 33 2 (set (reg:DI 146)
    >         (plus:DI (reg/v:DI 138 [ x ])
    >             (reg/v:DI 139 [ y ]))) "j.c":8:12 5 {adddi3}
    >      (nil))
    > (insn 33 31 34 2 (set (reg:DI 149)
    >         (if_then_else:DI (ne:DI (reg:DI 144)
    >                 (const_int 0 [0]))
    >             (const_int 0 [0])
    >             (reg:DI 146))) "j.c":8:12 11368 {*czero.nez.didi}
    >      (nil))
    > (insn 34 33 35 2 (set (reg:DI 148)
    >         (if_then_else:DI (eq:DI (reg:DI 144)
    >                 (const_int 0 [0]))
    >             (const_int 0 [0])
    >             (reg:DI 147))) "j.c":8:12 11367 {*czero.eqz.didi}
    >      (nil))
    > (insn 35 34 21 2 (set (reg:DI 137 [ <retval> ])
    >         (ior:DI (reg:DI 148)
    >             (reg:DI 149))) "j.c":8:12 99 {iordi3}
    >      (nil))
    Which looks basically OK.  The sign extended subreg is a bit worrisome though.
    And sure enough when we get into combine:
    
    > Failed to match this instruction:
    > (parallel [
    >         (set (reg:DI 149)
    >             (if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0)
    >                     (const_int 0 [0]))
    >                 (reg:DI 146)
    >                 (const_int 0 [0])))
    >         (set (reg:DI 144)
    >             (sign_extend:DI (subreg:SI (reg:DI 142) 0)))
    >     ])
    > Successfully matched this instruction:
    > (set (reg:DI 144)
    >     (sign_extend:DI (subreg:SI (reg:DI 142) 0)))
    > Successfully matched this instruction:
    > (set (reg:DI 149)
    >     (if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0)
    >             (const_int 0 [0]))
    >         (reg:DI 146)
    >         (const_int 0 [0])))
    > allowing combination of insns 12 and 33
    Since we need the side effect we first try the PARALLEL with two sets.
    That, as expected, fails.  Generic combine code then tries to pull apart
    the two sets as distinct insns resulting in this conditional move:
    
    > (insn 33 31 34 2 (set (reg:DI 149)
    >         (if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0)
    >                 (const_int 0 [0]))
    >             (reg:DI 146)
    >             (const_int 0 [0]))) "j.c":8:12 11347 {*czero.nez.disi}
    >      (expr_list:REG_DEAD (reg:DI 146)
    >         (nil)))
    Bzzt.  We can't actually implement this RTL in the hardware.  Basically
    it's asking to do 32bit comparison on rv64, ignoring the upper 32 bits
    of the input register.  That's not actually how zicond works.
    
    The operands to the comparison need to be in DImode for rv64 and SImode
    for rv32.  That's the X iterator.  Note the mode of the comparison
    operands may be different than the mode of the destination.  ie, we might
    have a 64bit comparison and produce a 32bit sign extended result much
    like the setcc insns support.
    
    This patch changes the 6 zicond patterns to use the X iterator on the
    comparison inputs and fixes the testsuite failure.
    
    gcc/
    
            * config/riscv/zicond.md: Use the X iterator instead of ANYI
            on the comparison input operands.

diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
index 723a22422e1..8f24b3a1690 100644
--- a/gcc/config/riscv/zicond.md
+++ b/gcc/config/riscv/zicond.md
@@ -22,9 +22,9 @@ (define_code_attr eqz [(eq "nez") (ne "eqz")])
 (define_code_attr nez [(eq "eqz") (ne "nez")])
 
 ;; Zicond
-(define_insn "*czero.<eqz>.<GPR:mode><ANYI:mode>"
+(define_insn "*czero.<eqz>.<GPR:mode><X:mode>"
   [(set (match_operand:GPR 0 "register_operand"                      "=r")
-        (if_then_else:GPR (eq_or_ne (match_operand:ANYI 1 "register_operand" "r")
+        (if_then_else:GPR (eq_or_ne (match_operand:X 1 "register_operand" "r")
                                     (const_int 0))
                           (match_operand:GPR 2 "register_operand"    "r")
                           (const_int 0)))]
@@ -32,9 +32,9 @@ (define_insn "*czero.<eqz>.<GPR:mode><ANYI:mode>"
   "czero.<eqz>\t%0,%2,%1"
 )
 
-(define_insn "*czero.<nez>.<GPR:mode><ANYI:mode>"
+(define_insn "*czero.<nez>.<GPR:mode><X:mode>"
   [(set (match_operand:GPR 0 "register_operand"                     "=r")
-        (if_then_else:GPR (eq_or_ne (match_operand:ANYI 1 "register_operand" "r")
+        (if_then_else:GPR (eq_or_ne (match_operand:X 1 "register_operand" "r")
                                     (const_int 0))
                           (const_int 0)
                           (match_operand:GPR 2 "register_operand"   "r")))]
@@ -43,9 +43,9 @@ (define_insn "*czero.<nez>.<GPR:mode><ANYI:mode>"
 )
 
 ;; Special optimization under eq/ne in primitive semantics
-(define_insn "*czero.eqz.<GPR:mode><ANYI:mode>.opt1"
+(define_insn "*czero.eqz.<GPR:mode><X:mode>.opt1"
   [(set (match_operand:GPR 0 "register_operand"                   "=r")
-        (if_then_else:GPR (eq (match_operand:ANYI 1 "register_operand" "r")
+        (if_then_else:GPR (eq (match_operand:X 1 "register_operand" "r")
                               (const_int 0))
                           (match_operand:GPR 2 "register_operand" "1")
                           (match_operand:GPR 3 "register_operand" "r")))]
@@ -53,9 +53,9 @@ (define_insn "*czero.eqz.<GPR:mode><ANYI:mode>.opt1"
   "czero.eqz\t%0,%3,%1"
 )
 
-(define_insn "*czero.eqz.<GPR:mode><ANYI:mode>.opt2"
+(define_insn "*czero.eqz.<GPR:mode><X:mode>.opt2"
   [(set (match_operand:GPR 0 "register_operand"                   "=r")
-        (if_then_else:GPR (eq (match_operand:ANYI 1 "register_operand" "r")
+        (if_then_else:GPR (eq (match_operand:X 1 "register_operand" "r")
                               (const_int 0))
                           (match_operand:GPR 2 "register_operand" "r")
                           (match_operand:GPR 3 "register_operand" "1")))]
@@ -63,9 +63,9 @@ (define_insn "*czero.eqz.<GPR:mode><ANYI:mode>.opt2"
   "czero.nez\t%0,%2,%1"
 )
 
-(define_insn "*czero.nez.<GPR:mode><ANYI:mode>.opt3"
+(define_insn "*czero.nez.<GPR:mode><X:mode>.opt3"
   [(set (match_operand:GPR 0 "register_operand"                   "=r")
-        (if_then_else:GPR (ne (match_operand:ANYI 1 "register_operand" "r")
+        (if_then_else:GPR (ne (match_operand:X 1 "register_operand" "r")
                               (const_int 0))
                           (match_operand:GPR 2 "register_operand" "r")
                           (match_operand:GPR 3 "register_operand" "1")))]
@@ -73,9 +73,9 @@ (define_insn "*czero.nez.<GPR:mode><ANYI:mode>.opt3"
   "czero.eqz\t%0,%2,%1"
 )
 
-(define_insn "*czero.nez.<GPR:mode><ANYI:mode>.opt4"
+(define_insn "*czero.nez.<GPR:mode><X:mode>.opt4"
   [(set (match_operand:GPR 0 "register_operand"                   "=r")
-        (if_then_else:GPR (ne (match_operand:ANYI 1 "register_operand" "r")
+        (if_then_else:GPR (ne (match_operand:X 1 "register_operand" "r")
                               (const_int 0))
                           (match_operand:GPR 2 "register_operand" "1")
                           (match_operand:GPR 3 "register_operand" "r")))]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [committed] [RISC-V] Avoid sub-word mode comparisons with Zicond
  2023-08-02  5:41 [committed] [RISC-V] Avoid sub-word mode comparisons with Zicond Jeff Law
@ 2023-08-05  7:46 ` Xiao Zeng
  2023-08-05 14:56   ` Jeff Law
  0 siblings, 1 reply; 3+ messages in thread
From: Xiao Zeng @ 2023-08-05  7:46 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: gaofei

On Wed, Aug 02, 2023 at 01:41:00 PM  Jeff Law <jlaw@ventanamicro.com> wrote:
>
>c-torture/execute/pr59014-2.c fails with the Zicond work on rv64.  We
>miscompile the "foo" routine because we have eliminated a required sign
>extension.
>
>The key routine looks like this:
>
>foo (long long int x, long long int y)
>{
>   if (((int) x | (int) y) != 0)
>     return 6;
>   return x + y;
>}
>
>So we kindof do the expected thing at expansion time.  We IOR X and Y,
>sign extend the result from 32 to 64 bits (note how the values in the
>source are casted from long long to ints), then emit a suitable
>conditional branch.  ie:
>
>> (insn 10 4 12 2 (set (reg:DI 142)
>>         (ior:DI (reg/v:DI 138 [ x ])
>>             (reg/v:DI 139 [ y ]))) "j.c":6:16 99 {iordi3}
>>      (nil))
>> (insn 12 10 13 2 (set (reg:DI 144)
>>         (sign_extend:DI (subreg:SI (reg:DI 142) 0))) "j.c":6:6 116 {extendsidi2}
>>      (nil))
>> (jump_insn 13 12 14 2 (set (pc)
>>         (if_then_else (ne (reg:DI 144)
>>                 (const_int 0 [0]))
>>             (label_ref:DI 27)
>>             (pc))) "j.c":6:6 243 {*branchdi}
>>      (expr_list:REG_DEAD (reg:DI 144)
>>         (int_list:REG_BR_PROB 233216732 (nil)))
>
>When we if-convert that we generate this sequence:
>
>> (insn 10 4 12 2 (set (reg:DI 142)
>>         (ior:DI (reg/v:DI 138 [ x ])
>>             (reg/v:DI 139 [ y ]))) "j.c":6:16 99 {iordi3}
>>      (nil))
>> (insn 12 10 30 2 (set (reg:DI 144)
>>         (sign_extend:DI (subreg:SI (reg:DI 142) 0))) "j.c":6:6 116 {extendsidi2}
>>      (nil))
>> (insn 30 12 31 2 (set (reg:DI 147)
>>         (const_int 6 [0x6])) "j.c":8:12 179 {*movdi_64bit}
>>      (nil))    
>> (insn 31 30 33 2 (set (reg:DI 146)
>>         (plus:DI (reg/v:DI 138 [ x ])
>>             (reg/v:DI 139 [ y ]))) "j.c":8:12 5 {adddi3}
>>      (nil))     
>> (insn 33 31 34 2 (set (reg:DI 149)
>>         (if_then_else:DI (ne:DI (reg:DI 144)
>>                 (const_int 0 [0]))
>>             (const_int 0 [0])
>>             (reg:DI 146))) "j.c":8:12 11368 {*czero.nez.didi}
>>      (nil))
>> (insn 34 33 35 2 (set (reg:DI 148)
>>         (if_then_else:DI (eq:DI (reg:DI 144)
>>                 (const_int 0 [0]))
>>             (const_int 0 [0])
>>             (reg:DI 147))) "j.c":8:12 11367 {*czero.eqz.didi}
>>      (nil))
>> (insn 35 34 21 2 (set (reg:DI 137 [ <retval> ])
>>         (ior:DI (reg:DI 148)
>>             (reg:DI 149))) "j.c":8:12 99 {iordi3}
>>      (nil))
>
>Which looks basically OK.  The sign extended subreg is a bit worrisome
>though.  And sure enough when we get into combine:
>
>> Failed to match this instruction:
>> (parallel [
>>         (set (reg:DI 149)
>>             (if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0)
>>                     (const_int 0 [0]))
>>                 (reg:DI 146)
>>                 (const_int 0 [0])))
>>         (set (reg:DI 144)
>>             (sign_extend:DI (subreg:SI (reg:DI 142) 0)))
>>     ])
>> Successfully matched this instruction:
>> (set (reg:DI 144)
>>     (sign_extend:DI (subreg:SI (reg:DI 142) 0)))
>> Successfully matched this instruction:
>> (set (reg:DI 149)
>>     (if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0)
>>             (const_int 0 [0]))
>>         (reg:DI 146)
>>         (const_int 0 [0])))
>> allowing combination of insns 12 and 33
>
>Since we need the side effect we first try the PARALLEL with two sets.
>That, as expected, fails.  Generic combine code then tries to pull apart
>the two sets as distinct insns resulting in this conditional move:
>
>> (insn 33 31 34 2 (set (reg:DI 149)
>>         (if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0)
>>                 (const_int 0 [0]))
>>             (reg:DI 146)
>>             (const_int 0 [0]))) "j.c":8:12 11347 {*czero.nez.disi}
>>      (expr_list:REG_DEAD (reg:DI 146)
>>         (nil)))
>
>Bzzt.  We can't actually implement this RTL in the hardware.  Basically
>it's asking to do 32bit comparison on rv64, ignoring the upper 32 bits
>of the input register.  That's not actually how zicond works.
>
>The operands to the comparison need to be in DImode for rv64 and SImode
>for rv32.  That's the X iterator.  
After analyzing the rtl log, I can't agree more with this sentence.

>Note the mode of the comparison
>operands may be different than the mode of the destination.  ie, we
>might have a 64bit comparison and produce a 32bit sign extended result
>much like the setcc insns support.
>
>This patch changes the 6 zicond patterns to use the X iterator on the
>comparison inputs.  That at least makes the patterns correct and fixes
>this particular testcase.   There's a few other lurking problems that
>I'll address in additional patches.
>
>Committed to the trunk,
>Jeff

1 In any case, jeff's analysis is convincing, here I will add a little bit of my own analysis.

2 First, for the test cases:

foo(long long int x, long long int y) {
  if (((int)x | (int)y) != 0)
    return 6;
  return x + y;
}

look directly at the compared assembly code. This allows people to quickly
realize where the error occurred.

X_mode.s(right)                                                    ANYI_mode.S(error)
10 foo:                                                                  10 foo:                                                                                            
      11         or      a5,a0,a1                                          11        or      a5,a0,a1                                                                        
      12         sext.w  a5,a5                                            12        li      a4,6                                                                            
      13         addw    a0,a0,a1                                      13        addw    a0,a0,a1                                                                        
      14         li      a4,6                                                                                                                                                                                       
      15         czero.nez       a1,a0,a5                             14        czero.nez       a1,a0,a5                                                                
      16         czero.eqz       a0,a4,a5                             15        czero.eqz       a0,a4,a5                                                                
      17         or      a0,a0,a1                                          16        or      a0,a0,a1                                                                        
      18         ret                                                            17        ret        

3 You will find that the correct assembly is just one more assembly
instruction: sext.w  a5,a5, the rest of the two are exactly the same.

4 From the perspective of assembly instructions, the a5 value obtained
by sext.w a5, a5 may be different from the original a5 value, which leads
to errors in the test case.

5 However, it is difficult to directly see that an error has occurred
from the rtl log of gcc's passe.

6 I'm wondering about transforms like this:

In test.c.c.301r.ira
(insn 36 34 42 2 (set (reg:DI 153)
        (if_then_else:DI (eq:DI (subreg:SI (reg:DI 145) 0)
                (const_int 0 [0]))
            (reg:DI 149)
            (const_int 0 [0]))) "test.c":4:12 13599 {*czero.nez.disi}
     (expr_list:REG_DEAD (reg:DI 149)
        (nil)))

In test.c.c.302r.reload it becomes
(insn 36 34 42 2 (set (reg:DI 11 a1 [153])
        (if_then_else:DI (eq:DI (reg:SI 15 a5 [145])
                (const_int 0 [0]))
            (reg:DI 10 a0 [149])
            (const_int 0 [0]))) "test.c":4:12 13599 {*czero.nez.disi}
     (nil))

Obviously, (subreg:SI (reg:DI 145) 0) is transformed into (reg:SI 15 a5 [145]) after
passing through reload pass. This conversion is wrong, why did gcc not warn?

7 I'm not very familiar with reload pass, maybe someone can give me a
brief introduction, or tell me where to find relevant information? Thanks.

 
Thanks
Xiao Zeng


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [committed] [RISC-V] Avoid sub-word mode comparisons with Zicond
  2023-08-05  7:46 ` Xiao Zeng
@ 2023-08-05 14:56   ` Jeff Law
  0 siblings, 0 replies; 3+ messages in thread
From: Jeff Law @ 2023-08-05 14:56 UTC (permalink / raw)
  To: Xiao Zeng, gcc-patches; +Cc: gaofei



On 8/5/23 01:46, Xiao Zeng wrote:

>>
>> The operands to the comparison need to be in DImode for rv64 and SImode
>> for rv32.  That's the X iterator.
> After analyzing the rtl log, I can't agree more with this sentence.
> 
>> Note the mode of the comparison
>> operands may be different than the mode of the destination.  ie, we
>> might have a 64bit comparison and produce a 32bit sign extended result
>> much like the setcc insns support.
>>
>> This patch changes the 6 zicond patterns to use the X iterator on the
>> comparison inputs.  That at least makes the patterns correct and fixes
>> this particular testcase.   There's a few other lurking problems that
>> I'll address in additional patches.
>>
>> Committed to the trunk,
>> Jeff
> 
> 1 In any case, jeff's analysis is convincing, here I will add a little bit of my own analysis.
> 
> 2 First, for the test cases:
> 
> foo(long long int x, long long int y) {
>    if (((int)x | (int)y) != 0)
>      return 6;
>    return x + y;
> }
> 
> look directly at the compared assembly code. This allows people to quickly
> realize where the error occurred.
> 
> X_mode.s(right)                                                    ANYI_mode.S(error)
> 10 foo:                                                                  10 foo:
>        11         or      a5,a0,a1                                          11        or      a5,a0,a1
>        12         sext.w  a5,a5                                            12        li      a4,6
>        13         addw    a0,a0,a1                                      13        addw    a0,a0,a1
>        14         li      a4,6
>        15         czero.nez       a1,a0,a5                             14        czero.nez       a1,a0,a5
>        16         czero.eqz       a0,a4,a5                             15        czero.eqz       a0,a4,a5
>        17         or      a0,a0,a1                                          16        or      a0,a0,a1
>        18         ret                                                            17        ret
> 
> 3 You will find that the correct assembly is just one more assembly
> instruction: sext.w  a5,a5, the rest of the two are exactly the same.
> 
> 4 From the perspective of assembly instructions, the a5 value obtained
> by sext.w a5, a5 may be different from the original a5 value, which leads
> to errors in the test case.
Exactly.

> 
> 5 However, it is difficult to directly see that an error has occurred
> from the rtl log of gcc's passe.
Yes.  It's pretty subtle, but par for the course once subreg expressions 
are involved.  subregs are a major wart in the design/implementation of 
RTL due to them having fairly obscure semantics that are dependent on 
whether they are widening or narrowing subregs, if the inner object is a 
mem or something else, etc and how they interact with 
WORD_REGISTER_OPERATIONS.


> 
> 6 I'm wondering about transforms like this:
> 
> In test.c.c.301r.ira
> (insn 36 34 42 2 (set (reg:DI 153)
>          (if_then_else:DI (eq:DI (subreg:SI (reg:DI 145) 0)
>                  (const_int 0 [0]))
>              (reg:DI 149)
>              (const_int 0 [0]))) "test.c":4:12 13599 {*czero.nez.disi}
>       (expr_list:REG_DEAD (reg:DI 149)
>          (nil)))
But it's already broken in this form.   There was a sign extension in 
the IL up through the combine pass.  In fact, I nearly made the argument 
that the bug was in combine's simplification of (sign_extend (subreg 
...)) to (subreg ...).

But ultimately the hardware semantics of risc-v are that comparisons 
look at the full register, plain and simple.  So a narrowing subreg to a 
sub-word mode is simply wrong i this context.  This is also consistent 
with the implementation of conditional branches and scc insns in the 
risc-v backend.

> 
> In test.c.c.302r.reload it becomes
> (insn 36 34 42 2 (set (reg:DI 11 a1 [153])
>          (if_then_else:DI (eq:DI (reg:SI 15 a5 [145])
>                  (const_int 0 [0]))
>              (reg:DI 10 a0 [149])
>              (const_int 0 [0]))) "test.c":4:12 13599 {*czero.nez.disi}
>       (nil))
> 
> Obviously, (subreg:SI (reg:DI 145) 0) is transformed into (reg:SI 15 a5 [145]) after
> passing through reload pass. This conversion is wrong, why did gcc not warn?
The only reason subregs exist is because pseudo registers have a single 
mode -- which derives from the fact that there is precisely once 
instance of any given pseudo register.  See RTL sharing section of the 
internals manual.

A subreg expression allows us to look at a pseudo in a mode other than 
its natural mode.   Essentially its the same bits reinterpreted in 
another mode.  I'm over-simplifying a lot.  The full details of subreg 
semantics are documented in the RTL section of the internals manual.

WORD_REGISTER_OPERATIONS is a property of the target that (again 
over-simplifying) says that an operation such as an add may have a given 
size (say SImode) in the RTL, but the hardware performs the operation in 
its native word size (DImode for rv64).

While WORD_REGISTER_OPERATIONS has been around since the 90s, its 
semantics have always been a bit vague.  It's seen as a wart and I'd 
very much like to get away from using it on the RISC-V port.

Register allocation assigns hard registers to the pseudo registers and 
the RTL is changed to reflect that register allocation.  At that point 
we have no more pseudos in the IL as they have been replaced by hard 
registers.  Register allocation is split into two distinct phases.  IRA 
and LRA.  It's an over-simplification, but IRA is the main register 
allocator and LRA is primarily tasked with making each operand match its 
constraints -- ie spilling and reloading.

Hard registers are not shared objects, meaning we can have multiple 
instances of the same hard register.  So near the end of register 
allocation we transform all subregs into an appropriate reg expression. 
This simplifies the implementation of various passes that run after 
register allocation/reloading.

So the transformation (subreg:SI (reg:DI a5)) into (reg:SI a5) by the 
register allocation is valid and normal.  That subreg expression should 
never have shown up in a comparison to begin with.


> 
> 7 I'm not very familiar with reload pass, maybe someone can give me a
> brief introduction, or tell me where to find relevant information? Thanks.
There is some discussion of reload in the internals manual which covers 
some of the basics.  The implementation details are incredibly complex. 
Additionally there are two completely distinct implementations.  There 
is "reload" which was the original implementation from the late 1980s 
and is still used by some targets, though it's considered deprecated at 
this point.  And there's the LRA implementation which was added in the 
2000s.

The RISC-V port is an LRA port.  The best documentation of LRA is the 
code itself (lra-*.cc) and the papers that have presented at the various 
GNU Cauldron and GCC Summit conferences.

Jeff

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-08-05 14:56 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-02  5:41 [committed] [RISC-V] Avoid sub-word mode comparisons with Zicond Jeff Law
2023-08-05  7:46 ` Xiao Zeng
2023-08-05 14:56   ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).