public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz
@ 2023-11-01 11:29 Giuseppe Tagliavini
  2023-11-01 16:11 ` Jeff Law
  2023-11-02  8:03 ` Richard Biener
  0 siblings, 2 replies; 6+ messages in thread
From: Giuseppe Tagliavini @ 2023-11-01 11:29 UTC (permalink / raw)
  To: gcc

[-- Attachment #1: Type: text/plain, Size: 2452 bytes --]

I found an unexpected issue working with an experimental target (available here: https://github.com/EEESlab/tricore-gcc), but I was able to reproduce it on mainstream architectures. For the sake of clarity and reproducibility, I always refer to upstream code in the rest of the discussion.

Consider this simple test:

#include <stdio.h>
int f(unsigned int a) {
  unsigned int res = 8*sizeof(unsigned int) - __builtin_clz(a);
  if(res>0) printf("test passed\n");
  return res-1;
}

I tested this code on GCC 9 and GCC 11 branches, obtaining the expected result from GCC 9 and the wrong one from GCC 11. In GCC 11 and newer versions, the condition check is removed by a gimple-level optimization (I will provide details later), and the printf is always invoked at the assembly level with no branch.

According to the GCC manual, __builtin_clz "returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined." However, it is possible to define a CLZ_DEFINED_VALUE_AT_ZERO in the architecture backend to specify a defined behavior for this case. For instance, this has been done for SPARC and AARCH64 architectures. Compiling my test with SPARC GCC 13.2.0 with the -O3 flag on CompilerExplorer I got this assembly:

.LC0:
        .asciz  "test"
f:
        save    %sp, -96, %sp
        call    __clzsi2, 0
         mov    %i0, %o0
        mov     %o0, %i0
        sethi   %hi(.LC0), %o0
        call    printf, 0
         or     %o0, %lo(.LC0), %o0
        mov     31, %g1
        return  %i7+8
         sub    %g1, %o0, %o0

After some investigation, I found this optimization derives from the results of the value range propagation analysis: https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple-range-op.cc#L917
In this code, I do not understand why CLZ_DEFINED_VALUE_AT_ZERO is verified only if the function call is tagged as internal. A gimple call is tagged as internal at creation time only when there is no associated function declaration (see https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple.cc#L371), which is not the case for the builtins. From my point of view, this condition prevents the computation of the correct upper bound for this case, resulting in a wrong result from the VRP analysis.

Before considering this behavior as a bug, I prefer to ask the community to understand if there is any aspect I have missed in my reasoning.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz
  2023-11-01 11:29 Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz Giuseppe Tagliavini
@ 2023-11-01 16:11 ` Jeff Law
  2023-11-01 17:21   ` Giuseppe Tagliavini
  2023-11-02  8:03 ` Richard Biener
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff Law @ 2023-11-01 16:11 UTC (permalink / raw)
  To: Giuseppe Tagliavini, gcc



On 11/1/23 05:29, Giuseppe Tagliavini via Gcc wrote:
> I found an unexpected issue working with an experimental target (available here: https://github.com/EEESlab/tricore-gcc), but I was able to reproduce it on mainstream architectures. For the sake of clarity and reproducibility, I always refer to upstream code in the rest of the discussion.
> 
> Consider this simple test:
> 
> #include <stdio.h>
> int f(unsigned int a) {
>    unsigned int res = 8*sizeof(unsigned int) - __builtin_clz(a);
>    if(res>0) printf("test passed\n");
>    return res-1;
> }
> 
> I tested this code on GCC 9 and GCC 11 branches, obtaining the expected result from GCC 9 and the wrong one from GCC 11. In GCC 11 and newer versions, the condition check is removed by a gimple-level optimization (I will provide details later), and the printf is always invoked at the assembly level with no branch.
> 
> According to the GCC manual, __builtin_clz "returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined." However, it is possible to define a CLZ_DEFINED_VALUE_AT_ZERO in the architecture backend to specify a defined behavior for this case. For instance, this has been done for SPARC and AARCH64 architectures. Compiling my test with SPARC GCC 13.2.0 with the -O3 flag on CompilerExplorer I got this assembly:
> 
> .LC0:
>          .asciz  "test"
> f:
>          save    %sp, -96, %sp
>          call    __clzsi2, 0
>           mov    %i0, %o0
>          mov     %o0, %i0
>          sethi   %hi(.LC0), %o0
>          call    printf, 0
>           or     %o0, %lo(.LC0), %o0
>          mov     31, %g1
>          return  %i7+8
>           sub    %g1, %o0, %o0
> 
> After some investigation, I found this optimization derives from the results of the value range propagation analysis: https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple-range-op.cc#L917
> In this code, I do not understand why CLZ_DEFINED_VALUE_AT_ZERO is verified only if the function call is tagged as internal. A gimple call is tagged as internal at creation time only when there is no associated function declaration (see https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple.cc#L371), which is not the case for the builtins. From my point of view, this condition prevents the computation of the correct upper bound for this case, resulting in a wrong result from the VRP analysis.
> 
> Before considering this behavior as a bug, I prefer to ask the community to understand if there is any aspect I have missed in my reasoning.
It would help if you included the debugging dumps.

Jeff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz
  2023-11-01 16:11 ` Jeff Law
@ 2023-11-01 17:21   ` Giuseppe Tagliavini
  0 siblings, 0 replies; 6+ messages in thread
From: Giuseppe Tagliavini @ 2023-11-01 17:21 UTC (permalink / raw)
  To: jeffreyalaw, gcc


[-- Attachment #1.1: Type: text/plain, Size: 3512 bytes --]

Sure, I include the relevant tree dumps obtained with the "releases/gcc-11" branch.
The "patch_" variants represent the dumps after disabling the check on the internal flag (I include the patch for both "releases/gcc-11" and "master" branches).
The pass under investigation is "evpr"; you can see how the if condition is removed and related BBs are merged if the range analysis provides what I think is an unexpected result. The optimized dump changes accordingly, but the troublesome transformation is the one performed by the gimple VRP.

Giuseppe


________________________________
From: Jeff Law <jeffreyalaw@gmail.com>
Sent: Wednesday, November 1, 2023 5:11 PM
To: Giuseppe Tagliavini <giuseppe.tagliavini@unibo.it>; gcc@gcc.gnu.org <gcc@gcc.gnu.org>
Subject: Re: Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz



On 11/1/23 05:29, Giuseppe Tagliavini via Gcc wrote:
> I found an unexpected issue working with an experimental target (available here: https://github.com/EEESlab/tricore-gcc), but I was able to reproduce it on mainstream architectures. For the sake of clarity and reproducibility, I always refer to upstream code in the rest of the discussion.
>
> Consider this simple test:
>
> #include <stdio.h>
> int f(unsigned int a) {
>    unsigned int res = 8*sizeof(unsigned int) - __builtin_clz(a);
>    if(res>0) printf("test passed\n");
>    return res-1;
> }
>
> I tested this code on GCC 9 and GCC 11 branches, obtaining the expected result from GCC 9 and the wrong one from GCC 11. In GCC 11 and newer versions, the condition check is removed by a gimple-level optimization (I will provide details later), and the printf is always invoked at the assembly level with no branch.
>
> According to the GCC manual, __builtin_clz "returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined." However, it is possible to define a CLZ_DEFINED_VALUE_AT_ZERO in the architecture backend to specify a defined behavior for this case. For instance, this has been done for SPARC and AARCH64 architectures. Compiling my test with SPARC GCC 13.2.0 with the -O3 flag on CompilerExplorer I got this assembly:
>
> .LC0:
>          .asciz  "test"
> f:
>          save    %sp, -96, %sp
>          call    __clzsi2, 0
>           mov    %i0, %o0
>          mov     %o0, %i0
>          sethi   %hi(.LC0), %o0
>          call    printf, 0
>           or     %o0, %lo(.LC0), %o0
>          mov     31, %g1
>          return  %i7+8
>           sub    %g1, %o0, %o0
>
> After some investigation, I found this optimization derives from the results of the value range propagation analysis: https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple-range-op.cc#L917
> In this code, I do not understand why CLZ_DEFINED_VALUE_AT_ZERO is verified only if the function call is tagged as internal. A gimple call is tagged as internal at creation time only when there is no associated function declaration (see https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple.cc#L371), which is not the case for the builtins. From my point of view, this condition prevents the computation of the correct upper bound for this case, resulting in a wrong result from the VRP analysis.
>
> Before considering this behavior as a bug, I prefer to ask the community to understand if there is any aspect I have missed in my reasoning.
It would help if you included the debugging dumps.

Jeff

[-- Attachment #2: gcc-master.patch --]
[-- Type: application/octet-stream, Size: 493 bytes --]

--- a/gcc/gimple-range-op.cc	2023-11-01 17:47:16.181068263 +0100
+++ b/gcc/gimple-range-op.cc	2023-11-01 17:48:51.928619647 +0100
@@ -929,7 +929,8 @@
   int maxi = prec - 1;
   int zerov = 0;
   scalar_int_mode mode = SCALAR_INT_TYPE_MODE (lh.type ());
-  if (m_gimple_call_internal_p)
+  // TODO Check the validity of this condition
+  // if (m_gimple_call_internal_p)
     {
       if (optab_handler (clz_optab, mode) != CODE_FOR_nothing
 	  && CLZ_DEFINED_VALUE_AT_ZERO (mode, zerov) == 2)

[-- Attachment #3: test.c.244t.optimized --]
[-- Type: application/octet-stream, Size: 326 bytes --]


;; Function f (f, funcdef_no=3, decl_uid=2341, cgraph_uid=4, symbol_order=3)

int f (unsigned int a)
{
  int _1;
  unsigned int _2;
  int _6;
  unsigned int _7;

  <bb 2> [local count: 1073741824]:
  _1 = __builtin_clz (a_3(D));
  printf ("test");
  _7 = (unsigned int) _1;
  _2 = 31 - _7;
  _6 = (int) _2;
  return _6;

}



[-- Attachment #4: patch_test.c.244t.optimized --]
[-- Type: application/octet-stream, Size: 500 bytes --]


;; Function f (f, funcdef_no=3, decl_uid=2341, cgraph_uid=4, symbol_order=3)

Removing basic block 5
int f (unsigned int a)
{
  int _1;
  unsigned int _3;
  int _9;
  unsigned int _10;

  <bb 2> [local count: 1073741824]:
  _1 = __builtin_clz (a_5(D));
  if (_1 != 32)
    goto <bb 3>; [33.00%]
  else
    goto <bb 4>; [67.00%]

  <bb 3> [local count: 354334800]:
  printf ("test");

  <bb 4> [local count: 1073741824]:
  _10 = (unsigned int) _1;
  _3 = 31 - _10;
  _9 = (int) _3;
  return _9;

}



[-- Attachment #5: patch_test.c.038t.evrp --]
[-- Type: application/octet-stream, Size: 823 bytes --]


;; Function f (f, funcdef_no=3, decl_uid=2341, cgraph_uid=4, symbol_order=3)

;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4
;; 2 succs { 3 4 }
;; 3 succs { 4 }
;; 4 succs { 1 }

Value ranges after Early VRP:

_1: int [0, 32]
_2: int [0, 32]
_3: unsigned int ~[32, 4294967294]
a_5(D): unsigned int VARYING
res_6: unsigned int [0, 32]
_9: int [-1, 31]
_10: unsigned int [0, 32]


int f (unsigned int a)
{
  unsigned int res;
  int _1;
  int _2;
  unsigned int _3;
  int _9;
  unsigned int _10;

  <bb 2> :
  _1 = __builtin_clz (a_5(D));
  _2 = 32 - _1;
  res_6 = (unsigned int) _2;
  if (res_6 != 0)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  printf ("test");

  <bb 4> :
  _10 = (unsigned int) _1;
  _3 = 31 - _10;
  _9 = (int) _3;
  return _9;

}



[-- Attachment #6: test.c.006t.gimple --]
[-- Type: application/octet-stream, Size: 290 bytes --]

int f (unsigned int a)
{
  int D.2347;
  unsigned int res;

  _1 = __builtin_clz (a);
  _2 = 32 - _1;
  res = (unsigned int) _2;
  if (res != 0) goto <D.2345>; else goto <D.2346>;
  <D.2345>:
  printf ("test");
  <D.2346>:
  _3 = res + 4294967295;
  D.2347 = (int) _3;
  return D.2347;
}



[-- Attachment #7: test.c.037t.fre1 --]
[-- Type: application/octet-stream, Size: 622 bytes --]


;; Function f (f, funcdef_no=0, decl_uid=1409, cgraph_uid=1, symbol_order=0)

;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4
;; 2 succs { 3 4 }
;; 3 succs { 4 }
;; 4 succs { 1 }
int f (unsigned int a)
{
  unsigned int res;
  int _1;
  int _2;
  unsigned int _3;
  int _9;
  unsigned int _10;

  <bb 2> :
  _1 = __builtin_clz (a_5(D));
  _2 = 32 - _1;
  res_6 = (unsigned int) _2;
  if (res_6 != 0)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  printf ("test");

  <bb 4> :
  _10 = (unsigned int) _1;
  _3 = 31 - _10;
  _9 = (int) _3;
  return _9;

}



[-- Attachment #8: test.c.038t.evrp --]
[-- Type: application/octet-stream, Size: 763 bytes --]


;; Function f (f, funcdef_no=0, decl_uid=1409, cgraph_uid=1, symbol_order=0)

;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4
;; 2 succs { 3 4 }
;; 3 succs { 4 }
;; 4 succs { 1 }

Value ranges after Early VRP:

_1: int [0, 31]
_2: int [1, 32]
_3: unsigned int [0, 31]
a_5(D): unsigned int VARYING
res_6: unsigned int [1, 32]
_9: int [0, 31]
_10: unsigned int [0, 31]


Merging blocks 2 and 3
Merging blocks 2 and 4
int f (unsigned int a)
{
  unsigned int res;
  int _1;
  int _2;
  unsigned int _3;
  int _9;
  unsigned int _10;

  <bb 2> :
  _1 = __builtin_clz (a_5(D));
  _2 = 32 - _1;
  res_6 = (unsigned int) _2;
  printf ("test");
  _10 = (unsigned int) _1;
  _3 = 31 - _10;
  _9 = (int) _3;
  return _9;

}



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz
  2023-11-01 11:29 Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz Giuseppe Tagliavini
  2023-11-01 16:11 ` Jeff Law
@ 2023-11-02  8:03 ` Richard Biener
  2023-11-02 16:44   ` Joseph Myers
  1 sibling, 1 reply; 6+ messages in thread
From: Richard Biener @ 2023-11-02  8:03 UTC (permalink / raw)
  To: Giuseppe Tagliavini; +Cc: gcc

On Wed, Nov 1, 2023 at 12:30 PM Giuseppe Tagliavini via Gcc
<gcc@gcc.gnu.org> wrote:
>
> I found an unexpected issue working with an experimental target (available here: https://github.com/EEESlab/tricore-gcc), but I was able to reproduce it on mainstream architectures. For the sake of clarity and reproducibility, I always refer to upstream code in the rest of the discussion.
>
> Consider this simple test:
>
> #include <stdio.h>
> int f(unsigned int a) {
>   unsigned int res = 8*sizeof(unsigned int) - __builtin_clz(a);
>   if(res>0) printf("test passed\n");
>   return res-1;
> }
>
> I tested this code on GCC 9 and GCC 11 branches, obtaining the expected result from GCC 9 and the wrong one from GCC 11. In GCC 11 and newer versions, the condition check is removed by a gimple-level optimization (I will provide details later), and the printf is always invoked at the assembly level with no branch.
>
> According to the GCC manual, __builtin_clz "returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined." However, it is possible to define a CLZ_DEFINED_VALUE_AT_ZERO in the architecture backend to specify a defined behavior for this case. For instance, this has been done for SPARC and AARCH64 architectures. Compiling my test with SPARC GCC 13.2.0 with the -O3 flag on CompilerExplorer I got this assembly:

Note the semantic of __builtin_clz is _not_ altered by
CLZ_DEFINED_VALUE_AT_ZERO, the behavior
of __builtin_clz (x) is that is has undefined result for x == 0.
CLZ_DEFINED_VALUE_AT_ZERO
is only used to optimize code generation when the user writes say
x == 0 ? 0 : __builtin_clz (x)

Richard.

> .LC0:
>         .asciz  "test"
> f:
>         save    %sp, -96, %sp
>         call    __clzsi2, 0
>          mov    %i0, %o0
>         mov     %o0, %i0
>         sethi   %hi(.LC0), %o0
>         call    printf, 0
>          or     %o0, %lo(.LC0), %o0
>         mov     31, %g1
>         return  %i7+8
>          sub    %g1, %o0, %o0
>
> After some investigation, I found this optimization derives from the results of the value range propagation analysis: https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple-range-op.cc#L917
> In this code, I do not understand why CLZ_DEFINED_VALUE_AT_ZERO is verified only if the function call is tagged as internal. A gimple call is tagged as internal at creation time only when there is no associated function declaration (see https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple.cc#L371), which is not the case for the builtins. From my point of view, this condition prevents the computation of the correct upper bound for this case, resulting in a wrong result from the VRP analysis.
>
> Before considering this behavior as a bug, I prefer to ask the community to understand if there is any aspect I have missed in my reasoning.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz
  2023-11-02  8:03 ` Richard Biener
@ 2023-11-02 16:44   ` Joseph Myers
  2023-11-02 16:52     ` Jakub Jelinek
  0 siblings, 1 reply; 6+ messages in thread
From: Joseph Myers @ 2023-11-02 16:44 UTC (permalink / raw)
  To: Richard Biener; +Cc: Giuseppe Tagliavini, gcc

On Thu, 2 Nov 2023, Richard Biener via Gcc wrote:

> Note the semantic of __builtin_clz is _not_ altered by
> CLZ_DEFINED_VALUE_AT_ZERO, the behavior
> of __builtin_clz (x) is that is has undefined result for x == 0.

Note also the discussion in bug 111309 of possible future type-generic 
versions of operations such as clz (under a different name because 
__builtin_clz is already taken), in order to support _BitInt, where 
there's a plausible argument for defining the result for all argument 
values rather than leaving it undefined at 0.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz
  2023-11-02 16:44   ` Joseph Myers
@ 2023-11-02 16:52     ` Jakub Jelinek
  0 siblings, 0 replies; 6+ messages in thread
From: Jakub Jelinek @ 2023-11-02 16:52 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Richard Biener, Giuseppe Tagliavini, gcc

On Thu, Nov 02, 2023 at 04:44:30PM +0000, Joseph Myers wrote:
> On Thu, 2 Nov 2023, Richard Biener via Gcc wrote:
> 
> > Note the semantic of __builtin_clz is _not_ altered by
> > CLZ_DEFINED_VALUE_AT_ZERO, the behavior
> > of __builtin_clz (x) is that is has undefined result for x == 0.
> 
> Note also the discussion in bug 111309 of possible future type-generic 
> versions of operations such as clz (under a different name because 
> __builtin_clz is already taken), in order to support _BitInt, where 
> there's a plausible argument for defining the result for all argument 
> values rather than leaving it undefined at 0.

I plan to work on that soon.
The current plan is to let the users choose what they want, by having
the type-generic variants of the builtin support either one or two
arguments (just for clz/ctz obviously), if only one is provided,
it is undefined at 0, if two are provided, then the second argument is
the value to be returned for 0.

So, __builtin_clzg (0uwb) or __builtin_clzg (0ULL) would be UB,
while __builtin_clzg (0uwb, 42) would return 42.
We'd need something like __builtin_widthof as well because _Bitwithof
or something similar hasn't been standardized yet, so people can use
the type-generic function in macro and ask for the number of bits
to be returned for 0 (or something based on that).

	Jakub


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-11-02 16:54 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-01 11:29 Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz Giuseppe Tagliavini
2023-11-01 16:11 ` Jeff Law
2023-11-01 17:21   ` Giuseppe Tagliavini
2023-11-02  8:03 ` Richard Biener
2023-11-02 16:44   ` Joseph Myers
2023-11-02 16:52     ` Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).