* Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz
@ 2023-11-01 11:29 Giuseppe Tagliavini
2023-11-01 16:11 ` Jeff Law
2023-11-02 8:03 ` Richard Biener
0 siblings, 2 replies; 6+ messages in thread
From: Giuseppe Tagliavini @ 2023-11-01 11:29 UTC (permalink / raw)
To: gcc
[-- Attachment #1: Type: text/plain, Size: 2452 bytes --]
I found an unexpected issue working with an experimental target (available here: https://github.com/EEESlab/tricore-gcc), but I was able to reproduce it on mainstream architectures. For the sake of clarity and reproducibility, I always refer to upstream code in the rest of the discussion.
Consider this simple test:
#include <stdio.h>
int f(unsigned int a) {
unsigned int res = 8*sizeof(unsigned int) - __builtin_clz(a);
if(res>0) printf("test passed\n");
return res-1;
}
I tested this code on GCC 9 and GCC 11 branches, obtaining the expected result from GCC 9 and the wrong one from GCC 11. In GCC 11 and newer versions, the condition check is removed by a gimple-level optimization (I will provide details later), and the printf is always invoked at the assembly level with no branch.
According to the GCC manual, __builtin_clz "returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined." However, it is possible to define a CLZ_DEFINED_VALUE_AT_ZERO in the architecture backend to specify a defined behavior for this case. For instance, this has been done for SPARC and AARCH64 architectures. Compiling my test with SPARC GCC 13.2.0 with the -O3 flag on CompilerExplorer I got this assembly:
.LC0:
.asciz "test"
f:
save %sp, -96, %sp
call __clzsi2, 0
mov %i0, %o0
mov %o0, %i0
sethi %hi(.LC0), %o0
call printf, 0
or %o0, %lo(.LC0), %o0
mov 31, %g1
return %i7+8
sub %g1, %o0, %o0
After some investigation, I found this optimization derives from the results of the value range propagation analysis: https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple-range-op.cc#L917
In this code, I do not understand why CLZ_DEFINED_VALUE_AT_ZERO is verified only if the function call is tagged as internal. A gimple call is tagged as internal at creation time only when there is no associated function declaration (see https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple.cc#L371), which is not the case for the builtins. From my point of view, this condition prevents the computation of the correct upper bound for this case, resulting in a wrong result from the VRP analysis.
Before considering this behavior as a bug, I prefer to ask the community to understand if there is any aspect I have missed in my reasoning.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz
2023-11-01 11:29 Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz Giuseppe Tagliavini
@ 2023-11-01 16:11 ` Jeff Law
2023-11-01 17:21 ` Giuseppe Tagliavini
2023-11-02 8:03 ` Richard Biener
1 sibling, 1 reply; 6+ messages in thread
From: Jeff Law @ 2023-11-01 16:11 UTC (permalink / raw)
To: Giuseppe Tagliavini, gcc
On 11/1/23 05:29, Giuseppe Tagliavini via Gcc wrote:
> I found an unexpected issue working with an experimental target (available here: https://github.com/EEESlab/tricore-gcc), but I was able to reproduce it on mainstream architectures. For the sake of clarity and reproducibility, I always refer to upstream code in the rest of the discussion.
>
> Consider this simple test:
>
> #include <stdio.h>
> int f(unsigned int a) {
> unsigned int res = 8*sizeof(unsigned int) - __builtin_clz(a);
> if(res>0) printf("test passed\n");
> return res-1;
> }
>
> I tested this code on GCC 9 and GCC 11 branches, obtaining the expected result from GCC 9 and the wrong one from GCC 11. In GCC 11 and newer versions, the condition check is removed by a gimple-level optimization (I will provide details later), and the printf is always invoked at the assembly level with no branch.
>
> According to the GCC manual, __builtin_clz "returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined." However, it is possible to define a CLZ_DEFINED_VALUE_AT_ZERO in the architecture backend to specify a defined behavior for this case. For instance, this has been done for SPARC and AARCH64 architectures. Compiling my test with SPARC GCC 13.2.0 with the -O3 flag on CompilerExplorer I got this assembly:
>
> .LC0:
> .asciz "test"
> f:
> save %sp, -96, %sp
> call __clzsi2, 0
> mov %i0, %o0
> mov %o0, %i0
> sethi %hi(.LC0), %o0
> call printf, 0
> or %o0, %lo(.LC0), %o0
> mov 31, %g1
> return %i7+8
> sub %g1, %o0, %o0
>
> After some investigation, I found this optimization derives from the results of the value range propagation analysis: https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple-range-op.cc#L917
> In this code, I do not understand why CLZ_DEFINED_VALUE_AT_ZERO is verified only if the function call is tagged as internal. A gimple call is tagged as internal at creation time only when there is no associated function declaration (see https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple.cc#L371), which is not the case for the builtins. From my point of view, this condition prevents the computation of the correct upper bound for this case, resulting in a wrong result from the VRP analysis.
>
> Before considering this behavior as a bug, I prefer to ask the community to understand if there is any aspect I have missed in my reasoning.
It would help if you included the debugging dumps.
Jeff
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz
2023-11-01 16:11 ` Jeff Law
@ 2023-11-01 17:21 ` Giuseppe Tagliavini
0 siblings, 0 replies; 6+ messages in thread
From: Giuseppe Tagliavini @ 2023-11-01 17:21 UTC (permalink / raw)
To: jeffreyalaw, gcc
[-- Attachment #1.1: Type: text/plain, Size: 3512 bytes --]
Sure, I include the relevant tree dumps obtained with the "releases/gcc-11" branch.
The "patch_" variants represent the dumps after disabling the check on the internal flag (I include the patch for both "releases/gcc-11" and "master" branches).
The pass under investigation is "evpr"; you can see how the if condition is removed and related BBs are merged if the range analysis provides what I think is an unexpected result. The optimized dump changes accordingly, but the troublesome transformation is the one performed by the gimple VRP.
Giuseppe
________________________________
From: Jeff Law <jeffreyalaw@gmail.com>
Sent: Wednesday, November 1, 2023 5:11 PM
To: Giuseppe Tagliavini <giuseppe.tagliavini@unibo.it>; gcc@gcc.gnu.org <gcc@gcc.gnu.org>
Subject: Re: Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz
On 11/1/23 05:29, Giuseppe Tagliavini via Gcc wrote:
> I found an unexpected issue working with an experimental target (available here: https://github.com/EEESlab/tricore-gcc), but I was able to reproduce it on mainstream architectures. For the sake of clarity and reproducibility, I always refer to upstream code in the rest of the discussion.
>
> Consider this simple test:
>
> #include <stdio.h>
> int f(unsigned int a) {
> unsigned int res = 8*sizeof(unsigned int) - __builtin_clz(a);
> if(res>0) printf("test passed\n");
> return res-1;
> }
>
> I tested this code on GCC 9 and GCC 11 branches, obtaining the expected result from GCC 9 and the wrong one from GCC 11. In GCC 11 and newer versions, the condition check is removed by a gimple-level optimization (I will provide details later), and the printf is always invoked at the assembly level with no branch.
>
> According to the GCC manual, __builtin_clz "returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined." However, it is possible to define a CLZ_DEFINED_VALUE_AT_ZERO in the architecture backend to specify a defined behavior for this case. For instance, this has been done for SPARC and AARCH64 architectures. Compiling my test with SPARC GCC 13.2.0 with the -O3 flag on CompilerExplorer I got this assembly:
>
> .LC0:
> .asciz "test"
> f:
> save %sp, -96, %sp
> call __clzsi2, 0
> mov %i0, %o0
> mov %o0, %i0
> sethi %hi(.LC0), %o0
> call printf, 0
> or %o0, %lo(.LC0), %o0
> mov 31, %g1
> return %i7+8
> sub %g1, %o0, %o0
>
> After some investigation, I found this optimization derives from the results of the value range propagation analysis: https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple-range-op.cc#L917
> In this code, I do not understand why CLZ_DEFINED_VALUE_AT_ZERO is verified only if the function call is tagged as internal. A gimple call is tagged as internal at creation time only when there is no associated function declaration (see https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple.cc#L371), which is not the case for the builtins. From my point of view, this condition prevents the computation of the correct upper bound for this case, resulting in a wrong result from the VRP analysis.
>
> Before considering this behavior as a bug, I prefer to ask the community to understand if there is any aspect I have missed in my reasoning.
It would help if you included the debugging dumps.
Jeff
[-- Attachment #2: gcc-master.patch --]
[-- Type: application/octet-stream, Size: 493 bytes --]
--- a/gcc/gimple-range-op.cc 2023-11-01 17:47:16.181068263 +0100
+++ b/gcc/gimple-range-op.cc 2023-11-01 17:48:51.928619647 +0100
@@ -929,7 +929,8 @@
int maxi = prec - 1;
int zerov = 0;
scalar_int_mode mode = SCALAR_INT_TYPE_MODE (lh.type ());
- if (m_gimple_call_internal_p)
+ // TODO Check the validity of this condition
+ // if (m_gimple_call_internal_p)
{
if (optab_handler (clz_optab, mode) != CODE_FOR_nothing
&& CLZ_DEFINED_VALUE_AT_ZERO (mode, zerov) == 2)
[-- Attachment #3: test.c.244t.optimized --]
[-- Type: application/octet-stream, Size: 326 bytes --]
;; Function f (f, funcdef_no=3, decl_uid=2341, cgraph_uid=4, symbol_order=3)
int f (unsigned int a)
{
int _1;
unsigned int _2;
int _6;
unsigned int _7;
<bb 2> [local count: 1073741824]:
_1 = __builtin_clz (a_3(D));
printf ("test");
_7 = (unsigned int) _1;
_2 = 31 - _7;
_6 = (int) _2;
return _6;
}
[-- Attachment #4: patch_test.c.244t.optimized --]
[-- Type: application/octet-stream, Size: 500 bytes --]
;; Function f (f, funcdef_no=3, decl_uid=2341, cgraph_uid=4, symbol_order=3)
Removing basic block 5
int f (unsigned int a)
{
int _1;
unsigned int _3;
int _9;
unsigned int _10;
<bb 2> [local count: 1073741824]:
_1 = __builtin_clz (a_5(D));
if (_1 != 32)
goto <bb 3>; [33.00%]
else
goto <bb 4>; [67.00%]
<bb 3> [local count: 354334800]:
printf ("test");
<bb 4> [local count: 1073741824]:
_10 = (unsigned int) _1;
_3 = 31 - _10;
_9 = (int) _3;
return _9;
}
[-- Attachment #5: patch_test.c.038t.evrp --]
[-- Type: application/octet-stream, Size: 823 bytes --]
;; Function f (f, funcdef_no=3, decl_uid=2341, cgraph_uid=4, symbol_order=3)
;; 1 loops found
;;
;; Loop 0
;; header 0, latch 1
;; depth 0, outer -1
;; nodes: 0 1 2 3 4
;; 2 succs { 3 4 }
;; 3 succs { 4 }
;; 4 succs { 1 }
Value ranges after Early VRP:
_1: int [0, 32]
_2: int [0, 32]
_3: unsigned int ~[32, 4294967294]
a_5(D): unsigned int VARYING
res_6: unsigned int [0, 32]
_9: int [-1, 31]
_10: unsigned int [0, 32]
int f (unsigned int a)
{
unsigned int res;
int _1;
int _2;
unsigned int _3;
int _9;
unsigned int _10;
<bb 2> :
_1 = __builtin_clz (a_5(D));
_2 = 32 - _1;
res_6 = (unsigned int) _2;
if (res_6 != 0)
goto <bb 3>; [INV]
else
goto <bb 4>; [INV]
<bb 3> :
printf ("test");
<bb 4> :
_10 = (unsigned int) _1;
_3 = 31 - _10;
_9 = (int) _3;
return _9;
}
[-- Attachment #6: test.c.006t.gimple --]
[-- Type: application/octet-stream, Size: 290 bytes --]
int f (unsigned int a)
{
int D.2347;
unsigned int res;
_1 = __builtin_clz (a);
_2 = 32 - _1;
res = (unsigned int) _2;
if (res != 0) goto <D.2345>; else goto <D.2346>;
<D.2345>:
printf ("test");
<D.2346>:
_3 = res + 4294967295;
D.2347 = (int) _3;
return D.2347;
}
[-- Attachment #7: test.c.037t.fre1 --]
[-- Type: application/octet-stream, Size: 622 bytes --]
;; Function f (f, funcdef_no=0, decl_uid=1409, cgraph_uid=1, symbol_order=0)
;; 1 loops found
;;
;; Loop 0
;; header 0, latch 1
;; depth 0, outer -1
;; nodes: 0 1 2 3 4
;; 2 succs { 3 4 }
;; 3 succs { 4 }
;; 4 succs { 1 }
int f (unsigned int a)
{
unsigned int res;
int _1;
int _2;
unsigned int _3;
int _9;
unsigned int _10;
<bb 2> :
_1 = __builtin_clz (a_5(D));
_2 = 32 - _1;
res_6 = (unsigned int) _2;
if (res_6 != 0)
goto <bb 3>; [INV]
else
goto <bb 4>; [INV]
<bb 3> :
printf ("test");
<bb 4> :
_10 = (unsigned int) _1;
_3 = 31 - _10;
_9 = (int) _3;
return _9;
}
[-- Attachment #8: test.c.038t.evrp --]
[-- Type: application/octet-stream, Size: 763 bytes --]
;; Function f (f, funcdef_no=0, decl_uid=1409, cgraph_uid=1, symbol_order=0)
;; 1 loops found
;;
;; Loop 0
;; header 0, latch 1
;; depth 0, outer -1
;; nodes: 0 1 2 3 4
;; 2 succs { 3 4 }
;; 3 succs { 4 }
;; 4 succs { 1 }
Value ranges after Early VRP:
_1: int [0, 31]
_2: int [1, 32]
_3: unsigned int [0, 31]
a_5(D): unsigned int VARYING
res_6: unsigned int [1, 32]
_9: int [0, 31]
_10: unsigned int [0, 31]
Merging blocks 2 and 3
Merging blocks 2 and 4
int f (unsigned int a)
{
unsigned int res;
int _1;
int _2;
unsigned int _3;
int _9;
unsigned int _10;
<bb 2> :
_1 = __builtin_clz (a_5(D));
_2 = 32 - _1;
res_6 = (unsigned int) _2;
printf ("test");
_10 = (unsigned int) _1;
_3 = 31 - _10;
_9 = (int) _3;
return _9;
}
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz
2023-11-01 11:29 Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz Giuseppe Tagliavini
2023-11-01 16:11 ` Jeff Law
@ 2023-11-02 8:03 ` Richard Biener
2023-11-02 16:44 ` Joseph Myers
1 sibling, 1 reply; 6+ messages in thread
From: Richard Biener @ 2023-11-02 8:03 UTC (permalink / raw)
To: Giuseppe Tagliavini; +Cc: gcc
On Wed, Nov 1, 2023 at 12:30 PM Giuseppe Tagliavini via Gcc
<gcc@gcc.gnu.org> wrote:
>
> I found an unexpected issue working with an experimental target (available here: https://github.com/EEESlab/tricore-gcc), but I was able to reproduce it on mainstream architectures. For the sake of clarity and reproducibility, I always refer to upstream code in the rest of the discussion.
>
> Consider this simple test:
>
> #include <stdio.h>
> int f(unsigned int a) {
> unsigned int res = 8*sizeof(unsigned int) - __builtin_clz(a);
> if(res>0) printf("test passed\n");
> return res-1;
> }
>
> I tested this code on GCC 9 and GCC 11 branches, obtaining the expected result from GCC 9 and the wrong one from GCC 11. In GCC 11 and newer versions, the condition check is removed by a gimple-level optimization (I will provide details later), and the printf is always invoked at the assembly level with no branch.
>
> According to the GCC manual, __builtin_clz "returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined." However, it is possible to define a CLZ_DEFINED_VALUE_AT_ZERO in the architecture backend to specify a defined behavior for this case. For instance, this has been done for SPARC and AARCH64 architectures. Compiling my test with SPARC GCC 13.2.0 with the -O3 flag on CompilerExplorer I got this assembly:
Note the semantic of __builtin_clz is _not_ altered by
CLZ_DEFINED_VALUE_AT_ZERO, the behavior
of __builtin_clz (x) is that is has undefined result for x == 0.
CLZ_DEFINED_VALUE_AT_ZERO
is only used to optimize code generation when the user writes say
x == 0 ? 0 : __builtin_clz (x)
Richard.
> .LC0:
> .asciz "test"
> f:
> save %sp, -96, %sp
> call __clzsi2, 0
> mov %i0, %o0
> mov %o0, %i0
> sethi %hi(.LC0), %o0
> call printf, 0
> or %o0, %lo(.LC0), %o0
> mov 31, %g1
> return %i7+8
> sub %g1, %o0, %o0
>
> After some investigation, I found this optimization derives from the results of the value range propagation analysis: https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple-range-op.cc#L917
> In this code, I do not understand why CLZ_DEFINED_VALUE_AT_ZERO is verified only if the function call is tagged as internal. A gimple call is tagged as internal at creation time only when there is no associated function declaration (see https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple.cc#L371), which is not the case for the builtins. From my point of view, this condition prevents the computation of the correct upper bound for this case, resulting in a wrong result from the VRP analysis.
>
> Before considering this behavior as a bug, I prefer to ask the community to understand if there is any aspect I have missed in my reasoning.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz
2023-11-02 8:03 ` Richard Biener
@ 2023-11-02 16:44 ` Joseph Myers
2023-11-02 16:52 ` Jakub Jelinek
0 siblings, 1 reply; 6+ messages in thread
From: Joseph Myers @ 2023-11-02 16:44 UTC (permalink / raw)
To: Richard Biener; +Cc: Giuseppe Tagliavini, gcc
On Thu, 2 Nov 2023, Richard Biener via Gcc wrote:
> Note the semantic of __builtin_clz is _not_ altered by
> CLZ_DEFINED_VALUE_AT_ZERO, the behavior
> of __builtin_clz (x) is that is has undefined result for x == 0.
Note also the discussion in bug 111309 of possible future type-generic
versions of operations such as clz (under a different name because
__builtin_clz is already taken), in order to support _BitInt, where
there's a plausible argument for defining the result for all argument
values rather than leaving it undefined at 0.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz
2023-11-02 16:44 ` Joseph Myers
@ 2023-11-02 16:52 ` Jakub Jelinek
0 siblings, 0 replies; 6+ messages in thread
From: Jakub Jelinek @ 2023-11-02 16:52 UTC (permalink / raw)
To: Joseph Myers; +Cc: Richard Biener, Giuseppe Tagliavini, gcc
On Thu, Nov 02, 2023 at 04:44:30PM +0000, Joseph Myers wrote:
> On Thu, 2 Nov 2023, Richard Biener via Gcc wrote:
>
> > Note the semantic of __builtin_clz is _not_ altered by
> > CLZ_DEFINED_VALUE_AT_ZERO, the behavior
> > of __builtin_clz (x) is that is has undefined result for x == 0.
>
> Note also the discussion in bug 111309 of possible future type-generic
> versions of operations such as clz (under a different name because
> __builtin_clz is already taken), in order to support _BitInt, where
> there's a plausible argument for defining the result for all argument
> values rather than leaving it undefined at 0.
I plan to work on that soon.
The current plan is to let the users choose what they want, by having
the type-generic variants of the builtin support either one or two
arguments (just for clz/ctz obviously), if only one is provided,
it is undefined at 0, if two are provided, then the second argument is
the value to be returned for 0.
So, __builtin_clzg (0uwb) or __builtin_clzg (0ULL) would be UB,
while __builtin_clzg (0uwb, 42) would return 42.
We'd need something like __builtin_widthof as well because _Bitwithof
or something similar hasn't been standardized yet, so people can use
the type-generic function in macro and ask for the number of bits
to be returned for 0 (or something based on that).
Jakub
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-11-02 16:54 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-01 11:29 Suspecting a wrong behavior in the value range propagation analysis for __builtin_clz Giuseppe Tagliavini
2023-11-01 16:11 ` Jeff Law
2023-11-01 17:21 ` Giuseppe Tagliavini
2023-11-02 8:03 ` Richard Biener
2023-11-02 16:44 ` Joseph Myers
2023-11-02 16:52 ` Jakub Jelinek
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).