Hello

I've made the suggested changes. Should I hold off on committing this 
until GCC 13 has been branched off?

Kwok

On 01/03/2023 10:01 am, Andrew Stubbs wrote:
> On 28/02/2023 23:01, Kwok Cheung Yeung wrote:
>> Hello
>>
>> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION 
>> target hook for the AMD GCN architecture, such that when vectorized, 
>> calls to builtin standard math functions such as asinf, exp, pow etc. 
>> are converted to calls to the recently added vectorized math functions 
>> for GCN in Newlib. The -fno-math-errno flag is required in addition to 
>> the usual vectorization optimization flags for this to occur, and some 
>> of the math functions (the larger double-precision ones) require a 
>> large stack size to function properly.
>>
>> This patch requires the GCN vector math functions in Newlib to 
>> function - these were included in the recent 4.3.0.20230120 snapshot. 
>> As this was a minimum requirement starting from the patch 'amdgcn, 
>> libgomp: Manually allocated stacks', this should not be a problem.
>>
>> I have added new testcases in the testsuite that compare the output of 
>> the vectorized math functions against the scalar, passing if they are 
>> sufficiently close. With the testcase for standalone GCN (without 
>> libgomp) in gcc.target/gcn/, there is a problem since gcn-run 
>> currently cannot set the stack size correctly in DejaGnu testing, so I 
>> have made it a compile test for now - it is still useful to check that 
>> calls to the correct functions are being made. The runtime correctness 
>> is still covered by the libgomp test.
>>
>> Okay for trunk?
> 
> The main part of the patch is OK, with the small changes below.
> 
> Others have pointed out that "omp declare simd" exists, but you and I 
> have been all through that verbally, long ago, and as Tobias says the 
> offload compiler cannot rely on markup in the host compiler's header 
> files to solve this problem.
> 
>> @@ -7324,6 +7429,11 @@ gcn_dwarf_register_span (rtx rtl)
>>    gcn_simd_clone_compute_vecsize_and_simdlen
>>  #undef  TARGET_SIMD_CLONE_USABLE
>>  #define TARGET_SIMD_CLONE_USABLE gcn_simd_clone_usable
>> +#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
>> +#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
>> +  gcn_vectorize_builtin_vectorized_function
>> +#undef TARGET_LIBC_HAS_FUNCTION
>> +#define TARGET_LIBC_HAS_FUNCTION gcn_libc_has_function
>>  #undef  TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
>>  #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \
>>    gcn_small_register_classes_for_mode_p
> 
> Please keep these in alphabetical order.
> 
>> +/* Ideally this test should be run, but the math routines require a 
>> large
>> +   stack and gcn-run currently does not respect the stack-size 
>> parameter.  */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -ftree-vectorize -fno-math-errno 
>> -mstack-size=3000000 -fdump-tree-vect" } */
> 
> This isn't ideal. The dg-set-target-env-var directive (I think this is 
> it?) can set GCN_STACK_SIZE, which gcn-run does honour, but I realise 
> that doesn't work with remote test targets (like ours).
> 
> I suggest adding an additional test that sets the envvar and #includes 
> the code from this one; one test to scan the dumps, one test to run it. 
> Like this .... (untested, syntax uncertain).
> 
> /* { dg-do run } */
> /* { dg-options "-O2 -ftree-vectorize -fno-math-errno" } */
> /* { dg-set-target-env-var "GCN_STACK_SIZE" "3000000" } */
> #include "simd-math-1.c"
> 
> The run test will get skipped in our test environment (and anyone else 
> using remote), but the libgomp test should make up for that.
> 
> Andrew