Hello I've made the suggested changes. Should I hold off on committing this until GCC 13 has been branched off? Kwok On 01/03/2023 10:01 am, Andrew Stubbs wrote: > On 28/02/2023 23:01, Kwok Cheung Yeung wrote: >> Hello >> >> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION >> target hook for the AMD GCN architecture, such that when vectorized, >> calls to builtin standard math functions such as asinf, exp, pow etc. >> are converted to calls to the recently added vectorized math functions >> for GCN in Newlib. The -fno-math-errno flag is required in addition to >> the usual vectorization optimization flags for this to occur, and some >> of the math functions (the larger double-precision ones) require a >> large stack size to function properly. >> >> This patch requires the GCN vector math functions in Newlib to >> function - these were included in the recent 4.3.0.20230120 snapshot. >> As this was a minimum requirement starting from the patch 'amdgcn, >> libgomp: Manually allocated stacks', this should not be a problem. >> >> I have added new testcases in the testsuite that compare the output of >> the vectorized math functions against the scalar, passing if they are >> sufficiently close. With the testcase for standalone GCN (without >> libgomp) in gcc.target/gcn/, there is a problem since gcn-run >> currently cannot set the stack size correctly in DejaGnu testing, so I >> have made it a compile test for now - it is still useful to check that >> calls to the correct functions are being made. The runtime correctness >> is still covered by the libgomp test. >> >> Okay for trunk? > > The main part of the patch is OK, with the small changes below. > > Others have pointed out that "omp declare simd" exists, but you and I > have been all through that verbally, long ago, and as Tobias says the > offload compiler cannot rely on markup in the host compiler's header > files to solve this problem. > >> @@ -7324,6 +7429,11 @@ gcn_dwarf_register_span (rtx rtl) >>    gcn_simd_clone_compute_vecsize_and_simdlen >>  #undef  TARGET_SIMD_CLONE_USABLE >>  #define TARGET_SIMD_CLONE_USABLE gcn_simd_clone_usable >> +#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION >> +#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \ >> +  gcn_vectorize_builtin_vectorized_function >> +#undef TARGET_LIBC_HAS_FUNCTION >> +#define TARGET_LIBC_HAS_FUNCTION gcn_libc_has_function >>  #undef  TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P >>  #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \ >>    gcn_small_register_classes_for_mode_p > > Please keep these in alphabetical order. > >> +/* Ideally this test should be run, but the math routines require a >> large >> +   stack and gcn-run currently does not respect the stack-size >> parameter.  */ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -ftree-vectorize -fno-math-errno >> -mstack-size=3000000 -fdump-tree-vect" } */ > > This isn't ideal. The dg-set-target-env-var directive (I think this is > it?) can set GCN_STACK_SIZE, which gcn-run does honour, but I realise > that doesn't work with remote test targets (like ours). > > I suggest adding an additional test that sets the envvar and #includes > the code from this one; one test to scan the dumps, one test to run it. > Like this .... (untested, syntax uncertain). > > /* { dg-do run } */ > /* { dg-options "-O2 -ftree-vectorize -fno-math-errno" } */ > /* { dg-set-target-env-var "GCN_STACK_SIZE" "3000000" } */ > #include "simd-math-1.c" > > The run test will get skipped in our test environment (and anyone else > using remote), but the libgomp test should make up for that. > > Andrew