From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 2ADC2385841A for ; Wed, 1 Mar 2023 10:01:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2ADC2385841A Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.98,224,1673942400"; d="scan'208";a="99912851" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa2.mentor.iphmx.com with ESMTP; 01 Mar 2023 02:01:57 -0800 IronPort-SDR: QL+9EiVf+O9hPGpsRt/M+HmSvFSrG43Pzof+ikePkN8Hw30TQiQisw0WY9zVX26SdQAfuwYTH9 2fz7jZe8hVNeUBTzVphIFz19FRsOqNvZSO3zsatMF6LxdBzW/JWGeVpUPWc4JVgwA8YdBFoLho FWLq+bVeZjYPnslBxmpQEDaqegGikCxkEU2UhcI930ibJUqEnT6cK1yrJwLYHEKEkiDdeJtzFn EAxmBDWksUGHde0k3JkqQjZpo2saZaesniiCniopSEF/heNxxiyIdBdNbgZQ6HiCalQSMRMSHg aho= Message-ID: Date: Wed, 1 Mar 2023 10:01:52 +0000 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0 Subject: Re: [PATCH] amdgcn: Enable SIMD vectorization of math functions Content-Language: en-GB To: Kwok Cheung Yeung , gcc-patches References: From: Andrew Stubbs In-Reply-To: Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-6.3 required=5.0 tests=BAYES_00,HEADER_FROM_DIFFERENT_DOMAINS,KAM_DMARC_STATUS,KAM_SHORT,NICE_REPLY_A,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 28/02/2023 23:01, Kwok Cheung Yeung wrote: > Hello > > This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION > target hook for the AMD GCN architecture, such that when vectorized, > calls to builtin standard math functions such as asinf, exp, pow etc. > are converted to calls to the recently added vectorized math functions > for GCN in Newlib. The -fno-math-errno flag is required in addition to > the usual vectorization optimization flags for this to occur, and some > of the math functions (the larger double-precision ones) require a large > stack size to function properly. > > This patch requires the GCN vector math functions in Newlib to function > - these were included in the recent 4.3.0.20230120 snapshot. As this was > a minimum requirement starting from the patch 'amdgcn, libgomp: Manually > allocated stacks', this should not be a problem. > > I have added new testcases in the testsuite that compare the output of > the vectorized math functions against the scalar, passing if they are > sufficiently close. With the testcase for standalone GCN (without > libgomp) in gcc.target/gcn/, there is a problem since gcn-run currently > cannot set the stack size correctly in DejaGnu testing, so I have made > it a compile test for now - it is still useful to check that calls to > the correct functions are being made. The runtime correctness is still > covered by the libgomp test. > > Okay for trunk? The main part of the patch is OK, with the small changes below. Others have pointed out that "omp declare simd" exists, but you and I have been all through that verbally, long ago, and as Tobias says the offload compiler cannot rely on markup in the host compiler's header files to solve this problem. > @@ -7324,6 +7429,11 @@ gcn_dwarf_register_span (rtx rtl) > gcn_simd_clone_compute_vecsize_and_simdlen > #undef TARGET_SIMD_CLONE_USABLE > #define TARGET_SIMD_CLONE_USABLE gcn_simd_clone_usable > +#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION > +#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \ > + gcn_vectorize_builtin_vectorized_function > +#undef TARGET_LIBC_HAS_FUNCTION > +#define TARGET_LIBC_HAS_FUNCTION gcn_libc_has_function > #undef TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P > #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \ > gcn_small_register_classes_for_mode_p Please keep these in alphabetical order. > +/* Ideally this test should be run, but the math routines require a large > + stack and gcn-run currently does not respect the stack-size parameter. */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -ftree-vectorize -fno-math-errno -mstack-size=3000000 -fdump-tree-vect" } */ This isn't ideal. The dg-set-target-env-var directive (I think this is it?) can set GCN_STACK_SIZE, which gcn-run does honour, but I realise that doesn't work with remote test targets (like ours). I suggest adding an additional test that sets the envvar and #includes the code from this one; one test to scan the dumps, one test to run it. Like this .... (untested, syntax uncertain). /* { dg-do run } */ /* { dg-options "-O2 -ftree-vectorize -fno-math-errno" } */ /* { dg-set-target-env-var "GCN_STACK_SIZE" "3000000" } */ #include "simd-math-1.c" The run test will get skipped in our test environment (and anyone else using remote), but the libgomp test should make up for that. Andrew