From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=EwFU=6Z=mentor.com=Andrew_Stubbs@sourceware.org>
Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98])
	by sourceware.org (Postfix) with ESMTPS id 2ADC2385841A
	for <gcc-patches@gcc.gnu.org>; Wed,  1 Mar 2023 10:01:58 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2ADC2385841A
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.98,224,1673942400"; 
   d="scan'208";a="99912851"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
  by esa2.mentor.iphmx.com with ESMTP; 01 Mar 2023 02:01:57 -0800
IronPort-SDR: QL+9EiVf+O9hPGpsRt/M+HmSvFSrG43Pzof+ikePkN8Hw30TQiQisw0WY9zVX26SdQAfuwYTH9
 2fz7jZe8hVNeUBTzVphIFz19FRsOqNvZSO3zsatMF6LxdBzW/JWGeVpUPWc4JVgwA8YdBFoLho
 FWLq+bVeZjYPnslBxmpQEDaqegGikCxkEU2UhcI930ibJUqEnT6cK1yrJwLYHEKEkiDdeJtzFn
 EAxmBDWksUGHde0k3JkqQjZpo2saZaesniiCniopSEF/heNxxiyIdBdNbgZQ6HiCalQSMRMSHg
 aho=
Message-ID: <ec4efdfd-7c9e-6301-9430-fa9cb639bc0a@codesourcery.com>
Date: Wed, 1 Mar 2023 10:01:52 +0000
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.8.0
Subject: Re: [PATCH] amdgcn: Enable SIMD vectorization of math functions
Content-Language: en-GB
To: Kwok Cheung Yeung <kcy@codesourcery.com>, gcc-patches
	<gcc-patches@gcc.gnu.org>
References: <f85990b0-1a7a-bc17-50c1-ef176cbd0dae@codesourcery.com>
From: Andrew Stubbs <ams@codesourcery.com>
In-Reply-To: <f85990b0-1a7a-bc17-50c1-ef176cbd0dae@codesourcery.com>
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: 7bit
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To
 svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11)
X-Spam-Status: No, score=-6.3 required=5.0 tests=BAYES_00,HEADER_FROM_DIFFERENT_DOMAINS,KAM_DMARC_STATUS,KAM_SHORT,NICE_REPLY_A,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On 28/02/2023 23:01, Kwok Cheung Yeung wrote:
> Hello
> 
> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION 
> target hook for the AMD GCN architecture, such that when vectorized, 
> calls to builtin standard math functions such as asinf, exp, pow etc. 
> are converted to calls to the recently added vectorized math functions 
> for GCN in Newlib. The -fno-math-errno flag is required in addition to 
> the usual vectorization optimization flags for this to occur, and some 
> of the math functions (the larger double-precision ones) require a large 
> stack size to function properly.
> 
> This patch requires the GCN vector math functions in Newlib to function 
> - these were included in the recent 4.3.0.20230120 snapshot. As this was 
> a minimum requirement starting from the patch 'amdgcn, libgomp: Manually 
> allocated stacks', this should not be a problem.
> 
> I have added new testcases in the testsuite that compare the output of 
> the vectorized math functions against the scalar, passing if they are 
> sufficiently close. With the testcase for standalone GCN (without 
> libgomp) in gcc.target/gcn/, there is a problem since gcn-run currently 
> cannot set the stack size correctly in DejaGnu testing, so I have made 
> it a compile test for now - it is still useful to check that calls to 
> the correct functions are being made. The runtime correctness is still 
> covered by the libgomp test.
> 
> Okay for trunk?

The main part of the patch is OK, with the small changes below.

Others have pointed out that "omp declare simd" exists, but you and I 
have been all through that verbally, long ago, and as Tobias says the 
offload compiler cannot rely on markup in the host compiler's header 
files to solve this problem.

> @@ -7324,6 +7429,11 @@ gcn_dwarf_register_span (rtx rtl)
>    gcn_simd_clone_compute_vecsize_and_simdlen
>  #undef  TARGET_SIMD_CLONE_USABLE
>  #define TARGET_SIMD_CLONE_USABLE gcn_simd_clone_usable
> +#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
> +#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
> +  gcn_vectorize_builtin_vectorized_function
> +#undef TARGET_LIBC_HAS_FUNCTION
> +#define TARGET_LIBC_HAS_FUNCTION gcn_libc_has_function
>  #undef  TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
>  #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \
>    gcn_small_register_classes_for_mode_p

Please keep these in alphabetical order.

> +/* Ideally this test should be run, but the math routines require a large
> +   stack and gcn-run currently does not respect the stack-size parameter.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-math-errno -mstack-size=3000000 -fdump-tree-vect" } */

This isn't ideal. The dg-set-target-env-var directive (I think this is 
it?) can set GCN_STACK_SIZE, which gcn-run does honour, but I realise 
that doesn't work with remote test targets (like ours).

I suggest adding an additional test that sets the envvar and #includes 
the code from this one; one test to scan the dumps, one test to run it. 
Like this .... (untested, syntax uncertain).

/* { dg-do run } */
/* { dg-options "-O2 -ftree-vectorize -fno-math-errno" } */
/* { dg-set-target-env-var "GCN_STACK_SIZE" "3000000" } */
#include "simd-math-1.c"

The run test will get skipped in our test environment (and anyone else 
using remote), but the libgomp test should make up for that.

Andrew