From: "Sekhar, Ashwin"
To: "gcc@gcc.gnu.org"
CC: "richard.earnshaw@arm.com", "marcus.shawcroft@arm.com", "james.greenhalgh@arm.com"
Subject: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
Date: Wed, 15 Mar 2017 09:50:00 -0000

Hi GCC Team, Aarch64 Maintainers,

The rules in the Vector Function Application Binary Interface Specification for
OpenMP
(https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt)
are used on x86 for generating the simd clones of a function. Is there a
similar one defined for Aarch64?

If not, I would like to start a discussion on the same for Aarch64. To kick
start it, a draft proposal for Aarch64 (along the same lines as the x86 ABI) is
included below. The only change from the x86 ABI is in the function name
mangling. Here the letter 'b' is used to indicate the ASIMD ISA.

Please review and comment.

Thanks and Regards,
Ashwin Sekhar T K

------------------------------------ CUT HERE ----------------------------------

================================================================================
 Aarch64 Vector Function Application Binary Interface Specification for OpenMP
================================================================================

1. Vector Function ABI Overview

The Aarch64 Vector Function ABI provides an ABI for the vector functions
generated by a compiler supporting the SIMD constructs of OpenMP 4.0 [1] on
Aarch64. This is based on the x86 Vector Function Application Binary Interface
Specification for OpenMP [2].

================================================================================

2. Vector Function ABI

The Vector Function ABI defines a set of rules that the caller and the callee
functions must obey. These rules consist of:

  * Calling convention
  * Vector length (the number of concurrent scalar invocations to be processed
    per invocation of the vector function)
  * Mapping from element data types to vector data types
  * Ordering of vector arguments
  * Vector function masking
  * Vector function name mangling
  * Compiler generated variants of vector function

--------------------------------------------------------------------------------

2.1. Calling Convention

The vector functions should use the calling convention described in the
Procedure Call Standard for the ARM 64-bit Architecture (AArch64) [3].

--------------------------------------------------------------------------------

2.2. Vector Length

Every vector variant of a SIMD-enabled function has a vector length (VLEN). If
the OpenMP clause "simdlen" is used, the VLEN is the value of the argument of
that clause. The VLEN value must be a power of 2. Otherwise, the notion of the
function's "characteristic data type" (CDT) is used to compute the vector
length.

CDT is defined in the following order:

  a) For a non-void function, the CDT is the return type.
  b) If the function has any non-uniform, non-linear parameters, then the CDT
     is the type of the first such parameter.
  c) If the CDT determined by a) or b) above is a struct, union, or class type
     which is pass-by-value (except for the type that maps to the built-in
     complex data type), the characteristic data type is int.
  d) If none of the above three cases is applicable, the CDT is int.

VLEN = sizeof(vector_register) / sizeof(CDT)

For example, if the ISA is ASIMD, sizeof(vector_register) = 16, as the vector
registers are 128 bits wide. If the CDT of the function is "int",
sizeof(CDT) = 4. So, VLEN = 4.
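
The VLEN rule above can be sketched in C as follows. This is only an
illustration, not part of the proposal: the helper name compute_vlen and the
convention that simdlen == 0 means "no simdlen clause was given" are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Size of one ASIMD vector register in bytes (128 bits), per Section 2.2. */
#define ASIMD_VECTOR_BYTES 16u

/* Returns nonzero when v is a power of two. */
static int is_power_of_two(unsigned v)
{
    return v != 0 && (v & (v - 1u)) == 0;
}

/*
 * Hypothetical helper: VLEN of a vector variant.
 *   simdlen  - argument of the simdlen clause, or 0 if the clause is absent
 *              (this encoding is an assumption of the sketch)
 *   cdt_size - sizeof the characteristic data type (CDT)
 */
static unsigned compute_vlen(unsigned simdlen, size_t cdt_size)
{
    assert(cdt_size > 0 && cdt_size <= ASIMD_VECTOR_BYTES);
    if (simdlen != 0) {
        /* An explicit simdlen wins, but it must be a power of 2. */
        assert(is_power_of_two(simdlen));
        return simdlen;
    }
    /* VLEN = sizeof(vector_register) / sizeof(CDT) */
    return (unsigned)(ASIMD_VECTOR_BYTES / cdt_size);
}
```

With a 4-byte CDT such as "int" and no simdlen clause, compute_vlen(0, 4)
gives 4, matching the example above. With an 8-byte CDT such as "double" it
gives 2.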

--------------------------------------------------------------------------------

2.3. Element Data Type to Vector Data Type Mapping

The vector data types for parameters are selected depending on the ISA, the
vector length, the data type of the original parameter, and the parameter
specification.

For uniform and linear parameters (a detailed description can be found in [1]),
the original data type is preserved.

For vector parameters, vector data types are selected by the compiler. The
mapping from element data type to vector data type is described below.

  * The bit size of the vector data type of a parameter is computed as:

    size_of_vector_data_type = VLEN * sizeof(original_parameter_data_type) * 8

    For instance, for the ASIMD version of a vector function with parameter
    data type "int": If VLEN = 4, size_of_vector_data_type = 4 * 4 * 8 = 128
    (bits), which means one argument of type __m128i is to be passed.

  * If the size_of_vector_data_type is greater than the width of the vector
    register, multiple vector registers are selected and the parameter will be
    passed in multiple vector registers.

    For instance, for the ASIMD version of a vector function with parameter
    data type "int":
    If VLEN = 8, size_of_vector_data_type = 8 * 4 * 8 = 256 (bits), so the
    vector data type is __m256i, which means 2 arguments of type __m128i are
    to be passed.

--------------------------------------------------------------------------------

2.4. Ordering of Vector Arguments

  * When a parameter in the original data type results in one argument in the
    vector function, the ordering rule is a simple one to one match with the
    original argument order.

    For example, when the original argument list is (int a, float b, int c),
    VLEN is 4, the ISA is ASIMD, and all of a, b, and c are classified as
    vector parameters, the vector function argument list becomes
    (__m128i vec_a, __m128 vec_b, __m128i vec_c).

  * There are cases where a single parameter in the original data type results
    in multiple arguments in the vector function. The additional second and
    subsequent arguments are inserted in the argument list right after the
    corresponding first argument, not appended to the end of the argument list
    of the vector function.

    For example, when the original argument list is (int a, float b, int c),
    VLEN is 8, the ISA is ASIMD, and all of a, b, and c are classified as
    vector parameters, the vector function argument list becomes
    (__m128i vec_a1, __m128i vec_a2, __m128 vec_b1, __m128 vec_b2,
    __m128i vec_c1, __m128i vec_c2).

--------------------------------------------------------------------------------

2.5. Masking of Vector Function

A masked vector function variant, used for invocation in a conditional
statement (please refer to [1] for detailed information), additionally takes an
implicit mask argument, which disables processing of some of the vector lanes.

For masked vector functions, additional "mask" parameters are required. Each
element of the "mask" parameters has the data type of the CDT (see Section
2.2). The number of mask parameters is the same as the number of parameters
required to pass the vector of CDT for the given vector length. The value of a
mask parameter must be a bit pattern of either all ones or all zeros for each
element. For each element of the vector, if the corresponding mask value is
zero, the return value associated with that element is zero. Mask parameters
are passed after all other parameters, in the same order as the parameters that
they apply to.
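
The rules of Sections 2.3 to 2.5 can be put together in a short C sketch that
counts how many vector-register arguments a variant takes. The function names
are made up for this illustration. The sketch also assumes that a vector
narrower than 128 bits still occupies one register, a case the draft does not
spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Width of one ASIMD vector register in bits (Section 2.2). */
#define ASIMD_VECTOR_BITS 128u

/*
 * Section 2.3: number of vector registers needed for one vector parameter.
 * size_of_vector_data_type = VLEN * sizeof(element) * 8, rounded up to whole
 * registers (the rounding for sub-register sizes is an assumption).
 */
static unsigned regs_for_vector_param(unsigned vlen, size_t elem_size)
{
    unsigned bits = vlen * (unsigned)elem_size * 8u;
    assert(bits > 0);
    return (bits + ASIMD_VECTOR_BITS - 1u) / ASIMD_VECTOR_BITS;
}

/*
 * Sections 2.4 and 2.5: total number of vector-register arguments.
 *   elem_sizes - element sizes of the parameters classified as vector,
 *                in their original order (uniform and linear parameters
 *                keep their scalar type and are not counted here)
 *   cdt_size   - sizeof the characteristic data type, used for the mask
 *   masked     - nonzero for the masked variant
 */
static unsigned count_vector_args(unsigned vlen, const size_t *elem_sizes,
                                  size_t nparams, size_t cdt_size, int masked)
{
    unsigned total = 0;
    for (size_t i = 0; i < nparams; i++) {
        /* Registers of one parameter stay adjacent in the argument list. */
        total += regs_for_vector_param(vlen, elem_sizes[i]);
    }
    if (masked) {
        /* Mask arguments come last: one vector of CDT elements. */
        total += regs_for_vector_param(vlen, cdt_size);
    }
    return total;
}
```

For the (int a, float b, int c) example with VLEN 8, this gives 6 vector
arguments for the unmasked variant, and 8 for the masked variant when the CDT
is "int".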

--------------------------------------------------------------------------------

2.6. Vector Function Name Mangling

The name mangling of the generated vector function based on the vector
annotation is an important part of the Vector ABI. It allows the caller and the
callee functions to be compiled in separate files or compilation units. Using
the function prototype in header files to communicate vector function
annotation information, the compiler can perform function matching while
vectorizing code at call sites.

The vector function name is mangled as the concatenation of the following items:

<vector_prefix> <isa> <mask> <vlen> <vparameters> '_' <original_name>

The descriptions of each item are:

  * <vector_prefix>
      string "_ZGV"

  * <original_name>
      name of scalar function, including C++ and Fortran mangling

  * <isa>
      'b'    // ASIMD

  * <mask>
      'M'    // masked version
      | 'N'  // unmasked version

  * <vlen>
      decimal-number

  * <vparameters>
      /* empty */
      | <vparameter> <opt-align> <vparameters>

         o <vparameter>
         (please refer to [1] for information about parameter types used below)
             's' decimal-number // linear parameter, variable stride,
                                // decimal number is the position # of
                                // stride argument, which starts from 0
             | 'l' <number>     // linear parameter, constant stride
             | 'u'              // uniform parameter
             | 'v'              // vector parameter

                o <number>
                    [n] non-negative decimal integer  // n indicates negative

         o <opt-align>
             /* empty */
             | 'a' non-negative-decimal-number

Please refer to Section 2.7, Compiler generated variants of vector function,
for examples of vector function mangling.

--------------------------------------------------------------------------------

2.7. Compiler generated variants of vector function

The compiler's architecture selection flag has no impact on ISA selection for
the generated vector variants.

For each ISA, the compiler should generate both the masked and the unmasked
vector variant (unless only one of them is requested with the corresponding
clause). Compiler implementations must not generate calls to versions for other
ISAs unless some non-standard pragma or clause is used to declare that those
other versions are available.

Example 1.

#pragma omp declare simd uniform(q) aligned(q:16) linear(k:1)
float foo(float *q, float x, int k)
{
    q[k] = q[k] + x;
    return q[k];
}

Below is the list of generated function names, or the list of symbols provided
by a library with the same pragma on the "foo" prototype.

1) _ZGVbN4ua16vl_foo (ASIMD ISA, unmasked version)
2) _ZGVbM4ua16vl_foo (ASIMD ISA, masked version)

Where "foo" is the original mangled function name, "_ZGV" is the prefix of the
vector function name, "b" indicates the ASIMD ISA, "N" indicates that this is
an unmasked version, "M" indicates that this is a masked version, "4" is the
vector length for the ASIMD ISA, "ua16" indicates uniform(q) and aligned(q:16),
"v" indicates that the second argument x is a vector argument, and "l"
indicates linear(k:1): k is a linear variable whose stride is 1.

Example 2.

#pragma omp declare simd notinbranch
double foo(double x)
{
    return x*x;
}

Below is the list of generated function names, or the list of symbols provided
by a library with the same pragma on the "foo" prototype.

1) _ZGVbN2v_foo (ASIMD ISA, unmasked version)

Where "foo" is the original mangled function name, "_ZGV" is the prefix of the
vector function name, "b" indicates the ASIMD ISA, "N" indicates that this is
an unmasked version, "2" is the vector length for the ASIMD ISA, and "v"
indicates that the single argument x is a vector argument.

================================================================================

3. References

[1] OpenMP 4.0 Specification
http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf

[2] Vector Function Application Binary Interface Specification for OpenMP (x86)
https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt

[3] Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
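
As a postscript to the mangling scheme in Section 2.6, here is a minimal C
sketch of how the pieces of a mangled name are concatenated for the ASIMD ISA.
The function name mangle_asimd and its parameters are hypothetical. The caller
is assumed to supply the already-encoded parameter string (for example
"ua16vl") and the scalar name after any C++ or Fortran mangling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes
 *   "_ZGV" 'b' ('M' | 'N') vlen params '_' scalar_name
 * into out, following Section 2.6 with 'b' as the ASIMD ISA letter.
 * Returns the snprintf result, i.e. the length the full name needs.
 */
static int mangle_asimd(char *out, size_t out_size, int masked, unsigned vlen,
                        const char *params, const char *scalar_name)
{
    assert(out != NULL && params != NULL && scalar_name != NULL);
    assert(strlen(scalar_name) > 0);
    return snprintf(out, out_size, "_ZGVb%c%u%s_%s",
                    masked ? 'M' : 'N', vlen, params, scalar_name);
}
```

Called with masked = 0, vlen = 4, params "ua16vl" and scalar name "foo", this
produces "_ZGVbN4ua16vl_foo", the first symbol of Example 1. With vlen = 2 and
params "v" it produces "_ZGVbN2v_foo" from Example 2.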