From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-192769-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 31531 invoked by alias); 15 Mar 2017 09:50:26 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 31476 invoked by uid 89); 15 Mar 2017 09:50:23 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-5.7 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_2,KAM_ASCII_DIVIDERS,RCVD_IN_DNSWL_NONE,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=
X-HELO: NAM02-SN1-obe.outbound.protection.outlook.com
Received: from mail-sn1nam02on0065.outbound.protection.outlook.com (HELO NAM02-SN1-obe.outbound.protection.outlook.com) (104.47.36.65) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 15 Mar 2017 09:50:21 +0000
Received: from BY2PR07MB2421.namprd07.prod.outlook.com (10.166.115.13) by BY2PR07MB2422.namprd07.prod.outlook.com (10.166.115.14) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384) id 15.1.961.17; Wed, 15 Mar 2017 09:50:18 +0000
Received: from BY2PR07MB2421.namprd07.prod.outlook.com ([10.166.115.13]) by BY2PR07MB2421.namprd07.prod.outlook.com ([10.166.115.13]) with mapi id 15.01.0961.022; Wed, 15 Mar 2017 09:50:18 +0000
From: "Sekhar, Ashwin" <Ashwin.Sekhar@cavium.com>
To: "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>
CC: "richard.earnshaw@arm.com" <richard.earnshaw@arm.com>,	"marcus.shawcroft@arm.com" <marcus.shawcroft@arm.com>,	"james.greenhalgh@arm.com" <james.greenhalgh@arm.com>
Subject: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
Date: Wed, 15 Mar 2017 09:50:00 -0000
Message-ID: <BY2PR07MB2421EB8B7FDC1CFFDE065DDE92270@BY2PR07MB2421.namprd07.prod.outlook.com>
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: cavium.com
X-MS-Exchange-CrossTenant-originalarrivaltime: 15 Mar 2017 09:50:18.6082 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 711e4ccf-2e9b-4bcf-a551-4094005b6194
X-MS-Exchange-Transport-CrossTenantHeadersStamped: BY2PR07MB2422
X-IsSubscribed: yes
X-SW-Source: 2017-03/txt/msg00077.txt.bz2

Hi GCC Team, Aarch64 Maintainers,


The rules in the Vector Function Application Binary Interface Specification
for OpenMP
(https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt)
are used on x86 for generating the SIMD clones of a function.


Is there a similar specification defined for AArch64?


If not, I would like to start a discussion on one for AArch64. To kick-start
it, a draft proposal for AArch64 (along the same lines as the x86 ABI) is
included below. The only change from the x86 ABI is in the function name
mangling: here the letter 'b' is used to indicate the ASIMD ISA.


Please review and comment.


Thanks and Regards,

Ashwin Sekhar T K


------------------------------------ CUT HERE ----------------------------------


================================================================================
 AArch64 Vector Function Application Binary Interface Specification for OpenMP
================================================================================

1. Vector Function ABI Overview

The AArch64 Vector Function ABI defines the ABI for vector functions generated
by compilers that support the SIMD constructs of OpenMP 4.0 [1] on AArch64. It
is based on the x86 Vector Function Application Binary Interface Specification
for OpenMP [2].

================================================================================

2. Vector Function ABI

Vector Function ABI defines a set of rules that the caller and the callee
functions must obey.

These rules consist of:
  * Calling convention
  * Vector length (the number of concurrent scalar invocations to be processed
    per invocation of the vector function)
  * Mapping from element data types to vector data types
  * Ordering of vector arguments
  * Vector function masking
  * Vector function name mangling
  * Compiler-generated variants of the vector function

--------------------------------------------------------------------------------

2.1. Calling Convention

Vector functions should use the calling convention described in the Procedure
Call Standard for the ARM 64-bit Architecture (AArch64) [3].

--------------------------------------------------------------------------------

2.2. Vector Length

Every vector variant of a SIMD-enabled function has a vector length (VLEN). If
the OpenMP clause "simdlen" is used, VLEN is the value of that clause's
argument, which must be a power of 2. Otherwise, the function's
"characteristic data type" (CDT) is used to compute the vector length.

The CDT is determined by applying the following rules in order:
  a) For a non-void function, the CDT is the return type.
  b) Otherwise, if the function has any non-uniform, non-linear parameters,
     the CDT is the type of the first such parameter.
  c) If the CDT determined by a) or b) above is a struct, union, or class type
     that is passed by value (except for a type that maps to the built-in
     complex data type), the CDT is int.
  d) If none of the above three cases applies, the CDT is int.

VLEN = sizeof(vector_register) / sizeof(CDT)

For example, if the ISA is ASIMD, sizeof(vector_register) = 16, because the
vector registers are 128 bits wide. If the CDT of the function is "int",
sizeof(CDT) = 4, so VLEN = 16 / 4 = 4.
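
The default VLEN rule above can be sketched in C as follows. This is only an
illustration of the arithmetic, not part of the proposal: the names
ASIMD_VECTOR_BYTES, default_vlen and is_valid_vlen are hypothetical, and the
CDT is assumed to be no wider than one vector register.

```c
#include <assert.h>
#include <stddef.h>

/* Width of one ASIMD vector register in bytes (128 bits). */
#define ASIMD_VECTOR_BYTES 16u

/* Hypothetical helper: VLEN when no "simdlen" clause is given.
   cdt_size is sizeof(CDT) in bytes. */
unsigned default_vlen(size_t cdt_size)
{
    /* Assumption: the CDT fits in a single vector register. */
    assert(cdt_size > 0 && cdt_size <= ASIMD_VECTOR_BYTES);
    return (unsigned)(ASIMD_VECTOR_BYTES / cdt_size);
}

/* Hypothetical helper: checks the power-of-2 requirement on a
   "simdlen" value. */
int is_valid_vlen(unsigned vlen)
{
    return vlen != 0 && (vlen & (vlen - 1u)) == 0;
}
```

With a 4-byte "int" as the CDT, default_vlen(sizeof(int)) gives the VLEN of 4
from the example above; with "double" it gives 2.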

--------------------------------------------------------------------------------

2.3. Element Data Type to Vector Data Type Mapping

The vector data types for parameters are selected depending on the ISA, the
vector length, the data type of the original parameter, and the parameter
specification.

For uniform and linear parameters (a detailed description can be found in
[1]), the original data type is preserved.

For vector parameters, vector data types are selected by the compiler. The
mapping from element data type to vector data type is described below.

  * The bit size of the vector data type of a parameter is computed as:

    size_of_vector_data_type = VLEN * sizeof(original_parameter_data_type) * 8

    For instance, for the ASIMD version of a vector function with parameter
    data type "int": if VLEN = 4, size_of_vector_data_type = 4 * 4 * 8 = 128
    (bits), which means one argument of type __m128 is passed.

  * If size_of_vector_data_type is greater than the width of the vector
    register, multiple vector registers are selected and the parameter is
    passed in multiple vector registers.

    For instance, for the ASIMD version of a vector function with parameter
    data type "int": if VLEN = 8, size_of_vector_data_type = 8 * 4 * 8 = 256
    (bits), so the vector data type is __m256, which means 2 arguments of type
    __m128 are passed.
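
The size computation above can be sketched in C as follows. The function names
are hypothetical. Rounding up to a whole register for vector types narrower
than 128 bits is an assumption of this sketch; the draft does not state how
such cases are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Width of one ASIMD vector register in bits. */
#define ASIMD_VECTOR_BITS 128u

/* size_of_vector_data_type, in bits, for one vector parameter. */
unsigned vector_type_bits(unsigned vlen, size_t elem_size)
{
    assert(vlen > 0 && elem_size > 0);
    return vlen * (unsigned)elem_size * 8u;
}

/* Number of 128-bit registers used to pass one vector parameter.
   Assumption: a partial register counts as one whole register. */
unsigned vector_registers_needed(unsigned vlen, size_t elem_size)
{
    unsigned bits = vector_type_bits(vlen, elem_size);
    return (bits + ASIMD_VECTOR_BITS - 1u) / ASIMD_VECTOR_BITS;
}
```

For a 4-byte "int" parameter, VLEN = 4 yields 128 bits in one register, and
VLEN = 8 yields 256 bits in two registers, matching the two cases above.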

--------------------------------------------------------------------------------

2.4. Ordering of Vector Arguments

  * When a parameter of the original data type results in one argument in the
    vector function, the ordering rule is a simple one-to-one match with the
    original argument order.

    For example, when the original argument list is (int a, float b, int c),
    VLEN is 4, the ISA is ASIMD, and a, b, and c are all classified as vector
    parameters, the vector function argument list becomes (__m128i vec_a,
    __m128 vec_b, __m128i vec_c).

  * There are cases where a single parameter of the original data type results
    in multiple arguments in the vector function. The second and subsequent
    arguments are inserted in the argument list right after the corresponding
    first argument, not appended to the end of the argument list of the vector
    function.

    For example, when the original argument list is (int a, float b, int c),
    VLEN is 8, the ISA is ASIMD, and a, b, and c are all classified as vector
    parameters, the vector function argument list becomes
    (__m128i vec_a1, __m128i vec_a2, __m128 vec_b1, __m128 vec_b2,
    __m128i vec_c1, __m128i vec_c2).
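
The ordering rule can be sketched in C as a small expansion routine. The
function expand_argument_order and its parameters are hypothetical; it only
models which original parameter each vector argument slot belongs to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper modelling the ordering rule.
   regs_per_param[i] is the number of vector registers needed by original
   parameter i. On return, out_param_index[k] holds the index of the
   original parameter that vector argument k belongs to.
   Returns the number of vector arguments, or 0 if the output is too small. */
size_t expand_argument_order(const unsigned *regs_per_param, size_t nparams,
                             size_t *out_param_index, size_t out_capacity)
{
    assert(regs_per_param != NULL && out_param_index != NULL);
    size_t n = 0;
    for (size_t i = 0; i < nparams; i++) {
        /* All pieces of parameter i are emitted before parameter i + 1. */
        for (unsigned r = 0; r < regs_per_param[i]; r++) {
            if (n == out_capacity)
                return 0;
            out_param_index[n++] = i;
        }
    }
    return n;
}
```

For (int a, float b, int c) with VLEN = 8, each parameter needs two registers,
so the result is the index sequence 0, 0, 1, 1, 2, 2, which corresponds to
(vec_a1, vec_a2, vec_b1, vec_b2, vec_c1, vec_c2).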

--------------------------------------------------------------------------------

2.5. Masking of Vector Function

A masked vector function variant, used for invocation inside a conditional
statement (please refer to [1] for detailed information), takes additional
implicit "mask" parameters that disable processing of some of the vector
lanes.

Each element of a "mask" parameter has the data type of the CDT (see Section
2.2). The number of mask parameters is the same as the number of parameters
required to pass a vector of CDT for the given vector length. Each element of
a mask parameter must be a bit pattern of either all ones or all zeros.

For each element of the vector, if the corresponding mask value is zero, the
return value associated with that element is zero. Mask parameters are passed
after all other parameters, in the same order as the parameters they apply to.
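
The masking semantics can be modelled lane by lane in plain C. This is a
scalar sketch only, not how a compiler would emit the masked variant. The
function scale, the name scale_masked_model, and the choice to represent mask
lanes as raw 32-bit patterns (for a 4-byte CDT) are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define VLEN 4

/* Hypothetical scalar function being vectorized; its CDT is float. */
float scale(float x)
{
    return 2.0f * x;
}

/* Lane-by-lane model of a masked VLEN = 4 variant of scale().
   Each mask lane must be all ones or all zeros. */
void scale_masked_model(const float in[VLEN], const uint32_t mask[VLEN],
                        float out[VLEN])
{
    for (int lane = 0; lane < VLEN; lane++) {
        assert(mask[lane] == 0 || mask[lane] == UINT32_MAX);
        /* A zero mask lane disables the lane; its return value is zero. */
        out[lane] = (mask[lane] != 0) ? scale(in[lane]) : 0.0f;
    }
}
```

Calling the model with inputs {1, 2, 3, 4} and mask lanes {ones, zeros, ones,
zeros} leaves lanes 1 and 3 at zero and computes scale() only for lanes 0
and 2.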

--------------------------------------------------------------------------------

2.6. Vector Function Name Mangling

The name mangling of the generated vector function, based on its vector
annotation, is an important part of the Vector ABI. It allows the caller and
the callee functions to be compiled in separate files or compilation units.
Using the function prototype in header files to communicate vector function
annotation information, the compiler can perform function matching while
vectorizing code at call sites.

The vector function name is mangled as the concatenation of the following
items:

<vector_prefix> <isa> <mask> <vlen> <vparameters> '_' <original_name>

The descriptions of each item are:
  * <vector_prefix>
      string "_ZGV"

  * <original_name>
      name of scalar function, including C++ and Fortran mangling

  * <isa>
      'b'    // ASIMD

  * <mask>
      'M'    // masked version
      | 'N'  // unmasked version

  * <vlen>
      decimal-number

  * <vparameters>
      /* empty */
      | <vparameter> <opt-align> <vparameters>
          o <vparameter>
          (please refer to [1] for information about parameter types used below)
              's' decimal-number // linear parameter, variable stride,
                                 // decimal number is the position # of
                                 // stride argument, which starts from 0
              | 'l' <number>     // linear parameter, constant stride
              | 'u'              // uniform parameter
              | 'v'              // vector parameter
                 o <number>
                     [n] non-negative decimal integer  // n indicates negative
          o <opt-align>
              /* empty */
              | 'a' non-negative-decimal-number

Please refer to Section 2.7, "Compiler generated variants of vector function",
for examples of vector function mangling.
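
The concatenation rule can be sketched in C with snprintf. The helper
mangle_vector_name is hypothetical and takes the <vparameters> string already
encoded (for instance "ua16vl"); it does not derive that string from the
OpenMP clauses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds
   <vector_prefix> <isa> <mask> <vlen> <vparameters> '_' <original_name>.
   Returns the length of the mangled name, or -1 if it does not fit. */
int mangle_vector_name(char *out, size_t out_size, char isa, int masked,
                       unsigned vlen, const char *vparameters,
                       const char *original_name)
{
    assert(out != NULL && vparameters != NULL && original_name != NULL);
    assert(strlen(original_name) > 0);
    int n = snprintf(out, out_size, "_ZGV%c%c%u%s_%s",
                     isa, masked ? 'M' : 'N', vlen,
                     vparameters, original_name);
    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

Under these assumptions, mangle_vector_name(buf, sizeof buf, 'b', 0, 4,
"ua16vl", "foo") fills buf with "_ZGVbN4ua16vl_foo", and passing 1 for masked
gives "_ZGVbM4ua16vl_foo".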

--------------------------------------------------------------------------------

2.7. Compiler generated variants of vector function

The compiler's architecture selection flags have no impact on ISA selection
for the generated vector variants.

For each ISA, the compiler should generate both masked and unmasked vector
variants, unless a clause restricts generation to one of them. Compiler
implementations must not generate calls to versions for other ISAs unless
some non-standard pragma or clause is used to declare that those other
versions are available.

Example 1.
#pragma omp declare simd uniform(q) aligned(q:16) linear(k:1)
float foo(float *q, float x, int k)
{
    q[k] = q[k] + x;
    return q[k];
}

Below is the list of generated function names, or the list of symbols provided
by a library with the same pragma in the "foo" prototype.

1) _ZGVbN4ua16vl_foo (ASIMD ISA, unmasked version)
2) _ZGVbM4ua16vl_foo (ASIMD ISA, masked version)

Here "foo" is the original mangled function name, "_ZGV" is the prefix of the
vector function name, "b" indicates the ASIMD ISA, "N" indicates an unmasked
version, "M" indicates a masked version, "4" is the vector length for the
ASIMD ISA, "ua16" indicates uniform(q) and aligned(q:16), "v" indicates that
the second argument x is a vector argument, and "l" indicates linear(k:1),
that is, k is a linear variable whose stride is 1.

Example 2.
#pragma omp declare simd notinbranch
double foo(double x)
{
    return x*x;
}

Below is the list of generated function names, or the list of symbols provided
by a library with the same pragma in the "foo" prototype.

1) _ZGVbN2v_foo (ASIMD ISA, unmasked version)

Here "foo" is the original mangled function name, "_ZGV" is the prefix of the
vector function name, "b" indicates the ASIMD ISA, "N" indicates an unmasked
version, "2" is the vector length for the ASIMD ISA, and "v" indicates that
the single argument x is a vector argument.
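
A call site that could use such a variant might look like the sketch below.
The caller square_all is hypothetical. Whether the loop really ends up calling
_ZGVbN2v_foo depends on the compiler implementing this proposal and on OpenMP
SIMD support being enabled; without it the pragmas are ignored and the scalar
"foo" is called.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar function from Example 2, annotated so that a compiler may
   generate an unmasked vector variant of it. */
#pragma omp declare simd notinbranch
double foo(double x)
{
    return x * x;
}

/* Hypothetical caller: a SIMD loop in which a vectorizing compiler could
   replace two scalar calls to foo with one call to a VLEN = 2 variant. */
void square_all(double *out, const double *in, size_t n)
{
    assert(out != NULL && in != NULL);
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        out[i] = foo(in[i]);
}
```

The results are the same whether or not the vector variant is used; only the
generated calls differ.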

================================================================================

3. References

[1] OpenMP 4.0 Specification
http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf

[2] Vector Function Application Binary Interface Specification for OpenMP (x86)
https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt

[3] Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf