Date: Thu, 30 Aug 2007 13:03:00 -0000
From: Richard Guenther
To: Uros Bizjak
Cc: Paolo Bonzini, GCC Patches, Jan Hubicka
Subject: Re: [PATCH] Update support patch for ACML vectorized intrinsic library
In-Reply-To: <5787cf470708300454t2a708b1frf67710d9cf0cbad6@mail.gmail.com>
References: <46C9EF2C.8000808@gmail.com>
 <5787cf470708240402u1bb79c56k5f0955d784c6e6fd@mail.gmail.com>
 <46CEC4E3.5020009@gnu.org>
 <5787cf470708300454t2a708b1frf67710d9cf0cbad6@mail.gmail.com>
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm

On Thu, 30 Aug 2007, Uros Bizjak wrote:

> On 8/24/07, Paolo Bonzini wrote:
>
> > > So I am fine with either of two options - not linking automatically
> > > or linking against libacml_mv with -mveclib=acml.
> >
> > What about having -mveclibabi=acml doing what it does now, and
> > -mveclib=acml doing "-mveclibabi=acml -lacml_mv"? Which you could also
> > read as, link automatically with libacml_mv and make a note to implement
> > -mveclibabi when the need arises...
>
> Let's follow the example with -fexternal-blas and for this moment
> implement original Richard's solution.
> However, I'd rename proposed -mveclib option into -mveclibabi=... as
> suggested by Paolo, because this is IMO less confusing.
>
> So, we will have: -mveclibabi=acml -lacml_mv.
>
> Also, it is possible to add something like gfortran has for -fexternal-blas:
>
> `-fexternal-blas'
>      This option will make `gfortran' generate calls to BLAS functions
>      for some matrix operations like `MATMUL', instead of using our own
>      algorithms, if the size of the matrices involved is larger than a
>      given limit (see `-fblas-matmul-limit'). This may be profitable
>      if an optimized vendor BLAS library is available. The BLAS
>      library will have to be specified at link time.
>
> to the docs?

I am re-testing the following. The option is now -mveclibabi, and the docs
have been extended to mention -ftree-vectorize and the required linking
adjustment. Does this look ok?

Thanks,
Richard.

2007-08-24  Richard Guenther

	* doc/invoke.texi (-mveclibabi): Document new target option.
	* config/i386/i386.opt (-mveclibabi): New target option.
	* config/i386/i386.c (ix86_veclib_handler): Handler for
	vectorization library support.
	(override_options): Handle the -mveclibabi option, initialize
	the vectorization library handler.
	(ix86_builtin_vectorized_function): As fallback call the
	vectorization library handler, if set.
	(ix86_veclibabi_acml): New static function for ACML ABI style
	vectorization support.

Index: gcc/doc/invoke.texi
===================================================================
*** gcc/doc/invoke.texi.orig	2007-08-30 14:25:10.000000000 +0200
--- gcc/doc/invoke.texi	2007-08-30 14:27:11.000000000 +0200
*************** Objective-C and Objective-C++ Dialects}.
*** 557,563 ****
  -mthreads -mno-align-stringops -minline-all-stringops @gol
  -mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
  -m96bit-long-double -mregparm=@var{num} -msseregparm @gol
! -mpc32 -mpc64 -mpc80 mstackrealign @gol
  -momit-leaf-frame-pointer -mno-red-zone -mno-tls-direct-seg-refs @gol
  -mcmodel=@var{code-model} @gol
  -m32 -m64 -mlarge-data-threshold=@var{num}}
--- 557,563 ----
  -mthreads -mno-align-stringops -minline-all-stringops @gol
  -mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
  -m96bit-long-double -mregparm=@var{num} -msseregparm @gol
! -mveclibabi=@var{type} -mpc32 -mpc64 -mpc80 -mstackrealign @gol
  -momit-leaf-frame-pointer -mno-red-zone -mno-tls-direct-seg-refs @gol
  -mcmodel=@var{code-model} @gol
  -m32 -m64 -mlarge-data-threshold=@var{num}}
*************** vectorized variants RCPPS and RSQRTPS) i
*** 10440,10445 ****
--- 10440,10458 ----
  vectorized variants). These instructions will be generated only when
  @option{-funsafe-math-optimizations} is enabled.
  
+ @item -mveclibabi=@var{type}
+ @opindex mveclibabi
+ Specifies the ABI type to use for vectorizing intrinsics using an
+ external library. Supported types are @code{acml} for the AMD
+ math core library style of interfacing. GCC will currently emit
+ calls to @code{__vrd2_sin}, @code{__vrd2_cos}, @code{__vrd2_exp},
+ @code{__vrd2_log}, @code{__vrd2_log2}, @code{__vrd2_log10},
+ @code{__vrs4_sinf}, @code{__vrs4_cosf}, @code{__vrs4_expf},
+ @code{__vrs4_logf}, @code{__vrs4_log2f}, @code{__vrs4_log10f}
+ and @code{__vrs4_powf} when using this type and @option{-ftree-vectorize}
+ is enabled. An ACML ABI compatible library will have to be specified
+ at link time.
+ 
  @item -mpush-args
  @itemx -mno-push-args
  @opindex mpush-args
Index: gcc/config/i386/i386.opt
===================================================================
*** gcc/config/i386/i386.opt.orig	2007-08-30 14:25:10.000000000 +0200
--- gcc/config/i386/i386.opt	2007-08-30 14:27:46.000000000 +0200
*************** mtune=
*** 182,187 ****
--- 182,191 ----
  Target RejectNegative Joined Var(ix86_tune_string)
  Schedule code for given CPU
  
+ mveclibabi=
+ Target RejectNegative Joined Var(ix86_veclibabi_string)
+ Vector library ABI to use
+ 
  ;; ISA support
  
  m32
Index: gcc/config/i386/i386.c
===================================================================
*** gcc/config/i386/i386.c.orig	2007-08-30 14:25:10.000000000 +0200
--- gcc/config/i386/i386.c	2007-08-30 14:29:02.000000000 +0200
*************** static int ix86_isa_flags_explicit;
*** 1620,1625 ****
--- 1620,1629 ----
  
  #define OPTION_MASK_ISA_SSE4A_UNSET OPTION_MASK_ISA_SSE4
  
+ /* Vectorization library interface and handlers.  */
+ tree (*ix86_veclib_handler)(enum built_in_function, tree, tree) = NULL;
+ static tree ix86_veclibabi_acml (enum built_in_function, tree, tree);
+ 
  /* Implement TARGET_HANDLE_OPTION.  */
  
  static bool
*************** override_options (void)
*** 2409,2414 ****
--- 2413,2428 ----
    if (!TARGET_80387)
      target_flags &= ~MASK_FLOAT_RETURNS;
  
+   /* Use external vectorized library in vectorizing intrinsics.  */
+   if (ix86_veclibabi_string)
+     {
+       if (strcmp (ix86_veclibabi_string, "acml") == 0)
+         ix86_veclib_handler = ix86_veclibabi_acml;
+       else
+         error ("unknown vectorization library ABI type (%s) for "
+                "-mveclibabi= switch", ix86_veclibabi_string);
+     }
+ 
    if ((x86_accumulate_outgoing_args & ix86_tune_mask)
        && !(target_flags_explicit & MASK_ACCUMULATE_OUTGOING_ARGS)
        && !optimize_size)
*************** ix86_builtin_vectorized_function (unsign
*** 19934,19966 ****
        if (out_mode == DFmode && out_n == 2
            && in_mode == DFmode && in_n == 2)
          return ix86_builtins[IX86_BUILTIN_SQRTPD];
!       return NULL_TREE;
  
      case BUILT_IN_SQRTF:
        if (out_mode == SFmode && out_n == 4
            && in_mode == SFmode && in_n == 4)
          return ix86_builtins[IX86_BUILTIN_SQRTPS];
!       return NULL_TREE;
  
      case BUILT_IN_LRINT:
        if (out_mode == SImode && out_n == 4
            && in_mode == DFmode && in_n == 2)
          return ix86_builtins[IX86_BUILTIN_VEC_PACK_SFIX];
!       return NULL_TREE;
  
      case BUILT_IN_LRINTF:
        if (out_mode == SImode && out_n == 4
            && in_mode == SFmode && in_n == 4)
          return ix86_builtins[IX86_BUILTIN_CVTPS2DQ];
!       return NULL_TREE;
  
      default:
        ;
      }
  
    return NULL_TREE;
  }
  
  /* Returns a decl of a function that implements conversion of the
     input vector of type TYPE, or NULL_TREE if it is not available.  */
--- 19948,20069 ----
        if (out_mode == DFmode && out_n == 2
            && in_mode == DFmode && in_n == 2)
          return ix86_builtins[IX86_BUILTIN_SQRTPD];
!       break;
  
      case BUILT_IN_SQRTF:
        if (out_mode == SFmode && out_n == 4
            && in_mode == SFmode && in_n == 4)
          return ix86_builtins[IX86_BUILTIN_SQRTPS];
!       break;
  
      case BUILT_IN_LRINT:
        if (out_mode == SImode && out_n == 4
            && in_mode == DFmode && in_n == 2)
          return ix86_builtins[IX86_BUILTIN_VEC_PACK_SFIX];
!       break;
  
      case BUILT_IN_LRINTF:
        if (out_mode == SImode && out_n == 4
            && in_mode == SFmode && in_n == 4)
          return ix86_builtins[IX86_BUILTIN_CVTPS2DQ];
!       break;
  
      default:
        ;
      }
  
+   /* Dispatch to a handler for a vectorization library.  */
+   if (ix86_veclib_handler)
+     return (*ix86_veclib_handler)(fn, type_out, type_in);
+ 
    return NULL_TREE;
  }
  
+ /* Handler for an ACML-style interface to a library with vectorized
+    intrinsics.  */
+ 
+ static tree
+ ix86_veclibabi_acml (enum built_in_function fn, tree type_out, tree type_in)
+ {
+   char name[20] = "__vr.._";
+   tree fntype, new_fndecl, args;
+   unsigned arity;
+   const char *bname;
+   enum machine_mode el_mode, in_mode;
+   int n, in_n;
+ 
+   /* The ACML is 64bits only and suitable for unsafe math only as
+      it does not correctly support parts of IEEE with the required
+      precision such as denormals.  */
+   if (!TARGET_64BIT
+       || !flag_unsafe_math_optimizations)
+     return NULL_TREE;
+ 
+   el_mode = TYPE_MODE (TREE_TYPE (type_out));
+   n = TYPE_VECTOR_SUBPARTS (type_out);
+   in_mode = TYPE_MODE (TREE_TYPE (type_in));
+   in_n = TYPE_VECTOR_SUBPARTS (type_in);
+   if (el_mode != in_mode
+       || n != in_n)
+     return NULL_TREE;
+ 
+   switch (fn)
+     {
+     case BUILT_IN_SIN:
+     case BUILT_IN_COS:
+     case BUILT_IN_EXP:
+     case BUILT_IN_LOG:
+     case BUILT_IN_LOG2:
+     case BUILT_IN_LOG10:
+       name[4] = 'd';
+       name[5] = '2';
+       if (el_mode != DFmode
+           || n != 2)
+         return NULL_TREE;
+       break;
+ 
+     case BUILT_IN_SINF:
+     case BUILT_IN_COSF:
+     case BUILT_IN_EXPF:
+     case BUILT_IN_POWF:
+     case BUILT_IN_LOGF:
+     case BUILT_IN_LOG2F:
+     case BUILT_IN_LOG10F:
+       name[4] = 's';
+       name[5] = '4';
+       if (el_mode != SFmode
+           || n != 4)
+         return NULL_TREE;
+       break;
+ 
+     default:
+       return NULL_TREE;
+     }
+ 
+   bname = IDENTIFIER_POINTER (DECL_NAME (implicit_built_in_decls[fn]));
+   sprintf (name + 7, "%s", bname+10);
+ 
+   arity = 0;
+   for (args = DECL_ARGUMENTS (implicit_built_in_decls[fn]); args;
+        args = TREE_CHAIN (args))
+     arity++;
+ 
+   if (arity == 1)
+     fntype = build_function_type_list (type_out, type_in, NULL);
+   else
+     fntype = build_function_type_list (type_out, type_in, type_in, NULL);
+ 
+   /* Build a function declaration for the vectorized function.  */
+   new_fndecl = build_decl (FUNCTION_DECL, get_identifier (name), fntype);
+   TREE_PUBLIC (new_fndecl) = 1;
+   DECL_EXTERNAL (new_fndecl) = 1;
+   DECL_IS_NOVOPS (new_fndecl) = 1;
+   TREE_READONLY (new_fndecl) = 1;
+ 
+   return new_fndecl;
+ }
+ 
+ 
  /* Returns a decl of a function that implements conversion of the
     input vector of type TYPE, or NULL_TREE if it is not available.  */
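The symbol naming done by ix86_veclibabi_acml in the patch above can be tried out in isolation. The handler starts from "__vr.._", patches in 'd'/'2' (two doubles) or 's'/'4' (four floats), and appends the builtin's name with its 10-character "__builtin_" prefix stripped (the bname+10). Below is a minimal standalone C sketch of that mapping. acml_vector_name is a hypothetical helper, not part of GCC or ACML. Like the patch, it assumes every handled builtin is named "__builtin_<fn>". Unlike the patch, it writes with snprintf and checks the length instead of calling sprintf into a fixed 20-byte buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of GCC or ACML): build the ACML-style
   vector entry point name for BUILTIN_NAME, e.g. "__builtin_sin".
   SINGLE selects the four-float variant ("__vrs4_") instead of the
   two-double variant ("__vrd2_").  Returns 0 on success, -1 if the
   name lacks the "__builtin_" prefix or OUT is too small.  */
int
acml_vector_name (const char *builtin_name, bool single,
                  char *out, size_t outlen)
{
  static const char prefix[] = "__builtin_";
  const size_t plen = sizeof prefix - 1;   /* 10, matching bname+10.  */
  int len;

  assert (builtin_name != NULL && out != NULL);

  if (strncmp (builtin_name, prefix, plen) != 0)
    return -1;

  /* Same layout as the patch: "__vr", element type letter, lane count,
     "_", then the builtin's base name.  */
  len = snprintf (out, outlen, "__vr%c%c_%s",
                  single ? 's' : 'd', single ? '4' : '2',
                  builtin_name + plen);
  if (len < 0 || (size_t) len >= outlen)
    return -1;
  return 0;
}
```

With this mapping, __builtin_sin on a two-double vector becomes __vrd2_sin and __builtin_powf on a four-float vector becomes __vrs4_powf, which agrees with the names listed in the invoke.texi hunk.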