* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
@ 2007-08-20 19:57 Uros Bizjak
2007-08-21 12:41 ` Richard Guenther
0 siblings, 1 reply; 16+ messages in thread
From: Uros Bizjak @ 2007-08-20 19:57 UTC (permalink / raw)
To: GCC Patches; +Cc: Richard Guenther, Jan Hubicka
Hello!
> This is about the best thing we can do at the moment without introducing
> libgcc-math, so I'd like to go ahead with this for 4.3 at least.
>
> Any opinion on whether we automatically should link acml_mv?
What about requiring the full library name as the argument to -mveclib=?
The name could then be processed by an appropriate _SPEC define to link
the specified library automatically. The benefit of specifying the full
name would be to distinguish between, e.g., acml_mv and a (possible) acml_mv2.
(Ideally, we should detect when the acml_mv library was added using
-lacml_mv and trigger the correct veclib handler. I'm not sure this is
possible...)
Uros.
* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
2007-08-20 19:57 [PATCH] Update support patch for ACML vectorized intrinsic library Uros Bizjak
@ 2007-08-21 12:41 ` Richard Guenther
2007-08-24 11:03 ` Richard Guenther
0 siblings, 1 reply; 16+ messages in thread
From: Richard Guenther @ 2007-08-21 12:41 UTC (permalink / raw)
To: Uros Bizjak; +Cc: GCC Patches, Jan Hubicka
On Mon, 20 Aug 2007, Uros Bizjak wrote:
> Hello!
>
> > This is about the best thing we can do at the moment without introducing
> > libgcc-math, so I'd like to go ahead with this for 4.3 at least.
> >
> > Any opinion on whether we automatically should link acml_mv?
>
> What about having to specify full library name to -mveclib= ? This name can be
> processed by appropriate _SPEC define to automatically link specified library.
> The benefit of specifying a full name would be to distinguish between i.e.
> acml_mv and (possible) acml_mv2.
Uh, I don't like giving full paths to an option. If acml_mv will become
acml_mv2 it probably changes the ABI so we would need adjustments to the
code anyway, so I don't expect this to happen.
We should be able to process -mveclib=acml in the specs processing as well
and just add -lacml_mv - the question was mainly whether we should do that
(given that gfortran doesn't do it for -fexternal-blas).
Richard.
* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
2007-08-21 12:41 ` Richard Guenther
@ 2007-08-24 11:03 ` Richard Guenther
2007-08-24 11:13 ` Uros Bizjak
0 siblings, 1 reply; 16+ messages in thread
From: Richard Guenther @ 2007-08-24 11:03 UTC (permalink / raw)
To: Uros Bizjak; +Cc: GCC Patches, Jan Hubicka
On Tue, 21 Aug 2007, Richard Guenther wrote:
> On Mon, 20 Aug 2007, Uros Bizjak wrote:
>
> > Hello!
> >
> > > This is about the best thing we can do at the moment without introducing
> > > libgcc-math, so I'd like to go ahead with this for 4.3 at least.
> > >
> > > Any opinion on whether we automatically should link acml_mv?
> >
> > What about having to specify full library name to -mveclib= ? This name can be
> > processed by appropriate _SPEC define to automatically link specified library.
> > The benefit of specifying a full name would be to distinguish between i.e.
> > acml_mv and (possible) acml_mv2.
>
> Uh, I don't like giving full paths to an option. If acml_mv will become
> acml_mv2 it probably changes the ABI so we would need adjustments to the
> code anyway, so I don't expect this to happen.
>
> We should be able to process -mveclib=acml in the specs processing as well
> and just add -lacml_mv - the question was mainly whether we should do that
> (given that gfortran doesn't do it for -fexternal-blas).
So, the following extra hunk for the patch automatically links the
library.
Is the patch ok for mainline?
Thanks,
Richard.
* config/i386/linux64.h (LIB_SPEC): Copy from config/linux.h.
Link with libacml_mv as needed, if building with -mveclib=acml.
Index: config/i386/linux64.h
===================================================================
*** config/i386/linux64.h.orig 2007-08-20 13:44:10.000000000 +0200
--- config/i386/linux64.h 2007-08-24 12:48:44.000000000 +0200
*************** along with GCC; see the file COPYING3.
*** 74,79 ****
--- 74,86 ----
%{" SPEC_64 ":%{!dynamic-linker:-dynamic-linker "
LINUX_DYNAMIC_LINKER64 "}}} \
%{static:-static}}"
+ #undef LIB_SPEC
+ #define LIB_SPEC\
+ "%{pthread:-lpthread} \
+ %{shared:-lc} \
+ %{!shared:%{mieee-fp:-lieee} %{profile:-lc_p}%{!profile:-lc}} \
+ %{mveclib=acml:--as-needed -lacml_mv --no-as-needed}"
+
/* Similar to standard Linux, but adding -ffast-math support. */
#undef ENDFILE_SPEC
#define ENDFILE_SPEC \
* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
2007-08-24 11:03 ` Richard Guenther
@ 2007-08-24 11:13 ` Uros Bizjak
2007-08-24 11:18 ` Richard Guenther
0 siblings, 1 reply; 16+ messages in thread
From: Uros Bizjak @ 2007-08-24 11:13 UTC (permalink / raw)
To: Richard Guenther; +Cc: GCC Patches, Jan Hubicka
On 8/24/07, Richard Guenther <rguenther@suse.de> wrote:
> So, the following extra hunk for the patch automatically links the
> library.
>
> Is the patch ok for mainline?
> * config/i386/linux64.h (LIB_SPEC): Copy from config/linux.h.
> Link with libacml_mv as needed, if building with -mveclib=acml.
I still think it is better to have -mveclib=acml_mv and to pass acml_mv
automatically to -l as a variable. But I'll leave this to
your choice (though perhaps we should wait for another opinion).
Thanks,
Uros.
* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
2007-08-24 11:13 ` Uros Bizjak
@ 2007-08-24 11:18 ` Richard Guenther
2007-08-24 12:34 ` Paolo Bonzini
0 siblings, 1 reply; 16+ messages in thread
From: Richard Guenther @ 2007-08-24 11:18 UTC (permalink / raw)
To: Uros Bizjak; +Cc: GCC Patches, Jan Hubicka
On Fri, 24 Aug 2007, Uros Bizjak wrote:
> On 8/24/07, Richard Guenther <rguenther@suse.de> wrote:
>
> > So, the following extra hunk for the patch automatically links the
> > library.
> >
> > Is the patch ok for mainline?
>
> > * config/i386/linux64.h (LIB_SPEC): Copy from config/linux.h.
> > Link with libacml_mv as needed, if building with -mveclib=acml.
>
> I still think that having -mveclib=acml_mv and to pass acml_mv
> automatically to -l as a variable is better. But I'll leave this to
> your choice (but perhaps we should wait for another opinion).
I'm somewhat unsure here - for one -mveclib was supposed to select
an ABI for the vectorization library, but of course automatically
linking against some specific library name makes this pointless
(that is, it doesn't allow for an alternate implementation with
a different name). Now if we want some automatic linking then still
the option should be easy enough to recognize (acml_mv is not from
my point of view).
So I am fine with either of two options - not linking automatically
or linking against libacml_mv with -mveclib=acml.
Richard.
* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
2007-08-24 11:18 ` Richard Guenther
@ 2007-08-24 12:34 ` Paolo Bonzini
2007-08-30 12:02 ` Uros Bizjak
0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2007-08-24 12:34 UTC (permalink / raw)
To: Richard Guenther; +Cc: Uros Bizjak, GCC Patches, Jan Hubicka
> So I am fine with either of two options - not linking automatically
> or linking against libacml_mv with -mveclib=acml.
What about having -mveclibabi=acml doing what it does now, and
-mveclib=acml doing "-mveclibabi=acml -lacml_mv"? Which you could also
read as, link automatically with libacml_mv and make a note to implement
-mveclibabi when the need arises...
Paolo
* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
2007-08-24 12:34 ` Paolo Bonzini
@ 2007-08-30 12:02 ` Uros Bizjak
2007-08-30 13:03 ` Richard Guenther
0 siblings, 1 reply; 16+ messages in thread
From: Uros Bizjak @ 2007-08-30 12:02 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Richard Guenther, GCC Patches, Jan Hubicka
On 8/24/07, Paolo Bonzini <bonzini@gnu.org> wrote:
> > So I am fine with either of two options - not linking automatically
> > or linking against libacml_mv with -mveclib=acml.
>
> What about having -mveclibabi=acml doing what it does now, and
> -mveclib=acml doing "-mveclibabi=acml -lacml_mv"? Which you could also
> read as, link automatically with libacml_mv and make a note to implement
> -mveclibabi when the need arises...
Let's follow the example of -fexternal-blas and, for the moment,
implement Richard's original solution. However, I'd rename the
proposed -mveclib option to -mveclibabi=... as suggested by Paolo,
because this is IMO less confusing.
So, we will have: -mveclibabi=acml -lacml_mv.
Also, it is possible to add something like gfortran has for -fexternal-blas:
`-fexternal-blas'
This option will make `gfortran' generate calls to BLAS functions
for some matrix operations like `MATMUL', instead of using our own
algorithms, if the size of the matrices involved is larger than a
given limit (see `-fblas-matmul-limit'). This may be profitable
if an optimized vendor BLAS library is available. The BLAS
library will have to be specified at link time.
to the docs?
The (original) patch is OK for mainline as far as i386 is concerned.
Thanks,
Uros.
* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
2007-08-30 12:02 ` Uros Bizjak
@ 2007-08-30 13:03 ` Richard Guenther
2007-08-30 13:08 ` Uros Bizjak
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Richard Guenther @ 2007-08-30 13:03 UTC (permalink / raw)
To: Uros Bizjak; +Cc: Paolo Bonzini, GCC Patches, Jan Hubicka
On Thu, 30 Aug 2007, Uros Bizjak wrote:
> On 8/24/07, Paolo Bonzini <bonzini@gnu.org> wrote:
>
> > > So I am fine with either of two options - not linking automatically
> > > or linking against libacml_mv with -mveclib=acml.
> >
> > What about having -mveclibabi=acml doing what it does now, and
> > -mveclib=acml doing "-mveclibabi=acml -lacml_mv"? Which you could also
> > read as, link automatically with libacml_mv and make a note to implement
> > -mveclibabi when the need arises...
>
> Let's follow the example of -fexternal-blas and, for the moment,
> implement Richard's original solution. However, I'd rename the
> proposed -mveclib option to -mveclibabi=... as suggested by Paolo,
> because this is IMO less confusing.
>
> So, we will have: -mveclibabi=acml -lacml_mv.
>
> Also, it is possible to add something like gfortran has for -fexternal-blas:
>
> `-fexternal-blas'
> This option will make `gfortran' generate calls to BLAS functions
> for some matrix operations like `MATMUL', instead of using our own
> algorithms, if the size of the matrices involved is larger than a
> given limit (see `-fblas-matmul-limit'). This may be profitable
> if an optimized vendor BLAS library is available. The BLAS
> library will have to be specified at link time.
>
> to the docs?
I am re-testing the following. The option is now -mveclibabi and
the docs have been extended to mention -ftree-vectorize and the
required linking adjustment.
Does this look ok?
Thanks,
Richard.
2007-08-24 Richard Guenther <rguenther@suse.de>
* doc/invoke.texi (-mveclibabi): Document new target option.
* config/i386/i386.opt (-mveclibabi): New target option.
* config/i386/i386.c (ix86_veclib_handler): Handler for
vectorization library support.
(override_options): Handle the -mveclibabi option, initialize
the vectorization library handler.
(ix86_builtin_vectorized_function): As fallback call the
vectorization library handler, if set.
(ix86_veclibabi_acml): New static function for ACML ABI style
vectorization support.
Index: gcc/doc/invoke.texi
===================================================================
*** gcc/doc/invoke.texi.orig 2007-08-30 14:25:10.000000000 +0200
--- gcc/doc/invoke.texi 2007-08-30 14:27:11.000000000 +0200
*************** Objective-C and Objective-C++ Dialects}.
*** 557,563 ****
-mthreads -mno-align-stringops -minline-all-stringops @gol
-mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
-m96bit-long-double -mregparm=@var{num} -msseregparm @gol
! -mpc32 -mpc64 -mpc80 mstackrealign @gol
-momit-leaf-frame-pointer -mno-red-zone -mno-tls-direct-seg-refs @gol
-mcmodel=@var{code-model} @gol
-m32 -m64 -mlarge-data-threshold=@var{num}}
--- 557,563 ----
-mthreads -mno-align-stringops -minline-all-stringops @gol
-mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
-m96bit-long-double -mregparm=@var{num} -msseregparm @gol
! -mveclibabi=@var{type} -mpc32 -mpc64 -mpc80 -mstackrealign @gol
-momit-leaf-frame-pointer -mno-red-zone -mno-tls-direct-seg-refs @gol
-mcmodel=@var{code-model} @gol
-m32 -m64 -mlarge-data-threshold=@var{num}}
*************** vectorized variants RCPPS and RSQRTPS) i
*** 10440,10445 ****
--- 10440,10458 ----
vectorized variants). These instructions will be generated only when
@option{-funsafe-math-optimizations} is enabled.
+ @item -mveclibabi=@var{type}
+ @opindex mveclibabi
+ Specifies the ABI type to use for vectorizing intrinsics using an
+ external library. Supported types are @code{acml} for the AMD
+ math core library style of interfacing. GCC will currently emit
+ calls to @code{__vrd2_sin}, @code{__vrd2_cos}, @code{__vrd2_exp},
+ @code{__vrd2_log}, @code{__vrd2_log2}, @code{__vrd2_log10},
+ @code{__vrs4_sinf}, @code{__vrs4_cosf}, @code{__vrs4_expf},
+ @code{__vrs4_logf}, @code{__vrs4_log2f}, @code{__vrs4_log10f}
+ and @code{__vrs4_powf} when using this type and @option{-ftree-vectorize}
+ is enabled. A ACML ABI compatible library will have to be specified
+ at link time.
+
@item -mpush-args
@itemx -mno-push-args
@opindex mpush-args
Index: gcc/config/i386/i386.opt
===================================================================
*** gcc/config/i386/i386.opt.orig 2007-08-30 14:25:10.000000000 +0200
--- gcc/config/i386/i386.opt 2007-08-30 14:27:46.000000000 +0200
*************** mtune=
*** 182,187 ****
--- 182,191 ----
Target RejectNegative Joined Var(ix86_tune_string)
Schedule code for given CPU
+ mveclibabi=
+ Target RejectNegative Joined Var(ix86_veclibabi_string)
+ Vector library ABI to use
+
;; ISA support
m32
Index: gcc/config/i386/i386.c
===================================================================
*** gcc/config/i386/i386.c.orig 2007-08-30 14:25:10.000000000 +0200
--- gcc/config/i386/i386.c 2007-08-30 14:29:02.000000000 +0200
*************** static int ix86_isa_flags_explicit;
*** 1620,1625 ****
--- 1620,1629 ----
#define OPTION_MASK_ISA_SSE4A_UNSET OPTION_MASK_ISA_SSE4
+ /* Vectorization library interface and handlers. */
+ tree (*ix86_veclib_handler)(enum built_in_function, tree, tree) = NULL;
+ static tree ix86_veclibabi_acml (enum built_in_function, tree, tree);
+
/* Implement TARGET_HANDLE_OPTION. */
static bool
*************** override_options (void)
*** 2409,2414 ****
--- 2413,2428 ----
if (!TARGET_80387)
target_flags &= ~MASK_FLOAT_RETURNS;
+ /* Use external vectorized library in vectorizing intrinsics. */
+ if (ix86_veclibabi_string)
+ {
+ if (strcmp (ix86_veclibabi_string, "acml") == 0)
+ ix86_veclib_handler = ix86_veclibabi_acml;
+ else
+ error ("unknown vectorization library ABI type (%s) for "
+ "-mveclibabi= switch", ix86_veclibabi_string);
+ }
+
if ((x86_accumulate_outgoing_args & ix86_tune_mask)
&& !(target_flags_explicit & MASK_ACCUMULATE_OUTGOING_ARGS)
&& !optimize_size)
*************** ix86_builtin_vectorized_function (unsign
*** 19934,19966 ****
if (out_mode == DFmode && out_n == 2
&& in_mode == DFmode && in_n == 2)
return ix86_builtins[IX86_BUILTIN_SQRTPD];
! return NULL_TREE;
case BUILT_IN_SQRTF:
if (out_mode == SFmode && out_n == 4
&& in_mode == SFmode && in_n == 4)
return ix86_builtins[IX86_BUILTIN_SQRTPS];
! return NULL_TREE;
case BUILT_IN_LRINT:
if (out_mode == SImode && out_n == 4
&& in_mode == DFmode && in_n == 2)
return ix86_builtins[IX86_BUILTIN_VEC_PACK_SFIX];
! return NULL_TREE;
case BUILT_IN_LRINTF:
if (out_mode == SImode && out_n == 4
&& in_mode == SFmode && in_n == 4)
return ix86_builtins[IX86_BUILTIN_CVTPS2DQ];
! return NULL_TREE;
default:
;
}
return NULL_TREE;
}
/* Returns a decl of a function that implements conversion of the
input vector of type TYPE, or NULL_TREE if it is not available. */
--- 19948,20069 ----
if (out_mode == DFmode && out_n == 2
&& in_mode == DFmode && in_n == 2)
return ix86_builtins[IX86_BUILTIN_SQRTPD];
! break;
case BUILT_IN_SQRTF:
if (out_mode == SFmode && out_n == 4
&& in_mode == SFmode && in_n == 4)
return ix86_builtins[IX86_BUILTIN_SQRTPS];
! break;
case BUILT_IN_LRINT:
if (out_mode == SImode && out_n == 4
&& in_mode == DFmode && in_n == 2)
return ix86_builtins[IX86_BUILTIN_VEC_PACK_SFIX];
! break;
case BUILT_IN_LRINTF:
if (out_mode == SImode && out_n == 4
&& in_mode == SFmode && in_n == 4)
return ix86_builtins[IX86_BUILTIN_CVTPS2DQ];
! break;
default:
;
}
+ /* Dispatch to a handler for a vectorization library. */
+ if (ix86_veclib_handler)
+ return (*ix86_veclib_handler)(fn, type_out, type_in);
+
return NULL_TREE;
}
+ /* Handler for an ACML-style interface to a library with vectorized
+ intrinsics. */
+
+ static tree
+ ix86_veclibabi_acml (enum built_in_function fn, tree type_out, tree type_in)
+ {
+ char name[20] = "__vr.._";
+ tree fntype, new_fndecl, args;
+ unsigned arity;
+ const char *bname;
+ enum machine_mode el_mode, in_mode;
+ int n, in_n;
+
+ /* The ACML is 64bits only and suitable for unsafe math only as
+ it does not correctly support parts of IEEE with the required
+ precision such as denormals. */
+ if (!TARGET_64BIT
+ || !flag_unsafe_math_optimizations)
+ return NULL_TREE;
+
+ el_mode = TYPE_MODE (TREE_TYPE (type_out));
+ n = TYPE_VECTOR_SUBPARTS (type_out);
+ in_mode = TYPE_MODE (TREE_TYPE (type_in));
+ in_n = TYPE_VECTOR_SUBPARTS (type_in);
+ if (el_mode != in_mode
+ || n != in_n)
+ return NULL_TREE;
+
+ switch (fn)
+ {
+ case BUILT_IN_SIN:
+ case BUILT_IN_COS:
+ case BUILT_IN_EXP:
+ case BUILT_IN_LOG:
+ case BUILT_IN_LOG2:
+ case BUILT_IN_LOG10:
+ name[4] = 'd';
+ name[5] = '2';
+ if (el_mode != DFmode
+ || n != 2)
+ return NULL_TREE;
+ break;
+
+ case BUILT_IN_SINF:
+ case BUILT_IN_COSF:
+ case BUILT_IN_EXPF:
+ case BUILT_IN_POWF:
+ case BUILT_IN_LOGF:
+ case BUILT_IN_LOG2F:
+ case BUILT_IN_LOG10F:
+ name[4] = 's';
+ name[5] = '4';
+ if (el_mode != SFmode
+ || n != 4)
+ return NULL_TREE;
+ break;
+
+ default:
+ return NULL_TREE;
+ }
+
+ bname = IDENTIFIER_POINTER (DECL_NAME (implicit_built_in_decls[fn]));
+ sprintf (name + 7, "%s", bname+10);
+
+ arity = 0;
+ for (args = DECL_ARGUMENTS (implicit_built_in_decls[fn]); args;
+ args = TREE_CHAIN (args))
+ arity++;
+
+ if (arity == 1)
+ fntype = build_function_type_list (type_out, type_in, NULL);
+ else
+ fntype = build_function_type_list (type_out, type_in, type_in, NULL);
+
+ /* Build a function declaration for the vectorized function. */
+ new_fndecl = build_decl (FUNCTION_DECL, get_identifier (name), fntype);
+ TREE_PUBLIC (new_fndecl) = 1;
+ DECL_EXTERNAL (new_fndecl) = 1;
+ DECL_IS_NOVOPS (new_fndecl) = 1;
+ TREE_READONLY (new_fndecl) = 1;
+
+ return new_fndecl;
+ }
+
+
/* Returns a decl of a function that implements conversion of the
input vector of type TYPE, or NULL_TREE if it is not available. */
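For readers following the patch: the way ix86_veclibabi_acml derives the library symbol from the builtin's name can be sketched in isolation. The helper below is hypothetical and not part of GCC; it only mirrors the "__vr.._" template and the "skip the first 10 characters" step from the patch, on the assumption that the builtin's name always starts with "__builtin_" (which is 10 characters long).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): build the ACML-style entry-point
   name for a builtin such as "__builtin_sin".  ELEM is 'd' for double or
   's' for float; LANES is '2' or '4'.  OUT must hold at least 20 bytes,
   the same buffer size the patch uses.  */
static void
acml_symbol_name (const char *builtin_name, char elem, char lanes,
                  char out[20])
{
  static const char prefix[] = "__builtin_";
  const size_t prefix_len = sizeof prefix - 1;   /* 10 characters */

  /* The patch relies on this prefix implicitly (bname + 10).  */
  assert (strncmp (builtin_name, prefix, prefix_len) == 0);
  /* "__vr" + elem + lanes + "_" takes 7 bytes; the rest must fit.  */
  assert (strlen (builtin_name + prefix_len) < 20 - 7);

  snprintf (out, 20, "__vr%c%c_%s", elem, lanes,
            builtin_name + prefix_len);
}
```

With these assumptions, "__builtin_sin" with 'd' and '2' yields "__vrd2_sin", and "__builtin_log10f" with 's' and '4' yields "__vrs4_log10f", matching the names listed in the documentation hunk.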
* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
2007-08-30 13:03 ` Richard Guenther
@ 2007-08-30 13:08 ` Uros Bizjak
2007-08-30 15:53 ` Tobias Burnus
2007-08-30 15:56 ` Uros Bizjak
2 siblings, 0 replies; 16+ messages in thread
From: Uros Bizjak @ 2007-08-30 13:08 UTC (permalink / raw)
To: Richard Guenther; +Cc: Paolo Bonzini, GCC Patches, Jan Hubicka
On 8/30/07, Richard Guenther <rguenther@suse.de> wrote:
> 2007-08-24 Richard Guenther <rguenther@suse.de>
>
> * doc/invoke.texi (-mveclibabi): Document new target option.
> * config/i386/i386.opt (-mveclibabi): New target option.
> * config/i386/i386.c (ix86_veclib_handler): Handler for
> vectorization library support.
> (override_options): Handle the -mveclibabi option, initialize
> the vectorization library handler.
> (ix86_builtin_vectorized_function): As fallback call the
> vectorization library handler, if set.
> (ix86_veclibabi_acml): New static function for ACML ABI style
> vectorization support.
This is OK for mainline; a target testcase that checks that the calls
are indeed generated would also be nice here (a lot of users look into
the testsuite to see how a certain feature is invoked).
> + @code{__vrs4_logf}, @code{__vrs4_log2f}, @code{__vrs4_log10f}
> + and @code{__vrs4_powf} when using this type and @option{-ftree-vectorize}
> + is enabled. A ACML ABI compatible library will have to be specified
An ACML ABI ...
Uros.
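The control flow described in the ChangeLog quoted in this message (an option string selects a handler, and the vectorizer hook falls back to that handler only when no built-in vector instruction matches) can be illustrated with a small stand-alone sketch. All names below are hypothetical stand-ins that work on strings; the real code operates on GCC trees and builtin codes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the handler type: maps a scalar function
   name to a library symbol, or NULL when nothing suitable exists.  */
typedef const char *(*veclib_handler_fn) (const char *scalar_fn);

/* Toy "ACML" handler covering two double-precision functions only.  */
static const char *
acml_handler (const char *scalar_fn)
{
  if (strcmp (scalar_fn, "sin") == 0)
    return "__vrd2_sin";
  if (strcmp (scalar_fn, "cos") == 0)
    return "__vrd2_cos";
  return NULL;
}

/* Map the option argument to a handler.  NULL means "unknown ABI", for
   which the real patch reports an error.  */
static veclib_handler_fn
select_veclib_handler (const char *abi)
{
  assert (abi != NULL);
  if (strcmp (abi, "acml") == 0)
    return acml_handler;
  return NULL;
}

/* Built-in vector instructions are tried first; the library handler is
   consulted only as a fallback, and only if one was selected.  */
static const char *
vectorized_function (const char *scalar_fn, veclib_handler_fn handler)
{
  if (strcmp (scalar_fn, "sqrt") == 0)
    return "sqrtpd";
  if (handler)
    return handler (scalar_fn);
  return NULL;
}
```

This ordering is why the patch turns the early `return NULL_TREE` statements in the switch into `break`: a non-matching case now falls through to the handler instead of giving up.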
* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
2007-08-30 13:03 ` Richard Guenther
2007-08-30 13:08 ` Uros Bizjak
@ 2007-08-30 15:53 ` Tobias Burnus
2007-08-30 16:08 ` Richard Guenther
2007-08-30 15:56 ` Uros Bizjak
2 siblings, 1 reply; 16+ messages in thread
From: Tobias Burnus @ 2007-08-30 15:53 UTC (permalink / raw)
To: Richard Guenther; +Cc: GCC Patches
Richard Guenther wrote:
> I am re-testing the following. The option is now -mveclibabi and
> the docs have been extended to mention -ftree-vectorize and the
> required linking adjustment.
As it is now checked in, can you write something for
http://gcc.gnu.org/gcc-4.3/changes.html ?
* * *
Some initial results for the Polyhedron test on Athlon64 4800+ using:
gfortran -march=opteron -ffast-math -funroll-loops -ftree-loop-linear
-ftree-vectorize -msse3 -O3
and the same plus -mveclibabi=acml -lacml_mv (ACML 3.6.1)
Result:
Test        noACML    ACML   ACML/noACML
----------------------------------------
ac           13.87   13.89     100%
aermod       38.12   32.91      86%  <<<
air          14.09   14.11     100%
capacita     82.45   82.37     100%
channel      12.70   12.67     100%
doduc        43.15   34.81      80%  <<<
fatigue      12.11   12.04      99%
gas_dyn      12.03   12.09     100%
induct       48.83   48.10      99%
linpk        26.00   25.93     100%
mdbx         24.28   24.29     100%
nf           29.69   29.67     100%
protein      64.51   64.37     100%
rnflow       36.80   36.99     101%
test_fpu     19.98   19.98     100%
tfft          7.74    7.65      99%
----------------------------------------
Geo.Mean     24.46   23.88    97.6%  <<
Tobias
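The percentage column in the table above appears to be the ACML figure divided by the noACML figure. A minimal sketch of that arithmetic follows; the helper name is hypothetical, and rounding to the nearest integer is an assumption, since the message does not say how the percentages were rounded.

```c
#include <assert.h>

/* Hypothetical helper: express a measured run time as a percentage of a
   baseline run time, rounded to the nearest integer.  The rounding mode
   is an assumption, not something stated in the thread.  */
static int
percent_of_baseline (double baseline, double measured)
{
  assert (baseline > 0.0);
  return (int) (100.0 * measured / baseline + 0.5);
}
```

For the aermod row, 32.91 against a baseline of 38.12 gives 86 under this convention.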
* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
2007-08-30 13:03 ` Richard Guenther
2007-08-30 13:08 ` Uros Bizjak
2007-08-30 15:53 ` Tobias Burnus
@ 2007-08-30 15:56 ` Uros Bizjak
2 siblings, 0 replies; 16+ messages in thread
From: Uros Bizjak @ 2007-08-30 15:56 UTC (permalink / raw)
To: Richard Guenther; +Cc: Paolo Bonzini, GCC Patches, Jan Hubicka
Richard Guenther wrote:
> 2007-08-24 Richard Guenther <rguenther@suse.de>
>
> * doc/invoke.texi (-mveclibabi): Document new target option.
> * config/i386/i386.opt (-mveclibabi): New target option.
> * config/i386/i386.c (ix86_veclib_handler): Handler for
> vectorization library support.
>
BTW: Do you perhaps have some benchmark results at hand to illustrate
the impact of this change on popular benchmark scores?
Thanks,
Uros.
* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
2007-08-30 15:53 ` Tobias Burnus
@ 2007-08-30 16:08 ` Richard Guenther
2007-08-30 19:40 ` Gerald Pfeifer
0 siblings, 1 reply; 16+ messages in thread
From: Richard Guenther @ 2007-08-30 16:08 UTC (permalink / raw)
To: Tobias Burnus; +Cc: GCC Patches, Gerald Pfeifer
On Thu, 30 Aug 2007, Tobias Burnus wrote:
> Richard Guenther wrote:
> > I am re-testing the following. The option is now -mveclibabi and
> > the docs have been extended to mention -ftree-vectorize and the
> > required linking adjustment.
>
> As it is now checked in, can you write something for
> http://gcc.gnu.org/gcc-4.3/changes.html ?
Like the following? Ok?
Thanks,
Richard.
Index: changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.3/changes.html,v
retrieving revision 1.66
diff -u -r1.66 changes.html
--- changes.html 25 Aug 2007 16:17:48 -0000 1.66
+++ changes.html 30 Aug 2007 15:53:11 -0000
@@ -403,6 +403,9 @@
<code>signed</code> or <code>unsigned</code> quad (TImode) integer
types. Additionally, all operations generate the full set of IEEE
exceptions and support the full set of IEEE rounding modes.</li>
+ <li>GCC can now utilize the ACML library for vectorizing calls to
+ a set of C99 functions on x86_64 if <code>-mveclibabi=acml</code>
+ is specified and you link to an ACML ABI compatible library.</li>
</ul>
<h3>ARM</h3>
* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
2007-08-30 16:08 ` Richard Guenther
@ 2007-08-30 19:40 ` Gerald Pfeifer
0 siblings, 0 replies; 16+ messages in thread
From: Gerald Pfeifer @ 2007-08-30 19:40 UTC (permalink / raw)
To: Richard Guenther; +Cc: Tobias Burnus, GCC Patches
On Thu, 30 Aug 2007, Richard Guenther wrote:
> Like the following? Ok?
Looks good to my eyes. Thanks,
Gerald
* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
@ 2007-08-30 16:59 Uros Bizjak
0 siblings, 0 replies; 16+ messages in thread
From: Uros Bizjak @ 2007-08-30 16:59 UTC (permalink / raw)
To: GCC Patches; +Cc: Tobias Burnus, Richard Guenther
Hello!
> Some initial results for the Polyhedron test on Athlon64 4800+ using:
>
> gfortran -march=opteron -ffast-math -funroll-loops -ftree-loop-linear
> -ftree-vectorize -msse3 -O3
>
> and the same plus -mveclibabi=acml -lacml_mv (ACML 3.6.1)
>
>
> Result:
>
> Test        noACML    ACML   ACML/noACML
> ----------------------------------------
> ac           13.87   13.89     100%
> aermod       38.12   32.91      86%  <<<
> air          14.09   14.11     100%
> capacita     82.45   82.37     100%
In the aermod case, no function gets vectorized, but it looks like optimized
scalar functions are picked up from the acml_mv library.
Uros.
* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
2007-08-20 15:03 Richard Guenther
@ 2007-08-20 22:36 ` Hans-Peter Nilsson
0 siblings, 0 replies; 16+ messages in thread
From: Hans-Peter Nilsson @ 2007-08-20 22:36 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc-patches
On Mon, 20 Aug 2007, Richard Guenther wrote:
> Index: doc/invoke.texi
> ! -mveclib=@var{type} -mpc32 -mpc64 -mpc80 mstackrealign @gol
Not part of your change, but I believe there's a missing "-" on
mstackrealign.
brgds, H-P
* [PATCH] Update support patch for ACML vectorized intrinsic library
@ 2007-08-20 15:03 Richard Guenther
2007-08-20 22:36 ` Hans-Peter Nilsson
0 siblings, 1 reply; 16+ messages in thread
From: Richard Guenther @ 2007-08-20 15:03 UTC (permalink / raw)
To: gcc-patches; +Cc: Jan Hubicka
This is an updated version for the AMD optimized math intrinsic library
(libacml_mv). With this patch we can vectorize calls to sin, cos, exp,
log, log2 and log10 for double and sinf, cosf, expf, powf, logf, log2f
and log10f for float arguments. The user still needs to manually link
against acml_mv (and obviously have it installed).
This is about the best thing we can do at the moment without introducing
libgcc-math, so I'd like to go ahead with this for 4.3 at least.
Any opinion on whether we automatically should link acml_mv?
I expect that intel folks may want to add support for the corresponding
functions in libimf (I believe the MKL doesn't have support, but the
intel compiler ships with libimf at least) and possibly the ppc/spu
folks to add support for their library.
Any objections? [I'm waiting for the tree to unbreak for testing]
Thanks,
Richard.
2006-12-10 Richard Guenther <rguenther@suse.de>
* doc/invoke.texi (-mveclib): Document new target option.
* config/i386/i386.opt (-mveclib): New target option.
* config/i386/i386.c (ix86_veclib_handler): Handler for
vectorization library support.
(override_options): Handle the -mveclib option, initialize
the vectorization library handler.
(ix86_builtin_vectorized_function): As fallback call the
vectorization library handler, if set.
(ix86_veclib_acml): New static function for ACML style
vectorization support.
Index: doc/invoke.texi
===================================================================
*** doc/invoke.texi.orig 2007-08-20 13:43:35.000000000 +0200
--- doc/invoke.texi 2007-08-20 16:13:29.000000000 +0200
*************** Objective-C and Objective-C++ Dialects}.
*** 556,562 ****
-mthreads -mno-align-stringops -minline-all-stringops @gol
-mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
-m96bit-long-double -mregparm=@var{num} -msseregparm @gol
! -mpc32 -mpc64 -mpc80 mstackrealign @gol
-momit-leaf-frame-pointer -mno-red-zone -mno-tls-direct-seg-refs @gol
-mcmodel=@var{code-model} @gol
-m32 -m64 -mlarge-data-threshold=@var{num}}
--- 556,562 ----
-mthreads -mno-align-stringops -minline-all-stringops @gol
-mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
-m96bit-long-double -mregparm=@var{num} -msseregparm @gol
! -mveclib=@var{type} -mpc32 -mpc64 -mpc80 mstackrealign @gol
-momit-leaf-frame-pointer -mno-red-zone -mno-tls-direct-seg-refs @gol
-mcmodel=@var{code-model} @gol
-m32 -m64 -mlarge-data-threshold=@var{num}}
*************** vectorized variants RCPPS and RSQRTPS) i
*** 10427,10432 ****
--- 10427,10443 ----
vectorized variants). These instructions will be generated only when
@option{-funsafe-math-optimizations} is enabled.
+ @item -mveclib=@var{type}
+ @opindex mveclib
+ Specifies the ABI type to use for vectorizing intrinsics using an
+ external library. Supported types are @code{acml} for the AMD
+ math core library style of interfacing. GCC will currently emit
+ calls to @code{__vrd2_sin}, @code{__vrd2_cos}, @code{__vrd2_exp},
+ @code{__vrd2_log}, @code{__vrd2_log2}, @code{__vrd2_log10},
+ @code{__vrs4_sinf}, @code{__vrs4_cosf}, @code{__vrs4_expf},
+ @code{__vrs4_logf}, @code{__vrs4_log2f}, @code{__vrs4_log10f}
+ and @code{__vrs4_powf} when using this type.
+
@item -mpush-args
@itemx -mno-push-args
@opindex mpush-args
Index: config/i386/i386.opt
===================================================================
*** config/i386/i386.opt.orig 2007-08-20 13:44:10.000000000 +0200
--- config/i386/i386.opt 2007-08-20 16:14:22.000000000 +0200
*************** mtune=
*** 182,187 ****
--- 182,191 ----
Target RejectNegative Joined Var(ix86_tune_string)
Schedule code for given CPU
+ mveclib=
+ Target RejectNegative Joined Var(ix86_veclib_string)
+ Vector library interface to use
+
;; ISA support
m32
Index: config/i386/i386.c
===================================================================
*** config/i386/i386.c.orig 2007-08-20 13:44:10.000000000 +0200
--- config/i386/i386.c 2007-08-20 16:34:13.000000000 +0200
*************** static int ix86_isa_flags_explicit;
*** 1620,1625 ****
--- 1620,1629 ----
#define OPTION_MASK_ISA_SSE4A_UNSET OPTION_MASK_ISA_SSE4
+ /* Vectorization library interface and handlers. */
+ tree (*ix86_veclib_handler)(enum built_in_function, tree, tree) = NULL;
+ static tree ix86_veclib_acml (enum built_in_function, tree, tree);
+
/* Implement TARGET_HANDLE_OPTION. */
static bool
*************** override_options (void)
*** 2409,2414 ****
--- 2413,2428 ----
if (!TARGET_80387)
target_flags &= ~MASK_FLOAT_RETURNS;
+ /* Use external vectorized library in vectorizing intrinsics. */
+ if (ix86_veclib_string)
+ {
+ if (strcmp (ix86_veclib_string, "acml") == 0)
+ ix86_veclib_handler = ix86_veclib_acml;
+ else
+ error ("unknown vectorization library type (%s) for -mveclib= switch",
+ ix86_veclib_string);
+ }
+
if ((x86_accumulate_outgoing_args & ix86_tune_mask)
&& !(target_flags_explicit & MASK_ACCUMULATE_OUTGOING_ARGS)
&& !optimize_size)
*************** ix86_builtin_vectorized_function (unsign
*** 19919,19951 ****
if (out_mode == DFmode && out_n == 2
&& in_mode == DFmode && in_n == 2)
return ix86_builtins[IX86_BUILTIN_SQRTPD];
! return NULL_TREE;
case BUILT_IN_SQRTF:
if (out_mode == SFmode && out_n == 4
&& in_mode == SFmode && in_n == 4)
return ix86_builtins[IX86_BUILTIN_SQRTPS];
! return NULL_TREE;
case BUILT_IN_LRINT:
if (out_mode == SImode && out_n == 4
&& in_mode == DFmode && in_n == 2)
return ix86_builtins[IX86_BUILTIN_VEC_PACK_SFIX];
! return NULL_TREE;
case BUILT_IN_LRINTF:
if (out_mode == SImode && out_n == 4
&& in_mode == SFmode && in_n == 4)
return ix86_builtins[IX86_BUILTIN_CVTPS2DQ];
! return NULL_TREE;
default:
;
}
return NULL_TREE;
}
/* Returns a decl of a function that implements conversion of the
input vector of type TYPE, or NULL_TREE if it is not available. */
--- 19933,20054 ----
if (out_mode == DFmode && out_n == 2
&& in_mode == DFmode && in_n == 2)
return ix86_builtins[IX86_BUILTIN_SQRTPD];
! break;
case BUILT_IN_SQRTF:
if (out_mode == SFmode && out_n == 4
&& in_mode == SFmode && in_n == 4)
return ix86_builtins[IX86_BUILTIN_SQRTPS];
! break;
case BUILT_IN_LRINT:
if (out_mode == SImode && out_n == 4
&& in_mode == DFmode && in_n == 2)
return ix86_builtins[IX86_BUILTIN_VEC_PACK_SFIX];
! break;
case BUILT_IN_LRINTF:
if (out_mode == SImode && out_n == 4
&& in_mode == SFmode && in_n == 4)
return ix86_builtins[IX86_BUILTIN_CVTPS2DQ];
! break;
default:
;
}
+ /* Dispatch to a handler for a vectorization library. */
+ if (ix86_veclib_handler)
+ return (*ix86_veclib_handler)(fn, type_out, type_in);
+
return NULL_TREE;
}
+ /* Handler for an ACML-style interface to a library with vectorized
+ intrinsics. */
+
+ static tree
+ ix86_veclib_acml (enum built_in_function fn, tree type_out, tree type_in)
+ {
+ char name[20] = "__vr.._";
+ tree fntype, new_fndecl, args;
+ unsigned arity;
+ const char *bname;
+ enum machine_mode el_mode, in_mode;
+ int n, in_n;
+
+ /* ACML is 64-bit only and suitable only for unsafe math, as it does
+ not support parts of IEEE, such as denormals, with the required
+ precision. */
+ if (!TARGET_64BIT
+ || !flag_unsafe_math_optimizations)
+ return NULL_TREE;
+
+ el_mode = TYPE_MODE (TREE_TYPE (type_out));
+ n = TYPE_VECTOR_SUBPARTS (type_out);
+ in_mode = TYPE_MODE (TREE_TYPE (type_in));
+ in_n = TYPE_VECTOR_SUBPARTS (type_in);
+ if (el_mode != in_mode
+ || n != in_n)
+ return NULL_TREE;
+
+ switch (fn)
+ {
+ case BUILT_IN_SIN:
+ case BUILT_IN_COS:
+ case BUILT_IN_EXP:
+ case BUILT_IN_LOG:
+ case BUILT_IN_LOG2:
+ case BUILT_IN_LOG10:
+ name[4] = 'd';
+ name[5] = '2';
+ if (el_mode != DFmode
+ || n != 2)
+ return NULL_TREE;
+ break;
+
+ case BUILT_IN_SINF:
+ case BUILT_IN_COSF:
+ case BUILT_IN_EXPF:
+ case BUILT_IN_POWF:
+ case BUILT_IN_LOGF:
+ case BUILT_IN_LOG2F:
+ case BUILT_IN_LOG10F:
+ name[4] = 's';
+ name[5] = '4';
+ if (el_mode != SFmode
+ || n != 4)
+ return NULL_TREE;
+ break;
+
+ default:
+ return NULL_TREE;
+ }
+
+ bname = IDENTIFIER_POINTER (DECL_NAME (implicit_built_in_decls[fn]));
+ sprintf (name + 7, "%s", bname + 10);
+
+ arity = 0;
+ for (args = DECL_ARGUMENTS (implicit_built_in_decls[fn]); args;
+ args = TREE_CHAIN (args))
+ arity++;
+
+ if (arity == 1)
+ fntype = build_function_type_list (type_out, type_in, NULL);
+ else
+ fntype = build_function_type_list (type_out, type_in, type_in, NULL);
+
+ /* Build a function declaration for the vectorized function. */
+ new_fndecl = build_decl (FUNCTION_DECL, get_identifier (name), fntype);
+ TREE_PUBLIC (new_fndecl) = 1;
+ DECL_EXTERNAL (new_fndecl) = 1;
+ DECL_IS_NOVOPS (new_fndecl) = 1;
+ TREE_READONLY (new_fndecl) = 1;
+
+ return new_fndecl;
+ }
+
+
/* Returns a decl of a function that implements conversion of the
input vector of type TYPE, or NULL_TREE if it is not available. */
Thread overview: 16+ messages
2007-08-20 19:57 [PATCH] Update support patch for ACML vectorized intrinsic library Uros Bizjak
2007-08-21 12:41 ` Richard Guenther
2007-08-24 11:03 ` Richard Guenther
2007-08-24 11:13 ` Uros Bizjak
2007-08-24 11:18 ` Richard Guenther
2007-08-24 12:34 ` Paolo Bonzini
2007-08-30 12:02 ` Uros Bizjak
2007-08-30 13:03 ` Richard Guenther
2007-08-30 13:08 ` Uros Bizjak
2007-08-30 15:53 ` Tobias Burnus
2007-08-30 16:08 ` Richard Guenther
2007-08-30 19:40 ` Gerald Pfeifer
2007-08-30 15:56 ` Uros Bizjak
-- strict thread matches above, loose matches on Subject: below --
2007-08-30 16:59 Uros Bizjak
2007-08-20 15:03 Richard Guenther
2007-08-20 22:36 ` Hans-Peter Nilsson