public inbox for gcc-patches@gcc.gnu.org
* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
@ 2007-08-20 19:57 Uros Bizjak
  2007-08-21 12:41 ` Richard Guenther
  0 siblings, 1 reply; 16+ messages in thread
From: Uros Bizjak @ 2007-08-20 19:57 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Guenther, Jan Hubicka

Hello!

> This is about the best thing we can do at the moment without introducing
> libgcc-math, so I'd like to go ahead with this for 4.3 at least.
>
> Any opinion on whether we automatically should link acml_mv?

What about requiring the full library name in -mveclib= ? This name
can be processed by an appropriate _SPEC define to automatically link
the specified library. The benefit of specifying a full name would be
to distinguish between e.g. acml_mv and a (possible) acml_mv2.

(Ideally, we should detect when the acml_mv library was added using
-lacml_mv and trigger the correct veclib handler. I'm not sure this is
possible...)

Uros.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Update support patch for ACML vectorized intrinsic  library
  2007-08-20 19:57 [PATCH] Update support patch for ACML vectorized intrinsic library Uros Bizjak
@ 2007-08-21 12:41 ` Richard Guenther
  2007-08-24 11:03   ` Richard Guenther
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Guenther @ 2007-08-21 12:41 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches, Jan Hubicka

On Mon, 20 Aug 2007, Uros Bizjak wrote:

> Hello!
> 
> > This is about the best thing we can do at the moment without introducing
> > libgcc-math, so I'd like to go ahead with this for 4.3 at least.
> >
> > Any opinion on whether we automatically should link acml_mv?
> 
> What about having to specify full library name to -mveclib= ? This name can be
> processed by appropriate _SPEC define to automatically link specified library.
> The benefit of specifying a full name would be to distinguish between i.e.
> acml_mv and (possible) acml_mv2.

Uh, I don't like giving full paths to an option.  If acml_mv will become
acml_mv2 it probably changes the ABI so we would need adjustments to the
code anyway, so I don't expect this to happen.

We should be able to process -mveclib=acml in the specs processing as well
and just add -lacml_mv - the question was mainly whether we should do that
(given that gfortran doesn't do it for -fexternal-blas).

Richard.


* Re: [PATCH] Update support patch for ACML vectorized intrinsic  library
  2007-08-21 12:41 ` Richard Guenther
@ 2007-08-24 11:03   ` Richard Guenther
  2007-08-24 11:13     ` Uros Bizjak
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Guenther @ 2007-08-24 11:03 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches, Jan Hubicka

On Tue, 21 Aug 2007, Richard Guenther wrote:

> On Mon, 20 Aug 2007, Uros Bizjak wrote:
> 
> > Hello!
> > 
> > > This is about the best thing we can do at the moment without introducing
> > > libgcc-math, so I'd like to go ahead with this for 4.3 at least.
> > >
> > > Any opinion on whether we automatically should link acml_mv?
> > 
> > What about having to specify full library name to -mveclib= ? This name can be
> > processed by appropriate _SPEC define to automatically link specified library.
> > The benefit of specifying a full name would be to distinguish between i.e.
> > acml_mv and (possible) acml_mv2.
> 
> Uh, I don't like giving full paths to an option.  If acml_mv will become
> acml_mv2 it probably changes the ABI so we would need adjustments to the
> code anyway, so I don't expect this to happen.
> 
> We should be able to process -mveclib=acml in the specs processing as well
> and just add -lacml_mv - the question was mainly whether we should do that
> (given that gfortran doesn't do it for -fexternal-blas).

So, the following extra hunk for the patch automatically links the
library.

Is the patch ok for mainline?

Thanks,
Richard.


        * config/i386/linux64.h (LIB_SPEC): Copy from config/linux.h.
        Link with libacml_mv as needed, if building with -mveclib=acml.

Index: config/i386/linux64.h
===================================================================
*** config/i386/linux64.h.orig  2007-08-20 13:44:10.000000000 +0200
--- config/i386/linux64.h       2007-08-24 12:48:44.000000000 +0200
*************** along with GCC; see the file COPYING3.
*** 74,79 ****
--- 74,86 ----
        %{" SPEC_64 ":%{!dynamic-linker:-dynamic-linker " LINUX_DYNAMIC_LINKER64 "}}} \
      %{static:-static}}"

+ #undef  LIB_SPEC
+ #define LIB_SPEC\
+   "%{pthread:-lpthread} \
+    %{shared:-lc} \
+    %{!shared:%{mieee-fp:-lieee} %{profile:-lc_p}%{!profile:-lc}} \
+    %{mveclib=acml:--as-needed -lacml_mv --no-as-needed}"
+
  /* Similar to standard Linux, but adding -ffast-math support.  */
  #undef  ENDFILE_SPEC
  #define ENDFILE_SPEC \


* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
  2007-08-24 11:03   ` Richard Guenther
@ 2007-08-24 11:13     ` Uros Bizjak
  2007-08-24 11:18       ` Richard Guenther
  0 siblings, 1 reply; 16+ messages in thread
From: Uros Bizjak @ 2007-08-24 11:13 UTC (permalink / raw)
  To: Richard Guenther; +Cc: GCC Patches, Jan Hubicka

On 8/24/07, Richard Guenther <rguenther@suse.de> wrote:

> So, the following extra hunk for the patch automatically links the
> library.
>
> Is the patch ok for mainline?

>         * config/i386/linux64.h (LIB_SPEC): Copy from config/linux.h.
>         Link with libacml_mv as needed, if building with -mveclib=acml.

I still think that having -mveclib=acml_mv and passing acml_mv
automatically to -l as a variable is better. But I'll leave this to
your choice (but perhaps we should wait for another opinion).

Thanks,
Uros.


* Re: [PATCH] Update support patch for ACML vectorized intrinsic  library
  2007-08-24 11:13     ` Uros Bizjak
@ 2007-08-24 11:18       ` Richard Guenther
  2007-08-24 12:34         ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Guenther @ 2007-08-24 11:18 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches, Jan Hubicka

On Fri, 24 Aug 2007, Uros Bizjak wrote:

> On 8/24/07, Richard Guenther <rguenther@suse.de> wrote:
> 
> > So, the following extra hunk for the patch automatically links the
> > library.
> >
> > Is the patch ok for mainline?
> 
> >         * config/i386/linux64.h (LIB_SPEC): Copy from config/linux.h.
> >         Link with libacml_mv as needed, if building with -mveclib=acml.
> 
> I still think that having -mveclib=acml_mv and to pass acml_mv
> automatically to -l as a variable is better. But I'll leave this to
> your choice (but perhaps we should wait for another opinion).

I'm somewhat unsure here - for one -mveclib was supposed to select
an ABI for the vectorization library, but of course automatically
linking against some specific library name makes this pointless
(that is, it doesn't allow for an alternate implementation with
a different name).  Now if we want some automatic linking then still
the option should be easy enough to recognize (acml_mv is not from
my point of view).

So I am fine with either of two options - not linking automatically
or linking against libacml_mv with -mveclib=acml.

Richard.


* Re: [PATCH] Update support patch for ACML vectorized intrinsic  library
  2007-08-24 11:18       ` Richard Guenther
@ 2007-08-24 12:34         ` Paolo Bonzini
  2007-08-30 12:02           ` Uros Bizjak
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2007-08-24 12:34 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Uros Bizjak, GCC Patches, Jan Hubicka


> So I am fine with either of two options - not linking automatically
> or linking against libacml_mv with -mveclib=acml.

What about having -mveclibabi=acml doing what it does now, and 
-mveclib=acml doing "-mveclibabi=acml -lacml_mv"?  Which you could also 
read as, link automatically with libacml_mv and make a note to implement 
-mveclibabi when the need arises...

Paolo


* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
  2007-08-24 12:34         ` Paolo Bonzini
@ 2007-08-30 12:02           ` Uros Bizjak
  2007-08-30 13:03             ` Richard Guenther
  0 siblings, 1 reply; 16+ messages in thread
From: Uros Bizjak @ 2007-08-30 12:02 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Richard Guenther, GCC Patches, Jan Hubicka

On 8/24/07, Paolo Bonzini <bonzini@gnu.org> wrote:

> > So I am fine with either of two options - not linking automatically
> > or linking against libacml_mv with -mveclib=acml.
>
> What about having -mveclibabi=acml doing what it does now, and
> -mveclib=acml doing "-mveclibabi=acml -lacml_mv"?  Which you could also
> read as, link automatically with libacml_mv and make a note to implement
> -mveclibabi when the need arises...

Let's follow the example of -fexternal-blas and for the moment
implement Richard's original solution. However, I'd rename the
proposed -mveclib option to -mveclibabi=... as suggested by Paolo,
because this is IMO less confusing.

So, we will have: -mveclibabi=acml -lacml_mv.

Also, it is possible to add something like gfortran has for -fexternal-blas:

`-fexternal-blas'
     This option will make `gfortran' generate calls to BLAS functions
     for some matrix operations like `MATMUL', instead of using our own
     algorithms, if the size of the matrices involved is larger than a
     given limit (see `-fblas-matmul-limit').  This may be profitable
     if an optimized vendor BLAS library is available.  The BLAS
     library will have to be specified at link time.

to the docs?

The (original) patch is OK for mainline as far as i386 is concerned.

Thanks,
Uros.


* Re: [PATCH] Update support patch for ACML vectorized intrinsic  library
  2007-08-30 12:02           ` Uros Bizjak
@ 2007-08-30 13:03             ` Richard Guenther
  2007-08-30 13:08               ` Uros Bizjak
                                 ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Richard Guenther @ 2007-08-30 13:03 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Paolo Bonzini, GCC Patches, Jan Hubicka

On Thu, 30 Aug 2007, Uros Bizjak wrote:

> On 8/24/07, Paolo Bonzini <bonzini@gnu.org> wrote:
> 
> > > So I am fine with either of two options - not linking automatically
> > > or linking against libacml_mv with -mveclib=acml.
> >
> > What about having -mveclibabi=acml doing what it does now, and
> > -mveclib=acml doing "-mveclibabi=acml -lacml_mv"?  Which you could also
> > read as, link automatically with libacml_mv and make a note to implement
> > -mveclibabi when the need arises...
> 
> Let's follow the example with -fexternal-blas and for this moment
> implement original Richard's solution. However, I'd rename
> proposed-mveclib option into -mveclibabi=... as suggested by Paolo,
> because this is IMO less confusing.
> 
> So, we will have: -mveclibabi=acml -lacml_mv.
> 
> Also, it is possible to add something like gfortran has for -fexternal-blas:
> 
> `-fexternal-blas'
>      This option will make `gfortran' generate calls to BLAS functions
>      for some matrix operations like `MATMUL', instead of using our own
>      algorithms, if the size of the matrices involved is larger than a
>      given limit (see `-fblas-matmul-limit').  This may be profitable
>      if an optimized vendor BLAS library is available.  The BLAS
>      library will have to be specified at link time.
> 
> to the docs?

I am re-testing the following.  The option is now -mveclibabi and
the docs have been extended to mention -ftree-vectorize and the
required linking adjustment.

Does this look ok?

Thanks,
Richard.

2007-08-24  Richard Guenther  <rguenther@suse.de>

	* doc/invoke.texi (-mveclibabi): Document new target option.
	* config/i386/i386.opt (-mveclibabi): New target option.
	* config/i386/i386.c (ix86_veclib_handler): Handler for
	vectorization library support.
	(override_options): Handle the -mveclibabi option, initialize
	the vectorization library handler.
	(ix86_builtin_vectorized_function): As fallback call the
	vectorization library handler, if set.
	(ix86_veclibabi_acml): New static function for ACML ABI style
	vectorization support.

Index: gcc/doc/invoke.texi
===================================================================
*** gcc/doc/invoke.texi.orig	2007-08-30 14:25:10.000000000 +0200
--- gcc/doc/invoke.texi	2007-08-30 14:27:11.000000000 +0200
*************** Objective-C and Objective-C++ Dialects}.
*** 557,563 ****
  -mthreads  -mno-align-stringops  -minline-all-stringops @gol
  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
  -m96bit-long-double  -mregparm=@var{num}  -msseregparm @gol
! -mpc32 -mpc64 -mpc80 mstackrealign @gol
  -momit-leaf-frame-pointer  -mno-red-zone -mno-tls-direct-seg-refs @gol
  -mcmodel=@var{code-model} @gol
  -m32  -m64 -mlarge-data-threshold=@var{num}}
--- 557,563 ----
  -mthreads  -mno-align-stringops  -minline-all-stringops @gol
  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
  -m96bit-long-double  -mregparm=@var{num}  -msseregparm @gol
! -mveclibabi=@var{type} -mpc32 -mpc64 -mpc80 -mstackrealign @gol
  -momit-leaf-frame-pointer  -mno-red-zone -mno-tls-direct-seg-refs @gol
  -mcmodel=@var{code-model} @gol
  -m32  -m64 -mlarge-data-threshold=@var{num}}
*************** vectorized variants RCPPS and RSQRTPS) i
*** 10440,10445 ****
--- 10440,10458 ----
  vectorized variants).  These instructions will be generated only when
  @option{-funsafe-math-optimizations} is enabled.
  
+ @item -mveclibabi=@var{type}
+ @opindex mveclibabi
+ Specifies the ABI type to use for vectorizing intrinsics using an
+ external library.  Supported types are @code{acml} for the AMD
+ math core library style of interfacing.  GCC will currently emit
+ calls to @code{__vrd2_sin}, @code{__vrd2_cos}, @code{__vrd2_exp},
+ @code{__vrd2_log}, @code{__vrd2_log2}, @code{__vrd2_log10},
+ @code{__vrs4_sinf}, @code{__vrs4_cosf}, @code{__vrs4_expf},
+ @code{__vrs4_logf}, @code{__vrs4_log2f}, @code{__vrs4_log10f}
+ and @code{__vrs4_powf} when using this type and @option{-ftree-vectorize}
+ is enabled.  A ACML ABI compatible library will have to be specified
+ at link time.
+ 
  @item -mpush-args
  @itemx -mno-push-args
  @opindex mpush-args
Index: gcc/config/i386/i386.opt
===================================================================
*** gcc/config/i386/i386.opt.orig	2007-08-30 14:25:10.000000000 +0200
--- gcc/config/i386/i386.opt	2007-08-30 14:27:46.000000000 +0200
*************** mtune=
*** 182,187 ****
--- 182,191 ----
  Target RejectNegative Joined Var(ix86_tune_string)
  Schedule code for given CPU
  
+ mveclibabi=
+ Target RejectNegative Joined Var(ix86_veclibabi_string)
+ Vector library ABI to use
+ 
  ;; ISA support
  
  m32
Index: gcc/config/i386/i386.c
===================================================================
*** gcc/config/i386/i386.c.orig	2007-08-30 14:25:10.000000000 +0200
--- gcc/config/i386/i386.c	2007-08-30 14:29:02.000000000 +0200
*************** static int ix86_isa_flags_explicit;
*** 1620,1625 ****
--- 1620,1629 ----
  
  #define OPTION_MASK_ISA_SSE4A_UNSET OPTION_MASK_ISA_SSE4
  
+ /* Vectorization library interface and handlers.  */
+ tree (*ix86_veclib_handler)(enum built_in_function, tree, tree) = NULL;
+ static tree ix86_veclibabi_acml (enum built_in_function, tree, tree);
+ 
  /* Implement TARGET_HANDLE_OPTION.  */
  
  static bool
*************** override_options (void)
*** 2409,2414 ****
--- 2413,2428 ----
    if (!TARGET_80387)
      target_flags &= ~MASK_FLOAT_RETURNS;
  
+   /* Use external vectorized library in vectorizing intrinsics.  */
+   if (ix86_veclibabi_string)
+     {
+       if (strcmp (ix86_veclibabi_string, "acml") == 0)
+ 	ix86_veclib_handler = ix86_veclibabi_acml;
+       else
+ 	error ("unknown vectorization library ABI type (%s) for "
+ 	       "-mveclibabi= switch", ix86_veclibabi_string);
+     }
+ 
    if ((x86_accumulate_outgoing_args & ix86_tune_mask)
        && !(target_flags_explicit & MASK_ACCUMULATE_OUTGOING_ARGS)
        && !optimize_size)
*************** ix86_builtin_vectorized_function (unsign
*** 19934,19966 ****
        if (out_mode == DFmode && out_n == 2
  	  && in_mode == DFmode && in_n == 2)
  	return ix86_builtins[IX86_BUILTIN_SQRTPD];
!       return NULL_TREE;
  
      case BUILT_IN_SQRTF:
        if (out_mode == SFmode && out_n == 4
  	  && in_mode == SFmode && in_n == 4)
  	return ix86_builtins[IX86_BUILTIN_SQRTPS];
!       return NULL_TREE;
  
      case BUILT_IN_LRINT:
        if (out_mode == SImode && out_n == 4
  	  && in_mode == DFmode && in_n == 2)
  	return ix86_builtins[IX86_BUILTIN_VEC_PACK_SFIX];
!       return NULL_TREE;
  
      case BUILT_IN_LRINTF:
        if (out_mode == SImode && out_n == 4
  	  && in_mode == SFmode && in_n == 4)
  	return ix86_builtins[IX86_BUILTIN_CVTPS2DQ];
!       return NULL_TREE;
  
      default:
        ;
      }
  
    return NULL_TREE;
  }
  
  /* Returns a decl of a function that implements conversion of the
     input vector of type TYPE, or NULL_TREE if it is not available.  */
  
--- 19948,20069 ----
        if (out_mode == DFmode && out_n == 2
  	  && in_mode == DFmode && in_n == 2)
  	return ix86_builtins[IX86_BUILTIN_SQRTPD];
!       break;
  
      case BUILT_IN_SQRTF:
        if (out_mode == SFmode && out_n == 4
  	  && in_mode == SFmode && in_n == 4)
  	return ix86_builtins[IX86_BUILTIN_SQRTPS];
!       break;
  
      case BUILT_IN_LRINT:
        if (out_mode == SImode && out_n == 4
  	  && in_mode == DFmode && in_n == 2)
  	return ix86_builtins[IX86_BUILTIN_VEC_PACK_SFIX];
!       break;
  
      case BUILT_IN_LRINTF:
        if (out_mode == SImode && out_n == 4
  	  && in_mode == SFmode && in_n == 4)
  	return ix86_builtins[IX86_BUILTIN_CVTPS2DQ];
!       break;
  
      default:
        ;
      }
  
+   /* Dispatch to a handler for a vectorization library.  */
+   if (ix86_veclib_handler)
+     return (*ix86_veclib_handler)(fn, type_out, type_in);
+ 
    return NULL_TREE;
  }
  
+ /* Handler for an ACML-style interface to a library with vectorized
+    intrinsics.  */
+ 
+ static tree
+ ix86_veclibabi_acml (enum built_in_function fn, tree type_out, tree type_in)
+ {
+   char name[20] = "__vr.._";
+   tree fntype, new_fndecl, args;
+   unsigned arity;
+   const char *bname;
+   enum machine_mode el_mode, in_mode;
+   int n, in_n;
+ 
+   /* The ACML is 64bits only and suitable for unsafe math only as
+      it does not correctly support parts of IEEE with the required
+      precision such as denormals.  */
+   if (!TARGET_64BIT
+       || !flag_unsafe_math_optimizations)
+     return NULL_TREE;
+ 
+   el_mode = TYPE_MODE (TREE_TYPE (type_out));
+   n = TYPE_VECTOR_SUBPARTS (type_out);
+   in_mode = TYPE_MODE (TREE_TYPE (type_in));
+   in_n = TYPE_VECTOR_SUBPARTS (type_in);
+   if (el_mode != in_mode
+       || n != in_n)
+     return NULL_TREE;
+ 
+   switch (fn)
+     {
+     case BUILT_IN_SIN:
+     case BUILT_IN_COS:
+     case BUILT_IN_EXP:
+     case BUILT_IN_LOG:
+     case BUILT_IN_LOG2:
+     case BUILT_IN_LOG10:
+       name[4] = 'd';
+       name[5] = '2';
+       if (el_mode != DFmode
+ 	  || n != 2)
+ 	return NULL_TREE;
+       break;
+ 
+     case BUILT_IN_SINF:
+     case BUILT_IN_COSF:
+     case BUILT_IN_EXPF:
+     case BUILT_IN_POWF:
+     case BUILT_IN_LOGF:
+     case BUILT_IN_LOG2F:
+     case BUILT_IN_LOG10F:
+       name[4] = 's';
+       name[5] = '4';
+       if (el_mode != SFmode
+ 	  || n != 4)
+ 	return NULL_TREE;
+       break;
+     
+     default:
+       return NULL_TREE;
+     }
+ 
+   bname = IDENTIFIER_POINTER (DECL_NAME (implicit_built_in_decls[fn]));
+   sprintf (name + 7, "%s", bname+10);
+ 
+   arity = 0;
+   for (args = DECL_ARGUMENTS (implicit_built_in_decls[fn]); args;
+        args = TREE_CHAIN (args))
+     arity++;
+ 
+   if (arity == 1)
+     fntype = build_function_type_list (type_out, type_in, NULL);
+   else
+     fntype = build_function_type_list (type_out, type_in, type_in, NULL);
+ 
+   /* Build a function declaration for the vectorized function.  */
+   new_fndecl = build_decl (FUNCTION_DECL, get_identifier (name), fntype);
+   TREE_PUBLIC (new_fndecl) = 1;
+   DECL_EXTERNAL (new_fndecl) = 1;
+   DECL_IS_NOVOPS (new_fndecl) = 1;
+   TREE_READONLY (new_fndecl) = 1;
+ 
+   return new_fndecl;
+ }
+ 
+ 
  /* Returns a decl of a function that implements conversion of the
     input vector of type TYPE, or NULL_TREE if it is not available.  */
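
[Editorial illustration, not part of the original message.]  The name
construction in ix86_veclibabi_acml above can be pictured with a small
standalone sketch.  It assumes only what the patch shows: the template
"__vr.._", an element letter ('d' or 's'), a lane count ('2' or '4'),
and the scalar builtin name with its 10-character "__builtin_" prefix
dropped.  The helper name acml_vector_name and its parameters are
hypothetical and do not exist in GCC.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not GCC code): build the ACML entry-point name
   for a scalar builtin such as "__builtin_sinf".  ELEM is 'd' or 's',
   LANES is '2' or '4', mirroring name[4] and name[5] in the patch.  */
void
acml_vector_name (char *out, size_t out_size, const char *builtin_name,
                  char elem, char lanes)
{
  static const char prefix[] = "__builtin_";
  size_t skip = sizeof prefix - 1;  /* 10, the offset used in the patch */

  /* The input must carry the "__builtin_" prefix.  */
  assert (strncmp (builtin_name, prefix, skip) == 0);
  /* "__vr" + elem + lanes + "_" is 7 characters, then base name + NUL.  */
  assert (7 + strlen (builtin_name + skip) + 1 <= out_size);

  snprintf (out, out_size, "__vr%c%c_%s", elem, lanes, builtin_name + skip);
}
```

With a 20-byte buffer, as in the patch, calling
acml_vector_name (buf, sizeof buf, "__builtin_sinf", 's', '4') should
leave "__vrs4_sinf" in buf.  The sketch uses snprintf plus an explicit
size check where the patch relies on the fixed 20-byte array being
large enough for every supported name.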
  


* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
  2007-08-30 13:03             ` Richard Guenther
@ 2007-08-30 13:08               ` Uros Bizjak
  2007-08-30 15:53               ` Tobias Burnus
  2007-08-30 15:56               ` Uros Bizjak
  2 siblings, 0 replies; 16+ messages in thread
From: Uros Bizjak @ 2007-08-30 13:08 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Paolo Bonzini, GCC Patches, Jan Hubicka

On 8/30/07, Richard Guenther <rguenther@suse.de> wrote:

> 2007-08-24  Richard Guenther  <rguenther@suse.de>
>
>         * doc/invoke.texi (-mveclibabi): Document new target option.
>         * config/i386/i386.opt (-mveclibabi): New target option.
>         * config/i386/i386.c (ix86_veclib_handler): Handler for
>         vectorization library support.
>         (override_options): Handle the -mveclibabi option, initialize
>         the vectorization library handler.
>         (ix86_builtin_vectorized_function): As fallback call the
>         vectorization library handler, if set.
>         (ix86_veclibabi_acml): New static function for ACML ABI style
>         vectorization support.

This is OK for mainline; maybe a target testcase that checks that calls
are indeed generated would also be nice here (a lot of users look into
the testsuite to see how a certain feature is invoked).

> + @code{__vrs4_logf}, @code{__vrs4_log2f}, @code{__vrs4_log10f}
> + and @code{__vrs4_powf} when using this type and @option{-ftree-vectorize}
> + is enabled.  A ACML ABI compatible library will have to be specified

An ACML ABI ...

Uros.
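
[Editorial illustration, not part of the original message.]  The
handler arrangement named in the ChangeLog quoted above (an option
value selects a handler, and the vectorizer hook calls it only as a
fallback) can be sketched in plain C.  This is a simplified model under
stated assumptions: strings stand in for GCC's tree and builtin
enumerators, and every name below (veclib_handler_fn, acml_handler,
select_veclib_handler, vectorized_function) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the handler type; the real handler takes
   an enum built_in_function and two tree arguments.  */
typedef const char *(*veclib_handler_fn) (const char *scalar_fn);

/* Toy ACML-style handler: knows two of the documented entry points.  */
const char *
acml_handler (const char *scalar_fn)
{
  if (strcmp (scalar_fn, "sinf") == 0)
    return "__vrs4_sinf";
  if (strcmp (scalar_fn, "sin") == 0)
    return "__vrd2_sin";
  return NULL;
}

/* Map the option value to a handler; NULL means "unknown ABI type",
   where the patch reports an error instead.  */
veclib_handler_fn
select_veclib_handler (const char *abi)
{
  if (abi != NULL && strcmp (abi, "acml") == 0)
    return acml_handler;
  return NULL;
}

/* Try the target's own vector instructions first, then fall back to
   the library handler if one was selected.  */
const char *
vectorized_function (const char *scalar_fn, veclib_handler_fn handler)
{
  assert (scalar_fn != NULL);
  if (strcmp (scalar_fn, "sqrtf") == 0)
    return "SQRTPS";
  if (handler != NULL)
    return handler (scalar_fn);
  return NULL;
}
```

In this model, vectorized_function ("sinf", select_veclib_handler
("acml")) yields "__vrs4_sinf", while the same call with a NULL handler
yields NULL, which corresponds to the "return NULL_TREE" path when no
-mveclibabi= option is given.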


* Re: [PATCH] Update support patch for ACML vectorized intrinsic  library
  2007-08-30 13:03             ` Richard Guenther
  2007-08-30 13:08               ` Uros Bizjak
@ 2007-08-30 15:53               ` Tobias Burnus
  2007-08-30 16:08                 ` Richard Guenther
  2007-08-30 15:56               ` Uros Bizjak
  2 siblings, 1 reply; 16+ messages in thread
From: Tobias Burnus @ 2007-08-30 15:53 UTC (permalink / raw)
  To: Richard Guenther; +Cc: GCC Patches

Richard Guenther wrote:
> I am re-testing the following.  The option is now -mveclibabi and
> the docs have been extended to mention -ftree-vectorize and the
> required linking adjustment.

As it is now checked in, can you write something for
http://gcc.gnu.org/gcc-4.3/changes.html ?

 * * *

Some initial results for the Polyhedron test on Athlon64 4800+ using:

gfortran -march=opteron -ffast-math -funroll-loops -ftree-loop-linear
-ftree-vectorize -msse3 -O3

and the same plus -mveclibabi=acml -lacml_mv (ACML 3.6.1)


Result:

Test     noACML ACML   ACML/noACML
----------------------------
ac       13.87  13.89  100%
aermod   38.12  32.91   86% <<<
air      14.09  14.11  100%
capacita 82.45  82.37  100%
channel  12.70  12.67  100%
doduc    43.15  34.81   80% <<<
fatigue  12.11  12.04   99%
gas_dyn  12.03  12.09  100%
induct   48.83  48.10   99%
linpk    26.00  25.93  100%
mdbx     24.28  24.29  100%
nf       29.69  29.67  100%
protein  64.51  64.37  100%
rnflow   36.80  36.99  101%
test_fpu 19.98  19.98  100%
tfft      7.74   7.65   99%
----------------------------
Geo.Mean 24.46  23.88   97.6% <<

Tobias


* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
  2007-08-30 13:03             ` Richard Guenther
  2007-08-30 13:08               ` Uros Bizjak
  2007-08-30 15:53               ` Tobias Burnus
@ 2007-08-30 15:56               ` Uros Bizjak
  2 siblings, 0 replies; 16+ messages in thread
From: Uros Bizjak @ 2007-08-30 15:56 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Paolo Bonzini, GCC Patches, Jan Hubicka

Richard Guenther wrote:

> 2007-08-24  Richard Guenther  <rguenther@suse.de>
>
> 	* doc/invoke.texi (-mveclibabi): Document new target option.
> 	* config/i386/i386.opt (-mveclibabi): New target option.
> 	* config/i386/i386.c (ix86_veclib_handler): Handler for
> 	vectorization library support.
>   

BTW: Do you perhaps have some benchmark results at hand to illustrate
the impact of this change on popular benchmark scores?

Thanks,
Uros.


* Re: [PATCH] Update support patch for ACML vectorized intrinsic   library
  2007-08-30 15:53               ` Tobias Burnus
@ 2007-08-30 16:08                 ` Richard Guenther
  2007-08-30 19:40                   ` Gerald Pfeifer
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Guenther @ 2007-08-30 16:08 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: GCC Patches, Gerald Pfeifer

On Thu, 30 Aug 2007, Tobias Burnus wrote:

> Richard Guenther wrote:
> > I am re-testing the following.  The option is now -mveclibabi and
> > the docs have been extended to mention -ftree-vectorize and the
> > required linking adjustment.
> 
> As it is now checked in, can your write something for
> http://gcc.gnu.org/gcc-4.3/changes.html ?

Like the following?  Ok?

Thanks,
Richard.

Index: changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.3/changes.html,v
retrieving revision 1.66
diff -u -r1.66 changes.html
--- changes.html	25 Aug 2007 16:17:48 -0000	1.66
+++ changes.html	30 Aug 2007 15:53:11 -0000
@@ -403,6 +403,9 @@
 	<code>signed</code> or <code>unsigned</code> quad (TImode) integer
 	types.  Additionally, all operations generate the full set of IEEE
 	exceptions and support the full set of IEEE rounding modes.</li>
+    <li>GCC can now utilize the ACML library for vectorizing calls to
+	a set of C99 functions on x86_64 if <code>-mveclibabi=acml</code>
+	is specified and you link to an ACML ABI compatible library.</li>
   </ul>
 
 <h3>ARM</h3>


* Re: [PATCH] Update support patch for ACML vectorized intrinsic    library
  2007-08-30 16:08                 ` Richard Guenther
@ 2007-08-30 19:40                   ` Gerald Pfeifer
  0 siblings, 0 replies; 16+ messages in thread
From: Gerald Pfeifer @ 2007-08-30 19:40 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Tobias Burnus, GCC Patches

On Thu, 30 Aug 2007, Richard Guenther wrote:
> Like the following?  Ok?

Looks good to my eyes.  Thanks,
Gerald


* Re: [PATCH] Update support patch for ACML vectorized intrinsic library
@ 2007-08-30 16:59 Uros Bizjak
  0 siblings, 0 replies; 16+ messages in thread
From: Uros Bizjak @ 2007-08-30 16:59 UTC (permalink / raw)
  To: GCC Patches; +Cc: Tobias Burnus, Richard Guenther

Hello!

> Some initial results for the Polyhedron test on Athlon64 4800+ using:
>
> gfortran -march=opteron -ffast-math -funroll-loops -ftree-loop-linear
> -ftree-vectorize -msse3 -O3
>
> and the same plus -mveclibabi=acml -lacml_mv (ACML 3.6.1)
>
>
> Result:
>
> Test     noACML ACML
> ----------------------------
> ac       13.87  13.89  100%
> aermod   38.12  32.91   86% <<<
> air      14.09  14.11  100%
> capacita 82.45  82.37  100%

In the aermod case, no function gets vectorized, but it looks like
optimized scalar functions are picked up from the acml_mv library.

Uros.


* Re: [PATCH] Update support patch for ACML vectorized intrinsic  library
  2007-08-20 15:03 Richard Guenther
@ 2007-08-20 22:36 ` Hans-Peter Nilsson
  0 siblings, 0 replies; 16+ messages in thread
From: Hans-Peter Nilsson @ 2007-08-20 22:36 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

On Mon, 20 Aug 2007, Richard Guenther wrote:
> Index: doc/invoke.texi
> ! -mveclib=@var{type} -mpc32 -mpc64 -mpc80 mstackrealign @gol

Not part of your change, but I believe there's a missing "-" on
mstackrealign.

brgds, H-P


* [PATCH] Update support patch for ACML vectorized intrinsic library
@ 2007-08-20 15:03 Richard Guenther
  2007-08-20 22:36 ` Hans-Peter Nilsson
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Guenther @ 2007-08-20 15:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jan Hubicka


This is an updated version for the AMD optimized math intrinsic library
(libacml_mv).  With this patch we can vectorize calls to sin, cos, exp,
log, log2 and log10 for double and sinf, cosf, expf, powf, logf, log2f
and log10f for float arguments.  The user still needs to manually link
against acml_mv (and obviously have it installed).

This is about the best thing we can do at the moment without introducing
libgcc-math, so I'd like to go ahead with this for 4.3 at least.

Any opinion on whether we automatically should link acml_mv?

I expect that Intel folks may want to add support for the corresponding
functions in libimf (I believe the MKL doesn't have support, but the
Intel compiler ships with libimf at least) and possibly the ppc/spu
folks may want to add support for their library.

Any objections?  [I'm waiting for the tree to unbreak for testing]

Thanks,
Richard.


2006-12-10  Richard Guenther  <rguenther@suse.de>

	* doc/invoke.texi (-mveclib): Document new target option.
	* config/i386/i386.opt (-mveclib): New target option.
	* config/i386/i386.c (ix86_veclib_handler): Handler for
	vectorization library support.
	(override_options): Handle the -mveclib option, initialize
	the vectorization library handler.
	(ix86_builtin_vectorized_function): As fallback call the
	vectorization library handler, if set.
	(ix86_veclib_acml): New static function for ACML style
	vectorization support.

Index: doc/invoke.texi
===================================================================
*** doc/invoke.texi.orig	2007-08-20 13:43:35.000000000 +0200
--- doc/invoke.texi	2007-08-20 16:13:29.000000000 +0200
*************** Objective-C and Objective-C++ Dialects}.
*** 556,562 ****
  -mthreads  -mno-align-stringops  -minline-all-stringops @gol
  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
  -m96bit-long-double  -mregparm=@var{num}  -msseregparm @gol
! -mpc32 -mpc64 -mpc80 mstackrealign @gol
  -momit-leaf-frame-pointer  -mno-red-zone -mno-tls-direct-seg-refs @gol
  -mcmodel=@var{code-model} @gol
  -m32  -m64 -mlarge-data-threshold=@var{num}}
--- 556,562 ----
  -mthreads  -mno-align-stringops  -minline-all-stringops @gol
  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
  -m96bit-long-double  -mregparm=@var{num}  -msseregparm @gol
! -mveclib=@var{type} -mpc32 -mpc64 -mpc80 mstackrealign @gol
  -momit-leaf-frame-pointer  -mno-red-zone -mno-tls-direct-seg-refs @gol
  -mcmodel=@var{code-model} @gol
  -m32  -m64 -mlarge-data-threshold=@var{num}}
*************** vectorized variants RCPPS and RSQRTPS) i
*** 10427,10432 ****
--- 10427,10443 ----
  vectorized variants).  These instructions will be generated only when
  @option{-funsafe-math-optimizations} is enabled.
  
+ @item -mveclib=@var{type}
+ @opindex mveclib
+ Specifies the ABI type to use for vectorizing intrinsics using an
+ external library.  Supported types are @code{acml} for the AMD
+ math core library style of interfacing.  GCC will currently emit
+ calls to @code{__vrd2_sin}, @code{__vrd2_cos}, @code{__vrd2_exp},
+ @code{__vrd2_log}, @code{__vrd2_log2}, @code{__vrd2_log10},
+ @code{__vrs4_sinf}, @code{__vrs4_cosf}, @code{__vrs4_expf},
+ @code{__vrs4_logf}, @code{__vrs4_log2f}, @code{__vrs4_log10f}
+ and @code{__vrs4_powf} when using this type.
+ 
  @item -mpush-args
  @itemx -mno-push-args
  @opindex mpush-args
Index: config/i386/i386.opt
===================================================================
*** config/i386/i386.opt.orig	2007-08-20 13:44:10.000000000 +0200
--- config/i386/i386.opt	2007-08-20 16:14:22.000000000 +0200
*************** mtune=
*** 182,187 ****
--- 182,191 ----
  Target RejectNegative Joined Var(ix86_tune_string)
  Schedule code for given CPU
  
+ mveclib=
+ Target RejectNegative Joined Var(ix86_veclib_string)
+ Vector library interface to use
+ 
  ;; ISA support
  
  m32
Index: config/i386/i386.c
===================================================================
*** config/i386/i386.c.orig	2007-08-20 13:44:10.000000000 +0200
--- config/i386/i386.c	2007-08-20 16:34:13.000000000 +0200
*************** static int ix86_isa_flags_explicit;
*** 1620,1625 ****
--- 1620,1629 ----
  
  #define OPTION_MASK_ISA_SSE4A_UNSET OPTION_MASK_ISA_SSE4
  
+ /* Vectorization library interface and handlers.  */
+ tree (*ix86_veclib_handler)(enum built_in_function, tree, tree) = NULL;
+ static tree ix86_veclib_acml (enum built_in_function, tree, tree);
+ 
  /* Implement TARGET_HANDLE_OPTION.  */
  
  static bool
*************** override_options (void)
*** 2409,2414 ****
--- 2413,2428 ----
    if (!TARGET_80387)
      target_flags &= ~MASK_FLOAT_RETURNS;
  
+   /* Use an external vectorized library when vectorizing intrinsics.  */
+   if (ix86_veclib_string)
+     {
+       if (strcmp (ix86_veclib_string, "acml") == 0)
+ 	ix86_veclib_handler = ix86_veclib_acml;
+       else
+ 	error ("unknown vectorization library type (%s) for -mveclib= switch",
+ 	       ix86_veclib_string);
+     }
+ 
    if ((x86_accumulate_outgoing_args & ix86_tune_mask)
        && !(target_flags_explicit & MASK_ACCUMULATE_OUTGOING_ARGS)
        && !optimize_size)
*************** ix86_builtin_vectorized_function (unsign
*** 19919,19951 ****
        if (out_mode == DFmode && out_n == 2
  	  && in_mode == DFmode && in_n == 2)
  	return ix86_builtins[IX86_BUILTIN_SQRTPD];
!       return NULL_TREE;
  
      case BUILT_IN_SQRTF:
        if (out_mode == SFmode && out_n == 4
  	  && in_mode == SFmode && in_n == 4)
  	return ix86_builtins[IX86_BUILTIN_SQRTPS];
!       return NULL_TREE;
  
      case BUILT_IN_LRINT:
        if (out_mode == SImode && out_n == 4
  	  && in_mode == DFmode && in_n == 2)
  	return ix86_builtins[IX86_BUILTIN_VEC_PACK_SFIX];
!       return NULL_TREE;
  
      case BUILT_IN_LRINTF:
        if (out_mode == SImode && out_n == 4
  	  && in_mode == SFmode && in_n == 4)
  	return ix86_builtins[IX86_BUILTIN_CVTPS2DQ];
!       return NULL_TREE;
  
      default:
        ;
      }
  
    return NULL_TREE;
  }
  
  /* Returns a decl of a function that implements conversion of the
     input vector of type TYPE, or NULL_TREE if it is not available.  */
  
--- 19933,20054 ----
        if (out_mode == DFmode && out_n == 2
  	  && in_mode == DFmode && in_n == 2)
  	return ix86_builtins[IX86_BUILTIN_SQRTPD];
!       break;
  
      case BUILT_IN_SQRTF:
        if (out_mode == SFmode && out_n == 4
  	  && in_mode == SFmode && in_n == 4)
  	return ix86_builtins[IX86_BUILTIN_SQRTPS];
!       break;
  
      case BUILT_IN_LRINT:
        if (out_mode == SImode && out_n == 4
  	  && in_mode == DFmode && in_n == 2)
  	return ix86_builtins[IX86_BUILTIN_VEC_PACK_SFIX];
!       break;
  
      case BUILT_IN_LRINTF:
        if (out_mode == SImode && out_n == 4
  	  && in_mode == SFmode && in_n == 4)
  	return ix86_builtins[IX86_BUILTIN_CVTPS2DQ];
!       break;
  
      default:
        ;
      }
  
+   /* Dispatch to a handler for a vectorization library.  */
+   if (ix86_veclib_handler)
+     return (*ix86_veclib_handler)(fn, type_out, type_in);
+ 
    return NULL_TREE;
  }
  
+ /* Handler for an ACML-style interface to a library with vectorized
+    intrinsics.  */
+ 
+ static tree
+ ix86_veclib_acml (enum built_in_function fn, tree type_out, tree type_in)
+ {
+   char name[20] = "__vr.._";
+   tree fntype, new_fndecl, args;
+   unsigned arity;
+   const char *bname;
+   enum machine_mode el_mode, in_mode;
+   int n, in_n;
+ 
+   /* ACML is 64-bit only and suitable for unsafe math only, as it
+      does not correctly support parts of IEEE, such as denormals, with
+      the required precision.  */
+   if (!TARGET_64BIT
+       || !flag_unsafe_math_optimizations)
+     return NULL_TREE;
+ 
+   el_mode = TYPE_MODE (TREE_TYPE (type_out));
+   n = TYPE_VECTOR_SUBPARTS (type_out);
+   in_mode = TYPE_MODE (TREE_TYPE (type_in));
+   in_n = TYPE_VECTOR_SUBPARTS (type_in);
+   if (el_mode != in_mode
+       || n != in_n)
+     return NULL_TREE;
+ 
+   switch (fn)
+     {
+     case BUILT_IN_SIN:
+     case BUILT_IN_COS:
+     case BUILT_IN_EXP:
+     case BUILT_IN_LOG:
+     case BUILT_IN_LOG2:
+     case BUILT_IN_LOG10:
+       name[4] = 'd';
+       name[5] = '2';
+       if (el_mode != DFmode
+ 	  || n != 2)
+ 	return NULL_TREE;
+       break;
+ 
+     case BUILT_IN_SINF:
+     case BUILT_IN_COSF:
+     case BUILT_IN_EXPF:
+     case BUILT_IN_POWF:
+     case BUILT_IN_LOGF:
+     case BUILT_IN_LOG2F:
+     case BUILT_IN_LOG10F:
+       name[4] = 's';
+       name[5] = '4';
+       if (el_mode != SFmode
+ 	  || n != 4)
+ 	return NULL_TREE;
+       break;
+     
+     default:
+       return NULL_TREE;
+     }
+ 
+   bname = IDENTIFIER_POINTER (DECL_NAME (implicit_built_in_decls[fn]));
+   sprintf (name + 7, "%s", bname+10);
+ 
+   arity = 0;
+   for (args = DECL_ARGUMENTS (implicit_built_in_decls[fn]); args;
+        args = TREE_CHAIN (args))
+     arity++;
+ 
+   if (arity == 1)
+     fntype = build_function_type_list (type_out, type_in, NULL);
+   else
+     fntype = build_function_type_list (type_out, type_in, type_in, NULL);
+ 
+   /* Build a function declaration for the vectorized function.  */
+   new_fndecl = build_decl (FUNCTION_DECL, get_identifier (name), fntype);
+   TREE_PUBLIC (new_fndecl) = 1;
+   DECL_EXTERNAL (new_fndecl) = 1;
+   DECL_IS_NOVOPS (new_fndecl) = 1;
+   TREE_READONLY (new_fndecl) = 1;
+ 
+   return new_fndecl;
+ }
+ 
+ 
  /* Returns a decl of a function that implements conversion of the
     input vector of type TYPE, or NULL_TREE if it is not available.  */
  

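For readers following the name construction in ix86_veclib_acml above:
the handler starts from the template "__vr.._", fills in "d2" or "s4"
for the element type and count, and appends the builtin's name with the
10-character "__builtin_" prefix skipped.  Below is a minimal
stand-alone sketch of just that step, in plain C outside of GCC.  The
function acml_vector_name and its parameters are hypothetical names
made up for illustration.  It also checks the prefix and the buffer
size, which the patch itself does not do because it only ever sees
known builtins.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-alone model of the name construction done in
   ix86_veclib_acml; this is not GCC code.  BUILTIN_NAME is the
   builtin's identifier, e.g. "__builtin_sinf".  IS_DOUBLE selects the
   "d2" (2 x double) variant when nonzero and the "s4" (4 x float)
   variant otherwise.  Returns 0 on success, or -1 if BUILTIN_NAME
   lacks the "__builtin_" prefix or OUT is too small.  */
static int
acml_vector_name (const char *builtin_name, int is_double,
                  char *out, size_t out_size)
{
  static const char prefix[] = "__builtin_";
  /* 10 characters, the offset the patch hard-codes as bname+10.  */
  size_t prefix_len = sizeof prefix - 1;
  int len;

  if (strncmp (builtin_name, prefix, prefix_len) != 0)
    return -1;

  len = snprintf (out, out_size, "__vr%s_%s",
                  is_double ? "d2" : "s4",
                  builtin_name + prefix_len);

  /* snprintf returns the length it wanted to write; treat
     truncation as failure.  */
  if (len < 0 || (size_t) len >= out_size)
    return -1;
  return 0;
}
```

With a 20-byte buffer, as in the patch, passing "__builtin_sin" with
is_double set gives "__vrd2_sin", and "__builtin_log10f" with is_double
clear gives "__vrs4_log10f", which matches the names listed in the
invoke.texi hunk.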

end of thread, other threads:[~2007-08-30 18:56 UTC | newest]

Thread overview: 16+ messages
2007-08-20 19:57 [PATCH] Update support patch for ACML vectorized intrinsic library Uros Bizjak
2007-08-21 12:41 ` Richard Guenther
2007-08-24 11:03   ` Richard Guenther
2007-08-24 11:13     ` Uros Bizjak
2007-08-24 11:18       ` Richard Guenther
2007-08-24 12:34         ` Paolo Bonzini
2007-08-30 12:02           ` Uros Bizjak
2007-08-30 13:03             ` Richard Guenther
2007-08-30 13:08               ` Uros Bizjak
2007-08-30 15:53               ` Tobias Burnus
2007-08-30 16:08                 ` Richard Guenther
2007-08-30 19:40                   ` Gerald Pfeifer
2007-08-30 15:56               ` Uros Bizjak
  -- strict thread matches above, loose matches on Subject: below --
2007-08-30 16:59 Uros Bizjak
2007-08-20 15:03 Richard Guenther
2007-08-20 22:36 ` Hans-Peter Nilsson
