public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Kewen.Lin" <linkw@linux.ibm.com>
To: Carl Love <cel@us.ibm.com>
Cc: Peter Bergner <bergner@vnet.ibm.com>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] rs6000: Add builtins for IEEE 128-bit floating point values
Date: Wed, 7 Jun 2023 17:36:13 +0800	[thread overview]
Message-ID: <a69fbe29-6ca8-da7b-a894-180494c80acd@linux.ibm.com> (raw)
In-Reply-To: <18922c628b853f4ac9620d6b3b91842d8dddbe54.camel@us.ibm.com>

Hi,

on 2023/6/7 03:54, Carl Love wrote:
> On Mon, 2023-06-05 at 16:45 +0800, Kewen.Lin wrote:
>> Hi Carl,
>>
>> on 2023/5/2 23:52, Carl Love via Gcc-patches wrote:
>>> GCC maintainers:
>>>
>>> The following patch adds three buitins for inserting and extracting
>>> the
>>> exponent and significand for an IEEE 128-bit floating point
>>> values. 
>>> The builtins are valid for Power 9 and Power 10.  
>>
>> We already have:
>>
>> unsigned long long int scalar_extract_exp (__ieee128 source);
>> unsigned __int128 scalar_extract_sig (__ieee128 source);
>> ieee_128 scalar_insert_exp (unsigned __int128 significand,
>>                             unsigned long long int exponent);
>> ieee_128 scalar_insert_exp (ieee_128 significand, unsigned long long
>> int exponent);
>>
>> you need to say something about the requirements or the justification
>> for
>> adding more, for this patch itself, some comments are inline below,
>> thanks!
> 
> I implemented the patch based on a request for the builtins.  It didn't
> include any justification so I reached out to Steve Monroe who
> requested the builtins to understand why he wanted them.  Here is his
> reply:
> 
>    Basically there is no clean and performant way to transfer between a
>    vector type and the ieee128 scalar, despite the fact that both
>    reside in vector registers. Also a union transfer does not work
>    correctly on most GCC versions (and will likely break again in the
>    next release). I offer the long sad history of the IBM long double
>    float runtime.

Thanks for clarifying this.  As the proposed changes, I think he meant
to say "Basically there is no clean and performant way to transfer between
a vector type and the scalar **types**". :) Because the proposed changes
are:
  scalar_extract_exp: unsigned long long => vector unsigned long long
  scalar_extract_sig: unsigned __int128  => vector unsigned __int128
  scalar_insert_exp: unsigned __int128 => vector unsigned __int128
                     unsigned long long => vector unsigned long long.

> 
>    Also there are __ieee128 operations that are provided by builtins
>    for POWER9 but are not provided by libgcc (for POWER8).
> 
>    Finally I can prove that a softfloat __ieee128 implementation using
>    VMX integer operations, out-performs the current libgcc
>    implementation using DW GPRs.
> 
>    The details are in the PVECLIB documentation
>    pveclib/vec__f128__ppc.h
> 
> 
>>
>>> The patch has been tested on both Power 9 and Power 10.
>>>
>>> Please let me know if this patch is acceptable for
>>> mainline.  Thanks.
>>>
>>>             Carl 
>>>
>>>
>>> ----------------------
>>> From a20cc81f98cce1140fc95775a7c25b55d1ca7cba Mon Sep 17 00:00:00
>>> 2001
>>> From: Carl Love <cel@us.ibm.com>
>>> Date: Wed, 12 Apr 2023 17:46:37 -0400
>>> Subject: [PATCH] rs6000: Add builtins for IEEE 128-bit floating
>>> point values
>>>
>>> Add support for the following builtins:
>>>
>>>  __vector unsigned long long int __builtin_extractf128_exp
>>> (__ieee128);
>>
>> Could you make the name similar to the existing one?  The existing
>> one
>>   
>>   unsigned long long int scalar_extract_exp (__ieee128 source);
>>
>> has nothing like f128 on its name, this variant is just to change the
>> return type to vector type, how about scalar_extract_exp_to_vec?
> 
> I changed the name  __builtin_extractf128_exp  to
> __builtin_scalar_extract_exp_to_vec.
> 
>>
>>>  __vector unsigned __int128 __builtin_extractf128_sig (__ieee128);
>>
>> Ditto.
> 
> I changed the name  __builtin_extractf128_sig to
> __builtin_scalar_extract_sig_to_vec.
> 
>>
>>>  __ieee128 __builtin_insertf128_exp (__vector unsigned __int128,
>>> 			             __vector unsigned long long);
>>
>> This one can just overload the existing scalar_insert_exp?
> 
> 
> I tried making this one an overloaded version of
> scalar_insert_exp.  However, the overload with the vector arguments
> isn't recognized when I put the overload definition at the end of the
> list of overloads.  When I tried putting the vector version as the
> first overloaded definition, I get an internal error
> on  __builtin_vsx_scalar_insert_exp_q which is has the same arguments
> types but as scalars not vectors.  Best I can tell, there is an issue
> with mixing scalar and vector arguments in an overloaded builtin.

No, it's not due to mixing scalar and vector arguments, I looked into
this and found we have some special handling for this builtin in
altivec_resolve_overloaded_builtin, see the code:

    case RS6000_OVLD_VEC_VSIE:
      {
	machine_mode arg1_mode = TYPE_MODE (types[0]);

	/* If supplied first argument is wider than 64 bits, resolve to
	   128-bit variant of built-in function.  */
	if (GET_MODE_PRECISION (arg1_mode) > 64)
	  {
	    /* If first argument is of float variety, choose variant
	       that expects __ieee128 argument.  Otherwise, expect
	       __int128 argument.  */
	    if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
	      instance_code = RS6000_BIF_VSIEQPF;
	    else
	      instance_code = RS6000_BIF_VSIEQP;
	  }
          ....
 
When the 1st arg's precision is larger than 64, it just picks up
RS6000_BIF_VSIEQPF or RS6000_BIF_VSIEQP, you need to update this for
what you newly added.  Something like (against your v2):

   const _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
-    VSIEDP_VULL xsiexpqpf_f128_kf {}
+    VSIEQPV xsiexpqpf_f128_kf {}


@@ -1934,6 +1934,8 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
                __int128 argument.  */
             if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
               instance_code = RS6000_BIF_VSIEQPF;
+            else if (VECTOR_MODE_P (arg1_mode))
+              instance_code = RS6000_BIF_VSIEQPV;
             else
               instance_code = RS6000_BIF_VSIEQP;
           }

diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 102ead9f80b..84743114c8e 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4515,8 +4515,8 @@
     VSIEQP
   _Float128 __builtin_vec_scalar_insert_exp (_Float128, unsigned long long);
     VSIEQPF
-  _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
-    VSIEDP_VULL
+  _Float128 __builtin_vec_scalar_insert_exp (vuq, vull);
+    VSIEQPV

> 
> I renamed __builtin_insertf128_exp as
> __builtin_vsx_scalar_insert_exp_vqp which is just the vector version of
>   the existing __builtin_vsx_scalar_insert_exp_qp builtin.
>>
>> gcc/
>>> 	* config/rs6000/rs6000-buildin.def (__builtin_extractf128_exp,
>>> 	 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
>>> 	builtin definitions.
>>> 	* config/rs6000.md (extractf128_exp_<mode>,
>>> insertf128_exp_<mode>,
>>> 	extractf128_sig_<mode>): Add define_expand for new builtins.
>>> 	(xsxexpqp_f128_<mode>, xsxsigqp_f128_<mode>,
>>> siexpqpf_f128_<mode>):
>>> 	Add define_insn for new builtins.
>>> 	* doc/extend.texi (__builtin_extractf128_exp,
>>> __builtin_extractf128_sig,
>>> 	__builtin_insertf128_exp): Add documentation for new builtins.
>>>
>>> gcc/testsuite/
>>> 	* gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case.
>>> 	* gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case.
>>> 	* gcc.target/powerpc/bfp/insert-exp-ieee128.c: New test case.
>>> ---
>>>  gcc/config/rs6000/rs6000-builtins.def         |  9 +++
>>>  gcc/config/rs6000/vsx.md                      | 66
>>> ++++++++++++++++++-
>>>  gcc/doc/extend.texi                           | 28 ++++++++
>>>  .../powerpc/bfp/extract-exp-ieee128.c         | 49 ++++++++++++++
>>>  .../powerpc/bfp/extract-sig-ieee128.c         | 56
>>> ++++++++++++++++
>>>  .../powerpc/bfp/insert-exp-ieee128.c          | 58
>>> ++++++++++++++++
>>>  6 files changed, 265 insertions(+), 1 deletion(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-
>>> exp-ieee128.c
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-
>>> sig-ieee128.c
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert-
>>> exp-ieee128.c
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-builtins.def
>>> b/gcc/config/rs6000/rs6000-builtins.def
>>> index 638d0bc72ca..3247a7f7673 100644
>>> --- a/gcc/config/rs6000/rs6000-builtins.def
>>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>>> @@ -2876,6 +2876,15 @@
>>>    pure vsc __builtin_vsx_xl_len_r (void *, signed long);
>>>      XL_LEN_R xl_len_r {}
>>>  
>>> +  vull __builtin_extractf128_exp (_Float128);
>>> +    EEXPKF extractf128_exp_kf {}
>>> +
>>> +  vuq __builtin_extractf128_sig (_Float128);
>>> +    ESIGKF extractf128_sig_kf {}
>>> +
>>> +  _Float128 __builtin_insertf128_exp (vuq, vull);
>>> +    IEXPKF_VULL insertf128_exp_kf {}
>>> +
>>
>> Put them to be near its related ones like
>> __builtin_vsx_scalar_extract_expq
>> etc.
>>
> 
> I moved the definitions to be near the other definitions.
> 
>>>  
>>>  ; Builtins requiring hardware support for IEEE-128 floating-point.
>>>  [ieee128-hw]
>>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>>> index 7d845df5c2d..2a9f875ba57 100644
>>> --- a/gcc/config/rs6000/vsx.md
>>> +++ b/gcc/config/rs6000/vsx.md
>>> @@ -369,7 +369,10 @@
>>>     UNSPEC_XXSPLTI32DX
>>>     UNSPEC_XXBLEND
>>>     UNSPEC_XXPERMX
>>> -  ])
>>> +   UNSPEC_EXTRACTEXPIEEE
>>> +   UNSPEC_EXTRACTSIGIEEE
>>> +   UNSPEC_INSERTEXPIEEE
>>
>> These are not necessary, just use the existing UNSPEC_VSX_SXEXPDP
>> etc.
> 
> I removed these, as they are not needed.
> 
>>
>>> +])
>>>  
>>>  (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
>>>  				 UNSPEC_VSX_XVCVBF16SPN])
>>> @@ -4155,6 +4158,38 @@
>>>   "vins<wd>rx %0,%1,%2"
>>>   [(set_attr "type" "vecsimple")])
>>>  
>>> +(define_expand "extractf128_exp_<mode>"
>>> +  [(set (match_operand:V2DI 0 "altivec_register_operand")
>>> +  (unspec:IEEE128 [(match_operand:IEEE128 1
>>> "altivec_register_operand")]
>>> +		  UNSPEC_EXTRACTEXPIEEE))]
>>> +"TARGET_P9_VECTOR"
>>> +{
>>> +  emit_insn (gen_xsxexpqp_f128_<mode> (operands[0], operands[1]));
>>> +  DONE;
>>> +})
>>> +
>>> +(define_expand "insertf128_exp_<mode>"
>>> +  [(set (match_operand:IEEE128 0 "altivec_register_operand")
>>> +  (unspec:IEEE128 [(match_operand:V1TI 1
>>> "altivec_register_operand")
>>> +		   (match_operand:V2DI 2 "altivec_register_operand")]
>>> +		  UNSPEC_INSERTEXPIEEE))]
>>> +"TARGET_P9_VECTOR"
>>> +{
>>> +  emit_insn (gen_xsiexpqpf_f128_<mode> (operands[0], operands[1],
>>> +				        operands[2]));
>>> +  DONE;
>>> +})
>>> +
>>> +(define_expand "extractf128_sig_<mode>"
>>> +  [(set (match_operand:V2DI 0 "altivec_register_operand")
>>> +  (unspec:IEEE128 [(match_operand:IEEE128 1
>>> "altivec_register_operand")]
>>> +		  UNSPEC_EXTRACTSIGIEEE))]
>>> +"TARGET_P9_VECTOR"
>>> +{
>>> +  emit_insn (gen_xsxsigqp_f128_<mode> (operands[0], operands[1]));
>>> +  DONE;
>>> +})
>>
>> These define_expands are not necessary, the called gen functions from
>> the
>> define_insns can be directly used as bif pattern in rs6000-
>> builtins.def.
> 
> Right, I just mapped directly to the instruction the above were
> calling.
>>
>>> +
>>>  (define_expand "vreplace_elt_<mode>"
>>>    [(set (match_operand:REPLACE_ELT 0 "register_operand")
>>>    (unspec:REPLACE_ELT [(match_operand:REPLACE_ELT 1
>>> "register_operand")
>>> @@ -5016,6 +5051,15 @@
>>>    "xsxexpqp %0,%1"
>>>    [(set_attr "type" "vecmove")])
>>>  
>>> +;; VSX Scalar to Vector Extract Exponent IEEE 128-bit floating
>>> point format
>>> +(define_insn "xsxexpqp_f128_<mode>"
>>> +  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
>>> +	(unspec:V2DI [(match_operand:IEEE128 1
>>> "altivec_register_operand" "v")]
>>> +	 UNSPEC_VSX_SXEXPDP))]
>>> +  "TARGET_P9_VECTOR"
>>> +  "xsxexpqp %0,%1"
>>> +  [(set_attr "type" "vecmove")])
>>
>> I think you can just merge this into the existing xsxexpqp_<mode> by
>> extending
>> with one mode like xsxexpqp_<IEEE128:mode>_<xxx:mode> for unspec:xxx.
>>
>> xxx can be one mode iterator for DI and V2DI.
> 
> 
> OK, I played with this, it is rather sophisticated syntax.  I didn't
> seem to be able to get the syntax correct and thus get all this to
> work.  It also seemed to require updating some case statements for the
> instructions in various files.  I opted to not change this and just
> keep it simple.  Not sure if that is OK or not?
> 

I meant something like:

(define_mode_iterator VSEEQP_DI  [V2DI DI])

;; VSX Scalar Extract Exponent Quad-Precision
(define_insn "xsxexpqp_<IEEE128:mode>_<VSEEQP_DI:mode>"
  [(set (match_operand:VSEEQP_DI 0 "altivec_register_operand" "=v")
        (unspec:VSEEQP_DI
          [(match_operand:IEEE128 1 "altivec_register_operand" "v")]
          UNSPEC_VSX_SXEXPDP))]
  "TARGET_P9_VECTOR"
  "xsxexpqp %0,%1"
  [(set_attr "type" "vecmove")])

since what you need just one different mode from DI for the operand 0,
using one mode iterator can make you not need to duplicate it with
a different mode.  With above, you can use xsxexpqp_kf_di for the
previous xsxexpqp_kf, and xsxexpqp_kf_v2di for your proposed
__builtin_scalar_extract_exp_to_vec.

BR,
Kewen

  reply	other threads:[~2023-06-07  9:36 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-02 15:52 Carl Love
2023-06-05  8:45 ` Kewen.Lin
2023-06-06 19:54   ` [PATCH ver 2] " Carl Love
2023-06-06 19:54   ` [PATCH] " Carl Love
2023-06-07  9:36     ` Kewen.Lin [this message]
2023-06-08 15:21       ` Carl Love
2023-06-08 15:21       ` [PATCH ver 3] " Carl Love
2023-06-13  3:10         ` Kewen.Lin
2023-06-14 16:17           ` Carl Love

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a69fbe29-6ca8-da7b-a894-180494c80acd@linux.ibm.com \
    --to=linkw@linux.ibm.com \
    --cc=bergner@vnet.ibm.com \
    --cc=cel@us.ibm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).