From: "Kewen.Lin" <linkw@linux.ibm.com>
To: Carl Love <cel@us.ibm.com>
Cc: Peter Bergner <bergner@vnet.ibm.com>,
Segher Boessenkool <segher@kernel.crashing.org>,
gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] rs6000: Add builtins for IEEE 128-bit floating point values
Date: Wed, 7 Jun 2023 17:36:13 +0800 [thread overview]
Message-ID: <a69fbe29-6ca8-da7b-a894-180494c80acd@linux.ibm.com> (raw)
In-Reply-To: <18922c628b853f4ac9620d6b3b91842d8dddbe54.camel@us.ibm.com>
Hi,
on 2023/6/7 03:54, Carl Love wrote:
> On Mon, 2023-06-05 at 16:45 +0800, Kewen.Lin wrote:
>> Hi Carl,
>>
>> on 2023/5/2 23:52, Carl Love via Gcc-patches wrote:
>>> GCC maintainers:
>>>
>>> The following patch adds three buitins for inserting and extracting
>>> the
>>> exponent and significand for an IEEE 128-bit floating point
>>> values.
>>> The builtins are valid for Power 9 and Power 10.
>>
>> We already have:
>>
>> unsigned long long int scalar_extract_exp (__ieee128 source);
>> unsigned __int128 scalar_extract_sig (__ieee128 source);
>> ieee_128 scalar_insert_exp (unsigned __int128 significand,
>> unsigned long long int exponent);
>> ieee_128 scalar_insert_exp (ieee_128 significand, unsigned long long
>> int exponent);
>>
>> you need to say something about the requirements or the justification
>> for
>> adding more, for this patch itself, some comments are inline below,
>> thanks!
>
> I implemented the patch based on a request for the builtins. It didn't
> include any justification so I reached out to Steve Monroe who
> requested the builtins to understand why he wanted them. Here is his
> reply:
>
> Basically there is no clean and performant way to transfer between a
> vector type and the ieee128 scalar, despite the fact that both
> reside in vector registers. Also a union transfer does not work
> correctly on most GCC versions (and will likely break again in the
> next release). I offer the long sad history of the IBM long double
> float runtime.
Thanks for clarifying this. As the proposed changes, I think he meant
to say "Basically there is no clean and performant way to transfer between
a vector type and the scalar **types**". :) Because the proposed changes
are:
scalar_extract_exp: unsigned long long => vector unsigned long long
scalar_extract_sig: unsigned __int128 => vector unsigned __int128
scalar_insert_exp: unsigned __int128 => vector unsigned __int128
unsigned long long => vector unsigned long long.
>
> Also there are __ieee128 operations that are provided by builtins
> for POWER9 but are not provided by libgcc (for POWER8).
>
> Finally I can prove that a softfloat __ieee128 implementation using
> VMX integer operations, out-performs the current libgcc
> implementation using DW GPRs.
>
> The details are in the PVECLIB documentation
> pveclib/vec__f128__ppc.h
>
>
>>
>>> The patch has been tested on both Power 9 and Power 10.
>>>
>>> Please let me know if this patch is acceptable for
>>> mainline. Thanks.
>>>
>>> Carl
>>>
>>>
>>> ----------------------
>>> From a20cc81f98cce1140fc95775a7c25b55d1ca7cba Mon Sep 17 00:00:00
>>> 2001
>>> From: Carl Love <cel@us.ibm.com>
>>> Date: Wed, 12 Apr 2023 17:46:37 -0400
>>> Subject: [PATCH] rs6000: Add builtins for IEEE 128-bit floating
>>> point values
>>>
>>> Add support for the following builtins:
>>>
>>> __vector unsigned long long int __builtin_extractf128_exp
>>> (__ieee128);
>>
>> Could you make the name similar to the existing one? The existing
>> one
>>
>> unsigned long long int scalar_extract_exp (__ieee128 source);
>>
>> has nothing like f128 on its name, this variant is just to change the
>> return type to vector type, how about scalar_extract_exp_to_vec?
>
> I changed the name __builtin_extractf128_exp to
> __builtin_scalar_extract_exp_to_vec.
>
>>
>>> __vector unsigned __int128 __builtin_extractf128_sig (__ieee128);
>>
>> Ditto.
>
> I changed the name __builtin_extractf128_sig to
> __builtin_scalar_extract_sig_to_vec.
>
>>
>>> __ieee128 __builtin_insertf128_exp (__vector unsigned __int128,
>>> __vector unsigned long long);
>>
>> This one can just overload the existing scalar_insert_exp?
>
>
> I tried making this one an overloaded version of
> scalar_insert_exp. However, the overload with the vector arguments
> isn't recognized when I put the overload definition at the end of the
> list of overloads. When I tried putting the vector version as the
> first overloaded definition, I get an internal error
> on __builtin_vsx_scalar_insert_exp_q which is has the same arguments
> types but as scalars not vectors. Best I can tell, there is an issue
> with mixing scalar and vector arguments in an overloaded builtin.
No, it's not due to mixing scalar and vector arguments, I looked into
this and found we have some special handling for this builtin in
altivec_resolve_overloaded_builtin, see the code:
case RS6000_OVLD_VEC_VSIE:
{
machine_mode arg1_mode = TYPE_MODE (types[0]);
/* If supplied first argument is wider than 64 bits, resolve to
128-bit variant of built-in function. */
if (GET_MODE_PRECISION (arg1_mode) > 64)
{
/* If first argument is of float variety, choose variant
that expects __ieee128 argument. Otherwise, expect
__int128 argument. */
if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
instance_code = RS6000_BIF_VSIEQPF;
else
instance_code = RS6000_BIF_VSIEQP;
}
....
When the 1st arg's precision is larger than 64, it just picks up
RS6000_BIF_VSIEQPF or RS6000_BIF_VSIEQP, you need to update this for
what you newly added. Something like (against your v2):
const _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
- VSIEDP_VULL xsiexpqpf_f128_kf {}
+ VSIEQPV xsiexpqpf_f128_kf {}
@@ -1934,6 +1934,8 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
__int128 argument. */
if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
instance_code = RS6000_BIF_VSIEQPF;
+ else if (VECTOR_MODE_P (arg1_mode))
+ instance_code = RS6000_BIF_VSIEQPV;
else
instance_code = RS6000_BIF_VSIEQP;
}
diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 102ead9f80b..84743114c8e 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4515,8 +4515,8 @@
VSIEQP
_Float128 __builtin_vec_scalar_insert_exp (_Float128, unsigned long long);
VSIEQPF
- _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
- VSIEDP_VULL
+ _Float128 __builtin_vec_scalar_insert_exp (vuq, vull);
+ VSIEQPV
>
> I renamed __builtin_insertf128_exp as
> __builtin_vsx_scalar_insert_exp_vqp which is just the vector version of
> the existing __builtin_vsx_scalar_insert_exp_qp builtin.
>>
>> gcc/
>>> * config/rs6000/rs6000-buildin.def (__builtin_extractf128_exp,
>>> __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
>>> builtin definitions.
>>> * config/rs6000.md (extractf128_exp_<mode>,
>>> insertf128_exp_<mode>,
>>> extractf128_sig_<mode>): Add define_expand for new builtins.
>>> (xsxexpqp_f128_<mode>, xsxsigqp_f128_<mode>,
>>> siexpqpf_f128_<mode>):
>>> Add define_insn for new builtins.
>>> * doc/extend.texi (__builtin_extractf128_exp,
>>> __builtin_extractf128_sig,
>>> __builtin_insertf128_exp): Add documentation for new builtins.
>>>
>>> gcc/testsuite/
>>> * gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case.
>>> * gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case.
>>> * gcc.target/powerpc/bfp/insert-exp-ieee128.c: New test case.
>>> ---
>>> gcc/config/rs6000/rs6000-builtins.def | 9 +++
>>> gcc/config/rs6000/vsx.md | 66
>>> ++++++++++++++++++-
>>> gcc/doc/extend.texi | 28 ++++++++
>>> .../powerpc/bfp/extract-exp-ieee128.c | 49 ++++++++++++++
>>> .../powerpc/bfp/extract-sig-ieee128.c | 56
>>> ++++++++++++++++
>>> .../powerpc/bfp/insert-exp-ieee128.c | 58
>>> ++++++++++++++++
>>> 6 files changed, 265 insertions(+), 1 deletion(-)
>>> create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-
>>> exp-ieee128.c
>>> create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-
>>> sig-ieee128.c
>>> create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert-
>>> exp-ieee128.c
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-builtins.def
>>> b/gcc/config/rs6000/rs6000-builtins.def
>>> index 638d0bc72ca..3247a7f7673 100644
>>> --- a/gcc/config/rs6000/rs6000-builtins.def
>>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>>> @@ -2876,6 +2876,15 @@
>>> pure vsc __builtin_vsx_xl_len_r (void *, signed long);
>>> XL_LEN_R xl_len_r {}
>>>
>>> + vull __builtin_extractf128_exp (_Float128);
>>> + EEXPKF extractf128_exp_kf {}
>>> +
>>> + vuq __builtin_extractf128_sig (_Float128);
>>> + ESIGKF extractf128_sig_kf {}
>>> +
>>> + _Float128 __builtin_insertf128_exp (vuq, vull);
>>> + IEXPKF_VULL insertf128_exp_kf {}
>>> +
>>
>> Put them to be near its related ones like
>> __builtin_vsx_scalar_extract_expq
>> etc.
>>
>
> I moved the definitions to be near the other definitions.
>
>>>
>>> ; Builtins requiring hardware support for IEEE-128 floating-point.
>>> [ieee128-hw]
>>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>>> index 7d845df5c2d..2a9f875ba57 100644
>>> --- a/gcc/config/rs6000/vsx.md
>>> +++ b/gcc/config/rs6000/vsx.md
>>> @@ -369,7 +369,10 @@
>>> UNSPEC_XXSPLTI32DX
>>> UNSPEC_XXBLEND
>>> UNSPEC_XXPERMX
>>> - ])
>>> + UNSPEC_EXTRACTEXPIEEE
>>> + UNSPEC_EXTRACTSIGIEEE
>>> + UNSPEC_INSERTEXPIEEE
>>
>> These are not necessary, just use the existing UNSPEC_VSX_SXEXPDP
>> etc.
>
> I removed these, as they are not needed.
>
>>
>>> +])
>>>
>>> (define_int_iterator XVCVBF16 [UNSPEC_VSX_XVCVSPBF16
>>> UNSPEC_VSX_XVCVBF16SPN])
>>> @@ -4155,6 +4158,38 @@
>>> "vins<wd>rx %0,%1,%2"
>>> [(set_attr "type" "vecsimple")])
>>>
>>> +(define_expand "extractf128_exp_<mode>"
>>> + [(set (match_operand:V2DI 0 "altivec_register_operand")
>>> + (unspec:IEEE128 [(match_operand:IEEE128 1
>>> "altivec_register_operand")]
>>> + UNSPEC_EXTRACTEXPIEEE))]
>>> +"TARGET_P9_VECTOR"
>>> +{
>>> + emit_insn (gen_xsxexpqp_f128_<mode> (operands[0], operands[1]));
>>> + DONE;
>>> +})
>>> +
>>> +(define_expand "insertf128_exp_<mode>"
>>> + [(set (match_operand:IEEE128 0 "altivec_register_operand")
>>> + (unspec:IEEE128 [(match_operand:V1TI 1
>>> "altivec_register_operand")
>>> + (match_operand:V2DI 2 "altivec_register_operand")]
>>> + UNSPEC_INSERTEXPIEEE))]
>>> +"TARGET_P9_VECTOR"
>>> +{
>>> + emit_insn (gen_xsiexpqpf_f128_<mode> (operands[0], operands[1],
>>> + operands[2]));
>>> + DONE;
>>> +})
>>> +
>>> +(define_expand "extractf128_sig_<mode>"
>>> + [(set (match_operand:V2DI 0 "altivec_register_operand")
>>> + (unspec:IEEE128 [(match_operand:IEEE128 1
>>> "altivec_register_operand")]
>>> + UNSPEC_EXTRACTSIGIEEE))]
>>> +"TARGET_P9_VECTOR"
>>> +{
>>> + emit_insn (gen_xsxsigqp_f128_<mode> (operands[0], operands[1]));
>>> + DONE;
>>> +})
>>
>> These define_expands are not necessary, the called gen functions from
>> the
>> define_insns can be directly used as bif pattern in rs6000-
>> builtins.def.
>
> Right, I just mapped directly to the instruction the above were
> calling.
>>
>>> +
>>> (define_expand "vreplace_elt_<mode>"
>>> [(set (match_operand:REPLACE_ELT 0 "register_operand")
>>> (unspec:REPLACE_ELT [(match_operand:REPLACE_ELT 1
>>> "register_operand")
>>> @@ -5016,6 +5051,15 @@
>>> "xsxexpqp %0,%1"
>>> [(set_attr "type" "vecmove")])
>>>
>>> +;; VSX Scalar to Vector Extract Exponent IEEE 128-bit floating
>>> point format
>>> +(define_insn "xsxexpqp_f128_<mode>"
>>> + [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
>>> + (unspec:V2DI [(match_operand:IEEE128 1
>>> "altivec_register_operand" "v")]
>>> + UNSPEC_VSX_SXEXPDP))]
>>> + "TARGET_P9_VECTOR"
>>> + "xsxexpqp %0,%1"
>>> + [(set_attr "type" "vecmove")])
>>
>> I think you can just merge this into the existing xsxexpqp_<mode> by
>> extending
>> with one mode like xsxexpqp_<IEEE128:mode>_<xxx:mode> for unspec:xxx.
>>
>> xxx can be one mode iterator for DI and V2DI.
>
>
> OK, I played with this, it is rather sophisticated syntax. I didn't
> seem to be able to get the syntax correct and thus get all this to
> work. It also seemed to require updating some case statements for the
> instructions in various files. I opted to not change this and just
> keep it simple. Not sure if that is OK or not?
>
I meant something like:
(define_mode_iterator VSEEQP_DI [V2DI DI])
;; VSX Scalar Extract Exponent Quad-Precision
(define_insn "xsxexpqp_<IEEE128:mode>_<VSEEQP_DI:mode>"
[(set (match_operand:VSEEQP_DI 0 "altivec_register_operand" "=v")
(unspec:VSEEQP_DI
[(match_operand:IEEE128 1 "altivec_register_operand" "v")]
UNSPEC_VSX_SXEXPDP))]
"TARGET_P9_VECTOR"
"xsxexpqp %0,%1"
[(set_attr "type" "vecmove")])
since what you need just one different mode from DI for the operand 0,
using one mode iterator can make you not need to duplicate it with
a different mode. With above, you can use xsxexpqp_kf_di for the
previous xsxexpqp_kf, and xsxexpqp_kf_v2di for your proposed
__builtin_scalar_extract_exp_to_vec.
BR,
Kewen
next prev parent reply other threads:[~2023-06-07 9:36 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-02 15:52 Carl Love
2023-06-05 8:45 ` Kewen.Lin
2023-06-06 19:54 ` [PATCH ver 2] " Carl Love
2023-06-06 19:54 ` [PATCH] " Carl Love
2023-06-07 9:36 ` Kewen.Lin [this message]
2023-06-08 15:21 ` Carl Love
2023-06-08 15:21 ` [PATCH ver 3] " Carl Love
2023-06-13 3:10 ` Kewen.Lin
2023-06-14 16:17 ` Carl Love
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a69fbe29-6ca8-da7b-a894-180494c80acd@linux.ibm.com \
--to=linkw@linux.ibm.com \
--cc=bergner@vnet.ibm.com \
--cc=cel@us.ibm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=segher@kernel.crashing.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).