Subject: Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions
From: will schmidt
To: Carl Love, Segher Boessenkool, Pat Haugen
Cc: GCC Patches, David Edelsohn
Date: Thu, 03 Dec 2020 13:55:07 -0600
In-Reply-To: <95cbf0665876f3828a6266703aa6967b089716b6.camel@us.ibm.com>
References: <0801554741c7f11d26ded3a2243462cb2790c215.camel@us.ibm.com> <20201119232651.GA2672@gate.crashing.org> <5d20abfbfd6e1edd7712305449e6053df8ca3043.camel@us.ibm.com> <81d86281-30d4-4d5a-b5d1-e961aa3c6573@linux.ibm.com> <20201126023036.GD2672@gate.crashing.org> <95cbf0665876f3828a6266703aa6967b089716b6.camel@us.ibm.com>
List-Id: Gcc-patches mailing list

On Tue, 2020-12-01 at 15:48 -0800, Carl Love via Gcc-patches wrote:
> Segher, Pat:
>
> I have updated the patch to address the comments below.
>
> On Wed, 2020-11-25 at 20:30 -0600, Segher Boessenkool wrote:
> > On Tue, Nov 24, 2020 at 08:34:51PM -0600, Pat Haugen wrote:
> > > On 11/24/20 8:17 PM, Pat Haugen via Gcc-patches wrote:
> > > > On 11/24/20 12:59 PM, Carl Love via Gcc-patches wrote:
> > > > > +(define_insn "modu_"
> > > > > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > > > > +        (umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> > > > > +                     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> > > > > +  "TARGET_POWER10"
> > > > > +  "vmodu %0,%1,%2"
> > > > > +  [(set_attr "type" "vecdiv")
> > > > > +   (set_attr "size" "128")])
> > > >
> > > > We should only be setting "size" "128" for instructions that
> > > > operate on scalar 128-bit data items (i.e. 'vdivesq' etc). Since
> > > > the above insns are either V2DI/V4SI (ala VIlong mode_iterator),
> > > > they shouldn't be marked as size 128. If you want to set the size
> > > > based on mode, (set_attr "size" "") should do the trick I believe.
> > >
> > > Well, after you update "(define_mode_attr bits" in rs6000.md for
> > > V2DI/V4SI.
> >
> > So far, was only used for scalars. I agree that for vectors it
> > makes most sense to do the element size (because the vector size always
> > is 128 bits, and for scheduling the element size can matter). But, the
> > definitions of and now say
> >
> > ;; What data size does this instruction work on?
> > ;; This is used for insert, mul and others as necessary.
> > (define_attr "size" "8,16,32,64,128" (const_string "32"))
> >
> > and
> >
> > ;; How many bits in this mode?
> > (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
> >                         (SF "32") (DF "64")])
> > so those need a bit of update as well then :-)
>
> I set the size based on the vector element size, extending the
> define_mode_attr bits definition. Please take a look at the updated
> patch. Hopefully I have this all correct. Thanks.
>
> Note, I retested the updated patch on
>
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
>   powerpc64le-unknown-linux-gnu (Power 10 LE)
>
> Thanks for the help.
>
>                  Carl
>

Continued from yesterday..

Thanks
-Will

> -----------------------------------------------------------------------
> rs6000, vector integer multiply/divide/modulo instructions
>
> 2020-12-01  Carl Love
>
> gcc/
>         * config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
>         defines.
>         * config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
>         * config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
>         DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
>         DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
>         MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
>         Add builtin define.
>         (MULH, DIVE, MOD): Add new BU_P10_OVERLOAD_2 definitions.
>         * config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV,
>         P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD,
>         P10_BUILTIN_VEC_VMULH):

No mentions of these three P10_BUILTIN_VEC_* in patch below.

>         New overloaded definitions.
>         (builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
>         P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
>         P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
>         P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
>         P10V_BUILTIN_MULHU_V4SI, P10V_BUILTIN_MULLD_V2DI]: Add case
>         statement for builtins.
>         * config/rs6000/vsx.md (VIlong_char): Add define_mod_attribute.

just VIlong
Maybe s/define_mod_attribute/define_mod_attr / ?

>         (UNSPEC_VDIVES, UNSPEC_VDIVEU): Add enum for UNSPECs.
>         (vsx_mul_v2di, vsx_udiv_v2di): Add if TARGET_POWER10 statement.

I don't see vsx_mul_v2di or vsx_udiv_v2di in the patch contexts,
but it looks OK per a look at trunk's vsx.md.

>         (dives_, diveu_, div3, uvdiv3,
>         mods_, modu_, mulhs_, mulhu_,
>         mulv2di3):
>         Add define_insn, mode is VIlong.
>         * doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
>         builtin descriptions.
>
> gcc/testsuite/
>         * gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
> ---
>  gcc/config/rs6000/altivec.h                   |   5 +
>  gcc/config/rs6000/altivec.md                  |   2 -
>  gcc/config/rs6000/rs6000-builtin.def          |  22 +
>  gcc/config/rs6000/rs6000-call.c               |  49 +++
>  gcc/config/rs6000/rs6000.md                   |   3 +-
>  gcc/config/rs6000/vsx.md                      | 213 +++++++---
>  gcc/doc/extend.texi                           | 120 ++++++
>  .../powerpc/builtins-1-p10-runnable.c         | 398 ++++++++++++++++++
>  8 files changed, 759 insertions(+), 53 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
>
> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index e1884f51bd8..12ccbd2fc2f 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -750,6 +750,11 @@ __altivec_scalar_pred(vec_any_nle,
>  #define vec_strir_p(a) __builtin_vec_strir_p (a)
>  #define vec_stril_p(a) __builtin_vec_stril_p (a)
>
> +#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
> +#define vec_div(a, b) __builtin_vec_div ((a), (b))
> +#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
> +#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
> +
>  /* VSX Mask Manipulation builtin. */
>  #define vec_genbm __builtin_vec_mtvsrbm
>  #define vec_genhm __builtin_vec_mtvsrhm
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 6a6ce0f84ed..f10f1cdd8a7 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -193,8 +193,6 @@
>
>  ;; Short vec int modes
>  (define_mode_iterator VIshort [V8HI V16QI])
> -;; Longer vec int modes for rotate/mask ops
> -(define_mode_iterator VIlong [V2DI V4SI])
>  ;; Vec float modes
>  (define_mode_iterator VF [V4SF])
>  ;; Vec modes, pity mode iterators are not composable
> diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
> index 47b1f74e616..e9ea2114615 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2883,6 +2883,24 @@ BU_P10V_AV_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi)
>  BU_P10V_AV_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
>  BU_P10V_AV_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
>
> +BU_P10V_AV_2 (DIVES_V4SI, "vdivesw", CONST, dives_v4si)
> +BU_P10V_AV_2 (DIVES_V2DI, "vdivesd", CONST, dives_v2di)
> +BU_P10V_AV_2 (DIVEU_V4SI, "vdiveuw", CONST, diveu_v4si)
> +BU_P10V_AV_2 (DIVEU_V2DI, "vdiveud", CONST, diveu_v2di)
> +BU_P10V_AV_2 (DIVS_V4SI, "vdivsw", CONST, divv4si3)
> +BU_P10V_AV_2 (DIVS_V2DI, "vdivsd", CONST, divv2di3)
> +BU_P10V_AV_2 (DIVU_V4SI, "vdivuw", CONST, udivv4si3)
> +BU_P10V_AV_2 (DIVU_V2DI, "vdivud", CONST, udivv2di3)
> +BU_P10V_AV_2 (MODS_V2DI, "vmodsd", CONST, mods_v2di)
> +BU_P10V_AV_2 (MODS_V4SI, "vmodsw", CONST, mods_v4si)
> +BU_P10V_AV_2 (MODU_V2DI, "vmodud", CONST, modu_v2di)
> +BU_P10V_AV_2 (MODU_V4SI, "vmoduw", CONST, modu_v4si)
> +BU_P10V_AV_2 (MULHS_V2DI, "vmulhsd", CONST, mulhs_v2di)
> +BU_P10V_AV_2 (MULHS_V4SI, "vmulhsw", CONST, mulhs_v4si)
> +BU_P10V_AV_2 (MULHU_V2DI, "vmulhud", CONST, mulhu_v2di)
> +BU_P10V_AV_2 (MULHU_V4SI, "vmulhuw", CONST, mulhu_v4si)
> +BU_P10V_AV_2 (MULLD_V2DI, "vmulld", CONST, mulv2di3)
> +
>  BU_P10V_VSX_1 (VXXSPLTIW_V4SI, "vxxspltiw_v4si", CONST, xxspltiw_v4si)
>  BU_P10V_VSX_1 (VXXSPLTIW_V4SF, "vxxspltiw_v4sf", CONST, xxspltiw_v4sf)
>
> @@ -2958,6 +2976,10 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
>  BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
>  BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
>
> +BU_P10_OVERLOAD_2 (MULH, "mulh")
> +BU_P10_OVERLOAD_2 (DIVE, "dive")
> +BU_P10_OVERLOAD_2 (MOD, "mod")
> +
>
>  BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
>  BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 45bc048b5c7..5b310ea9039 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -1069,6 +1069,35 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
>    { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI,
>      RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
>      RS6000_BTI_unsigned_V2DI, 0 },
> +
> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVS_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { VSX_BUILTIN_VEC_DIV, P10V_BUILTIN_DIVU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },

Is there already/should there be entries here for DIVU_V2DI ? (and DIVS_V2DI?)
There was/is a case statement added for DIVU_V2DI noted below.

> +
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVES_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { P10_BUILTIN_VEC_DIVE, P10V_BUILTIN_DIVEU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODS_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { P10_BUILTIN_VEC_MOD, P10V_BUILTIN_MODU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVSXDDP,
>      RS6000_BTI_V2DF, RS6000_BTI_V2DI, 0, 0 },
>    { VSX_BUILTIN_VEC_DOUBLE, VSX_BUILTIN_XVCVUXDDP,
> @@ -1909,6 +1938,17 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>      RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI,
>      RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_VMINUB, ALTIVEC_BUILTIN_VMINUB,
>      RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
>      RS6000_BTI_bool_V16QI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> +    RS6000_BTI_unsigned_V4SI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHS_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
> +  { P10_BUILTIN_VEC_MULH, P10V_BUILTIN_MULHU_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> +    RS6000_BTI_unsigned_V2DI, 0 },
> +
>    { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULEUB,
>      RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI,
>      RS6000_BTI_unsigned_V16QI, 0 },
>    { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULESB,
> @@ -14438,6 +14478,15 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
>      case P10V_BUILTIN_XXGENPCVM_V8HI:
>      case P10V_BUILTIN_XXGENPCVM_V4SI:
>      case P10V_BUILTIN_XXGENPCVM_V2DI:
> +    case P10V_BUILTIN_DIVEU_V4SI:
> +    case P10V_BUILTIN_DIVEU_V2DI:
> +    case P10V_BUILTIN_DIVU_V4SI:
> +    case P10V_BUILTIN_DIVU_V2DI:

^ this case statement may be missing a table entry above. (Or it's part
of a previous/subsequent patch).

> +    case P10V_BUILTIN_MODU_V2DI:
> +    case P10V_BUILTIN_MODU_V4SI:
> +    case P10V_BUILTIN_MULHU_V2DI:
> +    case P10V_BUILTIN_MULHU_V4SI:
> +    case P10V_BUILTIN_MULLD_V2DI:

I didn't notice an entry adding MULLD_V2DI in the tables above.

>        h.uns_p[0] = 1;
>        h.uns_p[1] = 1;
>        h.uns_p[2] = 1;
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index b89990f46bf..1575cf54580 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -670,7 +670,8 @@
>
>  ;; How many bits in this mode?
>  (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
> -                        (SF "32") (DF "64")])
> +                        (SF "32") (DF "64")
> +                        (V4SI "32") (V2DI "64")])
>
>  ; DImode bits
>  (define_mode_attr dbits [(QI "56") (HI "48") (SI "32")])

ok

> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 947631d83ee..0cc202e7c74 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -267,6 +267,10 @@
>  (define_mode_iterator VSX_MM [V16QI V8HI V4SI V2DI V1TI])
>  (define_mode_iterator VSX_MM4 [V16QI V8HI V4SI V2DI])
>
> +;; Longer vec int modes for rotate/mask ops
> +;; and Vector Integer Multiply/Divide/Modulo Instructions
> +(define_mode_iterator VIlong [V2DI V4SI])
> +
>  ;; Constants for creating unspecs
>  (define_c_enum "unspec"
>    [UNSPEC_VSX_CONCAT
> @@ -363,8 +367,11 @@
>     UNSPEC_INSERTR
>     UNSPEC_REPLACE_ELT
>     UNSPEC_REPLACE_UN
> +   UNSPEC_VDIVES
> +   UNSPEC_VDIVEU
>    ])
>
> +
>  (define_int_iterator XVCVBF16 [UNSPEC_VSX_XVCVSPBF16
>                                 UNSPEC_VSX_XVCVBF16SPN])
>
> @@ -1623,28 +1630,35 @@
>    rtx op0 = operands[0];
>    rtx op1 = operands[1];
>    rtx op2 = operands[2];
> -  rtx op3 = gen_reg_rtx (DImode);
> -  rtx op4 = gen_reg_rtx (DImode);
> -  rtx op5 = gen_reg_rtx (DImode);
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_muldi3 (op5, op3, op4));
> -  else
> -    {
> -      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> -      emit_move_insn (op5, ret);
> -    }
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_muldi3 (op3, op3, op4));
> +
> +  if (TARGET_POWER10)
> +    emit_insn (gen_mulv2di3 (op0, op1, op2) );
> +
>    else
>      {
> -      rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> -      emit_move_insn (op3, ret);
> +      rtx op3 = gen_reg_rtx (DImode);
> +      rtx op4 = gen_reg_rtx (DImode);
> +      rtx op5 = gen_reg_rtx (DImode);
> +      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> +      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> +      if (TARGET_POWERPC64)
> +        emit_insn (gen_muldi3 (op5, op3, op4));
> +      else
> +        {
> +          rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> +          emit_move_insn (op5, ret);
> +        }
> +      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> +      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> +      if (TARGET_POWERPC64)
> +        emit_insn (gen_muldi3 (op3, op3, op4));
> +      else
> +        {
> +          rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
> +          emit_move_insn (op3, ret);
> +        }
> +      emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
>      }
> -  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));

ok

>    DONE;
>  }
>    [(set_attr "type" "mul")])
> @@ -1718,37 +1732,47 @@
>    rtx op0 = operands[0];
>    rtx op1 = operands[1];
>    rtx op2 = operands[2];
> -  rtx op3 = gen_reg_rtx (DImode);
> -  rtx op4 = gen_reg_rtx (DImode);
> -  rtx op5 = gen_reg_rtx (DImode);
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_udivdi3 (op5, op3, op4));
> -  else
> -    {
> -      rtx libfunc = optab_libfunc (udiv_optab, DImode);
> -      rtx target = emit_library_call_value (libfunc,
> -                                            op5, LCT_NORMAL, DImode,
> -                                            op3, DImode,
> -                                            op4, DImode);
> -      emit_move_insn (op5, target);
> -    }
> -  emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> -  emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> -  if (TARGET_POWERPC64)
> -    emit_insn (gen_udivdi3 (op3, op3, op4));
> -  else
> -    {
> -      rtx libfunc = optab_libfunc (udiv_optab, DImode);
> -      rtx target = emit_library_call_value (libfunc,
> -                                            op3, LCT_NORMAL, DImode,
> -                                            op3, DImode,
> -                                            op4, DImode);
> -      emit_move_insn (op3, target);
> -    }
> -  emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
> -  DONE;
> +
> +  if (TARGET_POWER10)
> +    emit_insn (gen_udivv2di3 (op0, op1, op2) );
> +

unnecessary blank line ?

> +  else
> +    {
> +      rtx op3 = gen_reg_rtx (DImode);
> +      rtx op4 = gen_reg_rtx (DImode);
> +      rtx op5 = gen_reg_rtx (DImode);
> +
> +      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
> +      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
> +
> +      if (TARGET_POWERPC64)
> +        emit_insn (gen_udivdi3 (op5, op3, op4));
> +      else
> +        {
> +          rtx libfunc = optab_libfunc (udiv_optab, DImode);
> +          rtx target = emit_library_call_value (libfunc,
> +                                                op5, LCT_NORMAL, DImode,
> +                                                op3, DImode,
> +                                                op4, DImode);
> +          emit_move_insn (op5, target);
> +        }
> +      emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
> +      emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
> +
> +      if (TARGET_POWERPC64)
> +        emit_insn (gen_udivdi3 (op3, op3, op4));
> +      else
> +        {
> +          rtx libfunc = optab_libfunc (udiv_optab, DImode);
> +          rtx target = emit_library_call_value (libfunc,
> +                                                op3, LCT_NORMAL, DImode,
> +                                                op3, DImode,
> +                                                op4, DImode);
> +          emit_move_insn (op3, target);
> +        }
> +      emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
> +    }
> +  DONE;
>  }
>    [(set_attr "type" "div")])
>

ok.
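
For anyone following the two expander hunks above: the pre-Power10 path that the patch keeps works one doubleword at a time (extract element 0, operate, extract element 1, operate, concatenate). A minimal sketch of just that arithmetic in plain C follows. The type v2du_t and the helpers v2du, mul_v2di_fallback and udiv_v2di_fallback are hypothetical names for illustration only; this models neither RTL generation nor the libcall used when TARGET_POWERPC64 is not set.

```c
#include <stdint.h>

/* Hypothetical stand-in for a V2DI value: two unsigned doublewords.  */
typedef struct { uint64_t d0, d1; } v2du_t;

/* Hypothetical constructor, mirroring the "concat" step.  */
static v2du_t
v2du (uint64_t d0, uint64_t d1)
{
  v2du_t v = { d0, d1 };
  return v;
}

/* Element-wise multiply, keeping the low 64 bits of each product
   (unsigned arithmetic wraps modulo 2^64).  */
static v2du_t
mul_v2di_fallback (v2du_t a, v2du_t b)
{
  return v2du (a.d0 * b.d0, a.d1 * b.d1);
}

/* Element-wise unsigned divide.  The caller must ensure neither
   divisor element is zero.  */
static v2du_t
udiv_v2di_fallback (v2du_t a, v2du_t b)
{
  return v2du (a.d0 / b.d0, a.d1 / b.d1);
}
```

With the operands used by the new test file, udiv_v2di_fallback (v2du (24, 68), v2du (2, 2)) yields the elements 12 and 34, which is what the Power10 vdivud path is expected to produce as well.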
> @@ -6104,3 +6128,92 @@
>    "TARGET_POWER10"
>    "vexpandm %0,%1"
>    [(set_attr "type" "vecsimple")])
> +
> +(define_insn "dives_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +                        (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +        UNSPEC_VDIVES))]
> +  "TARGET_POWER10"
> +  "vdives %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "")])
> +
> +(define_insn "diveu_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +                         (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +        UNSPEC_VDIVEU))]
> +  "TARGET_POWER10"
> +  "vdiveu %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "")])
> +
> +(define_insn "div3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +                    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivs %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "")])
> +
> +(define_insn "udiv3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +                     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivu %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "")])
> +
> +(define_insn "mods_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +                    (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmods %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "")])
> +
> +(define_insn "modu_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +                     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmodu %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "")])
> +
> +(define_insn "mulhs_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (mult:VIlong (ashiftrt
> +                       (match_operand:VIlong 1 "vsx_register_operand" "v")
> +                       (const_int 32))
> +                     (ashiftrt
> +                       (match_operand:VIlong 2 "vsx_register_operand" "v")
> +                       (const_int 32))))]
> +  "TARGET_POWER10"
> +  "vmulhs %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
> +(define_insn "mulhu_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +        (us_mult:VIlong (ashiftrt
> +                          (match_operand:VIlong 1 "vsx_register_operand" "v")
> +                          (const_int 32))
> +                        (ashiftrt
> +                          (match_operand:VIlong 2 "vsx_register_operand" "v")
> +                          (const_int 32))))]
> +  "TARGET_POWER10"
> +  "vmulhu %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
> +;; Vector multiply low double word
> +(define_insn "mulv2di3"
> +  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
> +        (mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
> +                   (match_operand:V2DI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmulld %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])

I've just skimmed over the rest. Nothing jumped out at me in the
documentation or testcases below.

Thanks,
-Will

> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 23ede966bae..e20abd8f1f5 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -21568,6 +21568,126 @@ integer value between 0 and 255 inclusive.
>  @exdent vector unsigned int vec_genpcvm (vector unsigned long long int,
>  const int)
>  @end smallexample
> +
> +Vector Integer Multiply/Divide/Modulo
> +
> +@smallexample
> +@exdent vector signed int
> +@exdent vec_mulh (vector signed int a, vector signed int b)
> +@exdent vector unsigned int
> +@exdent vec_mulh (vector unsigned int a, vector unsigned int b)
> +@end smallexample
> +
> +For each integer value @code{i} from 0 to 3, do the following. The integer
The integer > +value in word element @code{i} of a is multiplied by the integer value in word > +element @code{i} of b. The high-order 32 bits of the 64-bit product are placed > +into word element @code{i} of the vector returned. > + > +@smallexample > +@exdent vector signed long long > +@exdent vec_mulh (vector signed long long a, vector signed long long b) > +@exdent vector unsigned long long > +@exdent vec_mulh (vector unsigned long long a, vector unsigned long long b) > +@end smallexample > + > +For each integer value @code{i} from 0 to 1, do the following. The integer > +value in doubleword element @code{i} of a is multiplied by the integer value in > +doubleword element @code{i} of b. The high-order 64 bits of the 128-bit product > +are placed into doubleword element @code{i} of the vector returned. > + > +@smallexample > +@exdent vector unsigned long long > +@exdent vec_mul (vector unsigned long long a, vector unsigned long long b) > +@exdent vector signed long long > +@exdent vec_mul (vector signed long long a, vector signed long long b) > +@end smallexample > + > +For each integer value @code{i} from 0 to 1, do the following. The integer > +value in doubleword element @code{i} of a is multiplied by the integer value in > +doubleword element @code{i} of b. The low-order 64 bits of the 128-bit product > +are placed into doubleword element @code{i} of the vector returned. > + > +@smallexample > +@exdent vector signed int > +@exdent vec_div (vector signed int a, vector signed int b) > +@exdent vector unsigned int > +@exdent vec_div (vector unsigned int a, vector unsigned int b) > +@end smallexample > + > +For each integer value @code{i} from 0 to 3, do the following. The integer in > +word element @code{i} of a is divided by the integer in word element @code{i} > +of b. The unique integer quotient is placed into the word element @code{i} of > +the vector returned. If an attempt is made to perform any of the divisions > + ÷ 0 then the quotient is undefined. 
> + > +@smallexample > +@exdent vector signed long long > +@exdent vec_div (vector signed long long a, vector signed long long b) > +@exdent vector unsigned long long > +@exdent vec_div (vector unsigned long long a, vector unsigned long long b) > +@end smallexample > + > +For each integer value @code{i} from 0 to 1, do the following. The integer in > +doubleword element @code{i} of a is divided by the integer in doubleword > +element @code{i} of b. The unique integer quotient is placed into the > +doubleword element @code{i} of the vector returned. If an attempt is made to > +perform any of the divisions 0x8000_0000_0000_0000 ÷ -1 or ÷ 0 then > +the quotient is undefined. > + > +@smallexample > +@exdent vector signed int > +@exdent vec_dive (vector signed int a, vector signed int b) > +@exdent vector unsigned int > +@exdent vec_dive (vector unsigned int a, vector unsigned int b) > +@end smallexample > + > +For each integer value @code{i} from 0 to 3, do the following. The integer in > +word element @code{i} of a is shifted left by 32 bits, then divided by the > +integer in word element @code{i} of b. The unique integer quotient is placed > +into the word element @code{i} of the vector returned. If the quotient cannot > +be represented in 32 bits, or if an attempt is made to perform any of the > +divisions ÷ 0 then the quotient is undefined. > + > +@smallexample > +@exdent vector signed long long > +@exdent vec_dive (vector signed long long a, vector signed long long b) > +@exdent vector unsigned long long > +@exdent vec_dive (vector unsigned long long a, vector unsigned long long b) > +@end smallexample > + > +For each integer value @code{i} from 0 to 1, do the following. The integer in > +doubleword element @code{i} of a is shifted left by 64 bits, then divided by > +the integer in doubleword element @code{i} of b. The unique integer quotient is > +placed into the doubleword element @code{i} of the vector returned. 
If the > +quotient cannot be represented in 64 bits, or if an attempt is made to perform > + ÷ 0 then the quotient is undefined. > + > +@smallexample > +@exdent vector signed int > +@exdent vec_mod (vector signed int a, vector signed int b) > +@exdent vector unsigned int > +@exdent vec_mod (vector unsigned int a, vector unsigned int b) > +@end smallexample > + > +For each integer value @code{i} from 0 to 3, do the following. The integer in > +word element @code{i} of a is divided by the integer in word element @code{i} > +of b. The unique integer remainder is placed into the word element @code{i} of > +the vector returned. If an attempt is made to perform any of the divisions > +0x8000_0000 ÷ -1 or ÷ 0 then the remainder is undefined. > + > +@smallexample > +@exdent vector signed long long > +@exdent vec_mod (vector signed long long a, vector signed long long b) > +@exdent vector unsigned long long > +@exdent vec_mod (vector unsigned long long a, vector unsigned long long b) > +@end smallexample > + > +For each integer value @code{i} from 0 to 1, do the following. The integer in > +doubleword element @code{i} of a is divided by the integer in doubleword > +element @code{i} of b. The unique integer remainder is placed into the > +doubleword element @code{i} of the vector returned. If an attempt is made to > +perform ÷ 0 then the remainder is undefined. > + > Generate PCV from specified Mask size, as if implemented by the > @code{xxgenpcvbm}, @code{xxgenpcvhm}, @code{xxgenpcvwm} instructions, where > immediate value is either 0, 1, 2 or 3. 
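
As a reading aid for the extend.texi wording above, here is a minimal scalar sketch in C of what a single word element of vec_mulh and vec_dive computes, going only by the quoted descriptions. All helper names (mulh_s32, mulh_u32, dive_s32, dive_u32 and the two *_defined predicates) are hypothetical and are not part of the patch or of GCC. The cases the documentation calls "undefined" are modelled, as an assumption of this sketch, by a predicate that returns 0.

```c
#include <stdint.h>

/* Hypothetical scalar models of one 32-bit word element.  Not GCC code.  */

/* vec_mulh, signed: high-order 32 bits of the 64-bit product.  In two's
   complement the high word equals floor (p / 2^32); division is used so
   that no negative value is ever right-shifted.  */
static int32_t
mulh_s32 (int32_t a, int32_t b)
{
  int64_t p = (int64_t) a * (int64_t) b;
  int64_t q = p / 4294967296LL;
  if (p % 4294967296LL < 0)
    q -= 1;                     /* round toward negative infinity */
  return (int32_t) q;
}

/* vec_mulh, unsigned: high-order 32 bits of the 64-bit product.  */
static uint32_t
mulh_u32 (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * (uint64_t) b) >> 32);
}

/* vec_dive, unsigned: (a << 32) / b.  Defined only for a nonzero
   divisor and a quotient that fits in 32 bits.  */
static int
dive_u32_defined (uint32_t a, uint32_t b)
{
  return b != 0 && (((uint64_t) a << 32) / b) <= UINT32_MAX;
}

static uint32_t
dive_u32 (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a << 32) / b);
}

/* vec_dive, signed: the dividend is a * 2^32, formed by multiplication
   so that no negative value is left-shifted.  */
static int
dive_s32_defined (int32_t a, int32_t b)
{
  int64_t n = (int64_t) a * 4294967296LL;
  if (b == 0 || (n == INT64_MIN && b == -1))
    return 0;
  int64_t q = n / b;
  return q >= INT32_MIN && q <= INT32_MAX;
}

static int32_t
dive_s32 (int32_t a, int32_t b)
{
  return (int32_t) (((int64_t) a * 4294967296LL) / b);
}
```

The operands of the new test file fit this model: dive_s32 (2, 2048) is 2^33 / 2^11 = 4194304, the first element of vec_i_expected for the signed word vec_dive case. The value functions must only be called when the matching predicate returns nonzero.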
> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c > new file mode 100644 > index 00000000000..222c8b3a409 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c > @@ -0,0 +1,398 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target power10_hw } */ > +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */ > + > +/* { dg-final { scan-assembler-times {\mvdivsw\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvdivuw\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvdivud\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvdivesw\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvdiveuw\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvdivesd\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvdiveud\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvmodsw\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvmoduw\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvmodsd\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvmodud\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvmulhsw\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvmulhuw\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvmulhsd\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvmulhud\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvmulld\M} 2 } } */ > + > +#include > +#include > +#include > +#include > + > +#define DEBUG 0 > + > +#ifdef DEBUG > +#include > +#endif > + > +void abort (void); > + > +int main() > + { > + int i; > + vector int i_arg1, i_arg2; > + vector unsigned int u_arg1, u_arg2; > + vector long long int d_arg1, d_arg2; > + vector long long unsigned int ud_arg1, ud_arg2; > + > + vector int vec_i_expected, vec_i_result; > + vector unsigned int vec_u_expected, vec_u_result; > + vector long long int vec_d_expected, vec_d_result; > + 
vector long long unsigned int vec_ud_expected, vec_ud_result; > + > + /* Signed word divide */ > + i_arg1 = (vector int){ 20, 40, 60, 80}; > + i_arg2 = (vector int){ 2, 2, 2, 2}; > + vec_i_expected = (vector int){10, 20, 30, 40}; > + > + vec_i_result = vec_div (i_arg1, i_arg2); > + > + for (i = 0; i < 4; i++) > + { > + if (vec_i_expected[i] != vec_i_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_div signed result[%d] = %d != " > + "expected[%d] = %d\n", > + i, vec_i_result[i], i, vec_i_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Unsigned word divide */ > + u_arg1 = (vector unsigned int){ 20, 40, 60, 80}; > + u_arg2 = (vector unsigned int){ 2, 2, 2, 2}; > + vec_u_expected = (vector unsigned int){10, 20, 30, 40}; > + > + vec_u_result = vec_div (u_arg1, u_arg2); > + > + for (i = 0; i < 4; i++) > + { > + if (vec_u_expected[i] != vec_u_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_div unsigned result[%d] = %d != " > + "expected[%d] = %d\n", > + i, vec_u_result[i], i, vec_u_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Signed double word divide */ > + d_arg1 = (vector long long){ 24, 68}; > + d_arg2 = (vector long long){ 2, 2}; > + vec_d_expected = (vector long long){12, 34}; > + > + vec_d_result = vec_div (d_arg1, d_arg2); > + > + for (i = 0; i < 2; i++) > + { > + if (vec_d_expected[i] != vec_d_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_div signed result[%d] = %d != " > + "expected[%d] = %d\n", > + i, vec_d_result[i], i, vec_d_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Unsigned double word divide */ > + ud_arg1 = (vector unsigned long long){ 24, 68}; > + ud_arg2 = (vector unsigned long long){ 2, 2}; > + vec_ud_expected = (vector unsigned long long){12, 34}; > + > + vec_ud_result = vec_div (ud_arg1, ud_arg2); > + > + for (i = 0; i < 2; i++) > + { > + if (vec_ud_expected[i] != vec_ud_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_div unsigned result[%d] = %d != " > + "expected[%d] = %d\n", > + i, 
vec_ud_result[i], i, vec_ud_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Divide Extended signed word result = (arg1 << 32)/arg2 */ > + i_arg1 = (vector int){ 2, 4, 6, 8}; > + i_arg2 = (vector int){ 2048, 2048, 2048, 2048}; > + vec_i_expected = (vector int){4194304, 8388608, 12582912, 16777216}; > + > + vec_i_result = vec_dive (i_arg1, i_arg2); > + > + for (i = 0; i < 4; i++) > + { > + if (vec_i_expected[i] != vec_i_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_dive signed result[%d] = %d != " > + "expected[%d] = %d\n", > + i, vec_i_result[i], i, vec_i_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Divide Extended unsigned word result = (arg1 << 32)/arg2 */ > + u_arg1 = (vector unsigned int){ 2, 4, 6, 8}; > + u_arg2 = (vector unsigned int){ 2048, 2048, 2048, 2048}; > + vec_u_expected = (vector unsigned int){4194304, 8388608, > + 12582912, 16777216}; > + > + vec_u_result = vec_dive (u_arg1, u_arg2); > + > + for (i = 0; i < 4; i++) > + { > + if (vec_u_expected[i] != vec_u_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_dive unsigned result[%d] = %d != " > + "expected[%d] = %d\n", > + i, vec_u_result[i], i, vec_u_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Divide Extended double signed result = (arg1 << 64)/arg2 */ > + d_arg1 = (vector long long int){ 2, 4}; > + d_arg2 = (vector long long int){ 4294967296, 4294967296}; > + > + vec_d_expected = (vector long long int){8589934592, 17179869184}; > + > + vec_d_result = vec_dive (d_arg1, d_arg2); > + > + for (i = 0; i < 2; i++) > + { > + if (vec_d_expected[i] != vec_d_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_dive signed result[%d] = %lld != " > + "expected[%d] = %lld\n", > + i, vec_d_result[i], i, vec_d_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Divide Extended double unsigned result = (arg1 << 64)/arg2 */ > + ud_arg1 = (vector long long unsigned int){ 2, 4}; > + ud_arg2 = (vector long long unsigned int){ 4294967296, 4294967296}; > + > +
vec_ud_expected = (vector long long unsigned int){8589934592, > + 17179869184}; > + > + vec_ud_result = vec_dive (ud_arg1, ud_arg2); > + > + for (i = 0; i < 2; i++) > + { > + if (vec_ud_expected[i] != vec_ud_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_dive unsigned result[%d] = %lld != " > + "expected[%d] = %lld\n", > + i, vec_ud_result[i], i, vec_ud_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Signed word modulo */ > + i_arg1 = (vector int){ 23, 45, 61, 89}; > + i_arg2 = (vector int){ 2, 2, 2, 2}; > + vec_i_expected = (vector int){1, 1, 1, 1}; > + > + vec_i_result = vec_mod (i_arg1, i_arg2); > + > + for (i = 0; i < 4; i++) > + { > + if (vec_i_expected[i] != vec_i_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_mod signed result[%d] = %d != " > + "expected[%d] = %d\n", > + i, vec_i_result[i], i, vec_i_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Unsigned word modulo */ > + u_arg1 = (vector unsigned int){ 25, 41, 67, 86}; > + u_arg2 = (vector unsigned int){ 3, 3, 3, 3}; > + vec_u_expected = (vector unsigned int){1, 2, 1, 2}; > + > + vec_u_result = vec_mod (u_arg1, u_arg2); > + > + for (i = 0; i < 4; i++) > + { > + if (vec_u_expected[i] != vec_u_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_mod unsigned result[%d] = %d != " > + "expected[%d] = %d\n", > + i, vec_u_result[i], i, vec_u_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Signed double word modulo */ > + d_arg1 = (vector long long){ 24, 68}; > + d_arg2 = (vector long long){ 7, 7}; > + vec_d_expected = (vector long long){3, 5}; > + > + vec_d_result = vec_mod (d_arg1, d_arg2); > + > + for (i = 0; i < 2; i++) > + { > + if (vec_d_expected[i] != vec_d_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_mod signed result[%d] = %d != " > + "expected[%d] = %d\n", > + i, vec_d_result[i], i, vec_d_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Unsigned double word modulo */ > + ud_arg1 = (vector unsigned long long){ 24, 68}; > + ud_arg2 = (vector 
unsigned long long){ 8, 8}; > + vec_ud_expected = (vector unsigned long long){0, 4}; > + > + vec_ud_result = vec_mod (ud_arg1, ud_arg2); > + > + for (i = 0; i < 2; i++) > + { > + if (vec_ud_expected[i] != vec_ud_result[i]) > +#ifdef DEBUG > + printf("ERROR vecmod unsigned result[%d] = %d != " > + "expected[%d] = %d\n", > + i, vec_ud_result[i], i, vec_ud_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Signed word multiply high */ > + i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 }; > + i_arg2 = (vector int){ 2, 3, 4, 5}; > + // vec_i_expected = (vector int){-1, -2, -2, -3}; > + vec_i_expected = (vector int){1, -2, -2, -3}; > + > + vec_i_result = vec_mulh (i_arg1, i_arg2); > + > + for (i = 0; i < 4; i++) > + { > + if (vec_i_expected[i] != vec_i_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_mulh signed result[%d] = %d != " > + "expected[%d] = %d\n", > + i, vec_i_result[i], i, vec_i_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Unsigned word multiply high */ > + u_arg1 = (vector unsigned int){ 2147483648, 2147483648, > + 2147483648, 2147483648 }; > + u_arg2 = (vector unsigned int){ 4, 5, 6, 7 }; > + vec_u_expected = (vector unsigned int){2, 2, 3, 3 }; > + > + vec_u_result = vec_mulh (u_arg1, u_arg2); > + > + for (i = 0; i < 4; i++) > + { > + if (vec_u_expected[i] != vec_u_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_mulh unsigned result[%d] = %d != " > + "expected[%d] = %d\n", > + i, vec_u_result[i], i, vec_u_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Signed double word multiply high */ > + d_arg1 = (vector long long int){ 2305843009213693951, > + 4611686018427387903 }; > + d_arg2 = (vector long long int){ 12, 20 }; > + vec_d_expected = (vector long long int){ 1, 4 }; > + > + vec_d_result = vec_mulh (d_arg1, d_arg2); > + > + for (i = 0; i < 2; i++) > + { > + if (vec_d_expected[i] != vec_d_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_mulh signed result[%d] = %d != " > + "expected[%d] 
= %d\n", > + i, vec_d_result[i], i, vec_d_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Unsigned double word multiply high */ > + ud_arg1 = (vector unsigned long long int){ 2305843009213693951, > + 4611686018427387903 }; > + ud_arg2 = (vector unsigned long long int){ 32, 10 }; > + vec_ud_expected = (vector unsigned long long int){ 3, 2 }; > + > + vec_ud_result = vec_mulh (ud_arg1, ud_arg2); > + > + for (i = 0; i < 2; i++) > + { > + if (vec_ud_expected[i] != vec_ud_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_mulh unsigned result[%d] = %d != " > + "expected[%d] = %d\n", > + i, vec_ud_result[i], i, vec_ud_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Unsigned double word multiply low */ > + ud_arg1 = (vector unsigned long long int){ 2048, 4096 }; > + ud_arg2 = (vector unsigned long long int){ 2, 4 }; > + vec_ud_expected = (vector unsigned long long int){ 4096, 16384 }; > + > + vec_ud_result = vec_mul (ud_arg1, ud_arg2); > + > + for (i = 0; i < 2; i++) > + { > + if (vec_ud_expected[i] != vec_ud_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_mul unsigned result[%d] = %d != " > + "expected[%d] = %d\n", > + i, vec_ud_result[i], i, vec_ud_expected[i]); > +#else > + abort(); > +#endif > + } > + > + /* Signed double word multiply low */ > + d_arg1 = (vector signed long long int){ 2048, 4096 }; > + d_arg2 = (vector signed long long int){ 2, 4 }; > + vec_d_expected = (vector signed long long int){ 4096, 16384 }; > + > + vec_d_result = vec_mul (d_arg1, d_arg2); > + > + for (i = 0; i < 2; i++) > + { > + if (vec_d_expected[i] != vec_d_result[i]) > +#ifdef DEBUG > + printf("ERROR vec_mul signed result[%d] = %d != " > + "expected[%d] = %d\n", > + i, vec_d_result[i], i, vec_d_expected[i]); > +#else > + abort(); > +#endif > + } > + }
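For cross-checking the expected vectors in the word-sized cases above, a scalar model of a single lane may help. This is only a sketch and is not part of the patch: the ref_* helpers are hypothetical names, the divide-extended model simply follows the "(arg1 << 32)/arg2" comments in the test, and the multiply-high model assumes the result is the upper 32 bits of the 64-bit product. It builds on any host with plain GCC, no power10 hardware needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-lane model of vec_dive on signed words, following
   the test's comment: result = (arg1 << 32) / arg2.  The multiply
   avoids left-shifting a negative value.  Quotients that do not fit
   in 32 bits are not modelled here.  */
static int32_t
ref_dive_sw (int32_t a, int32_t b)
{
  return (int32_t) (((int64_t) a * 4294967296LL) / b);
}

/* Hypothetical one-lane model of vec_mulh on signed words: the upper
   32 bits of the 64-bit product.  Right-shifting a negative value is
   implementation-defined in ISO C; GCC sign-extends, which is what
   this sketch relies on.  */
static int32_t
ref_mulh_sw (int32_t a, int32_t b)
{
  int64_t prod = (int64_t) a * b;
  return (int32_t) (prod >> 32);
}

/* Hypothetical one-lane model of vec_mulh on unsigned words.  */
static uint32_t
ref_mulh_uw (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * b) >> 32);
}

/* Recompute a few lanes from the test inputs above.  */
static void
ref_self_check (void)
{
  /* Divide extended, signed word: first and last lane.  */
  assert (ref_dive_sw (2, 2048) == 4194304);
  assert (ref_dive_sw (8, 2048) == 16777216);

  /* Unsigned multiply high: 2147483648 * {4, 5, 6, 7}.  */
  assert (ref_mulh_uw (2147483648u, 4u) == 2u);
  assert (ref_mulh_uw (2147483648u, 7u) == 3u);

  /* Signed multiply high.  The literal 2147483648 does not fit in a
     signed int; with GCC it wraps to INT32_MIN.  */
  assert (ref_mulh_sw (INT32_MIN, 2) == -1);
  assert (ref_mulh_sw (INT32_MIN, 3) == -2);
  assert (ref_mulh_sw (INT32_MIN, 4) == -2);
  assert (ref_mulh_sw (INT32_MIN, 5) == -3);
}
```

Under those assumptions the signed multiply-high lanes come out as -1, -2, -2, -3, which matches the commented-out expected vector in the test rather than the active one with a leading 1. Calling ref_self_check () from a small main is enough to exercise it.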