From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 940A43856B69; Tue, 6 Dec 2022 11:27:14 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 940A43856B69 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2B69lAS1022935; Tue, 6 Dec 2022 11:27:14 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : date : mime-version : subject : to : references : cc : from : in-reply-to : content-type : content-transfer-encoding; s=pp1; bh=L55MWzQjQJKxjTuCTjw4VTRQPC1svuUjQUDBbec35Bs=; b=QsAspya2Bj5bu1XEyYThxINXlpFSXdu+SuD6rVvu7JWDdzWma6WNu+rR8w04yDJ35kmY fTdoW9GUyl7HMcexxCejyCk+kzHOT/mGHA/Me2hBqBENcVnVHvfIXLnyaiMEjqvrhIZs J0KCZkIbD/LfB8ngVPZzZUNvxxhmDhXzf9GOml95nJ8zytyBs7Kj2uwYDigKQO/lLIlB oZPZn4syMFidGEXkLx+BPLdr2GKoLZ8yNtIDSSnH/kS3GaWDLYPdzWIzGRjLDpwlF8k4 Iuq/n5AGCJ5wud2I60VfdBQL1xHkqPxXZBoyUFo6V+lanCd8sH8nsuOQtpOzntxdflU3 CQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3m8g4kh3qn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 06 Dec 2022 11:27:13 +0000 Received: from m0098417.ppops.net (m0098417.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2B6B91uh024825; Tue, 6 Dec 2022 11:27:13 GMT Received: from ppma06fra.de.ibm.com (48.49.7a9f.ip4.static.sl-reverse.com [159.122.73.72]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3m8g4kh3pt-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 06 Dec 2022 11:27:13 +0000 Received: from pps.filterd (ppma06fra.de.ibm.com [127.0.0.1]) by ppma06fra.de.ibm.com (8.17.1.19/8.16.1.2) with ESMTP id 2B5K5GZk016460; Tue, 6 Dec 2022 11:27:11 GMT Received: from smtprelay01.fra02v.mail.ibm.com ([9.218.2.227]) by ppma06fra.de.ibm.com (PPS) with ESMTPS id 3m9m6y0y3c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 06 Dec 2022 11:27:11 +0000 Received: from d06av24.portsmouth.uk.ibm.com ([9.149.105.60]) by smtprelay01.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2B6BR7Xj40370550 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 6 Dec 2022 11:27:07 GMT Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3866742041; Tue, 6 Dec 2022 11:27:07 +0000 (GMT) Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id DBD2B4203F; Tue, 6 Dec 2022 11:27:03 +0000 (GMT) Received: from [9.197.224.206] (unknown [9.197.224.206]) by d06av24.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 6 Dec 2022 11:27:03 +0000 (GMT) Message-ID: Date: Tue, 6 Dec 2022 19:27:01 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.6.1 Subject: Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299 Content-Language: en-US To: Michael Meissner References: Cc: gcc-patches@gcc.gnu.org, Segher Boessenkool , Peter Bergner , David Edelsohn , Will Schmidt , William Seurer From: "Kewen.Lin" In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: 9zikWk-GYIfiIsUNS93p3MSlZmXvMsIU X-Proofpoint-ORIG-GUID: g-EtbyByiYPEuqllFvn3RDwPwy9VT04T X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.923,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-12-06_06,2022-12-06_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 mlxlogscore=999 lowpriorityscore=0 adultscore=0 phishscore=0 priorityscore=1501 impostorscore=0 spamscore=0 malwarescore=0 clxscore=1015 suspectscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2210170000 definitions=main-2212060092 X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,KAM_NUMSUBJECT,KAM_SHORT,NICE_REPLY_A,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Mike, Thanks for fixing this, some comments are inlined below. on 2022/11/2 10:42, Michael Meissner wrote: > This patch fixes the issue that GCC cannot build when the default long double > is IEEE 128-bit. It fails in building libgcc, specifically when it is trying > to buld the __mulkc3 function in libgcc. It is failing in gimple-range-fold.cc > during the evrp pass. Ultimately it is failing because the code declared the > type to use TFmode but it used F128 functions (i.e. KFmode). > > typedef float TFtype __attribute__((mode (TF))); > typedef __complex float TCtype __attribute__((mode (TC))); > > TCtype > __mulkc3_sw (TFtype a, TFtype b, TFtype c, TFtype d) > { > TFtype ac, bd, ad, bc, x, y; > TCtype res; > > ac = a * c; > bd = b * d; > ad = a * d; > bc = b * c; > > x = ac - bd; > y = ad + bc; > > if (__builtin_isnan (x) && __builtin_isnan (y)) > { > _Bool recalc = 0; > if (__builtin_isinf (a) || __builtin_isinf (b)) > { > > a = __builtin_copysignf128 (__builtin_isinf (a) ? 1 : 0, a); > b = __builtin_copysignf128 (__builtin_isinf (b) ? 1 : 0, b); > if (__builtin_isnan (c)) > c = __builtin_copysignf128 (0, c); > if (__builtin_isnan (d)) > d = __builtin_copysignf128 (0, d); > recalc = 1; > } > if (__builtin_isinf (c) || __builtin_isinf (d)) > { > > c = __builtin_copysignf128 (__builtin_isinf (c) ? 1 : 0, c); > d = __builtin_copysignf128 (__builtin_isinf (d) ? 1 : 0, d); > if (__builtin_isnan (a)) > a = __builtin_copysignf128 (0, a); > if (__builtin_isnan (b)) > b = __builtin_copysignf128 (0, b); > recalc = 1; > } > if (!recalc > && (__builtin_isinf (ac) || __builtin_isinf (bd) > || __builtin_isinf (ad) || __builtin_isinf (bc))) > { > > if (__builtin_isnan (a)) > a = __builtin_copysignf128 (0, a); > if (__builtin_isnan (b)) > b = __builtin_copysignf128 (0, b); > if (__builtin_isnan (c)) > c = __builtin_copysignf128 (0, c); > if (__builtin_isnan (d)) > d = __builtin_copysignf128 (0, d); > recalc = 1; > } > if (recalc) > { > x = __builtin_inff128 () * (a * c - b * d); > y = __builtin_inff128 () * (a * d + b * c); > } > } > > __real__ res = x; > __imag__ res = y; > return res; > } > One further reduced test case can be: typedef float TFtype __attribute__((mode (TF))); TFtype test (TFtype t) { return __builtin_copysignf128 (1.0q, t); } Since this reduced test case is quite small, maybe it's good to make it as one test case associated with this patch. > Currently GCC uses the long double type node for __float128 if long double is > IEEE 128-bit. It did not use the node for _Float128. > > Originally this was noticed if you call the nansq function to make a signaling > NaN (nansq is mapped to nansf128). Because the type node for _Float128 is > different from __float128, the machine independent code converts signaling NaNs > to quiet NaNs if the types are not compatible. The following tests used to > fail when run on a system where long double is IEEE 128-bit: > > gcc.dg/torture/float128-nan.c > gcc.target/powerpc/nan128-1.c > > This patch makes both __float128 and _Float128 use the same type node. > > One side effect of not using the long double type node for __float128 is that we > must only use KFmode for _Float128/__float128. The libstdc++ library won't > build if we use TFmode for _Float128 and __float128 when long double is IEEE > 128-bit. > Sorry that I didn't get the point of the latter sentence, this proposed patch uses KFmode for _Float128 and __float128, do you mean that would be fine for libstdc++ library building since we don't use TFmode for them? > Another minor side effect is that the f128 round to odd fused multiply-add > function will not merge negatition with the FMA operation when the type is long > double. If the type is __float128 or _Float128, then it will continue to do the > optimization. The round to odd functions are defined in terms of __float128 > arguments. For example: > > long double > do_fms (long double a, long double b, long double c) > { > return __builtin_fmaf128_round_to_odd (a, b, -c); > } > > will generate (assuming -mabi=ieeelongdouble): > > xsnegqp 4,4 > xsmaddqpo 4,2,3 > xxlor 34,36,36 > > while: > > __float128 > do_fms (__float128 a, __float128 b, __float128 c) > { > return __builtin_fmaf128_round_to_odd (a, b, -c); > } > > will generate: > > xsmsubqpo 4,2,3 > xxlor 34,36,36 > It sounds like a tiny "regression", maybe we should open a PR to track it? > I tested all 3 patchs for PR target/107299 on: > > 1) LE Power10 using --with-cpu=power10 --with-long-double-format=ieee > 2) LE Power10 using --with-cpu=power10 --with-long-double-format=ibm > 3) LE Power9 using --with-cpu=power9 --with-long-double-format=ibm > 4) BE Power8 using --with-cpu=power8 --with-long-double-format=ibm > > Once all 3 patches have been applied, we can once again build GCC when long > double is IEEE 128-bit. There were no other regressions with these patches. > Can I check these patches into the trunk? > > 2022-11-01 Michael Meissner > > gcc/ > > PR target/107299 > * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Always use the > _Float128 type for __float128. > (rs6000_expand_builtin): Only change a KFmode built-in to TFmode, if the > built-in passes or returns TFmode. If the predicate failed because the > modes were different, use convert_move to load up the value instead of > copy_to_mode_reg. > * config/rs6000/rs6000.cc (rs6000_translate_mode_attribute): Don't > translate IEEE 128-bit floating point modes to explicit IEEE 128-bit > modes (KFmode or KCmode), even if long double is IEEE 128-bit. You meant to say "Translate .." not "Don't .."? > (rs6000_libgcc_floating_mode_supported_p): Support KFmode all of the > time if we support IEEE 128-bit floating point. > (rs6000_floatn_mode): _Float128 and _Float128x always uses KFmode. > > gcc/testsuite/ > > PR target/107299 > * gcc.target/powerpc/float128-hw12.c: New test. > * gcc.target/powerpc/float128-hw13.c: Likewise. > * gcc.target/powerpc/float128-hw4.c: Update insns. > --- > gcc/config/rs6000/rs6000-builtin.cc | 237 ++++++++++-------- > gcc/config/rs6000/rs6000.cc | 31 ++- > .../gcc.target/powerpc/float128-hw12.c | 137 ++++++++++ > .../gcc.target/powerpc/float128-hw13.c | 137 ++++++++++ > .../gcc.target/powerpc/float128-hw4.c | 10 +- > 5 files changed, 431 insertions(+), 121 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-hw12.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-hw13.c > > diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc > index 90ab39dc258..e5298f45363 100644 > --- a/gcc/config/rs6000/rs6000-builtin.cc > +++ b/gcc/config/rs6000/rs6000-builtin.cc > @@ -730,25 +730,28 @@ rs6000_init_builtins (void) > > if (TARGET_FLOAT128_TYPE) > { > - if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128) > - ieee128_float_type_node = long_double_type_node; > - else > + /* In the past we used long_double_type_node when long double was IEEE > + 128-bit. However, this means that the _Float128 type > + (i.e. float128_type_node) is a different type from __float128 > + (i.e. ieee128_float_type_nonde). This leads to some corner cases, ~~~~ node > + such as processing signaling NaNs with the nansf128 built-in function > + (which returns a _Float128 value) and assign it to a long double or > + __float128 value. The two explicit IEEE 128-bit types should always > + use the same internal type. > + > + For C we only need to register the __ieee128 name for it. For C++, we > + create a distinct type which will mangle differently (u9__ieee128) > + vs. _Float128 (DF128_) and behave backwards compatibly. */ > + if (float128t_type_node == NULL_TREE) > { > - /* For C we only need to register the __ieee128 name for > - it. For C++, we create a distinct type which will mangle > - differently (u9__ieee128) vs. _Float128 (DF128_) and behave > - backwards compatibly. */ > - if (float128t_type_node == NULL_TREE) > - { > - float128t_type_node = make_node (REAL_TYPE); > - TYPE_PRECISION (float128t_type_node) > - = TYPE_PRECISION (float128_type_node); > - layout_type (float128t_type_node); > - SET_TYPE_MODE (float128t_type_node, > - TYPE_MODE (float128_type_node)); > - } > - ieee128_float_type_node = float128t_type_node; > + float128t_type_node = make_node (REAL_TYPE); > + TYPE_PRECISION (float128t_type_node) > + = TYPE_PRECISION (float128_type_node); > + layout_type (float128t_type_node); > + SET_TYPE_MODE (float128t_type_node, > + TYPE_MODE (float128_type_node)); > } > + ieee128_float_type_node = float128t_type_node; > t = build_qualified_type (ieee128_float_type_node, TYPE_QUAL_CONST); > lang_hooks.types.register_builtin_type (ieee128_float_type_node, > "__ieee128"); > @@ -3265,13 +3268,13 @@ htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode, > > /* Expand an expression EXP that calls a built-in function, > with result going to TARGET if that's convenient > - (and in mode MODE if that's convenient). > + (and in mode RETURN_MODE if that's convenient). > SUBTARGET may be used as the target for computing one of EXP's operands. > IGNORE is nonzero if the value is to be ignored. > Use the new builtin infrastructure. */ > rtx > rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */, > - machine_mode /* mode */, int ignore) > + machine_mode return_mode, int ignore) > { > tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0); > enum rs6000_gen_builtins fcode > @@ -3287,78 +3290,99 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */, > size_t uns_fcode = (size_t)fcode; > enum insn_code icode = rs6000_builtin_info[uns_fcode].icode; > > - /* TODO: The following commentary and code is inherited from the original > - builtin processing code. The commentary is a bit confusing, with the > - intent being that KFmode is always IEEE-128, IFmode is always IBM > - double-double, and TFmode is the current long double. The code is > - confusing in that it converts from KFmode to TFmode pattern names, > - when the other direction is more intuitive. Try to address this. */ > - > - /* We have two different modes (KFmode, TFmode) that are the IEEE > - 128-bit floating point type, depending on whether long double is the > - IBM extended double (KFmode) or long double is IEEE 128-bit (TFmode). > - It is simpler if we only define one variant of the built-in function, > - and switch the code when defining it, rather than defining two built- > - ins and using the overload table in rs6000-c.cc to switch between the > - two. If we don't have the proper assembler, don't do this switch > - because CODE_FOR_*kf* and CODE_FOR_*tf* will be CODE_FOR_nothing. */ > - if (FLOAT128_IEEE_P (TFmode)) > - switch (icode) > - { > - case CODE_FOR_sqrtkf2_odd: > - icode = CODE_FOR_sqrttf2_odd; > - break; > - case CODE_FOR_trunckfdf2_odd: > - icode = CODE_FOR_trunctfdf2_odd; > - break; > - case CODE_FOR_addkf3_odd: > - icode = CODE_FOR_addtf3_odd; > - break; > - case CODE_FOR_subkf3_odd: > - icode = CODE_FOR_subtf3_odd; > - break; > - case CODE_FOR_mulkf3_odd: > - icode = CODE_FOR_multf3_odd; > - break; > - case CODE_FOR_divkf3_odd: > - icode = CODE_FOR_divtf3_odd; > - break; > - case CODE_FOR_fmakf4_odd: > - icode = CODE_FOR_fmatf4_odd; > - break; > - case CODE_FOR_xsxexpqp_kf: > - icode = CODE_FOR_xsxexpqp_tf; > - break; > - case CODE_FOR_xsxsigqp_kf: > - icode = CODE_FOR_xsxsigqp_tf; > - break; > - case CODE_FOR_xststdcnegqp_kf: > - icode = CODE_FOR_xststdcnegqp_tf; > - break; > - case CODE_FOR_xsiexpqp_kf: > - icode = CODE_FOR_xsiexpqp_tf; > - break; > - case CODE_FOR_xsiexpqpf_kf: > - icode = CODE_FOR_xsiexpqpf_tf; > - break; > - case CODE_FOR_xststdcqp_kf: > - icode = CODE_FOR_xststdcqp_tf; > - break; > - case CODE_FOR_xscmpexpqp_eq_kf: > - icode = CODE_FOR_xscmpexpqp_eq_tf; > - break; > - case CODE_FOR_xscmpexpqp_lt_kf: > - icode = CODE_FOR_xscmpexpqp_lt_tf; > - break; > - case CODE_FOR_xscmpexpqp_gt_kf: > - icode = CODE_FOR_xscmpexpqp_gt_tf; > - break; > - case CODE_FOR_xscmpexpqp_unordered_kf: > - icode = CODE_FOR_xscmpexpqp_unordered_tf; > - break; > - default: > - break; > - } > + /* For 128-bit long double, we may need both the KFmode built-in functions > + and IFmode built-in functions to the equivalent TFmode built-in function, > + if either a TFmode result is expected or any of the arguments use > + TFmode. */ > + if (TARGET_LONG_DOUBLE_128) > + { > + bool uses_tf_mode = return_mode == TFmode; > + if (!uses_tf_mode) > + { > + call_expr_arg_iterator iter; > + tree arg; > + FOR_EACH_CALL_EXPR_ARG (arg, iter, exp) > + { > + if (arg != error_mark_node > + && TYPE_MODE (TREE_TYPE (arg)) == TFmode) > + { > + uses_tf_mode = true; > + break; > + } > + } > + } > + > + /* Convert KFmode built-in functions to TFmode when long double is IEEE > + 128-bit. */ > + if (uses_tf_mode && FLOAT128_IEEE_P (TFmode)) > + switch (icode) > + { > + case CODE_FOR_sqrtkf2_odd: > + icode = CODE_FOR_sqrttf2_odd; > + break; > + case CODE_FOR_trunckfdf2_odd: > + icode = CODE_FOR_trunctfdf2_odd; > + break; > + case CODE_FOR_addkf3_odd: > + icode = CODE_FOR_addtf3_odd; > + break; > + case CODE_FOR_subkf3_odd: > + icode = CODE_FOR_subtf3_odd; > + break; > + case CODE_FOR_mulkf3_odd: > + icode = CODE_FOR_multf3_odd; > + break; > + case CODE_FOR_divkf3_odd: > + icode = CODE_FOR_divtf3_odd; > + break; > + case CODE_FOR_fmakf4_odd: > + icode = CODE_FOR_fmatf4_odd; > + break; > + case CODE_FOR_xsxexpqp_kf: > + icode = CODE_FOR_xsxexpqp_tf; > + break; > + case CODE_FOR_xsxsigqp_kf: > + icode = CODE_FOR_xsxsigqp_tf; > + break; > + case CODE_FOR_xststdcnegqp_kf: > + icode = CODE_FOR_xststdcnegqp_tf; > + break; > + case CODE_FOR_xsiexpqp_kf: > + icode = CODE_FOR_xsiexpqp_tf; > + break; > + case CODE_FOR_xsiexpqpf_kf: > + icode = CODE_FOR_xsiexpqpf_tf; > + break; > + case CODE_FOR_xststdcqp_kf: > + icode = CODE_FOR_xststdcqp_tf; > + break; > + case CODE_FOR_xscmpexpqp_eq_kf: > + icode = CODE_FOR_xscmpexpqp_eq_tf; > + break; > + case CODE_FOR_xscmpexpqp_lt_kf: > + icode = CODE_FOR_xscmpexpqp_lt_tf; > + break; > + case CODE_FOR_xscmpexpqp_gt_kf: > + icode = CODE_FOR_xscmpexpqp_gt_tf; > + break; > + case CODE_FOR_xscmpexpqp_unordered_kf: > + icode = CODE_FOR_xscmpexpqp_unordered_tf; > + break; > + default: > + break; > + } > + > + /* Convert IFmode built-in functions to TFmode when long double is IBM > + 128-bit. */ > + else if (uses_tf_mode && FLOAT128_IBM_P (TFmode)) > + { > + if (icode == CODE_FOR_packif) > + icode = CODE_FOR_packtf; > + > + else if (icode == CODE_FOR_unpackif) > + icode = CODE_FOR_unpacktf; > + } > + } > > /* In case of "#pragma target" changes, we initialize all builtins > but check for actual availability now, during expand time. For > @@ -3481,18 +3505,6 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */, > > if (bif_is_ibm128 (*bifaddr) && TARGET_LONG_DOUBLE_128 && !TARGET_IEEEQUAD) > { > - if (fcode == RS6000_BIF_PACK_IF) > - { > - icode = CODE_FOR_packtf; > - fcode = RS6000_BIF_PACK_TF; > - uns_fcode = (size_t) fcode; > - } > - else if (fcode == RS6000_BIF_UNPACK_IF) > - { > - icode = CODE_FOR_unpacktf; > - fcode = RS6000_BIF_UNPACK_TF; > - uns_fcode = (size_t) fcode; > - } > } This remaining "if" hunk is useless. The others looks good to me. BR, Kewen