Date: Fri, 4 Nov 2022 14:16:40 +0100
From: Jakub Jelinek
To: Aldy Hernandez
Cc: GCC patches, Andrew MacLeod
Subject: Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.
Reply-To: Jakub Jelinek
References: <20221013123649.474497-1-aldyh@redhat.com>

On Mon, Oct 17, 2022 at 08:21:01AM +0200, Aldy Hernandez wrote:
> --- a/gcc/range-op-float.cc
> +++ b/gcc/range-op-float.cc
> @@ -200,6 +200,116 @@ frelop_early_resolve (irange &r, tree type,
>            && relop_early_resolve (r, type, op1, op2, rel, my_rel));
>  }
>  
> +// If R contains a NAN of unknown sign, update the NAN's signbit
> +// depending on two operands.
> +
> +inline void
> +update_nan_sign (frange &r, const frange &op1, const frange &op2)
> +{
> +  if (!r.maybe_isnan ())
> +    return;
> +
> +  bool op1_nan = op1.maybe_isnan ();
> +  bool op2_nan = op2.maybe_isnan ();
> +  bool sign1, sign2;
> +
> +  gcc_checking_assert (!r.nan_signbit_p (sign1));
> +  if (op1_nan && op2_nan)
> +    {
> +      if (op1.nan_signbit_p (sign1) && op2.nan_signbit_p (sign2))
> +        r.update_nan (sign1 | sign2);
> +    }
> +  else if (op1_nan)
> +    {
> +      if (op1.nan_signbit_p (sign1))
> +        r.update_nan (sign1);
> +    }
> +  else if (op2_nan)
> +    {
> +      if (op2.nan_signbit_p (sign2))
> +        r.update_nan (sign2);
> +    }
> +}

IEEE 754-2008 says:
"When either an input or result is NaN, this standard does not interpret
the sign of a NaN. Note, however, that operations on bit strings — copy,
negate, abs, copySign — specify the sign bit of a NaN result, sometimes
based upon the sign bit of a NaN operand. The logical predicate totalOrder
is also affected by the sign bit of a NaN operand. For all other operations,
this standard does not specify the sign bit of a NaN result, even when there
is only one input NaN, or when the NaN is produced from an invalid
operation."
so, while one can e.g.
see on x86_64 that in simple -O0
int
main ()
{
  volatile float f1 = __builtin_nansf ("");
  volatile float f2 = __builtin_copysignf (__builtin_nansf (""), -1.0f);
  volatile float f3 = __builtin_nanf ("");
  volatile float f4 = __builtin_copysignf (__builtin_nanf (""), -1.0f);
  volatile float fzero = 0.0f;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f1, *(unsigned *)&f2);
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f3, *(unsigned *)&f4);
  f1 = -f1;
  f2 = -f2;
  f3 = -f3;
  f4 = -f4;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f1, *(unsigned *)&f2);
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f3, *(unsigned *)&f4);
  volatile float f5 = f1 + fzero;
  volatile float f6 = fzero + f1;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f2 + fzero;
  f6 = fzero + f2;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f3 + fzero;
  f6 = fzero + f3;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f4 + fzero;
  f6 = fzero + f4;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f1 + f2;
  f6 = f2 + f1;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f2 + f3;
  f6 = f3 + f2;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f3 + f4;
  f6 = f4 + f3;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f4 + f1;
  f6 = f1 + f4;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  return 0;
}
result of:
7fa00000 ffa00000
7fc00000 ffc00000
ffa00000 7fa00000
ffc00000 7fc00000
ffe00000 ffe00000
7fe00000 7fe00000
ffc00000 ffc00000
7fc00000 7fc00000
7fe00000 ffe00000
ffc00000 7fe00000
7fc00000 ffc00000
ffe00000 7fc00000
which basically shows that copysign copies the sign bit, negation toggles
it, a binary operation with a single NaN (quiet or signaling) yields that
NaN with its sign, and for a binary operation on 2 NaNs (again quiet or
signaling) one gets the sign from the second?
operand, I think the above IEEE text doesn't guarantee any of it except for
a simple assignment (but with no mode conversion; that one copies the bit),
NEGATE_EXPR (toggles it), ABS_EXPR (clears it) and
__builtin_copysign*/IFN_COPYSIGN (copies it from the second operand).
Everything else, including invalid operation cases, should set the possible
sign bit values of NANs to 0/1 rather than just one of them.
Perhaps a COND_EXPR 2nd/3rd operand is a move too.
As you are adding a binary operation, that should be one of those cases
where we drop the NAN sign to VARYING.

> +
> +// If either operand is a NAN, set R to the combination of both NANs
> +// signwise and return TRUE.
> +
> +inline bool
> +propagate_nans (frange &r, const frange &op1, const frange &op2)
> +{
> +  if (op1.known_isnan () || op2.known_isnan ())
> +    {
> +      r.set_nan (op1.type ());
> +      update_nan_sign (r, op1, op2);
> +      return true;
> +    }
> +  return false;
> +}
> +
> +// Set VALUE to its next real value, or INF if the operation overflows.
> +
> +inline void
> +frange_nextafter (enum machine_mode mode,
> +                  REAL_VALUE_TYPE &value,
> +                  const REAL_VALUE_TYPE &inf)
> +{
> +  const real_format *fmt = REAL_MODE_FORMAT (mode);
> +  REAL_VALUE_TYPE tmp;
> +  bool overflow = real_nextafter (&tmp, fmt, &value, &inf);
> +  if (overflow)
> +    value = inf;
> +  else
> +    value = tmp;
> +}
> +
> +// Like real_arithmetic, but round the result to INF if the operation
> +// produced inexact results.
> +//
> +// ?? There is still one problematic case, i387.  With
> +// -fexcess-precision=standard we perform most SF/DFmode arithmetic in
> +// XFmode (long_double_type_node), so that case is OK.  But without
> +// -mfpmath=sse, all the SF/DFmode computations are in XFmode
> +// precision (64-bit mantissa) and only occassionally rounded to
> +// SF/DFmode (when storing into memory from the 387 stack).  Maybe
> +// this is ok as well though it is just occassionally more precise. ??
> +
> +static void
> +frange_arithmetic (enum tree_code code, tree type,
> +                   REAL_VALUE_TYPE &result,
> +                   const REAL_VALUE_TYPE &op1,
> +                   const REAL_VALUE_TYPE &op2,
> +                   const REAL_VALUE_TYPE &inf)
> +{
> +  REAL_VALUE_TYPE value;
> +  enum machine_mode mode = TYPE_MODE (type);
> +  bool mode_composite = MODE_COMPOSITE_P (mode);
> +
> +  bool inexact = real_arithmetic (&value, code, &op1, &op2);
> +  real_convert (&result, mode, &value);
> +
> +  // Be extra careful if there may be discrepancies between the
> +  // compile and runtime results.
> +  if ((mode_composite || (real_isneg (&inf) ? real_less (&result, &value)
> +                          : !real_less (&value, &result)))
> +      && (inexact || !real_identical (&result, &value)))
> +    {
> +      if (mode_composite)
> +        {
> +          if (real_isdenormal (&result)
> +              || real_iszero (&result))
> +            {
> +              // IBM extended denormals only have DFmode precision.
> +              REAL_VALUE_TYPE tmp;
> +              real_convert (&tmp, DFmode, &value);
> +              frange_nextafter (DFmode, tmp, inf);
> +              real_convert (&result, mode, &tmp);
> +              return;
> +            }

As discussed before, this might not be correct for the larger double double
denormals (is correct for real_iszero).
I'll try to play with it in self-tests and compare with what one gets
from libm nextafterl:
int
main ()
{
  long double a = __builtin_nextafterl (0.0L, 1.0L);
  __builtin_printf ("%La\n", a);
  for (int i = 0; i < 108; i++, a *= 2.0L)
    __builtin_printf ("%d %La %La\n", i, a, __builtin_nextafterl (a, 1.0L));
}
0x0.0000000000001p-1022
0 0x0.0000000000001p-1022 0x0.0000000000002p-1022
1 0x0.0000000000002p-1022 0x0.0000000000003p-1022
2 0x0.0000000000004p-1022 0x0.0000000000005p-1022
3 0x0.0000000000008p-1022 0x0.0000000000009p-1022
4 0x0.000000000001p-1022 0x0.0000000000011p-1022
5 0x0.000000000002p-1022 0x0.0000000000021p-1022
6 0x0.000000000004p-1022 0x0.0000000000041p-1022
7 0x0.000000000008p-1022 0x0.0000000000081p-1022
8 0x0.00000000001p-1022 0x0.0000000000101p-1022
9 0x0.00000000002p-1022 0x0.0000000000201p-1022
10 0x0.00000000004p-1022 0x0.0000000000401p-1022
11 0x0.00000000008p-1022 0x0.0000000000801p-1022
12 0x0.0000000001p-1022 0x0.0000000001001p-1022
13 0x0.0000000002p-1022 0x0.0000000002001p-1022
14 0x0.0000000004p-1022 0x0.0000000004001p-1022
15 0x0.0000000008p-1022 0x0.0000000008001p-1022
16 0x0.000000001p-1022 0x0.0000000010001p-1022
17 0x0.000000002p-1022 0x0.0000000020001p-1022
18 0x0.000000004p-1022 0x0.0000000040001p-1022
19 0x0.000000008p-1022 0x0.0000000080001p-1022
20 0x0.00000001p-1022 0x0.0000000100001p-1022
21 0x0.00000002p-1022 0x0.0000000200001p-1022
22 0x0.00000004p-1022 0x0.0000000400001p-1022
23 0x0.00000008p-1022 0x0.0000000800001p-1022
24 0x0.0000001p-1022 0x0.0000001000001p-1022
25 0x0.0000002p-1022 0x0.0000002000001p-1022
26 0x0.0000004p-1022 0x0.0000004000001p-1022
27 0x0.0000008p-1022 0x0.0000008000001p-1022
28 0x0.000001p-1022 0x0.0000010000001p-1022
29 0x0.000002p-1022 0x0.0000020000001p-1022
30 0x0.000004p-1022 0x0.0000040000001p-1022
31 0x0.000008p-1022 0x0.0000080000001p-1022
32 0x0.00001p-1022 0x0.0000100000001p-1022
33 0x0.00002p-1022 0x0.0000200000001p-1022
34 0x0.00004p-1022 0x0.0000400000001p-1022
35 0x0.00008p-1022 0x0.0000800000001p-1022
36 0x0.0001p-1022 0x0.0001000000001p-1022
37 0x0.0002p-1022 0x0.0002000000001p-1022
38 0x0.0004p-1022 0x0.0004000000001p-1022
39 0x0.0008p-1022 0x0.0008000000001p-1022
40 0x0.001p-1022 0x0.0010000000001p-1022
41 0x0.002p-1022 0x0.0020000000001p-1022
42 0x0.004p-1022 0x0.0040000000001p-1022
43 0x0.008p-1022 0x0.0080000000001p-1022
44 0x0.01p-1022 0x0.0100000000001p-1022
45 0x0.02p-1022 0x0.0200000000001p-1022
46 0x0.04p-1022 0x0.0400000000001p-1022
47 0x0.08p-1022 0x0.0800000000001p-1022
48 0x0.1p-1022 0x0.1000000000001p-1022
49 0x0.2p-1022 0x0.2000000000001p-1022
50 0x0.4p-1022 0x0.4000000000001p-1022
51 0x0.8p-1022 0x0.8000000000001p-1022
52 0x1p-1022 0x1.0000000000001p-1022
53 0x1p-1021 0x1.00000000000008p-1021
54 0x1p-1020 0x1.00000000000004p-1020
55 0x1p-1019 0x1.00000000000002p-1019
56 0x1p-1018 0x1.00000000000001p-1018
57 0x1p-1017 0x1.000000000000008p-1017
58 0x1p-1016 0x1.000000000000004p-1016
59 0x1p-1015 0x1.000000000000002p-1015
60 0x1p-1014 0x1.000000000000001p-1014
61 0x1p-1013 0x1.0000000000000008p-1013
62 0x1p-1012 0x1.0000000000000004p-1012
63 0x1p-1011 0x1.0000000000000002p-1011
64 0x1p-1010 0x1.0000000000000001p-1010
65 0x1p-1009 0x1.00000000000000008p-1009
66 0x1p-1008 0x1.00000000000000004p-1008
67 0x1p-1007 0x1.00000000000000002p-1007
68 0x1p-1006 0x1.00000000000000001p-1006
69 0x1p-1005 0x1.000000000000000008p-1005
70 0x1p-1004 0x1.000000000000000004p-1004
71 0x1p-1003 0x1.000000000000000002p-1003
72 0x1p-1002 0x1.000000000000000001p-1002
73 0x1p-1001 0x1.0000000000000000008p-1001
74 0x1p-1000 0x1.0000000000000000004p-1000
75 0x1p-999 0x1.0000000000000000002p-999
76 0x1p-998 0x1.0000000000000000001p-998
77 0x1p-997 0x1.00000000000000000008p-997
78 0x1p-996 0x1.00000000000000000004p-996
79 0x1p-995 0x1.00000000000000000002p-995
80 0x1p-994 0x1.00000000000000000001p-994
81 0x1p-993 0x1.000000000000000000008p-993
82 0x1p-992 0x1.000000000000000000004p-992
83 0x1p-991 0x1.000000000000000000002p-991
84 0x1p-990 0x1.000000000000000000001p-990
85 0x1p-989 0x1.0000000000000000000008p-989
86 0x1p-988 0x1.0000000000000000000004p-988
87 0x1p-987 0x1.0000000000000000000002p-987
88 0x1p-986 0x1.0000000000000000000001p-986
89 0x1p-985 0x1.00000000000000000000008p-985
90 0x1p-984 0x1.00000000000000000000004p-984
91 0x1p-983 0x1.00000000000000000000002p-983
92 0x1p-982 0x1.00000000000000000000001p-982
93 0x1p-981 0x1.000000000000000000000008p-981
94 0x1p-980 0x1.000000000000000000000004p-980
95 0x1p-979 0x1.000000000000000000000002p-979
96 0x1p-978 0x1.000000000000000000000001p-978
97 0x1p-977 0x1.0000000000000000000000008p-977
98 0x1p-976 0x1.0000000000000000000000004p-976
99 0x1p-975 0x1.0000000000000000000000002p-975
100 0x1p-974 0x1.0000000000000000000000001p-974
101 0x1p-973 0x1.00000000000000000000000008p-973
102 0x1p-972 0x1.00000000000000000000000004p-972
103 0x1p-971 0x1.00000000000000000000000002p-971
104 0x1p-970 0x1.00000000000000000000000001p-970
105 0x1p-969 0x1.000000000000000000000000008p-969
106 0x1p-968 0x1.000000000000000000000000008p-968
107 0x1p-967 0x1.000000000000000000000000008p-967

Will need to find out which of these numbers GCC real_isdenormal actually
treats as denormals and if there are any which aren't.
Powerpc64le -mlong-double-128 -mabi=ibmlongdouble gcc certainly prints:
#define __DBL_MIN__ ((double)2.22507385850720138309023271733240406e-308L)
#define __DBL_DENORM_MIN__ ((double)4.94065645841246544176568792868221372e-324L)
#define __LDBL_MIN__ 2.00416836000897277799610805135016205e-292L
#define __LDBL_DENORM_MIN__ 4.94065645841246544176568792868221372e-324L
and so the LDBL denorm min is correctly the same as the double denorm min,
while the minimum normal is actually quite a bit larger for long double
than for double (also ok, but unusual, as e.g. __DBL_MIN__ is much smaller
than __FLT_MIN__ and __FLT128_MIN__ much smaller than __DBL_MIN__).
Note, it seems glibc nextafterl for IBM double double actually adds
__DBL_DENORM_MIN__ to the value for the small ones, so perhaps we should
do that too rather than convert to DFmode and back.

	Jakub