From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jakub@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124])
	by sourceware.org (Postfix) with ESMTPS id DF8023858413
	for <gcc-patches@gcc.gnu.org>; Fri,  4 Nov 2022 13:16:47 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DF8023858413
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1667567807;
	h=from:from:reply-to:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=uqOwP2aZtsCinM1iRd+GoFu5FWKxaohh3I10TGKFyxw=;
	b=UQd5kshdFCqJhOvTTia1FL1s2i4r94mrKAa8VTmlssQT6zADDtakMVBGWtTdtardPX/1yX
	nHUlpO9TiK5yv2yCtCpGNY9Fzc2xyKXy3ZgpicV+KvTOWhq5gJrBcA1BR4QVc2xFyZRRlD
	u4Q1jNZWiHzO5kPuqsNnyULXUc/xX/M=
Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com
 [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-52-r85MyleuO8qck1jqJ4OTGA-1; Fri, 04 Nov 2022 09:16:44 -0400
X-MC-Unique: r85MyleuO8qck1jqJ4OTGA-1
Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 07E611C0CE63
	for <gcc-patches@gcc.gnu.org>; Fri,  4 Nov 2022 13:16:44 +0000 (UTC)
Received: from tucnak.zalov.cz (unknown [10.39.193.252])
	by smtp.corp.redhat.com (Postfix) with ESMTPS id 963C14EA4C;
	Fri,  4 Nov 2022 13:16:43 +0000 (UTC)
Received: from tucnak.zalov.cz (localhost [127.0.0.1])
	by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 2A4DGebS2676097
	(version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT);
	Fri, 4 Nov 2022 14:16:41 +0100
Received: (from jakub@localhost)
	by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 2A4DGef92676096;
	Fri, 4 Nov 2022 14:16:40 +0100
Date: Fri, 4 Nov 2022 14:16:40 +0100
From: Jakub Jelinek <jakub@redhat.com>
To: Aldy Hernandez <aldyh@redhat.com>
Cc: GCC patches <gcc-patches@gcc.gnu.org>,
        Andrew MacLeod <amacleod@redhat.com>
Subject: Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.
Message-ID: <Y2UQuK4m5XNql3cU@tucnak>
Reply-To: Jakub Jelinek <jakub@redhat.com>
References: <20221013123649.474497-1-aldyh@redhat.com>
 <Y0hRjKccB1XjeTC0@tucnak>
 <CAGm3qMXCqT2r97o-tD8LfNa==z0zT-NSM1539rqS48ShknmhzQ@mail.gmail.com>
MIME-Version: 1.0
In-Reply-To: <CAGm3qMXCqT2r97o-tD8LfNa==z0zT-NSM1539rqS48ShknmhzQ@mail.gmail.com>
X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,KAM_LOTSOFHASH,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Mon, Oct 17, 2022 at 08:21:01AM +0200, Aldy Hernandez wrote:
> --- a/gcc/range-op-float.cc
> +++ b/gcc/range-op-float.cc
> @@ -200,6 +200,116 @@ frelop_early_resolve (irange &r, tree type,
>  	  && relop_early_resolve (r, type, op1, op2, rel, my_rel));
>  }
>  
> +// If R contains a NAN of unknown sign, update the NAN's signbit
> +// depending on two operands.
> +
> +inline void
> +update_nan_sign (frange &r, const frange &op1, const frange &op2)
> +{
> +  if (!r.maybe_isnan ())
> +    return;
> +
> +  bool op1_nan = op1.maybe_isnan ();
> +  bool op2_nan = op2.maybe_isnan ();
> +  bool sign1, sign2;
> +
> +  gcc_checking_assert (!r.nan_signbit_p (sign1));
> +  if (op1_nan && op2_nan)
> +    {
> +      if (op1.nan_signbit_p (sign1) && op2.nan_signbit_p (sign2))
> +	r.update_nan (sign1 | sign2);
> +    }
> +  else if (op1_nan)
> +    {
> +      if (op1.nan_signbit_p (sign1))
> +	r.update_nan (sign1);
> +    }
> +  else if (op2_nan)
> +    {
> +      if (op2.nan_signbit_p (sign2))
> +	r.update_nan (sign2);
> +    }
> +}

IEEE 754-2008 says:
"When either an input or result is NaN, this standard does not interpret the sign of a NaN. Note, however,
that operations on bit strings — copy, negate, abs, copySign — specify the sign bit of a NaN result,
sometimes based upon the sign bit of a NaN operand. The logical predicate totalOrder is also affected by
the sign bit of a NaN operand. For all other operations, this standard does not specify the sign bit of a NaN
result, even when there is only one input NaN, or when the NaN is produced from an invalid
operation."
so, while one can e.g. see on x86_64 that for a simple -O0 build of
int
main ()
{
  volatile float f1 = __builtin_nansf ("");
  volatile float f2 = __builtin_copysignf (__builtin_nansf (""), -1.0f);
  volatile float f3 = __builtin_nanf ("");
  volatile float f4 = __builtin_copysignf (__builtin_nanf (""), -1.0f);
  volatile float fzero = 0.0f;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f1, *(unsigned *)&f2);
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f3, *(unsigned *)&f4);
  f1 = -f1;
  f2 = -f2;
  f3 = -f3;
  f4 = -f4;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f1, *(unsigned *)&f2);
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f3, *(unsigned *)&f4);
  volatile float f5 = f1 + fzero;
  volatile float f6 = fzero + f1;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f2 + fzero;
  f6 = fzero + f2;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f3 + fzero;
  f6 = fzero + f3;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f4 + fzero;
  f6 = fzero + f4;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f1 + f2;
  f6 = f2 + f1;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f2 + f3;
  f6 = f3 + f2;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f3 + f4;
  f6 = f4 + f3;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  f5 = f4 + f1;
  f6 = f1 + f4;
  __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6);
  return 0;
}
the result is:
7fa00000 ffa00000
7fc00000 ffc00000
ffa00000 7fa00000
ffc00000 7fc00000
ffe00000 ffe00000
7fe00000 7fe00000
ffc00000 ffc00000
7fc00000 7fc00000
7fe00000 ffe00000
ffc00000 7fe00000
7fc00000 ffc00000
ffe00000 7fc00000
which basically shows that copysign copies the sign bit, negation toggles it,
a binary operation with a single NaN (quiet or signaling) yields that NaN
with its sign, and a binary operation on 2 NaNs (again quiet or signaling)
takes the sign from the second? operand, I think the above IEEE text
doesn't guarantee any of it except for a simple assignment (but with no mode
conversion; that one copies the bit), NEGATE_EXPR (toggles it),
ABS_EXPR (clears it) and __builtin_copysign*/IFN_COPYSIGN (copies it from
the second operand).  Everything else, including invalid operation cases,
should set the possible sign bit values of NANs to 0/1 rather than just
one of them.  Perhaps a COND_EXPR 2nd/3rd operand is a move too.

As you are adding a binary operation, that should be one of those cases
where we drop the NAN sign to VARYING.
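
To make the rule above concrete, here is a minimal sketch of the NaN sign
tracking as a tiny standalone model.  The struct and all function names are
hypothetical and only for illustration; this is not GCC's frange API, just
the transfer functions I'd expect for the operations discussed:

```c
#include <stdbool.h>

/* Hypothetical model (not GCC's frange API): the set of sign bits a
   possible NaN result may carry.  Both false means "cannot be NaN".  */
struct nan_sign
{
  bool maybe_pos;	/* A NaN with sign bit 0 is possible.  */
  bool maybe_neg;	/* A NaN with sign bit 1 is possible.  */
};

static bool
nan_possible (struct nan_sign s)
{
  return s.maybe_pos || s.maybe_neg;
}

/* Negation toggles the sign bit, so the two possibilities swap.  */
static struct nan_sign
nan_sign_negate (struct nan_sign s)
{
  struct nan_sign r = { s.maybe_neg, s.maybe_pos };
  return r;
}

/* abs clears the sign bit: any possible NaN becomes a positive one.  */
static struct nan_sign
nan_sign_abs (struct nan_sign s)
{
  struct nan_sign r = { nan_possible (s), false };
  return r;
}

/* copysign takes the sign from the second operand.  SIGN2_CLEAR and
   SIGN2_SET say which sign bit values that operand may have.  */
static struct nan_sign
nan_sign_copysign (struct nan_sign s, bool sign2_clear, bool sign2_set)
{
  bool nan = nan_possible (s);
  struct nan_sign r = { nan && sign2_clear, nan && sign2_set };
  return r;
}

/* Any other binary operation: if a NaN can come out at all (a NaN
   operand, or an invalid operation such as Inf - Inf), its sign is
   unspecified, so both sign bit values stay possible.  */
static struct nan_sign
nan_sign_binary (struct nan_sign op1, struct nan_sign op2,
		 bool maybe_invalid)
{
  bool nan = nan_possible (op1) || nan_possible (op2) || maybe_invalid;
  struct nan_sign r = { nan, nan };
  return r;
}
```

With this model a PLUS_EXPR whose first operand may be a negative NaN gives
{ true, true }, i.e. the sign is VARYING, while negating the same operand
gives { true, false }.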

> +
> +// If either operand is a NAN, set R to the combination of both NANs
> +// signwise and return TRUE.
> +
> +inline bool
> +propagate_nans (frange &r, const frange &op1, const frange &op2)
> +{
> +  if (op1.known_isnan () || op2.known_isnan ())
> +    {
> +      r.set_nan (op1.type ());
> +      update_nan_sign (r, op1, op2);
> +      return true;
> +    }
> +  return false;
> +}
> +
> +// Set VALUE to its next real value, or INF if the operation overflows.
> +
> +inline void
> +frange_nextafter (enum machine_mode mode,
> +		  REAL_VALUE_TYPE &value,
> +		  const REAL_VALUE_TYPE &inf)
> +{
> +  const real_format *fmt = REAL_MODE_FORMAT (mode);
> +  REAL_VALUE_TYPE tmp;
> +  bool overflow = real_nextafter (&tmp, fmt, &value, &inf);
> +  if (overflow)
> +    value = inf;
> +  else
> +    value = tmp;
> +}
> +
> +// Like real_arithmetic, but round the result to INF if the operation
> +// produced inexact results.
> +//
> +// ?? There is still one problematic case, i387.  With
> +// -fexcess-precision=standard we perform most SF/DFmode arithmetic in
> +// XFmode (long_double_type_node), so that case is OK.  But without
> +// -mfpmath=sse, all the SF/DFmode computations are in XFmode
> +// precision (64-bit mantissa) and only occassionally rounded to
> +// SF/DFmode (when storing into memory from the 387 stack).  Maybe
> +// this is ok as well though it is just occassionally more precise. ??
> +
> +static void
> +frange_arithmetic (enum tree_code code, tree type,
> +		   REAL_VALUE_TYPE &result,
> +		   const REAL_VALUE_TYPE &op1,
> +		   const REAL_VALUE_TYPE &op2,
> +		   const REAL_VALUE_TYPE &inf)
> +{
> +  REAL_VALUE_TYPE value;
> +  enum machine_mode mode = TYPE_MODE (type);
> +  bool mode_composite = MODE_COMPOSITE_P (mode);
> +
> +  bool inexact = real_arithmetic (&value, code, &op1, &op2);
> +  real_convert (&result, mode, &value);
> +
> +  // Be extra careful if there may be discrepancies between the
> +  // compile and runtime results.
> +  if ((mode_composite || (real_isneg (&inf) ? real_less (&result, &value)
> +			  : !real_less (&value, &result)))
> +      && (inexact || !real_identical (&result, &value)))
> +    {
> +      if (mode_composite)
> +	{
> +	  if (real_isdenormal (&result)
> +	      || real_iszero (&result))
> +	    {
> +	      // IBM extended denormals only have DFmode precision.
> +	      REAL_VALUE_TYPE tmp;
> +	      real_convert (&tmp, DFmode, &value);
> +	      frange_nextafter (DFmode, tmp, inf);
> +	      real_convert (&result, mode, &tmp);
> +	      return;
> +	    }

As discussed before, this might not be correct for the larger double double
denormals (it is correct for real_iszero).  I'll try to play with it in
self-tests and compare with what one gets from libm nextafterl:
int
main ()
{
  long double a = __builtin_nextafterl (0.0L, 1.0L);
  __builtin_printf ("%La\n", a);
  for (int i = 0; i < 108; i++, a *= 2.0L)
    __builtin_printf ("%d %La %La\n", i, a, __builtin_nextafterl (a, 1.0L));
}
0x0.0000000000001p-1022
0 0x0.0000000000001p-1022 0x0.0000000000002p-1022
1 0x0.0000000000002p-1022 0x0.0000000000003p-1022
2 0x0.0000000000004p-1022 0x0.0000000000005p-1022
3 0x0.0000000000008p-1022 0x0.0000000000009p-1022
4 0x0.000000000001p-1022 0x0.0000000000011p-1022
5 0x0.000000000002p-1022 0x0.0000000000021p-1022
6 0x0.000000000004p-1022 0x0.0000000000041p-1022
7 0x0.000000000008p-1022 0x0.0000000000081p-1022
8 0x0.00000000001p-1022 0x0.0000000000101p-1022
9 0x0.00000000002p-1022 0x0.0000000000201p-1022
10 0x0.00000000004p-1022 0x0.0000000000401p-1022
11 0x0.00000000008p-1022 0x0.0000000000801p-1022
12 0x0.0000000001p-1022 0x0.0000000001001p-1022
13 0x0.0000000002p-1022 0x0.0000000002001p-1022
14 0x0.0000000004p-1022 0x0.0000000004001p-1022
15 0x0.0000000008p-1022 0x0.0000000008001p-1022
16 0x0.000000001p-1022 0x0.0000000010001p-1022
17 0x0.000000002p-1022 0x0.0000000020001p-1022
18 0x0.000000004p-1022 0x0.0000000040001p-1022
19 0x0.000000008p-1022 0x0.0000000080001p-1022
20 0x0.00000001p-1022 0x0.0000000100001p-1022
21 0x0.00000002p-1022 0x0.0000000200001p-1022
22 0x0.00000004p-1022 0x0.0000000400001p-1022
23 0x0.00000008p-1022 0x0.0000000800001p-1022
24 0x0.0000001p-1022 0x0.0000001000001p-1022
25 0x0.0000002p-1022 0x0.0000002000001p-1022
26 0x0.0000004p-1022 0x0.0000004000001p-1022
27 0x0.0000008p-1022 0x0.0000008000001p-1022
28 0x0.000001p-1022 0x0.0000010000001p-1022
29 0x0.000002p-1022 0x0.0000020000001p-1022
30 0x0.000004p-1022 0x0.0000040000001p-1022
31 0x0.000008p-1022 0x0.0000080000001p-1022
32 0x0.00001p-1022 0x0.0000100000001p-1022
33 0x0.00002p-1022 0x0.0000200000001p-1022
34 0x0.00004p-1022 0x0.0000400000001p-1022
35 0x0.00008p-1022 0x0.0000800000001p-1022
36 0x0.0001p-1022 0x0.0001000000001p-1022
37 0x0.0002p-1022 0x0.0002000000001p-1022
38 0x0.0004p-1022 0x0.0004000000001p-1022
39 0x0.0008p-1022 0x0.0008000000001p-1022
40 0x0.001p-1022 0x0.0010000000001p-1022
41 0x0.002p-1022 0x0.0020000000001p-1022
42 0x0.004p-1022 0x0.0040000000001p-1022
43 0x0.008p-1022 0x0.0080000000001p-1022
44 0x0.01p-1022 0x0.0100000000001p-1022
45 0x0.02p-1022 0x0.0200000000001p-1022
46 0x0.04p-1022 0x0.0400000000001p-1022
47 0x0.08p-1022 0x0.0800000000001p-1022
48 0x0.1p-1022 0x0.1000000000001p-1022
49 0x0.2p-1022 0x0.2000000000001p-1022
50 0x0.4p-1022 0x0.4000000000001p-1022
51 0x0.8p-1022 0x0.8000000000001p-1022
52 0x1p-1022 0x1.0000000000001p-1022
53 0x1p-1021 0x1.00000000000008p-1021
54 0x1p-1020 0x1.00000000000004p-1020
55 0x1p-1019 0x1.00000000000002p-1019
56 0x1p-1018 0x1.00000000000001p-1018
57 0x1p-1017 0x1.000000000000008p-1017
58 0x1p-1016 0x1.000000000000004p-1016
59 0x1p-1015 0x1.000000000000002p-1015
60 0x1p-1014 0x1.000000000000001p-1014
61 0x1p-1013 0x1.0000000000000008p-1013
62 0x1p-1012 0x1.0000000000000004p-1012
63 0x1p-1011 0x1.0000000000000002p-1011
64 0x1p-1010 0x1.0000000000000001p-1010
65 0x1p-1009 0x1.00000000000000008p-1009
66 0x1p-1008 0x1.00000000000000004p-1008
67 0x1p-1007 0x1.00000000000000002p-1007
68 0x1p-1006 0x1.00000000000000001p-1006
69 0x1p-1005 0x1.000000000000000008p-1005
70 0x1p-1004 0x1.000000000000000004p-1004
71 0x1p-1003 0x1.000000000000000002p-1003
72 0x1p-1002 0x1.000000000000000001p-1002
73 0x1p-1001 0x1.0000000000000000008p-1001
74 0x1p-1000 0x1.0000000000000000004p-1000
75 0x1p-999 0x1.0000000000000000002p-999
76 0x1p-998 0x1.0000000000000000001p-998
77 0x1p-997 0x1.00000000000000000008p-997
78 0x1p-996 0x1.00000000000000000004p-996
79 0x1p-995 0x1.00000000000000000002p-995
80 0x1p-994 0x1.00000000000000000001p-994
81 0x1p-993 0x1.000000000000000000008p-993
82 0x1p-992 0x1.000000000000000000004p-992
83 0x1p-991 0x1.000000000000000000002p-991
84 0x1p-990 0x1.000000000000000000001p-990
85 0x1p-989 0x1.0000000000000000000008p-989
86 0x1p-988 0x1.0000000000000000000004p-988
87 0x1p-987 0x1.0000000000000000000002p-987
88 0x1p-986 0x1.0000000000000000000001p-986
89 0x1p-985 0x1.00000000000000000000008p-985
90 0x1p-984 0x1.00000000000000000000004p-984
91 0x1p-983 0x1.00000000000000000000002p-983
92 0x1p-982 0x1.00000000000000000000001p-982
93 0x1p-981 0x1.000000000000000000000008p-981
94 0x1p-980 0x1.000000000000000000000004p-980
95 0x1p-979 0x1.000000000000000000000002p-979
96 0x1p-978 0x1.000000000000000000000001p-978
97 0x1p-977 0x1.0000000000000000000000008p-977
98 0x1p-976 0x1.0000000000000000000000004p-976
99 0x1p-975 0x1.0000000000000000000000002p-975
100 0x1p-974 0x1.0000000000000000000000001p-974
101 0x1p-973 0x1.00000000000000000000000008p-973
102 0x1p-972 0x1.00000000000000000000000004p-972
103 0x1p-971 0x1.00000000000000000000000002p-971
104 0x1p-970 0x1.00000000000000000000000001p-970
105 0x1p-969 0x1.000000000000000000000000008p-969
106 0x1p-968 0x1.000000000000000000000000008p-968
107 0x1p-967 0x1.000000000000000000000000008p-967

I will need to find out which of these numbers GCC's
real_isdenormal actually treats as denormals and whether there are
any it doesn't.
powerpc64le gcc with -mlong-double-128 -mabi=ibmlongdouble certainly prints:
#define __DBL_MIN__ ((double)2.22507385850720138309023271733240406e-308L)
#define __DBL_DENORM_MIN__ ((double)4.94065645841246544176568792868221372e-324L)
#define __LDBL_MIN__ 2.00416836000897277799610805135016205e-292L
#define __LDBL_DENORM_MIN__ 4.94065645841246544176568792868221372e-324L
and so the LDBL denorm min is correctly the same as the double denorm min, while
the minimum normal is actually quite a bit larger for long double than for
double (also ok, but unusual, as e.g. __DBL_MIN__ is much smaller than
__FLT_MIN__ and __FLT128_MIN__ much smaller than __DBL_MIN__).
Note, it seems glibc nextafterl for IBM double double actually adds
__DBL_DENORM_MIN__ to the value for the small ones, so perhaps we should
do that too rather than convert to DFmode and back.
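
In the nextafterl table above, every step through row 105 is 2^-1074, i.e.
__DBL_DENORM_MIN__.  A rough sketch of the "just add the denorm min" idea,
using plain double as a stand-in because IBM double double isn't available
everywhere: it assumes IEEE binary64 double, round to nearest, no
flush-to-zero, and the GCC/Clang predefined __DBL_DENORM_MIN__ macro.  The
helper names are made up for the example:

```c
#include <stdint.h>
#include <string.h>

/* Raw bits of D, copied out with memcpy to avoid aliasing problems.  */
static uint64_t
double_bits (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

/* Next representable double above X, for 0 <= X < 0x1p-1021.
   In that range the spacing between doubles is the constant
   __DBL_DENORM_MIN__, so a single addition is exact.  The volatile
   store forces the sum to be rounded to double.  */
static double
next_up_small (double x)
{
  volatile double r = x + __DBL_DENORM_MIN__;
  return r;
}
```

For non-negative finite doubles the next value up has the bit pattern
incremented by one, so double_bits (next_up_small (x)) should equal
double_bits (x) + 1 anywhere in that range, which is an easy self-test to
write without libm.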

	Jakub