From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20220905062301.3240191-1-aldyh@redhat.com> <CAFiYyc17QGVXD3k0D-Sj10scS-py13p=4o7uZUOgoepay_ep=A@mail.gmail.com>
 <YxW7/gY8UgLcy2C6@tucnak> <CAGm3qMU0KAqmudP9orh4BSYbDxjqSk=osR=R=yiHRCgyn9k3Cg@mail.gmail.com>
 <CAFiYyc2_=oDfVcorrhDYrhy7FEAoWfqCnRvU5kduvunuT-Uj6Q@mail.gmail.com>
 <CAGm3qMUSJYH1+PHseRu6eTumKM0dZA9ta+H6nTBuaRNmBoj_gg@mail.gmail.com>
 <CAFiYyc3Nohb=QjTwiL6_viCTdkS2=S0gz0KGwHHt7Tw0oBMoLQ@mail.gmail.com>
 <CAGm3qMXmTRT+ATSESE03NvEdBWE-pfavtaiGFjZ23--MK-=Y3Q@mail.gmail.com>
 <CAFiYyc1mKN6uHrR2SuUN7dVJsdrm_h4bPPezdXJfZdmaGQc_GQ@mail.gmail.com>
 <CAGm3qMUJDpn2gw1CyUu64BEdNi9PKtean+OA_E4LQBv+rSTFWQ@mail.gmail.com> <CAFiYyc0tPQP1iT+UT-E=vc8EKWREUgBcJfWJ-rdAFtSQ7BynkA@mail.gmail.com>
In-Reply-To: <CAFiYyc0tPQP1iT+UT-E=vc8EKWREUgBcJfWJ-rdAFtSQ7BynkA@mail.gmail.com>
From: Aldy Hernandez <aldyh@redhat.com>
Date: Tue, 6 Sep 2022 09:21:08 +0200
Message-ID: <CAGm3qMVCf4rxMaeGNtcp2X+OR_+KK-_NJgSiEUi+ZDp9DRbtrw@mail.gmail.com>
Subject: Re: [COMMITTED] Be even more conservative in intersection of NANs.
To: Richard Biener <richard.guenther@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>, "MacLeod, Andrew" <amacleod@redhat.com>, 
	GCC patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
List-Id: <gcc-patches.gcc.gnu.org>

On Mon, Sep 5, 2022 at 2:16 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Sep 5, 2022 at 1:45 PM Aldy Hernandez <aldyh@redhat.com> wrote:
> >
> > On Mon, Sep 5, 2022 at 12:38 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Mon, Sep 5, 2022 at 12:24 PM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 5, 2022 at 11:53 AM Richard Biener
> > > > <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Mon, Sep 5, 2022 at 11:41 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Sep 5, 2022 at 11:18 AM Richard Biener
> > > > > > <richard.guenther@gmail.com> wrote:
> > > > > > >
> > > > > > > On Mon, Sep 5, 2022 at 11:12 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Sep 5, 2022 at 11:06 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard Biener wrote:
> > > > > > > > > > On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
> > > > > > > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Intersecting two ranges where one is a NAN is keeping the sign bit of
> > > > > > > > > > > the NAN range.  This is not correct as the sign bits may not match.
> > > > > > > > > > >
> > > > > > > > > > > I think the only time we're absolutely sure about the intersection of
> > > > > > > > > > > a NAN and something else, is when both are a NAN with exactly the same
> > > > > > > > > > > properties (sign bit).  If we're intersecting two NANs of differing
> > > > > > > > > > > sign, we can decide later whether that's undefined or just a NAN with
> > > > > > > > > > > no known sign.  For now I've done the latter.
> > > > > > > > > > >
> > > > > > > > > > > I'm still mentally working on intersections involving NANs, especially
> > > > > > > > > > > if we want to keep track of signbits.  For now, let's be extra careful
> > > > > > > > > > > and only do things we're absolutely sure about.
> > > > > > > > > > >
> > > > > > > > > > > Later we may want to fold the intersection of [NAN,NAN] and, say, [3,5]
> > > > > > > > > > > with the possibility of NAN, to a NAN, but I'm not 100% sure.
> > > > > > > > > >
> > > > > > > > > > The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
> > > > > > > > > > of [NAN, NAN] and VARYING is [NAN, NAN].
> > > > > > > > >
> > > > > > > > > I think [3.0, 5.0] printed that way currently means U maybe NAN,
> > > > > > > > > it would be [3.0, 5.0] !NAN if it was known not to be NAN.
> > > > > > > >
> > > > > > > > Right.  I don't print any of the "maybe" properties, just if they're
> > > > > > > > definitely set or definitely clear.  I'm open to suggestions as to how
> > > > > > > > to display them.  Perhaps NAN, !NAN, ?NAN.
> > > > > > >
> > > > > > > There's no NAN tristate.  Your "definitely NAN" would be simply
> > > > > > > ][ NAN, that is, the value range only contains NAN.  Your !NAN
> > > > > > > is <whatever range> and non NAN.  Likewise for the sign, the
> > > > > > > range either includes -NAN and NAN or one or none of those.
> > > > > > > For signed zeros you either have [-0, upper-bound] or [0, upper-bound]
> > > > > > > where it either includes both -0 and 0 or just one of them.
> > > > > >
> > > > > > But there is a tristate.  We may definitely have a NAN, definitely not
> > > > > > have a NAN, or the state of the NAN is unknown.
> > > > >
> > > > > Sure.  But we are talking about sets of values a variable can have
> > > > > (a value "range" where "range" is a bit misleading for something
> > > > > like a NAN).  The set of possible values either includes
> > > > > NAN (or -NAN and +NAN) or it doesn't include NAN (or -NAN and +NAN).
> > > > > A set cannot include or not include a "maybe NAN".
> > > > >
> > > > > >  Say [3,5] ?NAN.
> > > > > > That's [3,5] with the possibility of a NAN.  On the true side of x >=
> > > > > > 5.0, we'd have [5.0, INF] !NAN.  On the false side we'd have [-INF,
> > > > > > 5.0] ?NAN.
> > > > >
> > > > > On the true side of x >= 5.0 the set of values is described by
> > > > > the [5., +INF] range.  On the false side the set is described
> > > > > by the union of the range [-INF, 5.0] and the { -NAN, +NAN }
> > > > > set.
> > > > >
> > > > > There's no may-NAN.  There's also no ?4.0, the range either
> > > > > includes 4.0 or it doesn't.
> > > >
> > > > Ah, ok.  I see where the confusion lies.  You're missing that we don't
> > > > have sub-ranges like we do for irange.  We only have two endpoints and
> > > > a set of flags.  So we can't represent [3,4] U NAN "elegantly".
> > > > However, we can do it with [3,4] ?NAN.  This is by design, but not
> > > > permanent.  I don't have infinite time to work on frange this cycle
> > > > (I have other things like wide-ints conversion, prange, removal of
> > > > legacy, etc etc), so I wanted something that worked with endpoints,
> > > > signs, and NANs, that's about it.  If at a later time we decide to go
> > > > full throttle with the ability to represent sub-ranges, we can do so.
> > > > Heck, you're welcome to try-- just let me finish the initial
> > > > implementation and get it working correctly first.
> > > >
> > > > It is more important right now to get the usage than the
> > > > representation right.  We could always add sub-ranges, or change the
> > > > representation altogether.  What is very important we agree on is the
> > > > usage, so your suggestions about the FP classification functions below
> > > > are golden.  I'll look into that.
> > > >
> > > > Does that make sense?
> > >
> > > Not really.  I didn't ask for sub-ranges for NAN, but even with a "flag"
> > > it should still semantically be [3, 4] U NAN or [3, 4].  It's not necessary
> > > but confusing to leave the notion of a SET here.
> > >
> > > > BTW, [NAN, NAN] is a special case.  It doesn't behave like a
> > > > singleton.  Both endpoints must match.  We assert this much.  We don't
> > > > propagate it.  We can't do equality to it.  The fact that it lives in
> > > > the endpoints is just an implementation detail.
> > >
> > > And even here, having [NAN, NAN] but [3, 4] with maybe-NAN flag
> > > is just inconsistent.  Why's that necessary?  Is there no "empty range"
> > > (aka UNDEFINED) for frange?
> >
> > So what you're suggesting is replacing the tri-state NAN and SIGN bits
> > with two separate NAN flags (+NAN and -NAN) and representing actual
> > NANs in the undefined range?
>
> Yeah.  Note if you keep the SIGN bits the two-bit NAN state would still
> drop to just one bit - NAN or not NAN.  I'm mostly opposed to the idea
> that we need a "maybe" in addition to NAN and not NAN.
>
> > This is just a representation detail,
> > and I have no problem with it, except that it'll take time to
> > implement.  Patches welcome though ;-).
>
> It's also an API and debug (dumping) detail if that implementation detail
> is exposed.  I'm mostly concerned about that and what documentation
> suggests.
>
> > May I suggest we agree on how to access this information (API), and
> > then we can change the implementation details later?
> >
> > For instance, what you suggest for isfinite, isinf, isnan, etc.  What
> > does isfinite return for [0,INF]?  Will isfinite return whether the
>
> isfinite does (see manual page):
>
>        isfinite(x)   returns a nonzero value if
>                      (fpclassify(x) != FP_NAN && fpclassify(x) != FP_INFINITE)
>
> so it returns false.  (it's not isinfinite).  isinf returns false as well here.
> There's a reason I didn't suggest to implement fpclassify because
> there's no "I don't know" result.
>
> > range *includes* INF?  So true?  Similarly for [3,4] NAN (in your
> > preference).  Shall we return true for isnan, or only for an actual
> > NAN?
>
> Only for an actual NAN.  But yes, implementing these may result
> in confusion if people use !isnan() because that wouldn't mean the
> number is never a NAN.
>
> So maybe instead have, similar to the poly-int stuff,
>
>   maybe_inf ();
>   known_inf ();
>   maybe_nan ();
>   known_nan ();
>   known_finite ();  // maybe_finite () doesn't make much sense to me
>
> > And yes, we do have an UNDEFINED, but we take UNDEFINED to mean the
> > empty set across the board.  We like checks for undefined to be
> > fast...i.e. just checking the m_kind field, not having to worry about
> > checking if some other flag is set.  Also, there are still some silly
> > passes making use of vrange::kind() which is deprecated, and if NAN
> > was represented with the VR_UNDEFINED enum set to m_kind, it'll cause
> > problems.  But I'm sure you can come up with another way of
> > representing NAN.    I really don't have strong opinions about
> > representation details.
>
> Hmm, we have undefined_p () which frange could overload.
>
> Btw, if you don't have subranges then how would you represent
> {-Inf, +Inf}?  Not that this is likely to occur in practice.

We can't.  All intervals are closed, so we represent the above as
[-Inf, +Inf] (which is actually VARYING).  Using closed intervals is
conservatively correct.  For example, the open interval (3.0, +Inf] is
conservatively represented as [3.0, +Inf], because widening to a closed
interval never excludes a number that could be in the set, in the
same way that [-Inf, +Inf] (VARYING) is correct for any unknown range.
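
A rough illustration of the closed-hull widening, on host doubles with a
hypothetical toy_interval type (this is not GCC's frange, and it ignores
NANs and sign tracking entirely):

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical toy type, NOT GCC's frange: a closed interval [lo, hi]
// of host doubles, with no NAN or sign-bit tracking.
struct toy_interval
{
  double lo;
  double hi;

  // True if V lies in the closed interval [lo, hi].
  bool contains (double v) const { return lo <= v && v <= hi; }
};

// Over-approximate a non-empty, NAN-free set of values by its closed hull
// [min, max].  The hull may admit extra values (e.g. 0.0 for the set
// {-Inf, +Inf}), but it never drops a member of the set, which is what
// makes it conservatively correct.
toy_interval
closed_hull (const std::vector<double> &values)
{
  assert (!values.empty ());
  auto mm = std::minmax_element (values.begin (), values.end ());
  return toy_interval { *mm.first, *mm.second };
}
```

So the set {-Inf, +Inf} becomes [-Inf, +Inf], which also contains 0.0: less
precise, but never wrong.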

This was a trade-off for implementing something relatively quickly,
which catches 90% of what we care about.  I originally implemented
open and closed intervals, and then subranges, and things got
unnecessarily complicated pretty fast.

Remember that the original goal was just folding of symbolic
relationals, and that quickly evolved into keeping track of NANs,
signed zeros, intervals, etc.  The goal was just a bare-bones
implementation.  I think we're past that now.  I'm squarely in
nightmare territory, especially with MODE_COMPOSITE_P,
flag_rounding_math, and what have you... Just wait till you see what a
"simple" binary operator looks like :).

BTW, it has occurred to me that open intervals, say > 3.0, could be
implemented as a closed interval starting one ULP past 3.0, for example
with real_nextafter().  Just a crazy thought... dunno if that runs into
representation issues where the ULP is larger than the finest precision
the target can represent, so we start excluding numbers incorrectly.

Aldy