From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.guenther@gmail.com>
Received: from mail-vs1-xe2d.google.com (mail-vs1-xe2d.google.com
 [IPv6:2607:f8b0:4864:20::e2d])
 by sourceware.org (Postfix) with ESMTPS id C76203858D28
 for <gcc-patches@gcc.gnu.org>; Mon, 25 Apr 2022 06:24:16 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C76203858D28
Received: by mail-vs1-xe2d.google.com with SMTP id d2so6376648vsd.12
 for <gcc-patches@gcc.gnu.org>; Sun, 24 Apr 2022 23:24:16 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc:content-transfer-encoding;
 bh=LPe2Htu6YxY0KPul1IIOOcRExQ4jV5pu5lIRYL+4mFo=;
 b=RIAVh/hHNNL3GAf6om+W7EhZTUujjAwz51yp5jSYAZ8jwxC9pJcvBO4yzb/9mPsmAi
 WYIehMqn8GKJv0Jn22n0hPU5/FNgvuVoFpUYYT16Rerik25noO+738joakl4usf4mOkK
 0RlQsxjWSnSjoQSFnZcdNxoOX30fz8+dRnNJEcPKJyteq7uYic4JQWkuc06Ym9A7agPO
 CmDf/VNvEfQ5D+KxaP4neaCyC2ZN5YtcTBp7ZOx2uG7gQ7yzh5jUqi6rOjFS5lTgTQ1j
 ySTMAcspQkiyGq0a7xu+Za5ayLsZritxKikLLBq13SlyZ6t4xMNhP39ooXTYkJV5TPSM
 vl1Q==
X-Gm-Message-State: AOAM531sewa7w98chpdOSHoqoNMIS6Iy00th87hPtiWy7zqophSp94I+
 P0KvJHGAxBXnmjhLnQRZm2wzrXQf6sMhLc7TCiiqZMhd
X-Google-Smtp-Source: ABdhPJyhDjErj1uH83hvdFk3GwxA7fSaFCko2ADKJHWuW8kg1QAVK/LfCyjMpNrVzCb4EWj8TbQdmtGnaxbZOR8tgwk=
X-Received: by 2002:a05:6102:2752:b0:32c:db47:e6da with SMTP id
 p18-20020a056102275200b0032cdb47e6damr102255vsu.1.1650867855939; Sun, 24 Apr
 2022 23:24:15 -0700 (PDT)
MIME-Version: 1.0
References: <20220215095804.GL2646553@tucnak>
 <6685073D-CCC0-404E-82D1-0A3009AD3FC0@oracle.com>
 <CAFiYyc185w6AZTOZ_Tn93qHqzhn+-xVP1-gVZ8xgZ2FT8vKaxg@mail.gmail.com>
 <F178A0CA-DA9B-421A-96DC-8E0DBA1EA971@oracle.com>
 <CAFiYyc2guZ_kwePDLk7ta42JDp0uJtkDBFrALdi-adft1ZJrCg@mail.gmail.com>
 <06744640-90E7-413B-9EEB-B698C9D74959@oracle.com>
 <CAFiYyc2iJ_5wg0ddZZUHMZrXmLeQ7z3aHyQQH3sdOQ0OKfBCJw@mail.gmail.com>
 <3FBB57A8-0B89-4FAA-A000-0C42B9136BFB@oracle.com>
In-Reply-To: <3FBB57A8-0B89-4FAA-A000-0C42B9136BFB@oracle.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Mon, 25 Apr 2022 08:24:03 +0200
Message-ID: <CAFiYyc0gEfT6szBDxibXrN3UTYQYnj-BNuOzJG7DshKDJvL7fg@mail.gmail.com>
Subject: Re: [PATCH] fold, simplify-rtx: Punt on non-representable floating
 point constants [PR104522]
To: Qing Zhao <qing.zhao@oracle.com>
Cc: Jakub Jelinek <jakub@redhat.com>,
 "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-8.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0,
 KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Mon, 25 Apr 2022 06:24:19 -0000

On Fri, Apr 22, 2022 at 5:01 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>
>
>
> > On Apr 20, 2022, at 5:38 AM, Richard Biener <richard.guenther@gmail.com=
> wrote:
> >
> > On Tue, Apr 19, 2022 at 11:36 PM Qing Zhao <qing.zhao@oracle.com> wrote=
:
> >>
> >>
> >>
> >>> On Apr 14, 2022, at 1:53 AM, Richard Biener <richard.guenther@gmail.c=
om> wrote:
> >>>
> >>> On Wed, Apr 13, 2022 at 5:22 PM Qing Zhao <qing.zhao@oracle.com> wrot=
e:
> >>>>
> >>>> Hi, Richard,
> >>>>
> >>>> Thanks a lot for taking a look at this issue (and Sorry that I haven=
=E2=80=99t fixed this one yet, I was distracted by other tasks then just fo=
rgot this one=E2=80=A6.)
> >>>>
> >>>>> On Apr 13, 2022, at 3:41 AM, Richard Biener <richard.guenther@gmail=
.com> wrote:
> >>>>>
> >>>>> On Tue, Feb 15, 2022 at 5:31 PM Qing Zhao via Gcc-patches
> >>>>> <gcc-patches@gcc.gnu.org> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Feb 15, 2022, at 3:58 AM, Jakub Jelinek <jakub@redhat.com> wro=
te:
> >>>>>>>
> >>>>>>> Hi!
> >>>>>>>
> >>>>>>> For IBM double double I've added in PR95450 and PR99648 verificat=
ion that
> >>>>>>> when we at the tree/GIMPLE or RTL level interpret target bytes as=
 a REAL_CST
> >>>>>>> or CONST_DOUBLE constant, we try to encode it back to target byte=
s and
> >>>>>>> verify it is the same.
> >>>>>>> This is because our real.c support isn't able to represent all va=
lid values
> >>>>>>> of IBM double double which has variable precision.
> >>>>>>> In PR104522, it has been noted that we have similar problem with =
the
> >>>>>>> Intel/Motorola extended XFmode formats, our internal representati=
on isn't
> >>>>>>> able to record pseudo denormals, pseudo infinities, pseudo NaNs a=
nd unnormal
> >>>>>>> values.
> >>>>>>> So, the following patch is an attempt to extend that verification=
 to all
> >>>>>>> floats.
> >>>>>>> Unfortunately, it wasn't that straightforward, because the
> >>>>>>> __builtin_clear_padding code exactly for the XFmode long doubles =
needs to
> >>>>>>> discover what bits are padding and does that by interpreting memo=
ry of
> >>>>>>> all 1s.  That is actually a valid supported value, a qNaN with ne=
gative
> >>>>>>> sign with all mantissa bits set, but the verification includes al=
so the
> >>>>>>> padding bits (exactly what __builtin_clear_padding wants to figur=
e out)
> >>>>>>> and so fails the comparison check and so we ICE.
> >>>>>>> The patch fixes that case by moving that verification from
> >>>>>>> native_interpret_real to its caller, so that clear_padding_type c=
an
> >>>>>>> call native_interpret_real and avoid that extra check.
> >>>>>>>
> >>>>>>> With this, the only thing that regresses in the testsuite is
> >>>>>>> +FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times long\\t=
-16843010 5
> >>>>>>> because it decides to use a pattern that has non-zero bits in the=
 padding
> >>>>>>> bits of the long double, so the simplify-rtx.cc change prevents f=
olding
> >>>>>>> a SUBREG into a constant.  We emit (the testcase is -O0 but we em=
it worse
> >>>>>>> code at all opt levels) something like:
> >>>>>>>     movabsq $-72340172838076674, %rax
> >>>>>>>     movabsq $-72340172838076674, %rdx
> >>>>>>>     movq    %rax, -48(%rbp)
> >>>>>>>     movq    %rdx, -40(%rbp)
> >>>>>>>     fldt    -48(%rbp)
> >>>>>>>     fstpt   -32(%rbp)
> >>>>>>> instead of
> >>>>>>>     fldt    .LC2(%rip)
> >>>>>>>     fstpt   -32(%rbp)
> >>>>>>> ...
> >>>>>>> .LC2:
> >>>>>>>     .long   -16843010
> >>>>>>>     .long   -16843010
> >>>>>>>     .long   65278
> >>>>>>>     .long   0
> >>>>>>> Note, neither of those sequences actually stores the padding bits=
, fstpt
> >>>>>>> simply doesn't touch them.
> >>>>>>> For vars with clear_padding_real_needs_padding_p types that are a=
llocated
> >>>>>>> to memory at expansion time, I'd say much better would be to do t=
he stores
> >>>>>>> using integral modes rather than XFmode, so do that:
> >>>>>>>     movabsq $-72340172838076674, %rax
> >>>>>>>    movq    %rax, -32(%rbp)
> >>>>>>>    movq    %rax, -24(%rbp)
> >>>>>>> directly.  That is the only way to ensure the padding bits are in=
itialized
> >>>>>>> (or expand __builtin_clear_padding, but then you initialize separ=
ately the
> >>>>>>> value bits and padding bits).
> >>>>>>>
> >>>>>>> Bootstrapped/regtested on x86_64-linux and i686-linux, though as =
mentioned
> >>>>>>> above, the gcc.target/i386/auto-init-4.c case is unresolved.
> >>>>>>
> >>>>>> Thanks, I will try to fix this testing case in a later patch.
> >>>>>
> >>>>> I've looked at this FAIL now and really wonder whether "pattern ini=
t" as
> >>>>> implemented makes any sense for non-integral types.
> >>>>> We end up with
> >>>>> initializing a register (SSA name) with
> >>>>>
> >>>>> VIEW_CONVERT_EXPR<long double>(0xfefefefefefefefefefefefefefefefe)
> >>>>>
> >>>>> as we go building a TImode constant (we verified we have a TImode S=
ET!)
> >>>>> but then
> >>>>>
> >>>>>        /* Pun the LHS to make sure its type has constant size
> >>>>>           unless it is an SSA name where that's already known.  */
> >>>>>        if (TREE_CODE (lhs) !=3D SSA_NAME)
> >>>>>          lhs =3D build1 (VIEW_CONVERT_EXPR, itype, lhs);
> >>>>>        else
> >>>>>          init =3D fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), =
init);
> >>>>> ...
> >>>>>    expand_assignment (lhs, init, false);
> >>>>>
> >>>>> and generally registers do not have any padding.  This weird expans=
ion
> >>>>> then causes us to spill the TImode constant and reload the XFmode v=
alue,
> >>>>> which is definitely not optimal here.
> >>>>>
> >>>>> One approach to avoid the worse code generation would be to use mod=
e
> >>>>> specific patterns for registers (like using a NaN or a target speci=
fic
> >>>>> value that
> >>>>> can be loaded cheaply),
> >>>>
> >>>> You mean that using =E2=80=9Cmode specific patterns=E2=80=9D ONLY fo=
r registers?
> >>>> Can we use =E2=80=9Cmode specific patterns=E2=80=9D consistently for=
 both registers and memory?
> >>>
> >>> The difference is that registers don't have padding while memory
> >>> possibly does, also
> >>> for aggregates using different patterns is too expensive IMHO.  For
> >>> registers the extra
> >>> complication with generic patterns is that efficient initialization
> >>> (without going through memory)
> >>> should be a priority IMHO.
> >>>
> >>> And for stack memory I still think that initializing the full
> >>> allocated frame (including padding
> >>> between variables!) is the best approach.
> >>>
> >>>> LLVM use different patterns for different types (Integral, Float, po=
inter type, etc=E2=80=A6) in order to
> >>>> Catch bugs easily for different types.
> >>>>
> >>>> The beginning versions of this patch tried to do similar as LLVM, bu=
t later we gave up this and
> >>>> Use the same pattern for all different types.
> >>>>
> >>>> If now, we want to use =E2=80=9Cmode specific patterns=E2=80=9D for =
registers, I am wondering whether it=E2=80=99s
> >>>> Better to just use =E2=80=9Cmode specific patterns=E2=80=9D consiste=
ntly for both registers and memory?
> >>>>
> >>>>> another would be to simply fall back to zero
> >>>>> initialization
> >>>>> when we fail to constant fold the initializer like with
> >>>>>
> >>>>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> >>>>> index 8b1733e20c4..a4b995f71e4 100644
> >>>>> --- a/gcc/internal-fn.cc
> >>>>> +++ b/gcc/internal-fn.cc
> >>>>> @@ -3125,7 +3125,11 @@ expand_DEFERRED_INIT (internal_fn, gcall *st=
mt)
> >>>>>        if (TREE_CODE (lhs) !=3D SSA_NAME)
> >>>>>          lhs =3D build1 (VIEW_CONVERT_EXPR, itype, lhs);
> >>>>>        else
> >>>>> -           init =3D fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs=
), init);
> >>>>> +           {
> >>>>> +             init =3D fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (l=
hs), init);
> >>>>> +             if (!CONSTANT_CLASS_P (init))
> >>>>> +               init =3D build_zero_cst (TREE_TYPE (lhs));
> >>>>> +           }
> >>>>>      }
> >>>>>     else
> >>>>>      /* Use zero-init also for variable-length sizes.  */
> >>>>>
> >>>>> note that still does not address the issue noted by Jakub that we d=
o not
> >>>>> initialize padding this way - but of course that's because we expan=
d a
> >>>>> special assignment from .DEFERRED_INIT and are not initializing
> >>>>> the storage allocated for the variable on the stack (at -O0) by RTL
> >>>>> expansion.  Ideally .DEFERRED_INIT "expansion" for stack variables
> >>>>> would take place there, simply filling the allocated frame with the
> >>>>> pattern or zero.  That would probably mean that RTL expansion
> >>>>> should scoop up .DEFERRED_INITs for variables it decides not to
> >>>>> expand to pseudos and not leave that to stmt expansion.
> >>>>
> >>>> Need to study this a little bit more...
> >>
> >>
> >> So, Is what you mean in the above a complete rewrite of the current =
=E2=80=9Cexpand_DEFERRED_INIT=E2=80=9D:
> >>
> >> Instead of simply using =E2=80=9Cexpand_builtin_memset=E2=80=9D for =
=E2=80=9Cvariables that are not in register=E2=80=9D and  =E2=80=9Cexpand_a=
ssignment=E2=80=9D
> >> for =E2=80=9Cvariables that are in register=E2=80=9D,  RTL expansion s=
hould directly expand this call in a lower level:
> >>
> >> i.e,
> >>
> >>  tree lhs =3D gimple_call_lhs (stmt);
> >>  =E2=80=A6
> >>
> >>  rtx target =3D expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> >>
> >>  If (MEM_P (target))   // the variable is allocated on stack
> >>  {
> >>     // filling the allocated frame with pattern or zero.   How to do i=
t??
> >>  }
> >>  else                     // the variable is in pseudo register.
> >>  {
> >>    rtx init_rtx =3D =E2=80=A6;
> >>    emit_move_insn (target, init_rtx)
> >>  }
> >>
> >>
> >> Is the above a correct understanding?
> >
> > No, I would simply expand the automatic in-memory var .DEFERRED_INIT
> > calls to nothing
> > but instead process their effect at the time we do
> > expand_one_stack_var, which means at
> > CFG expansion time scan for the .DEFERRED_INITs and record whether and =
how to
> > initialize the variables.  We can then group the variables accordingly
> > and block-initialize
> > the stack space also covering padding inbetween variables.
>
> I spent some time reading cfgexpand, especially the =E2=80=9Cexpand_used_=
vars=E2=80=9D part. I have the following questions:
>
> 1. Does =E2=80=9CGroup variables accordingly=E2=80=9D mean: for all the a=
uto variables that associate with .DEFERRED_INIT, during =E2=80=9Cexpand_us=
ed_var=E2=80=9D
> Phase, we should: first =E2=80=9Cdefer the stack allocation=E2=80=9D and =
marking them as =E2=80=9CDEFERRED_INIT=E2=80=9D , and later, during the sta=
ck grouping phase, in addition to the current
> =E2=80=9CPhase 1=E2=80=9D, =E2=80=9Cphase 2=E2=80=9D, =E2=80=9Cphase 3=E2=
=80=9D =E2=80=9Cexpand_stack_vars=E2=80=9D, add another phase in  =E2=80=9C=
expand_stack_vars=E2=80=9D for the ones marked with =E2=80=9CDEFERRED_INIT=
=E2=80=9D, group them together,
> And then =E2=80=9Cblock-initialize=E2=80=9D this part of the stack space?

Yes.

> 2. Looks like that =E2=80=9Cexpand_used_vars=E2=80=9D is done BEFORE =E2=
=80=9Cexpand_gimple_basic_block=E2=80=9D, So, if we insert the pattern-init=
ialization RTLs for
> Block-initializing  the stack space, how can we do it in the =E2=80=9Cexp=
and_used_vars=E2=80=9D phase before all the RTL statements are emitted?

We probably have to emit the actual block initialization at the start
of the function.

>
> Qing
>
> >
> > Richard.
> >
> >> Thanks.
> >>
> >> Qing
> >>
> >>>>
> >>>>>
> >>>>> I'm going to simply XFAIL this testcase for GCC 12 and opened
> >>>>> PR105259 so we can revisit the above for GCC 13.
> >>>>
> >>>> I can further study this bug and try to come up with a good solution=
 in gcc13.
> >>>>
> >>>> Thank you!
> >>>>
> >>>> Qing
> >>>>>
> >>>>> Richard.
> >>>>>
> >>>>>> Qing
>