From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vs1-xe2d.google.com (mail-vs1-xe2d.google.com [IPv6:2607:f8b0:4864:20::e2d]) by sourceware.org (Postfix) with ESMTPS id C76203858D28 for ; Mon, 25 Apr 2022 06:24:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C76203858D28 Received: by mail-vs1-xe2d.google.com with SMTP id d2so6376648vsd.12 for ; Sun, 24 Apr 2022 23:24:16 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=LPe2Htu6YxY0KPul1IIOOcRExQ4jV5pu5lIRYL+4mFo=; b=RIAVh/hHNNL3GAf6om+W7EhZTUujjAwz51yp5jSYAZ8jwxC9pJcvBO4yzb/9mPsmAi WYIehMqn8GKJv0Jn22n0hPU5/FNgvuVoFpUYYT16Rerik25noO+738joakl4usf4mOkK 0RlQsxjWSnSjoQSFnZcdNxoOX30fz8+dRnNJEcPKJyteq7uYic4JQWkuc06Ym9A7agPO CmDf/VNvEfQ5D+KxaP4neaCyC2ZN5YtcTBp7ZOx2uG7gQ7yzh5jUqi6rOjFS5lTgTQ1j ySTMAcspQkiyGq0a7xu+Za5ayLsZritxKikLLBq13SlyZ6t4xMNhP39ooXTYkJV5TPSM vl1Q== X-Gm-Message-State: AOAM531sewa7w98chpdOSHoqoNMIS6Iy00th87hPtiWy7zqophSp94I+ P0KvJHGAxBXnmjhLnQRZm2wzrXQf6sMhLc7TCiiqZMhd X-Google-Smtp-Source: ABdhPJyhDjErj1uH83hvdFk3GwxA7fSaFCko2ADKJHWuW8kg1QAVK/LfCyjMpNrVzCb4EWj8TbQdmtGnaxbZOR8tgwk= X-Received: by 2002:a05:6102:2752:b0:32c:db47:e6da with SMTP id p18-20020a056102275200b0032cdb47e6damr102255vsu.1.1650867855939; Sun, 24 Apr 2022 23:24:15 -0700 (PDT) MIME-Version: 1.0 References: <20220215095804.GL2646553@tucnak> <6685073D-CCC0-404E-82D1-0A3009AD3FC0@oracle.com> <06744640-90E7-413B-9EEB-B698C9D74959@oracle.com> <3FBB57A8-0B89-4FAA-A000-0C42B9136BFB@oracle.com> In-Reply-To: <3FBB57A8-0B89-4FAA-A000-0C42B9136BFB@oracle.com> From: Richard Biener Date: Mon, 25 Apr 2022 08:24:03 +0200 Message-ID: Subject: Re: [PATCH] fold, simplify-rtx: Punt on non-representable floating point constants [PR104522] To: Qing Zhao Cc: Jakub Jelinek , "gcc-patches@gcc.gnu.org" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-8.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 25 Apr 2022 06:24:19 -0000 On Fri, Apr 22, 2022 at 5:01 PM Qing Zhao wrote: > > > > > On Apr 20, 2022, at 5:38 AM, Richard Biener wrote: > > > > On Tue, Apr 19, 2022 at 11:36 PM Qing Zhao wrote= : > >> > >> > >> > >>> On Apr 14, 2022, at 1:53 AM, Richard Biener wrote: > >>> > >>> On Wed, Apr 13, 2022 at 5:22 PM Qing Zhao wrot= e: > >>>> > >>>> Hi, Richard, > >>>> > >>>> Thanks a lot for taking a look at this issue (and Sorry that I haven= =E2=80=99t fixed this one yet, I was distracted by other tasks then just fo= rgot this one=E2=80=A6.) > >>>> > >>>>> On Apr 13, 2022, at 3:41 AM, Richard Biener wrote: > >>>>> > >>>>> On Tue, Feb 15, 2022 at 5:31 PM Qing Zhao via Gcc-patches > >>>>> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>>> On Feb 15, 2022, at 3:58 AM, Jakub Jelinek wro= te: > >>>>>>> > >>>>>>> Hi! > >>>>>>> > >>>>>>> For IBM double double I've added in PR95450 and PR99648 verificat= ion that > >>>>>>> when we at the tree/GIMPLE or RTL level interpret target bytes as= a REAL_CST > >>>>>>> or CONST_DOUBLE constant, we try to encode it back to target byte= s and > >>>>>>> verify it is the same. > >>>>>>> This is because our real.c support isn't able to represent all va= lid values > >>>>>>> of IBM double double which has variable precision. > >>>>>>> In PR104522, it has been noted that we have similar problem with = the > >>>>>>> Intel/Motorola extended XFmode formats, our internal representati= on isn't > >>>>>>> able to record pseudo denormals, pseudo infinities, pseudo NaNs a= nd unnormal > >>>>>>> values. > >>>>>>> So, the following patch is an attempt to extend that verification= to all > >>>>>>> floats. > >>>>>>> Unfortunately, it wasn't that straightforward, because the > >>>>>>> __builtin_clear_padding code exactly for the XFmode long doubles = needs to > >>>>>>> discover what bits are padding and does that by interpreting memo= ry of > >>>>>>> all 1s. That is actually a valid supported value, a qNaN with ne= gative > >>>>>>> sign with all mantissa bits set, but the verification includes al= so the > >>>>>>> padding bits (exactly what __builtin_clear_padding wants to figur= e out) > >>>>>>> and so fails the comparison check and so we ICE. > >>>>>>> The patch fixes that case by moving that verification from > >>>>>>> native_interpret_real to its caller, so that clear_padding_type c= an > >>>>>>> call native_interpret_real and avoid that extra check. > >>>>>>> > >>>>>>> With this, the only thing that regresses in the testsuite is > >>>>>>> +FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times long\\t= -16843010 5 > >>>>>>> because it decides to use a pattern that has non-zero bits in the= padding > >>>>>>> bits of the long double, so the simplify-rtx.cc change prevents f= olding > >>>>>>> a SUBREG into a constant. We emit (the testcase is -O0 but we em= it worse > >>>>>>> code at all opt levels) something like: > >>>>>>> movabsq $-72340172838076674, %rax > >>>>>>> movabsq $-72340172838076674, %rdx > >>>>>>> movq %rax, -48(%rbp) > >>>>>>> movq %rdx, -40(%rbp) > >>>>>>> fldt -48(%rbp) > >>>>>>> fstpt -32(%rbp) > >>>>>>> instead of > >>>>>>> fldt .LC2(%rip) > >>>>>>> fstpt -32(%rbp) > >>>>>>> ... > >>>>>>> .LC2: > >>>>>>> .long -16843010 > >>>>>>> .long -16843010 > >>>>>>> .long 65278 > >>>>>>> .long 0 > >>>>>>> Note, neither of those sequences actually stores the padding bits= , fstpt > >>>>>>> simply doesn't touch them. > >>>>>>> For vars with clear_padding_real_needs_padding_p types that are a= llocated > >>>>>>> to memory at expansion time, I'd say much better would be to do t= he stores > >>>>>>> using integral modes rather than XFmode, so do that: > >>>>>>> movabsq $-72340172838076674, %rax > >>>>>>> movq %rax, -32(%rbp) > >>>>>>> movq %rax, -24(%rbp) > >>>>>>> directly. That is the only way to ensure the padding bits are in= itialized > >>>>>>> (or expand __builtin_clear_padding, but then you initialize separ= ately the > >>>>>>> value bits and padding bits). > >>>>>>> > >>>>>>> Bootstrapped/regtested on x86_64-linux and i686-linux, though as = mentioned > >>>>>>> above, the gcc.target/i386/auto-init-4.c case is unresolved. > >>>>>> > >>>>>> Thanks, I will try to fix this testing case in a later patch. > >>>>> > >>>>> I've looked at this FAIL now and really wonder whether "pattern ini= t" as > >>>>> implemented makes any sense for non-integral types. > >>>>> We end up with > >>>>> initializing a register (SSA name) with > >>>>> > >>>>> VIEW_CONVERT_EXPR(0xfefefefefefefefefefefefefefefefe) > >>>>> > >>>>> as we go building a TImode constant (we verified we have a TImode S= ET!) > >>>>> but then > >>>>> > >>>>> /* Pun the LHS to make sure its type has constant size > >>>>> unless it is an SSA name where that's already known. */ > >>>>> if (TREE_CODE (lhs) !=3D SSA_NAME) > >>>>> lhs =3D build1 (VIEW_CONVERT_EXPR, itype, lhs); > >>>>> else > >>>>> init =3D fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), = init); > >>>>> ... > >>>>> expand_assignment (lhs, init, false); > >>>>> > >>>>> and generally registers do not have any padding. This weird expans= ion > >>>>> then causes us to spill the TImode constant and reload the XFmode v= alue, > >>>>> which is definitely not optimal here. > >>>>> > >>>>> One approach to avoid the worse code generation would be to use mod= e > >>>>> specific patterns for registers (like using a NaN or a target speci= fic > >>>>> value that > >>>>> can be loaded cheaply), > >>>> > >>>> You mean that using =E2=80=9Cmode specific patterns=E2=80=9D ONLY fo= r registers? > >>>> Can we use =E2=80=9Cmode specific patterns=E2=80=9D consistently for= both registers and memory? > >>> > >>> The difference is that registers don't have padding while memory > >>> possibly does, also > >>> for aggregates using different patterns is too expensive IMHO. For > >>> registers the extra > >>> complication with generic patterns is that efficient initialization > >>> (without going through memory) > >>> should be a priority IMHO. > >>> > >>> And for stack memory I still think that initializing the full > >>> allocated frame (including padding > >>> between variables!) is the best approach. > >>> > >>>> LLVM use different patterns for different types (Integral, Float, po= inter type, etc=E2=80=A6) in order to > >>>> Catch bugs easily for different types. > >>>> > >>>> The beginning versions of this patch tried to do similar as LLVM, bu= t later we gave up this and > >>>> Use the same pattern for all different types. > >>>> > >>>> If now, we want to use =E2=80=9Cmode specific patterns=E2=80=9D for = registers, I am wondering whether it=E2=80=99s > >>>> Better to just use =E2=80=9Cmode specific patterns=E2=80=9D consiste= ntly for both registers and memory? > >>>> > >>>>> another would be to simply fall back to zero > >>>>> initialization > >>>>> when we fail to constant fold the initializer like with > >>>>> > >>>>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > >>>>> index 8b1733e20c4..a4b995f71e4 100644 > >>>>> --- a/gcc/internal-fn.cc > >>>>> +++ b/gcc/internal-fn.cc > >>>>> @@ -3125,7 +3125,11 @@ expand_DEFERRED_INIT (internal_fn, gcall *st= mt) > >>>>> if (TREE_CODE (lhs) !=3D SSA_NAME) > >>>>> lhs =3D build1 (VIEW_CONVERT_EXPR, itype, lhs); > >>>>> else > >>>>> - init =3D fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs= ), init); > >>>>> + { > >>>>> + init =3D fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (l= hs), init); > >>>>> + if (!CONSTANT_CLASS_P (init)) > >>>>> + init =3D build_zero_cst (TREE_TYPE (lhs)); > >>>>> + } > >>>>> } > >>>>> else > >>>>> /* Use zero-init also for variable-length sizes. */ > >>>>> > >>>>> note that still does not address the issue noted by Jakub that we d= o not > >>>>> initialize padding this way - but of course that's because we expan= d a > >>>>> special assignment from .DEFERRED_INIT and are not initializing > >>>>> the storage allocated for the variable on the stack (at -O0) by RTL > >>>>> expansion. Ideally .DEFERRED_INIT "expansion" for stack variables > >>>>> would take place there, simply filling the allocated frame with the > >>>>> pattern or zero. That would probably mean that RTL expansion > >>>>> should scoop up .DEFERRED_INITs for variables it decides not to > >>>>> expand to pseudos and not leave that to stmt expansion. > >>>> > >>>> Need to study this a little bit more... > >> > >> > >> So, Is what you mean in the above a complete rewrite of the current = =E2=80=9Cexpand_DEFERRED_INIT=E2=80=9D: > >> > >> Instead of simply using =E2=80=9Cexpand_builtin_memset=E2=80=9D for = =E2=80=9Cvariables that are not in register=E2=80=9D and =E2=80=9Cexpand_a= ssignment=E2=80=9D > >> for =E2=80=9Cvariables that are in register=E2=80=9D, RTL expansion s= hould directly expand this call in a lower level: > >> > >> i.e, > >> > >> tree lhs =3D gimple_call_lhs (stmt); > >> =E2=80=A6 > >> > >> rtx target =3D expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); > >> > >> If (MEM_P (target)) // the variable is allocated on stack > >> { > >> // filling the allocated frame with pattern or zero. How to do i= t?? > >> } > >> else // the variable is in pseudo register. > >> { > >> rtx init_rtx =3D =E2=80=A6; > >> emit_move_insn (target, init_rtx) > >> } > >> > >> > >> Is the above a correct understanding? > > > > No, I would simply expand the automatic in-memory var .DEFERRED_INIT > > calls to nothing > > but instead process their effect at the time we do > > expand_one_stack_var, which means at > > CFG expansion time scan for the .DEFERRED_INITs and record whether and = how to > > initialize the variables. We can then group the variables accordingly > > and block-initialize > > the stack space also covering padding inbetween variables. > > I spent some time reading cfgexpand, especially the =E2=80=9Cexpand_used_= vars=E2=80=9D part. I have the following questions: > > 1. Does =E2=80=9CGroup variables accordingly=E2=80=9D mean: for all the a= uto variables that associate with .DEFERRED_INIT, during =E2=80=9Cexpand_us= ed_var=E2=80=9D > Phase, we should: first =E2=80=9Cdefer the stack allocation=E2=80=9D and = marking them as =E2=80=9CDEFERRED_INIT=E2=80=9D , and later, during the sta= ck grouping phase, in addition to the current > =E2=80=9CPhase 1=E2=80=9D, =E2=80=9Cphase 2=E2=80=9D, =E2=80=9Cphase 3=E2= =80=9D =E2=80=9Cexpand_stack_vars=E2=80=9D, add another phase in =E2=80=9C= expand_stack_vars=E2=80=9D for the ones marked with =E2=80=9CDEFERRED_INIT= =E2=80=9D, group them together, > And then =E2=80=9Cblock-initialize=E2=80=9D this part of the stack space? Yes. > 2. Looks like that =E2=80=9Cexpand_used_vars=E2=80=9D is done BEFORE =E2= =80=9Cexpand_gimple_basic_block=E2=80=9D, So, if we insert the pattern-init= ialization RTLs for > Block-initializing the stack space, how can we do it in the =E2=80=9Cexp= and_used_vars=E2=80=9D phase before all the RTL statements are emitted? We probably have to emit the actual block initialization at the start of the function. > > Qing > > > > > Richard. > > > >> Thanks. > >> > >> Qing > >> > >>>> > >>>>> > >>>>> I'm going to simply XFAIL this testcase for GCC 12 and opened > >>>>> PR105259 so we can revisit the above for GCC 13. > >>>> > >>>> I can further study this bug and try to come up with a good solution= in gcc13. > >>>> > >>>> Thank you! > >>>> > >>>> Qing > >>>>> > >>>>> Richard. > >>>>> > >>>>>> Qing >