public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Qing Zhao <qing.zhao@oracle.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] fold, simplify-rtx: Punt on non-representable floating point constants [PR104522]
Date: Fri, 22 Apr 2022 15:26:10 +0000	[thread overview]
Message-ID: <76198A1E-D02D-4615-9B55-705B728AA174@oracle.com> (raw)
In-Reply-To: <CAFiYyc0-ttgX1dtdOp=XH+F99msZFp==uyijiXgCty4rr-7-0A@mail.gmail.com>



> On Apr 21, 2022, at 2:09 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Wed, Apr 20, 2022 at 6:02 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>> 
>> 
>> 
>>> On Apr 20, 2022, at 5:38 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>> 
>>> On Tue, Apr 19, 2022 at 11:36 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Apr 14, 2022, at 1:53 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>> 
>>>>> On Wed, Apr 13, 2022 at 5:22 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>>>>>> 
>>>>>> Hi, Richard,
>>>>>> 
>>>>>> Thanks a lot for taking a look at this issue (and Sorry that I haven’t fixed this one yet, I was distracted by other tasks then just forgot this one….)
>>>>>> 
>>>>>>> On Apr 13, 2022, at 3:41 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>>>> 
>>>>>>> On Tue, Feb 15, 2022 at 5:31 PM Qing Zhao via Gcc-patches
>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Feb 15, 2022, at 3:58 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>>>>>>>>> 
>>>>>>>>> Hi!
>>>>>>>>> 
>>>>>>>>> For IBM double double I've added in PR95450 and PR99648 verification that
>>>>>>>>> when we at the tree/GIMPLE or RTL level interpret target bytes as a REAL_CST
>>>>>>>>> or CONST_DOUBLE constant, we try to encode it back to target bytes and
>>>>>>>>> verify it is the same.
>>>>>>>>> This is because our real.c support isn't able to represent all valid values
>>>>>>>>> of IBM double double which has variable precision.
>>>>>>>>> In PR104522, it has been noted that we have similar problem with the
>>>>>>>>> Intel/Motorola extended XFmode formats, our internal representation isn't
>>>>>>>>> able to record pseudo denormals, pseudo infinities, pseudo NaNs and unnormal
>>>>>>>>> values.
>>>>>>>>> So, the following patch is an attempt to extend that verification to all
>>>>>>>>> floats.
>>>>>>>>> Unfortunately, it wasn't that straightforward, because the
>>>>>>>>> __builtin_clear_padding code exactly for the XFmode long doubles needs to
>>>>>>>>> discover what bits are padding and does that by interpreting memory of
>>>>>>>>> all 1s.  That is actually a valid supported value, a qNaN with negative
>>>>>>>>> sign with all mantissa bits set, but the verification includes also the
>>>>>>>>> padding bits (exactly what __builtin_clear_padding wants to figure out)
>>>>>>>>> and so fails the comparison check and so we ICE.
>>>>>>>>> The patch fixes that case by moving that verification from
>>>>>>>>> native_interpret_real to its caller, so that clear_padding_type can
>>>>>>>>> call native_interpret_real and avoid that extra check.
>>>>>>>>> 
>>>>>>>>> With this, the only thing that regresses in the testsuite is
>>>>>>>>> +FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times long\\t-16843010 5
>>>>>>>>> because it decides to use a pattern that has non-zero bits in the padding
>>>>>>>>> bits of the long double, so the simplify-rtx.cc change prevents folding
>>>>>>>>> a SUBREG into a constant.  We emit (the testcase is -O0 but we emit worse
>>>>>>>>> code at all opt levels) something like:
>>>>>>>>>    movabsq $-72340172838076674, %rax
>>>>>>>>>    movabsq $-72340172838076674, %rdx
>>>>>>>>>    movq    %rax, -48(%rbp)
>>>>>>>>>    movq    %rdx, -40(%rbp)
>>>>>>>>>    fldt    -48(%rbp)
>>>>>>>>>    fstpt   -32(%rbp)
>>>>>>>>> instead of
>>>>>>>>>    fldt    .LC2(%rip)
>>>>>>>>>    fstpt   -32(%rbp)
>>>>>>>>> ...
>>>>>>>>> .LC2:
>>>>>>>>>    .long   -16843010
>>>>>>>>>    .long   -16843010
>>>>>>>>>    .long   65278
>>>>>>>>>    .long   0
>>>>>>>>> Note, neither of those sequences actually stores the padding bits, fstpt
>>>>>>>>> simply doesn't touch them.
>>>>>>>>> For vars with clear_padding_real_needs_padding_p types that are allocated
>>>>>>>>> to memory at expansion time, I'd say much better would be to do the stores
>>>>>>>>> using integral modes rather than XFmode, so do that:
>>>>>>>>>    movabsq $-72340172838076674, %rax
>>>>>>>>>   movq    %rax, -32(%rbp)
>>>>>>>>>   movq    %rax, -24(%rbp)
>>>>>>>>> directly.  That is the only way to ensure the padding bits are initialized
>>>>>>>>> (or expand __builtin_clear_padding, but then you initialize separately the
>>>>>>>>> value bits and padding bits).
>>>>>>>>> 
>>>>>>>>> Bootstrapped/regtested on x86_64-linux and i686-linux, though as mentioned
>>>>>>>>> above, the gcc.target/i386/auto-init-4.c case is unresolved.
>>>>>>>> 
>>>>>>>> Thanks, I will try to fix this testing case in a later patch.
>>>>>>> 
>>>>>>> I've looked at this FAIL now and really wonder whether "pattern init" as
>>>>>>> implemented makes any sense for non-integral types.
>>>>>>> We end up with
>>>>>>> initializing a register (SSA name) with
>>>>>>> 
>>>>>>> VIEW_CONVERT_EXPR<long double>(0xfefefefefefefefefefefefefefefefe)
>>>>>>> 
>>>>>>> as we go building a TImode constant (we verified we have a TImode SET!)
>>>>>>> but then
>>>>>>> 
>>>>>>>       /* Pun the LHS to make sure its type has constant size
>>>>>>>          unless it is an SSA name where that's already known.  */
>>>>>>>       if (TREE_CODE (lhs) != SSA_NAME)
>>>>>>>         lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs);
>>>>>>>       else
>>>>>>>         init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init);
>>>>>>> ...
>>>>>>>   expand_assignment (lhs, init, false);
>>>>>>> 
>>>>>>> and generally registers do not have any padding.  This weird expansion
>>>>>>> then causes us to spill the TImode constant and reload the XFmode value,
>>>>>>> which is definitely not optimal here.
>>>>>>> 
>>>>>>> One approach to avoid the worse code generation would be to use mode
>>>>>>> specific patterns for registers (like using a NaN or a target specific
>>>>>>> value that
>>>>>>> can be loaded cheaply),
>>>>>> 
>>>>>> You mean that using “mode specific patterns” ONLY for registers?
>>>>>> Can we use “mode specific patterns” consistently for both registers and memory?
>>>>> 
>>>>> The difference is that registers don't have padding while memory
>>>>> possibly does, also
>>>>> for aggregates using different patterns is too expensive IMHO.  For
>>>>> registers the extra
>>>>> complication with generic patterns is that efficient initialization
>>>>> (without going through memory)
>>>>> should be a priority IMHO.
>>>>> 
>>>>> And for stack memory I still think that initializing the full
>>>>> allocated frame (including padding
>>>>> between variables!) is the best approach.
>>>>> 
>>>>>> LLVM use different patterns for different types (Integral, Float, pointer type, etc…) in order to
>>>>>> Catch bugs easily for different types.
>>>>>> 
>>>>>> The beginning versions of this patch tried to do similar as LLVM, but later we gave up this and
>>>>>> Use the same pattern for all different types.
>>>>>> 
>>>>>> If now, we want to use “mode specific patterns” for registers, I am wondering whether it’s
>>>>>> Better to just use “mode specific patterns” consistently for both registers and memory?
>>>>>> 
>>>>>>> another would be to simply fall back to zero
>>>>>>> initialization
>>>>>>> when we fail to constant fold the initializer like with
>>>>>>> 
>>>>>>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>>>>>>> index 8b1733e20c4..a4b995f71e4 100644
>>>>>>> --- a/gcc/internal-fn.cc
>>>>>>> +++ b/gcc/internal-fn.cc
>>>>>>> @@ -3125,7 +3125,11 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>>>>>>>       if (TREE_CODE (lhs) != SSA_NAME)
>>>>>>>         lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs);
>>>>>>>       else
>>>>>>> -           init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init);
>>>>>>> +           {
>>>>>>> +             init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init);
>>>>>>> +             if (!CONSTANT_CLASS_P (init))
>>>>>>> +               init = build_zero_cst (TREE_TYPE (lhs));
>>>>>>> +           }
>>>>>>>     }
>>>>>>>    else
>>>>>>>     /* Use zero-init also for variable-length sizes.  */
>>>>>>> 
>>>>>>> note that still does not address the issue noted by Jakub that we do not
>>>>>>> initialize padding this way - but of course that's because we expand a
>>>>>>> special assignment from .DEFERRED_INIT and are not initializing
>>>>>>> the storage allocated for the variable on the stack (at -O0) by RTL
>>>>>>> expansion.  Ideally .DEFERRED_INIT "expansion" for stack variables
>>>>>>> would take place there, simply filling the allocated frame with the
>>>>>>> pattern or zero.  That would probably mean that RTL expansion
>>>>>>> should scoop up .DEFERRED_INITs for variables it decides not to
>>>>>>> expand to pseudos and not leave that to stmt expansion.
>>>>>> 
>>>>>> Need to study this a little bit more...
>>>> 
>>>> 
>>>> So, Is what you mean in the above a complete rewrite of the current “expand_DEFERRED_INIT”:
>>>> 
>>>> Instead of simply using “expand_builtin_memset” for “variables that are not in register” and  “expand_assignment”
>>>> for “variables that are in register”,  RTL expansion should directly expand this call in a lower level:
>>>> 
>>>> i.e,
>>>> 
>>>> tree lhs = gimple_call_lhs (stmt);
>>>> …
>>>> 
>>>> rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>>>> 
>>>> If (MEM_P (target))   // the variable is allocated on stack
>>>> {
>>>>    // filling the allocated frame with pattern or zero.   How to do it??
>>>> }
>>>> else                     // the variable is in pseudo register.
>>>> {
>>>>   rtx init_rtx = …;
>>>>   emit_move_insn (target, init_rtx)
>>>> }
>>>> 
>>>> 
>>>> Is the above a correct understanding?
>>> 
>>> No, I would simply expand the automatic in-memory var .DEFERRED_INIT
>>> calls to nothing
>>> but instead process their effect at the time we do
>>> expand_one_stack_var, which means at
>>> CFG expansion time scan for the .DEFERRED_INITs and record whether and how to
>>> initialize the variables.  We can then group the variables accordingly
>>> and block-initialize
>>> the stack space also covering padding inbetween variables.
>> 
>> So, the above only covers the variables that will be allocated in stack?
> 
> Yes.
> 
>> For the variables that are in pseudo registers, we still need to simply expand the initialization to a move insn?
> 
> Yes, also for stack VLAs.

Why for stack VLAs? 
> 
>> And for the variables in pseudo registers, later after register allocation, some of them will be spilled into stack, too.
>> Specially for long double/complex double variables that might have padding, shall we specially handle them during that time?
> 
> Ideally we'd never spill uninitialized variables but definitely the
> auto-init can cause the need to spill that very
> value.  That means that auto-init of registers should happen after RA?

All the initialization of registers should happen after RA? Only the ones with “long double/complex double” type that might have padding?

> One would need to see what
> IRA/LRA do to uninitialized pseudo uses.  Computing may uninit uses
> after RA should be possible and
> hopefully all ISAs allow initializing hard registers without requiring
> a scratch register (heh - I'm almost sure
> there will be exceptions …)
Need more study here….

Qing
> 
> Richard.
> 
>> Qing
>>> 
>>> Richard.
>>> 
>>>> Thanks.
>>>> 
>>>> Qing
>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> I'm going to simply XFAIL this testcase for GCC 12 and opened
>>>>>>> PR105259 so we can revisit the above for GCC 13.
>>>>>> 
>>>>>> I can further study this bug and try to come up with a good solution in gcc13.
>>>>>> 
>>>>>> Thank you!
>>>>>> 
>>>>>> Qing
>>>>>>> 
>>>>>>> Richard.
>>>>>>> 
>>>>>>>> Qing


  reply	other threads:[~2022-04-22 15:26 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-15  9:58 Jakub Jelinek
2022-02-15 10:55 ` Richard Biener
2022-02-15 16:30 ` Qing Zhao
2022-04-13  8:41   ` Richard Biener
2022-04-13 15:22     ` Qing Zhao
2022-04-14  6:53       ` Richard Biener
2022-04-19 21:36         ` Qing Zhao
2022-04-20 10:38           ` Richard Biener
2022-04-20 16:02             ` Qing Zhao
2022-04-21  7:09               ` Richard Biener
2022-04-22 15:26                 ` Qing Zhao [this message]
2022-04-25  6:27                   ` Richard Biener
2022-04-22 15:00             ` Qing Zhao
2022-04-25  6:24               ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=76198A1E-D02D-4615-9B55-705B728AA174@oracle.com \
    --to=qing.zhao@oracle.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).