public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
@ 2021-11-04 20:37 Qing Zhao
  2021-11-05  1:34 ` Kewen.Lin
  2021-11-05  6:05 ` Andrew Pinski
  0 siblings, 2 replies; 18+ messages in thread
From: Qing Zhao @ 2021-11-04 20:37 UTC (permalink / raw)
  To: gcc-patches Nick Alcock via

Hi,

I noticed that the macro “WIDE_INT_MAX_ELTS” has different values in GCC11 and GCC12 (on the same X86 machine)

For gcc11:

wide int max elts =3

For gcc12:

wide int max elts =9

Does anyone know what’s the reason for this difference? 

Thanks a lot for any help.

Qing

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-04 20:37 Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different Qing Zhao
@ 2021-11-05  1:34 ` Kewen.Lin
  2021-11-05  6:05 ` Andrew Pinski
  1 sibling, 0 replies; 18+ messages in thread
From: Kewen.Lin @ 2021-11-05  1:34 UTC (permalink / raw)
  To: Qing Zhao; +Cc: gcc-patches Nick Alcock via, H.J. Lu

Hi Qing,

on 2021/11/5 上午4:37, Qing Zhao via Gcc-patches wrote:
> Hi,
> 
> I noticed that the macro “WIDE_INT_MAX_ELTS” has different values in GCC11 and GCC12 (on the same X86 machine)
> 
> For gcc11:
> 
> wide int max elts =3
> 
> For gcc12:
> 
> wide int max elts =9
> 
> Does anyone know what’s the reason for this difference? 
> 

I guess it's due to commit r12-979 (782e57f2c09).

For

  #define WIDE_INT_MAX_ELTS \
    ((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT) / HOST_BITS_PER_WIDE_INT

Before the change, the MAX_BITSIZE_MODE_ANY_INT is explicitly set as 160.

  -#define MAX_BITSIZE_MODE_ANY_INT (160)

  it's (160+64)/64 = 3

After the change, MAX_BITSIZE_MODE_ANY_INT is counted in function emit_max_int
and becomes 512.

  it's (512+64)/64 = 9

As the commit log, the previous 160 bits seems a workaround for some gone
problem, now the commit makes it use the default way to align with the other
ports.

BR,
Kewen

> Thanks a lot for any help.
> 
> Qing
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-04 20:37 Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different Qing Zhao
  2021-11-05  1:34 ` Kewen.Lin
@ 2021-11-05  6:05 ` Andrew Pinski
  2021-11-05  6:53   ` Jakub Jelinek
  1 sibling, 1 reply; 18+ messages in thread
From: Andrew Pinski @ 2021-11-05  6:05 UTC (permalink / raw)
  To: Qing Zhao; +Cc: gcc-patches Nick Alcock via

On Thu, Nov 4, 2021 at 1:38 PM Qing Zhao via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> I noticed that the macro “WIDE_INT_MAX_ELTS” has different values in GCC11 and GCC12 (on the same X86 machine)
>
> For gcc11:
>
> wide int max elts =3
>
> For gcc12:
>
> wide int max elts =9
>
> Does anyone know what’s the reason for this difference?
>
> Thanks a lot for any help.

Yes originally, the x86 backend only used OI and XI modes for vectors
during data movement.
This changed with r10-5741-gc57b4c22089 which added the use of OI mode
for TImode adding with overflow and then MAX_BITSIZE_MODE_ANY_INT
changed from 128 to 160 (in r10-6178-gc124b345e46078) to fix the ICE
introduced by that change .
And then with r12-979-g782e57f2c09 removed the define of
MAX_BITSIZE_MODE_ANY_INT.
Now what was not mentioned in r12-979-g782e57f2c09 (or before) of why
MAX_BITSIZE_MODE_ANY_INT was defined in the first place for x86. HJL
assumed there was some problem of why it was defined that way but not
realizing memory usage was the reason.
It was defined to keep the memory usage down as you see that it is now
almost a 3x memory increase for all wi::wide_int.
I do think r12-979-g782e57f2c09 should be reverted with an added
comment on saying defining MAX_BITSIZE_MODE_ANY_INT here is to
decrease the memory footprint.


Thanks,
Andrew Pinski

>
> Qing

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-05  6:05 ` Andrew Pinski
@ 2021-11-05  6:53   ` Jakub Jelinek
  2021-11-05 10:01     ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2021-11-05  6:53 UTC (permalink / raw)
  To: Andrew Pinski, hjl.tools; +Cc: Qing Zhao, gcc-patches Nick Alcock via

On Thu, Nov 04, 2021 at 11:05:35PM -0700, Andrew Pinski via Gcc-patches wrote:
> > I noticed that the macro “WIDE_INT_MAX_ELTS” has different values in GCC11 and GCC12 (on the same X86 machine)
> >
> > For gcc11:
> >
> > wide int max elts =3
> >
> > For gcc12:
> >
> > wide int max elts =9
> >
> > Does anyone know what’s the reason for this difference?
> >
> > Thanks a lot for any help.
> 
> Yes originally, the x86 backend only used OI and XI modes for vectors
> during data movement.
> This changed with r10-5741-gc57b4c22089 which added the use of OI mode
> for TImode adding with overflow and then MAX_BITSIZE_MODE_ANY_INT
> changed from 128 to 160 (in r10-6178-gc124b345e46078) to fix the ICE
> introduced by that change .
> And then with r12-979-g782e57f2c09 removed the define of
> MAX_BITSIZE_MODE_ANY_INT.
> Now what was not mentioned in r12-979-g782e57f2c09 (or before) of why
> MAX_BITSIZE_MODE_ANY_INT was defined in the first place for x86. HJL
> assumed there was some problem of why it was defined that way but not
> realizing memory usage was the reason.
> It was defined to keep the memory usage down as you see that it is now
> almost a 3x memory increase for all wi::wide_int.
> I do think r12-979-g782e57f2c09 should be reverted with an added
> comment on saying defining MAX_BITSIZE_MODE_ANY_INT here is to
> decrease the memory footprint.

I completely agree.

	Jakub


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-05  6:53   ` Jakub Jelinek
@ 2021-11-05 10:01     ` Richard Biener
  2021-11-05 12:25       ` H.J. Lu
  2021-11-05 16:11       ` Qing Zhao
  0 siblings, 2 replies; 18+ messages in thread
From: Richard Biener @ 2021-11-05 10:01 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Andrew Pinski, H. J. Lu, gcc-patches Nick Alcock via

On Fri, Nov 5, 2021 at 7:54 AM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Nov 04, 2021 at 11:05:35PM -0700, Andrew Pinski via Gcc-patches wrote:
> > > I noticed that the macro “WIDE_INT_MAX_ELTS” has different values in GCC11 and GCC12 (on the same X86 machine)
> > >
> > > For gcc11:
> > >
> > > wide int max elts =3
> > >
> > > For gcc12:
> > >
> > > wide int max elts =9
> > >
> > > Does anyone know what’s the reason for this difference?
> > >
> > > Thanks a lot for any help.
> >
> > Yes originally, the x86 backend only used OI and XI modes for vectors
> > during data movement.
> > This changed with r10-5741-gc57b4c22089 which added the use of OI mode
> > for TImode adding with overflow and then MAX_BITSIZE_MODE_ANY_INT
> > changed from 128 to 160 (in r10-6178-gc124b345e46078) to fix the ICE
> > introduced by that change .
> > And then with r12-979-g782e57f2c09 removed the define of
> > MAX_BITSIZE_MODE_ANY_INT.
> > Now what was not mentioned in r12-979-g782e57f2c09 (or before) of why
> > MAX_BITSIZE_MODE_ANY_INT was defined in the first place for x86. HJL
> > assumed there was some problem of why it was defined that way but not
> > realizing memory usage was the reason.
> > It was defined to keep the memory usage down as you see that it is now
> > almost a 3x memory increase for all wi::wide_int.
> > I do think r12-979-g782e57f2c09 should be reverted with an added
> > comment on saying defining MAX_BITSIZE_MODE_ANY_INT here is to
> > decrease the memory footprint.
>
> I completely agree.

Do we have permanent objects embedding wide[st]_int?  I know of
class loop and loop_bound.  Btw, there are other targets with large
integer modes (aarch64 with XImode) and not defining
MAX_BITSIZE_MODE_ANY_INT

Richard.

>         Jakub
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-05 10:01     ` Richard Biener
@ 2021-11-05 12:25       ` H.J. Lu
  2021-11-05 16:11       ` Qing Zhao
  1 sibling, 0 replies; 18+ messages in thread
From: H.J. Lu @ 2021-11-05 12:25 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, Andrew Pinski, gcc-patches Nick Alcock via

On Fri, Nov 5, 2021 at 3:01 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Nov 5, 2021 at 7:54 AM Jakub Jelinek via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Nov 04, 2021 at 11:05:35PM -0700, Andrew Pinski via Gcc-patches wrote:
> > > > I noticed that the macro “WIDE_INT_MAX_ELTS” has different values in GCC11 and GCC12 (on the same X86 machine)
> > > >
> > > > For gcc11:
> > > >
> > > > wide int max elts =3
> > > >
> > > > For gcc12:
> > > >
> > > > wide int max elts =9
> > > >
> > > > Does anyone know what’s the reason for this difference?
> > > >
> > > > Thanks a lot for any help.
> > >
> > > Yes originally, the x86 backend only used OI and XI modes for vectors
> > > during data movement.
> > > This changed with r10-5741-gc57b4c22089 which added the use of OI mode
> > > for TImode adding with overflow and then MAX_BITSIZE_MODE_ANY_INT
> > > changed from 128 to 160 (in r10-6178-gc124b345e46078) to fix the ICE
> > > introduced by that change .
> > > And then with r12-979-g782e57f2c09 removed the define of
> > > MAX_BITSIZE_MODE_ANY_INT.
> > > Now what was not mentioned in r12-979-g782e57f2c09 (or before) of why
> > > MAX_BITSIZE_MODE_ANY_INT was defined in the first place for x86. HJL
> > > assumed there was some problem of why it was defined that way but not
> > > realizing memory usage was the reason.
> > > It was defined to keep the memory usage down as you see that it is now
> > > almost a 3x memory increase for all wi::wide_int.
> > > I do think r12-979-g782e57f2c09 should be reverted with an added
> > > comment on saying defining MAX_BITSIZE_MODE_ANY_INT here is to
> > > decrease the memory footprint.
> >
> > I completely agree.
>
> Do we have permanent objects embedding wide[st]_int?  I know of
> class loop and loop_bound.  Btw, there are other targets with large
> integer modes (aarch64 with XImode) and not defining
> MAX_BITSIZE_MODE_ANY_INT
>

MAX_BITSIZE_MODE_ANY_INT was removed so that YMM and ZMM
registers can be used for by_pieces operations.

-- 
H.J.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-05 10:01     ` Richard Biener
  2021-11-05 12:25       ` H.J. Lu
@ 2021-11-05 16:11       ` Qing Zhao
  2021-11-05 16:17         ` Jakub Jelinek
  1 sibling, 1 reply; 18+ messages in thread
From: Qing Zhao @ 2021-11-05 16:11 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, gcc-patches Nick Alcock via, Kewen. Lin, H.J. Lu,
	andrew Pinski

Thanks all for the information.
Based on the information so far, my understanding is that we cannot revert r12-979-g782e57f2c09 
Since it’s for enabling YMM and ZMM registers to be used for by_pieces operations on X86.
Let me know if I miss anything here.

FYI. 

This issue was found during my work to back port all the patches of -ftrivial-auto-var-init so far from 
GCC12 to GCC11. 

The following small testing case (_Complex long double)

_Complex long double result;

_Complex long double foo()
{
   _Complex long double temp3;

  result = temp3;
  return result;
}

Failed with -ftrivial-auto-var-init=pattern with GCC11 at the following Line 3087 at call to “build_nonstandard_integer_type”
: (expand_DEFERRED_INIT):

3076       if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
3077           && tree_fits_uhwi_p (var_size)
3078           && (init_type == AUTO_INIT_PATTERN
3079               || !is_gimple_reg_type (var_type))
3080           && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
3081                                 0).exists ())
3082         {
3083           unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
3084           unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
3085           memset (buf, (init_type == AUTO_INIT_PATTERN
3086                         ? INIT_PATTERN_VALUE : 0), total_bytes);
3087           tree itype = build_nonstandard_integer_type
3088                          (total_bytes * BITS_PER_UNIT, 1);

The exact failing point is at function “set_min_and_max_values_for_integral_type”:

2851   gcc_assert (precision <= WIDE_INT_MAX_PRECISION);

For _Complex long double,  “precision” is 256.  
In GCC11, “WIDE_INT_MAX_PRECISION” is 192,  in GCC12, it’s 512. 
As a result, the above assertion failed on GCC11. 

I am wondering what’s the best fix for this issue in gcc11? 

Qing


> On Nov 5, 2021, at 5:01 AM, Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> On Fri, Nov 5, 2021 at 7:54 AM Jakub Jelinek via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> 
>> On Thu, Nov 04, 2021 at 11:05:35PM -0700, Andrew Pinski via Gcc-patches wrote:
>>>> I noticed that the macro “WIDE_INT_MAX_ELTS” has different values in GCC11 and GCC12 (on the same X86 machine)
>>>> 
>>>> For gcc11:
>>>> 
>>>> wide int max elts =3
>>>> 
>>>> For gcc12:
>>>> 
>>>> wide int max elts =9
>>>> 
>>>> Does anyone know what’s the reason for this difference?
>>>> 
>>>> Thanks a lot for any help.
>>> 
>>> Yes originally, the x86 backend only used OI and XI modes for vectors
>>> during data movement.
>>> This changed with r10-5741-gc57b4c22089 which added the use of OI mode
>>> for TImode adding with overflow and then MAX_BITSIZE_MODE_ANY_INT
>>> changed from 128 to 160 (in r10-6178-gc124b345e46078) to fix the ICE
>>> introduced by that change .
>>> And then with r12-979-g782e57f2c09 removed the define of
>>> MAX_BITSIZE_MODE_ANY_INT.
>>> Now what was not mentioned in r12-979-g782e57f2c09 (or before) of why
>>> MAX_BITSIZE_MODE_ANY_INT was defined in the first place for x86. HJL
>>> assumed there was some problem of why it was defined that way but not
>>> realizing memory usage was the reason.
>>> It was defined to keep the memory usage down as you see that it is now
>>> almost a 3x memory increase for all wi::wide_int.
>>> I do think r12-979-g782e57f2c09 should be reverted with an added
>>> comment on saying defining MAX_BITSIZE_MODE_ANY_INT here is to
>>> decrease the memory footprint.
>> 
>> I completely agree.
> 
> Do we have permanent objects embedding wide[st]_int?  I know of
> class loop and loop_bound.  Btw, there are other targets with large
> integer modes (aarch64 with XImode) and not defining
> MAX_BITSIZE_MODE_ANY_INT
> 
> Richard.
> 
>>        Jakub


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-05 16:11       ` Qing Zhao
@ 2021-11-05 16:17         ` Jakub Jelinek
  2021-11-05 17:37           ` Qing Zhao
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2021-11-05 16:17 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Biener, gcc-patches Nick Alcock via, Kewen. Lin, H.J. Lu,
	andrew Pinski

On Fri, Nov 05, 2021 at 04:11:36PM +0000, Qing Zhao wrote:
> 3076       if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
> 3077           && tree_fits_uhwi_p (var_size)
> 3078           && (init_type == AUTO_INIT_PATTERN
> 3079               || !is_gimple_reg_type (var_type))
> 3080           && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
> 3081                                 0).exists ())
> 3082         {
> 3083           unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
> 3084           unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
> 3085           memset (buf, (init_type == AUTO_INIT_PATTERN
> 3086                         ? INIT_PATTERN_VALUE : 0), total_bytes);
> 3087           tree itype = build_nonstandard_integer_type
> 3088                          (total_bytes * BITS_PER_UNIT, 1);
> 
> The exact failing point is at function “set_min_and_max_values_for_integral_type”:
> 
> 2851   gcc_assert (precision <= WIDE_INT_MAX_PRECISION);
> 
> For _Complex long double,  “precision” is 256.  
> In GCC11, “WIDE_INT_MAX_PRECISION” is 192,  in GCC12, it’s 512. 
> As a result, the above assertion failed on GCC11. 
> 
> I am wondering what’s the best fix for this issue in gcc11? 

Even for gcc 12 the above is wrong, you can't blindly assume that
build_nonstandard_integer_type will work for arbitrary precisions,
and even if it works that it will actually work.
The fact that such a mode exist is one thing, but
targetm.scalar_mode_supported_p should be tested for whether the mode
is actually supported.

	Jakub


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-05 16:17         ` Jakub Jelinek
@ 2021-11-05 17:37           ` Qing Zhao
  2021-11-06  9:56             ` Jakub Jelinek
  0 siblings, 1 reply; 18+ messages in thread
From: Qing Zhao @ 2021-11-05 17:37 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, gcc-patches Nick Alcock via, Kewen. Lin, H.J. Lu,
	andrew Pinski



> On Nov 5, 2021, at 11:17 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> On Fri, Nov 05, 2021 at 04:11:36PM +0000, Qing Zhao wrote:
>> 3076       if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
>> 3077           && tree_fits_uhwi_p (var_size)
>> 3078           && (init_type == AUTO_INIT_PATTERN
>> 3079               || !is_gimple_reg_type (var_type))
>> 3080           && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
>> 3081                                 0).exists ())
>> 3082         {
>> 3083           unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
>> 3084           unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
>> 3085           memset (buf, (init_type == AUTO_INIT_PATTERN
>> 3086                         ? INIT_PATTERN_VALUE : 0), total_bytes);
>> 3087           tree itype = build_nonstandard_integer_type
>> 3088                          (total_bytes * BITS_PER_UNIT, 1);
>> 
>> The exact failing point is at function “set_min_and_max_values_for_integral_type”:
>> 
>> 2851   gcc_assert (precision <= WIDE_INT_MAX_PRECISION);
>> 
>> For _Complex long double,  “precision” is 256.  
>> In GCC11, “WIDE_INT_MAX_PRECISION” is 192,  in GCC12, it’s 512. 
>> As a result, the above assertion failed on GCC11. 
>> 
>> I am wondering what’s the best fix for this issue in gcc11? 
> 
> Even for gcc 12 the above is wrong, you can't blindly assume that
> build_nonstandard_integer_type will work for arbitrary precisions,
> and even if it works that it will actually work.
> The fact that such a mode exist is one thing, but
> targetm.scalar_mode_supported_p should be tested for whether the mode
> is actually supported.

You mean “int_mode_for_size().exists()” is not enough to make sure “build_nonstandard_integer_type” to be valid?
We should add “targetm.scalar_mode_supported_p” too ?

Qing
> 
> 	Jakub
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-05 17:37           ` Qing Zhao
@ 2021-11-06  9:56             ` Jakub Jelinek
  2021-11-08  8:41               ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2021-11-06  9:56 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Biener, gcc-patches Nick Alcock via, Kewen. Lin, H.J. Lu,
	andrew Pinski

On Fri, Nov 05, 2021 at 05:37:25PM +0000, Qing Zhao wrote:
> > On Nov 5, 2021, at 11:17 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > 
> > On Fri, Nov 05, 2021 at 04:11:36PM +0000, Qing Zhao wrote:
> >> 3076       if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
> >> 3077           && tree_fits_uhwi_p (var_size)
> >> 3078           && (init_type == AUTO_INIT_PATTERN
> >> 3079               || !is_gimple_reg_type (var_type))
> >> 3080           && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
> >> 3081                                 0).exists ())
> >> 3082         {
> >> 3083           unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
> >> 3084           unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
> >> 3085           memset (buf, (init_type == AUTO_INIT_PATTERN
> >> 3086                         ? INIT_PATTERN_VALUE : 0), total_bytes);
> >> 3087           tree itype = build_nonstandard_integer_type
> >> 3088                          (total_bytes * BITS_PER_UNIT, 1);
> >> 
> >> The exact failing point is at function “set_min_and_max_values_for_integral_type”:
> >> 
> >> 2851   gcc_assert (precision <= WIDE_INT_MAX_PRECISION);
> >> 
> >> For _Complex long double,  “precision” is 256.  
> >> In GCC11, “WIDE_INT_MAX_PRECISION” is 192,  in GCC12, it’s 512. 
> >> As a result, the above assertion failed on GCC11. 
> >> 
> >> I am wondering what’s the best fix for this issue in gcc11? 
> > 
> > Even for gcc 12 the above is wrong, you can't blindly assume that
> > build_nonstandard_integer_type will work for arbitrary precisions,
> > and even if it works that it will actually work.
> > The fact that such a mode exist is one thing, but
> > targetm.scalar_mode_supported_p should be tested for whether the mode
> > is actually supported.
> 
> You mean “int_mode_for_size().exists()” is not enough to make sure
> “build_nonstandard_integer_type” to be valid?  We should add
> “targetm.scalar_mode_supported_p” too ?

Yeah.  The former says whether the backend has that mode at all.
But some modes may be there only in some specific patterns but
without support for mov, add, etc.  Only for
targetm.scalar_mode_supported_p modes the backend guarantees that
one can use them e.g. in mode attribute and can expect expansion
to expand everything with that mode that is needed in some way.
E.g. only if targetm.scalar_mode_supported_p (TImode) the FEs
support __int128_t type, etc.

	Jakub


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-06  9:56             ` Jakub Jelinek
@ 2021-11-08  8:41               ` Richard Biener
  2021-11-08 23:47                 ` Qing Zhao
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2021-11-08  8:41 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Qing Zhao, gcc-patches Nick Alcock via, Kewen. Lin, H.J. Lu,
	andrew Pinski

On Sat, Nov 6, 2021 at 10:56 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Fri, Nov 05, 2021 at 05:37:25PM +0000, Qing Zhao wrote:
> > > On Nov 5, 2021, at 11:17 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Fri, Nov 05, 2021 at 04:11:36PM +0000, Qing Zhao wrote:
> > >> 3076       if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
> > >> 3077           && tree_fits_uhwi_p (var_size)
> > >> 3078           && (init_type == AUTO_INIT_PATTERN
> > >> 3079               || !is_gimple_reg_type (var_type))
> > >> 3080           && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
> > >> 3081                                 0).exists ())
> > >> 3082         {
> > >> 3083           unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
> > >> 3084           unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
> > >> 3085           memset (buf, (init_type == AUTO_INIT_PATTERN
> > >> 3086                         ? INIT_PATTERN_VALUE : 0), total_bytes);
> > >> 3087           tree itype = build_nonstandard_integer_type
> > >> 3088                          (total_bytes * BITS_PER_UNIT, 1);
> > >>
> > >> The exact failing point is at function “set_min_and_max_values_for_integral_type”:
> > >>
> > >> 2851   gcc_assert (precision <= WIDE_INT_MAX_PRECISION);
> > >>
> > >> For _Complex long double,  “precision” is 256.
> > >> In GCC11, “WIDE_INT_MAX_PRECISION” is 192,  in GCC12, it’s 512.
> > >> As a result, the above assertion failed on GCC11.
> > >>
> > >> I am wondering what’s the best fix for this issue in gcc11?
> > >
> > > Even for gcc 12 the above is wrong, you can't blindly assume that
> > > build_nonstandard_integer_type will work for arbitrary precisions,
> > > and even if it works that it will actually work.
> > > The fact that such a mode exist is one thing, but
> > > targetm.scalar_mode_supported_p should be tested for whether the mode
> > > is actually supported.
> >
> > You mean “int_mode_for_size().exists()” is not enough to make sure
> > “build_nonstandard_integer_type” to be valid?  We should add
> > “targetm.scalar_mode_supported_p” too ?
>
> Yeah.  The former says whether the backend has that mode at all.
> But some modes may be there only in some specific patterns but
> without support for mov, add, etc.  Only for
> targetm.scalar_mode_supported_p modes the backend guarantees that
> one can use them e.g. in mode attribute and can expect expansion
> to expand everything with that mode that is needed in some way.
> E.g. only if targetm.scalar_mode_supported_p (TImode) the FEs
> support __int128_t type, etc.

The memcpy folding code now checks

              scalar_int_mode mode;
              if (int_mode_for_size (ilen * 8, 0).exists (&mode)
                  && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
                  && have_insn_for (SET, mode)

thus specifically only have_insn_for (SET, mode), which I guess is
good enough for this case as well?

>         Jakub
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-08  8:41               ` Richard Biener
@ 2021-11-08 23:47                 ` Qing Zhao
  2021-11-09  7:13                   ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Qing Zhao @ 2021-11-08 23:47 UTC (permalink / raw)
  To: Richard Biener, Jakub Jelinek
  Cc: gcc-patches Nick Alcock via, Kewen. Lin, H.J. Lu, andrew Pinski

Hi, I tried both the following patches:

Patch1:

[opc@qinzhao-ol8u3-x86 gcc]$ git diff 
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0cba95411a6..ca49d2b4514 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
       /* If this variable is in a register use expand_assignment.
         For boolean scalars force zero-init.  */
       tree init;
+      scalar_int_mode var_mode;
       if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
          && tree_fits_uhwi_p (var_size)
          && (init_type == AUTO_INIT_PATTERN
              || !is_gimple_reg_type (var_type))
          && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
-                               0).exists ())
+                               0).exists (&var_mode)
+         && targetm.scalar_mode_supported_p (var_mode))
        {
          unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
          unsigned char *buf = (unsigned char *) xmalloc (total_bytes);

AND

Patch2:
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0cba95411a6..7f129655926 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
       /* If this variable is in a register use expand_assignment.
         For boolean scalars force zero-init.  */
       tree init;
+      scalar_int_mode var_mode;
       if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
          && tree_fits_uhwi_p (var_size)
          && (init_type == AUTO_INIT_PATTERN
              || !is_gimple_reg_type (var_type))
          && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
-                               0).exists ())
+                               0).exists (&var_mode)
+         && have_insn_for (SET, var_mode))
        {
          unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
          unsigned char *buf = (unsigned char *) xmalloc (total_bytes);

Have the same effect:

1. Resolved the ICE in gcc11;
2. For _Complex long double variables, both return FALSE, as a result, for PATTERN initialization of _Complex long double variables, now they are initialization with ZEROs instead of FEs.

Let me know you opinion on this, If the above 2 is okay, then I might pick the above Patch 1 for the final patch to this issue.

Thanks.

Qing

> On Nov 8, 2021, at 2:41 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Sat, Nov 6, 2021 at 10:56 AM Jakub Jelinek <jakub@redhat.com> wrote:
>> 
>> On Fri, Nov 05, 2021 at 05:37:25PM +0000, Qing Zhao wrote:
>>>> On Nov 5, 2021, at 11:17 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>>>> 
>>>> On Fri, Nov 05, 2021 at 04:11:36PM +0000, Qing Zhao wrote:
>>>>> 3076       if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
>>>>> 3077           && tree_fits_uhwi_p (var_size)
>>>>> 3078           && (init_type == AUTO_INIT_PATTERN
>>>>> 3079               || !is_gimple_reg_type (var_type))
>>>>> 3080           && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
>>>>> 3081                                 0).exists ())
>>>>> 3082         {
>>>>> 3083           unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
>>>>> 3084           unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
>>>>> 3085           memset (buf, (init_type == AUTO_INIT_PATTERN
>>>>> 3086                         ? INIT_PATTERN_VALUE : 0), total_bytes);
>>>>> 3087           tree itype = build_nonstandard_integer_type
>>>>> 3088                          (total_bytes * BITS_PER_UNIT, 1);
>>>>> 
>>>>> The exact failing point is at function “set_min_and_max_values_for_integral_type”:
>>>>> 
>>>>> 2851   gcc_assert (precision <= WIDE_INT_MAX_PRECISION);
>>>>> 
>>>>> For _Complex long double,  “precision” is 256.
>>>>> In GCC11, “WIDE_INT_MAX_PRECISION” is 192,  in GCC12, it’s 512.
>>>>> As a result, the above assertion failed on GCC11.
>>>>> 
>>>>> I am wondering what’s the best fix for this issue in gcc11?
>>>> 
>>>> Even for gcc 12 the above is wrong, you can't blindly assume that
>>>> build_nonstandard_integer_type will work for arbitrary precisions,
>>>> and even if it works that it will actually work.
>>>> The fact that such a mode exist is one thing, but
>>>> targetm.scalar_mode_supported_p should be tested for whether the mode
>>>> is actually supported.
>>> 
>>> You mean “int_mode_for_size().exists()” is not enough to make sure
>>> “build_nonstandard_integer_type” to be valid?  We should add
>>> “targetm.scalar_mode_supported_p” too ?
>> 
>> Yeah.  The former says whether the backend has that mode at all.
>> But some modes may be there only in some specific patterns but
>> without support for mov, add, etc.  Only for
>> targetm.scalar_mode_supported_p modes the backend guarantees that
>> one can use them e.g. in mode attribute and can expect expansion
>> to expand everything with that mode that is needed in some way.
>> E.g. only if targetm.scalar_mode_supported_p (TImode) the FEs
>> support __int128_t type, etc.
> 
> The memcpy folding code now checks
> 
>              scalar_int_mode mode;
>              if (int_mode_for_size (ilen * 8, 0).exists (&mode)
>                  && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
>                  && have_insn_for (SET, mode)
> 
> thus specifically only have_insn_for (SET, mode), which I guess is
> good enough for this case as well?
> 
>>        Jakub


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-08 23:47                 ` Qing Zhao
@ 2021-11-09  7:13                   ` Richard Biener
  2021-11-09  9:10                     ` Jakub Jelinek
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2021-11-09  7:13 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Jakub Jelinek, gcc-patches Nick Alcock via, Kewen. Lin, H.J. Lu,
	andrew Pinski

On Tue, Nov 9, 2021 at 12:47 AM Qing Zhao <qing.zhao@oracle.com> wrote:
>
> Hi, I tried both the following patches:
>
> Patch1:
>
> [opc@qinzhao-ol8u3-x86 gcc]$ git diff
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index 0cba95411a6..ca49d2b4514 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>        /* If this variable is in a register use expand_assignment.
>          For boolean scalars force zero-init.  */
>        tree init;
> +      scalar_int_mode var_mode;
>        if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
>           && tree_fits_uhwi_p (var_size)
>           && (init_type == AUTO_INIT_PATTERN
>               || !is_gimple_reg_type (var_type))
>           && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
> -                               0).exists ())
> +                               0).exists (&var_mode)
> +         && targetm.scalar_mode_supported_p (var_mode))
>         {
>           unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
>           unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
>
> AND
>
> Patch2:
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index 0cba95411a6..7f129655926 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>        /* If this variable is in a register use expand_assignment.
>          For boolean scalars force zero-init.  */
>        tree init;
> +      scalar_int_mode var_mode;
>        if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
>           && tree_fits_uhwi_p (var_size)
>           && (init_type == AUTO_INIT_PATTERN
>               || !is_gimple_reg_type (var_type))
>           && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
> -                               0).exists ())
> +                               0).exists (&var_mode)
> +         && have_insn_for (SET, var_mode))
>         {
>           unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
>           unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
>
> Have the same effect:
>
> 1. Resolved the ICE in gcc11;
> 2. For _Complex long double variables, both return FALSE, as a result, for PATTERN initialization of _Complex long double variables, now they are initialization with ZEROs instead of FEs.
>
> Let me know you opinion on this, If the above 2 is okay, then I might pick the above Patch 1 for the final patch to this issue.

I think zero-initialization is OK, but I'd choose Patch2 for
consistency with what we do in the memcpy
folding.

Richard.

> Thanks.
>
> Qing
>
> > On Nov 8, 2021, at 2:41 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Sat, Nov 6, 2021 at 10:56 AM Jakub Jelinek <jakub@redhat.com> wrote:
> >>
> >> On Fri, Nov 05, 2021 at 05:37:25PM +0000, Qing Zhao wrote:
> >>>> On Nov 5, 2021, at 11:17 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> >>>>
> >>>> On Fri, Nov 05, 2021 at 04:11:36PM +0000, Qing Zhao wrote:
> >>>>> 3076       if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
> >>>>> 3077           && tree_fits_uhwi_p (var_size)
> >>>>> 3078           && (init_type == AUTO_INIT_PATTERN
> >>>>> 3079               || !is_gimple_reg_type (var_type))
> >>>>> 3080           && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
> >>>>> 3081                                 0).exists ())
> >>>>> 3082         {
> >>>>> 3083           unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
> >>>>> 3084           unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
> >>>>> 3085           memset (buf, (init_type == AUTO_INIT_PATTERN
> >>>>> 3086                         ? INIT_PATTERN_VALUE : 0), total_bytes);
> >>>>> 3087           tree itype = build_nonstandard_integer_type
> >>>>> 3088                          (total_bytes * BITS_PER_UNIT, 1);
> >>>>>
> >>>>> The exact failing point is at function “set_min_and_max_values_for_integral_type”:
> >>>>>
> >>>>> 2851   gcc_assert (precision <= WIDE_INT_MAX_PRECISION);
> >>>>>
> >>>>> For _Complex long double,  “precision” is 256.
> >>>>> In GCC11, “WIDE_INT_MAX_PRECISION” is 192,  in GCC12, it’s 512.
> >>>>> As a result, the above assertion failed on GCC11.
> >>>>>
> >>>>> I am wondering what’s the best fix for this issue in gcc11?
> >>>>
> >>>> Even for gcc 12 the above is wrong, you can't blindly assume that
> >>>> build_nonstandard_integer_type will work for arbitrary precisions,
> >>>> and even if it works that it will actually work.
> >>>> The fact that such a mode exist is one thing, but
> >>>> targetm.scalar_mode_supported_p should be tested for whether the mode
> >>>> is actually supported.
> >>>
> >>> You mean “int_mode_for_size().exists()” is not enough to make sure
> >>> “build_nonstandard_integer_type” to be valid?  We should add
> >>> “targetm.scalar_mode_supported_p” too ?
> >>
> >> Yeah.  The former says whether the backend has that mode at all.
> >> But some modes may be there only in some specific patterns but
> >> without support for mov, add, etc.  Only for
> >> targetm.scalar_mode_supported_p modes the backend guarantees that
> >> one can use them e.g. in mode attribute and can expect expansion
> >> to expand everything with that mode that is needed in some way.
> >> E.g. only if targetm.scalar_mode_supported_p (TImode) the FEs
> >> support __int128_t type, etc.
> >
> > The memcpy folding code now checks
> >
> >              scalar_int_mode mode;
> >              if (int_mode_for_size (ilen * 8, 0).exists (&mode)
> >                  && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
> >                  && have_insn_for (SET, mode)
> >
> > thus specifically only have_insn_for (SET, mode), which I guess is
> > good enough for this case as well?
> >
> >>        Jakub
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-09  7:13                   ` Richard Biener
@ 2021-11-09  9:10                     ` Jakub Jelinek
  2021-11-09 10:44                       ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2021-11-09  9:10 UTC (permalink / raw)
  To: Richard Biener
  Cc: Qing Zhao, gcc-patches Nick Alcock via, Kewen. Lin, H.J. Lu,
	andrew Pinski

On Tue, Nov 09, 2021 at 08:13:57AM +0100, Richard Biener wrote:
> > Hi, I tried both the following patches:
> >
> > Patch1:
> >
> > [opc@qinzhao-ol8u3-x86 gcc]$ git diff
> > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> > index 0cba95411a6..ca49d2b4514 100644
> > --- a/gcc/internal-fn.c
> > +++ b/gcc/internal-fn.c
> > @@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
> >        /* If this variable is in a register use expand_assignment.
> >          For boolean scalars force zero-init.  */
> >        tree init;
> > +      scalar_int_mode var_mode;
> >        if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
> >           && tree_fits_uhwi_p (var_size)
> >           && (init_type == AUTO_INIT_PATTERN
> >               || !is_gimple_reg_type (var_type))
> >           && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
> > -                               0).exists ())
> > +                               0).exists (&var_mode)
> > +         && targetm.scalar_mode_supported_p (var_mode))
> >         {
> >           unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
> >           unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
> >
> > AND
> >
> > Patch2:
> > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> > index 0cba95411a6..7f129655926 100644
> > --- a/gcc/internal-fn.c
> > +++ b/gcc/internal-fn.c
> > @@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
> >        /* If this variable is in a register use expand_assignment.
> >          For boolean scalars force zero-init.  */
> >        tree init;
> > +      scalar_int_mode var_mode;
> >        if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
> >           && tree_fits_uhwi_p (var_size)
> >           && (init_type == AUTO_INIT_PATTERN
> >               || !is_gimple_reg_type (var_type))
> >           && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
> > -                               0).exists ())
> > +                               0).exists (&var_mode)
> > +         && have_insn_for (SET, var_mode))
> >         {
> >           unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
> >           unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
> >
> > Have the same effect:
> >
> > 1. Resolved the ICE in gcc11;
> > 2. For _Complex long double variables, both return FALSE, as a result, for PATTERN initialization of _Complex long double variables, now they are initialization with ZEROs instead of FEs.
> >
> > Let me know you opinion on this, If the above 2 is okay, then I might pick the above Patch 1 for the final patch to this issue.
> 
> I think zero-initialization is OK, but I'd choose Patch2 for
> consistency with what we do in the memcpy
> folding.

Note, I think the code leaks memory (buf is never freed) and
should be allocated using
  unsigned char *buf = XNEWVEC (unsigned char *, total_bytes);
and deallocated with XDELETEVEC (buf);
If lhs is SSA_NAME, I think it would be easiest to use
native_interpret_expr directly instead of finding some integral
mode, after memset just native_interpret_expr.
Should be guarded by
BITS_PER_UNIT == 8 && CHAR_BIT == 8 && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
but I think the current code assumes at least that BITS_PER_UNIT == CHAR_BIT
too anyway.
That way you get directly a constant init instead of hopping through
some integral mode which might not exist at all.
If lhs is not SSA_NAME but var_type has COMPLEX_TYPE or VECTOR_TYPE, there
is always an option to find var_mode only for the element type (i.e.
TREE_TYPE (var_type), use var_size for the element size too, and after
you compute init fold_build2 (COMPLEX_EXPR, ) or build_vector_from_val.
For VECTOR_TYPE one would need to check if the vector mode is supported,
sure (but if it isn't, it will go the BUILT_IN_MEMSET way, I'm pretty sure).

Also,
      tree value = (init_type == AUTO_INIT_PATTERN) ?
                    build_int_cst (integer_type_node,
                                   INIT_PATTERN_VALUE) :
                    integer_zero_node;
a few lines before has bad formatting, both ? and : should be at the start
of next lines rather than on end of the previous ones.

	Jakub


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-09  9:10                     ` Jakub Jelinek
@ 2021-11-09 10:44                       ` Richard Biener
  2021-11-09 17:19                         ` Qing Zhao
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2021-11-09 10:44 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Qing Zhao, gcc-patches Nick Alcock via, Kewen. Lin, H.J. Lu,
	andrew Pinski

On Tue, Nov 9, 2021 at 10:10 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Nov 09, 2021 at 08:13:57AM +0100, Richard Biener wrote:
> > > Hi, I tried both the following patches:
> > >
> > > Patch1:
> > >
> > > [opc@qinzhao-ol8u3-x86 gcc]$ git diff
> > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> > > index 0cba95411a6..ca49d2b4514 100644
> > > --- a/gcc/internal-fn.c
> > > +++ b/gcc/internal-fn.c
> > > @@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
> > >        /* If this variable is in a register use expand_assignment.
> > >          For boolean scalars force zero-init.  */
> > >        tree init;
> > > +      scalar_int_mode var_mode;
> > >        if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
> > >           && tree_fits_uhwi_p (var_size)
> > >           && (init_type == AUTO_INIT_PATTERN
> > >               || !is_gimple_reg_type (var_type))
> > >           && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
> > > -                               0).exists ())
> > > +                               0).exists (&var_mode)
> > > +         && targetm.scalar_mode_supported_p (var_mode))
> > >         {
> > >           unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
> > >           unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
> > >
> > > AND
> > >
> > > Patch2:
> > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> > > index 0cba95411a6..7f129655926 100644
> > > --- a/gcc/internal-fn.c
> > > +++ b/gcc/internal-fn.c
> > > @@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
> > >        /* If this variable is in a register use expand_assignment.
> > >          For boolean scalars force zero-init.  */
> > >        tree init;
> > > +      scalar_int_mode var_mode;
> > >        if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
> > >           && tree_fits_uhwi_p (var_size)
> > >           && (init_type == AUTO_INIT_PATTERN
> > >               || !is_gimple_reg_type (var_type))
> > >           && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
> > > -                               0).exists ())
> > > +                               0).exists (&var_mode)
> > > +         && have_insn_for (SET, var_mode))
> > >         {
> > >           unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
> > >           unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
> > >
> > > Have the same effect:
> > >
> > > 1. Resolved the ICE in gcc11;
> > > 2. For _Complex long double variables, both return FALSE, as a result, for PATTERN initialization of _Complex long double variables, now they are initialization with ZEROs instead of FEs.
> > >
> > > Let me know you opinion on this, If the above 2 is okay, then I might pick the above Patch 1 for the final patch to this issue.
> >
> > I think zero-initialization is OK, but I'd choose Patch2 for
> > consistency with what we do in the memcpy
> > folding.
>
> Note, I think the code leaks memory (buf is never freed) and
> should be allocated using
>   unsigned char *buf = XNEWVEC (unsigned char *, total_bytes);
> and deallocated with XDELETEVEC (buf);

Oops.  I think the intention was to use XALLOCAVEC

> If lhs is SSA_NAME, I think it would be easiest to use
> native_interpret_expr directly instead of finding some integral
> mode, after memset just native_interpret_expr.

I've removed the native_interpret_expr path, I think it would be best to not
rely on expand_assignment for the case of a register destination.  The
intent is to do an integer move into the destination and avoid issues with
things like XFmode or other modes with padding (and possibly trapping loads).
So it should become some

 (set (subreg:IntMode ...) (const_wide_int ...))

but my RTL fu is too weak to try and so we ended up with expand_assignment
which doesn't like V_C_E on SSA names on the LHS (well, because that's
not valid GIMPLE...).

> Should be guarded by
> BITS_PER_UNIT == 8 && CHAR_BIT == 8 && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
> but I think the current code assumes at least that BITS_PER_UNIT == CHAR_BIT
> too anyway.
> That way you get directly a constant init instead of hopping through
> some integral mode which might not exist at all.
> If lhs is not SSA_NAME but var_type has COMPLEX_TYPE or VECTOR_TYPE, there
> is always an option to find var_mode only for the element type (i.e.
> TREE_TYPE (var_type), use var_size for the element size too, and after
> you compute init fold_build2 (COMPLEX_EXPR, ) or build_vector_from_val.
> For VECTOR_TYPE one would need to check if the vector mode is supported,
> sure (but if it isn't, it will go the BUILT_IN_MEMSET way, I'm pretty sure).
>
> Also,
>       tree value = (init_type == AUTO_INIT_PATTERN) ?
>                     build_int_cst (integer_type_node,
>                                    INIT_PATTERN_VALUE) :
>                     integer_zero_node;
> a few lines before has bad formatting, both ? and : should be at the start
> of next lines rather than on end of the previous ones.
>
>         Jakub
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-09 10:44                       ` Richard Biener
@ 2021-11-09 17:19                         ` Qing Zhao
  2021-11-10  8:37                           ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Qing Zhao @ 2021-11-09 17:19 UTC (permalink / raw)
  To: Richard Biener, Jakub Jelinek
  Cc: gcc-patches Nick Alcock via, Kewen. Lin, H.J. Lu, andrew Pinski

So, based on the discussion so far,  is the following patch good to go?

Let me know if you have more comments on the following patch:

(At the same time, I am testing this patch on both x86 and aarch64)

thanks.

Qing

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0cba95411a6..e8fd16b9c21 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -3059,10 +3059,10 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
       mark_addressable (lhs);
       tree var_addr = build_fold_addr_expr (lhs);
 
-      tree value = (init_type == AUTO_INIT_PATTERN) ?
-		    build_int_cst (integer_type_node,
-				   INIT_PATTERN_VALUE) :
-		    integer_zero_node;
+      tree value = (init_type == AUTO_INIT_PATTERN)
+		    ? build_int_cst (integer_type_node,
+				     INIT_PATTERN_VALUE)
+		    : integer_zero_node;
       tree m_call = build_call_expr (builtin_decl_implicit (BUILT_IN_MEMSET),
 				     3, var_addr, value, var_size);
       /* Expand this memset call.  */
@@ -3073,15 +3073,17 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
       /* If this variable is in a register use expand_assignment.
 	 For boolean scalars force zero-init.  */
       tree init;
+      scalar_int_mode var_mode;
       if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
 	  && tree_fits_uhwi_p (var_size)
 	  && (init_type == AUTO_INIT_PATTERN
 	      || !is_gimple_reg_type (var_type))
 	  && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
-				0).exists ())
+				0).exists (&var_mode)
+	  && have_insn_for (SET, var_mode))
 	{
 	  unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
-	  unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
+	  unsigned char *buf = XALLOCAVEC (unsigned char, total_bytes);
 	  memset (buf, (init_type == AUTO_INIT_PATTERN
 			? INIT_PATTERN_VALUE : 0), total_bytes);
 	  tree itype = build_nonstandard_integer_type
diff --git a/gcc/testsuite/gcc.target/i386/auto-init-6.c b/gcc/testsuite/gcc.target/i386/auto-init-6.c
index 339f8bc2966..e53385f0eb7 100644
--- a/gcc/testsuite/gcc.target/i386/auto-init-6.c
+++ b/gcc/testsuite/gcc.target/i386/auto-init-6.c
@@ -1,4 +1,6 @@
 /* Verify pattern initialization for complex type automatic variables.  */
+/* Note, _Complex long double is initialized to zeroes due to the current
+   implemenation limitation.  */
 /* { dg-do compile } */
 /* { dg-options "-ftrivial-auto-var-init=pattern -march=x86-64 -mtune=generic -msse" } */
 
@@ -15,6 +17,6 @@ _Complex long double foo()
   return result;
 }
 
-/* { dg-final { scan-assembler-times "long\t-16843010" 10  { target { ! ia32 } } } } */
-/* { dg-final { scan-assembler-times "long\t-16843010" 6  { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "long\t0" 8  { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "long\t-16843010" 6  } } */
 



> On Nov 9, 2021, at 4:44 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Tue, Nov 9, 2021 at 10:10 AM Jakub Jelinek <jakub@redhat.com> wrote:
>> 
>> On Tue, Nov 09, 2021 at 08:13:57AM +0100, Richard Biener wrote:
>>>> Hi, I tried both the following patches:
>>>> 
>>>> Patch1:
>>>> 
>>>> [opc@qinzhao-ol8u3-x86 gcc]$ git diff
>>>> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
>>>> index 0cba95411a6..ca49d2b4514 100644
>>>> --- a/gcc/internal-fn.c
>>>> +++ b/gcc/internal-fn.c
>>>> @@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>>>>       /* If this variable is in a register use expand_assignment.
>>>>         For boolean scalars force zero-init.  */
>>>>       tree init;
>>>> +      scalar_int_mode var_mode;
>>>>       if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
>>>>          && tree_fits_uhwi_p (var_size)
>>>>          && (init_type == AUTO_INIT_PATTERN
>>>>              || !is_gimple_reg_type (var_type))
>>>>          && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
>>>> -                               0).exists ())
>>>> +                               0).exists (&var_mode)
>>>> +         && targetm.scalar_mode_supported_p (var_mode))
>>>>        {
>>>>          unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
>>>>          unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
>>>> 
>>>> AND
>>>> 
>>>> Patch2:
>>>> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
>>>> index 0cba95411a6..7f129655926 100644
>>>> --- a/gcc/internal-fn.c
>>>> +++ b/gcc/internal-fn.c
>>>> @@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>>>>       /* If this variable is in a register use expand_assignment.
>>>>         For boolean scalars force zero-init.  */
>>>>       tree init;
>>>> +      scalar_int_mode var_mode;
>>>>       if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
>>>>          && tree_fits_uhwi_p (var_size)
>>>>          && (init_type == AUTO_INIT_PATTERN
>>>>              || !is_gimple_reg_type (var_type))
>>>>          && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
>>>> -                               0).exists ())
>>>> +                               0).exists (&var_mode)
>>>> +         && have_insn_for (SET, var_mode))
>>>>        {
>>>>          unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
>>>>          unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
>>>> 
>>>> Have the same effect:
>>>> 
>>>> 1. Resolved the ICE in gcc11;
>>>> 2. For _Complex long double variables, both return FALSE, as a result, for PATTERN initialization of _Complex long double variables, now they are initialization with ZEROs instead of FEs.
>>>> 
>>>> Let me know you opinion on this, If the above 2 is okay, then I might pick the above Patch 1 for the final patch to this issue.
>>> 
>>> I think zero-initialization is OK, but I'd choose Patch2 for
>>> consistency with what we do in the memcpy
>>> folding.
>> 
>> Note, I think the code leaks memory (buf is never freed) and
>> should be allocated using
>>  unsigned char *buf = XNEWVEC (unsigned char *, total_bytes);
>> and deallocated with XDELETEVEC (buf);
> 
> Oops.  I think the intention was to use XALLOCAVEC
> 
>> If lhs is SSA_NAME, I think it would be easiest to use
>> native_interpret_expr directly instead of finding some integral
>> mode, after memset just native_interpret_expr.
> 
> I've removed the native_interpret_expr path, I think it would be best to not
> rely on expand_assignment for the case of a register destination.  The
> intent is to do an integer move into the destination and avoid issues with
> things like XFmode or other modes with padding (and possibly trapping loads).
> So it should become some
> 
> (set (subreg:IntMode ...) (const_wide_int ...))
> 
> but my RTL fu is too weak to try and so we ended up with expand_assignment
> which doesn't like V_C_E on SSA names on the LHS (well, because that's
> not valid GIMPLE...).
> 
>> Should be guarded by
>> BITS_PER_UNIT == 8 && CHAR_BIT == 8 && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
>> but I think the current code assumes at least that BITS_PER_UNIT == CHAR_BIT
>> too anyway.
>> That way you get directly a constant init instead of hopping through
>> some integral mode which might not exist at all.
>> If lhs is not SSA_NAME but var_type has COMPLEX_TYPE or VECTOR_TYPE, there
>> is always an option to find var_mode only for the element type (i.e.
>> TREE_TYPE (var_type), use var_size for the element size too, and after
>> you compute init fold_build2 (COMPLEX_EXPR, ) or build_vector_from_val.
>> For VECTOR_TYPE one would need to check if the vector mode is supported,
>> sure (but if it isn't, it will go the BUILT_IN_MEMSET way, I'm pretty sure).
>> 
>> Also,
>>      tree value = (init_type == AUTO_INIT_PATTERN) ?
>>                    build_int_cst (integer_type_node,
>>                                   INIT_PATTERN_VALUE) :
>>                    integer_zero_node;
>> a few lines before has bad formatting, both ? and : should be at the start
>> of next lines rather than on end of the previous ones.
>> 
>>        Jakub


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-09 17:19                         ` Qing Zhao
@ 2021-11-10  8:37                           ` Richard Biener
  2021-11-10 18:02                             ` Qing Zhao
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2021-11-10  8:37 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Jakub Jelinek, gcc-patches Nick Alcock via, Kewen. Lin, H.J. Lu,
	andrew Pinski

On Tue, Nov 9, 2021 at 6:48 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>
> So, based on the discussion so far,  is the following patch good to go?

OK.

Thanks,
Richard.

> Let me know if you have more comments on the following patch:
>
> (At the same time, I am testing this patch on both x86 and aarch64)
>
> thanks.
>
> Qing
>
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index 0cba95411a6..e8fd16b9c21 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -3059,10 +3059,10 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>        mark_addressable (lhs);
>        tree var_addr = build_fold_addr_expr (lhs);
>
> -      tree value = (init_type == AUTO_INIT_PATTERN) ?
> -                   build_int_cst (integer_type_node,
> -                                  INIT_PATTERN_VALUE) :
> -                   integer_zero_node;
> +      tree value = (init_type == AUTO_INIT_PATTERN)
> +                   ? build_int_cst (integer_type_node,
> +                                    INIT_PATTERN_VALUE)
> +                   : integer_zero_node;
>        tree m_call = build_call_expr (builtin_decl_implicit (BUILT_IN_MEMSET),
>                                      3, var_addr, value, var_size);
>        /* Expand this memset call.  */
> @@ -3073,15 +3073,17 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>        /* If this variable is in a register use expand_assignment.
>          For boolean scalars force zero-init.  */
>        tree init;
> +      scalar_int_mode var_mode;
>        if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
>           && tree_fits_uhwi_p (var_size)
>           && (init_type == AUTO_INIT_PATTERN
>               || !is_gimple_reg_type (var_type))
>           && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
> -                               0).exists ())
> +                               0).exists (&var_mode)
> +         && have_insn_for (SET, var_mode))
>         {
>           unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
> -         unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
> +         unsigned char *buf = XALLOCAVEC (unsigned char, total_bytes);
>           memset (buf, (init_type == AUTO_INIT_PATTERN
>                         ? INIT_PATTERN_VALUE : 0), total_bytes);
>           tree itype = build_nonstandard_integer_type
> diff --git a/gcc/testsuite/gcc.target/i386/auto-init-6.c b/gcc/testsuite/gcc.target/i386/auto-init-6.c
> index 339f8bc2966..e53385f0eb7 100644
> --- a/gcc/testsuite/gcc.target/i386/auto-init-6.c
> +++ b/gcc/testsuite/gcc.target/i386/auto-init-6.c
> @@ -1,4 +1,6 @@
>  /* Verify pattern initialization for complex type automatic variables.  */
> +/* Note, _Complex long double is initialized to zeroes due to the current
> +   implemenation limitation.  */
>  /* { dg-do compile } */
>  /* { dg-options "-ftrivial-auto-var-init=pattern -march=x86-64 -mtune=generic -msse" } */
>
> @@ -15,6 +17,6 @@ _Complex long double foo()
>    return result;
>  }
>
> -/* { dg-final { scan-assembler-times "long\t-16843010" 10  { target { ! ia32 } } } } */
> -/* { dg-final { scan-assembler-times "long\t-16843010" 6  { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "long\t0" 8  { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "long\t-16843010" 6  } } */
>
>
>
>
> > On Nov 9, 2021, at 4:44 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Tue, Nov 9, 2021 at 10:10 AM Jakub Jelinek <jakub@redhat.com> wrote:
> >>
> >> On Tue, Nov 09, 2021 at 08:13:57AM +0100, Richard Biener wrote:
> >>>> Hi, I tried both the following patches:
> >>>>
> >>>> Patch1:
> >>>>
> >>>> [opc@qinzhao-ol8u3-x86 gcc]$ git diff
> >>>> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> >>>> index 0cba95411a6..ca49d2b4514 100644
> >>>> --- a/gcc/internal-fn.c
> >>>> +++ b/gcc/internal-fn.c
> >>>> @@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
> >>>>       /* If this variable is in a register use expand_assignment.
> >>>>         For boolean scalars force zero-init.  */
> >>>>       tree init;
> >>>> +      scalar_int_mode var_mode;
> >>>>       if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
> >>>>          && tree_fits_uhwi_p (var_size)
> >>>>          && (init_type == AUTO_INIT_PATTERN
> >>>>              || !is_gimple_reg_type (var_type))
> >>>>          && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
> >>>> -                               0).exists ())
> >>>> +                               0).exists (&var_mode)
> >>>> +         && targetm.scalar_mode_supported_p (var_mode))
> >>>>        {
> >>>>          unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
> >>>>          unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
> >>>>
> >>>> AND
> >>>>
> >>>> Patch2:
> >>>> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> >>>> index 0cba95411a6..7f129655926 100644
> >>>> --- a/gcc/internal-fn.c
> >>>> +++ b/gcc/internal-fn.c
> >>>> @@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
> >>>>       /* If this variable is in a register use expand_assignment.
> >>>>         For boolean scalars force zero-init.  */
> >>>>       tree init;
> >>>> +      scalar_int_mode var_mode;
> >>>>       if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
> >>>>          && tree_fits_uhwi_p (var_size)
> >>>>          && (init_type == AUTO_INIT_PATTERN
> >>>>              || !is_gimple_reg_type (var_type))
> >>>>          && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
> >>>> -                               0).exists ())
> >>>> +                               0).exists (&var_mode)
> >>>> +         && have_insn_for (SET, var_mode))
> >>>>        {
> >>>>          unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
> >>>>          unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
> >>>>
> >>>> Have the same effect:
> >>>>
> >>>> 1. Resolved the ICE in gcc11;
> >>>> 2. For _Complex long double variables, both return FALSE, as a result, for PATTERN initialization of _Complex long double variables, now they are initialization with ZEROs instead of FEs.
> >>>>
> >>>> Let me know you opinion on this, If the above 2 is okay, then I might pick the above Patch 1 for the final patch to this issue.
> >>>
> >>> I think zero-initialization is OK, but I'd choose Patch2 for
> >>> consistency with what we do in the memcpy
> >>> folding.
> >>
> >> Note, I think the code leaks memory (buf is never freed) and
> >> should be allocated using
> >>  unsigned char *buf = XNEWVEC (unsigned char *, total_bytes);
> >> and deallocated with XDELETEVEC (buf);
> >
> > Oops.  I think the intention was to use XALLOCAVEC
> >
> >> If lhs is SSA_NAME, I think it would be easiest to use
> >> native_interpret_expr directly instead of finding some integral
> >> mode, after memset just native_interpret_expr.
> >
> > I've removed the native_interpret_expr path, I think it would be best to not
> > rely on expand_assignment for the case of a register destination.  The
> > intent is to do an integer move into the destination and avoid issues with
> > things like XFmode or other modes with padding (and possibly trapping loads).
> > So it should become some
> >
> > (set (subreg:IntMode ...) (const_wide_int ...))
> >
> > but my RTL fu is too weak to try and so we ended up with expand_assignment
> > which doesn't like V_C_E on SSA names on the LHS (well, because that's
> > not valid GIMPLE...).
> >
> >> Should be guarded by
> >> BITS_PER_UNIT == 8 && CHAR_BIT == 8 && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
> >> but I think the current code assumes at least that BITS_PER_UNIT == CHAR_BIT
> >> too anyway.
> >> That way you get directly a constant init instead of hopping through
> >> some integral mode which might not exist at all.
> >> If lhs is not SSA_NAME but var_type has COMPLEX_TYPE or VECTOR_TYPE, there
> >> is always an option to find var_mode only for the element type (i.e.
> >> TREE_TYPE (var_type), use var_size for the element size too, and after
> >> you compute init fold_build2 (COMPLEX_EXPR, ) or build_vector_from_val.
> >> For VECTOR_TYPE one would need to check if the vector mode is supported,
> >> sure (but if it isn't, it will go the BUILT_IN_MEMSET way, I'm pretty sure).
> >>
> >> Also,
> >>      tree value = (init_type == AUTO_INIT_PATTERN) ?
> >>                    build_int_cst (integer_type_node,
> >>                                   INIT_PATTERN_VALUE) :
> >>                    integer_zero_node;
> >> a few lines before has bad formatting, both ? and : should be at the start
> >> of next lines rather than on end of the previous ones.
> >>
> >>        Jakub
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different
  2021-11-10  8:37                           ` Richard Biener
@ 2021-11-10 18:02                             ` Qing Zhao
  0 siblings, 0 replies; 18+ messages in thread
From: Qing Zhao @ 2021-11-10 18:02 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, gcc-patches Nick Alcock via, Kewen. Lin, H.J. Lu,
	andrew Pinski

Pushed the patch as:

https://gcc.gnu.org/pipermail/gcc-cvs/2021-November/356543.html

Qing


> On Nov 10, 2021, at 2:37 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Tue, Nov 9, 2021 at 6:48 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>> 
>> So, based on the discussion so far,  is the following patch good to go?
> 
> OK.
> 
> Thanks,
> Richard.
> 
>> Let me know if you have more comments on the following patch:
>> 
>> (At the same time, I am testing this patch on both x86 and aarch64)
>> 
>> thanks.
>> 
>> Qing
>> 
>> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
>> index 0cba95411a6..e8fd16b9c21 100644
>> --- a/gcc/internal-fn.c
>> +++ b/gcc/internal-fn.c
>> @@ -3059,10 +3059,10 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>>       mark_addressable (lhs);
>>       tree var_addr = build_fold_addr_expr (lhs);
>> 
>> -      tree value = (init_type == AUTO_INIT_PATTERN) ?
>> -                   build_int_cst (integer_type_node,
>> -                                  INIT_PATTERN_VALUE) :
>> -                   integer_zero_node;
>> +      tree value = (init_type == AUTO_INIT_PATTERN)
>> +                   ? build_int_cst (integer_type_node,
>> +                                    INIT_PATTERN_VALUE)
>> +                   : integer_zero_node;
>>       tree m_call = build_call_expr (builtin_decl_implicit (BUILT_IN_MEMSET),
>>                                     3, var_addr, value, var_size);
>>       /* Expand this memset call.  */
>> @@ -3073,15 +3073,17 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>>       /* If this variable is in a register use expand_assignment.
>>         For boolean scalars force zero-init.  */
>>       tree init;
>> +      scalar_int_mode var_mode;
>>       if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
>>          && tree_fits_uhwi_p (var_size)
>>          && (init_type == AUTO_INIT_PATTERN
>>              || !is_gimple_reg_type (var_type))
>>          && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
>> -                               0).exists ())
>> +                               0).exists (&var_mode)
>> +         && have_insn_for (SET, var_mode))
>>        {
>>          unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
>> -         unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
>> +         unsigned char *buf = XALLOCAVEC (unsigned char, total_bytes);
>>          memset (buf, (init_type == AUTO_INIT_PATTERN
>>                        ? INIT_PATTERN_VALUE : 0), total_bytes);
>>          tree itype = build_nonstandard_integer_type
>> diff --git a/gcc/testsuite/gcc.target/i386/auto-init-6.c b/gcc/testsuite/gcc.target/i386/auto-init-6.c
>> index 339f8bc2966..e53385f0eb7 100644
>> --- a/gcc/testsuite/gcc.target/i386/auto-init-6.c
>> +++ b/gcc/testsuite/gcc.target/i386/auto-init-6.c
>> @@ -1,4 +1,6 @@
>> /* Verify pattern initialization for complex type automatic variables.  */
>> +/* Note, _Complex long double is initialized to zeroes due to the current
>> +   implemenation limitation.  */
>> /* { dg-do compile } */
>> /* { dg-options "-ftrivial-auto-var-init=pattern -march=x86-64 -mtune=generic -msse" } */
>> 
>> @@ -15,6 +17,6 @@ _Complex long double foo()
>>   return result;
>> }
>> 
>> -/* { dg-final { scan-assembler-times "long\t-16843010" 10  { target { ! ia32 } } } } */
>> -/* { dg-final { scan-assembler-times "long\t-16843010" 6  { target { ia32 } } } } */
>> +/* { dg-final { scan-assembler-times "long\t0" 8  { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler-times "long\t-16843010" 6  } } */
>> 
>> 
>> 
>> 
>>> On Nov 9, 2021, at 4:44 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>> 
>>> On Tue, Nov 9, 2021 at 10:10 AM Jakub Jelinek <jakub@redhat.com> wrote:
>>>> 
>>>> On Tue, Nov 09, 2021 at 08:13:57AM +0100, Richard Biener wrote:
>>>>>> Hi, I tried both the following patches:
>>>>>> 
>>>>>> Patch1:
>>>>>> 
>>>>>> [opc@qinzhao-ol8u3-x86 gcc]$ git diff
>>>>>> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
>>>>>> index 0cba95411a6..ca49d2b4514 100644
>>>>>> --- a/gcc/internal-fn.c
>>>>>> +++ b/gcc/internal-fn.c
>>>>>> @@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>>>>>>      /* If this variable is in a register use expand_assignment.
>>>>>>        For boolean scalars force zero-init.  */
>>>>>>      tree init;
>>>>>> +      scalar_int_mode var_mode;
>>>>>>      if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
>>>>>>         && tree_fits_uhwi_p (var_size)
>>>>>>         && (init_type == AUTO_INIT_PATTERN
>>>>>>             || !is_gimple_reg_type (var_type))
>>>>>>         && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
>>>>>> -                               0).exists ())
>>>>>> +                               0).exists (&var_mode)
>>>>>> +         && targetm.scalar_mode_supported_p (var_mode))
>>>>>>       {
>>>>>>         unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
>>>>>>         unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
>>>>>> 
>>>>>> AND
>>>>>> 
>>>>>> Patch2:
>>>>>> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
>>>>>> index 0cba95411a6..7f129655926 100644
>>>>>> --- a/gcc/internal-fn.c
>>>>>> +++ b/gcc/internal-fn.c
>>>>>> @@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>>>>>>      /* If this variable is in a register use expand_assignment.
>>>>>>        For boolean scalars force zero-init.  */
>>>>>>      tree init;
>>>>>> +      scalar_int_mode var_mode;
>>>>>>      if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
>>>>>>         && tree_fits_uhwi_p (var_size)
>>>>>>         && (init_type == AUTO_INIT_PATTERN
>>>>>>             || !is_gimple_reg_type (var_type))
>>>>>>         && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
>>>>>> -                               0).exists ())
>>>>>> +                               0).exists (&var_mode)
>>>>>> +         && have_insn_for (SET, var_mode))
>>>>>>       {
>>>>>>         unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
>>>>>>         unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
>>>>>> 
>>>>>> Have the same effect:
>>>>>> 
>>>>>> 1. Resolved the ICE in gcc11;
>>>>>> 2. For _Complex long double variables, both return FALSE, as a result, for PATTERN initialization of _Complex long double variables, now they are initialization with ZEROs instead of FEs.
>>>>>> 
>>>>>> Let me know you opinion on this, If the above 2 is okay, then I might pick the above Patch 1 for the final patch to this issue.
>>>>> 
>>>>> I think zero-initialization is OK, but I'd choose Patch2 for
>>>>> consistency with what we do in the memcpy
>>>>> folding.
>>>> 
>>>> Note, I think the code leaks memory (buf is never freed) and
>>>> should be allocated using
>>>> unsigned char *buf = XNEWVEC (unsigned char *, total_bytes);
>>>> and deallocated with XDELETEVEC (buf);
>>> 
>>> Oops.  I think the intention was to use XALLOCAVEC
>>> 
>>>> If lhs is SSA_NAME, I think it would be easiest to use
>>>> native_interpret_expr directly instead of finding some integral
>>>> mode, after memset just native_interpret_expr.
>>> 
>>> I've removed the native_interpret_expr path, I think it would be best to not
>>> rely on expand_assignment for the case of a register destination.  The
>>> intent is to do an integer move into the destination and avoid issues with
>>> things like XFmode or other modes with padding (and possibly trapping loads).
>>> So it should become some
>>> 
>>> (set (subreg:IntMode ...) (const_wide_int ...))
>>> 
>>> but my RTL fu is too weak to try and so we ended up with expand_assignment
>>> which doesn't like V_C_E on SSA names on the LHS (well, because that's
>>> not valid GIMPLE...).
>>> 
>>>> Should be guarded by
>>>> BITS_PER_UNIT == 8 && CHAR_BIT == 8 && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
>>>> but I think the current code assumes at least that BITS_PER_UNIT == CHAR_BIT
>>>> too anyway.
>>>> That way you get directly a constant init instead of hopping through
>>>> some integral mode which might not exist at all.
>>>> If lhs is not SSA_NAME but var_type has COMPLEX_TYPE or VECTOR_TYPE, there
>>>> is always an option to find var_mode only for the element type (i.e.
>>>> TREE_TYPE (var_type), use var_size for the element size too, and after
>>>> you compute init fold_build2 (COMPLEX_EXPR, ) or build_vector_from_val.
>>>> For VECTOR_TYPE one would need to check if the vector mode is supported,
>>>> sure (but if it isn't, it will go the BUILT_IN_MEMSET way, I'm pretty sure).
>>>> 
>>>> Also,
>>>>     tree value = (init_type == AUTO_INIT_PATTERN) ?
>>>>                   build_int_cst (integer_type_node,
>>>>                                  INIT_PATTERN_VALUE) :
>>>>                   integer_zero_node;
>>>> a few lines before has bad formatting, both ? and : should be at the start
>>>> of next lines rather than on end of the previous ones.
>>>> 
>>>>       Jakub
>> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-11-10 18:02 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-04 20:37 Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different Qing Zhao
2021-11-05  1:34 ` Kewen.Lin
2021-11-05  6:05 ` Andrew Pinski
2021-11-05  6:53   ` Jakub Jelinek
2021-11-05 10:01     ` Richard Biener
2021-11-05 12:25       ` H.J. Lu
2021-11-05 16:11       ` Qing Zhao
2021-11-05 16:17         ` Jakub Jelinek
2021-11-05 17:37           ` Qing Zhao
2021-11-06  9:56             ` Jakub Jelinek
2021-11-08  8:41               ` Richard Biener
2021-11-08 23:47                 ` Qing Zhao
2021-11-09  7:13                   ` Richard Biener
2021-11-09  9:10                     ` Jakub Jelinek
2021-11-09 10:44                       ` Richard Biener
2021-11-09 17:19                         ` Qing Zhao
2021-11-10  8:37                           ` Richard Biener
2021-11-10 18:02                             ` Qing Zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).