[PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
@ 2016-01-08 20:21 Jakub Jelinek
  2016-01-08 20:28 ` H.J. Lu
  2016-01-12 13:13 ` Kirill Yukhin
  0 siblings, 2 replies; 20+ messages in thread
From: Jakub Jelinek @ 2016-01-08 20:21 UTC (permalink / raw)
  To: Uros Bizjak, Kirill Yukhin; +Cc: gcc-patches

Hi!

This patch fixes
FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ \\\\t]+[^{\\n]*%xmm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ \\\\t]+[^{\\n]*%ymm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ \\\\t]+[^{\\n]*%xmm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ \\\\t]+[^{\\n]*%ymm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
regressions that were introduced recently by fixing up the masked store check for misalignment.
The problem is that for v2df/v4df/v4sf/v8sf masked stores ix86_expand_special_args_builtin
failed to set aligned_mem and thus didn't set correct memory alignment.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-01-08  Jakub Jelinek  <jakub@redhat.com>

	PR target/69198
	* config/i386/i386.c (ix86_expand_special_args_builtin): Ensure
	aligned_mem is properly set for AVX512-VL floating point masked
	stores.

--- gcc/config/i386/i386.c.jj	2016-01-08 07:31:11.000000000 +0100
+++ gcc/config/i386/i386.c	2016-01-08 18:16:21.030354042 +0100
@@ -39776,7 +39776,11 @@ ix86_expand_special_args_builtin (const
       memory = 0;
       break;
     case VOID_FTYPE_PV8DF_V8DF_UQI:
+    case VOID_FTYPE_PV4DF_V4DF_UQI:
+    case VOID_FTYPE_PV2DF_V2DF_UQI:
     case VOID_FTYPE_PV16SF_V16SF_UHI:
+    case VOID_FTYPE_PV8SF_V8SF_UQI:
+    case VOID_FTYPE_PV4SF_V4SF_UQI:
     case VOID_FTYPE_PV8DI_V8DI_UQI:
     case VOID_FTYPE_PV4DI_V4DI_UQI:
     case VOID_FTYPE_PV2DI_V2DI_UQI:
@@ -39834,10 +39838,6 @@ ix86_expand_special_args_builtin (const
     case VOID_FTYPE_PV16QI_V16QI_UHI:
     case VOID_FTYPE_PV32QI_V32QI_USI:
     case VOID_FTYPE_PV64QI_V64QI_UDI:
-    case VOID_FTYPE_PV4DF_V4DF_UQI:
-    case VOID_FTYPE_PV2DF_V2DF_UQI:
-    case VOID_FTYPE_PV8SF_V8SF_UQI:
-    case VOID_FTYPE_PV4SF_V4SF_UQI:
       nargs = 2;
       klass = store;
       /* Reserve memory operand for target.  */

	Jakub

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-08 20:21 [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198) Jakub Jelinek
@ 2016-01-08 20:28 ` H.J. Lu
  2016-01-08 20:35   ` Jakub Jelinek
  2016-01-12 13:13 ` Kirill Yukhin
  1 sibling, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2016-01-08 20:28 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, Kirill Yukhin, GCC Patches

On Fri, Jan 8, 2016 at 12:20 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> This patch fixes
> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ \\\\t]+[^{\\n]*%xmm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ \\\\t]+[^{\\n]*%ymm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ \\\\t]+[^{\\n]*%xmm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ \\\\t]+[^{\\n]*%ymm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> regressions that were introduced recently by fixing up the masked store check for misalignment.
> The problem is that for v2df/v4df/v4sf/v8sf masked stores ix86_expand_special_args_builtin
> failed to set aligned_mem and thus didn't set correct memory alignment.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
>

gcc.target/i386/avx512vl-vmovapd-1.c has

---
include <immintrin.h>

double *p;
volatile __m256d yy, y2;
volatile __m128d xx, x2;
volatile __mmask8 m;

void extern
avx512vl_test (void)
{
  yy = _mm256_mask_mov_pd (yy, m, y2);
  xx = _mm_mask_mov_pd (xx, m, x2);

  yy = _mm256_maskz_mov_pd (m, y2);
  xx = _mm_maskz_mov_pd (m, x2);

  yy = _mm256_mask_load_pd (yy, m, p);
  xx = _mm_mask_load_pd (xx, m, p);

  yy = _mm256_maskz_load_pd (m, p);
  xx = _mm_maskz_load_pd (m, p);

  _mm256_mask_store_pd (p, m, yy);
  _mm_mask_store_pd (p, m, xx);
}
---

'p' is misaligned.  Why should we change its alignment?

H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-08 20:28 ` H.J. Lu
@ 2016-01-08 20:35   ` Jakub Jelinek
  2016-01-08 20:39     ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Jakub Jelinek @ 2016-01-08 20:35 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Uros Bizjak, Kirill Yukhin, GCC Patches

On Fri, Jan 08, 2016 at 12:28:43PM -0800, H.J. Lu wrote:
> gcc.target/i386/avx512vl-vmovapd-1.c has
> 
> ---
> include <immintrin.h>
> 
> double *p;
> volatile __m256d yy, y2;
> volatile __m128d xx, x2;
> volatile __mmask8 m;
> 
> void extern
> avx512vl_test (void)
> {
>   yy = _mm256_mask_mov_pd (yy, m, y2);
>   xx = _mm_mask_mov_pd (xx, m, x2);
> 
>   yy = _mm256_maskz_mov_pd (m, y2);
>   xx = _mm_maskz_mov_pd (m, x2);
> 
>   yy = _mm256_mask_load_pd (yy, m, p);
>   xx = _mm_mask_load_pd (xx, m, p);
> 
>   yy = _mm256_maskz_load_pd (m, p);
>   xx = _mm_maskz_load_pd (m, p);
> 
>   _mm256_mask_store_pd (p, m, yy);
>   _mm_mask_store_pd (p, m, xx);
> }
> ---
> 
> 'p' is misaligned.

p is not misaligned, it has just unknown alignment.

> Why should we change its alignment?

Because the uses of these intrinsics implies the memory is aligned.
The masked loads also imply aligned memory and has been giving that
alignment for quite some time, non-masked _mm_store_pd/_mm256_store_pd,
or even masked _mm512_store_pd as well, just these two (and s/pd/ps/)
not by a mistake.
If the memory is not aligned, you'd be using _mm256_mask_storeu_p[ds]
or _mm_mask_storeu_p[ds] instead.

	Jakub

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-08 20:35   ` Jakub Jelinek
@ 2016-01-08 20:39     ` H.J. Lu
  2016-01-08 20:44       ` Jakub Jelinek
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2016-01-08 20:39 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, Kirill Yukhin, GCC Patches

On Fri, Jan 8, 2016 at 12:35 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Jan 08, 2016 at 12:28:43PM -0800, H.J. Lu wrote:
>> gcc.target/i386/avx512vl-vmovapd-1.c has
>>
>> ---
>> include <immintrin.h>
>>
>> double *p;
>> volatile __m256d yy, y2;
>> volatile __m128d xx, x2;
>> volatile __mmask8 m;
>>
>> void extern
>> avx512vl_test (void)
>> {
>>   yy = _mm256_mask_mov_pd (yy, m, y2);
>>   xx = _mm_mask_mov_pd (xx, m, x2);
>>
>>   yy = _mm256_maskz_mov_pd (m, y2);
>>   xx = _mm_maskz_mov_pd (m, x2);
>>
>>   yy = _mm256_mask_load_pd (yy, m, p);
>>   xx = _mm_mask_load_pd (xx, m, p);
>>
>>   yy = _mm256_maskz_load_pd (m, p);
>>   xx = _mm_maskz_load_pd (m, p);
>>
>>   _mm256_mask_store_pd (p, m, yy);
>>   _mm_mask_store_pd (p, m, xx);
>> }
>> ---
>>
>> 'p' is misaligned.
>
> p is not misaligned, it has just unknown alignment.

And it may be 8 byte aligned.

>> Why should we change its alignment?
>
> Because the uses of these intrinsics implies the memory is aligned.
> The masked loads also imply aligned memory and has been giving that
> alignment for quite some time, non-masked _mm_store_pd/_mm256_store_pd,
> or even masked _mm512_store_pd as well, just these two (and s/pd/ps/)
> not by a mistake.
> If the memory is not aligned, you'd be using _mm256_mask_storeu_p[ds]
> or _mm_mask_storeu_p[ds] instead.
>

I think the testcase is wrong.  `p' should point to the properly
aligned memory in this case.  If the alignment is unknown,
unaligned intrinsics should be used.


-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-08 20:39     ` H.J. Lu
@ 2016-01-08 20:44       ` Jakub Jelinek
  2016-01-08 20:46         ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Jakub Jelinek @ 2016-01-08 20:44 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Uros Bizjak, Kirill Yukhin, GCC Patches

On Fri, Jan 08, 2016 at 12:39:50PM -0800, H.J. Lu wrote:
> > p is not misaligned, it has just unknown alignment.
> 
> And it may be 8 byte aligned.

Yes.  But if you call the routine with just 8 byte aligned p,
you invoke undefined behavior.  So, there is nothing wrong on the testcase,
it tests what it means to.

	Jakub

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-08 20:44       ` Jakub Jelinek
@ 2016-01-08 20:46         ` H.J. Lu
  2016-01-08 20:48           ` Jakub Jelinek
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2016-01-08 20:46 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, Kirill Yukhin, GCC Patches

On Fri, Jan 8, 2016 at 12:44 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Jan 08, 2016 at 12:39:50PM -0800, H.J. Lu wrote:
>> > p is not misaligned, it has just unknown alignment.
>>
>> And it may be 8 byte aligned.
>
> Yes.  But if you call the routine with just 8 byte aligned p,
> you invoke undefined behavior.  So, there is nothing wrong on the testcase,
> it tests what it means to.
>

Testing what?  Undefined behavior?


-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-08 20:46         ` H.J. Lu
@ 2016-01-08 20:48           ` Jakub Jelinek
  2016-01-08 21:14             ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Jakub Jelinek @ 2016-01-08 20:48 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Uros Bizjak, Kirill Yukhin, GCC Patches

On Fri, Jan 08, 2016 at 12:46:01PM -0800, H.J. Lu wrote:
> On Fri, Jan 8, 2016 at 12:44 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Fri, Jan 08, 2016 at 12:39:50PM -0800, H.J. Lu wrote:
> >> > p is not misaligned, it has just unknown alignment.
> >>
> >> And it may be 8 byte aligned.
> >
> > Yes.  But if you call the routine with just 8 byte aligned p,
> > you invoke undefined behavior.  So, there is nothing wrong on the testcase,
> > it tests what it means to.
> >
> 
> Testing what?  Undefined behavior?

No.  Testing that if you use an intrinsic through which you tell the
compiler the memory is aligned, it doesn't ignore that and actually uses
the instruction you've asked for.
If you use the *storeu* instrinsics instead, and if the compiler can't prove
the memory is sufficiently aligned, of course it has to use the unaligned
stores.

	Jakub

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-08 20:48           ` Jakub Jelinek
@ 2016-01-08 21:14             ` H.J. Lu
  2016-01-08 21:23               ` Jakub Jelinek
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2016-01-08 21:14 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, Kirill Yukhin, GCC Patches

On Fri, Jan 8, 2016 at 12:48 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Jan 08, 2016 at 12:46:01PM -0800, H.J. Lu wrote:
>> On Fri, Jan 8, 2016 at 12:44 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> > On Fri, Jan 08, 2016 at 12:39:50PM -0800, H.J. Lu wrote:
>> >> > p is not misaligned, it has just unknown alignment.
>> >>
>> >> And it may be 8 byte aligned.
>> >
>> > Yes.  But if you call the routine with just 8 byte aligned p,
>> > you invoke undefined behavior.  So, there is nothing wrong on the testcase,
>> > it tests what it means to.
>> >
>>
>> Testing what?  Undefined behavior?
>
> No.  Testing that if you use an intrinsic through which you tell the
> compiler the memory is aligned, it doesn't ignore that and actually uses
> the instruction you've asked for.
> If you use the *storeu* instrinsics instead, and if the compiler can't prove
> the memory is sufficiently aligned, of course it has to use the unaligned
> stores.
>

I think the testcase should be changed to

double __attribute__ ((aligned (32))) *p;

and I am testing a different patch by removing the whole
aligned_mem stuff.

-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-08 21:14             ` H.J. Lu
@ 2016-01-08 21:23               ` Jakub Jelinek
  2016-01-08 21:27                 ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Jakub Jelinek @ 2016-01-08 21:23 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Uros Bizjak, Kirill Yukhin, GCC Patches

On Fri, Jan 08, 2016 at 01:14:04PM -0800, H.J. Lu wrote:
> I think the testcase should be changed to
> 
> double __attribute__ ((aligned (32))) *p;

No.
> 
> and I am testing a different patch by removing the whole
> aligned_mem stuff.

That is just wrong and will severely pessimize correct code.
Please don't waste time on that.

	Jakub

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-08 21:23               ` Jakub Jelinek
@ 2016-01-08 21:27                 ` H.J. Lu
  2016-01-08 21:38                   ` Jakub Jelinek
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2016-01-08 21:27 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, Kirill Yukhin, GCC Patches

On Fri, Jan 8, 2016 at 1:23 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Jan 08, 2016 at 01:14:04PM -0800, H.J. Lu wrote:
>> I think the testcase should be changed to
>>
>> double __attribute__ ((aligned (32))) *p;

FYI,  gcc.target/i386/avx512vl-vmovaps-1.c has

float __attribute__ ((aligned (32))) *p;

> No.
>>
>> and I am testing a different patch by removing the whole
>> aligned_mem stuff.
>
> That is just wrong and will severely pessimize correct code.
> Please don't waste time on that.
>

Do you have an example to show it will pessimize correct code?


-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-08 21:27                 ` H.J. Lu
@ 2016-01-08 21:38                   ` Jakub Jelinek
  2016-01-08 21:43                     ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Jakub Jelinek @ 2016-01-08 21:38 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Uros Bizjak, Kirill Yukhin, GCC Patches

On Fri, Jan 08, 2016 at 01:27:08PM -0800, H.J. Lu wrote:
> > That is just wrong and will severely pessimize correct code.
> > Please don't waste time on that.
> >
> 
> Do you have an example to show it will pessimize correct code?

Anything where the compiler can't figure out alignment info and you use the
aligned functions, starting from trivial tests like:

void foo (float *p, __m256 q)
{
  _mm256_store_ps (p, q);
}

etc.  The SSE*/AVX* docs say clearly that if the pointer argument is not
properly aligned, a #GP is generated, and you need to use the
*storeu*/*loadu* intrinsics instead.

Generally, the GCC middle-end does not infer alignment info from mere
existence of pointers, but from memory accesses - and the _mm*_{load,store}*
intrinsics count as memory accesses, but they are represented as builtins
that take a pointer argument, and only at the RTL level the memory load or
store is visible in the IL.  Which is the reason for the align_mem stuff,
there are no MEM_REFs at the GIMPLE level, the MEMs are created when
expanding those intrinsics, and for intrinsics where user asserts certain
alignment the align_mem stuff is exactly what lets the optimizers know about
the user choice.  Otherwise there would be really no difference between
using _mm256_store_ps and _mm256_storeu_ps.  The only difference between is
the assertion that the memory in correct programs is properly aligned in the
non-u versions.

	Jakub

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-08 21:38                   ` Jakub Jelinek
@ 2016-01-08 21:43                     ` H.J. Lu
  2016-01-08 21:53                       ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2016-01-08 21:43 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, Kirill Yukhin, GCC Patches

On Fri, Jan 8, 2016 at 1:38 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Jan 08, 2016 at 01:27:08PM -0800, H.J. Lu wrote:
>> > That is just wrong and will severely pessimize correct code.
>> > Please don't waste time on that.
>> >
>>
>> Do you have an example to show it will pessimize correct code?
>
> Anything where the compiler can't figure out alignment info and you use the
> aligned functions, starting from trivial tests like:
>
> void foo (float *p, __m256 q)
> {
>   _mm256_store_ps (p, q);
> }
>
> etc.  The SSE*/AVX* docs say clearly that if the pointer argument is not
> properly aligned, a #GP is generated, and you need to use the
> *storeu*/*loadu* intrinsics instead.
>

How is it `correct' code when alignment of p is unknown?  Do you
mean you are expecting GP.


-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-08 21:43                     ` H.J. Lu
@ 2016-01-08 21:53                       ` H.J. Lu
  2016-01-08 21:59                         ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2016-01-08 21:53 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, Kirill Yukhin, GCC Patches

On Fri, Jan 8, 2016 at 1:43 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Jan 8, 2016 at 1:38 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Fri, Jan 08, 2016 at 01:27:08PM -0800, H.J. Lu wrote:
>>> > That is just wrong and will severely pessimize correct code.
>>> > Please don't waste time on that.
>>> >
>>>
>>> Do you have an example to show it will pessimize correct code?
>>
>> Anything where the compiler can't figure out alignment info and you use the
>> aligned functions, starting from trivial tests like:
>>
>> void foo (float *p, __m256 q)
>> {
>>   _mm256_store_ps (p, q);
>> }
>>
>

Is a bad example:

extern __inline void __attribute__((__gnu_inline__, __always_inline__,
__artificial__))
_mm256_store_ps (float *__P, __m256 __A)
{
  *(__m256 *)__P = __A;
}

since it doesn't use builtin.

-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-08 21:53                       ` H.J. Lu
@ 2016-01-08 21:59                         ` H.J. Lu
  2016-01-08 22:02                           ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2016-01-08 21:59 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, Kirill Yukhin, GCC Patches

On Fri, Jan 8, 2016 at 1:53 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Jan 8, 2016 at 1:43 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Jan 8, 2016 at 1:38 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Fri, Jan 08, 2016 at 01:27:08PM -0800, H.J. Lu wrote:
>>>> > That is just wrong and will severely pessimize correct code.
>>>> > Please don't waste time on that.
>>>> >
>>>>
>>>> Do you have an example to show it will pessimize correct code?
>>>
>>> Anything where the compiler can't figure out alignment info and you use the
>>> aligned functions, starting from trivial tests like:
>>>
>>> void foo (float *p, __m256 q)
>>> {
>>>   _mm256_store_ps (p, q);
>>> }
>>>
>>
>
> Is a bad example:
>

This is a better example:

__m256
foo (const void *p, __m256 yy, __mmask8 m)
{
  return _mm256_mask_load_ps (yy, m, p);
}


-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-08 21:59                         ` H.J. Lu
@ 2016-01-08 22:02                           ` H.J. Lu
  0 siblings, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2016-01-08 22:02 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, Kirill Yukhin, GCC Patches

On Fri, Jan 8, 2016 at 1:59 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Jan 8, 2016 at 1:53 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Jan 8, 2016 at 1:43 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, Jan 8, 2016 at 1:38 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>>> On Fri, Jan 08, 2016 at 01:27:08PM -0800, H.J. Lu wrote:
>>>>> > That is just wrong and will severely pessimize correct code.
>>>>> > Please don't waste time on that.
>>>>> >
>>>>>
>>>>> Do you have an example to show it will pessimize correct code?
>>>>
>>>> Anything where the compiler can't figure out alignment info and you use the
>>>> aligned functions, starting from trivial tests like:
>>>>
>>>> void foo (float *p, __m256 q)
>>>> {
>>>>   _mm256_store_ps (p, q);
>>>> }
>>>>
>>>
>>
>> Is a bad example:
>>
>
> This is a better example:
>
> __m256
> foo (const void *p, __m256 yy, __mmask8 m)
> {
>   return _mm256_mask_load_ps (yy, m, p);
> }
>

We have to assume proper alignment in this case.


-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-08 20:21 [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198) Jakub Jelinek
  2016-01-08 20:28 ` H.J. Lu
@ 2016-01-12 13:13 ` Kirill Yukhin
  2016-01-12 13:39   ` H.J. Lu
  1 sibling, 1 reply; 20+ messages in thread
From: Kirill Yukhin @ 2016-01-12 13:13 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, gcc-patches, HJ

Hello Jakub
On 08 Jan 21:20, Jakub Jelinek wrote:
> Hi!
> 
> This patch fixes
> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ \\\\t]+[^{\\n]*%xmm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ \\\\t]+[^{\\n]*%ymm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ \\\\t]+[^{\\n]*%xmm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ \\\\t]+[^{\\n]*%ymm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> regressions that were introduced recently by fixing up the masked store check for misalignment.
> The problem is that for v2df/v4df/v4sf/v8sf masked stores ix86_expand_special_args_builtin
> failed to set aligned_mem and thus didn't set correct memory alignment.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Followed you discussion w/ HJ.
I think that metioned intrinsics should assume proper alignement and this
agrees with SDM.

So, your patch is ok for main trunk.

--
Thanks, K


> 
> 2016-01-08  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/69198
> 	* config/i386/i386.c (ix86_expand_special_args_builtin): Ensure
> 	aligned_mem is properly set for AVX512-VL floating point masked
> 	stores.
> 
> --- gcc/config/i386/i386.c.jj	2016-01-08 07:31:11.000000000 +0100
> +++ gcc/config/i386/i386.c	2016-01-08 18:16:21.030354042 +0100
> @@ -39776,7 +39776,11 @@ ix86_expand_special_args_builtin (const
>        memory = 0;
>        break;
>      case VOID_FTYPE_PV8DF_V8DF_UQI:
> +    case VOID_FTYPE_PV4DF_V4DF_UQI:
> +    case VOID_FTYPE_PV2DF_V2DF_UQI:
>      case VOID_FTYPE_PV16SF_V16SF_UHI:
> +    case VOID_FTYPE_PV8SF_V8SF_UQI:
> +    case VOID_FTYPE_PV4SF_V4SF_UQI:
>      case VOID_FTYPE_PV8DI_V8DI_UQI:
>      case VOID_FTYPE_PV4DI_V4DI_UQI:
>      case VOID_FTYPE_PV2DI_V2DI_UQI:
> @@ -39834,10 +39838,6 @@ ix86_expand_special_args_builtin (const
>      case VOID_FTYPE_PV16QI_V16QI_UHI:
>      case VOID_FTYPE_PV32QI_V32QI_USI:
>      case VOID_FTYPE_PV64QI_V64QI_UDI:
> -    case VOID_FTYPE_PV4DF_V4DF_UQI:
> -    case VOID_FTYPE_PV2DF_V2DF_UQI:
> -    case VOID_FTYPE_PV8SF_V8SF_UQI:
> -    case VOID_FTYPE_PV4SF_V4SF_UQI:
>        nargs = 2;
>        klass = store;
>        /* Reserve memory operand for target.  */
> 
> 	Jakub

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-12 13:13 ` Kirill Yukhin
@ 2016-01-12 13:39   ` H.J. Lu
  2016-01-12 13:42     ` Jakub Jelinek
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2016-01-12 13:39 UTC (permalink / raw)
  To: Kirill Yukhin, Enkovich, Ilya; +Cc: Jakub Jelinek, Uros Bizjak, GCC Patches

On Tue, Jan 12, 2016 at 5:12 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Hello Jakub
> On 08 Jan 21:20, Jakub Jelinek wrote:
>> Hi!
>>
>> This patch fixes
>> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ \\\\t]+[^{\\n]*%xmm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
>> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ \\\\t]+[^{\\n]*%ymm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
>> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ \\\\t]+[^{\\n]*%xmm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
>> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ \\\\t]+[^{\\n]*%ymm[0-9]+[^\\n]*\\\\){%k[1-7]}(?:\\n|[ \\\\t]+#) 1
>> regressions that were introduced recently by fixing up the masked store check for misalignment.
>> The problem is that for v2df/v4df/v4sf/v8sf masked stores ix86_expand_special_args_builtin
>> failed to set aligned_mem and thus didn't set correct memory alignment.
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> Followed you discussion w/ HJ.
> I think that metioned intrinsics should assume proper alignement and this
> agrees with SDM.
>
> So, your patch is ok for main trunk.
>
> --
> Thanks, K
>
>
>>
>> 2016-01-08  Jakub Jelinek  <jakub@redhat.com>
>>
>>       PR target/69198
>>       * config/i386/i386.c (ix86_expand_special_args_builtin): Ensure
>>       aligned_mem is properly set for AVX512-VL floating point masked
>>       stores.
>>
>> --- gcc/config/i386/i386.c.jj 2016-01-08 07:31:11.000000000 +0100
>> +++ gcc/config/i386/i386.c    2016-01-08 18:16:21.030354042 +0100
>> @@ -39776,7 +39776,11 @@ ix86_expand_special_args_builtin (const
>>        memory = 0;
>>        break;
>>      case VOID_FTYPE_PV8DF_V8DF_UQI:
>> +    case VOID_FTYPE_PV4DF_V4DF_UQI:
>> +    case VOID_FTYPE_PV2DF_V2DF_UQI:
>>      case VOID_FTYPE_PV16SF_V16SF_UHI:
>> +    case VOID_FTYPE_PV8SF_V8SF_UQI:
>> +    case VOID_FTYPE_PV4SF_V4SF_UQI:
>>      case VOID_FTYPE_PV8DI_V8DI_UQI:
>>      case VOID_FTYPE_PV4DI_V4DI_UQI:
>>      case VOID_FTYPE_PV2DI_V2DI_UQI:
>> @@ -39834,10 +39838,6 @@ ix86_expand_special_args_builtin (const
>>      case VOID_FTYPE_PV16QI_V16QI_UHI:
>>      case VOID_FTYPE_PV32QI_V32QI_USI:
>>      case VOID_FTYPE_PV64QI_V64QI_UDI:
>> -    case VOID_FTYPE_PV4DF_V4DF_UQI:
>> -    case VOID_FTYPE_PV2DF_V2DF_UQI:
>> -    case VOID_FTYPE_PV8SF_V8SF_UQI:
>> -    case VOID_FTYPE_PV4SF_V4SF_UQI:
>>        nargs = 2;
>>        klass = store;
>>        /* Reserve memory operand for target.  */
>>
>>       Jakub

GCC 5 has the same issue.  This patch should be backported to GCC 5
with

https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00528.html

which supersedes:

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=231269

OK to backport Jakub's and my patch for GCC 5?

-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-12 13:39   ` H.J. Lu
@ 2016-01-12 13:42     ` Jakub Jelinek
  2016-01-12 13:45       ` Uros Bizjak
  0 siblings, 1 reply; 20+ messages in thread
From: Jakub Jelinek @ 2016-01-12 13:42 UTC (permalink / raw)
  To: H.J. Lu, Uros Bizjak; +Cc: Kirill Yukhin, Enkovich, Ilya, GCC Patches

On Tue, Jan 12, 2016 at 05:39:29AM -0800, H.J. Lu wrote:
> GCC 5 has the same issue.  This patch should be backported to GCC 5
> with
> 
> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00528.html
> 
> which supersedes:
> 
> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=231269
> 
> OK to backport Jakub's and my patch for GCC 5?

I think I'd prefer just r231269 and my patch for the branch, to make the
changes as small as possible, leave the cleanup on the trunk only.
But, I'm not x86_64 maintainer, so I'll leave that decision to Uros/Kirill.

	Jakub

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-12 13:42     ` Jakub Jelinek
@ 2016-01-12 13:45       ` Uros Bizjak
  2016-01-12 19:58         ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Uros Bizjak @ 2016-01-12 13:45 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, Kirill Yukhin, Enkovich, Ilya, GCC Patches

On Tue, Jan 12, 2016 at 2:42 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Jan 12, 2016 at 05:39:29AM -0800, H.J. Lu wrote:
>> GCC 5 has the same issue.  This patch should be backported to GCC 5
>> with
>>
>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00528.html
>>
>> which supersedes:
>>
>> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=231269
>>
>> OK to backport Jakub's and my patch for GCC 5?
>
> I think I'd prefer just r231269 and my patch for the branch, to make the
> changes as small as possible, leave the cleanup on the trunk only.
> But, I'm not x86_64 maintainer, so I'll leave that decision to Uros/Kirill.

I agree with Jakub.

Those two patches are OK for backport.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)
  2016-01-12 13:45       ` Uros Bizjak
@ 2016-01-12 19:58         ` H.J. Lu
  0 siblings, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2016-01-12 19:58 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Jakub Jelinek, Kirill Yukhin, Enkovich, Ilya, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 849 bytes --]

On Tue, Jan 12, 2016 at 5:45 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, Jan 12, 2016 at 2:42 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Tue, Jan 12, 2016 at 05:39:29AM -0800, H.J. Lu wrote:
>>> GCC 5 has the same issue.  This patch should be backported to GCC 5
>>> with
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00528.html
>>>
>>> which supersedes:
>>>
>>> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=231269
>>>
>>> OK to backport Jakub's and my patch for GCC 5?
>>
>> I think I'd prefer just r231269 and my patch for the branch, to make the
>> changes as small as possible, leave the cleanup on the trunk only.
>> But, I'm not x86_64 maintainer, so I'll leave that decision to Uros/Kirill.
>
> I agree with Jakub.
>
> Those two patches are OK for backport.
>

This is what I checked in.

Thanks.


-- 
H.J.

[-- Attachment #2: 0001-Fix-alignment-check-in-AVX-512-masked-store.patch --]
[-- Type: text/x-patch, Size: 2858 bytes --]

From e6a6fd4b2fb4bb239fed4de6f9374f9b102e9c0f Mon Sep 17 00:00:00 2001
From: ienkovich <ienkovich@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Fri, 4 Dec 2015 14:18:58 +0000
Subject: [PATCH] Fix alignment check in AVX-512 masked store

	Backport from mainline
	2016-01-12  Jakub Jelinek  <jakub@redhat.com>

	PR target/69198
	* config/i386/i386.c (ix86_expand_special_args_builtin): Ensure
	aligned_mem is properly set for AVX512-VL floating point masked
	stores.

	2015-12-04  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* config/i386/sse.md (<avx512>_store<mode>_mask): Fix
	operand checked for alignment.
---
 gcc/ChangeLog          | 15 +++++++++++++++
 gcc/config/i386/i386.c |  8 ++++----
 gcc/config/i386/sse.md |  2 +-
 3 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d7bc6a2..be24722 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,18 @@
+2016-01-12  H.J. Lu  <hongjiu.lu@intel.com>
+
+	Backport from mainline
+	2016-01-12  Jakub Jelinek  <jakub@redhat.com>
+
+	PR target/69198
+	* config/i386/i386.c (ix86_expand_special_args_builtin): Ensure
+	aligned_mem is properly set for AVX512-VL floating point masked
+	stores.
+
+	2015-12-04  Ilya Enkovich  <enkovich.gnu@gmail.com>
+
+	* config/i386/sse.md (<avx512>_store<mode>_mask): Fix
+	operand checked for alignment.
+
 2016-01-12  James Greenhalgh  <james.greenhalgh@arm.com>
 
 	Backport from mainline r222186.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3547ba6..b0c301b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -38259,7 +38259,11 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
       memory = 0;
       break;
     case VOID_FTYPE_PV8DF_V8DF_QI:
+    case VOID_FTYPE_PV4DF_V4DF_QI:
+    case VOID_FTYPE_PV2DF_V2DF_QI:
     case VOID_FTYPE_PV16SF_V16SF_HI:
+    case VOID_FTYPE_PV8SF_V8SF_QI:
+    case VOID_FTYPE_PV4SF_V4SF_QI:
     case VOID_FTYPE_PV8DI_V8DI_QI:
     case VOID_FTYPE_PV4DI_V4DI_QI:
     case VOID_FTYPE_PV2DI_V2DI_QI:
@@ -38319,10 +38323,6 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
     case VOID_FTYPE_PV16QI_V16QI_HI:
     case VOID_FTYPE_PV32QI_V32QI_SI:
     case VOID_FTYPE_PV64QI_V64QI_DI:
-    case VOID_FTYPE_PV4DF_V4DF_QI:
-    case VOID_FTYPE_PV2DF_V2DF_QI:
-    case VOID_FTYPE_PV8SF_V8SF_QI:
-    case VOID_FTYPE_PV4SF_V4SF_QI:
       nargs = 2;
       klass = store;
       /* Reserve memory operand for target.  */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9235753..15d7188 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1022,7 +1022,7 @@
       sse_suffix = "<ssescalarsize>";
     }
 
-  if (misaligned_operand (operands[1], <MODE>mode))
+  if (misaligned_operand (operands[0], <MODE>mode))
     align = "u";
   else
     align = "a";
-- 
2.5.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-01-12 19:58 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-08 20:21 [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198) Jakub Jelinek
2016-01-08 20:28 ` H.J. Lu
2016-01-08 20:35   ` Jakub Jelinek
2016-01-08 20:39     ` H.J. Lu
2016-01-08 20:44       ` Jakub Jelinek
2016-01-08 20:46         ` H.J. Lu
2016-01-08 20:48           ` Jakub Jelinek
2016-01-08 21:14             ` H.J. Lu
2016-01-08 21:23               ` Jakub Jelinek
2016-01-08 21:27                 ` H.J. Lu
2016-01-08 21:38                   ` Jakub Jelinek
2016-01-08 21:43                     ` H.J. Lu
2016-01-08 21:53                       ` H.J. Lu
2016-01-08 21:59                         ` H.J. Lu
2016-01-08 22:02                           ` H.J. Lu
2016-01-12 13:13 ` Kirill Yukhin
2016-01-12 13:39   ` H.J. Lu
2016-01-12 13:42     ` Jakub Jelinek
2016-01-12 13:45       ` Uros Bizjak
2016-01-12 19:58         ` H.J. Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).