[Bug target/59539] New: Missed optimisation: VEX-prefixed operations don't need aligned data

public inbox for gcc-bugs@sourceware.org

* [Bug target/59539] New: Missed optimisation: VEX-prefixed operations don't need aligned data
@ 2013-12-18  0:50 thiago at kde dot org
  2013-12-18  8:33 ` [Bug target/59539] " jakub at gcc dot gnu.org
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: thiago at kde dot org @ 2013-12-18  0:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59539

            Bug ID: 59539
           Summary: Missed optimisation: VEX-prefixed operations don't
                    need aligned data
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: thiago at kde dot org

Consider the following code:

#include <immintrin.h>
int f(void *p1, void *p2)
{
    __m128i d1 = _mm_loadu_si128((__m128i*)p1);
    __m128i d2 = _mm_loadu_si128((__m128i*)p2);
    __m128i result = _mm_cmpeq_epi16(d1, d2);
    return _mm_movemask_epi8(result);
}

If compiled with -O2 -mavx, it produces the following code with GCC 4.9
(current trunk):
f:
        vmovdqu (%rdi), %xmm0
        vmovdqu (%rsi), %xmm1
        vpcmpeqw        %xmm1, %xmm0, %xmm0
        vpmovmskb       %xmm0, %eax
        ret

One of the two VMOVDQU instructions is unnecessary, since the VEX-prefixed
VPCMPEQW instruction can do unaligned loads without faulting. The Intel Software
Developer's Manual Volume 1, Chapter 14 says in 14.9 "Memory alignment":

> With the exception of explicitly aligned 16 or 32 byte SIMD load/store instructions, most VEX-encoded,
> arithmetic and data processing instructions operate in a flexible environment regarding memory address
> alignment, i.e. VEX-encoded instruction with 32-byte or 16-byte load semantics will support unaligned load
> operation by default. Memory arguments for most instructions with VEX prefix operate normally without
> causing #GP(0) on any byte-granularity alignment (unlike Legacy SSE instructions). The instructions that
> require explicit memory alignment requirements are listed in Table 14-22.
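
As a rough illustration of what this paragraph permits, the following C sketch calls the same intrinsics as f() above on pointers that are deliberately not 16-byte aligned. The names cmp16_mask and misaligned_demo are made up for this example, and _Alignas assumes a C11 compiler. The C-level result is the same however the loads are emitted; whether one of them is folded into VPCMPEQW depends on the compiler and on -mavx.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2: _mm_loadu_si128, _mm_cmpeq_epi16, _mm_movemask_epi8 */
#include <stdint.h>

/* Same body as f() in the report, with const-qualified parameters. */
static int cmp16_mask(const void *p1, const void *p2)
{
    __m128i d1 = _mm_loadu_si128((const __m128i *)p1);
    __m128i d2 = _mm_loadu_si128((const __m128i *)p2);
    return _mm_movemask_epi8(_mm_cmpeq_epi16(d1, d2));
}

/* Hypothetical driver: compares two 16-byte blocks that start at odd addresses. */
static int misaligned_demo(void)
{
    static _Alignas(16) unsigned char a[32];
    static _Alignas(16) unsigned char b[32];
    unsigned char *pa = a + 1;   /* 16-byte aligned base + 1 is misaligned */
    unsigned char *pb = b + 1;

    for (int i = 0; i < 16; i++)
        pa[i] = pb[i] = (unsigned char)(i + 1);
    pb[4] ^= 0xFF;               /* make 16-bit lane 2 (bytes 4..5) differ */

    /* Check that both pointers really are misaligned for a 16-byte access. */
    assert(((uintptr_t)pa & 15) != 0 && ((uintptr_t)pb & 15) != 0);
    return cmp16_mask(pa, pb);
}
```

Called from main(), misaligned_demo() should return 0xFFCF: every lane compares equal except lane 2, which clears mask bits 4 and 5.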

Clang and ICC have already implemented this optimisation:

Clang 3.3 produces:
f:                                      # @f
        vmovdqu (%rsi), %xmm0
        vpcmpeqw        (%rdi), %xmm0, %xmm0
        vpmovmskb       %xmm0, %eax
        ret

Similarly, ICC 14 produces:
f:
        vmovdqu   (%rdi), %xmm0
        vpcmpeqw  (%rsi), %xmm0, %xmm1
        vpmovmskb %xmm1, %eax
        ret


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/59539] Missed optimisation: VEX-prefixed operations don't need aligned data
  2013-12-18  0:50 [Bug target/59539] New: Missed optimisation: VEX-prefixed operations don't need aligned data thiago at kde dot org
@ 2013-12-18  8:33 ` jakub at gcc dot gnu.org
  2013-12-18  8:49 ` thiago at kde dot org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-12-18  8:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59539

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2013-12-18
                 CC|                            |jakub at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org      |jakub at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 31463
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31463&action=edit
gcc49-pr59539.patch

This has already been improved for compiler-generated misaligned loads in
r204219 (PR47754), but explicit intrinsics don't go through the movmisalign
path, so they still force the generation of UNSPECs. (I think they shouldn't
go through that path: I doubt that when you write _mm256_loadu_si256 you'd
expect, depending on tuning, two misaligned 128-bit loads instead.)

With this patch, if the compiler would emit a vmovdqu (or vmovup{s,d}) for the
normal *mov<mode>_internal pattern, it emits that instead of the UNSPECs and
allows combining it into other insns. If you use the unaligned loads on
something known to be aligned, it will still not combine it (it will honor
the unaligned load then, because you've requested it specially).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/59539] Missed optimisation: VEX-prefixed operations don't need aligned data
  2013-12-18  0:50 [Bug target/59539] New: Missed optimisation: VEX-prefixed operations don't need aligned data thiago at kde dot org
  2013-12-18  8:33 ` [Bug target/59539] " jakub at gcc dot gnu.org
@ 2013-12-18  8:49 ` thiago at kde dot org
  2013-12-18  9:38 ` ubizjak at gmail dot com
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: thiago at kde dot org @ 2013-12-18  8:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59539

--- Comment #2 from Thiago Macieira <thiago at kde dot org> ---
I have to use _mm_loadu_si128 because non-VEX SSE requires explicit unaligned
loads.

Here's more food for thought:

    __m128i result = _mm_cmpeq_epi16(*(__m128i*)p1, *(__m128i*)p2);

For non-VEX code, so far the compiler emitted one MOVDQA and one PCMPEQW if it
could, enforcing that both sources needed to be aligned. With VEX, VPCMPEQW can
do unaligned, so should the other load also be changed to VMOVDQU instead of
VMOVDQA?

Similarly, if I use _mm_load_si128 (not loadu), can the compiler combine one
load into the next instruction? Performance-wise, the execution will be the
same, with one fewer instruction to be retired (so, better); but it will not
cause an unaligned fault if the pointer isn't aligned.
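
To make the second question concrete, here is a small sketch of the _mm_load_si128 variant. The names cmp16_mask_aligned and aligned_demo are invented for this example, and _Alignas assumes a C11 compiler. The sketch only passes aligned data, so its result doesn't depend on whether the compiler keeps a separate aligned load or folds it into the compare.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2: _mm_load_si128, _mm_cmpeq_epi16, _mm_movemask_epi8 */
#include <stdint.h>

static int cmp16_mask_aligned(const void *p1, const void *p2)
{
    /* _mm_load_si128 requires the address to be 16-byte aligned. */
    __m128i d1 = _mm_load_si128((const __m128i *)p1);
    __m128i d2 = _mm_load_si128((const __m128i *)p2);
    return _mm_movemask_epi8(_mm_cmpeq_epi16(d1, d2));
}

/* Hypothetical driver using buffers that are guaranteed to be aligned. */
static int aligned_demo(void)
{
    static _Alignas(16) unsigned char a[16];
    static _Alignas(16) unsigned char b[16];

    for (int i = 0; i < 16; i++)
        a[i] = b[i] = (unsigned char)i;
    b[15] = 0x80;   /* change the high byte of lane 7 (bytes 14..15) */

    assert(((uintptr_t)a & 15) == 0 && ((uintptr_t)b & 15) == 0);
    return cmp16_mask_aligned(a, b);
}
```

Here aligned_demo() should return 0x3FFF, since only lane 7 differs. Passing a misaligned pointer to cmp16_mask_aligned would break the intrinsic's contract, which is exactly why the question of faulting versus silently working matters.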

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/59539] Missed optimisation: VEX-prefixed operations don't need aligned data
  2013-12-18  0:50 [Bug target/59539] New: Missed optimisation: VEX-prefixed operations don't need aligned data thiago at kde dot org
  2013-12-18  8:33 ` [Bug target/59539] " jakub at gcc dot gnu.org
  2013-12-18  8:49 ` thiago at kde dot org
@ 2013-12-18  9:38 ` ubizjak at gmail dot com
  2013-12-18  9:43 ` jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: ubizjak at gmail dot com @ 2013-12-18  9:38 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59539

--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #1)
> Created attachment 31463 [details]
> gcc49-pr59539.patch

From a quick look at the patch, the proposed patch doesn't support masked
insns.

I think it is better to introduce corresponding sse.md expanders for
UNSPEC_STOREU and UNSPEC_LOADU insns. These expanders will either generate
UNSPEC_* pattern or "normal" (masked) move in case of TARGET_AVX.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/59539] Missed optimisation: VEX-prefixed operations don't need aligned data
  2013-12-18  0:50 [Bug target/59539] New: Missed optimisation: VEX-prefixed operations don't need aligned data thiago at kde dot org
                   ` (2 preceding siblings ...)
  2013-12-18  9:38 ` ubizjak at gmail dot com
@ 2013-12-18  9:43 ` jakub at gcc dot gnu.org
  2013-12-18  9:50 ` ubizjak at gmail dot com
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-12-18  9:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59539

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #3)
> (In reply to Jakub Jelinek from comment #1)
> > Created attachment 31463 [details]
> > gcc49-pr59539.patch
> 
> From a quick look at the patch, the proposed patch doesn't support masked
> insns. 
> 
> I think it is better to introduce corresponding sse.md expanders for
> UNSPEC_STOREU and UNSPEC_LOADU insns. These expanders will either generate
> UNSPEC_* pattern or "normal" (masked) move in case of TARGET_AVX.

I can surely add the expanders, but don't understand the comment about masked
moves.  *maskload* is already a specialized UNSPEC, and can't really be merged
with arithmetic patterns (ok, perhaps with -mavx512f?) and handles both aligned
and unaligned operands the same.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/59539] Missed optimisation: VEX-prefixed operations don't need aligned data
  2013-12-18  0:50 [Bug target/59539] New: Missed optimisation: VEX-prefixed operations don't need aligned data thiago at kde dot org
                   ` (3 preceding siblings ...)
  2013-12-18  9:43 ` jakub at gcc dot gnu.org
@ 2013-12-18  9:50 ` ubizjak at gmail dot com
  2013-12-18 10:18 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: ubizjak at gmail dot com @ 2013-12-18  9:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59539

--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #4)
> > From a quick look at the patch, the proposed patch doesn't support masked
> > insns. 
> > 
> > I think it is better to introduce corresponding sse.md expanders for
> > UNSPEC_STOREU and UNSPEC_LOADU insns. These expanders will either generate
> > UNSPEC_* pattern or "normal" (masked) move in case of TARGET_AVX.
> 
> I can surely add the expanders, but don't understand the comment about
> masked moves.  *maskload* is already a specialized UNSPEC, and can't really
> be merged with arithmetic patterns (ok, perhaps with -mavx512f?) and handles
> both aligned and unaligned operands the same.

Ah, sorry for being terse, I was looking at avx512f_storeu_..._mask. I assume
that this pattern can be combined as a memory operand into other SSE
instructions, in the same way as non-masked UNSPEC_* patterns. Admittedly, we
are in the middle of avx512f merge, so this assumption may be wrong.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/59539] Missed optimisation: VEX-prefixed operations don't need aligned data
  2013-12-18  0:50 [Bug target/59539] New: Missed optimisation: VEX-prefixed operations don't need aligned data thiago at kde dot org
                   ` (4 preceding siblings ...)
  2013-12-18  9:50 ` ubizjak at gmail dot com
@ 2013-12-18 10:18 ` jakub at gcc dot gnu.org
  2013-12-18 10:39 ` ubizjak at gmail dot com
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-12-18 10:18 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59539

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 31464
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31464&action=edit
gcc49-pr59539.patch

So like this instead?  The half-merged AVX512f support makes any changes hard,
it isn't clear if the masked variant will be needed or not, but it can't be
handled that easily, so I've kept it in the expander name for now, just don't
optimize it.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/59539] Missed optimisation: VEX-prefixed operations don't need aligned data
  2013-12-18  0:50 [Bug target/59539] New: Missed optimisation: VEX-prefixed operations don't need aligned data thiago at kde dot org
                   ` (5 preceding siblings ...)
  2013-12-18 10:18 ` jakub at gcc dot gnu.org
@ 2013-12-18 10:39 ` ubizjak at gmail dot com
  2013-12-18 11:04 ` ubizjak at gmail dot com
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: ubizjak at gmail dot com @ 2013-12-18 10:39 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59539

--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #6)
> Created attachment 31464 [details]
> gcc49-pr59539.patch
> 
> So like this instead?  The half-merged AVX512f support makes any changes
> hard, it isn't clear if the masked variant will be needed or not, but it
> can't be handled that easily, so I've kept it in the expander name for now,
> just don't optimize it.

Yes, but why do you use the misaligned_operand check? This check is used in
move patterns to generate a movu insn *instead of* mova. I guess that we can
skip this check entirely.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/59539] Missed optimisation: VEX-prefixed operations don't need aligned data
  2013-12-18  0:50 [Bug target/59539] New: Missed optimisation: VEX-prefixed operations don't need aligned data thiago at kde dot org
                   ` (6 preceding siblings ...)
  2013-12-18 10:39 ` ubizjak at gmail dot com
@ 2013-12-18 11:04 ` ubizjak at gmail dot com
  2013-12-18 16:50 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: ubizjak at gmail dot com @ 2013-12-18 11:04 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59539

--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #8)
> Because otherwise it can generate an aligned load, and I thought when user
> explicitly writes he wants an unaligned load we should honor it, perhaps for
> some reason the alignment info can't be trusted etc.
> ix86_expand_vector_move_misalign also checks misaligned_operand and only
> emits the non-UNSPEC insn for TARGET_AVX if it will result in misaligned load.

Thanks, the patch looks OK then (perhaps with the comment on misaligned_operand
usage).
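
The rule agreed on here can be written down as a toy decision function. This is only a model of the reasoning quoted above, not code from sse.md or i386.c: the enum, the function name and both parameters are invented for illustration, and operand_misaligned merely stands in for the misaligned_operand check.

```c
#include <stdbool.h>

/* Illustrative names only; these are not GCC identifiers. */
enum loadu_expansion {
    LOADU_KEEP_UNSPEC,   /* stays a separate, explicit unaligned-load insn */
    LOADU_PLAIN_MOVE     /* ordinary move that combine may fold into its user */
};

/* Model: emit the plain move only when it would still be a misaligned load. */
static enum loadu_expansion choose_loadu_expansion(bool target_avx,
                                                   bool operand_misaligned)
{
    if (target_avx && operand_misaligned)
        return LOADU_PLAIN_MOVE;
    /* Without AVX, or when the operand is known to be aligned, keep the
       UNSPEC so that the requested unaligned load is honored. */
    return LOADU_KEEP_UNSPEC;
}
```

Under this model, a loadu intrinsic on an operand known to be aligned keeps its UNSPEC even with AVX enabled, which matches the "honor the explicit request" argument in the quoted comment.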

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/59539] Missed optimisation: VEX-prefixed operations don't need aligned data
  2013-12-18  0:50 [Bug target/59539] New: Missed optimisation: VEX-prefixed operations don't need aligned data thiago at kde dot org
                   ` (7 preceding siblings ...)
  2013-12-18 11:04 ` ubizjak at gmail dot com
@ 2013-12-18 16:50 ` jakub at gcc dot gnu.org
  2013-12-18 16:52 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-12-18 16:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59539

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Author: jakub
Date: Wed Dec 18 16:50:06 2013
New Revision: 206090

URL: http://gcc.gnu.org/viewcvs?rev=206090&root=gcc&view=rev
Log:
    PR target/59539
    * config/i386/sse.md
    (<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>,
    <sse2_avx_avx512f>_loaddqu<mode><mask_name>): New expanders,
    prefix existing define_insn names with *.

    * gcc.target/i386/pr59539-1.c: New test.
    * gcc.target/i386/pr59539-2.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr59539-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr59539-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/59539] Missed optimisation: VEX-prefixed operations don't need aligned data
From: jakub at gcc dot gnu.org @ 2013-12-18 16:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59539

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed.



* [Bug target/59539] Missed optimisation: VEX-prefixed operations don't need aligned data
From: thiago at kde dot org @ 2013-12-18 17:36 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59539

--- Comment #12 from Thiago Macieira <thiago at kde dot org> ---
Thanks, rebuilding!



* [Bug target/59539] Missed optimisation: VEX-prefixed operations don't need aligned data
From: thiago at kde dot org @ 2013-12-19  0:07 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59539

--- Comment #13 from Thiago Macieira <thiago at kde dot org> ---
I can't confirm. trunk@206091:

$ ~/gcc4.9/bin/gcc -mavx -S -o - -O3 -xc - <<<'#include <immintrin.h>
int f(void *p1, void *p2)
{
    __m128i d1 = _mm_loadu_si128((__m128i*)p1);
    __m128i d2 = _mm_loadu_si128((__m128i*)p2);
    __m128i result = _mm_cmpeq_epi16(d1, d2);
    return _mm_movemask_epi8(result);
}
'
        .file   ""
        .section        .text.unlikely,"ax",@progbits
.LCOLDB0:
        .text
.LHOTB0:
        .p2align 4,,15
        .globl  f
        .type   f, @function
f:
.LFB1073:
        .cfi_startproc
        vmovdqu (%rdi), %xmm0
        vmovdqu (%rsi), %xmm1
        vpcmpeqw        %xmm1, %xmm0, %xmm0
        vpmovmskb       %xmm0, %eax
        ret
        .cfi_endproc
.LFE1073:
        .size   f, .-f
        .section        .text.unlikely
.LCOLDE0:
        .text
.LHOTE0:
        .ident  "GCC: (GNU) 4.9.0 20131121 (experimental)"
        .section        .note.GNU-stack,"",@progbits



* [Bug target/59539] Missed optimisation: VEX-prefixed operations don't need aligned data
From: thiago at kde dot org @ 2013-12-19  0:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59539

--- Comment #14 from Thiago Macieira <thiago at kde dot org> ---
*facepalm* I had forgotten to make install!

It works:
$ ~/gcc4.9/bin/gcc -mavx -S -o - -O3 -xc - <<<'#include <immintrin.h>
int f(void *p1, void *p2)
{
    __m128i d1 = _mm_loadu_si128((__m128i*)p1);
    __m128i d2 = _mm_loadu_si128((__m128i*)p2);
    __m128i result = _mm_cmpeq_epi16(d1, d2);
    return _mm_movemask_epi8(result);
}
'
        .file   ""
        .section        .text.unlikely,"ax",@progbits
.LCOLDB0:
        .text
.LHOTB0:
        .p2align 4,,15
        .globl  f
        .type   f, @function
f:
.LFB1073:
        .cfi_startproc
        vmovdqu (%rsi), %xmm0
        vpcmpeqw        (%rdi), %xmm0, %xmm0
        vpmovmskb       %xmm0, %eax
        ret
        .cfi_endproc
.LFE1073:
        .size   f, .-f
        .section        .text.unlikely
.LCOLDE0:
        .text
.LHOTE0:
        .ident  "GCC: (GNU) 4.9.0 20131218 (experimental)"
        .section        .note.GNU-stack,"",@progbits
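
The output above folds one of the loads into vpcmpeqw as a memory operand.
As a rough sanity check that this is safe for misaligned pointers, a sketch
along the following lines could be called from a small test driver. The
helper mask_for_misaligned() and its differ_at parameter are hypothetical
names made up for this illustration; they are not taken from the committed
pr59539-1.c / pr59539-2.c tests.

```c
#include <immintrin.h>
#include <string.h>

/* Same function as in the report. */
int f(void *p1, void *p2)
{
    __m128i d1 = _mm_loadu_si128((__m128i*)p1);
    __m128i d2 = _mm_loadu_si128((__m128i*)p2);
    __m128i result = _mm_cmpeq_epi16(d1, d2);
    return _mm_movemask_epi8(result);
}

/* Hypothetical helper (not part of GCC or its testsuite): compares two
   16-byte blocks that both start one byte past a 16-byte boundary.
   If differ_at is in [0, 16), that byte of the second block is flipped;
   any other value leaves the blocks identical. */
int mask_for_misaligned(int differ_at)
{
    _Alignas(16) unsigned char a[32];
    _Alignas(16) unsigned char b[32];

    memset(a, 0x5a, sizeof a);
    memset(b, 0x5a, sizeof b);
    if (differ_at >= 0 && differ_at < 16)
        b[1 + differ_at] ^= 0xff;

    /* a + 1 and b + 1 are deliberately not 16-byte aligned. */
    return f(a + 1, b + 1);
}
```

With identical blocks every 16-bit lane compares equal, so the mask should
be 0xFFFF; flipping byte 6 clears the two mask bits of lane 3 and should
give 0xFF3F. Building with -O2 -mavx and inspecting the assembly for f
should show whether the load was folded, as in the output above.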



Thread overview: 14+ messages
2013-12-18  0:50 [Bug target/59539] New: Missed optimisation: VEX-prefixed operations don't need aligned data thiago at kde dot org
2013-12-18  8:33 ` [Bug target/59539] " jakub at gcc dot gnu.org
2013-12-18  8:49 ` thiago at kde dot org
2013-12-18  9:38 ` ubizjak at gmail dot com
2013-12-18  9:43 ` jakub at gcc dot gnu.org
2013-12-18  9:50 ` ubizjak at gmail dot com
2013-12-18 10:18 ` jakub at gcc dot gnu.org
2013-12-18 10:39 ` ubizjak at gmail dot com
2013-12-18 11:04 ` ubizjak at gmail dot com
2013-12-18 16:50 ` jakub at gcc dot gnu.org
2013-12-18 16:52 ` jakub at gcc dot gnu.org
2013-12-18 17:36 ` thiago at kde dot org
2013-12-19  0:07 ` thiago at kde dot org
2013-12-19  0:14 ` thiago at kde dot org
