public inbox for gcc-bugs@sourceware.org
* [Bug c/41484]  New: Please add memory forms of pmovzx* (SSE4.1)
@ 2009-09-27 21:03 sgunderson at bigfoot dot com
  2009-09-27 21:34 ` [Bug target/41484] " pinskia at gcc dot gnu dot org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: sgunderson at bigfoot dot com @ 2009-09-27 21:03 UTC (permalink / raw)
  To: gcc-bugs

Hi,

SSE4.1 introduced zero-extending and sign-extending loads, such as

  pmovzxbd (%rax), %xmm0

which takes four bytes from (%rax), zero-extends them to four 32-bit dwords,
and puts them into %xmm0. However, GCC's intrinsics support only the form

  pmovzxbd %xmm1, %xmm0

which takes the lower 32 bits from %xmm1 and does the same. This is reflected
in the definition of the intrinsic (from the GCC 4.4.1 manual):

  v4si __builtin_ia32_pmovzxbd128 (v16qi)

This makes it rather hard and indirect to load, say, 32 bits from an unaligned
char* -- especially if you're not sure that the next 96 bits are readable.
(Just casting the char* pointer to a v16qi* and dereferencing it in the
intrinsic's argument causes GCC to emit an aligned load to a register, followed
by a pmovzxbd reg/reg, at least in my program.)

Could you please add the forms that take v2qi/v4qi/v8qi/v2hi/v4hi/v2si as well,
for the entire pmovzx* and pmovsx* family?
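
For illustration, one way to express the four-byte load with the Intel-style
intrinsics is sketched below. This is only a sketch under assumptions: the
helper names load4_zx8_32 and zx8_32_to_array are hypothetical, and the
target("sse4.1") attribute assumes a reasonably recent GCC or Clang (with
older compilers, drop the attribute and build with -msse4.1 instead). It reads
exactly four bytes, so nothing beyond p[3] has to be readable. Whether the
compiler folds the sequence into a single pmovzxbd with a memory operand
depends on the compiler version.

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics (_mm_cvtepu8_epi32) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero-extend exactly four bytes at p (any alignment)
   into four 32-bit lanes, without reading p[4..15]. */
__attribute__((target("sse4.1")))
static __m128i load4_zx8_32(const unsigned char *p)
{
    int32_t bits;
    memcpy(&bits, p, sizeof bits);        /* unaligned 4-byte load */
    __m128i v = _mm_cvtsi32_si128(bits);  /* low 32 bits set, rest zeroed */
    return _mm_cvtepu8_epi32(v);          /* pmovzxbd on the low four bytes */
}

/* Hypothetical convenience wrapper: store the four lanes to a plain array. */
__attribute__((target("sse4.1")))
void zx8_32_to_array(const unsigned char *p, uint32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, load4_zx8_32(p));
}
```

The sign-extending variants would follow the same shape with
_mm_cvtepi8_epi32 in place of _mm_cvtepu8_epi32.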


-- 
           Summary: Please add memory forms of pmovzx* (SSE4.1)
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sgunderson at bigfoot dot com
 GCC build triplet: x86_64-linux-gnu
  GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484



* [Bug target/41484] Please add memory forms of pmovzx* (SSE4.1)
  2009-09-27 21:03 [Bug c/41484] New: Please add memory forms of pmovzx* (SSE4.1) sgunderson at bigfoot dot com
@ 2009-09-27 21:34 ` pinskia at gcc dot gnu dot org
  2010-08-22 12:31 ` baggett dot patrick at gmail dot com
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-09-27 21:34 UTC (permalink / raw)
  To: gcc-bugs



-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|minor                       |enhancement
          Component|c                           |target


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484



* [Bug target/41484] Please add memory forms of pmovzx* (SSE4.1)
  2009-09-27 21:03 [Bug c/41484] New: Please add memory forms of pmovzx* (SSE4.1) sgunderson at bigfoot dot com
  2009-09-27 21:34 ` [Bug target/41484] " pinskia at gcc dot gnu dot org
@ 2010-08-22 12:31 ` baggett dot patrick at gmail dot com
  2010-08-22 12:35 ` baggett dot patrick at gmail dot com
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: baggett dot patrick at gmail dot com @ 2010-08-22 12:31 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from baggett dot patrick at gmail dot com  2010-08-22 12:31 -------
I can confirm that this is a problem on GCC 4.4.3, though I was using the
Intel-style intrinsic found in the SSE4 manual.

Smallest testcase:

gcc -msse4 -m64 
--------------------
#include <smmintrin.h>

/* Read four bytes and extend to 4 ints in xmm reg. */
__m128i vint_zx8_32(unsigned char* m)
{
        return _mm_cvtepu8_epi32( *((__m128i*)m) );
}

--------------------
Generated Code:
--------------------
vint_zx8_32:
        movdqa  (%rdi), %xmm0
        pmovzxbd        %xmm0, %xmm0
        ret

--------------------
Expected Code:
--------------------
vint_zx8_32:
        pmovzxbd        (%rdi), %xmm0
        ret


-- 

baggett dot patrick at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |baggett dot patrick at gmail
                   |                            |dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484



* [Bug target/41484] Please add memory forms of pmovzx* (SSE4.1)
  2009-09-27 21:03 [Bug c/41484] New: Please add memory forms of pmovzx* (SSE4.1) sgunderson at bigfoot dot com
  2009-09-27 21:34 ` [Bug target/41484] " pinskia at gcc dot gnu dot org
  2010-08-22 12:31 ` baggett dot patrick at gmail dot com
@ 2010-08-22 12:35 ` baggett dot patrick at gmail dot com
  2010-08-27  9:20 ` ubizjak at gmail dot com
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: baggett dot patrick at gmail dot com @ 2010-08-22 12:35 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from baggett dot patrick at gmail dot com  2010-08-22 12:34 -------
Created an attachment (id=21542)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21542&action=view)
Test case showing intrinsics not generating memory operation.

Use "gcc -msse4 -m64 -S testcase_41484.c" to compile, view resulting assembly.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484



* [Bug target/41484] Please add memory forms of pmovzx* (SSE4.1)
  2009-09-27 21:03 [Bug c/41484] New: Please add memory forms of pmovzx* (SSE4.1) sgunderson at bigfoot dot com
                   ` (2 preceding siblings ...)
  2010-08-22 12:35 ` baggett dot patrick at gmail dot com
@ 2010-08-27  9:20 ` ubizjak at gmail dot com
  2010-08-27  9:26 ` ubizjak at gmail dot com
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: ubizjak at gmail dot com @ 2010-08-27  9:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from ubizjak at gmail dot com  2010-08-27 09:20 -------
Confirmed.


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |ubizjak at gmail dot com
                   |dot org                     |
             Status|UNCONFIRMED                 |ASSIGNED
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-08-27 09:20:22
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484



* [Bug target/41484] Please add memory forms of pmovzx* (SSE4.1)
  2009-09-27 21:03 [Bug c/41484] New: Please add memory forms of pmovzx* (SSE4.1) sgunderson at bigfoot dot com
                   ` (3 preceding siblings ...)
  2010-08-27  9:20 ` ubizjak at gmail dot com
@ 2010-08-27  9:26 ` ubizjak at gmail dot com
  2010-08-27 16:16 ` hjl dot tools at gmail dot com
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: ubizjak at gmail dot com @ 2010-08-27  9:26 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from ubizjak at gmail dot com  2010-08-27 09:26 -------
Created an attachment (id=21576)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21576&action=view)
Patch to remove special (vec_duplicate ...) insn RTXes

This patch removes the special (vec_duplicate ...) forms of the zero/sign
extension instructions. This is similar to the existing sse2_cvtps2pd pattern,
which accesses the full 128 bits of memory even if only the low 64 bits are
used.

Also, current gcc generates:

        movdqa  (%rdi), %xmm0   # 6     *movv16qi_internal/2    [length = 4]
        pmovzxbd        %xmm0, %xmm0    # 7     sse4_1_zero_extendv4qiv4si2     

which also accesses a full 128-bit, 16-byte-aligned value. This is no better than:

        pmovzxbd        (%rdi), %xmm0   # 7     sse4_1_zero_extendv4qiv4si2     

The patch is untested, since I don't have the required hardware.
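
A small runtime check along the following lines could exercise the pattern on
SSE4.1-capable hardware. It is only a sketch, not part of the attached patch:
the function name check_zx8_32 is hypothetical, and the target("sse4.1") and
aligned(16) attributes assume a reasonably recent GCC or Clang. The source
buffer is 16-byte aligned and fully readable, matching the full 128-bit access
described above.

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics (_mm_cvtepu8_epi32) */
#include <stdint.h>

/* Hypothetical check: returns 1 if _mm_cvtepu8_epi32 applied to a 128-bit
   source matches a scalar zero-extension of its low four bytes, else 0. */
__attribute__((target("sse4.1")))
int check_zx8_32(void)
{
    /* 16-byte aligned and fully readable, as the 128-bit access requires */
    static const unsigned char src[16] __attribute__((aligned(16))) = {
        0x00, 0x7f, 0x80, 0xff, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
    };
    uint32_t got[4];

    /* Same source expression as the test case in comment #1 */
    __m128i v = _mm_cvtepu8_epi32(*(const __m128i *)src);
    _mm_storeu_si128((__m128i *)got, v);

    /* Compare each lane against the scalar zero-extension */
    for (int i = 0; i < 4; i++)
        if (got[i] != (uint32_t)src[i])
            return 0;
    return 1;
}
```

Compiling this with -S and inspecting the assembly would additionally show
whether the load was folded into the pmovzxbd memory operand.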


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484



* [Bug target/41484] Please add memory forms of pmovzx* (SSE4.1)
  2009-09-27 21:03 [Bug c/41484] New: Please add memory forms of pmovzx* (SSE4.1) sgunderson at bigfoot dot com
                   ` (4 preceding siblings ...)
  2010-08-27  9:26 ` ubizjak at gmail dot com
@ 2010-08-27 16:16 ` hjl dot tools at gmail dot com
  2010-08-27 16:54 ` uros at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: hjl dot tools at gmail dot com @ 2010-08-27 16:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from hjl dot tools at gmail dot com  2010-08-27 16:16 -------
(In reply to comment #4)
> Created an attachment (id=21576)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21576&action=view)
> Patch to remove special (vec_duplicate ...) insn RTXes
> 
> This patch removes the special (vec_duplicate ...) forms of the zero/sign
> extension instructions. This is similar to the existing sse2_cvtps2pd pattern,
> which accesses the full 128 bits of memory even if only the low 64 bits are
> used.
> 
> Also, current gcc generates:
> 
>         movdqa  (%rdi), %xmm0   # 6     *movv16qi_internal/2    [length = 4]
>         pmovzxbd        %xmm0, %xmm0    # 7     sse4_1_zero_extendv4qiv4si2     
> 
> which also accesses a full 128-bit, 16-byte-aligned value. This is no better than:
> 
>         pmovzxbd        (%rdi), %xmm0   # 7     sse4_1_zero_extendv4qiv4si2     
> 
> The patch is untested, since I don't have the required hardware.
> 

I tested it on Linux/ia32 and Linux/Intel64 with SSE4.1. There are no
regressions. Thanks.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484



* [Bug target/41484] Please add memory forms of pmovzx* (SSE4.1)
  2009-09-27 21:03 [Bug c/41484] New: Please add memory forms of pmovzx* (SSE4.1) sgunderson at bigfoot dot com
                   ` (5 preceding siblings ...)
  2010-08-27 16:16 ` hjl dot tools at gmail dot com
@ 2010-08-27 16:54 ` uros at gcc dot gnu dot org
  2010-08-28 14:02 ` uros at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: uros at gcc dot gnu dot org @ 2010-08-27 16:54 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from uros at gcc dot gnu dot org  2010-08-27 16:54 -------
Subject: Bug 41484

Author: uros
Date: Fri Aug 27 16:53:51 2010
New Revision: 163591

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=163591
Log:
        PR target/41484
        * config/i386/sse.md (sse4_1_extendv8qiv8hi2): Also accept memory
        operands for operand 1.
        (sse4_1_extendv4qiv4si2): Ditto.
        (sse4_1_extendv2qiv2di2): Ditto.
        (sse4_1_extendv4hiv4si2): Ditto.
        (sse4_1_extendv2hiv2di2): Ditto.
        (sse4_1_extendv2siv2di2): Ditto.
        (sse4_1_zero_extendv8qiv8hi2): Ditto.
        (sse4_1_zero_extendv4qiv4si2): Ditto.
        (sse4_1_zero_extendv2qiv2di2): Ditto.
        (sse4_1_zero_extendv4hiv4si2): Ditto.
        (sse4_1_zero_extendv2hiv2di2): Ditto.
        (sse4_1_zero_extendv2siv2di2): Ditto.
        (*sse4_1_extendv8qiv8hi2): Remove insn pattern.
        (*sse4_1_extendv4qiv4si2): Ditto.
        (*sse4_1_extendv2qiv2di2): Ditto.
        (*sse4_1_extendv4hiv4si2): Ditto.
        (*sse4_1_extendv2hiv2di2): Ditto.
        (*sse4_1_extendv2siv2di2): Ditto.
        (*sse4_1_zero_extendv8qiv8hi2): Ditto.
        (*sse4_1_zero_extendv4qiv4si2): Ditto.
        (*sse4_1_zero_extendv2qiv2di2): Ditto.
        (*sse4_1_zero_extendv4hiv4si2): Ditto.
        (*sse4_1_zero_extendv2hiv2di2): Ditto.
        (*sse4_1_zero_extendv2siv2di2): Ditto.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484



* [Bug target/41484] Please add memory forms of pmovzx* (SSE4.1)
  2009-09-27 21:03 [Bug c/41484] New: Please add memory forms of pmovzx* (SSE4.1) sgunderson at bigfoot dot com
                   ` (6 preceding siblings ...)
  2010-08-27 16:54 ` uros at gcc dot gnu dot org
@ 2010-08-28 14:02 ` uros at gcc dot gnu dot org
  2010-08-28 14:28 ` uros at gcc dot gnu dot org
  2010-08-28 14:34 ` ubizjak at gmail dot com
  9 siblings, 0 replies; 11+ messages in thread
From: uros at gcc dot gnu dot org @ 2010-08-28 14:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from uros at gcc dot gnu dot org  2010-08-28 14:02 -------
Subject: Bug 41484

Author: uros
Date: Sat Aug 28 14:02:18 2010
New Revision: 163613

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=163613
Log:
        PR target/41484
        * config/i386/sse.md (sse4_1_extendv8qiv8hi2): Also accept memory
        operands for operand 1.
        (sse4_1_extendv4qiv4si2): Ditto.
        (sse4_1_extendv2qiv2di2): Ditto.
        (sse4_1_extendv4hiv4si2): Ditto.
        (sse4_1_extendv2hiv2di2): Ditto.
        (sse4_1_extendv2siv2di2): Ditto.
        (sse4_1_zero_extendv8qiv8hi2): Ditto.
        (sse4_1_zero_extendv4qiv4si2): Ditto.
        (sse4_1_zero_extendv2qiv2di2): Ditto.
        (sse4_1_zero_extendv4hiv4si2): Ditto.
        (sse4_1_zero_extendv2hiv2di2): Ditto.
        (sse4_1_zero_extendv2siv2di2): Ditto.
        (*sse4_1_extendv8qiv8hi2): Remove insn pattern.
        (*sse4_1_extendv4qiv4si2): Ditto.
        (*sse4_1_extendv2qiv2di2): Ditto.
        (*sse4_1_extendv4hiv4si2): Ditto.
        (*sse4_1_extendv2hiv2di2): Ditto.
        (*sse4_1_extendv2siv2di2): Ditto.
        (*sse4_1_zero_extendv8qiv8hi2): Ditto.
        (*sse4_1_zero_extendv4qiv4si2): Ditto.
        (*sse4_1_zero_extendv2qiv2di2): Ditto.
        (*sse4_1_zero_extendv4hiv4si2): Ditto.
        (*sse4_1_zero_extendv2hiv2di2): Ditto.
        (*sse4_1_zero_extendv2siv2di2): Ditto.


Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/config/i386/sse.md


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484



* [Bug target/41484] Please add memory forms of pmovzx* (SSE4.1)
  2009-09-27 21:03 [Bug c/41484] New: Please add memory forms of pmovzx* (SSE4.1) sgunderson at bigfoot dot com
                   ` (7 preceding siblings ...)
  2010-08-28 14:02 ` uros at gcc dot gnu dot org
@ 2010-08-28 14:28 ` uros at gcc dot gnu dot org
  2010-08-28 14:34 ` ubizjak at gmail dot com
  9 siblings, 0 replies; 11+ messages in thread
From: uros at gcc dot gnu dot org @ 2010-08-28 14:28 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from uros at gcc dot gnu dot org  2010-08-28 14:27 -------
Subject: Bug 41484

Author: uros
Date: Sat Aug 28 14:27:33 2010
New Revision: 163614

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=163614
Log:
        PR target/41484
        * config/i386/sse.md (sse4_1_extendv8qiv8hi2): Also accept memory
        operands for operand 1.
        (sse4_1_extendv4qiv4si2): Ditto.
        (sse4_1_extendv2qiv2di2): Ditto.
        (sse4_1_extendv4hiv4si2): Ditto.
        (sse4_1_extendv2hiv2di2): Ditto.
        (sse4_1_extendv2siv2di2): Ditto.
        (sse4_1_zero_extendv8qiv8hi2): Ditto.
        (sse4_1_zero_extendv4qiv4si2): Ditto.
        (sse4_1_zero_extendv2qiv2di2): Ditto.
        (sse4_1_zero_extendv4hiv4si2): Ditto.
        (sse4_1_zero_extendv2hiv2di2): Ditto.
        (sse4_1_zero_extendv2siv2di2): Ditto.
        (*sse4_1_extendv8qiv8hi2): Remove insn pattern.
        (*sse4_1_extendv4qiv4si2): Ditto.
        (*sse4_1_extendv2qiv2di2): Ditto.
        (*sse4_1_extendv4hiv4si2): Ditto.
        (*sse4_1_extendv2hiv2di2): Ditto.
        (*sse4_1_extendv2siv2di2): Ditto.
        (*sse4_1_zero_extendv8qiv8hi2): Ditto.
        (*sse4_1_zero_extendv4qiv4si2): Ditto.
        (*sse4_1_zero_extendv2qiv2di2): Ditto.
        (*sse4_1_zero_extendv4hiv4si2): Ditto.
        (*sse4_1_zero_extendv2hiv2di2): Ditto.
        (*sse4_1_zero_extendv2siv2di2): Ditto.


Modified:
    branches/gcc-4_4-branch/gcc/ChangeLog
    branches/gcc-4_4-branch/gcc/config/i386/sse.md


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484



* [Bug target/41484] Please add memory forms of pmovzx* (SSE4.1)
  2009-09-27 21:03 [Bug c/41484] New: Please add memory forms of pmovzx* (SSE4.1) sgunderson at bigfoot dot com
                   ` (8 preceding siblings ...)
  2010-08-28 14:28 ` uros at gcc dot gnu dot org
@ 2010-08-28 14:34 ` ubizjak at gmail dot com
  9 siblings, 0 replies; 11+ messages in thread
From: ubizjak at gmail dot com @ 2010-08-28 14:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from ubizjak at gmail dot com  2010-08-28 14:34 -------
Fixed.


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|---                         |4.4.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484



Thread overview: 11+ messages
2009-09-27 21:03 [Bug c/41484] New: Please add memory forms of pmovzx* (SSE4.1) sgunderson at bigfoot dot com
2009-09-27 21:34 ` [Bug target/41484] " pinskia at gcc dot gnu dot org
2010-08-22 12:31 ` baggett dot patrick at gmail dot com
2010-08-22 12:35 ` baggett dot patrick at gmail dot com
2010-08-27  9:20 ` ubizjak at gmail dot com
2010-08-27  9:26 ` ubizjak at gmail dot com
2010-08-27 16:16 ` hjl dot tools at gmail dot com
2010-08-27 16:54 ` uros at gcc dot gnu dot org
2010-08-28 14:02 ` uros at gcc dot gnu dot org
2010-08-28 14:28 ` uros at gcc dot gnu dot org
2010-08-28 14:34 ` ubizjak at gmail dot com
