public inbox for gcc-bugs@sourceware.org
* [Bug c/11327] New: Non-optimal code when using MMX/SSE builtins
From: kevina at gnu dot org @ 2003-06-26 12:44 UTC (permalink / raw)
  To: gcc-bugs

PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11327

           Summary: Non-optimal code when using MMX/SSE builtins
           Product: gcc
           Version: 3.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: kevina at gnu dot org
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

GCC generates non-optimal code when the MMX/SSE builtin functions are used.  In
particular, it tends to insert unnecessary movq instructions and sometimes
unnecessary memory reads.

For example in the following code:

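#include <stddef.h>
#include <string.h>   /* memset */

/* Old-style (GCC 3.x era) declaration of an 8 x 8-bit vector type */
typedef int v8qi __attribute__ ((mode(V8QI)));
typedef long long unsigned int ullint;

#define peq  __builtin_ia32_pcmpeqb   /* pcmpeqb: bytewise (a == b) ? 0xFF : 0 */
#define pmin __builtin_ia32_pminub    /* pminub: bytewise unsigned minimum (SSE) */
#define por(a,b) ((v8qi)__builtin_ia32_por((ullint)(a), (ullint)(b)))  /* 64-bit OR */
#define psubs __builtin_ia32_psubusb  /* psubusb: bytewise unsigned saturating a - b */

void foo(v8qi * a, v8qi * b, v8qi * c, size_t s)
{
  size_t i;
  v8qi thres;
  memset(&thres, 10, 8);              /* every byte of thres is 10 */
  for (i = 0; i != s; ++i)
  {
    /* each byte of c[i] becomes 0xFF where |a[i] - b[i]| >= 10, else 0 */
    c[i] = peq( pmin( por(psubs(a[i],b[i]), psubs(b[i],a[i])),
                      thres),
                thres);
  }
}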

When compiled with "-O2 -march=pentium3", GCC 3.3 generates:

        # %mm2 is the constant thres
        ...
        movq    (%ecx,%eax,8), %mm0
        psubusb (%edx,%eax,8), %mm0
        movq    %mm0, %mm1
        movq    (%edx,%eax,8), %mm0
        psubusb (%ecx,%eax,8), %mm0
        por     %mm0, %mm1
        movq    %mm1, %mm0
        pminub  %mm2, %mm0
        pcmpeqb %mm2, %mm0
        movq    %mm0, (%esi,%eax,8)
        ...

which involves two unnecessary memory reads (a[i] and b[i] are each loaded
twice) and one unnecessary movq.  An optimal version of the above code is:

        movq    (%ecx,%eax,8), %mm0
        movq    (%edx,%eax,8), %mm1
        movq    %mm0, %mm3
        psubusb %mm1, %mm0
        psubusb %mm3, %mm1
        por     %mm1, %mm0
        pminub  %mm2, %mm0
        pcmpeqb %mm2, %mm0
        movq    %mm0, (%esi,%eax,8)
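To make the per-byte semantics of foo explicit, here is a portable scalar sketch of one byte lane. The helper names subus, lane, foo_ref and check_absdiff_identity are hypothetical and exist only for illustration. OR-ing the two saturating differences gives |a - b| because at least one of them is always 0; taking the minimum with thres and then comparing for equality with thres yields 0xFF exactly when |a - b| >= thres.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Models psubusb on one byte: unsigned subtraction that saturates at 0. */
static unsigned char subus(unsigned char x, unsigned char y)
{
    return x > y ? (unsigned char)(x - y) : 0;
}

/* Models one byte lane of the loop body in foo(). */
static unsigned char lane(unsigned char a, unsigned char b, unsigned char thres)
{
    unsigned char d = subus(a, b) | subus(b, a); /* por: equals |a - b| */
    unsigned char m = d < thres ? d : thres;     /* pminub(d, thres) */
    return m == thres ? 0xFF : 0x00;             /* pcmpeqb(m, thres) */
}

/* Hypothetical scalar reference for foo(): s groups of 8 bytes each. */
static void foo_ref(const unsigned char *a, const unsigned char *b,
                    unsigned char *c, size_t s, unsigned char thres)
{
    size_t i;
    for (i = 0; i != 8 * s; ++i)
        c[i] = lane(a[i], b[i], thres);
}

/* Exhaustively checks that OR-ing both saturating differences gives |x - y|. */
static void check_absdiff_identity(void)
{
    int x, y;
    for (x = 0; x < 256; ++x)
        for (y = 0; y < 256; ++y)
            assert((subus(x, y) | subus(y, x)) == abs(x - y));
}
```

For example, with thres = 10 the byte pair (20, 11) yields 0x00 (difference 9), while (5, 15) yields 0xFF (difference 10).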

This is just a simple example; more complex code contains even more unnecessary
movq instructions.  Spelling out exactly what to do in the inner loop:

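    v8qi m1, m2, m3;
    m1 = a[i];
    m2 = b[i];
    m3 = m1;                 /* keep a copy of a[i] for the second subtraction */
    m1 = psubs(m1, m2);      /* a[i] - b[i], saturated at 0 */
    m2 = psubs(m2, m3);      /* b[i] - a[i], saturated at 0 */
    m1 = por(m1, m2);        /* bytewise |a[i] - b[i]| */
    m1 = pmin(m1, thres);
    m1 = peq(m1, thres);     /* 0xFF where |a[i] - b[i]| >= thres */
    c[i] = m1;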

does not help: it avoids the unnecessary memory reads but adds several
unnecessary movq instructions.
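For comparison, the same loop can be written with the Intel-style intrinsics from <xmmintrin.h> (which includes <mmintrin.h>) instead of the GCC-specific __builtin_ia32_* names. This is a sketch under stated assumptions: the function name foo_intrin and its byte-array interface are hypothetical, it assumes an x86 target with SSE enabled (__SSE__ defined, e.g. with -msse or on any x86-64 build), and it falls back to a scalar loop otherwise. It makes no claim about which register allocation any particular GCC version produces for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* _mm_min_pu8 (SSE); also includes <mmintrin.h> */
#endif

/* Hypothetical helper: sets each byte of c to 0xFF where |a - b| >= thres,
   else 0x00.  n is the byte count and must be a multiple of 8. */
static void foo_intrin(const unsigned char *a, const unsigned char *b,
                       unsigned char *c, size_t n, unsigned char thres)
{
    size_t i;
    assert(n % 8 == 0);
#if defined(__SSE__)
    {
        const __m64 t = _mm_set1_pi8((char)thres);  /* thres in every byte */
        for (i = 0; i != n; i += 8) {
            __m64 va, vb, d;
            memcpy(&va, a + i, 8);                   /* load a[i..i+7] once */
            memcpy(&vb, b + i, 8);                   /* load b[i..i+7] once */
            d = _mm_or_si64(_mm_subs_pu8(va, vb),    /* psubusb, por */
                            _mm_subs_pu8(vb, va));   /* psubusb */
            d = _mm_cmpeq_pi8(_mm_min_pu8(d, t), t); /* pminub, pcmpeqb */
            memcpy(c + i, &d, 8);
        }
        _mm_empty();  /* emms: clear MMX state before any x87 code runs */
    }
#else
    /* Scalar fallback for targets without SSE. */
    for (i = 0; i != n; ++i) {
        unsigned d = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
        c[i] = d >= thres ? 0xFF : 0x00;
    }
#endif
}
```

Loading each input once into a local __m64 mirrors the hand-written optimal loop above; the memcpy calls avoid alignment and aliasing assumptions, and compilers usually turn them into single 8-byte loads and stores.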

The attached files contain the example code and the code GCC generates for it,
along with a diff against my optimal version.  The inefficient code is marked
with '-' and my code with '+'.



* [Bug c/11327] Non-optimal code when using MMX/SSE builtins
From: kevina at gnu dot org @ 2003-06-26 12:45 UTC (permalink / raw)
  To: gcc-bugs




------- Additional Comments From kevina at gnu dot org  2003-06-26 12:45 -------
Created an attachment (id=4287)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=4287&action=view)
Test Code



* [Bug c/11327] Non-optimal code when using MMX/SSE builtins
From: kevina at gnu dot org @ 2003-06-26 12:46 UTC (permalink / raw)
  To: gcc-bugs




------- Additional Comments From kevina at gnu dot org  2003-06-26 12:46 -------
Created an attachment (id=4288)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=4288&action=view)
Generated Code



* [Bug optimization/11327] Non-optimal code when using MMX/SSE builtins
From: dhazeghi at yahoo dot com @ 2003-06-27 17:50 UTC (permalink / raw)
  To: gcc-bugs



dhazeghi at yahoo dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
          Component|c                           |optimization
           Keywords|                            |pessimizes-code


------- Additional Comments From dhazeghi at yahoo dot com  2003-06-27 17:50 -------
Checking your simpler testcase with gcc mainline (20030620), I get:

.L5:
        movq    (%ecx,%eax,8), %mm0
        movq    (%edx,%eax,8), %mm1
        psubusb (%edx,%eax,8), %mm0
        psubusb (%ecx,%eax,8), %mm1
        por     %mm1, %mm0
        pminub  %mm2, %mm0
        pcmpeqb %mm2, %mm0
        movq    %mm0, (%esi,%eax,8)
        incl    %eax
        cmpl    %ebx, %eax
        jne     .L5


This looks a lot like the optimal code you suggested, correct? Would you mind sending an example 
of the better code you'd like to see generated for foo2, and/or trying gcc cvs to see if the problem 
is fixed there? Thanks,

Dara



* [Bug optimization/11327] Non-optimal code when using MMX/SSE builtins
From: falk at debian dot org @ 2003-06-27 17:54 UTC (permalink / raw)
  To: gcc-bugs




------- Additional Comments From falk at debian dot org  2003-06-27 17:54 -------
Be sure to check out -fnew-ra, too, since this seems to be related to
register allocation, and -fnew-ra might become default in the future.



* [Bug optimization/11327] Non-optimal code when using MMX/SSE builtins
From: kevina at gnu dot org @ 2003-06-27 23:22 UTC (permalink / raw)
  To: gcc-bugs




------- Additional Comments From kevina at gnu dot org  2003-06-27 23:22 -------
The code generated by gcc mainline (20030620) is better, but it still involves
two unnecessary memory reads.

The second test case should produce exactly the same code as the first, since
both functions do exactly the same thing.

I tried -fnew-ra with gcc 3.3 and it didn't seem to help.



* [Bug optimization/11327] Non-optimal code when using MMX/SSE builtins
From: dhazeghi at yahoo dot com @ 2003-06-28  0:13 UTC (permalink / raw)
  To: gcc-bugs



dhazeghi at yahoo dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2003-06-28 00:13:44
               date|                            |


------- Additional Comments From dhazeghi at yahoo dot com  2003-06-28 00:13 -------
Just checked mainline, and foo2 has one more unnecessary movq than foo1. -fnew-ra doesn't help 
(actually makes foo1 worse). Confirmed.



* [Bug optimization/11327] Non-optimal code when using MMX/SSE builtins
From: dhazeghi at yahoo dot com @ 2003-08-23  0:34 UTC (permalink / raw)
  To: gcc-bugs



dhazeghi at yahoo dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  GCC build triplet|i686-pc-linux-gnu           |
   GCC host triplet|i686-pc-linux-gnu           |
   Target Milestone|3.4                         |---



* [Bug rtl-optimization/11327] Non-optimal code when using MMX/SSE builtins
From: rth at gcc dot gnu dot org @ 2005-01-06  6:07 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rth at gcc dot gnu dot org
                   |dot org                     |
             Status|NEW                         |ASSIGNED





* [Bug rtl-optimization/11327] Non-optimal code when using MMX/SSE builtins
From: cvs-commit at gcc dot gnu dot org @ 2005-01-06  6:22 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From cvs-commit at gcc dot gnu dot org  2005-01-06 06:22 -------
Subject: Bug 11327

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	rth@gcc.gnu.org	2005-01-06 06:22:36

Modified files:
	gcc            : ChangeLog 
	gcc/config/i386: i386.c 

Log message:
	PR target/11327
	* config/i386/i386.c (BUILTIN_DESC_SWAP_OPERANDS): New.
	(bdesc_2arg): Use it.
	(ix86_expand_binop_builtin): Force operands into registers
	when optimizing.
	(ix86_expand_unop_builtin, ix86_expand_unop1_builtin,
	ix86_expand_sse_compare, ix86_expand_sse_comi,
	ix86_expand_builtin): Likewise.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7044&r2=2.7045
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.c.diff?cvsroot=gcc&r1=1.769&r2=1.770



-- 





* [Bug rtl-optimization/11327] Non-optimal code when using MMX/SSE builtins
From: rth at gcc dot gnu dot org @ 2005-01-06  6:26 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From rth at gcc dot gnu dot org  2005-01-06 06:26 -------
http://gcc.gnu.org/ml/gcc-patches/2005-01/msg00331.html
Should be fixed.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED





* [Bug rtl-optimization/11327] Non-optimal code when using MMX/SSE builtins
From: pinskia at gcc dot gnu dot org @ 2005-01-07 14:16 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.0.0



