public inbox for gcc-bugs@sourceware.org
* [Bug c/11327] New: Non-optimal code when using MMX/SSE builtins
From: kevina at gnu dot org @ 2003-06-26 12:44 UTC (permalink / raw)
To: gcc-bugs
PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11327
Summary: Non-optimal code when using MMX/SSE builtins
Product: gcc
Version: 3.3
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: kevina at gnu dot org
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
GCC generates non-optimal code when the MMX/SSE builtin functions are used. In
particular, it tends to insert unnecessary movq instructions and, sometimes,
unnecessary memory reads.
For example, in the following code:
#include <stddef.h>
#include <string.h>   /* memset */
typedef int v8qi __attribute__ ((mode(V8QI)));
typedef long long unsigned int ullint;
#define peq   __builtin_ia32_pcmpeqb   /* byte-wise compare for equality */
#define pmin  __builtin_ia32_pminub    /* byte-wise unsigned minimum (SSE) */
#define por(a,b) (v8qi)__builtin_ia32_por((ullint)(a), (ullint)(b))
#define psubs __builtin_ia32_psubusb   /* byte-wise unsigned saturating subtract */
void foo(v8qi * a, v8qi * b, v8qi * c, size_t s)
{
  size_t i;
  v8qi thres;
  memset(&thres, 10, 8);   /* every byte of thres is 10 */
  for (i = 0; i != s; ++i)
    {
      c[i] = peq(pmin(por(psubs(a[i], b[i]), psubs(b[i], a[i])),
                      thres),
                 thres);
    }
}
When compiled with "-O2 -march=pentium3", GCC generates:
# %mm2 is the constant thres
...
movq (%ecx,%eax,8), %mm0
psubusb (%edx,%eax,8), %mm0
movq %mm0, %mm1
movq (%edx,%eax,8), %mm0
psubusb (%ecx,%eax,8), %mm0
por %mm0, %mm1
movq %mm1, %mm0
pminub %mm2, %mm0
pcmpeqb %mm2, %mm0
movq %mm0, (%esi,%eax,8)
...
This involves two unnecessary memory reads and one unnecessary movq. An optimal
version of the same code would be:
movq (%ecx,%eax,8), %mm0
movq (%edx,%eax,8), %mm1
movq %mm0, %mm3
psubusb %mm1, %mm0
psubusb %mm3, %mm1
por %mm1, %mm0
pminub %mm2, %mm0
pcmpeqb %mm2, %mm0
movq %mm0, (%esi,%eax,8)
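For reference, the loop computes a per-byte mask. Because unsigned saturating subtraction clamps at zero, at most one of psubs(a,b) and psubs(b,a) is non-zero, so their OR is |a - b|. Comparing min(|a - b|, 10) with 10 then yields 0xFF exactly where |a - b| >= 10 and 0x00 elsewhere. A minimal portable scalar sketch of the same computation, assuming plain unsigned char arrays instead of v8qi (sat_sub_u8 and foo_scalar are hypothetical names, not from the report):

```c
#include <stddef.h>

/* Unsigned saturating subtraction of one byte, as psubusb does per lane. */
static unsigned char sat_sub_u8(unsigned char x, unsigned char y)
{
    return x > y ? (unsigned char)(x - y) : 0;
}

/* Hypothetical scalar equivalent of foo(); n is a byte count
   (8 * s for s v8qi elements). */
void foo_scalar(const unsigned char *a, const unsigned char *b,
                unsigned char *c, size_t n)
{
    const unsigned char thres = 10;
    size_t i;
    for (i = 0; i != n; ++i) {
        /* por of the two saturating differences gives |a - b| */
        unsigned char d = sat_sub_u8(a[i], b[i]) | sat_sub_u8(b[i], a[i]);
        /* pminub against thres, then pcmpeqb against thres */
        unsigned char m = d < thres ? d : thres;
        c[i] = (m == thres) ? 0xFF : 0x00;
    }
}
```

For example, a byte pair (3, 20) gives differences 0 and 17, OR 17, min 10, and therefore a 0xFF mask byte; the pair (50, 41) gives 9 and a 0x00 mask byte.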
This is just a simple example; more complex code contains even more unnecessary
movq instructions. Spelling out exactly what to do in the inner loop:
{
  v8qi m1, m2, m3;
  m1 = a[i];
  m2 = b[i];
  m3 = m1;
  m1 = psubs(m1, m2);
  m2 = psubs(m2, m3);
  m1 = por(m1, m2);
  m1 = pmin(m1, thres);
  m1 = peq(m1, thres);
  c[i] = m1;
}
This does not help: it avoids the unnecessary memory reads but adds several
unnecessary movq instructions.
The attached files include the example code and the code GCC generates for it,
with a diff against my optimal version. The inefficient code is marked with '-'
and my code is marked with '+'.
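On current compilers the same loop is usually written with the documented intrinsics instead of raw __builtin_ia32_* calls, which leaves register allocation entirely to the compiler. A minimal sketch, assuming an x86 target with SSE2 (<emmintrin.h>): it swaps the 8-byte MMX operations for their 16-byte SSE2 counterparts (_mm_subs_epu8, _mm_or_si128, _mm_min_epu8, _mm_cmpeq_epi8) and falls back to a scalar loop when __SSE2__ is not defined. The function name foo_sse2 and the byte-count parameter n are hypothetical.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical: c[k] = 0xFF where |a[k] - b[k]| >= 10, else 0x00.
   n is a byte count; unaligned loads and stores are used throughout. */
void foo_sse2(const unsigned char *a, const unsigned char *b,
              unsigned char *c, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i thres = _mm_set1_epi8(10);
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* |a - b| per byte via two saturating subtractions */
        __m128i d = _mm_or_si128(_mm_subs_epu8(va, vb),
                                 _mm_subs_epu8(vb, va));
        /* min(d, 10) == 10 exactly when d >= 10 */
        __m128i m = _mm_cmpeq_epi8(_mm_min_epu8(d, thres), thres);
        _mm_storeu_si128((__m128i *)(c + i), m);
    }
#endif
    /* Scalar tail, or the whole array when SSE2 is unavailable. */
    for (; i < n; ++i) {
        unsigned char x = a[i], y = b[i];
        unsigned char d = x > y ? (unsigned char)(x - y)
                                : (unsigned char)(y - x);
        c[i] = d >= 10 ? 0xFF : 0x00;
    }
}
```

Whether a given GCC release emits the minimal register-to-register sequence for this loop is not something this sketch guarantees; it only expresses the computation without forcing particular operand placement.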
* [Bug c/11327] Non-optimal code when using MMX/SSE builtins
From: kevina at gnu dot org @ 2003-06-26 12:45 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From kevina at gnu dot org 2003-06-26 12:45 -------
Created an attachment (id=4287)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=4287&action=view)
Test Code
* [Bug c/11327] Non-optimal code when using MMX/SSE builtins
From: kevina at gnu dot org @ 2003-06-26 12:46 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From kevina at gnu dot org 2003-06-26 12:46 -------
Created an attachment (id=4288)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=4288&action=view)
Generated Code
* [Bug optimization/11327] Non-optimal code when using MMX/SSE builtins
From: dhazeghi at yahoo dot com @ 2003-06-27 17:50 UTC (permalink / raw)
To: gcc-bugs
dhazeghi at yahoo dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |WAITING
Component|c |optimization
Keywords| |pessimizes-code
------- Additional Comments From dhazeghi at yahoo dot com 2003-06-27 17:50 -------
Checking your simpler testcase with gcc mainline (20030620), I get:
.L5:
movq (%ecx,%eax,8), %mm0
movq (%edx,%eax,8), %mm1
psubusb (%edx,%eax,8), %mm0
psubusb (%ecx,%eax,8), %mm1
por %mm1, %mm0
pminub %mm2, %mm0
pcmpeqb %mm2, %mm0
movq %mm0, (%esi,%eax,8)
incl %eax
cmpl %ebx, %eax
jne .L5
This looks a lot like the optimal code you suggested, correct? Would you mind sending an example
of the better code you'd like to see generated for foo2, and/or trying gcc cvs to see if the problem
is fixed there? Thanks,
Dara
* [Bug optimization/11327] Non-optimal code when using MMX/SSE builtins
From: falk at debian dot org @ 2003-06-27 17:54 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From falk at debian dot org 2003-06-27 17:54 -------
Be sure to check out -fnew-ra, too, since this seems to be related to
register allocation, and -fnew-ra might become default in the future.
* [Bug optimization/11327] Non-optimal code when using MMX/SSE builtins
From: kevina at gnu dot org @ 2003-06-27 23:22 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From kevina at gnu dot org 2003-06-27 23:22 -------
The code generated by gcc mainline (20030620) is better, but it still involves
two unnecessary memory reads.
The second test case should produce exactly the same code as the first, since
both functions do exactly the same thing.
I tried -fnew-ra with gcc 3.3 and it didn't seem to help.
* [Bug optimization/11327] Non-optimal code when using MMX/SSE builtins
From: dhazeghi at yahoo dot com @ 2003-06-28 0:13 UTC (permalink / raw)
To: gcc-bugs
dhazeghi at yahoo dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |NEW
Ever Confirmed| |1
Last reconfirmed date|0000-00-00 00:00:00 |2003-06-28 00:13:44
------- Additional Comments From dhazeghi at yahoo dot com 2003-06-28 00:13 -------
Just checked mainline, and foo2 has one more unnecessary movq than foo1. -fnew-ra doesn't help
(actually makes foo1 worse). Confirmed.
* [Bug optimization/11327] Non-optimal code when using MMX/SSE builtins
From: dhazeghi at yahoo dot com @ 2003-08-23 0:34 UTC (permalink / raw)
To: gcc-bugs
dhazeghi at yahoo dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
GCC build triplet|i686-pc-linux-gnu |
GCC host triplet|i686-pc-linux-gnu |
Target Milestone|3.4 |---
* [Bug rtl-optimization/11327] Non-optimal code when using MMX/SSE builtins
From: rth at gcc dot gnu dot org @ 2005-01-06 6:07 UTC (permalink / raw)
To: gcc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|unassigned at gcc dot gnu dot org |rth at gcc dot gnu dot org
Status|NEW |ASSIGNED
* [Bug rtl-optimization/11327] Non-optimal code when using MMX/SSE builtins
From: cvs-commit at gcc dot gnu dot org @ 2005-01-06 6:22 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From cvs-commit at gcc dot gnu dot org 2005-01-06 06:22 -------
Subject: Bug 11327
CVSROOT: /cvs/gcc
Module name: gcc
Changes by: rth@gcc.gnu.org 2005-01-06 06:22:36
Modified files:
gcc : ChangeLog
gcc/config/i386: i386.c
Log message:
PR target/11327
* config/i386/i386.c (BUILTIN_DESC_SWAP_OPERANDS): New.
(bdesc_2arg): Use it.
(ix86_expand_binop_builtin): Force operands into registers
when optimizing.
(ix86_expand_unop_builtin, ix86_expand_unop1_builtin,
ix86_expand_sse_compare, ix86_expand_sse_comi,
ix86_expand_builtin): Likewise.
Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7044&r2=2.7045
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.c.diff?cvsroot=gcc&r1=1.769&r2=1.770
* [Bug rtl-optimization/11327] Non-optimal code when using MMX/SSE builtins
From: rth at gcc dot gnu dot org @ 2005-01-06 6:26 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From rth at gcc dot gnu dot org 2005-01-06 06:26 -------
http://gcc.gnu.org/ml/gcc-patches/2005-01/msg00331.html
Should be fixed.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
* [Bug rtl-optimization/11327] Non-optimal code when using MMX/SSE builtins
From: pinskia at gcc dot gnu dot org @ 2005-01-07 14:16 UTC (permalink / raw)
To: gcc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |4.0.0