[Bug target/46716] [4.3/4.4/4.5/4.6 Regression] bad code generated with -mno-sse2 -m64

public inbox for gcc-bugs@sourceware.org

From: "mathog at caltech dot edu" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/46716] [4.3/4.4/4.5/4.6 Regression] bad code generated with -mno-sse2 -m64
Date: Tue, 30 Nov 2010 17:50:00 -0000
Message-ID: <bug-46716-4-GZml3BFH84@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-46716-4@http.gcc.gnu.org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46716

--- Comment #5 from David Mathog <mathog at caltech dot edu> 2010-11-30 17:25:01 UTC ---
A (long) side note on how I found this bug - in partial answer to the obvious
question - why would anybody run with -mno-sse2 on an X86_64 platform?  

We have a cluster of Athlon MP machines and one of the applications that run
there is Sean Eddy's HMMER, which is used to search a database called PFAMDIR.
With version 3 of that software PFAMDIR changed format to only work with the
newer software.  HMMER 3 has a reference version (portable) and an SSE (really
SSE2) version.  I found that the reference version did not give exactly the
same answers as the SSE version, so it wasn't going to be possible to refine
the reference version for that platform and get the exact same results as
everybody else, but the SSE2 version could not run on the target since that
processor has no SSE2.  Since I need to make this work on those old machines I
wrote an SSE2 emulator, which is a replacement emmintrin.h (latest version
here):

  http://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/soft_emmintrin.h

This would be used instead of the native SSE2 (in theory) by dropping it into a
directory as emmintrin.h and using "-I. -mno-sse2 -DSOFT_SSE2" on the gcc
compile line.  The idea is to get the SSE2 version running on the target and
then optimize from that version for the target, retaining the same numerical
results along the way.

Since there was a lot of recompiling/retesting during development I used my
fastest machine, which happened to be an Opteron running an X86_64 OS.  To
exercise the software SSE2 code all of the SSE2 tests in the gcc testsuite were
run, and these triggered the present bug due to the default implicit -m64.

I also have done some preliminary work on a soft_xmmintrin.h, but have my
doubts that it is possible to use that successfully in combination with the gcc
vector extension, since many strange things happen when -mno-sse is added to
the command line.  It seems that the gcc vector extension is very much
intertwined with SSE on X86 platforms and perhaps cannot be fully separated
from it.  (A point that is not made at all clear in the documentation.) 
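
For readers unfamiliar with the gcc vector extension mentioned above, here is a
minimal sketch of what it looks like in isolation.  The names v4si_sketch, add4
and sum_lanes are hypothetical and do not come from soft_xmmintrin.h.  The
sketch only shows the syntax; it makes no claim about what code gcc generates
for it when -mno-sse is given, which is the open question here.

```c
#include <assert.h>

/* Four 32-bit ints packed into one 16-byte vector (hypothetical name). */
typedef int v4si_sketch __attribute__((vector_size(16)));

/* Element-wise add; the compiler chooses the instructions. */
static v4si_sketch add4(v4si_sketch a, v4si_sketch b)
{
  return a + b;
}

/* Sum the four lanes by reading them back through a union. */
static int sum_lanes(v4si_sketch v)
{
  union { v4si_sketch v; int i[4]; } u;
  assert(sizeof(u) == 16);
  u.v = v;
  return u.i[0] + u.i[1] + u.i[2] + u.i[3];
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, sum_lanes(add4(a, b)) adds up
the lanes 11, 22, 33 and 44.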

Additionally, with -msse -mno-sse2 -m32 and optimization levels above -O0,
complex expressions like the following, when used in a real program that calls
multiple _mm functions (it does not show up in the testsuite, which uses single
_mm function calls):

#define EMMMIN(a,b)      ((a)<(b)?(a):(b))
#define EMM_UINT1(a)   ((unsigned char *)&(a))
/* vector operation:  returns the minimum of each pair of the
16 8 bit unsigned integers from __A, __B */
static __inline __m128i __attribute__((__always_inline__))
_mm_min_epu8 (__m128i __A, __m128i __B)
{
  __v16qi __tmp={  EMMMIN(EMM_UINT1(__A)[ 0],EMM_UINT1(__B)[ 0]), 
                   EMMMIN(EMM_UINT1(__A)[ 1],EMM_UINT1(__B)[ 1]),
                   EMMMIN(EMM_UINT1(__A)[ 2],EMM_UINT1(__B)[ 2]),
                   EMMMIN(EMM_UINT1(__A)[ 3],EMM_UINT1(__B)[ 3]),
                   EMMMIN(EMM_UINT1(__A)[ 4],EMM_UINT1(__B)[ 4]),
                   EMMMIN(EMM_UINT1(__A)[ 5],EMM_UINT1(__B)[ 5]),
                   EMMMIN(EMM_UINT1(__A)[ 6],EMM_UINT1(__B)[ 6]),
                   EMMMIN(EMM_UINT1(__A)[ 7],EMM_UINT1(__B)[ 7]), 
                   EMMMIN(EMM_UINT1(__A)[ 8],EMM_UINT1(__B)[ 8]),
                   EMMMIN(EMM_UINT1(__A)[ 9],EMM_UINT1(__B)[ 9]),
                   EMMMIN(EMM_UINT1(__A)[10],EMM_UINT1(__B)[10]),
                   EMMMIN(EMM_UINT1(__A)[11],EMM_UINT1(__B)[11]),
                   EMMMIN(EMM_UINT1(__A)[12],EMM_UINT1(__B)[12]),
                   EMMMIN(EMM_UINT1(__A)[13],EMM_UINT1(__B)[13]),
                   EMMMIN(EMM_UINT1(__A)[14],EMM_UINT1(__B)[14]),
                   EMMMIN(EMM_UINT1(__A)[15],EMM_UINT1(__B)[15])};
  return (__m128i)__tmp;
}

often result in this sort of compiler error:

./msvfilter.c:208: error: unable to find a register to spill in class
'GENERAL_REGS'
./msvfilter.c:208: error: this is the insn:
(insn 1944 1943 1945 46 ../../easel/emmintrin.h:2348 (set
(strict_low_part (subreg:HI (reg:TI 1239) 0))
        (mem:HI (reg/f:SI 96 [ pretmp.1031 ]) [13 S2 A16])) 47
{*movstricthi_1} (nil))
./msvfilter.c:208: confused by earlier errors, bailing out

Simpler functions (fewer vector elements, less logic) did not do this, although
it may be that they would have, had I been able to get past the first error.
This is, I suspect, again related to an implicit use of SSE2 registers even
though -mno-sse2 had been specified.  This type of error shows up even when
-m32 is specified, so maybe it has a different origin.  In any case, rewriting
the expressions as follows seems to have eliminated the problem even at -O4;
the primary change was replacing the vector {} notation with element-by-element
assignment through a union to set the (same) values.

typedef union {
  __m128i            vi;
  __m128d            vd;
  __m128             vf;
  double             f8[2];
  float              f4[4];
  long long          i8[2];
  int                i4[4];
  short              i2[8];
  char               i1[16];
  unsigned long long u8[2];
  unsigned int       u4[4];
  unsigned short     u2[8];
  unsigned char      u1[16];
} __uni16;
#define EMM_UINT1(a)   (((__uni16)(a)).u1)
#define EMMMIN(a,b)      ((a)<(b)?(a):(b))


/* vector operation:  returns the minimum of each pair of the
16 8 bit unsigned integers from __A, __B */
static __inline __m128i __attribute__((__always_inline__))
_mm_min_epu8 (__m128i __A, __m128i __B)
{
  __uni16 __tmp;
    __tmp.u1[ 0] =  EMMMIN(EMM_UINT1(__A)[ 0],EMM_UINT1(__B)[ 0]);
    __tmp.u1[ 1] =  EMMMIN(EMM_UINT1(__A)[ 1],EMM_UINT1(__B)[ 1]);
    __tmp.u1[ 2] =  EMMMIN(EMM_UINT1(__A)[ 2],EMM_UINT1(__B)[ 2]);
    __tmp.u1[ 3] =  EMMMIN(EMM_UINT1(__A)[ 3],EMM_UINT1(__B)[ 3]);
    __tmp.u1[ 4] =  EMMMIN(EMM_UINT1(__A)[ 4],EMM_UINT1(__B)[ 4]);
    __tmp.u1[ 5] =  EMMMIN(EMM_UINT1(__A)[ 5],EMM_UINT1(__B)[ 5]);
    __tmp.u1[ 6] =  EMMMIN(EMM_UINT1(__A)[ 6],EMM_UINT1(__B)[ 6]);
    __tmp.u1[ 7] =  EMMMIN(EMM_UINT1(__A)[ 7],EMM_UINT1(__B)[ 7]);
    __tmp.u1[ 8] =  EMMMIN(EMM_UINT1(__A)[ 8],EMM_UINT1(__B)[ 8]);
    __tmp.u1[ 9] =  EMMMIN(EMM_UINT1(__A)[ 9],EMM_UINT1(__B)[ 9]);
    __tmp.u1[10] =  EMMMIN(EMM_UINT1(__A)[10],EMM_UINT1(__B)[10]);
    __tmp.u1[11] =  EMMMIN(EMM_UINT1(__A)[11],EMM_UINT1(__B)[11]);
    __tmp.u1[12] =  EMMMIN(EMM_UINT1(__A)[12],EMM_UINT1(__B)[12]);
    __tmp.u1[13] =  EMMMIN(EMM_UINT1(__A)[13],EMM_UINT1(__B)[13]);
    __tmp.u1[14] =  EMMMIN(EMM_UINT1(__A)[14],EMM_UINT1(__B)[14]);
    __tmp.u1[15] =  EMMMIN(EMM_UINT1(__A)[15],EMM_UINT1(__B)[15]);
  return __tmp.vi;
}
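
A minimal, self-contained sketch of the same union technique, checked against a
scalar reference, might look like the following.  It is not taken from
soft_emmintrin.h: soft_m128i, soft_uni16, soft_min_epu8 and
soft_min_epu8_mismatches are hypothetical names, soft_m128i is a stand-in for
__m128i built with the gcc vector extension so that no emmintrin.h is needed,
and a loop replaces the sixteen unrolled assignments for brevity.

```c
#include <assert.h>

/* Hypothetical stand-in for __m128i, so the sketch needs no emmintrin.h. */
typedef long long soft_m128i __attribute__((vector_size(16)));

/* Cut-down version of the __uni16 union: vector view plus byte view. */
typedef union {
  soft_m128i    vi;
  unsigned char u1[16];
} soft_uni16;

/* Per-byte unsigned minimum, written with plain element assignments. */
static soft_m128i soft_min_epu8(soft_m128i a, soft_m128i b)
{
  soft_uni16 ua, ub, r;
  int i;
  ua.vi = a;
  ub.vi = b;
  for (i = 0; i < 16; i++)
    r.u1[i] = ua.u1[i] < ub.u1[i] ? ua.u1[i] : ub.u1[i];
  return r.vi;
}

/* Returns the number of lanes that differ from a scalar reference. */
static int soft_min_epu8_mismatches(void)
{
  soft_uni16 a, b, r;
  int i, bad = 0;
  assert(sizeof(soft_uni16) == 16);
  for (i = 0; i < 16; i++) {
    a.u1[i] = (unsigned char)(i * 16);        /* 0, 16, ..., 240 */
    b.u1[i] = (unsigned char)(255 - i * 16);  /* 255, 239, ..., 15 */
  }
  r.vi = soft_min_epu8(a.vi, b.vi);
  for (i = 0; i < 16; i++) {
    unsigned char want = a.u1[i] < b.u1[i] ? a.u1[i] : b.u1[i];
    if (r.u1[i] != want)
      bad++;
  }
  return bad;
}
```

The test inputs cross over in the middle (128 versus 127 in lane 8), so lanes
with values above 127 confirm that the comparison is unsigned.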

Thread overview: 15+ messages
2010-11-30  0:59 [Bug c/46716] New: " mathog at caltech dot edu
2010-11-30 10:01 ` [Bug target/46716] [4.3/4.4/4.5/4.6 Regression] " jakub at gcc dot gnu.org
2010-11-30 12:36 ` rguenth at gcc dot gnu.org
2010-11-30 12:59 ` jakub at gcc dot gnu.org
2010-11-30 13:49 ` hjl.tools at gmail dot com
2010-11-30 17:50 ` mathog at caltech dot edu [this message]
2010-12-05 12:03 ` rguenth at gcc dot gnu.org
2011-01-03 20:28 ` rguenth at gcc dot gnu.org
2011-06-27 13:40 ` [Bug target/46716] [4.3/4.4/4.5/4.6/4.7 Regression] wrong " rguenth at gcc dot gnu.org
2012-03-02 14:36 ` [Bug target/46716] [4.4/4.5/4.6/4.7 " ubizjak at gmail dot com
2012-03-02 17:04 ` [Bug target/46716] [4.4/4.5/4.6/4.7/4.8 " uros at gcc dot gnu.org
2012-03-13 14:08 ` [Bug target/46716] [4.5/4.6/4.7/4.8 " jakub at gcc dot gnu.org
2012-07-02 12:10 ` rguenth at gcc dot gnu.org
2013-04-12 15:16 ` [Bug target/46716] [4.7/4.8/4.9 " jakub at gcc dot gnu.org
2014-01-23  7:12 ` law at redhat dot com
