From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 23970 invoked by alias); 21 Apr 2009 19:05:49 -0000
Received: (qmail 23899 invoked by uid 48); 21 Apr 2009 19:05:33 -0000
Date: Tue, 21 Apr 2009 19:05:00 -0000
Subject: [Bug middle-end/39840] New: Non-optimal (or wrong) implementation of SSE intrinsics
X-Bugzilla-Reason: CC
Message-ID:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "drepper at redhat dot com"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2009-04/txt/msg01853.txt.bz2

The implementation of the SSE intrinsics for x86 and x86-64 in gcc is
tied to the use of an appropriate -m option, such as -mssse3 or -mavx.
This is different from what icc does, and it prevents code from being
written in the most natural form.

This is nothing new in gcc 4.4; it has been the behavior of gcc forever,
as far as I can see. But the introduction of AVX in particular brings
this problem to the foreground.

As an example, assume I want to write a vector class with the usual
operations. I can write code like this:

  #ifdef __AVX__
  vec operator+(vec &a, vec &b) {
    ... use AVX intrinsics ...
  }
  #elif defined __SSE4__
  vec operator+(vec &a, vec &b) {
    ... use SSE4 intrinsics ...
  }
  #elif defined __SSE2__
  vec operator+(vec &a, vec &b) {
    ... use SSE2 intrinsics ...
  }
  #else
  vec operator+(vec &a, vec &b) {
    ... generic implementation ...
  }
  #endif

But this means, of course, that the binary has to be compiled once for
every single target and the correct one has to be chosen when it is
deployed. This is neither attractive nor practical. Chances are that
only a generic implementation will be available.

It would be better to have a self-optimizing implementation:

  vec operator+(vec &a, vec &b) {
    if (AVX is available)
      ... use AVX intrinsics ...
    else if (SSE4 is available)
      ... use SSE4 intrinsics ...
    else if (SSE2 is available)
      ... use SSE2 intrinsics ...
    else
      ... generic implementation ...
  }

This is possible with icc. It is not possible with gcc at the moment.
For gcc I would have to split the implementations of all the variants
into individual files and then call these implementations from a
dispatching function like the one above. Even if that is doable in this
case (but terribly inconvenient), there are situations where it is
really impractical or impossible.

The problem is that to be able to use the AVX intrinsics the compiler
has to be passed -mavx (all other extensions are implied by -mavx). But
this flag has another consequence: the compiler will now take advantage
of the new AVX instructions and generate them for unrelated code that is
not associated with intrinsics (e.g., an inlined memset implementation).
The result is that such a binary will fail to run on anything but an
AVX-enabled machine.

In icc the -mavx flag exclusively controls code generation (i.e.,
whether AVX is used in an inlined memset etc.). The SSE intrinsics and
all the associated data types are _always_ defined as soon as the
intrinsics header is included. This means the example code above could
be compiled with an -m parameter for the minimum ISA to support, and the
AVX, SSE4, ... intrinsics would still be available.

gcc should follow icc's way of handling the intrinsics. Since all this
intrinsic business comes from icc, I consider this a bug in gcc's
implementation rather than an enhancement request.


-- 
           Summary: Non-optimal (or wrong) implementation of SSE intrinsics
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: drepper at redhat dot com
GCC target triplet: i?86-* x86_64-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840
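
For illustration only, here is a minimal sketch of the self-optimizing
operator+ that the report asks for. It is not what gcc 4.4 accepts. It
assumes a much later GCC (or a compatible compiler) that provides the
per-function target attribute together with usable intrinsics inside
such functions, and the __builtin_cpu_supports builtin. The vec layout
(eight floats) and the helper names add_generic, add_sse2 and add_avx
are hypothetical, the parameters are taken by const reference for
convenience, and the SSE4 branch is left out because a plain float
addition gains nothing from it.

```cpp
// Sketch under stated assumptions: per-function ISA selection plus a
// runtime CPU check, so the file builds with only the minimum -m flags.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

// Hypothetical vector type; the report does not specify one.
struct vec { float v[8]; };

// Generic fallback, usable on any target.
static vec add_generic(const vec &a, const vec &b) {
  vec r;
  for (int i = 0; i < 8; ++i) r.v[i] = a.v[i] + b.v[i];
  return r;
}

#ifdef HAVE_X86_DISPATCH
// Only this function is compiled with AVX enabled.
__attribute__((target("avx")))
static vec add_avx(const vec &a, const vec &b) {
  vec r;
  _mm256_storeu_ps(r.v, _mm256_add_ps(_mm256_loadu_ps(a.v),
                                      _mm256_loadu_ps(b.v)));
  return r;
}

// Only this function is compiled with SSE2 (and thus SSE) enabled.
__attribute__((target("sse2")))
static vec add_sse2(const vec &a, const vec &b) {
  vec r;
  for (int i = 0; i < 8; i += 4)
    _mm_storeu_ps(r.v + i, _mm_add_ps(_mm_loadu_ps(a.v + i),
                                      _mm_loadu_ps(b.v + i)));
  return r;
}
#endif

// Dispatcher: picks the best variant the running CPU supports.
vec operator+(const vec &a, const vec &b) {
#ifdef HAVE_X86_DISPATCH
  if (__builtin_cpu_supports("avx")) return add_avx(a, b);
  if (__builtin_cpu_supports("sse2")) return add_sse2(a, b);
#endif
  return add_generic(a, b);
}
```

The point of the sketch is that the AVX instructions stay confined to
add_avx, so unrelated code in the same file is still generated for the
baseline ISA, which is the separation the report describes for icc.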