From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 28304 invoked by alias); 4 Nov 2008 22:40:26 -0000
Received: (qmail 18636 invoked by uid 48); 4 Nov 2008 22:39:02 -0000
Date: Tue, 04 Nov 2008 22:40:00 -0000
Subject: [Bug c/38015] New: Converting between int and vector using intrinsics goes through memory
X-Bugzilla-Reason: CC
Message-ID:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "jch at pps dot jussieu dot fr"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2008-11/txt/msg00320.txt.bz2

Consider the following function, which adds 1 to its argument using Intel
intrinsics:

#include

unsigned add1(unsigned x)
{
    __m128i a = _mm_cvtsi32_si128(x);
    __m128i b = _mm_add_epi32(a, _mm_set_epi32(0, 0, 0, 1));
    return _mm_cvtsi128_si32(b);
}

GCC goes through memory no less than three times: once when converting x to
a vector, once when converting 1 to a vector, and once when converting the
result back to an integer:

add1:
        pxor    %xmm0, %xmm0
        movq    %rdi, -16(%rsp)
        movq    -16(%rsp), %xmm1
        movss   %xmm1, %xmm0
        paddd   .LC0(%rip), %xmm0
        movd    %xmm0, -4(%rsp)
        movl    -4(%rsp), %eax
        ret

For comparison, here is the code generated by the Intel compiler:

add1:
        movl    $1, %edx
        movd    %edi, %xmm1
        movd    %edx, %xmm0
        paddd   %xmm0, %xmm1
        movd    %xmm1, %eax
        ret

-- 
           Summary: Converting between int and vector using intrinsics goes
                    through memory
           Product: gcc
           Version: 4.3.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jch at pps dot jussieu dot fr
 GCC build triplet: x86_64-linux-gnu
  GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38015
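
For anyone trying to reproduce the report above: the header name after
`#include` is missing from this archived copy. The following is a minimal
sketch of a compilable version, assuming the header is `<emmintrin.h>`, which
is where the SSE2 intrinsics used in the test case are declared. The
`check_add1` helper is hypothetical and not part of the original report; it
only confirms that the vector round trip agrees with plain integer addition,
including wraparound at the top of the unsigned range.

```c
#include <assert.h>
#include <emmintrin.h>  /* ASSUMPTION: header name is missing in the archived report */

/* Test case from the report: add 1 to x by way of an SSE2 vector. */
unsigned add1(unsigned x)
{
    /* Move x into the low 32-bit lane; the upper lanes are zeroed. */
    __m128i a = _mm_cvtsi32_si128(x);
    /* Add the vector {1, 0, 0, 0} (lowest lane listed first). */
    __m128i b = _mm_add_epi32(a, _mm_set_epi32(0, 0, 0, 1));
    /* Move the low 32-bit lane back into an integer register. */
    return _mm_cvtsi128_si32(b);
}

/* Hypothetical helper, not in the report: compare against scalar addition. */
void check_add1(void)
{
    assert(add1(0u) == 1u);
    assert(add1(41u) == 42u);
    assert(add1(0xFFFFFFFFu) == 0u);  /* 32-bit lane addition wraps around */
}
```

Compiling this file with `gcc -O2 -S` writes the generated assembly to a `.s`
file, which can then be compared with the two listings in the report. The
instructions emitted will likely differ between compiler versions.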