From: "beschindler at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/62080] New: Suboptimal code generation with eigen library
Date: Sun, 10 Aug 2014 10:20:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c++
X-Bugzilla-Version: 4.8.3
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: beschindler at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter attachments.created
Message-ID: <bug-62080-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62080

            Bug ID: 62080
           Summary: Suboptimal code generation with eigen library
           Product: gcc
           Version: 4.8.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: beschindler at gmail dot com

Created attachment 33281
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33281&action=edit
Source code used to get the provided assembly

I'm currently optimizing some code that uses the Eigen library, and I have
stumbled over an interesting problem.
I have a function that I wrote in two different ways (the attributes act as
optimization barriers; dimEigen is a member variable of the containing
class):


void eigenClamp(Eigen::Vector4i& vec) __attribute__((noinline, noclone))
{
    vec = vec.array().min(dimEigen.array()).max(Eigen::Array4i::Zero());
}

void eigenClamp2(Eigen::Vector4i& vec) __attribute__((noinline, noclone))
{
    vec = vec.array().min(dimEigen.array());
    vec = vec.array().max(Eigen::Array4i::Zero());
}

I'm compiling this on a Core i7 920 using -O2 -fno-exceptions -fno-rtti
-std=c++11 -march=native.

The first function generates this assembly, which looks great: 

movdqu    (%rsi), %xmm1
movdqu    (%rdi), %xmm0
pminsd    %xmm1, %xmm0
pxor    %xmm1, %xmm1
pmaxsd    %xmm1, %xmm0
movdqa    %xmm0, (%rsi)

The second version generates this (the store and the reload are marked
with <--):

movdqa    (%rsi), %xmm0
pminsd    (%rdi), %xmm0
movdqa    %xmm0, (%rsi) <-- 
pxor    %xmm0, %xmm0
movdqu    (%rsi), %xmm1 <-- 
pmaxsd    %xmm1, %xmm0
movdqa    %xmm0, (%rsi)

It seems that, because the second version is split into two statements, the
result of the first expression is stored to memory and then, two instructions
later, loaded back from memory. By my measurements, this makes the function
almost 50% slower. As I find the second version much easier to read than the
first, it would be great if both generated the same assembly.

Also, I note that in the second version pminsd takes its operand directly
from memory, while in the first version the operand is first loaded into a
register and pminsd then operates on registers only. Ideally, I would like
to see this code:

movdqu    (%rsi), %xmm0
pminsd    (%rdi), %xmm0
pxor    %xmm1, %xmm1
pmaxsd    %xmm1, %xmm0
movdqa    %xmm0, (%rsi)

For reference, I'm attaching the complete source code and the generated
assembly.