From: "beschindler at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/62080] New: Suboptimal code generation with eigen library
Date: Sun, 10 Aug 2014 10:20:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62080

            Bug ID: 62080
           Summary: Suboptimal code generation with eigen library
           Product: gcc
           Version: 4.8.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: beschindler at gmail dot com

Created attachment 33281
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33281&action=edit
Source code used to get the provided assembly

I'm currently optimizing some code using the eigen library and I'm stumbling
over an interesting problem.
I have a function, which I wrote in two different ways (the attributes are
there to provide some optimization barriers; dimEigen is a member variable of
the containing class):

    __attribute__((noinline, noclone))
    void eigenClamp(Eigen::Vector4i& vec)
    {
        vec = vec.array().min(dimEigen.array()).max(Eigen::Array4i::Zero());
    }

    __attribute__((noinline, noclone))
    void eigenClamp2(Eigen::Vector4i& vec)
    {
        vec = vec.array().min(dimEigen.array());
        vec = vec.array().max(Eigen::Array4i::Zero());
    }

I'm compiling this on a Core i7 920 using

    -O2 -fno-exceptions -fno-rtti -std=c++11 -march=native

The first function generates this assembly, which looks great:

    movdqu (%rsi), %xmm1
    movdqu (%rdi), %xmm0
    pminsd %xmm1, %xmm0
    pxor   %xmm1, %xmm1
    pmaxsd %xmm1, %xmm0
    movdqa %xmm0, (%rsi)

The second version does this:

    movdqa (%rsi), %xmm0
    pminsd (%rdi), %xmm0
    movdqa %xmm0, (%rsi)    <--
    pxor   %xmm0, %xmm0
    movdqu (%rsi), %xmm1    <--
    pmaxsd %xmm1, %xmm0
    movdqa %xmm0, (%rsi)

It seems that, because there are two statements in the original source code,
the result of the first expression is written to memory and then, two
instructions later, read back from memory. In my measurements this makes the
function almost 50% slower. As I find the latter code much easier to read than
the former, it would be great if the same assembly were generated for both.

Also, I note that in the second version pminsd takes its operand directly from
memory, while in the first version the operand is first loaded into a register
and only then passed to pminsd. Thus, I'd love to see this code:

    movdqu (%rsi), %xmm0
    pminsd (%rdi), %xmm0
    pxor   %xmm1, %xmm1
    pmaxsd %xmm1, %xmm0
    movdqa %xmm0, (%rsi)

As a reference, I'm attaching the complete source code and the generated
assembly.
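
For readers without Eigen at hand, the two formulations can be sketched with plain standard-library code. This is only an illustration of why the two forms are semantically equivalent (same clamp to [0, dim] per element); it is not the attached source. Vec4i, clampFused and clampTwoStep are hypothetical names made up for this sketch, and there is no claim that this Eigen-free version triggers the same store/reload in GCC.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for Eigen::Vector4i: four 32-bit integers.
using Vec4i = std::array<int, 4>;

// One-expression form: the result of min() feeds max() directly,
// and vec is written exactly once per element.
void clampFused(Vec4i& vec, const Vec4i& dim) {
    for (std::size_t i = 0; i < vec.size(); ++i) {
        vec[i] = std::max(std::min(vec[i], dim[i]), 0);
    }
}

// Two-statement form: the intermediate result is stored into vec,
// then read back for the max() step. The final contents are the same.
void clampTwoStep(Vec4i& vec, const Vec4i& dim) {
    for (std::size_t i = 0; i < vec.size(); ++i) {
        vec[i] = std::min(vec[i], dim[i]);
    }
    for (std::size_t i = 0; i < vec.size(); ++i) {
        vec[i] = std::max(vec[i], 0);
    }
}
```

Both functions leave the same values in vec for any input, so the intermediate store in the two-statement form is not needed for correctness, which is why one would hope for identical code from both.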