From: "glisse at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/62128] New: Use vpalignr for AVX2 rotation
Date: Wed, 13 Aug 2014 20:21:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62128

            Bug ID: 62128
           Summary: Use vpalignr for AVX2 rotation
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org
            Target: x86_64-linux-gnu

typedef unsigned char vec __attribute__((vector_size(32)));
vec f(vec x){
  vec m={1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,0};
  return __builtin_shuffle(x,m);
}

We generate, with -O3 -mavx2:

    vpshufb    .LC0(%rip), %ymm0, %ymm1
    vpshufb    .LC1(%rip), %ymm0, %ymm0
    vpermq     $78, %ymm1, %ymm1
    vpor       %ymm1, %ymm0, %ymm0

But unless I am mistaken, a lane swap and vpalignr should do it in 2
instructions and without reading constants from memory.

There is a function expand_vec_perm_palignr, but it only handles some
128-bit cases. Even for permutations that can be done with a single
256-bit vpalignr instruction, we never seem to generate it.
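
The two-instruction claim can be checked with a small scalar model. The sketch below is only an illustration, not GCC code. The helper names lane_swap, palignr256, rotate1 and check_rotate1 are hypothetical. They model in plain C what a 128-bit lane swap (vpermq $78) and a 256-bit vpalignr with immediate 1 do to the 32 bytes, assuming that vpalignr on ymm registers works independently on each 128-bit lane.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a 128-bit lane swap, i.e. what vpermq $78 does to
   the 32 bytes of a ymm register. (Hypothetical helper name.) */
static void lane_swap(const uint8_t in[32], uint8_t out[32])
{
    memcpy(out, in + 16, 16);
    memcpy(out + 16, in, 16);
}

/* Scalar model of 256-bit vpalignr: in each 128-bit lane, concatenate
   hi:lo (hi in the upper 16 bytes), shift right by imm bytes and keep
   the low 16 bytes. The two lanes never mix. (Hypothetical helper name.) */
static void palignr256(const uint8_t hi[32], const uint8_t lo[32],
                       unsigned imm, uint8_t out[32])
{
    for (int lane = 0; lane < 32; lane += 16) {
        for (unsigned j = 0; j < 16; j++) {
            unsigned k = j + imm;
            uint8_t v = 0;                 /* bytes shifted in past 32 are zero */
            if (k < 16)
                v = lo[lane + k];
            else if (k < 32)
                v = hi[lane + k - 16];
            out[lane + j] = v;
        }
    }
}

/* The rotation from the report, out[i] = in[(i + 1) % 32], built from
   one lane swap followed by one vpalignr with immediate 1. */
static void rotate1(const uint8_t in[32], uint8_t out[32])
{
    uint8_t swapped[32];
    lane_swap(in, swapped);
    palignr256(swapped, in, 1, out);
}

/* Returns 1 if the two-step sequence matches the shuffle mask
   {1,2,...,31,0} on a test vector, 0 otherwise. */
int check_rotate1(void)
{
    uint8_t x[32], r[32];
    for (int i = 0; i < 32; i++)
        x[i] = (uint8_t)(100 + i);
    rotate1(x, r);
    for (int i = 0; i < 32; i++)
        if (r[i] != x[(i + 1) % 32])
            return 0;
    return 1;
}
```

With AVX2 intrinsics the same sequence would presumably be _mm256_permute4x64_epi64(x, 0x4E) followed by _mm256_alignr_epi8(swapped, x, 1). The model keeps the unrotated input as the low operand of vpalignr, so only byte 15 and byte 31 of the result come from the swapped copy.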