Message-ID: <2c3818c6-26bb-830a-fba5-365350e1b703@gmail.com>
Date: Tue, 28 Feb 2023 14:43:28 +0800
Subject: Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
To: Xionghu Luo, gcc-patches@gcc.gnu.org
Cc: segher@kernel.crashing.org, linkw@gcc.gnu.org
References: <20230210025952.1887696-1-xionghuluo@tencent.com>
From: Xionghu Luo
In-Reply-To: <20230210025952.1887696-1-xionghuluo@tencent.com>

Hi Segher,

Ping this for stage 4...

On 2023/2/10 10:59, Xionghu Luo via Gcc-patches wrote:
> Resend this patch...
>
> v4: Update per comments.
> v3: Rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
> patterns.
> v2: Split the direct pattern into BE and LE variants with the same RTL
> but different insns.
>
> The native RTL expression for vec_mrghw should be the same for BE and LE, as
> it is register- and endian-independent. So both BE and LE need to
> generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
>
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>        (subreg:V4SI (reg:V16QI 139) 0)
>        (subreg:V4SI (reg:V16QI 140) 0))
>        [const_int 0 4 1 5]))
>
> Then the combine pass can do the nested vec_select optimization
> in simplify-rtx.cc:simplify_binary_operation_1 on both BE and LE:
>
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
>
> =>
>
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
>
> The endianness check is then needed only once, at final ASM generation.
> The ASM is better because the nested vec_select is simplified to a simple
> scalar load.
>
> Regression tested on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
> Linux.
>
> gcc/ChangeLog:
>
>     PR target/106069
>     * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
>     (altivec_vmrghb_direct_be): New pattern for BE.
>     (altivec_vmrghb_direct_le): New pattern for LE.
>     (altivec_vmrghh_direct): Remove.
>     (altivec_vmrghh_direct_be): New pattern for BE.
>     (altivec_vmrghh_direct_le): New pattern for LE.
>     (altivec_vmrghw_direct_<mode>): Remove.
>     (altivec_vmrghw_direct_<mode>_be): New pattern for BE.
>     (altivec_vmrghw_direct_<mode>_le): New pattern for LE.
>     (altivec_vmrglb_direct): Remove.
>     (altivec_vmrglb_direct_be): New pattern for BE.
>     (altivec_vmrglb_direct_le): New pattern for LE.
>     (altivec_vmrglh_direct): Remove.
>     (altivec_vmrglh_direct_be): New pattern for BE.
>     (altivec_vmrglh_direct_le): New pattern for LE.
>     (altivec_vmrglw_direct_<mode>): Remove.
>     (altivec_vmrglw_direct_<mode>_be): New pattern for BE.
>     (altivec_vmrglw_direct_<mode>_le): New pattern for LE.
>     * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
>     Adjust.
>     * config/rs6000/vsx.md: Likewise.
>
> gcc/testsuite/ChangeLog:
>
>     PR target/106069
>     * g++.target/powerpc/pr106069.C: New test.
>
> Signed-off-by: Xionghu Luo
> ---
>  gcc/config/rs6000/altivec.md                | 222 ++++++++++++++------
>  gcc/config/rs6000/rs6000.cc                 |  24 +--
>  gcc/config/rs6000/vsx.md                    |  28 +--
>  gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++++++++++
>  4 files changed, 307 insertions(+), 85 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 30606b8ab21..4bfeecec224 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
>     (use (match_operand:V16QI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> -                                                : gen_altivec_vmrglb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
>    DONE;
>  })
>
> -(define_insn "altivec_vmrghb_direct"
> +(define_insn "altivec_vmrghb_direct_be"
>    [(set (match_operand:V16QI 0 "register_operand" "=v")
>          (vec_select:V16QI
>            (vec_concat:V32QI
> @@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
>                       (const_int 5) (const_int 21)
>                       (const_int 6) (const_int 22)
>                       (const_int 7) (const_int 23)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrghb %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghb_direct_le"
> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> +        (vec_select:V16QI
> +          (vec_concat:V32QI
> +            (match_operand:V16QI 2 "register_operand" "v")
> +            (match_operand:V16QI 1 "register_operand" "v"))
> +          (parallel [(const_int 8) (const_int 24)
> +                     (const_int 9) (const_int 25)
> +                     (const_int 10) (const_int 26)
> +                     (const_int 11) (const_int 27)
> +                     (const_int 12) (const_int 28)
> +                     (const_int 13) (const_int 29)
> +                     (const_int 14) (const_int 30)
> +                     (const_int 15) (const_int 31)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "vmrghb %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>
> @@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
>     (use (match_operand:V8HI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
> -                                                : gen_altivec_vmrglh_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
>    DONE;
>  })
>
> -(define_insn "altivec_vmrghh_direct"
> +(define_insn "altivec_vmrghh_direct_be"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
> -        (vec_select:V8HI
> +        (vec_select:V8HI
>            (vec_concat:V16HI
>              (match_operand:V8HI 1 "register_operand" "v")
>              (match_operand:V8HI 2 "register_operand" "v"))
> @@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
>                       (const_int 1) (const_int 9)
>                       (const_int 2) (const_int 10)
>                       (const_int 3) (const_int 11)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrghh %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghh_direct_le"
> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> +        (vec_select:V8HI
> +          (vec_concat:V16HI
> +            (match_operand:V8HI 2 "register_operand" "v")
> +            (match_operand:V8HI 1 "register_operand" "v"))
> +          (parallel [(const_int 4) (const_int 12)
> +                     (const_int 5) (const_int 13)
> +                     (const_int 6) (const_int 14)
> +                     (const_int 7) (const_int 15)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "vmrghh %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>
> @@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
>     (use (match_operand:V4SI 2 "register_operand"))]
>    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
> -                         : gen_altivec_vmrglw_direct_v4si;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
> +                                                  operands[1],
> +                                                  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
> +                                                  operands[2],
> +                                                  operands[1]));
>    DONE;
>  })
>
> -(define_insn "altivec_vmrghw_direct_<mode>"
> +(define_insn "altivec_vmrghw_direct_<mode>_be"
>    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>          (vec_select:VSX_W
>            (vec_concat:<VS_double>
> @@ -1221,7 +1257,21 @@ (define_insn "altivec_vmrghw_direct_<mode>"
>              (match_operand:VSX_W 2 "register_operand" "wa,v"))
>            (parallel [(const_int 0) (const_int 4)
>                       (const_int 1) (const_int 5)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "@
> +   xxmrghw %x0,%x1,%x2
> +   vmrghw %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghw_direct_<mode>_le"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
> +        (vec_select:VSX_W
> +          (vec_concat:<VS_double>
> +            (match_operand:VSX_W 2 "register_operand" "wa,v")
> +            (match_operand:VSX_W 1 "register_operand" "wa,v"))
> +          (parallel [(const_int 2) (const_int 6)
> +                     (const_int 3) (const_int 7)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "@
>     xxmrghw %x0,%x1,%x2
>     vmrghw %0,%1,%2"
> @@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
>     (use (match_operand:V16QI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
> -                                                : gen_altivec_vmrghb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
>    DONE;
>  })
>
> -(define_insn "altivec_vmrglb_direct"
> +(define_insn "altivec_vmrglb_direct_be"
>    [(set (match_operand:V16QI 0 "register_operand" "=v")
>          (vec_select:V16QI
>            (vec_concat:V32QI
> @@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
>                       (const_int 13) (const_int 29)
>                       (const_int 14) (const_int 30)
>                       (const_int 15) (const_int 31)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrglb %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglb_direct_le"
> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> +        (vec_select:V16QI
> +          (vec_concat:V32QI
> +            (match_operand:V16QI 2 "register_operand" "v")
> +            (match_operand:V16QI 1 "register_operand" "v"))
> +          (parallel [(const_int 0) (const_int 16)
> +                     (const_int 1) (const_int 17)
> +                     (const_int 2) (const_int 18)
> +                     (const_int 3) (const_int 19)
> +                     (const_int 4) (const_int 20)
> +                     (const_int 5) (const_int 21)
> +                     (const_int 6) (const_int 22)
> +                     (const_int 7) (const_int 23)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "vmrglb %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>
> @@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
>     (use (match_operand:V8HI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
> -                                                : gen_altivec_vmrghh_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (
> +      gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (
> +      gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
>    DONE;
>  })
>
> -(define_insn "altivec_vmrglh_direct"
> +(define_insn "altivec_vmrglh_direct_be"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (vec_select:V8HI
>            (vec_concat:V16HI
> @@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
>                       (const_int 5) (const_int 13)
>                       (const_int 6) (const_int 14)
>                       (const_int 7) (const_int 15)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "vmrglh %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglh_direct_le"
> +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> +        (vec_select:V8HI
> +          (vec_concat:V16HI
> +            (match_operand:V8HI 2 "register_operand" "v")
> +            (match_operand:V8HI 1 "register_operand" "v"))
> +          (parallel [(const_int 0) (const_int 8)
> +                     (const_int 1) (const_int 9)
> +                     (const_int 2) (const_int 10)
> +                     (const_int 3) (const_int 11)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "vmrglh %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>
> @@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
>     (use (match_operand:V4SI 2 "register_operand"))]
>    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
> -                         : gen_altivec_vmrghw_direct_v4si;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
> +                                                  operands[1],
> +                                                  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
> +                                                  operands[2],
> +                                                  operands[1]));
>    DONE;
>  })
>
> -(define_insn "altivec_vmrglw_direct_<mode>"
> +(define_insn "altivec_vmrglw_direct_<mode>_be"
>    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
>          (vec_select:VSX_W
>            (vec_concat:<VS_double>
> @@ -1327,7 +1413,21 @@ (define_insn "altivec_vmrglw_direct_<mode>"
>              (match_operand:VSX_W 2 "register_operand" "wa,v"))
>            (parallel [(const_int 2) (const_int 6)
>                       (const_int 3) (const_int 7)])))]
> -  "TARGET_ALTIVEC"
> +  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> +  "@
> +   xxmrglw %x0,%x1,%x2
> +   vmrglw %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglw_direct_<mode>_le"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
> +        (vec_select:VSX_W
> +          (vec_concat:<VS_double>
> +            (match_operand:VSX_W 2 "register_operand" "wa,v")
> +            (match_operand:VSX_W 1 "register_operand" "wa,v"))
> +          (parallel [(const_int 0) (const_int 4)
> +                     (const_int 1) (const_int 5)])))]
> +  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
>    "@
>     xxmrglw %x0,%x1,%x2
>     vmrglw %0,%1,%2"
> @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
>      {
>        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
>      {
>        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
>      {
>        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
>      {
>        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
>      {
>        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
>      {
>        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
>      {
>        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> @@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
>      {
>        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
>      }
>    else
>      {
>        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
>        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
> -      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
> +      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
>      }
>    DONE;
>  })
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 16ca3a31757..aba6315cd5f 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -23196,28 +23196,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>        CODE_FOR_altivec_vpkuwum_direct,
>        {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
> -                      : CODE_FOR_altivec_vmrglb_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
> +                      : CODE_FOR_altivec_vmrglb_direct_le,
>       {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
> -                      : CODE_FOR_altivec_vmrglh_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
> +                      : CODE_FOR_altivec_vmrglh_direct_le,
>       {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
> -                      : CODE_FOR_altivec_vmrglw_direct_v4si,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
> +                      : CODE_FOR_altivec_vmrglw_direct_v4si_le,
>       {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
> -                      : CODE_FOR_altivec_vmrghb_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
> +                      : CODE_FOR_altivec_vmrghb_direct_le,
>       {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
> -                      : CODE_FOR_altivec_vmrghh_direct,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
> +                      : CODE_FOR_altivec_vmrghh_direct_le,
>       {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
>      {OPTION_MASK_ALTIVEC,
> -     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
> -                      : CODE_FOR_altivec_vmrghw_direct_v4si,
> +     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
> +                      : CODE_FOR_altivec_vmrghw_direct_v4si_le,
>       {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
>      {OPTION_MASK_P8_VECTOR,
>       BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 0865608f94a..f8d2c316a55 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4683,12 +4683,14 @@ (define_expand "vsx_xxmrghw_<mode>"
>                       (const_int 1) (const_int 5)])))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
> -                         : gen_altivec_vmrglw_direct_<mode>;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
> +                                                  operands[1],
> +                                                  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
> +                                                  operands[2],
> +                                                  operands[1]));
>    DONE;
>  }
>    [(set_attr "type" "vecperm")])
> @@ -4703,12 +4705,14 @@ (define_expand "vsx_xxmrglw_<mode>"
>                       (const_int 3) (const_int 7)])))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
>  {
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
> -                         : gen_altivec_vmrghw_direct_<mode>;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
> +                                                  operands[1],
> +                                                  operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
> +                                                  operands[2],
> +                                                  operands[1]));
>    DONE;
>  }
>    [(set_attr "type" "vecperm")])
> diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
> new file mode 100644
> index 00000000000..c89739ecb55
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
> @@ -0,0 +1,118 @@
> +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
> +/* { dg-require-effective-target vmx_hw } */
> +/* { dg-do run } */
> +
> +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
> +
> +union
> +{
> +  native_simd_type V;
> +  int R[4];
> +} store_le_vec;
> +
> +struct S
> +{
> +  S () = default;
> +  S (unsigned B0)
> +  {
> +    native_simd_type val{B0};
> +    m_simd = val;
> +  }
> +  void store_le (unsigned int out[])
> +  {
> +    store_le_vec.V = m_simd;
> +    unsigned int x0 = store_le_vec.R[0];
> +    __builtin_memcpy (out, &x0, 4);
> +  }
> +  S rotl (unsigned int r)
> +  {
> +    native_simd_type rot{r};
> +    return __builtin_vec_rl (m_simd, rot);
> +  }
> +  void operator+= (S other)
> +  {
> +    m_simd = __builtin_vec_add (m_simd, other.m_simd);
> +  }
> +  void operator^= (S other)
> +  {
> +    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
> +  }
> +  static void transpose (S &B0, S B1, S B2, S B3)
> +  {
> +    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
> +    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
> +    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
> +    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
> +    B0 = __builtin_vec_mergeh (T0, T1);
> +    B3 = __builtin_vec_mergel (T2, T3);
> +  }
> +  S (native_simd_type x) : m_simd (x) {}
> +  native_simd_type m_simd;
> +};
> +
> +void
> +foo (unsigned int output[], unsigned state[])
> +{
> +  S R00 = state[0];
> +  S R01 = state[0];
> +  S R02 = state[2];
> +  S R03 = state[0];
> +  S R05 = state[5];
> +  S R06 = state[6];
> +  S R07 = state[7];
> +  S R08 = state[8];
> +  S R09 = state[9];
> +  S R10 = state[10];
> +  S R11 = state[11];
> +  S R12 = state[12];
> +  S R13 = state[13];
> +  S R14 = state[4];
> +  S R15 = state[15];
> +  for (int r = 0; r != 10; ++r)
> +    {
> +      R09 += R13;
> +      R11 += R15;
> +      R05 ^= R09;
> +      R06 ^= R10;
> +      R07 ^= R11;
> +      R07 = R07.rotl (7);
> +      R00 += R05;
> +      R01 += R06;
> +      R02 += R07;
> +      R15 ^= R00;
> +      R12 ^= R01;
> +      R13 ^= R02;
> +      R00 += R05;
> +      R01 += R06;
> +      R02 += R07;
> +      R15 ^= R00;
> +      R12 = R12.rotl (8);
> +      R13 = R13.rotl (8);
> +      R10 += R15;
> +      R11 += R12;
> +      R08 += R13;
> +      R09 += R14;
> +      R05 ^= R10;
> +      R06 ^= R11;
> +      R07 ^= R08;
> +      R05 = R05.rotl (7);
> +      R06 = R06.rotl (7);
> +      R07 = R07.rotl (7);
> +    }
> +  R00 += state[0];
> +  S::transpose (R00, R01, R02, R03);
> +  R00.store_le (output);
> +}
> +
> +unsigned int res[1];
> +unsigned main_state[]{1634760805, 60878, 2036477234, 6,
> +                      0, 825562964, 1471091955, 1346092787,
> +                      506976774, 4197066702, 518848283, 118491664,
> +                      0, 0, 0, 0};
> +int
> +main ()
> +{
> +  foo (res, main_state);
> +  if (res[0] != 0x41fcef98)
> +    __builtin_abort ();
> +}
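
For anyone skimming the thread, here is a rough host-side sketch of the nested vec_select simplification that the patch description relies on. It is only a toy model in plain C++, not GCC code: the helpers vec_concat, vec_select, merge_high_words, simplify_nested_select and the lane_source struct are hypothetical names made up for illustration, and the lanes are simply numbered in RTL element order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model of RTL vec_concat: the lanes of a followed by the lanes of b.
// (Hypothetical helper, not a GCC API.)
template <std::size_t N>
std::array<unsigned, 2 * N>
vec_concat (const std::array<unsigned, N> &a, const std::array<unsigned, N> &b)
{
  std::array<unsigned, 2 * N> out{};
  for (std::size_t i = 0; i < N; ++i)
    {
      out[i] = a[i];
      out[N + i] = b[i];
    }
  return out;
}

// Toy model of RTL vec_select: pick lanes of v in the order given by sel.
template <std::size_t M, std::size_t N>
std::array<unsigned, M>
vec_select (const std::array<unsigned, N> &v,
            const std::array<std::size_t, M> &sel)
{
  std::array<unsigned, M> out{};
  for (std::size_t i = 0; i < M; ++i)
    {
      assert (sel[i] < N);
      out[i] = v[sel[i]];
    }
  return out;
}

// The selector the patch description gives for vec_mrghw.
const std::array<std::size_t, 4> merge_high_words_sel = {0, 4, 1, 5};

std::array<unsigned, 4>
merge_high_words (const std::array<unsigned, 4> &a,
                  const std::array<unsigned, 4> &b)
{
  return vec_select (vec_concat (a, b), merge_high_words_sel);
}

// Where lane k of vec_select (vec_concat (a, b), sel) really comes from.
struct lane_source
{
  int operand;      // 0 = first vec_concat operand, 1 = second
  std::size_t lane; // lane within that operand
};

// Fold the outer single-lane select through the inner permutation, in the
// spirit of the combine-time simplification mentioned in the description.
lane_source
simplify_nested_select (const std::array<std::size_t, 4> &sel, std::size_t k)
{
  assert (k < sel.size ());
  std::size_t idx = sel[k];
  if (idx < 4)
    return {0, idx};
  return {1, idx - 4};
}
```

With the selector [0 4 1 5], asking for lane 3 of the merged vector folds to lane 1 of the second operand, which lines up with the r146 / [const_int 1] rewrite shown in the description.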