From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Cc: richard.earnshaw@arm.com, james.greenhalgh@arm.com, marcus.shawcroft@arm.com
Subject: [01/nn] [AArch64] Generate permute patterns using rtx builders
Date: Fri, 27 Oct 2017 13:23:00 -0000
Message-ID: <87y3nwbu8w.fsf@linaro.org>
In-Reply-To: <873764d8y3.fsf@linaro.org> (Richard Sandiford's message of "Fri, 27 Oct 2017 14:19:48 +0100")
References: <873764d8y3.fsf@linaro.org>

This patch replaces switch statements that call specific generator
functions with code that constructs the rtl pattern directly.  This
seems to scale better to SVE and also seems less error-prone.

As a side-effect, the patch fixes the REV handling for diff==1,
vmode==E_V4HFmode and adds missing support for diff==3,
vmode==E_V4HFmode.

To compensate for the lack of switches that check for specific modes,
the patch makes aarch64_expand_vec_perm_const_1 reject permutes on
single-element vectors (specifically V1DImode).


2017-10-27  Richard Sandiford
            Alan Hayward
            David Sherwood

gcc/
        * config/aarch64/aarch64.c (aarch64_evpc_trn, aarch64_evpc_uzp)
        (aarch64_evpc_zip, aarch64_evpc_ext, aarch64_evpc_rev)
        (aarch64_evpc_dup): Generate rtl directly, rather than using
        named expanders.
        (aarch64_expand_vec_perm_const_1): Explicitly check for permutes
        of a single element.

Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2017-10-27 14:10:08.337833963 +0100
+++ gcc/config/aarch64/aarch64.c 2017-10-27 14:10:14.622293803 +0100
@@ -13475,7 +13475,6 @@ aarch64_evpc_trn (struct expand_vec_perm
 {
   unsigned int i, odd, mask, nelt = d->perm.length ();
   rtx out, in0, in1, x;
-  rtx (*gen) (rtx, rtx, rtx);
   machine_mode vmode = d->vmode;
 
   if (GET_MODE_UNIT_SIZE (vmode) > 8)
@@ -13512,48 +13511,8 @@ aarch64_evpc_trn (struct expand_vec_perm
     }
   out = d->target;
 
-  if (odd)
-    {
-      switch (vmode)
-        {
-        case E_V16QImode: gen = gen_aarch64_trn2v16qi; break;
-        case E_V8QImode: gen = gen_aarch64_trn2v8qi; break;
-        case E_V8HImode: gen = gen_aarch64_trn2v8hi; break;
-        case E_V4HImode: gen = gen_aarch64_trn2v4hi; break;
-        case E_V4SImode: gen = gen_aarch64_trn2v4si; break;
-        case E_V2SImode: gen = gen_aarch64_trn2v2si; break;
-        case E_V2DImode: gen = gen_aarch64_trn2v2di; break;
-        case E_V4HFmode: gen = gen_aarch64_trn2v4hf; break;
-        case E_V8HFmode: gen = gen_aarch64_trn2v8hf; break;
-        case E_V4SFmode: gen = gen_aarch64_trn2v4sf; break;
-        case E_V2SFmode: gen = gen_aarch64_trn2v2sf; break;
-        case E_V2DFmode: gen = gen_aarch64_trn2v2df; break;
-        default:
-          return false;
-        }
-    }
-  else
-    {
-      switch (vmode)
-        {
-        case E_V16QImode: gen = gen_aarch64_trn1v16qi; break;
-        case E_V8QImode: gen = gen_aarch64_trn1v8qi; break;
-        case E_V8HImode: gen = gen_aarch64_trn1v8hi; break;
-        case E_V4HImode: gen = gen_aarch64_trn1v4hi; break;
-        case E_V4SImode: gen = gen_aarch64_trn1v4si; break;
-        case E_V2SImode: gen = gen_aarch64_trn1v2si; break;
-        case E_V2DImode: gen = gen_aarch64_trn1v2di; break;
-        case E_V4HFmode: gen = gen_aarch64_trn1v4hf; break;
-        case E_V8HFmode: gen = gen_aarch64_trn1v8hf; break;
-        case E_V4SFmode: gen = gen_aarch64_trn1v4sf; break;
-        case E_V2SFmode: gen = gen_aarch64_trn1v2sf; break;
-        case E_V2DFmode: gen = gen_aarch64_trn1v2df; break;
-        default:
-          return false;
-        }
-    }
-
-  emit_insn (gen (out, in0, in1));
+  emit_set_insn (out, gen_rtx_UNSPEC (vmode, gen_rtvec (2, in0, in1),
+                                      odd ? UNSPEC_TRN2 : UNSPEC_TRN1));
   return true;
 }
 
@@ -13563,7 +13522,6 @@ aarch64_evpc_uzp (struct expand_vec_perm
 {
   unsigned int i, odd, mask, nelt = d->perm.length ();
   rtx out, in0, in1, x;
-  rtx (*gen) (rtx, rtx, rtx);
   machine_mode vmode = d->vmode;
 
   if (GET_MODE_UNIT_SIZE (vmode) > 8)
@@ -13599,48 +13557,8 @@ aarch64_evpc_uzp (struct expand_vec_perm
     }
   out = d->target;
 
-  if (odd)
-    {
-      switch (vmode)
-        {
-        case E_V16QImode: gen = gen_aarch64_uzp2v16qi; break;
-        case E_V8QImode: gen = gen_aarch64_uzp2v8qi; break;
-        case E_V8HImode: gen = gen_aarch64_uzp2v8hi; break;
-        case E_V4HImode: gen = gen_aarch64_uzp2v4hi; break;
-        case E_V4SImode: gen = gen_aarch64_uzp2v4si; break;
-        case E_V2SImode: gen = gen_aarch64_uzp2v2si; break;
-        case E_V2DImode: gen = gen_aarch64_uzp2v2di; break;
-        case E_V4HFmode: gen = gen_aarch64_uzp2v4hf; break;
-        case E_V8HFmode: gen = gen_aarch64_uzp2v8hf; break;
-        case E_V4SFmode: gen = gen_aarch64_uzp2v4sf; break;
-        case E_V2SFmode: gen = gen_aarch64_uzp2v2sf; break;
-        case E_V2DFmode: gen = gen_aarch64_uzp2v2df; break;
-        default:
-          return false;
-        }
-    }
-  else
-    {
-      switch (vmode)
-        {
-        case E_V16QImode: gen = gen_aarch64_uzp1v16qi; break;
-        case E_V8QImode: gen = gen_aarch64_uzp1v8qi; break;
-        case E_V8HImode: gen = gen_aarch64_uzp1v8hi; break;
-        case E_V4HImode: gen = gen_aarch64_uzp1v4hi; break;
-        case E_V4SImode: gen = gen_aarch64_uzp1v4si; break;
-        case E_V2SImode: gen = gen_aarch64_uzp1v2si; break;
-        case E_V2DImode: gen = gen_aarch64_uzp1v2di; break;
-        case E_V4HFmode: gen = gen_aarch64_uzp1v4hf; break;
-        case E_V8HFmode: gen = gen_aarch64_uzp1v8hf; break;
-        case E_V4SFmode: gen = gen_aarch64_uzp1v4sf; break;
-        case E_V2SFmode: gen = gen_aarch64_uzp1v2sf; break;
-        case E_V2DFmode: gen = gen_aarch64_uzp1v2df; break;
-        default:
-          return false;
-        }
-    }
-
-  emit_insn (gen (out, in0, in1));
+  emit_set_insn (out, gen_rtx_UNSPEC (vmode, gen_rtvec (2, in0, in1),
+                                      odd ? UNSPEC_UZP2 : UNSPEC_UZP1));
   return true;
 }
 
@@ -13650,7 +13568,6 @@ aarch64_evpc_zip (struct expand_vec_perm
 {
   unsigned int i, high, mask, nelt = d->perm.length ();
   rtx out, in0, in1, x;
-  rtx (*gen) (rtx, rtx, rtx);
   machine_mode vmode = d->vmode;
 
   if (GET_MODE_UNIT_SIZE (vmode) > 8)
@@ -13691,48 +13608,8 @@ aarch64_evpc_zip (struct expand_vec_perm
     }
   out = d->target;
 
-  if (high)
-    {
-      switch (vmode)
-        {
-        case E_V16QImode: gen = gen_aarch64_zip2v16qi; break;
-        case E_V8QImode: gen = gen_aarch64_zip2v8qi; break;
-        case E_V8HImode: gen = gen_aarch64_zip2v8hi; break;
-        case E_V4HImode: gen = gen_aarch64_zip2v4hi; break;
-        case E_V4SImode: gen = gen_aarch64_zip2v4si; break;
-        case E_V2SImode: gen = gen_aarch64_zip2v2si; break;
-        case E_V2DImode: gen = gen_aarch64_zip2v2di; break;
-        case E_V4HFmode: gen = gen_aarch64_zip2v4hf; break;
-        case E_V8HFmode: gen = gen_aarch64_zip2v8hf; break;
-        case E_V4SFmode: gen = gen_aarch64_zip2v4sf; break;
-        case E_V2SFmode: gen = gen_aarch64_zip2v2sf; break;
-        case E_V2DFmode: gen = gen_aarch64_zip2v2df; break;
-        default:
-          return false;
-        }
-    }
-  else
-    {
-      switch (vmode)
-        {
-        case E_V16QImode: gen = gen_aarch64_zip1v16qi; break;
-        case E_V8QImode: gen = gen_aarch64_zip1v8qi; break;
-        case E_V8HImode: gen = gen_aarch64_zip1v8hi; break;
-        case E_V4HImode: gen = gen_aarch64_zip1v4hi; break;
-        case E_V4SImode: gen = gen_aarch64_zip1v4si; break;
-        case E_V2SImode: gen = gen_aarch64_zip1v2si; break;
-        case E_V2DImode: gen = gen_aarch64_zip1v2di; break;
-        case E_V4HFmode: gen = gen_aarch64_zip1v4hf; break;
-        case E_V8HFmode: gen = gen_aarch64_zip1v8hf; break;
-        case E_V4SFmode: gen = gen_aarch64_zip1v4sf; break;
-        case E_V2SFmode: gen = gen_aarch64_zip1v2sf; break;
-        case E_V2DFmode: gen = gen_aarch64_zip1v2df; break;
-        default:
-          return false;
-        }
-    }
-
-  emit_insn (gen (out, in0, in1));
+  emit_set_insn (out, gen_rtx_UNSPEC (vmode, gen_rtvec (2, in0, in1),
+                                      high ? UNSPEC_ZIP2 : UNSPEC_ZIP1));
   return true;
 }
 
@@ -13742,7 +13619,6 @@ aarch64_evpc_zip (struct expand_vec_perm
 aarch64_evpc_ext (struct expand_vec_perm_d *d)
 {
   unsigned int i, nelt = d->perm.length ();
-  rtx (*gen) (rtx, rtx, rtx, rtx);
   rtx offset;
 
   unsigned int location = d->perm[0]; /* Always < nelt.  */
@@ -13760,24 +13636,6 @@ aarch64_evpc_ext (struct expand_vec_perm
         return false;
     }
 
-  switch (d->vmode)
-    {
-    case E_V16QImode: gen = gen_aarch64_extv16qi; break;
-    case E_V8QImode: gen = gen_aarch64_extv8qi; break;
-    case E_V4HImode: gen = gen_aarch64_extv4hi; break;
-    case E_V8HImode: gen = gen_aarch64_extv8hi; break;
-    case E_V2SImode: gen = gen_aarch64_extv2si; break;
-    case E_V4SImode: gen = gen_aarch64_extv4si; break;
-    case E_V4HFmode: gen = gen_aarch64_extv4hf; break;
-    case E_V8HFmode: gen = gen_aarch64_extv8hf; break;
-    case E_V2SFmode: gen = gen_aarch64_extv2sf; break;
-    case E_V4SFmode: gen = gen_aarch64_extv4sf; break;
-    case E_V2DImode: gen = gen_aarch64_extv2di; break;
-    case E_V2DFmode: gen = gen_aarch64_extv2df; break;
-    default:
-      return false;
-    }
-
   /* Success!  */
   if (d->testing_p)
     return true;
@@ -13796,7 +13654,10 @@ aarch64_evpc_ext (struct expand_vec_perm
     }
 
   offset = GEN_INT (location);
-  emit_insn (gen (d->target, d->op0, d->op1, offset));
+  emit_set_insn (d->target,
+                 gen_rtx_UNSPEC (d->vmode,
+                                 gen_rtvec (3, d->op0, d->op1, offset),
+                                 UNSPEC_EXT));
   return true;
 }
 
@@ -13805,55 +13666,21 @@ aarch64_evpc_ext (struct expand_vec_perm
 static bool
 aarch64_evpc_rev (struct expand_vec_perm_d *d)
 {
-  unsigned int i, j, diff, nelt = d->perm.length ();
-  rtx (*gen) (rtx, rtx);
+  unsigned int i, j, diff, size, unspec, nelt = d->perm.length ();
 
   if (!d->one_vector_p)
     return false;
 
   diff = d->perm[0];
-  switch (diff)
-    {
-    case 7:
-      switch (d->vmode)
-        {
-        case E_V16QImode: gen = gen_aarch64_rev64v16qi; break;
-        case E_V8QImode: gen = gen_aarch64_rev64v8qi; break;
-        default:
-          return false;
-        }
-      break;
-    case 3:
-      switch (d->vmode)
-        {
-        case E_V16QImode: gen = gen_aarch64_rev32v16qi; break;
-        case E_V8QImode: gen = gen_aarch64_rev32v8qi; break;
-        case E_V8HImode: gen = gen_aarch64_rev64v8hi; break;
-        case E_V4HImode: gen = gen_aarch64_rev64v4hi; break;
-        default:
-          return false;
-        }
-      break;
-    case 1:
-      switch (d->vmode)
-        {
-        case E_V16QImode: gen = gen_aarch64_rev16v16qi; break;
-        case E_V8QImode: gen = gen_aarch64_rev16v8qi; break;
-        case E_V8HImode: gen = gen_aarch64_rev32v8hi; break;
-        case E_V4HImode: gen = gen_aarch64_rev32v4hi; break;
-        case E_V4SImode: gen = gen_aarch64_rev64v4si; break;
-        case E_V2SImode: gen = gen_aarch64_rev64v2si; break;
-        case E_V4SFmode: gen = gen_aarch64_rev64v4sf; break;
-        case E_V2SFmode: gen = gen_aarch64_rev64v2sf; break;
-        case E_V8HFmode: gen = gen_aarch64_rev64v8hf; break;
-        case E_V4HFmode: gen = gen_aarch64_rev64v4hf; break;
-        default:
-          return false;
-        }
-      break;
-    default:
-      return false;
-    }
+  size = (diff + 1) * GET_MODE_UNIT_SIZE (d->vmode);
+  if (size == 8)
+    unspec = UNSPEC_REV64;
+  else if (size == 4)
+    unspec = UNSPEC_REV32;
+  else if (size == 2)
+    unspec = UNSPEC_REV16;
+  else
+    return false;
 
   for (i = 0; i < nelt ; i += diff + 1)
     for (j = 0; j <= diff; j += 1)
@@ -13872,14 +13699,14 @@ aarch64_evpc_rev (struct expand_vec_perm
   if (d->testing_p)
     return true;
 
-  emit_insn (gen (d->target, d->op0));
+  emit_set_insn (d->target, gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0),
+                                            unspec));
   return true;
 }
 
 static bool
 aarch64_evpc_dup (struct expand_vec_perm_d *d)
 {
-  rtx (*gen) (rtx, rtx, rtx);
   rtx out = d->target;
   rtx in0;
   machine_mode vmode = d->vmode;
@@ -13901,25 +13728,9 @@ aarch64_evpc_dup (struct expand_vec_perm
   in0 = d->op0;
   lane = GEN_INT (elt); /* The pattern corrects for big-endian.  */
 
-  switch (vmode)
-    {
-    case E_V16QImode: gen = gen_aarch64_dup_lanev16qi; break;
-    case E_V8QImode: gen = gen_aarch64_dup_lanev8qi; break;
-    case E_V8HImode: gen = gen_aarch64_dup_lanev8hi; break;
-    case E_V4HImode: gen = gen_aarch64_dup_lanev4hi; break;
-    case E_V4SImode: gen = gen_aarch64_dup_lanev4si; break;
-    case E_V2SImode: gen = gen_aarch64_dup_lanev2si; break;
-    case E_V2DImode: gen = gen_aarch64_dup_lanev2di; break;
-    case E_V8HFmode: gen = gen_aarch64_dup_lanev8hf; break;
-    case E_V4HFmode: gen = gen_aarch64_dup_lanev4hf; break;
-    case E_V4SFmode: gen = gen_aarch64_dup_lanev4sf; break;
-    case E_V2SFmode: gen = gen_aarch64_dup_lanev2sf; break;
-    case E_V2DFmode: gen = gen_aarch64_dup_lanev2df; break;
-    default:
-      return false;
-    }
-
-  emit_insn (gen (out, in0, lane));
+  rtx parallel = gen_rtx_PARALLEL (vmode, gen_rtvec (1, lane));
+  rtx select = gen_rtx_VEC_SELECT (GET_MODE_INNER (vmode), in0, parallel);
+  emit_set_insn (out, gen_rtx_VEC_DUPLICATE (vmode, select));
   return true;
 }
 
@@ -13972,7 +13783,7 @@ aarch64_expand_vec_perm_const_1 (struct
       std::swap (d->op0, d->op1);
     }
 
-  if (TARGET_SIMD)
+  if (TARGET_SIMD && nelt > 1)
     {
       if (aarch64_evpc_rev (d))
         return true;