From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by sourceware.org (Postfix) with ESMTPS id 66A0B3858CDB for ; Wed, 24 May 2023 09:30:14 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 66A0B3858CDB Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-wr1-x432.google.com with SMTP id ffacd0b85a97d-3078a3f3b5fso549315f8f.0 for ; Wed, 24 May 2023 02:30:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1684920613; x=1687512613; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=XiB4EGe+RtPuyxuZ01VQOX2vXyBhwcI7Ih5/6iJsx/M=; b=i1TCuhBLFCiSAndwcBL75XMmuQ85T2QRpOXShMB1w2KKtGMeoUH/ayuEinjiO36Lkh uHTbIpcXrchJZcXvBtYd3yGNd43v3Xvp5hbV00JicmVMjGUpvWWE8IETsgaTC6xGQcq0 /IVsueNQlJbMlAEAs0QcxdZ70EDK9MNvlDXq9/aZw7WV+5K/bnpiYJJsuUAP7IgwUYqM MZm2pfOPYn53qWrS2jz09+lo/Y2+jvZDfrsMLGY978/CIzypZwZA4+UqkeSUHnXErj1P kwDHfw805PRPsWkWZqaCubJXFg5tPC53QNlXbatRvJ9f/Y4e3QQIY/SrPcH88UQaSkvy qo/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684920613; x=1687512613; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=XiB4EGe+RtPuyxuZ01VQOX2vXyBhwcI7Ih5/6iJsx/M=; b=jIrdfE1S0lBipqEUpB1Gb1t30tPBWhL5LUMDKRvAFS6eZwLesursB70lZOUgrJEnNx z8YyOUJSh66j6yHn5+PI3tjDh4jnIkNEXIQeN/B7mujn/A4tsBBi7ruONYIQFSxOfKG9 iI1orvHEjLcNrDvI2m2kPmU47mcEs9yaJ3emiYqixfd+DcdELcyWTgrlYUm19CfWoTUf ojJr1YLuaxVpY6w5ywpU3CwFaXALO/HRPkCpF1w4aAWZH9kNAWJQeOXgWRP4XrcIwDix NptvaBkAZUP5WJ5y/cao3Fx6CuAa/TiYq7JcFKHGBEctMV1GFEa6dMY47pH1FFor68Cn Yz7Q== X-Gm-Message-State: AC+VfDwKFOcgm5X18HZmMpdWfOIcMr5s+CKCDdGNR4cu12MitzbWfOl8 if4fj7HlVbiB29IPepmCTQAzW84St+qPWODRuMgu7A== X-Google-Smtp-Source: ACHHUZ4gXULRRI3hWqjKGldWbGhHVe5xKDQXXPhMJYilJjEdOeT7BOhk8PGf2F7pisSWl4ehdfMj6GMnyvX4wXd1Ev4= X-Received: by 2002:adf:f787:0:b0:309:4111:d82c with SMTP id q7-20020adff787000000b003094111d82cmr11750926wrp.49.1684920612992; Wed, 24 May 2023 02:30:12 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Prathamesh Kulkarni Date: Wed, 24 May 2023 14:59:36 +0530 Message-ID: Subject: Re: [aarch64] Code-gen for vector initialization involving constants To: Prathamesh Kulkarni , gcc Patches , richard.sandiford@arm.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-8.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_STOCKGEN,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, 22 May 2023 at 14:18, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > Hi Richard, > > Thanks for the suggestions. Does the attached patch look OK ? > > Boostrap+test in progress on aarch64-linux-gnu. > > Like I say, please wait for the tests to complete before sending an RFA. > It saves a review cycle if the tests don't in fact pass. Right, sorry, will post patches after completion of testing henceforth. > > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > > index 29dbacfa917..e611a7cca25 100644 > > --- a/gcc/config/aarch64/aarch64.cc > > +++ b/gcc/config/aarch64/aarch64.cc > > @@ -22332,6 +22332,43 @@ aarch64_unzip_vector_init (machine_mode mode, rtx vals, bool even_p) > > return gen_rtx_PARALLEL (new_mode, vec); > > } > > > > +/* Return true if INSN is a scalar move. */ > > + > > +static bool > > +scalar_move_insn_p (const rtx_insn *insn) > > +{ > > + rtx set = single_set (insn); > > + if (!set) > > + return false; > > + rtx src = SET_SRC (set); > > + rtx dest = SET_DEST (set); > > + return is_a(GET_MODE (dest)) > > + && aarch64_mov_operand_p (src, GET_MODE (src)); > > Formatting: > > return (is_a(GET_MODE (dest)) > && aarch64_mov_operand_p (src, GET_MODE (src))); > > OK with that change if the tests pass, thanks. Unfortunately, the patch regressed vec-init-21.c: int8x16_t f_s8(int8_t x, int8_t y) { return (int8x16_t) { x, y, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 }; } -O3 code-gen trunk: f_s8: adrp x2, .LC0 ldr q0, [x2, #:lo12:.LC0] ins v0.b[0], w0 ins v0.b[1], w1 ret -O3 code-gen patch: f_s8: adrp x2, .LC0 ldr d31, [x2, #:lo12:.LC0] adrp x2, .LC1 ldr d0, [x2, #:lo12:.LC1] ins v31.b[0], w0 ins v0.b[0], w1 zip1 v0.16b, v31.16b, v0.16b ret With trunk, it chooses the fallback sequence because both fallback and zip1 sequence had cost = 20, however with patch applied, we end up with zip1 sequence cost = 24 and fallback sequence cost = 28. This happens because of using insn_cost instead of set_rtx_cost for the following expression: (set (reg:QI 100) (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0)) set_rtx_cost returns 0 for above expression but insn_cost returns 4. This expression template appears twice in fallback sequence, which raises the cost to 28 from 20, while it appears once in each half of zip1 sequence, which raises the cost to 24 from 20, and so it now prefers zip1 sequence instead. I assumed this expression would be ignored because it looks like a scalar move, but that doesn't seem to be the case ? aarch64_classify_symbolic_expression returns SYMBOL_FORCE_TO_MEM for (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0) and thus aarch64_mov_operand_p returns false. Another issue with the zip1 sequence above is using same register x2 for loading another half of constant in: adrp x2, .LC1 I guess this will create an output dependency from adrp x2, .LC0 -> adrp x2, .LC1 and anti-dependency from ldr d31, [x2, #:lo12:.LC0] -> adrp x2, .LC1 essentially forcing almost the entire sequence (except ins instructions) to execute sequentially ? Fallback sequence rtl, cost = 28 (set (reg:V16QI 96) (const_vector:V16QI [ (const_int 7 [0x7]) (const_int 8 [0x8]) (const_int 1 [0x1]) (const_int 2 [0x2]) (const_int 3 [0x3]) (const_int 4 [0x4]) (const_int 5 [0x5]) (const_int 6 [0x6]) (const_int 7 [0x7]) (const_int 8 [0x8]) (const_int 9 [0x9]) (const_int 10 [0xa]) (const_int 11 [0xb]) (const_int 12 [0xc]) (const_int 13 [0xd]) (const_int 14 [0xe]) ])) cost = 12 (set (reg:QI 101) (subreg/s/u:QI (reg/v:SI 93 [ x ]) 0)) cost = 4 (set (reg:V16QI 96) (vec_merge:V16QI (vec_duplicate:V16QI (reg:QI 101)) (reg:V16QI 96) (const_int 1 [0x1]))) cost = 4 (set (reg:QI 102) (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0)) cost = 4 (set (reg:V16QI 96) (vec_merge:V16QI (vec_duplicate:V16QI (reg:QI 102)) (reg:V16QI 96) (const_int 2 [0x2]))) cost = 4 zip1 sequence rtl, cost = 24 (set (reg:V8QI 97) (const_vector:V8QI [ (const_int 7 [0x7]) (const_int 1 [0x1]) (const_int 3 [0x3]) (const_int 5 [0x5]) (const_int 7 [0x7]) (const_int 9 [0x9]) (const_int 11 [0xb]) (const_int 13 [0xd]) ])) cost = 12 (set (reg:QI 98) (subreg/s/u:QI (reg/v:SI 93 [ x ]) 0)) cost = 4 (set (reg:V8QI 97) (vec_merge:V8QI (vec_duplicate:V8QI (reg:QI 98)) (reg:V8QI 97) (const_int 1 [0x1]))) cost = 4 (set (reg:V8QI 99) (const_vector:V8QI [ (const_int 8 [0x8]) (const_int 2 [0x2]) (const_int 4 [0x4]) (const_int 6 [0x6]) (const_int 8 [0x8]) (const_int 10 [0xa]) (const_int 12 [0xc]) (const_int 14 [0xe]) ])) cost = 12 (set (reg:QI 100) (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0)) cost = 4 (set (reg:V8QI 99) (vec_merge:V8QI (vec_duplicate:V8QI (reg:QI 100)) (reg:V8QI 99) (const_int 1 [0x1]))) cost = 4 (set (reg:V16QI 96) (unspec:V16QI [ (subreg:V16QI (reg:V8QI 97) 0) (subreg:V16QI (reg:V8QI 99) 0) ] UNSPEC_ZIP1)) cost = 4 Thanks, Prathamesh > > Richard