From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id F2A7C3858CDB for ; Wed, 24 May 2023 10:10:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F2A7C3858CDB Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id BA3E91042; Wed, 24 May 2023 03:11:23 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 3B87E3F67D; Wed, 24 May 2023 03:10:38 -0700 (PDT) From: Richard Sandiford To: Prathamesh Kulkarni Mail-Followup-To: Prathamesh Kulkarni ,gcc Patches , richard.sandiford@arm.com Cc: gcc Patches Subject: Re: [aarch64] Code-gen for vector initialization involving constants References: Date: Wed, 24 May 2023 11:10:37 +0100 In-Reply-To: (Prathamesh Kulkarni's message of "Wed, 24 May 2023 14:59:36 +0530") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-28.0 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_STOCKGEN,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Prathamesh Kulkarni writes: > On Mon, 22 May 2023 at 14:18, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> > Hi Richard, >> > Thanks for the suggestions. Does the attached patch look OK ? >> > Boostrap+test in progress on aarch64-linux-gnu. >> >> Like I say, please wait for the tests to complete before sending an RFA. >> It saves a review cycle if the tests don't in fact pass. > Right, sorry, will post patches after completion of testing henceforth. >> >> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc >> > index 29dbacfa917..e611a7cca25 100644 >> > --- a/gcc/config/aarch64/aarch64.cc >> > +++ b/gcc/config/aarch64/aarch64.cc >> > @@ -22332,6 +22332,43 @@ aarch64_unzip_vector_init (machine_mode mode, rtx vals, bool even_p) >> > return gen_rtx_PARALLEL (new_mode, vec); >> > } >> > >> > +/* Return true if INSN is a scalar move. */ >> > + >> > +static bool >> > +scalar_move_insn_p (const rtx_insn *insn) >> > +{ >> > + rtx set = single_set (insn); >> > + if (!set) >> > + return false; >> > + rtx src = SET_SRC (set); >> > + rtx dest = SET_DEST (set); >> > + return is_a(GET_MODE (dest)) >> > + && aarch64_mov_operand_p (src, GET_MODE (src)); >> >> Formatting: >> >> return (is_a(GET_MODE (dest)) >> && aarch64_mov_operand_p (src, GET_MODE (src))); >> >> OK with that change if the tests pass, thanks. > Unfortunately, the patch regressed vec-init-21.c: > > int8x16_t f_s8(int8_t x, int8_t y) > { > return (int8x16_t) { x, y, 1, 2, 3, 4, 5, 6, > 7, 8, 9, 10, 11, 12, 13, 14 }; > } > > -O3 code-gen trunk: > f_s8: > adrp x2, .LC0 > ldr q0, [x2, #:lo12:.LC0] > ins v0.b[0], w0 > ins v0.b[1], w1 > ret > > -O3 code-gen patch: > f_s8: > adrp x2, .LC0 > ldr d31, [x2, #:lo12:.LC0] > adrp x2, .LC1 > ldr d0, [x2, #:lo12:.LC1] > ins v31.b[0], w0 > ins v0.b[0], w1 > zip1 v0.16b, v31.16b, v0.16b > ret > > With trunk, it chooses the fallback sequence because both fallback > and zip1 sequence had cost = 20, however with patch applied, > we end up with zip1 sequence cost = 24 and fallback sequence > cost = 28. > > This happens because of using insn_cost instead of > set_rtx_cost for the following expression: > (set (reg:QI 100) > (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0)) > set_rtx_cost returns 0 for above expression but insn_cost returns 4. Yeah, was wondering why you'd dropped the set_rtx_cost thing, but decided not to question it since using insn_cost seemed reasonable if it worked. > This expression template appears twice in fallback sequence, which raises > the cost to 28 from 20, while it appears once in each half of zip1 sequence, > which raises the cost to 24 from 20, and so it now prefers zip1 sequence > instead. > > I assumed this expression would be ignored because it looks like a scalar move, > but that doesn't seem to be the case ? > aarch64_classify_symbolic_expression returns > SYMBOL_FORCE_TO_MEM for (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0) > and thus aarch64_mov_operand_p returns false. Ah, I guess it should be aarch64_mov_operand instead. Confusing that they're so different... > Another issue with the zip1 sequence above is using same register x2 > for loading another half of constant in: > adrp x2, .LC1 > > I guess this will create an output dependency from adrp x2, .LC0 -> > adrp x2, .LC1 > and anti-dependency from ldr d31, [x2, #:lo12:.LC0] -> adrp x2, .LC1 > essentially forcing almost the entire sequence (except ins > instructions) to execute sequentially ? I'd expect modern cores to handle that via renaming. Thanks, Richard