Date: Wed, 31 Mar 2021 15:47:33 +0200
From: Jan Hubicka
To: "H.J. Lu"
Cc: Hongtao Liu, GCC Patches, Hongyu Wang
Subject: Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake
Message-ID: <20210331134733.GB51111@kam.mff.cuni.cz>
In-Reply-To: <20210331134042.GC3851@kam.mff.cuni.cz>
References: <20210322131636.58461-1-hjl.tools@gmail.com> <20210322131636.58461-2-hjl.tools@gmail.com> <20210322141027.GA11522@kam.mff.cuni.cz> <20210331080516.GA51111@kam.mff.cuni.cz> <20210331134042.GC3851@kam.mff.cuni.cz>

> > >
> > > Patch is OK now. I was wondering about using avx256 for moves of known
> >
> > Done. X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now.
> > Can you take a look at the patch for Skylake:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html
>
> I was wondering, if the CPU prefers rep movsb when rcx is a compile-time
> constant, it probably does some logic at decode time (i.e. expands
> it into some sequence) and if so, then it may require the code setting
> the register to be near the rep (via fusing or a similar mechanism).
>
> Perhaps we want to have a fusing pattern for this, so we do not move them
> far apart?

Reading through the optimization manual, it seems that movsb is fast for
small blocks no matter whether the size is hardwired. In that case you
probably want to check whether max_size or expected_size is known to be
small, rather than requiring max_size == min_size with both being small.

But it depends on what the CPU really does.

Honza
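
To make the difference between the two checks concrete, here is a rough
sketch in plain C. It is not the GCC code: the helper names, the
SMALL_BLOCK_LIMIT threshold, and the SIZE_UNKNOWN sentinel are all
hypothetical and only illustrate the condition suggested above, assuming
the optimization manual's claim about small blocks holds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold for a "small" block; not a value taken from GCC
   or from the optimization manual.  */
#define SMALL_BLOCK_LIMIT 128

/* Hypothetical sentinel meaning "no bound / no estimate available".  */
#define SIZE_UNKNOWN UINT64_MAX

/* Stricter check: the size is a compile-time constant
   (min_size == max_size) and that constant is small.  */
static bool
known_small_size_p (uint64_t min_size, uint64_t max_size)
{
  return min_size == max_size && max_size <= SMALL_BLOCK_LIMIT;
}

/* Relaxed check: either the upper bound or the expected size is known
   and small.  The exact size does not have to be a constant.  */
static bool
likely_small_size_p (uint64_t max_size, uint64_t expected_size)
{
  if (max_size != SIZE_UNKNOWN && max_size <= SMALL_BLOCK_LIMIT)
    return true;
  return expected_size != SIZE_UNKNOWN && expected_size <= SMALL_BLOCK_LIMIT;
}
```

With these assumptions, a copy whose size is only known to be somewhere in
1..64 bytes fails the stricter check (min_size != max_size) but passes the
relaxed one, which is the case the relaxed condition is meant to catch.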