From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl1-x62b.google.com (mail-pl1-x62b.google.com [IPv6:2607:f8b0:4864:20::62b]) by sourceware.org (Postfix) with ESMTPS id 008933858D32 for ; Mon, 14 Nov 2022 21:49:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 008933858D32 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=vrull.eu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=vrull.eu Received: by mail-pl1-x62b.google.com with SMTP id l2so11320831pld.13 for ; Mon, 14 Nov 2022 13:49:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vrull.eu; s=google; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=QKra0Tb3aq1XvWL9gKICiPswH7qKW7dbwMBRFRyBwsY=; b=bweSwSdd53IV2vdDcme5XSRiXIUGGNU3FTvSEmBVK40/FUPD2hHq2y9mULhbl/DJ4T lrBV+LliqfYF61qs/Y2rPtT1wWK+aHhDVQDwzhrjqk/1wa3NI+C69FFVB+1183U6QYSK QKkADYxDBYxF11N7Wro9rh5yixwd0s9emx6dssT+EJAwmExBKUvDVnS6Hq4Ne5wXBeCt cNK4lUIOWhw45skcJQzslvOK0U6YguNJQh6b8kxwztmH7g3+6dw9YMpaBZQz74cpdpfM enPi8aYeML4JSxM2dHW4Gn0jRxnbkUzDjlAEFSF5DZ8sQbTCnKNGQAuJ0+26UdeYClrf VZ8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=QKra0Tb3aq1XvWL9gKICiPswH7qKW7dbwMBRFRyBwsY=; b=STse1R1aOCbXQzzfiwKw1fpX0t45f+lan/WYmw1AdklzfQ4RBt+37BTaFAU49f0KMy NVRNRWV4pdJ8oMpNizvn/GnRXpd6ATWy8Y7jKfBibqrpj9Ce3c++CMmuvpNpazzPprKe 8Mf9o86MUdqgOHAAekczs+vt7Cuxe5EOdTSplhy6xg0K31q6d/rrbIPsR9VMq56hKL8+ YECZKD2iK/ScxZ0dWCmTHRQmz3muC4Mca/qKpZGe2SutLaHz9yzoLyismZJlafOr8Ky/ ExRyT8yhbEEKo9V9w+b34/xRP+li9kjH1jM5sTv1ulQxr/Es1u/2bnHkKXxIgG8kBReb 2rag== X-Gm-Message-State: ANoB5plwF4y8XVsvJ4u4EHgP4d6JWXMG19Sp/qWNE8hAFhKvWPTiDDj+ fRojRrA5udYaFp7A5W6ZYATvsdMUdXgkkxqUeVy95A== X-Google-Smtp-Source: AA0mqf50w4nFC5YREB3N5ng71pDiZ/48bK1fd9F1fJ78CSgLPJGcrj1c+qMrfOUQVfAZ/v3K02XwV2h1w4T1zhJCDao= X-Received: by 2002:a17:902:db0a:b0:17c:7aaa:c67d with SMTP id m10-20020a170902db0a00b0017c7aaac67dmr964556plx.171.1668462585914; Mon, 14 Nov 2022 13:49:45 -0800 (PST) MIME-Version: 1.0 References: <20221113230521.712693-1-christoph.muellner@vrull.eu> <20221113230521.712693-8-christoph.muellner@vrull.eu> <7b26ef79-50c7-b4df-f0a7-e2fd40767d8d@gmail.com> In-Reply-To: <7b26ef79-50c7-b4df-f0a7-e2fd40767d8d@gmail.com> From: =?UTF-8?Q?Christoph_M=C3=BCllner?= Date: Mon, 14 Nov 2022 22:49:29 +0100 Message-ID: Subject: Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion To: Jeff Law Cc: gcc-patches@gcc.gnu.org, Kito Cheng , Jim Wilson , Palmer Dabbelt , Andrew Waterman , Philipp Tomsich , Vineet Gupta Content-Type: multipart/alternative; boundary="000000000000c13c8e05ed753707" X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,HTML_MESSAGE,JMQ_SPF_NEUTRAL,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --000000000000c13c8e05ed753707 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, Nov 14, 2022 at 8:28 PM Jeff Law wrote: > > On 11/13/22 16:05, Christoph Muellner wrote: > > From: Christoph M=C3=BCllner > > > > This patch implements expansions for the cmpstrsi and the cmpstrnsi > > builtins using Zbb instructions (if available). > > This allows to inline calls to strcmp() and strncmp(). > > > > The expansion basically emits a peeled comparison sequence (i.e. a peel= ed > > comparison loop) which compares XLEN bits per step if possible. > > > > The emitted sequence can be controlled, by setting the maximum number > > of compared bytes (-mstring-compare-inline-limit). > > > > gcc/ChangeLog: > > > > * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New > > prototype. > > * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper > > macros. > > (GEN_EMIT_HELPER2): New helper macros. > > (expand_strncmp_zbb_sequence): New function. > > (riscv_emit_str_compare_zbb): New function. > > (riscv_expand_strn_compare): New function. > > * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions > > for strn_compare. > > (cmpstrsi): Invoke expansion functions for strn_compare. > > * config/riscv/riscv.opt: Add new parameter > > '-mstring-compare-inline-limit'. > > Presumably the hybrid inline + out of line approach is to capture the > fact that most strings compare unequal early, then punt out to the > library if they don't follow that model? It looks like we're structured > for that case by peeling iterations rather than having a fully inlined > approach. Just want to confirm... > Yes, this was inspired by gcc/config/rs6000/rs6000-string.cc (e.g. expand_strncmp_gpr_sequence). The current implementation emits an unrolled loop to process up to N characters. For longer strings, we do a handover to libc to process the remainder there. The hand-over implies a call overhead and, of course, a well-optimized str(n)cmp implementation would be beneficial (once we have the information in user space for ifuncs). We can take this further, but then the following questions pop up: * how much data processing per loop iteration? * what about unaligned strings? Happy to get suggestions/opinions for improvement. > I was a bit worried about the "readahead" problem that arises when > reading more than a byte and a NUL is found in the first string. If > you're not careful, the readahead of the second string could fault. But > it looks like we avoid that by requiring word alignment on both strings. > Yes, aligned strings are not affected by the readahead. I wonder if we should add dynamic tests in case the compiler cannot derive XLEN-alignment so we capture more cases (e.g. character-arrays have guaranteed alignment 1, but are allocated with a higher actual alignment on the stack). > > + > > +/* Emit a string comparison sequence using Zbb instruction. > > + > > + OPERANDS[0] is the target (result). > > + OPERANDS[1] is the first source. > > + OPERANDS[2] is the second source. > > + If NO_LENGTH is zero, then: > > + OPERANDS[3] is the length. > > + OPERANDS[4] is the alignment in bytes. > > + If NO_LENGTH is nonzero, then: > > + OPERANDS[3] is the alignment in bytes. > > Ugh. I guess it's inevitable unless we want to drop the array and pass > each element individually (in which case we'd pass a NULL_RTX in the > case we don't have a length argument). > I will split the array into individual rtx arguments as suggested. > I'd like to give others a chance to chime in here. Everything looks > sensible here, but I may have missed something. So give the other > maintainers a couple days to chime in before committing. > > > Jeff > > --000000000000c13c8e05ed753707--