Message-ID: <03c55d9f-1c36-22e0-ea19-f60fa2cf4263@gmail.com>
Date: Thu, 30 Mar 2023 23:06:27 -0600
Subject: Re: [RFC PATCH 16/19] riscv: Add accelerated strcmp routines
To: Christoph Müllner, Xi Ruoyao
Cc: libc-alpha@sourceware.org, Palmer Dabbelt, Darius Rad, Andrew Waterman, DJ Delorie, Vineet Gupta, Kito Cheng, Philipp Tomsich, Heiko Stuebner, Adhemerval Zanella
References: <20230207001618.458947-1-christoph.muellner@vrull.eu> <20230207001618.458947-17-christoph.muellner@vrull.eu>
From: Jeff Law

On 2/7/23 07:15, Christoph Müllner wrote:

> > diff --git a/sysdeps/riscv/multiarch/strcmp_zbb_unaligned.S
> > b/sysdeps/riscv/multiarch/strcmp_zbb_unaligned.S

[ ... ]

> > +
> > +ENTRY_ALIGN (STRCMP, 6)
> > +       /* off...delta from src1 to src2.  */
> > +       sub     off, src2, src1
> > +       li      m1, -1
> > +       andi    tmp, off, SZREG-1
> > +       andi    align1, src1, SZREG-1
> > +       bnez    tmp, L(misaligned8)
> > +       bnez    align1, L(mutual_align)
> > +
> > +       .p2align 4
> > +L(loop_aligned):
> > +       REG_L   data1, 0(src1)
> > +       add     tmp, src1, off
> > +       addi    src1, src1, SZREG
> > +       REG_L   data2, 0(tmp)

So any thoughts on reducing the alignment?  Based on the data I've seen
we very rarely take the branch to L(loop_aligned).  So aligning this
particular label is of dubious value to begin with.

As it stands we have to emit three full-sized nops to achieve the
requested alignment, and they can burn most of an issue cycle.  While
it's highly dependent on pipeline state, there's a reasonable chance of
shaving a cycle by reducing the alignment to .p2align 3.

I haven't done much analysis of the rest of the code as it just hasn't
been hot in any of the cases I've looked at.

I'd be comfortable with this going in as-is or with the alignment
adjustment.  Obviously wiring it up via ifunc is dependent upon settling
the kernel->glibc interface.

Jeff
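
For anyone wanting to sanity-check the padding arithmetic behind the
alignment comment, here is a rough sketch in C.  The helper names
pad_bytes and full_nops are made up for illustration, and the 20-byte
offset used in the note below is an assumption picked to be consistent
with the three-nop figure, not a measurement of the assembled routine.

```c
#include <assert.h>

/* Hypothetical helper: bytes of padding needed to advance `offset` to
   the next multiple of 2^p2align, which is how the argument of the
   assembler's .p2align directive is interpreted.  */
static unsigned pad_bytes(unsigned offset, unsigned p2align)
{
    assert(p2align < 32);               /* keep the shift well defined */
    unsigned align = 1u << p2align;
    return (align - (offset & (align - 1))) & (align - 1);
}

/* Hypothetical helper: number of full-sized (4-byte) RISC-V nops that
   fit in `bytes` of padding.  A 2-byte remainder would be covered by a
   compressed c.nop when the C extension is in use.  */
static unsigned full_nops(unsigned bytes)
{
    return bytes / 4;
}
```

Assuming the label would otherwise sit 20 bytes past the 64-byte-aligned
entry point, pad_bytes (20, 4) gives 12 bytes, i.e. three full-sized
nops, while pad_bytes (20, 3) gives 4 bytes, i.e. a single nop.  The real
offset depends on which of the preceding instructions get compressed, so
treat these numbers as illustrative only.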