From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot1-x334.google.com (mail-ot1-x334.google.com [IPv6:2607:f8b0:4864:20::334]) by sourceware.org (Postfix) with ESMTPS id A2DAD3858D39 for ; Thu, 9 May 2024 14:50:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A2DAD3858D39 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org A2DAD3858D39 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::334 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1715266212; cv=none; b=dgHHRe4nxleqeflkme7czZUqGcnKsmIXRcSJGgg/K+NycYAxJYP5QA5nYfypshtSZp9F0Hr3eHFu/8THmcZ8unXEp/lelL4/DZwPIcwkNr1t9ttM773Rn46EEXh1P+E1UMIxZjLfQTM+3D33Ogk60EPZfREm7tdkh65kYZ0GSXA= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1715266212; c=relaxed/simple; bh=WO2or2Nrk2fpwUzLSyUpU8zNUHl/URMpgOpirYEu/c4=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=GEBlnNSyBU5pw+1gJKsL4ApA8m7heTccc2to4nSFHUjZEIWFQKYUq+6uMHw4Ajc+r63lQ7fGBrEAh9d2FEcNcQVD4/4FSuR8AMH785UBkQS9CbU9pZwyDkKmYZQJHqKqP0UJBHytO9y+HjkTroafVb5novITKEBU1MdrSiceP3Y= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-ot1-x334.google.com with SMTP id 46e09a7af769-6f06b81676aso523128a34.3 for ; Thu, 09 May 2024 07:50:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1715266210; x=1715871010; darn=gcc.gnu.org; h=content-transfer-encoding:in-reply-to:from:references:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=xW5iA1JRdQ+1I0u6y9ei/fbYzPA6WwqbzHqrMzAnyM0=; b=JGOWsyoxJmsUwGBtMs0CmFdTEO3jDe70i8e7B3azPyDjjTao8dq6fyEwq613tpf+YF VSC6XmSEecgwotmG7CKBKHjPVJpaC8P/ke6im1uqfE2/G+aEgZAlXazB0KuTCadrnjPF VGpL0THuwi3Pth2ooLV/4fYifv0QLBel7aQNL8QZIjw6onKRnc6Hicv0LvJ5xRhCHyoQ QnXiEKN6bwOkBxzmxJH4ju+boSmx4kAiMiDjZ2vhY93WKTkvaW2VVGIh7/lfkPG2AUCs GYmSS+rRBh7jPMAwdxZXd2PZekQrs2LQGE91hhDlVjZx2MXtfPx7gVFeNcDT93PotOK2 WleQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1715266210; x=1715871010; h=content-transfer-encoding:in-reply-to:from:references:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xW5iA1JRdQ+1I0u6y9ei/fbYzPA6WwqbzHqrMzAnyM0=; b=vv8IXh7xUn4z90g0hiXopgm+kifMzqD27OPY2XsH+xMSfWX1G2IrhLfAWeuKfz1PQG yxkOEsybkZKcOcrShnx3CrqAmIILulpLnC9Raa2+iWcc4WxqofZ3PvzOjWS2O9jnA2b9 b34HiEo41l2vDU7I23h9LrT72mRbM/Id4ha9V1q53iwIgmydyhZyfh2GkhaQY0RSpl1R SO+Hn51Vz5g+d8Hz0uPaYWJ6GzoMYKOkdsw4Gb618EOoaBDY202sTR7pXiJPiaE3sfZM ea+2qYvfvJ2smFYUdep0nd/dEVEhAZivpillgR2n3DVIYH2rDFbXkwI/3pN96wbMrQUd BSPg== X-Forwarded-Encrypted: i=1; AJvYcCXahOoIDphQvtqcw/k5MLu5pqgHk6aivRR22+bPUhhxVd3sRVCVNJE3sOC1wFxX7QeoLMtImi3P/N59dS0idE8+xFFIZdvJnw== X-Gm-Message-State: AOJu0YzyMTD+cjueptfdFCricMUIcd3XXmtrTDngXtTO8T7rPqiA9+Et yZC4swuvzSw8zgE3zgGWi8WaSDbjutxBQqHTQ8cdKfvzPgCeSYIQ X-Google-Smtp-Source: AGHT+IE1MCN3EEXnxYEPnASclRxczztC3U9m4Lphg+RGDZHXu3uxKn5v31Zu2XBsFRPYXK28E5+2fA== X-Received: by 2002:a05:6830:1e43:b0:6f0:5204:6470 with SMTP id 46e09a7af769-6f0b794140emr5827286a34.15.1715266207642; Thu, 09 May 2024 07:50:07 -0700 (PDT) Received: from [172.31.0.109] ([136.36.72.243]) by smtp.gmail.com with ESMTPSA id 46e09a7af769-6f0e010f3adsm246321a34.0.2024.05.09.07.50.06 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 09 May 2024 07:50:07 -0700 (PDT) Message-ID: <704bfd3e-6273-463e-afbf-76bc81be4f45@gmail.com> Date: Thu, 9 May 2024 08:50:05 -0600 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Beta Subject: Re: [PATCH 2/2] RISC-V: Add cmpmemsi expansion Content-Language: en-US To: =?UTF-8?Q?Christoph_M=C3=BCllner?= , gcc-patches@gcc.gnu.org, Kito Cheng , Jim Wilson , Palmer Dabbelt , Andrew Waterman , Philipp Tomsich , Vineet Gupta References: <20240508055217.2038902-1-christoph.muellner@vrull.eu> <20240508055217.2038902-2-christoph.muellner@vrull.eu> From: Jeff Law In-Reply-To: <20240508055217.2038902-2-christoph.muellner@vrull.eu> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-8.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_MANYTO,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 5/7/24 11:52 PM, Christoph Müllner wrote: > GCC has a generic cmpmemsi expansion via the by-pieces framework, > which shows some room for target-specific optimizations. > E.g. for comparing two aligned memory blocks of 15 bytes > we get the following sequence: > > my_mem_cmp_aligned_15: > li a4,0 > j .L2 > .L8: > bgeu a4,a7,.L7 > .L2: > add a2,a0,a4 > add a3,a1,a4 > lbu a5,0(a2) > lbu a6,0(a3) > addi a4,a4,1 > li a7,15 // missed hoisting > subw a5,a5,a6 > andi a5,a5,0xff // useless > beq a5,zero,.L8 > lbu a0,0(a2) // loading again! > lbu a5,0(a3) // loading again! > subw a0,a0,a5 > ret > .L7: > li a0,0 > ret > > Diff first byte: 15 insns > Diff second byte: 25 insns > No diff: 25 insns > > Possible improvements: > * unroll the loop and use load-with-displacement to avoid offset increments > * load and compare multiple (aligned) bytes at once > * Use the bitmanip/strcmp result calculation (reverse words and > synthesize (a2 >= a3) ? 1 : -1 in a branchless sequence) > > When applying these improvements we get the following sequence: > > my_mem_cmp_aligned_15: > ld a5,0(a0) > ld a4,0(a1) > bne a5,a4,.L2 > ld a5,8(a0) > ld a4,8(a1) > slli a5,a5,8 > slli a4,a4,8 > bne a5,a4,.L2 > li a0,0 > .L3: > sext.w a0,a0 > ret > .L2: > rev8 a5,a5 > rev8 a4,a4 > sltu a5,a5,a4 > neg a5,a5 > ori a0,a5,1 > j .L3 > > Diff first byte: 11 insns > Diff second byte: 16 insns > No diff: 11 insns > > This patch implements this improvements. > > The tests consist of a execution test (similar to > gcc/testsuite/gcc.dg/torture/inline-mem-cmp-1.c) and a few tests > that test the expansion conditions (known length and alignment). > > Similar to the cpymemsi expansion this patch does not introduce any > gating for the cmpmemsi expansion (on top of requiring the known length, > alignment and Zbb). > > Bootstrapped and SPEC CPU 2017 tested. > > gcc/ChangeLog: > > * config/riscv/riscv-protos.h (riscv_expand_block_compare): New > prototype. > * config/riscv/riscv-string.cc (GEN_EMIT_HELPER2): New helper. > (do_load_from_addr): Add support for HI and SI/64 modes. > (emit_memcmp_scalar_load_and_compare): New helper to emit memcmp. > (emit_memcmp_scalar_result_calculation): Likewise. > (riscv_expand_block_compare_scalar): Likewise. > (riscv_expand_block_compare): New RISC-V expander for memory compare. > * config/riscv/riscv.md (cmpmemsi): New cmpmem expansion. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/cmpmemsi-1.c: New test. > * gcc.target/riscv/cmpmemsi-2.c: New test. > * gcc.target/riscv/cmpmemsi-3.c: New test. > * gcc.target/riscv/cmpmemsi.c: New test. > > Signed-off-by: Christoph Müllner > --- > gcc/config/riscv/riscv-protos.h | 1 + > gcc/config/riscv/riscv-string.cc | 161 ++++++++++++++++++++ > gcc/config/riscv/riscv.md | 15 ++ > gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c | 6 + > gcc/testsuite/gcc.target/riscv/cmpmemsi-2.c | 42 +++++ > gcc/testsuite/gcc.target/riscv/cmpmemsi-3.c | 43 ++++++ > gcc/testsuite/gcc.target/riscv/cmpmemsi.c | 22 +++ > 7 files changed, 290 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c > create mode 100644 gcc/testsuite/gcc.target/riscv/cmpmemsi-2.c > create mode 100644 gcc/testsuite/gcc.target/riscv/cmpmemsi-3.c > create mode 100644 gcc/testsuite/gcc.target/riscv/cmpmemsi.c > > diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h > index e5aebf3fc3d..30ffe30be1d 100644 > --- a/gcc/config/riscv/riscv-protos.h > +++ b/gcc/config/riscv/riscv-protos.h > @@ -188,6 +188,7 @@ rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt); > rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt); > > /* Routines implemented in riscv-string.c. */ > +extern bool riscv_expand_block_compare (rtx, rtx, rtx, rtx); > extern bool riscv_expand_block_move (rtx, rtx, rtx); > > /* Information about one CPU we know about. */ > diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc > index b09b51d7526..9d4dc0cb827 100644 > --- a/gcc/config/riscv/riscv-string.cc > +++ b/gcc/config/riscv/riscv-string.cc > @@ -86,6 +86,7 @@ GEN_EMIT_HELPER2(th_rev) /* do_th_rev2 */ > GEN_EMIT_HELPER2(th_tstnbz) /* do_th_tstnbz2 */ > GEN_EMIT_HELPER3(xor) /* do_xor3 */ > GEN_EMIT_HELPER2(zero_extendqi) /* do_zero_extendqi2 */ > +GEN_EMIT_HELPER2(zero_extendhi) /* do_zero_extendhi2 */ > > #undef GEN_EMIT_HELPER2 > #undef GEN_EMIT_HELPER3 > @@ -109,6 +110,10 @@ do_load_from_addr (machine_mode mode, rtx dest, rtx addr_reg, rtx addr) > > if (mode == QImode) > do_zero_extendqi2 (dest, mem); > + else if (mode == HImode) > + do_zero_extendhi2 (dest, mem); > + else if (mode == SImode && TARGET_64BIT) > + emit_insn (gen_zero_extendsidi2 (dest, mem)); > else if (mode == Xmode) > emit_move_insn (dest, mem); > else > @@ -610,6 +615,162 @@ riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align) > return false; > } > > +static void > +emit_memcmp_scalar_load_and_compare (rtx result, rtx src1, rtx src2, > + unsigned HOST_WIDE_INT nbytes, > + rtx data1, rtx data2, > + rtx diff_label, rtx final_label) Needs a function comment. > +{ > + const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode); > + rtx src1_addr = force_reg (Pmode, XEXP (src1, 0)); > + rtx src2_addr = force_reg (Pmode, XEXP (src2, 0)); > + unsigned HOST_WIDE_INT offset = 0; > + > + while (nbytes > 0) > + { > + unsigned HOST_WIDE_INT cmp_bytes = xlen < nbytes ? xlen : nbytes; > + machine_mode load_mode; > + > + /* Special cases to avoid masking of trailing bytes. */ > + if (cmp_bytes == 1) > + load_mode = QImode; > + else if (cmp_bytes == 2) > + load_mode = HImode; > + else if (cmp_bytes == 4) > + load_mode = SImode; > + else > + load_mode = Xmode; > + > + rtx addr1 = gen_rtx_PLUS (Pmode, src1_addr, GEN_INT (offset)); > + do_load_from_addr (load_mode, data1, addr1, src1); > + rtx addr2 = gen_rtx_PLUS (Pmode, src2_addr, GEN_INT (offset)); > + do_load_from_addr (load_mode, data2, addr2, src2); So we're assuming that something simplifies x + 0 -> x for the first iteration of this loop. I'm not sure that's going to be true for -O0. Can you double check that if this expansion code is used with -O0 that it gets cleaned up? adjust_address or something else might be able to do this for you automagically. > + > +static void > +emit_memcmp_scalar_result_calculation (rtx result, rtx data1, rtx data2) Needs function comment. > + > +static bool > +riscv_expand_block_compare_scalar (rtx result, rtx src1, rtx src2, rtx nbytes) Function comment. > + > + /* We need xlen-aligned memory. */ > + unsigned HOST_WIDE_INT align = MIN (MEM_ALIGN (src1), MEM_ALIGN (src2)); > + if (align < (xlen * BITS_PER_UNIT)) > + return false; Not needed for this submission, but this probably could be relaxed based on uarch tuning. You might also ponder if we can do the result calculation in WORD_MODE. One of the things in my queue to submit is adjustment to the inline strcmp code to do that. When done carefully we can squash out a sign extension. Again, not strictly necessary, but perhaps a followup if you're interested. So generally OK. The only technical concern is generation of x+0 as an address. Depending on what you find there take appropriate action. If nothing is needed in that space, then just fixup the missing comments and this is fine for the trunk. Jeff