Message-ID: <9da3cb08-7014-4c0c-8e6a-c148722de991@gmail.com>
Date: Wed, 15 May 2024 15:16:30 -0600
From: Jeff Law
To: Christoph Müllner, gcc-patches@gcc.gnu.org, Kito Cheng, Jim Wilson, Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Vineet Gupta
Subject: Re: [PATCH v2 1/2] RISC-V: Add cmpmemsi expansion
In-Reply-To: <20240515064926.2441787-1-christoph.muellner@vrull.eu>
References: <20240515064926.2441787-1-christoph.muellner@vrull.eu>

On 5/15/24 12:49 AM, Christoph Müllner wrote:
> GCC has a generic cmpmemsi expansion via the by-pieces framework,
> which leaves some room for target-specific optimizations.
> E.g., for comparing two aligned memory blocks of 15 bytes,
> we get the following sequence:
>
> my_mem_cmp_aligned_15:
>         li      a4,0
>         j       .L2
> .L8:
>         bgeu    a4,a7,.L7
> .L2:
>         add     a2,a0,a4
>         add     a3,a1,a4
>         lbu     a5,0(a2)
>         lbu     a6,0(a3)
>         addi    a4,a4,1
>         li      a7,15           // missed hoisting
>         subw    a5,a5,a6
>         andi    a5,a5,0xff      // useless
>         beq     a5,zero,.L8
>         lbu     a0,0(a2)        // loading again!
>         lbu     a5,0(a3)        // loading again!
>         subw    a0,a0,a5
>         ret
> .L7:
>         li      a0,0
>         ret
>
> Diff first byte: 15 insns
> Diff second byte: 25 insns
> No diff: 25 insns
>
> Possible improvements:
> * unroll the loop and use load-with-displacement to avoid offset increments
> * load and compare multiple (aligned) bytes at once
> * use the bitmanip/strcmp result calculation (reverse words and
>   synthesize (a2 >= a3) ? 1 : -1 in a branchless sequence)
>
> When applying these improvements we get the following sequence:
>
> my_mem_cmp_aligned_15:
>         ld      a5,0(a0)
>         ld      a4,0(a1)
>         bne     a5,a4,.L2
>         ld      a5,8(a0)
>         ld      a4,8(a1)
>         slli    a5,a5,8
>         slli    a4,a4,8
>         bne     a5,a4,.L2
>         li      a0,0
> .L3:
>         sext.w  a0,a0
>         ret
> .L2:
>         rev8    a5,a5
>         rev8    a4,a4
>         sltu    a5,a5,a4
>         neg     a5,a5
>         ori     a0,a5,1
>         j       .L3
>
> Diff first byte: 11 insns
> Diff second byte: 16 insns
> No diff: 11 insns
>
> This patch implements these improvements.
>
> The tests consist of an execution test (similar to
> gcc/testsuite/gcc.dg/torture/inline-mem-cmp-1.c) and a few tests
> that check the expansion conditions (known length and alignment).
>
> Similar to the cpymemsi expansion, this patch does not introduce any
> gating for the cmpmemsi expansion (beyond requiring a known length,
> alignment and Zbb).
>
> Bootstrapped and SPEC CPU 2017 tested.
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv-protos.h (riscv_expand_block_compare): New
>         prototype.
>         * config/riscv/riscv-string.cc (GEN_EMIT_HELPER2): New helper
>         for zero_extendhi.
>         (do_load_from_addr): Add support for HI and SI/64 modes.
>         (do_load): Add helper for zero-extended loads.
>         (emit_memcmp_scalar_load_and_compare): New helper to emit memcmp.
>         (emit_memcmp_scalar_result_calculation): Likewise.
>         (riscv_expand_block_compare_scalar): Likewise.
>         (riscv_expand_block_compare): New RISC-V expander for memory compare.
>         * config/riscv/riscv.md (cmpmemsi): New cmpmem expansion.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/cmpmemsi-1.c: New test.
>         * gcc.target/riscv/cmpmemsi-2.c: New test.
>         * gcc.target/riscv/cmpmemsi-3.c: New test.
>         * gcc.target/riscv/cmpmemsi.c: New test.
[ ... ]

I fixed some of the nits from the linter (whitespace stuff) and pushed
both patches of this series.

Jeff
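For readers following the expansion strategy in the cover letter (compare a word at a time, then on a mismatch byte-reverse both words with rev8 and turn an unsigned compare into -1/1 with sltu/neg/ori), here is a rough C model of the idea. It is only a sketch under stated assumptions, not the code in riscv-string.cc: the names memcmp_words, load_le64 and rev8 are hypothetical, the words are assembled byte by byte in little-endian order (as RV64 ld would load them) so the model behaves the same on any host, and a short tail is handled by zero-filling the unused high bytes of both words, which serves the same purpose as the slli-by-8 masking in the 15-byte sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: load up to 8 bytes as a little-endian word,
   like RV64 "ld". Bytes at positions >= n stay zero in both operands,
   so they can never make the two words differ. */
static uint64_t load_le64(const unsigned char *p, size_t n)
{
    uint64_t w = 0;
    for (size_t i = 0; i < n && i < 8; i++)
        w |= (uint64_t)p[i] << (8 * i);
    return w;
}

/* Hypothetical portable stand-in for the Zbb rev8 instruction:
   reverse the byte order of a 64-bit word. */
static uint64_t rev8(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xff); /* move the current low byte into r */
        x >>= 8;
    }
    return r;
}

/* Hypothetical word-at-a-time memcmp returning -1, 0 or 1. */
int memcmp_words(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;

    for (size_t off = 0; off < n; off += 8) {
        size_t chunk = (n - off < 8) ? n - off : 8;
        uint64_t x = load_le64(a + off, chunk);
        uint64_t y = load_le64(b + off, chunk);

        if (x != y) {
            /* After byte reversal the lowest-addressed byte is the most
               significant one, so an unsigned word compare orders the
               blocks exactly as a byte-wise memcmp would. */
            x = rev8(x);
            y = rev8(y);
            /* Branchless result, mirroring sltu; neg; ori 1:
               (x < y) ? -1 : 1. */
            return -(int)(x < y) | 1;
        }
    }
    return 0; /* all words equal */
}
```

The C standard only guarantees the sign of memcmp's result, so returning exactly -1 or 1 (as the ori a0,a5,1 idiom does) is a valid memcmp result. A quick sanity check is to compare memcmp_words(p, q, 15) with the sign of memcmp(p, q, 15) for blocks that differ in the first word, differ only in the tail, or do not differ at all.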