From: Christoph Müllner <christoph.muellner@vrull.eu>
To: gcc-patches@gcc.gnu.org, Kito Cheng, Jim Wilson, Palmer Dabbelt,
    Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
Cc: Christoph Müllner <christoph.muellner@vrull.eu>
Subject: [PATCH 2/4] RISC-V: Allow unaligned accesses in cpymemsi expansion
Date: Wed, 8 May 2024 07:17:54 +0200
Message-ID: <20240508051756.3999080-3-christoph.muellner@vrull.eu>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240508051756.3999080-1-christoph.muellner@vrull.eu>
References: <20240508051756.3999080-1-christoph.muellner@vrull.eu>

The RISC-V cpymemsi expansion is called whenever the by-pieces
infrastructure does not take care of the builtin expansion.
The code emitted by the by-pieces infrastructure may include
unaligned accesses if riscv_slow_unaligned_access_p is false.

The RISC-V cpymemsi expansion is handled via riscv_expand_block_move().
The current implementation of this function does not check
riscv_slow_unaligned_access_p and never emits unaligned accesses.

Since by-pieces emits unaligned accesses, it is reasonable to implement
the same behaviour in the cpymemsi expansion, and that is what this
patch does: it checks riscv_slow_unaligned_access_p on entry and sets
the allowed alignment accordingly.
This alignment is then propagated down to the routines that emit the
actual instructions.

The changes introduced by this patch can be seen in the adjustments
of the cpymem tests.

gcc/ChangeLog:

	* config/riscv/riscv-string.cc (riscv_block_move_straight): Add
	parameter align.
	(riscv_adjust_block_mem): Replace parameter length by align.
	(riscv_block_move_loop): Add parameter align.
	(riscv_expand_block_move_scalar): Set alignment properly if the
	target has fast unaligned access.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/cpymem-32-ooo.c: Adjust for unaligned access.
	* gcc.target/riscv/cpymem-64-ooo.c: Likewise.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
---
 gcc/config/riscv/riscv-string.cc              | 53 +++++++++++--------
 .../gcc.target/riscv/cpymem-32-ooo.c          | 20 +++++--
 .../gcc.target/riscv/cpymem-64-ooo.c          | 14 ++++-
 3 files changed, 59 insertions(+), 28 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index b09b51d7526..8fc0877772f 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -610,11 +610,13 @@ riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
   return false;
 }
 
-/* Emit straight-line code to move LENGTH bytes from SRC to DEST.
+/* Emit straight-line code to move LENGTH bytes from SRC to DEST
+   with accesses that are ALIGN bytes aligned.
    Assume that the areas do not overlap.  */
 
 static void
-riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
+riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
+			   unsigned HOST_WIDE_INT align)
 {
   unsigned HOST_WIDE_INT offset, delta;
   unsigned HOST_WIDE_INT bits;
@@ -622,8 +624,7 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
   enum machine_mode mode;
   rtx *regs;
 
-  bits = MAX (BITS_PER_UNIT,
-	      MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest))));
+  bits = MAX (BITS_PER_UNIT, MIN (BITS_PER_WORD, align));
 
   mode = mode_for_size (bits, MODE_INT, 0).require ();
   delta = bits / BITS_PER_UNIT;
@@ -648,21 +649,20 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
     {
       src = adjust_address (src, BLKmode, offset);
       dest = adjust_address (dest, BLKmode, offset);
-      move_by_pieces (dest, src, length - offset,
-		      MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), RETURN_BEGIN);
+      move_by_pieces (dest, src, length - offset, align, RETURN_BEGIN);
     }
 }
 
 /* Helper function for doing a loop-based block operation on memory
-   reference MEM.  Each iteration of the loop will operate on LENGTH
-   bytes of MEM.
+   reference MEM.
 
    Create a new base register for use within the loop and point it to
    the start of MEM.  Create a new memory reference that uses this
-   register.  Store them in *LOOP_REG and *LOOP_MEM respectively.  */
+   register and has an alignment of ALIGN.  Store them in *LOOP_REG
+   and *LOOP_MEM respectively.  */
 
 static void
-riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length,
+riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT align,
			rtx *loop_reg, rtx *loop_mem)
 {
   *loop_reg = copy_addr_to_reg (XEXP (mem, 0));
@@ -670,15 +670,17 @@ riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length,
   /* Although the new mem does not refer to a known location,
      it does keep up to LENGTH bytes of alignment.  */
   *loop_mem = change_address (mem, BLKmode, *loop_reg);
-  set_mem_align (*loop_mem, MIN (MEM_ALIGN (mem), length * BITS_PER_UNIT));
+  set_mem_align (*loop_mem, align);
 }
 
 /* Move LENGTH bytes from SRC to DEST using a loop that moves BYTES_PER_ITER
-   bytes at a time.  LENGTH must be at least BYTES_PER_ITER.  Assume that
-   the memory regions do not overlap.  */
+   bytes at a time.  LENGTH must be at least BYTES_PER_ITER.  The alignment
+   of the access can be set by ALIGN.  Assume that the memory regions do not
+   overlap.  */
 
 static void
 riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
+		       unsigned HOST_WIDE_INT align,
		       unsigned HOST_WIDE_INT bytes_per_iter)
 {
   rtx label, src_reg, dest_reg, final_src, test;
@@ -688,8 +690,8 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
   length -= leftover;
 
   /* Create registers and memory references for use within the loop.  */
-  riscv_adjust_block_mem (src, bytes_per_iter, &src_reg, &src);
-  riscv_adjust_block_mem (dest, bytes_per_iter, &dest_reg, &dest);
+  riscv_adjust_block_mem (src, align, &src_reg, &src);
+  riscv_adjust_block_mem (dest, align, &dest_reg, &dest);
 
   /* Calculate the value that SRC_REG should have after the last iteration
      of the loop.  */
@@ -701,7 +703,7 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
   emit_label (label);
 
   /* Emit the loop body.  */
-  riscv_block_move_straight (dest, src, bytes_per_iter);
+  riscv_block_move_straight (dest, src, bytes_per_iter, align);
 
   /* Move on to the next block.  */
   riscv_emit_move (src_reg, plus_constant (Pmode, src_reg, bytes_per_iter));
@@ -713,7 +715,7 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
 
   /* Mop up any left-over bytes.  */
   if (leftover)
-    riscv_block_move_straight (dest, src, leftover);
+    riscv_block_move_straight (dest, src, leftover, align);
   else
     emit_insn(gen_nop ());
 }
@@ -730,8 +732,16 @@ riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
   unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
   unsigned HOST_WIDE_INT factor, align;
 
-  align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
-  factor = BITS_PER_WORD / align;
+  if (riscv_slow_unaligned_access_p)
+    {
+      align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
+      factor = BITS_PER_WORD / align;
+    }
+  else
+    {
+      align = hwi_length * BITS_PER_UNIT;
+      factor = 1;
+    }
 
   if (optimize_function_for_size_p (cfun)
       && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
@@ -739,7 +749,7 @@
 
   if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
     {
-      riscv_block_move_straight (dest, src, INTVAL (length));
+      riscv_block_move_straight (dest, src, hwi_length, align);
       return true;
     }
   else if (optimize && align >= BITS_PER_WORD)
@@ -759,7 +769,8 @@ riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
	  iter_words = i;
	}
 
-      riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
+      riscv_block_move_loop (dest, src, bytes, align,
+			     iter_words * UNITS_PER_WORD);
       return true;
     }
 
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
index 33fb9891d82..946a773f77a 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
@@ -64,12 +64,12 @@ COPY_ALIGNED_N(8)
 /*
 **copy_11:
 **	...
-**	lbu\t[at][0-9],0\([at][0-9]\)
 **	...
-**	lbu\t[at][0-9],10\([at][0-9]\)
+**	lw\t[at][0-9],0\([at][0-9]\)
 **	...
-**	sb\t[at][0-9],0\([at][0-9]\)
+**	sw\t[at][0-9],0\([at][0-9]\)
 **	...
+**	lbu\t[at][0-9],10\([at][0-9]\)
 **	sb\t[at][0-9],10\([at][0-9]\)
 **	...
 */
@@ -91,7 +91,12 @@ COPY_ALIGNED_N(11)
 /*
 **copy_15:
 **	...
-**	(call|tail)\tmemcpy
+**	lw\t[at][0-9],0\([at][0-9]\)
+**	...
+**	sw\t[at][0-9],0\([at][0-9]\)
+**	...
+**	lbu\t[at][0-9],14\([at][0-9]\)
+**	sb\t[at][0-9],14\([at][0-9]\)
 **	...
 */
 COPY_N(15)
@@ -112,7 +117,12 @@ COPY_ALIGNED_N(15)
 /*
 **copy_27:
 **	...
-**	(call|tail)\tmemcpy
+**	lw\t[at][0-9],20\([at][0-9]\)
+**	...
+**	sw\t[at][0-9],20\([at][0-9]\)
+**	...
+**	lbu\t[at][0-9],26\([at][0-9]\)
+**	sb\t[at][0-9],26\([at][0-9]\)
 **	...
 */
 COPY_N(27)
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
index 8e40e52fa91..08a927b9483 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
@@ -89,7 +89,12 @@ COPY_ALIGNED_N(11)
 /*
 **copy_15:
 **	...
-**	(call|tail)\tmemcpy
+**	ld\t[at][0-9],0\([at][0-9]\)
+**	...
+**	sd\t[at][0-9],0\([at][0-9]\)
+**	...
+**	lbu\t[at][0-9],14\([at][0-9]\)
+**	sb\t[at][0-9],14\([at][0-9]\)
 **	...
 */
 COPY_N(15)
@@ -110,7 +115,12 @@ COPY_ALIGNED_N(15)
 /*
 **copy_27:
 **	...
-**	(call|tail)\tmemcpy
+**	ld\t[at][0-9],16\([at][0-9]\)
+**	...
+**	sd\t[at][0-9],16\([at][0-9]\)
+**	...
+**	lbu\t[at][0-9],26\([at][0-9]\)
+**	sb\t[at][0-9],26\([at][0-9]\)
 **	...
 */
 COPY_N(27)
-- 
2.44.0