From: Alexandre Oliva
To: gcc-patches@gcc.gnu.org, "H.J. Lu"
Lu" Cc: Jan Hubicka , Uros Bizjak Subject: [PATCH] [x86] reenable dword MOVE_MAX for better memmove inlining Organization: Free thinker, does not speak for AdaCore Errors-To: aoliva@lxoliva.fsfla.org Date: Wed, 24 May 2023 02:47:07 -0300 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Scanned-By: MIMEDefang 2.84 X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: MOVE_MAX on x86* used to accept up to 16 bytes, even without SSE, which enabled inlining of small memmove by loading and then storing the entire range. After the "x86: Update piecewise move and store" r12-2666 change, memmove of more than 4 bytes would not be inlined in gimple_fold_bultin_memory_op, failing the expectations of a few tests. I can see how lowering it for MOVE_MAX_PIECES can get us better codegen decisions overall, but surely inlining memmove with 2 32-bit loads and stores is better than an outline call that requires setting up 3 arguments. I suppose even 3 or 4 could do better. But maybe it is gimple_fold_builtin_memory_op that needs tweaking? Anyhow, this patch raises MOVE_MAX back a little for non-SSE targets, while preserving the new value for MOVE_MAX_PIECES. Bootstrapped on x86_64-linux-gnu. Also tested on ppc- and x86-vx7r2 with gcc-12. for gcc/ChangeLog * config/i386/i386.h (MOVE_MAX): Rename to... (MOVE_MAX_VEC): ... this. Add NONVEC parameter, and use it as the last resort, instead of UNITS_PER_WORD. (MOVE_MAX): Reintroduce in terms of MOVE_MAX_VEC, with 2*UNITS_PER_WORD. (MOVE_MAX_PIECES): Likewise, but with UNITS_PER_WORD. --- gcc/config/i386/i386.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index c7439f89bdf92..5293a332a969a 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -1801,7 +1801,9 @@ typedef struct ix86_args { is the number of bytes at a time which we can move efficiently. MOVE_MAX_PIECES defaults to MOVE_MAX. */ -#define MOVE_MAX \ +#define MOVE_MAX MOVE_MAX_VEC (2 * UNITS_PER_WORD) +#define MOVE_MAX_PIECES MOVE_MAX_VEC (UNITS_PER_WORD) +#define MOVE_MAX_VEC(NONVEC) \ ((TARGET_AVX512F \ && (ix86_move_max == PVW_AVX512 \ || ix86_store_max == PVW_AVX512)) \ @@ -1813,7 +1815,7 @@ typedef struct ix86_args { : ((TARGET_SSE2 \ && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \ && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \ - ? 16 : UNITS_PER_WORD))) + ? 16 : (NONVEC)))) /* STORE_MAX_PIECES is the number of bytes at a time that we can store efficiently. Allow 16/32/64 bytes only if inter-unit move is enabled -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Disinformation flourishes because many people care deeply about injustice but very few check the facts. Ask me about