From: Christoph Müllner
Date: Tue, 13 Jul 2021 16:04:06 +0200
Subject: Re: Priority of builtins expansion strategies
To: Richard Biener
Cc: Alexandre Oliva, GCC Development, Martin Sebor
List-Id: Gcc mailing list

On Tue, Jul 13, 2021 at 2:59 PM Richard Biener wrote:
>
> On Tue, Jul 13, 2021 at 2:19 PM Christoph Müllner via Gcc wrote:
> >
> > On Tue, Jul 13, 2021 at 2:11 AM Alexandre Oliva wrote:
> > >
> > > On Jul 12, 2021, Christoph Müllner wrote:
> > >
> > > > * Why does the generic by-pieces infrastructure have a higher priority
> > > > than the target-specific expansion via INSNs like setmem?
> > >
> > > by-pieces was not affected by the recent change, and IMHO it generally
> > > makes sense for it to have priority over setmem.
> > > It generates only
> > > straight-line code for constant-sized blocks. Even if you can beat that
> > > with some machine-specific logic, you'll probably end up generating
> > > equivalent code at least in some cases, and then, you probably want to
> > > carefully tune the settings that select one or the other, or disable
> > > by-pieces altogether.
> > >
> > >
> > > by-multiple-pieces, OTOH, is likely to be beaten by machine-specific
> > > looping constructs, if any are available, so setmem takes precedence.
> > >
> > > My testing involved bringing it ahead of the insns, to exercise the code
> > > more thoroughly even on x86*, but the submitted patch only used
> > > by-multiple-pieces as a fallback.
> >
> > Let me give you an example of what by-pieces does on RISC-V (RV64GC).
> > The following code...
> >
> > void* do_memset0_8 (void *p)
> > {
> >   return memset (p, 0, 8);
> > }
> >
> > void* do_memset0_15 (void *p)
> > {
> >   return memset (p, 0, 15);
> > }
> >
> > ...becomes (you can validate that with compiler explorer):
> >
> > do_memset0_8(void*):
> >         sb      zero,0(a0)
> >         sb      zero,1(a0)
> >         sb      zero,2(a0)
> >         sb      zero,3(a0)
> >         sb      zero,4(a0)
> >         sb      zero,5(a0)
> >         sb      zero,6(a0)
> >         sb      zero,7(a0)
> >         ret
> > do_memset0_15(void*):
> >         sb      zero,0(a0)
> >         sb      zero,1(a0)
> >         sb      zero,2(a0)
> >         sb      zero,3(a0)
> >         sb      zero,4(a0)
> >         sb      zero,5(a0)
> >         sb      zero,6(a0)
> >         sb      zero,7(a0)
> >         sb      zero,8(a0)
> >         sb      zero,9(a0)
> >         sb      zero,10(a0)
> >         sb      zero,11(a0)
> >         sb      zero,12(a0)
> >         sb      zero,13(a0)
> >         sb      zero,14(a0)
> >         ret
> >
> > Here is what a setmemsi expansion in the backend can do (in case
> > unaligned access is cheap):
> >
> > 000000000000003c :
> >   3c:   00053023    sd      zero,0(a0)
> >   40:   8082        ret
> >
> > 000000000000007e :
> >   7e:   00053023    sd      zero,0(a0)
> >   82:   000533a3    sd      zero,7(a0)
> >   86:   8082        ret
> >
> > Is there a way to generate similar code with the by-pieces infrastructure?
>
> Sure - tell it unaligned access is cheap.
> See alignment_for_piecewise_move
> and how it uses slow_unaligned_access.

Thanks for the pointer.
I already knew about slow_unaligned_access, but I was not aware of
overlap_op_by_pieces_p.
Enabling both gives exactly the same code as the setmemsi expansion above.

Thanks,
Christoph

> > > > * And if there are no particular reasons, would it be acceptable to
> > > > change the order?
> > >
> > > I suppose moving insns ahead of by-pieces might break careful tuning of
> > > multiple platforms, so I'd rather we did not make that change.
> >
> > Only platforms that have "setmemsi" implemented would be affected.
> > And those platforms (arm, frv, ft32, nds32, pa, rs6000, rx, visium)
> > have a carefully tuned implementation of the setmem expansion.
> > I can't imagine that these setmem expansions produce less optimal
> > code than the by-pieces infrastructure (which has less knowledge
> > about the target).
> >
> > Thanks,
> > Christoph
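
For readers unfamiliar with the overlapping-store trick behind the 15-byte
case above, here is a minimal sketch in plain C. It is not GCC code and does
not use any GCC internals; the helper name zero_overlapping is hypothetical
and exists only for illustration. It assumes the block is between 8 and 16
bytes long and that unaligned 8-byte stores are acceptable on the target
(memcpy is used so the C itself stays well-defined regardless of alignment).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (illustration only, not part of GCC):
   zero n bytes at p, for 8 <= n <= 16, using two 8-byte stores.
   The second store is placed so that it ends exactly at p + n,
   which means it may overlap the first store. */
static void *zero_overlapping(void *p, size_t n)
{
    const uint64_t zero = 0;
    unsigned char *dst = p;

    assert(n >= 8 && n <= 16);

    /* First 8 bytes: corresponds to "sd zero,0(a0)". */
    memcpy(dst, &zero, sizeof zero);

    /* Last 8 bytes: for n == 15 this starts at offset 7,
       corresponding to "sd zero,7(a0)". */
    memcpy(dst + n - 8, &zero, sizeof zero);

    return p;
}
```

Calling zero_overlapping(buf, 15) writes bytes 0..7 and then bytes 7..14, so
byte 7 is written twice and byte 15 is left untouched. That is the same
shape as the two sd instructions in the setmemsi output quoted above, as
opposed to the fifteen sb instructions in the byte-wise expansion.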