From: Christoph Müllner
Date: Tue, 13 Jul 2021 14:18:09 +0200
Subject: Re: Priority of builtins expansion strategies
To: Alexandre Oliva
Cc: gcc@gcc.gnu.org, Martin Sebor

On Tue, Jul 13, 2021 at 2:11 AM Alexandre Oliva wrote:
>
> On Jul 12, 2021, Christoph Müllner wrote:
>
> > * Why does the generic by-pieces infrastructure have a higher priority
> >   than the target-specific expansion via INSNs like setmem?
>
> by-pieces was not affected by the recent change, and IMHO it generally
> makes sense for it to have priority over setmem.  It generates only
> straight-line code for constant-sized blocks.
> Even if you can beat that
> with some machine-specific logic, you'll probably end up generating
> equivalent code at least in some cases, and then, you probably want to
> carefully tune the settings that select one or the other, or disable
> by-pieces altogether.
>
>
> by-multiple-pieces, OTOH, is likely to be beaten by machine-specific
> looping constructs, if any are available, so setmem takes precedence.
>
> My testing involved bringing it ahead of the insns, to exercise the code
> more thoroughly even on x86*, but the submitted patch only used
> by-multiple-pieces as a fallback.

Let me give you an example of what by-pieces does on RISC-V (RV64GC).
The following code...

#include <string.h>

void* do_memset0_8 (void *p)
{
  return memset (p, 0, 8);
}

void* do_memset0_15 (void *p)
{
  return memset (p, 0, 15);
}

...becomes (you can validate that with compiler explorer):

do_memset0_8(void*):
        sb      zero,0(a0)
        sb      zero,1(a0)
        sb      zero,2(a0)
        sb      zero,3(a0)
        sb      zero,4(a0)
        sb      zero,5(a0)
        sb      zero,6(a0)
        sb      zero,7(a0)
        ret
do_memset0_15(void*):
        sb      zero,0(a0)
        sb      zero,1(a0)
        sb      zero,2(a0)
        sb      zero,3(a0)
        sb      zero,4(a0)
        sb      zero,5(a0)
        sb      zero,6(a0)
        sb      zero,7(a0)
        sb      zero,8(a0)
        sb      zero,9(a0)
        sb      zero,10(a0)
        sb      zero,11(a0)
        sb      zero,12(a0)
        sb      zero,13(a0)
        sb      zero,14(a0)
        ret

Here is what a setmemsi expansion in the backend can do
(in case unaligned access is cheap):

000000000000003c :
  3c:   00053023                sd      zero,0(a0)
  40:   8082                    ret

000000000000007e :
  7e:   00053023                sd      zero,0(a0)
  82:   000533a3                sd      zero,7(a0)
  86:   8082                    ret

Is there a way to generate similar code with the by-pieces infrastructure?

> > * And if there are no particular reasons, would it be acceptable to
> >   change the order?
>
> I suppose moving insns ahead of by-pieces might break careful tuning of
> multiple platforms, so I'd rather we did not make that change.

Only platforms that have "setmemsi" implemented would be affected.
And those platforms (arm, frv, ft32, nds32, pa, rs6000, rx, visium)
have a carefully tuned implementation of the setmem expansion.
I can't imagine that these setmem expansions produce less optimal code
than the by-pieces infrastructure (which has less knowledge about the target).

Thanks,
Christoph
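
For anyone who wants to try the 15-byte case by hand: the two sd
instructions in the setmemsi listing cover 15 bytes because the second
8-byte store starts at offset 7 and overlaps the first by one byte.
Below is a minimal C sketch of that idea. The names zero15_overlapping
and zero15_check are made up for illustration and are not part of GCC.
Whether a compiler lowers each fixed-size memcpy to a single sd depends
on the target and its unaligned-access tuning; that is an assumption
here, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from GCC): clear exactly 15 bytes using two
   8-byte stores.  The second store begins at offset 7, so byte 7 is
   written twice and byte 15 is never touched.  memcpy is used so the
   unaligned 8-byte write is well-defined C. */
void *zero15_overlapping(void *p)
{
    const uint64_t zero = 0;
    unsigned char *b = p;

    assert(p != NULL);
    memcpy(b, &zero, sizeof zero);      /* bytes 0..7  */
    memcpy(b + 7, &zero, sizeof zero);  /* bytes 7..14 */
    return p;
}

/* Hypothetical self-check: returns 1 if bytes 0..14 were cleared and
   the guard byte at index 15 kept its fill value, 0 otherwise. */
int zero15_check(void)
{
    unsigned char buf[16];

    memset(buf, 0xAA, sizeof buf);
    zero15_overlapping(buf);
    for (int i = 0; i < 15; i++)
        if (buf[i] != 0)
            return 0;
    return buf[15] == 0xAA;
}
```

Calling zero15_check() from a small main is enough to confirm that the
overlap does not write past the 15th byte.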