Date: Mon, 16 May 2022 17:08:41 -0300
Subject: Re: [PATCH v8 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h
To: Alexander Monakov
Cc: Noah Goldstein, GNU C Library
References: <20220414041231.926415-1-goldstein.w.n@gmail.com>
 <20220511030635.154689-1-goldstein.w.n@gmail.com>
 <20220511030635.154689-6-goldstein.w.n@gmail.com>
 <1b419b02-0dee-813b-de4c-1fdc0779174a@gotplt.org>
 <1016566-92e6-5aed-b757-c6fdafa68ae@ispras.ru>
 <0cd799bb-5a54-cd71-ca97-58cc62480b4f@gotplt.org>
 <4cb8e190-db42-8284-2237-2d82537f593@ispras.ru>
 <65cda871-9f96-4792-d0d9-923573ec5abd@linaro.org>
 <7a2c4ab-fb44-54ec-7780-8134101480a@ispras.ru>
From: Adhemerval Zanella
In-Reply-To: <7a2c4ab-fb44-54ec-7780-8134101480a@ispras.ru>
List-Id: Libc-alpha mailing list

On 16/05/2022 17:00, Alexander Monakov wrote:
>> On 16/05/2022 16:41, Alexander Monakov via Libc-alpha wrote:
>>> On Mon, 16 May 2022, Noah Goldstein wrote:
>>>
>>>>> The empty asms are used to prevent the compiler from reassociating
>>>>> 'h*32 + (h + c)' to '(h*32 + h) + c', which looks fine in isolation
>>>>> but significantly changes the dependency graph in the context of the
>>>>> whole loop.
>>>>
>>>> Some architectures could have a really fast integer MADD instruction
>>>> that the barrier could either prevent from being emitted or add an
>>>> extra ADD instruction at the end of.
>>>
>>> With the barrier I'd expect a shift-by-5 and two additions, no madd.
>>> Modern aarch64 cores have 3-cycle madd, I believe, so it's 3 cycles if
>>> the compiler decides to emit madd vs. 2 cycles if it's only additions.
>>>
>>> Alexander
>>
>> How hard would it be to make the compiler do this very optimization? I
>> raised this on the weekly call because more and more it seems that tuning
>> computation dependencies in loops is a compiler job rather than libc's
>> (this is not a blocker, but we have had multiple small
>> micro-optimizations in the past that turned into dead code once the
>> compiler caught up).
>
> Sorry, since you're responding to a discussion about multiply-add, it's
> unclear to me which optimization you mean. Is your question about choosing
> which sequence of additions has the shorter cross-iteration chain?

Indeed, I was not clear. I meant your reply in [1], where you explain why
you suggested the asm to prevent the compiler from reassociating.

[1] https://sourceware.org/pipermail/libc-alpha/2022-May/138794.html
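For readers following the thread, the reassociation under discussion can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: the names hash_simple, hash_barrier and check_hashes_agree are hypothetical, the loop handles one character per iteration, and the empty asm assumes a GCC-compatible compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference form: h = h * 33 + c, starting from 5381.  */
static uint32_t
hash_simple (const char *str)
{
  const unsigned char *s = (const unsigned char *) str;
  uint32_t h = 5381;
  for (; *s != '\0'; s++)
    h = h * 33 + *s;
  return h;
}

/* Hypothetical variant written as h*32 + (h + c).  The empty asm makes
   the temporary opaque to the optimizer, so it cannot regroup the sum
   into (h*32 + h) + c or fold it into a multiply-add.  */
static uint32_t
hash_barrier (const char *str)
{
  const unsigned char *s = (const unsigned char *) str;
  uint32_t h = 5381;
  for (; *s != '\0'; s++)
    {
      uint32_t t = h + *s;
#ifdef __GNUC__
      /* Assumption: GCC-style extended asm is available.  */
      __asm__ ("" : "+r" (t));
#endif
      h = h * 32 + t;
    }
  return h;
}

/* Both forms compute the same value; only the grouping differs.  */
static void
check_hashes_agree (void)
{
  static const char *const names[] = { "", "a", "printf", "_dl_new_hash" };
  for (unsigned int i = 0; i < sizeof names / sizeof names[0]; i++)
    assert (hash_simple (names[i]) == hash_barrier (names[i]));
  /* An empty string leaves the initial value untouched.  */
  assert (hash_barrier ("") == 5381);
}
```

In the h*32 + (h + c) grouping, the shift and the h + c addition both depend only on the previous h and can issue in parallel, so the loop-carried chain is two dependent operations. In the (h*32 + h) + c grouping the chain becomes shift, add, add (or a single multiply-add), which is the 2-cycle vs. 3-cycle difference mentioned in the quoted text. Whether the barrier actually pays off depends on the target and on the compiler version.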