From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <65cda871-9f96-4792-d0d9-923573ec5abd@linaro.org>
Date: Mon, 16 May 2022 16:47:00 -0300
Subject: Re: [PATCH v8 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h
To: Alexander Monakov, Noah Goldstein
Cc: GNU C Library
From: Adhemerval Zanella

On 16/05/2022 16:41, Alexander Monakov via Libc-alpha wrote:
> On Mon, 16 May 2022, Noah Goldstein wrote:
>
>>> The empty asms are used to prevent compiler reassociating 'h*32 + (h + c)'
>>> to '(h*32 + h) + c' which looks fine in isolation, but significantly changes
>>> the dependency graph in context of the whole loop.
>>
>> Some architecture could have a really fast integer MADD instruction that
>> the barrier could either prevent from being emitted or add an extra ADD
>> instruction at the end of.
>
> With the barrier I'd expect a shift-by-5 and two additions, no madd. Modern
> aarch64 cores have 3-cycle madd I believe, so it's 3 cycles if the compiler
> decides to emit madd vs. 2 cycles if it's only additions.
>
> Alexander

How hard would it be to make the compiler perform this very optimization?
I raised this on the weekly call because, more and more, tuning computation
dependencies in loops seems to be a compiler job rather than libc's.  This
is not a blocker, but we have had multiple small micro-optimizations in the
past that turned into dead code once the compiler caught up.
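
For reference, a minimal sketch of the barrier being discussed.  This is
not the code from the patch: hash_plain and hash_barrier are hypothetical
names, the loop is not unrolled, and the only assumption taken from the
thread is that the hash step is h = h*33 + c, computed as h*32 + (h + c)
with an empty asm keeping the compiler from regrouping it.  The empty asm
is a GNU C extension, so this assumes GCC or Clang.

```c
#include <stdint.h>

/* Reference form: h = h * 33 + c, with the grouping left to the compiler. */
static uint32_t
hash_plain (const char *s)
{
  uint32_t h = 5381;
  for (unsigned char c = *s; c != '\0'; c = *++s)
    h = h * 33 + c;
  return h;
}

/* Same value, computed as h*32 + (h + c).  The empty asm makes (h + c)
   opaque to the optimizer, so it cannot be reassociated into
   (h*32 + h) + c or folded into a multiply-add.  */
static uint32_t
hash_barrier (const char *s)
{
  uint32_t h = 5381;
  for (unsigned char c = *s; c != '\0'; c = *++s)
    {
      uint32_t t = h + c;
      __asm__ ("" : "+r" (t));	/* Emits no instruction; blocks regrouping.  */
      h = h * 32 + t;
    }
  return h;
}
```

Both functions return the same value for any input; only the dependency
chain the compiler is allowed to build differs, which is exactly the part
a compiler-side fix would have to get right on its own.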