From: Noah Goldstein
Date: Wed, 27 Apr 2022 11:23:37 -0500
Subject: Re: [PATCH v3 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h
To: Alexander Monakov, GNU C Library
List-Id: Libc-alpha mailing list

On Wed, Apr 27, 2022 at 11:17 AM Alexander Monakov wrote:
>
> On Wed, 27 Apr 2022, Noah Goldstein wrote:
>
> > > However, when you reroll the loop and overlap two iterations, multiplication
> > > by 33*33 no longer has this nice property and runs with two 4cyc paths
> > > overlapped (so effective critical path is the same as original).
> >
> > The 33 * c0 can still use `addl; sall; addl`, so I'm not sure what you mean by
> > two 4cyc paths overlapped. It's one 4c path.
> >
> > `imul; addl` and `addl; sall; addl`.
> >
> > But it's fair that either way it's 4c of computation for 2 iterations. The
> > difference is the 5c load latency being amortized over 2 iterations
> > or 1 iteration.
>
> Right, it's one 4c path, I was thinking about something else for a moment.
> I'm not sure it's correct to amortize load latency like that, I'd say the
> difference is just that the original loop cannot issue two loads at once
> because of the dependency in its address computation.
>
> I see you dropped libc-alpha@ from Cc:, was that intentional?

No, misclick, sorry. Adding it back.

I think it is the way you're doing your analysis as a loop-carried dependency.

I.e. really 7c per iteration with no unroll (although it's fair that the loads
on the address can speculate ahead, so it will indeed be faster) vs. 9c per 2x
iterations.

>
> Alexander
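
For readers following the cycle counts in this exchange, here is a minimal C
sketch of the two loop shapes being compared. It is not the code from the
patch: the names `hash_simple` and `hash_unrolled2` are hypothetical, and the
instruction sequences a compiler picks for the multiplies are not shown. It
only illustrates that folding two bytes per step as
`h * 33 * 33 + c0 * 33 + c1` gives the same result as applying
`h = h * 33 + c` twice, while keeping `c0 * 33` off the dependency chain
through `h`.

```c
#include <stdint.h>

/* Rolled loop: one byte per iteration.  Every step depends on the
   previous value of h through a multiply by 33 and an add.  */
static uint32_t
hash_simple (const char *s)
{
  uint32_t h = 5381;
  for (unsigned char c = *s; c != '\0'; c = *++s)
    h = h * 33 + c;
  return h;
}

/* Two bytes per iteration (hypothetical sketch).  Only the multiply of
   h by 33 * 33 and the final add sit on the chain through h; c0 * 33
   depends on the loaded byte alone.  */
static uint32_t
hash_unrolled2 (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  uint32_t h = 5381;
  for (;;)
    {
      unsigned char c0 = p[0];
      if (c0 == '\0')
        return h;
      /* Safe to read p[1]: p[0] was not the terminator.  */
      unsigned char c1 = p[1];
      if (c1 == '\0')
        return h * 33 + c0;
      h = h * (33 * 33) + c0 * 33 + c1;
      p += 2;
    }
}
```

Both functions compute the same value modulo 2^32 for any NUL-terminated
string, so the difference under discussion is purely in how the loads and
multiplies can overlap.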