Date: Mon, 28 Jun 2021 19:11:16 +0200
From: Peter Zijlstra
To: Thiago Macieira
Cc: fweimer@redhat.com, "Enrico Weigelt, metux IT consult",
 hjl.tools@gmail.com, libc-alpha@sourceware.org,
 linux-api@vger.kernel.org, linux-arch@vger.kernel.org, x86@kernel.org
Subject: Re: x86 CPU features detection for applications (and AMX)
Message-ID: <20210628171115.GA13401@worktop.programming.kicks-ass.net>
References: <22261946.eFiGugXE7Z@tjmaciei-mobl1>
 <2379132.fg5cGID6mU@tjmaciei-mobl1>
 <2094802.S4rhTtsRBG@tjmaciei-mobl1>
In-Reply-To: <2094802.S4rhTtsRBG@tjmaciei-mobl1>

On Mon, Jun 28, 2021 at 09:13:29AM -0700, Thiago Macieira wrote:
> On Monday, 28 June 2021 08:27:24 PDT Peter Zijlstra wrote:

> > > That's what cpuid is for. With GCC function multi-versioning or equivalent
> > > manually-rolled solutions, you can get exactly what you're asking for.
> >
> > Right, lots of self-modifying code solutions there, some of which can be
> > linker driven, some not. In the kernel we use alternative() to replace
> > short code sequences depending on CPUID.
> >
> > Userspace *could* do the same, rewriting code before first execution is
> > fairly straightforward.
>
> Userspace shouldn't do SMC. It's bad enough that JITs without caching exist,
> but having pure paged code is better. Pure pages are shared as needed by the
> kernel.

I don't feel that strongly; if SMC gets you measurable performance
gains, go for it. If you're short on memory, buy more.

> All you need is a simple bit test. You can then either branch to different
> code paths or write to a function pointer so it'll go there directly the next
> time. You can also choose to load different plugins depending on what CPU
> features were found.

Both bit tests and indirect function calls suffer the extra memory load,
which is not free.

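For reference, the two userspace options in the quoted paragraph could look
roughly like the sketch below. It assumes GCC or Clang (it relies on the
__builtin_cpu_supports() builtin and the target attribute); the names
sum_generic, sum_avx2, sum_branch and sum_pointer are hypothetical and not
taken from this thread. The comments mark where each variant has to read
from memory before it can dispatch.

```c
#include <stddef.h>

/* Hypothetical kernels; the names are illustrative only. */
static long sum_generic(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if defined(__x86_64__) || defined(__i386__)
/* Same loop, but the compiler is allowed to emit AVX2 instructions here. */
__attribute__((target("avx2")))
static long sum_avx2(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
#endif

/* Option 1: test the feature bit and branch on every call. */
long sum_branch(const int *v, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2"))  /* reads the cached feature word */
        return sum_avx2(v, n);
#endif
    return sum_generic(v, n);
}

/* Option 2: pick once, then go through a function pointer. */
static long sum_resolve(const int *v, size_t n);
static long (*sum_impl)(const int *, size_t) = sum_resolve;

/* First call lands here, stores the choice, and forwards the call.
 * Not thread-safe as written; a real version would use an atomic store. */
static long sum_resolve(const int *v, size_t n)
{
    sum_impl = sum_generic;
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2"))
        sum_impl = sum_avx2;
#endif
    return sum_impl(v, n);
}

long sum_pointer(const int *v, size_t n)
{
    return sum_impl(v, n);  /* loads sum_impl, then makes an indirect call */
}
```

Either way the caller pays for one load (the feature word or the pointer)
on each call, which is the cost referred to above.
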
> Consequence: CPU feature checking is done *very* early, often before main().

For the linker-based ones, yes. IIRC the ifunc() attribute is
particularly useful here.
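
A minimal sketch of the ifunc approach, assuming GCC or Clang on an ELF
target with a C library that supports indirect functions (glibc does). The
names sum, resolve_sum, sum_generic and sum_avx2 are hypothetical. The
resolver is run by the dynamic loader when it binds the symbol, so no
feature test remains in the call path afterwards.

```c
#include <stddef.h>

typedef long sum_fn(const int *, size_t);

/* Hypothetical kernels; the names are illustrative only. */
static long sum_generic(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if defined(__x86_64__) || defined(__i386__)
/* Same loop, but the compiler is allowed to emit AVX2 instructions here. */
__attribute__((target("avx2")))
static long sum_avx2(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
#endif

/* Resolver: returns the implementation that the symbol sum should bind to.
 * It can run before ordinary constructors, so the CPU feature data is
 * initialised explicitly with __builtin_cpu_init() first. */
static sum_fn *resolve_sum(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_generic;
}

/* Callers just call sum(); the loader substitutes the resolver's choice. */
long sum(const int *v, size_t n) __attribute__((ifunc("resolve_sum")));
```

The resolver runs very early, so it should stay small and avoid calling
into code that may not have been relocated yet.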