References: <20210415044258.GA6318@zn.tnic> <20210415052938.GA2325@1wt.eu> <20210415054713.GB6318@zn.tnic> <20210419141454.GE9093@zn.tnic> <20210419191539.GH9093@zn.tnic> <20210419215809.GJ9093@zn.tnic> <874kf11yoz.ffs@nanos.tec.linutronix.de>
In-Reply-To: <874kf11yoz.ffs@nanos.tec.linutronix.de>
From: Len Brown
Date: Thu, 20 May 2021 11:35:53 -0400
Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related features
To: Thomas Gleixner
Cc: Borislav Petkov, Willy Tarreau, Andy Lutomirski, Florian Weimer,
 "Bae, Chang Seok", Dave Hansen, X86 ML, LKML, Linux API,
 "libc-alpha@sourceware.org", Rich Felker, Kyle Huey, Keno Fischer,
 Arjan van de Ven
List-Id: Libc-alpha mailing list

Hi Thomas,

On Mon, May 17, 2021 at 5:45 AM Thomas Gleixner wrote:

> AMX (or whatever comes next) is nothing else than a device and it
> just should be treated as such. The fact that it is not exposed
> via a driver and a device node does not matter at all.

TMM registers are part of the CPU architectural state.
If TMM registers exist for one logical CPU, they exist for all CPUs --
including HT siblings. (Intel supports only homogeneous ISA.)
Ditto for the instructions that access and operate on TMM registers.

One can reasonably predict that, as Intel has done for all other
registers, there will be future instructions added to the ISA to operate
on TMM registers, including in combination with non-TMM registers that
are also part of the architectural state.

It is an unfortunate word choice that some documentation calls the
TMUL instruction an "accelerator". It isn't. It is part of the ISA,
like any other instruction.

I agree that a device interface may make sense for real accelerators
that don't run x86 instructions, but I don't see long-term viability in
attempting to carve a subset of x86 instructions into a device,
particularly when the set of instructions will continue to evolve.

> Not doing so requires this awkward buffer allocation issue via #NM with
> all it's downsides; it's just wrong to force the kernel to manage
> resources of a user space task without being able to return a proper
> error code.

The hardware #NM support for fault on first use is a feature to allow
the OS to optimize space so that pages do not have to be dedicated to
back registers unless/until they are actually used. There is absolutely
no requirement that a particular OS take advantage of that feature.
If you think that this optimization is awkward, we can easily
delete/disable it and simply statically allocate buffers for all threads
at initialization time. Though you'll have to convince me why the word
"awkward" applies, rather than "elegant".

Regarding error return for allocation failures: I'm not familiar with
the use-case where vmalloc would be likely to fail today, and I'd be
interested if anybody can detail that use-case. But even if there is
none today, I grant that Linux could evolve to make vmalloc fail in the
future, and so an interface to request pre-allocation of buffers is
reasonable insurance. Chang has implemented this prctl in v5 of the
TMUL patch series.

> It also prevents fine grained control over access to this
> functionality. As AMX is clearly a shared resource which is not per HT
> thread (maybe not even per core) and it has impact on power/frequency it
> is important to be able to restrict access on a per process/cgroup
> scope.

AMX is analogous to the multiplier used by AVX-512.
The architectural state must exist on every CPU, including HT siblings.
Today, the HT siblings share the same execution unit,
and I have no reason to expect that will change.

I thought we already addressed the FUD surrounding power/frequency.
As with every kind of instruction -- those that use more power will
leave less power for their peers, and there is a mechanism to track that
power budget.
I acknowledge that the mechanism was overly conservative and slow to
recover in initial AVX-512 systems, and that issue persists even with
the latest publicly available hardware today. I acknowledge that you do
not trust that Intel has addressed this (for both AVX-512 and AMX) in
the first hardware that supports AMX.

> Having a proper interface (syscall, prctl) which user space can use to
> ask for permission and allocation of the necessary buffer(s) is clearly
> avoiding the downsides and provides the necessary mechanisms for proper
> control and failure handling.
>
> It's not the end of the world if something which wants to utilize this
> has do issue a syscall during detection. It does not matter whether
> that's a library or just the application code itself.
>
> That's a one off operation and every involved entity can cache the
> result in TLS.
>
> AVX512 has already proven that XSTATE management is fragile and error
> prone, so we really have to stop this instead of creating yet another
> half baked solution.

We fixed the glibc ABI issue. It is available now, and the production
release is this summer. Yes, it should have been addressed when AVX-512
was deployed.

thanks
Len Brown, Intel Open Source Technology Center