In-Reply-To:
<878s47aeni.ffs@nanos.tec.linutronix.de>
From: Len Brown
Date: Fri, 21 May 2021 19:31:36 -0400
Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related features
To: Thomas Gleixner
Cc: Andy Lutomirski, Florian Weimer, Dave Hansen, Dave Hansen via Libc-alpha, Rich Felker, Linux API, "Bae, Chang Seok", "the arch/x86 maintainers", Linux Kernel Mailing List, Kyle Huey, Borislav Petkov, Keno Fischer, Arjan van de Ven, Willy Tarreau
List-Id: Libc-alpha mailing list

With this proposed API, we seem to be combining two requirements, and I wonder if we should be treating them independently.

Requirement 1: "Fine grained control". We want the kernel to be able to prohibit a program from using AMX. The foundation for this is a system call to which the kernel can say "No". It may deny access for whatever reason it wants, including inability to allocate a buffer, or some TBD administrator-invoked hook in the system call, say membership or lack of membership of the process in an empowered cgroup.

Requirement 2: Ability to synchronously fail upon buffer allocation. I agree that pthread_create() returning an error code is a friendlier way to kill a program than a SIGSEGV when touching AMX state for the first time. But the reality is that the program is almost certainly going to exit either way.
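As a rough illustration of the two requirements, here is a minimal C sketch of a permission call that can refuse for policy reasons or for allocation failure. Everything in it is an assumption made for illustration: the name amx_request_permission(), the struct amx_kernel_state that models kernel-side conditions, and the choice of EPERM and ENOMEM as error codes. It is a user-space stand-in, not the proposed ABI.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of kernel-side conditions; not a real kernel interface. */
struct amx_kernel_state {
    int policy_allows;     /* e.g. the process is in an empowered cgroup */
    int buffer_available;  /* e.g. the AMX state buffer could be allocated */
};

/*
 * Stand-in for the proposed permission system call (the name is an assumption).
 * Returns 0 on success or a negative errno value, so the caller learns
 * synchronously why AMX is unavailable.
 */
static int amx_request_permission(const struct amx_kernel_state *ks)
{
    if (!ks->policy_allows)
        return -EPERM;   /* Requirement 1: the kernel can say "No" */
    if (!ks->buffer_available)
        return -ENOMEM;  /* Requirement 2: allocation failure is reported here */
    return 0;
}

/* Caller side: turn the return value into a message instead of taking a SIGSEGV later. */
static const char *amx_failure_reason(int err)
{
    assert(err <= 0);    /* only 0 or negative errno values are expected */
    switch (err) {
    case 0:       return "granted";
    case -EPERM:  return "denied by policy";
    case -ENOMEM: return "buffer allocation failed";
    default:      return "unknown failure";
    }
}
```

A caller would check the return value once and either proceed to use AMX or fall back (or exit) with the reason in hand.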
So the 1st question is whether the system call requesting permission should be on a per-process basis or a per-task basis.

A. per-task. If we do it this way, then we will likely wind up mandating a GET at the start of every routine in every library that touches AMX, and potentially also a PUT. This is because the library has no idea what thread called it. The plus is that this will address the "used once and sits on a buffer for the rest of the process lifetime" scenario. The minus is that high performance users will be executing thousands of unnecessary system calls that have zero value.

B. per-process. If we do it this way, then the run time linker can do a single system call on behalf of the entire process, and there is no need to sprinkle system calls throughout the library. Presumably the startup code would query CPUID, query XCR0, query this system call, and set a global variable to be accessed by all threads going forward. The plus is that permission makes more sense on a process basis than on a task basis. Why would the kernel give one thread in a process permission, and not another thread -- and if that happened, would a process actually be able to figure out what to do?

If we do per-process, I don't see that the PUT call would be useful, and I would skip it.

Neither A nor B has an advantage in the situation where a thread is created long after initialization and faces memory allocation failure. A synchronously fails in the new system call, and B synchronously fails in pthread_create.

The 2nd question is whether "successful permission" implies synchronous allocation, or whether it allows "please enable on-demand dynamic allocation".

X. Synchronous allocation. Allocation failures return a synchronous error code, explaining why the program needs to exit. The downside is that it is likely that in both case A and B, every thread in the program will allocate a buffer, whether they ever use it or not.
Indeed, it is possible that the API we have invented to manage AMX buffer use will actually *increase* AMX buffer use...

Y. Enable on-demand allocation. Here the system call enables XFD to not kill the process, but on first use to allocate a buffer for a thread that is actually touching AMX. The benefit is that if you have a program with many threads, only the ones that actually use AMX will allocate buffers. Of course the downside is that this program is exposed to a SIGSEGV if vmalloc fails in that run-time allocation, rather than a friendly pthread_create -1 return code killing the program.

And, of course, we can have our cake and eat it too, by having the syscall tell the kernel if it wants (X) or (Y). The question is whether it is worth the complexity of having two options.

Thoughts?

-Len
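
For concreteness, the per-process variant (B) combined with the (X)/(Y) choice could be sketched in C roughly as follows. This is only a sketch under stated assumptions: the enum amx_alloc_mode, the stub hw_and_os_support_amx() (standing in for the CPUID and XCR0 queries), and the stub amx_permission_syscall() are all hypothetical names, and the stubs always succeed.

```c
#include <assert.h>

/* Hypothetical mode flag passed to the proposed system call; names are assumptions. */
enum amx_alloc_mode {
    AMX_ALLOC_SYNC,       /* option X: allocate up front, fail with an error code */
    AMX_ALLOC_ON_DEMAND   /* option Y: allocate on a thread's first AMX use */
};

static int amx_initialized;    /* set once the startup check has run */
static int amx_permitted;      /* process-wide answer, read by all threads */
static int amx_syscall_count;  /* counts stub invocations, for illustration only */

/* Stub standing in for the CPUID and XCR0 queries; always reports support. */
static int hw_and_os_support_amx(void)
{
    return 1;
}

/* Stub standing in for the proposed system call; always grants permission. */
static int amx_permission_syscall(enum amx_alloc_mode mode)
{
    (void)mode;
    amx_syscall_count++;
    return 0;
}

/* Run once by startup code (e.g. the run time linker) before threads exist. */
static void amx_startup_init(enum amx_alloc_mode mode)
{
    amx_permitted = hw_and_os_support_amx() &&
                    amx_permission_syscall(mode) == 0;
    amx_initialized = 1;
}

/* Library routines read the global instead of issuing a system call each time. */
static int amx_usable(void)
{
    assert(amx_initialized);  /* startup code must have run first */
    return amx_permitted;
}
```

With this shape, a library routine calls amx_usable() on every entry at the cost of a global read, and the single system call happens in amx_startup_init().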