Date: Fri, 4 Sep 2020 17:01:59 +0300
From: Kirill Yukhin
To: "H.J. Lu"
Cc: Hongyu Wang, Uros Bizjak, GCC Patches
Subject: Re: [PATCH] Enable GCC support for AMX
Message-ID: <20200904140159.r6biwwb74qnj7dhc@kyukhin>
References: <20200903150743.gfzofhl3huifeq4x@kyukhin>

Hello,

On 03 Sep 08:17, H.J. Lu wrote:
> On Thu, Sep 3, 2020 at 8:08 AM Kirill Yukhin via Gcc-patches wrote:
> >
> > Hello,
> >
> > On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote:
> > > Hi:
> > >
> > > This patch adds support for Intel Advanced Matrix Extensions (AMX),
> > > which will be enabled in GLC.
> > >
> > > AMX is a new 64-bit programming paradigm consisting of two
> > > components: a set of 2-dimensional registers (tiles) representing
> > > sub-arrays from a larger 2-dimensional memory image,
> > > and an accelerator able to operate on tiles.
> > >
> > > Supported instructions are
> > >
> > > AMX-TILE: ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> > > AMX-INT8: tdpbssd/tdpbsud/tdpbusd/tdpbuud
> > > AMX-BF16: tdpbf16ps
> > >
> > > The intrinsics take constant tile register numbers as their input parameters.
> > >
> > > For detailed information, please refer to
> > > https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> > >
> > > Bootstrap ok, regression test on i386/x86 backend is ok.
> > >
> > > OK for master?
> >
> > I was trying to apply your patch to recent master and got
> > a compilation error:
> >
> > g++ -std=gnu++11 -fno-PIE -c -g -O2 -DIN_GCC -fno-exceptions -fno-rtti
> > -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings
> > -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic
> > -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common
> > -DHAVE_CONFIG_H -I. -I. -I/export/kyukhin/gcc/src/gcc
> > -I/export/kyukhin/gcc/src/gcc/. -I/export/kyukhin/gcc/src/gcc/../include
> > -I/export/kyukhin/gcc/src/gcc/../libcpp/include
> > -I/export/kyukhin/gcc/src/gcc/../libdecnumber
> > -I/export/kyukhin/gcc/src/gcc/../libdecnumber/bid -I../libdecnumber
> > -I/export/kyukhin/gcc/src/gcc/../libbacktrace -o i386-options.o
> > -MT i386-options.o -MMD -MP -MF ./.deps/i386-options.TPo
> > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c
> > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c: In function ‘bool ix86_option_override_internal(bool, gcc_options*, gcc_options*)’:
> > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2263:41: error: ‘PTA_AMX_TILE’ was not declared in this scope
> >   if (((processor_alias_table[i].flags & PTA_AMX_TILE) != 0)
> >                                          ^
> > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2267:41: error: ‘PTA_AMX_INT8’ was not declared in this scope
> >   if (((processor_alias_table[i].flags & PTA_AMX_INT8) != 0)
> >                                          ^
> > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2271:41: error: ‘PTA_AMX_BF16’ was not declared in this scope
> >   if (((processor_alias_table[i].flags & PTA_AMX_BF16) != 0)
> >
> > Could you please fix that?
>
> Here is the rebased patch against
>
> commit 3c219134152f645103f2fcd50735b177ccd76cde
> Author: Jonathan Wakely
> Date:   Thu Sep 3 12:38:50 2020 +0100
>
>     libstdc++: Optimise GCD algorithms
>
> Thanks.
>
> --
> H.J.

> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 797f0ad5edd..d0e59e86a5c 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -412,7 +412,7 @@ i[34567]86-*-*)
>                         waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
>                         avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
>                         avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
> -                       tsxldtrkintrin.h"
> +                       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h"

This line is longer than 80 characters.

>          ;;
>  x86_64-*-*)
>          cpu_type=i386
> @@ -447,7 +447,7 @@ x86_64-*-*)
>                         waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
>                         avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
>                         avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
> -                       tsxldtrkintrin.h"
> +                       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h"

Ditto.

> diff --git a/gcc/config/i386/amxbf16intrin.h b/gcc/config/i386/amxbf16intrin.h
> new file mode 100644
> index 00000000000..df0e2262d50
> --- /dev/null
> +++ b/gcc/config/i386/amxbf16intrin.h
> @@ -0,0 +1,25 @@
> +#if !defined _IMMINTRIN_H_INCLUDED
> +#error "Never use <amxbf16intrin.h> directly; include <immintrin.h> instead."
> +#endif
> +
> +#ifndef _AMXBF16INTRIN_H_INCLUDED
> +#define _AMXBF16INTRIN_H_INCLUDED
> +
> +#if !defined(__AMX_BF16__)
> +#pragma GCC push_options
> +#pragma GCC target("amx-bf16")
> +#define __DISABLE_AMX_BF16__
> +#endif /* __AMX_BF16__ */
> +
> +#if defined(__x86_64__) && defined(__AMX_BF16__)
> +#define _tile_dpbf16ps(dst,src1,src2) \
> +  __asm__ volatile\
> +  ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
> +#endif

I hope that in the future we'll replace this with unspecs at least...

> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index c9f7195d423..9389dc24948 100644

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index bca8c856dc8..a46e31f5862 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1357,6 +1357,7 @@ See RS/6000 and PowerPC Options.
>  -mvpclmulqdq -mavx512bitalg -mmovdiri -mmovdir64b -mavx512vpopcntdq @gol
>  -mavx5124fmaps -mavx512vnni -mavx5124vnniw -mprfchw -mrdpid @gol
>  -mrdseed -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
> +-mamx-tile -mamx-int8 -mamx-bf16@gol

Please add a space before @gol.

> diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> new file mode 100644
> index 00000000000..605a44df3f8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> @@ -0,0 +1,4 @@
> +/* { dg-do assemble { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
> +/* { dg-require-effective-target amx_bf16 } */
> +#include"amxbf16-asmintel-1.c"

I don't get this one. We usually use the second testcase to actually
execute the code and verify (well, at least a little) that the semantics
are OK, e.g. that the operand order is correct. Could you please do that?
This applies to all *-2.c cases. I've checked, and it looks like the
public SDE simulator supports AMX.

--
K
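
To illustrate the kind of run-time check meant above, here is a minimal scalar reference model that an executing test could compare tile results against. It is only a sketch, based on my reading of the tdpbssd pseudo-code in the ISE reference. The name ref_dpbssd and the flat row-major buffers are made up for this example and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of tdpbssd (signed x signed bytes).
   dst  is m x n dwords,
   src1 is m x (4 * k) bytes,
   src2 is k x (4 * n) bytes,
   all stored row-major.  Each dst dword accumulates 4-byte dot
   products of a src1 dword with the matching src2 dword.  */
static void
ref_dpbssd (int32_t *dst, const int8_t *src1, const int8_t *src2,
            size_t m, size_t n, size_t k)
{
  assert (dst && src1 && src2);

  for (size_t i = 0; i < m; i++)
    for (size_t l = 0; l < k; l++)
      for (size_t j = 0; j < n; j++)
        for (size_t b = 0; b < 4; b++)
          dst[i * n + j] += (int32_t) src1[i * 4 * k + 4 * l + b]
                            * (int32_t) src2[l * 4 * n + 4 * j + b];
}
```

Since src1 and src2 are indexed differently, swapping them generally gives a different result, so comparing against a model like this would catch a wrong operand order in the asm template.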