From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <169ca252-3828-b466-4d47-a8fe720ec4ef@suse.com>
Date: Tue, 20 Jun 2023 09:06:57 +0200
From: Jan Beulich
Subject: [PATCH v3] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F
To: "gcc-patches@gcc.gnu.org"
Cc: Kirill Yukhin, Hongtao Liu

There's no reason to constrain this to AVX512VL, unless instructed so by
-mprefer-vector-width=, as the wider operation is unusable for more
narrow operands only when the possible memory source is a non-broadcast
one. This way even the scalar copysign3 can benefit from the operation
being a single-insn one (leaving aside moves which the compiler decides
to insert for unclear reasons, and leaving aside the fact that
bcst_mem_operand() is too restrictive for broadcast to be embedded right
into VPTERNLOG*).

While there, also bring *_vternlog_all's insn condition in sync with
that of the three splitters.

Along with this also request value duplication in
ix86_expand_copysign()'s call to ix86_build_signbit_mask(), eliminating
excess space allocation in .rodata.*, filled with zeros which are never
read.

gcc/

	* config/i386/i386-expand.cc (ix86_expand_copysign): Request
	value duplication by ix86_build_signbit_mask() when AVX512F and
	not HFmode.
	* config/i386/sse.md (*_vternlog_all): Convert to 2-alternative form.
	Adjust "mode" attribute. Add "enabled" attribute.
	(*_vpternlog_1): Also permit when TARGET_AVX512F
	&& !TARGET_PREFER_AVX256.
	(*_vpternlog_2): Likewise.
	(*_vpternlog_3): Likewise.

gcc/testsuite/

	* gcc.target/i386/avx512f-copysign.c: New test.
---
I haven't been able to find documentation on the dejagnu(?) regex syntax
(?:...). With ordinary (...) failing (producing twice as many matches),
I could only derive this from other scan-assembler patterns.

I guess the underlying pattern, going along the lines of what
one_cmpl2 uses, can be applied elsewhere as well.

HFmode could use embedded broadcast too for copysign and alike, but that
would need to be V2HF -> V8HF (for which I don't think there are any
existing patterns).
---
v3: Adjust insn conditional as well. Add testcase.
v2: Respect -mprefer-vector-width=.

--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -2266,7 +2266,7 @@ ix86_expand_copysign (rtx operands[])
   else
     dest = NULL_RTX;
   op1 = lowpart_subreg (vmode, force_reg (mode, operands[2]), mode);
-  mask = ix86_build_signbit_mask (vmode, 0, 0);
+  mask = ix86_build_signbit_mask (vmode, TARGET_AVX512F && mode != HFmode, 0);
 
   if (CONST_DOUBLE_P (operands[1]))
     {
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -12399,22 +12399,35 @@
    (set_attr "mode" "")])
 
 (define_insn "*_vternlog_all"
-  [(set (match_operand:V 0 "register_operand" "=v")
+  [(set (match_operand:V 0 "register_operand" "=v,v")
         (unspec:V
-          [(match_operand:V 1 "register_operand" "0")
-           (match_operand:V 2 "register_operand" "v")
-           (match_operand:V 3 "bcst_vector_operand" "vmBr")
+          [(match_operand:V 1 "register_operand" "0,0")
+           (match_operand:V 2 "register_operand" "v,v")
+           (match_operand:V 3 "bcst_vector_operand" "vBr,m")
            (match_operand:SI 4 "const_0_to_255_operand")]
           UNSPEC_VTERNLOG))]
-  "TARGET_AVX512F
+  "( == 64 || TARGET_AVX512VL
+    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
 /* Disallow embeded broadcast for vector HFmode since
    it's not real AVX512FP16 instruction.  */
   && (GET_MODE_SIZE (GET_MODE_INNER (mode)) >= 4
      || GET_CODE (operands[3]) != VEC_DUPLICATE)"
-  "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}"
+{
+  if (TARGET_AVX512VL)
+    return "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}";
+  else
+    return "vpternlog\t{%4, %g3, %g2, %g0|%g0, %g2, %g3, %4}";
+}
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
-   (set_attr "mode" "")])
+   (set (attr "mode")
+        (if_then_else (match_test "TARGET_AVX512VL")
+                      (const_string "")
+                      (const_string "XI")))
+   (set (attr "enabled")
+        (if_then_else (eq_attr "alternative" "1")
+                      (symbol_ref " == 64 || TARGET_AVX512VL")
+                      (const_string "*")))])
 
 ;; There must be lots of other combinations like
 ;;
@@ -12443,7 +12456,8 @@
           (any_logic2:V
             (match_operand:V 3 "regmem_or_bitnot_regmem_operand")
             (match_operand:V 4 "regmem_or_bitnot_regmem_operand"))))]
-  "( == 64 || TARGET_AVX512VL)
+  "( == 64 || TARGET_AVX512VL
+    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
    && ix86_pre_reload_split ()
    && (rtx_equal_p (STRIP_UNARY (operands[1]),
                     STRIP_UNARY (operands[4]))
@@ -12527,7 +12541,8 @@
               (match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
             (match_operand:V 3 "regmem_or_bitnot_regmem_operand"))
           (match_operand:V 4 "regmem_or_bitnot_regmem_operand")))]
-  "( == 64 || TARGET_AVX512VL)
+  "( == 64 || TARGET_AVX512VL
+    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
    && ix86_pre_reload_split ()
    && (rtx_equal_p (STRIP_UNARY (operands[1]),
                     STRIP_UNARY (operands[4]))
@@ -12610,7 +12625,8 @@
             (match_operand:V 1 "regmem_or_bitnot_regmem_operand")
             (match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
           (match_operand:V 3 "regmem_or_bitnot_regmem_operand")))]
-  "( == 64 || TARGET_AVX512VL)
+  "( == 64 || TARGET_AVX512VL
+    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-copysign.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -mno-avx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$(?:216|228|0xd8|0xe4)," 5 } } */
+
+double cs_df (double x, double y)
+{
+  return __builtin_copysign (x, y);
+}
+
+float cs_sf (float x, float y)
+{
+  return __builtin_copysignf (x, y);
+}
+
+typedef double __attribute__ ((vector_size (16))) v2df;
+typedef double __attribute__ ((vector_size (32))) v4df;
+typedef double __attribute__ ((vector_size (64))) v8df;
+
+v2df cs_v2df (v2df x, v2df y)
+{
+  return __builtin_ia32_copysignpd (x, y);
+}
+
+v4df cs_v4df (v4df x, v4df y)
+{
+  return __builtin_ia32_copysignpd256 (x, y);
+}
+
+v8df cs_v8df (v8df x, v8df y)
+{
+  return __builtin_ia32_copysignpd512 (x, y);
+}
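
As background on the immediates the testcase scans for: assuming the
usual VPTERNLOG truth-table indexing (the bit of the first source forms
index bit 2, that of the second source index bit 1, that of the third
source index bit 0), both 0xd8 (216) and 0xe4 (228) encode a bitwise
select, which is what copysign amounts to once the third operand is the
sign-bit mask. The following is merely an illustrative sketch and not
part of the patch: ternlog64(), copysign_ternlog(), and
ternlog_selfcheck() are made-up names, and only a single 64-bit lane is
modelled in plain C rather than using the actual instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bitwise model of VPTERNLOG on one 64-bit lane: at every bit position
   the bits of A, B and C form the index (a << 2) | (b << 1) | c, and the
   result bit is bit number "index" of IMM8.  */
uint64_t
ternlog64 (uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
  uint64_t r = 0;

  for (unsigned int i = 0; i < 64; ++i)
    {
      unsigned int idx = (unsigned int) ((((a >> i) & 1) << 2)
                                         | (((b >> i) & 1) << 1)
                                         | ((c >> i) & 1));

      r |= (uint64_t) ((imm8 >> idx) & 1) << i;
    }

  return r;
}

/* Hypothetical helper: copysign (x, y) as a single ternary-logic
   operation, i.e. "mask ? y : x" with only the sign bit set in mask.  */
double
copysign_ternlog (double x, double y)
{
  const uint64_t signbit = UINT64_C (1) << 63;
  uint64_t xb, yb, rb;
  double r;

  memcpy (&xb, &x, sizeof xb);
  memcpy (&yb, &y, sizeof yb);
  rb = ternlog64 (xb, yb, signbit, 0xd8);   /* c ? b : a */
  memcpy (&r, &rb, sizeof r);
  return r;
}

/* Both immediates yield the same select; they differ only in which of
   the first two sources supplies the sign.  */
void
ternlog_selfcheck (void)
{
  const uint64_t x = UINT64_C (0x0123456789abcdef);
  const uint64_t y = UINT64_C (0xfedcba9876543210);
  const uint64_t m = UINT64_C (1) << 63;
  const uint64_t want = (x & ~m) | (y & m);

  assert (ternlog64 (x, y, m, 0xd8) == want);   /* m ? y : x */
  assert (ternlog64 (y, x, m, 0xe4) == want);   /* same, sources swapped */
}
```

Calling ternlog_selfcheck () from a small driver exercises both
encodings. Which of the two shows up in the generated assembly
presumably just depends on the order in which the value and sign sources
end up in the first two operands, hence the testcase accepting either.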