From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 16 Jun 2023 08:22:16 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0
From: Jan Beulich
Subject: [PATCH v2] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F
To: "gcc-patches@gcc.gnu.org"
Cc: Kirill Yukhin, Hongtao Liu
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0

There's no reason to constrain this to AVX512VL, unless instructed so by
-mprefer-vector-width=, as the wider operation is unusable for narrower
operands only when the possible memory source is a non-broadcast one.
This way even the scalar copysign3 can benefit from the operation being
a single-insn one (leaving aside moves which the compiler decides to
insert for unclear reasons, and leaving aside the fact that
bcst_mem_operand() is too restrictive for broadcast to be embedded right
into VPTERNLOG*).

Along with this also request value duplication in
ix86_expand_copysign()'s call to ix86_build_signbit_mask(), eliminating
excess space allocation in .rodata.*, filled with zeros which are never
read.

gcc/

	* config/i386/i386-expand.cc (ix86_expand_copysign): Request
	value duplication by ix86_build_signbit_mask() when AVX512F and
	not HFmode.
	* config/i386/sse.md (*_vternlog_all): Convert to 2-alternative
	form. Adjust "mode" attribute. Add "enabled" attribute.
	(*_vpternlog_1): Also permit when TARGET_AVX512F
	&& !TARGET_PREFER_AVX256.
	(*_vpternlog_2): Likewise.
	(*_vpternlog_3): Likewise.
---
I guess the underlying pattern, going along the lines of what
one_cmpl2 uses, can be applied elsewhere as well.

HFmode could use embedded broadcast too for copysign and the like, but
that would need to be V2HF -> V8HF (for which I don't think there are
any existing patterns).
---
v2: Respect -mprefer-vector-width=.

--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -2266,7 +2266,7 @@ ix86_expand_copysign (rtx operands[])
   else
     dest = NULL_RTX;
   op1 = lowpart_subreg (vmode, force_reg (mode, operands[2]), mode);
-  mask = ix86_build_signbit_mask (vmode, 0, 0);
+  mask = ix86_build_signbit_mask (vmode, TARGET_AVX512F && mode != HFmode, 0);
 
   if (CONST_DOUBLE_P (operands[1]))
     {
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -12597,11 +12597,11 @@
    (set_attr "mode" "")])
 
 (define_insn "*_vternlog_all"
-  [(set (match_operand:V 0 "register_operand" "=v")
+  [(set (match_operand:V 0 "register_operand" "=v,v")
         (unspec:V
-          [(match_operand:V 1 "register_operand" "0")
-           (match_operand:V 2 "register_operand" "v")
-           (match_operand:V 3 "bcst_vector_operand" "vmBr")
+          [(match_operand:V 1 "register_operand" "0,0")
+           (match_operand:V 2 "register_operand" "v,v")
+           (match_operand:V 3 "bcst_vector_operand" "vBr,m")
            (match_operand:SI 4 "const_0_to_255_operand")]
           UNSPEC_VTERNLOG))]
   "TARGET_AVX512F
@@ -12609,10 +12609,22 @@
       it's not real AVX512FP16 instruction.  */
    && (GET_MODE_SIZE (GET_MODE_INNER (mode)) >= 4
        || GET_CODE (operands[3]) != VEC_DUPLICATE)"
-  "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}"
+{
+  if (TARGET_AVX512VL)
+    return "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}";
+  else
+    return "vpternlog\t{%4, %g3, %g2, %g0|%g0, %g2, %g3, %4}";
+}
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
-   (set_attr "mode" "")])
+   (set (attr "mode")
+        (if_then_else (match_test "TARGET_AVX512VL")
+                      (const_string "")
+                      (const_string "XI")))
+   (set (attr "enabled")
+        (if_then_else (eq_attr "alternative" "1")
+                      (symbol_ref " == 64 || TARGET_AVX512VL")
+                      (const_string "*")))])
 
 ;; There must be lots of other combinations like
 ;;
@@ -12641,7 +12653,8 @@
             (any_logic2:V
               (match_operand:V 3 "regmem_or_bitnot_regmem_operand")
               (match_operand:V 4 "regmem_or_bitnot_regmem_operand"))))]
-  "( == 64 || TARGET_AVX512VL)
+  "( == 64 || TARGET_AVX512VL
+    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
    && ix86_pre_reload_split ()
    && (rtx_equal_p (STRIP_UNARY (operands[1]),
                     STRIP_UNARY (operands[4]))
@@ -12725,7 +12738,8 @@
               (match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
             (match_operand:V 3 "regmem_or_bitnot_regmem_operand"))
           (match_operand:V 4 "regmem_or_bitnot_regmem_operand")))]
-  "( == 64 || TARGET_AVX512VL)
+  "( == 64 || TARGET_AVX512VL
+    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
    && ix86_pre_reload_split ()
    && (rtx_equal_p (STRIP_UNARY (operands[1]),
                     STRIP_UNARY (operands[4]))
@@ -12808,7 +12822,8 @@
             (match_operand:V 1 "regmem_or_bitnot_regmem_operand")
             (match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
           (match_operand:V 3 "regmem_or_bitnot_regmem_operand")))]
-  "( == 64 || TARGET_AVX512VL)
+  "( == 64 || TARGET_AVX512VL
+    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
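
For readers less familiar with the instruction, here is a small portable C
model (not GCC code, and not part of the patch) of why the full-width form is
safe for narrower register operands: VPTERNLOG* is purely bitwise, so whatever
sits in the upper lanes cannot affect the low element. All function names below
are hypothetical, and the choice of imm8 0xCA ("a ? b : c") with the sign mask
as first operand is just one way to express copysign as a single ternary-logic
step; it is not necessarily the immediate or operand order GCC emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of one 32-bit lane of a ternary-logic operation: each result bit
   is looked up in imm8, indexed by the matching bits of a, b and c
   (a supplies the most significant index bit).  */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
    uint32_t r = 0;
    for (int bit = 0; bit < 32; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     |  ((c >> bit) & 1u);
        r |= (uint32_t)((imm8 >> idx) & 1u) << bit;
    }
    return r;
}

/* Hypothetical helper: copysign as one ternary-logic step.
   imm8 0xCA encodes "a ? b : c"; a is the sign-bit mask, b the value
   providing the sign, c the value providing the magnitude.  */
static float copysign_ternlog(float mag, float sgn)
{
    uint32_t m, s, r;
    float out;
    memcpy(&m, &mag, sizeof m);
    memcpy(&s, &sgn, sizeof s);
    r = ternlog32(0x80000000u, s, m, 0xCA);
    memcpy(&out, &r, sizeof out);
    return out;
}

/* Apply the operation to 16 lanes (one 512-bit register's worth) and
   return lane 0 only, mimicking a scalar user of the wide instruction.  */
static uint32_t ternlog512_lane0(const uint32_t a[16], const uint32_t b[16],
                                 const uint32_t c[16], uint8_t imm8)
{
    uint32_t r[16];
    for (int i = 0; i < 16; i++)
        r[i] = ternlog32(a[i], b[i], c[i], imm8);
    return r[0];
}

/* Self-check: arbitrary data in lanes 1..15 does not change lane 0.  */
static void ternlog_self_check(void)
{
    uint32_t a[16], b[16], c[16];
    for (int i = 0; i < 16; i++) {
        a[i] = 0xDEADBEEFu * (uint32_t)(i + 1);
        b[i] = 0x12345678u ^ (uint32_t)i;
        c[i] = ~(uint32_t)i;
    }
    a[0] = 0x80000000u;   /* sign mask */
    b[0] = 0xC0000000u;   /* bit pattern of -2.0f */
    c[0] = 0x3FC00000u;   /* bit pattern of 1.5f */
    assert(ternlog512_lane0(a, b, c, 0xCA)
           == ternlog32(a[0], b[0], c[0], 0xCA));
}
```

The model only covers register sources. A full-width, non-broadcast memory
source would read bytes beyond the narrow operand, which is the one case the
description above calls out as unusable and which the "enabled" attribute
keeps restricted.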