From: Hao Liu OS
To: "gcc@gcc.gnu.org"
Subject: Questions about vectorizing a simple loop by inferring the range from array
Date: Thu, 9 Nov 2023 09:49:47 +0000

Hi,

I'm investigating how to vectorize the following simple case:

    int A[1024 * 2];
    int foo1 (unsigned offset) {
      int sum = 0;
      for (unsigned i = 0; i < 1024; i++)
        sum += A[i + offset];
      return sum;
    }

The loop body and loop vectorizer dumps are:

    # i_13 = PHI
    # ivtmp_14 = PHI
    _1 = offset_7(D) + i_13;
    _2 = A[_1];
    sum_8 = _2 + sum_11;
    i_9 = i_13 + 1;
    ...
    Creating dr for A[_1]
    ...
    (chrec = (sizetype) {offset_7(D), +, 1}_1)
    (res = (sizetype) {offset_7(D), +, 1}_1)
    case1.c:7:13: missed: failed: evolution of offset is not affine.

As SCEV thinks {offset,+,1} may overflow, it cannot propagate the
conversion to sizetype into the chrec, so the analysis fails. The call
stack is:

    dr_analyze_innermost() ->
      simple_iv() -> ... ->
        convert_affine_scev() ->
          scev_probably_wraps_p() ->
            scev_var_range_cant_overflow()
            loop_exits_before_overflow()

BTW, if we add a statement like "if (offset <= 1024)" before the loop, GCC
can vectorize it successfully, as scev_var_range_cant_overflow() then knows
the range of _1.

For the original case, I think GCC should be able to infer the range from
the array length and do the vectorization. There are already functions that
infer this information from undefined behavior such as an out-of-bounds
array ref, and that record nonwrapping IVs. We can see this in the dumps of
passes like evrp, cunroll, vrp1, etc.:

    Induction variable (unsigned int) offset_8(D) + 1 * iteration does not wrap in statement _2 = A[_1];
     in loop 1.
    Statement _2 = A[_1];
     is executed at most 2047 (bounded by 2047) + 1 times in loop 1.

The call stack is:

    estimate_numbers_of_iterations() ->
      infer_loop_bounds_from_undefined() ->
        infer_loop_bounds_from_array() -> ... ->
          record_nonwrapping_iv()

So, I have two questions:

1. Is it legal to vectorize such a case by inferring the no-wrap
   information from the array ref? (I'm wondering whether there is any
   corner case where this is not valid.)
2. If the answer to the first question is yes, how could we implement this
   in GCC?

For the second question, I think there are several possible solutions:
- Could we re-use the existing nonwrapping IV information found by niter?
- Or, could we implement a function called infer_nowrapping_from_array()
  (like infer_loop_bounds_from_array())? In that case, there are also
  different possible places to call it:
  - at the very beginning (i.e., in dr_analyze_innermost()), where we know
    it is analyzing an array ref A[_1].
  - in scev_probably_wraps_p(), where it doesn't know it's analyzing an
    array subscript, so we would have to check whether the user of _1 is
    an array ref (i.e., A[_1]).

As SCEV is the fundamental pass for loop analysis, I think we should be very
careful about correctness. Do you have any comments?

Thanks!
-Hao
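
P.S. To make the reasoning behind question 1 concrete, here is a minimal
sketch in C. It is only a model of the argument, not GCC code: the helper
names all_accesses_in_bounds(), index_wraps() and
check_no_wrap_when_in_bounds() are hypothetical and introduced purely for
illustration, and the macro N stands in for the constant 1024. The sketch
assumes that the array ref executes on every iteration, including the
first one. A conditional access or an early loop exit is not modelled, and
is exactly the kind of corner case I'm unsure about. foo1_guarded() is the
variant with the explicit "if (offset <= 1024)" check mentioned above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define N 1024u            /* trip count of the loop in foo1 */

int A[N * 2];

/* Variant with the explicit range check on offset described above.  */
int foo1_guarded (unsigned offset)
{
  int sum = 0;
  if (offset <= N)
    for (unsigned i = 0; i < N; i++)
      sum += A[i + offset];
  return sum;
}

/* Hypothetical helper: true if every index i + offset, computed in
   unsigned arithmetic as in foo1, lies inside A.  Only the index is
   computed; no memory is accessed.  */
bool all_accesses_in_bounds (unsigned offset)
{
  for (unsigned i = 0; i < N; i++)
    if (i + offset >= N * 2)
      return false;
  return true;
}

/* Hypothetical helper: true if i + offset wraps around for some i in
   [0, N), i.e. if offset + (N - 1) exceeds UINT_MAX.  */
bool index_wraps (unsigned offset)
{
  return offset > UINT_MAX - (N - 1);
}

/* Hypothetical helper: on a few sampled offsets, check that "all
   accesses are in bounds" implies "the index never wraps".  */
void check_no_wrap_when_in_bounds (void)
{
  static const unsigned samples[] = {
    0, 1, N, N + 1, 2 * N - 1, 2 * N,
    UINT_MAX - N, UINT_MAX - (N - 1), UINT_MAX - (N - 2), UINT_MAX
  };
  for (unsigned k = 0; k < sizeof samples / sizeof samples[0]; k++)
    if (all_accesses_in_bounds (samples[k]))
      assert (!index_wraps (samples[k]));
}
```

In this model the very first access is A[offset], so a run without
undefined behavior needs offset < 2048, and then offset + 1023 cannot wrap
in a 32-bit unsigned type. Calling check_no_wrap_when_in_bounds() only
exercises that implication on the sampled offsets; it is not a proof for
the general case.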