From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
From: Charlie Hernandez
Date: Wed, 5 Apr 2023 22:20:36 -0400
Subject: [GSoC] gcc-rs - Unicode Support or Metadata
To: gcc@gcc.gnu.org, gcc-rust@gcc.gnu.org
Content-Type: multipart/alternative; boundary="00000000000003f11e05f8a190c4"

--00000000000003f11e05f8a190c4
Content-Type: text/plain; charset="UTF-8"

Dear GCC members,

I understand that I am late in submitting this proposal. However, I found out about gcc-rust and Google Summer of Code three hours ago, and rather than do nothing, I decided it was in my best interest to apply nonetheless.
I'm interested in Rust and the GCC frontend for many reasons, and I would like to be considered for this project. I can commit fully to the project if any of my proposals is accepted.

# General Information

Name: Carlos "Charlie Cruz" Hernandez
Email: cjh16@rice.edu
University: Rice University '2026
Major/Focus: Mathematics and Linguistics
Country/Timezone: United States / Eastern Standard Time

What is your Open Source Experience so far?

Online I go by "SeniorMars" (https://github.com/SeniorMars), and I have contributed to the following significant projects: Rust-analyzer, Neovim, Coc-rust-analyzer, and the Rust compiler (documentation). I'm highly active in the Neovim and LaTeX communities, and I am working on several Neovim plugins for the Typst markup language. Additionally, at Rice, I taught https://lazy.rice.edu/ (the website is outdated due to University policies, for now), which aims to teach open-source concepts to students. Finally, I have a YouTube channel dedicated to open-source concepts: https://www.youtube.com/@SeniorMarsTries.

For the sake of this project, I have taken my University's programming class as a Freshman. Also, notably, I'm working on a tree-sitter parser for the Typst markup language that deals with Unicode. In Neovim, I'm also trying to tackle "concealed text" with virtual text. Although I have yet to work with gcc-rs, I'm confident I can help.

# Project Information

I wish to tackle one of the three projects suggested in the gcc-rust section: Unicode support, Metadata, or Improving user errors.

## Unicode support

While working on the Typst tree-sitter project, I've learned how extensive Unicode is and how difficult it is to parse such a language correctly. In particular, I learned how to work with the unusual cases of Unicode, e.g., emojis, different types of whitespace, and identifiers. My main goal is to apply the concepts I've learned with Typst to gcc-rs.
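To make the identifier corner cases concrete, here is a minimal sketch; it is not gccrs code. It is written in Python only because the built-in `str.isidentifier()` follows the Unicode XID_Start and XID_Continue properties, so no extra tables are needed. The helper names are hypothetical, the identifier rule in the comment is an assumption about what the lexer should accept, and Python's Unicode tables may differ in version from the ones rustc uses.

```python
# Sketch only: classify strings the way a Rust-style lexer might classify
# identifiers, using Python's built-in Unicode tables as a stand-in.
# All helper names here are hypothetical and are not part of gccrs.


def is_xid_start(ch: str) -> bool:
    # str.isidentifier() accepts "_" or an XID_Start character in first
    # position, so the underscore is excluded explicitly.
    return len(ch) == 1 and ch != "_" and ch.isidentifier()


def is_xid_continue(ch: str) -> bool:
    # Prefix a known-good start character; the result then depends only on
    # whether ch is allowed in a continuing position (XID_Continue).
    return len(ch) == 1 and ("a" + ch).isidentifier()


def is_rust_style_identifier(text: str) -> bool:
    # Rule assumed here: XID_Start XID_Continue*  |  "_" XID_Continue+
    if not text:
        return False
    first, rest = text[0], text[1:]
    if first == "_":
        return len(rest) > 0 and all(is_xid_continue(c) for c in rest)
    return is_xid_start(first) and all(is_xid_continue(c) for c in rest)


if __name__ == "__main__":
    # Accented letters, CJK, a lone underscore, an emoji, a leading digit,
    # and a base letter followed by a combining mark.
    for sample in ["café", "日本語", "_x", "_", "🦀", "1abc", "e\u0301"]:
        print(repr(sample), is_rust_style_identifier(sample))
```

A list of samples like this one could double as an early checklist of lexer test cases: the emoji and the leading digit should be rejected, while a combining mark is valid only after a start character.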
The main difficulties will be modifying the lexer to handle \p{Whitespace}, \p{XID_Start}, and \p{XID_Continue} properly without complicating parsing in other areas of the project. Reusing code from libcpp/ucnid.h from the CPP frontend may help with this part. Finally, we must introduce a new Rust::String class that represents Rust identifiers, strings, and `create_name`, replacing the old implementation. Of course, I also need to implement the v0 mangling scheme, which Rust uses to encode Unicode identifiers in symbol names. I can take a lot of inspiration from Tree-sitter.

The timeline is very close to the two proposals before me. However, I would implement punycode earlier, as it would give me a checklist of everything I must test to make the lexer fully support Unicode. With the rest of the schedule shifted accordingly, it becomes easier to write tests for the cases I know will be difficult to deal with.

## Metadata

While working on typst.nvim, I decided to use Rust to communicate with Neovim's API and Lua by linking a binary into something Neovim can use. This piqued my interest, and from the looks of it, the work I would be doing in this project would be porting all the requirements of `rustc_metadata::rmeta::CrateRoot` to `rust-export-metadata.cc`, whose spec is detailed in `src/rustc_metadata/rmeta/encoder.rs`. In particular, I would ensure that we support Strict Version Hash (SVH), Stable Crate Id, and encoded MIR.

My timeline is based on modifying and implementing the fields in `CrateRoot`. In general, however:

Week 1-2:
- Modify rust-export-metadata.cc to include the "basic" fields in CrateRoot, such as edition, panic_in_drop_strategy
- MetaItem

Week 3:
- Implement a testing method to correctly load only specific metadata in case of identical hashes.
- Document all the functions I created

Week 4-5:
- Implement CrateDep
- Implement Strict Version Hash, which also needs:
  - proper StableCrateId, which needs
    - proper basic metadata support

Week 5-7:
- Implement `SourceFile`, `ForeignModule`, `NativeLib`, and the rest.

Week 8:
- Testing and documentation, plus starting a write-up.

Week 9-10:
- Pipelining and Crate loading

Week 11-12:
- Modify our rlib and add dylib support with compression

I would appreciate any mentor. I understand I am still late, and this email could be more robust; however, I would love to work on gcc-rs this summer.

Thank you,
Charlie

--
Charlie Cruz -- Going through a name change!
Math & Linguistics @ Rice University '26

--00000000000003f11e05f8a190c4--