[libc++] Do not remove a root-name followed by ".." in `path::lexically_normal()` (#201261)
For Windows paths like `"C:.."`, `path::lexically_normal()` should
preserve both the root-name and the `".."` component. In
[fs.path.generic]p6, clause (5) prescribes removing `".."` only after
non-dot-dot filenames. Since a root-name is not a filename, this clause
does not apply. Clause (6) only applies when there is a root-directory,
which is absent in this case. Therefore, neither the root-name nor the
`".."` component should be removed.
This change aligns `libc++` with MSSTL.
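The clause-by-clause argument above can be made concrete with a simplified model of only the `".."` rules (clauses (5) and (6)). The model works on a path that has already been split into root-name, root-directory, and filenames. `SplitPath`, `remove_dot_dots`, and `render` are hypothetical names, not libc++ internals. The model deliberately ignores `"."` removal, trailing separators, and the rule that turns an empty result into `"."`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of the ".." rules of [fs.path.generic]p6.
// Assumes "." filenames were already removed; ignores trailing separators
// and the "empty path becomes ." rule.
struct SplitPath {
  std::string root_name;            // e.g. "C:" (empty if there is none)
  bool has_root_directory = false;  // true for "C:\..", false for "C:.."
  std::vector<std::string> filenames;
};

std::vector<std::string> remove_dot_dots(const SplitPath& p) {
  std::vector<std::string> out;
  for (const std::string& f : p.filenames) {
    if (f == "..") {
      // Clause (5): ".." cancels a preceding non-dot-dot *filename*.
      // The root-name is never stored in `out`, so it can never be cancelled.
      if (!out.empty() && out.back() != "..") {
        out.pop_back();
        continue;
      }
      // Clause (6): only when a root-directory exists are leading ".." dropped.
      if (out.empty() && p.has_root_directory) continue;
    }
    out.push_back(f);
  }
  return out;
}

// Reassemble the path, using '/' as the separator to keep the example portable.
std::string render(const SplitPath& p) {
  std::string s = p.root_name;
  if (p.has_root_directory) s += '/';
  const std::vector<std::string> names = remove_dot_dots(p);
  for (std::size_t i = 0; i < names.size(); ++i) {
    if (i != 0) s += '/';
    s += names[i];
  }
  return s;
}
```

In this model, `C:..` keeps both its root-name and its `".."`, while `C:/..` (where a root-directory is present) collapses to `C:/`.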
Fix the test case failure on arm64 Darwin systems (#206223)
For Apple targets, the string EnableSplitLTOUnit is not emitted with
full LTO. Mark the test as unsupported for these targets.
[libc++][test] XFAIL `text/text_encoding/environment.pass.cpp` test on Armv7/Linux Ubuntu targets. (#206188)
The test fails on the Armv7/Linux Ubuntu board during remote execution
when cross-compiling on a Windows build host, with the following
message:
Environment mismatch: Expected ID 3, received: {106,UTF-8}
Temporarily XFAIL this test until the problem is investigated.
See #141312 for details.
AMDGPU: Migrate unittests to subarch triples
Replace specifying a processor name with the triple
subarch.
The register-limit helpers in AMDGPUUnitTests.cpp that enumerate every
valid CPU via fillValidArchListAMDGCN still pass the CPU explicitly, as
does the MC Disassembler smoke test (its C disassembler API derives the
subtarget from the CPU, not the triple subarch).
Co-authored-by: Claude (Opus 4.8) <noreply at anthropic.com>
clang/AMDGPU: Stop passing redundant -target-cpu to cc1
Now that the exact target is encoded in the triple's subarch field,
-target-cpu is redundant. Dropping it avoids polluting the resulting IR with
unwanted "target-cpu" attributes. The net result is the desired codegen
when compiling libraries for a major subarch and linking them into a
program compiled for a specific arch. For example, compiling for "gfx9-generic"
would pollute the IR with "target-cpu"="gfx9-generic", so codegen
would ultimately be performed for the generic target even after
linking into the concrete gfx9 CPU. The specialization will now be
achieved by merging the triples, without the linker or optimization
passes needing to fix up function attributes.
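A tiny model can illustrate why merging targets yields the specialized one. It assumes that a generic major subarch simply lists the concrete ISAs it covers. `Target`, `mergeTargets`, and the two-entry coverage set used for `gfx9-generic` in the test are hypothetical and illustrative; they are not LLVM's `Triple` API, and the set is not the real coverage list.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Hypothetical model, not LLVM's Triple API: a target is either a concrete
// ISA (empty Covers) or a generic major subarch covering several ISAs.
struct Target {
  std::string Name;
  std::set<std::string> Covers;
  bool isGeneric() const { return !Covers.empty(); }
};

// Resolve the target for code built for A that is linked with code built for B.
// A generic target yields to a concrete ISA it covers, so the merged result is
// the concrete one; unrelated targets cannot be merged.
std::optional<std::string> mergeTargets(const Target& A, const Target& B) {
  if (A.Name == B.Name)
    return A.Name;
  if (A.isGeneric() && A.Covers.count(B.Name))
    return B.Name;
  if (B.isGeneric() && B.Covers.count(A.Name))
    return A.Name;
  return std::nullopt;  // incompatible targets
}
```

Because the result is decided by the targets alone, no per-function attribute needs rewriting after linking in this model.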
clang/AMDGPU: Validate -target-cpu in cc1 is valid for the subarch
Restrict the reported list of valid target-cpus based on the triple's
subarch. This is more consistent with how other targets validate the
target CPU name. Currently, validation of the target name for the triple
is split between the driver and here. The driver-based diagnostic
seems to be an amdgpu-ism in two different places (though there is one ARM
validation emitting the same diagnostic). In the future we could probably
drop those.
clang: Start using new amdgpu subarch triples
Fix up invocations using --target=amdgcn + -mcpu to introduce
the subarch in the triple.
For offload toolchains, a single toolchain is constructed for the
top level amdgpu architecture, and the effective triple is used for
target specific tool invocations.
The specifics of the resource directory layout are TBD. This does
try to find resources in the subarch-named directory, but the paths
are searched at toolchain creation time, so that does not work
when there are multiple subarches.
Fixes #154925
AMDGPU: Introduce amdgpu triple arch
Move towards using the triple for representing incompatible
ISA changes. Use the subarch field to represent the various
incompatible cases. Previously we pretended a single triple arch
was universally compatible, and only distinguished by function
level subtargets. Move towards using distinct triples to enable
more sophisticated toolchain handling in the future, like proper
runtime library linking.
Introduce a new subarch per unique ISA, but also introduce
"major subarches", each compatible with a set of covered
minor ISA versions. These map to the existing generic targets.
There are a few placeholder subarch entries, which currently
lack backing generic arches for codegen.
This should be the preferred triple arch name going forward,
but is treated as an alias of amdgcn. This does not yet change
clang to emit the new triples.
[2 lines not shown]
[CIR][AArch64] Lower vfma and scalar FMA lane builtins (#204819)
Lower additional AArch64 NEON fused multiply-accumulate builtins in CIR.
This covers:
* `BI__builtin_neon_vfma_v`: `vfma_f16`, `vfma_f32`, `vfma_f64`
* `BI__builtin_neon_vfma_lane_v`: `vfma_lane_f16`, `vfma_lane_f32`,
`vfma_lane_f64`
* `BI__builtin_neon_vfmah_lane_f16`: `vfmah_lane_f16`
* `BI__builtin_neon_vfmah_laneq_f16`: `vfmah_laneq_f16`
* `BI__builtin_neon_vfmas_lane_f32`: `vfmas_lane_f32`
* `BI__builtin_neon_vfmas_laneq_f32`: `vfmas_laneq_f32`
`vfma_v` and `vfma_lane_v` reuse the corresponding quad FMA lowering
paths. Vector lane forms splat the selected vector element, while scalar
lane/laneq forms extract the selected lane before emitting `llvm.fma`.
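The difference between the vector and scalar lane forms can be captured by a plain C++ reference model of what the emitted code must compute. The model assumes the ACLE definition `result = a + b * v[lane]` with a single rounding, which is why it uses `std::fma`. The `_ref` functions are hypothetical helpers for illustration; they are not part of CIR or `arm_neon.h`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Scalar reference model of the semantics the lowering must preserve.
using f32x2 = std::array<float, 2>;

// Vector lane form, e.g. vfma_lane_f32(a, b, v, lane):
// splat v[lane] across all lanes, then one fused multiply-add per lane.
f32x2 vfma_lane_f32_ref(f32x2 a, f32x2 b, f32x2 v, std::size_t lane) {
  const float s = v[lane];  // the splatted element
  f32x2 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = std::fma(b[i], s, a[i]);  // a[i] + b[i] * s with a single rounding
  return r;
}

// Scalar lane form, e.g. vfmas_lane_f32(a, b, v, lane):
// extract v[lane], then emit a single scalar fused multiply-add.
float vfmas_lane_f32_ref(float a, float b, f32x2 v, std::size_t lane) {
  return std::fma(b, v[lane], a);
}
```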
Represent NEON `Poly128` as a 16-byte integer vector in CIR, matching
[9 lines not shown]
[CIR] Rewrite `ConstRecordBuilder` to be based on layout (#206137)
Note: this is a pretty sizable change, and for that I apologize.
Fortunately, most of it is test changes, and the actual generation code
is fairly manageable. Unfortunately, this couldn't really get any smaller,
as the individual differences (unions, arrays, etc.) all cascade.
The existing implementation of the ConstRecordBuilder mirrors the
ConstStructBuilder from classic codegen. In both cases, this type
attempts to lay out the record/struct type in a way that is
byte-compatible with the actual struct layout, for the purpose of
building a constant.
This has a few problems:
First: it is another layout that we have to keep in sync with the main
one.
Second: it adds an additional layer of complexity to the IR in a way
[43 lines not shown]
[RISCV] Don't run insertVSETMTK if +xsfmmbase isn't present (#206426)
I'm not familiar with xsfmmbase, but IIUC we don't need to run those
passes if the extension isn't present, so skipping them saves some
compile time when the feature isn't enabled.
Some tests, such as sifive-O0-ATM-ATK.ll, were missing the extension, so
I've added the feature, which caused one of the vsetvlis to turn into a sf.vsettnt.
[libc++] Implement LWG4125: `move_iterator`'s default constructor should be constrained (#206427)
- The default constructor is not yet defaulted as indicated by the
resolution of LWG4125, because the effects of defaulted-ness are not
strictly observable. Also, it's arguably not feasible to use the default
member initializer `= _Iter()` because it might cause non-conforming
behavior in pre-C++17 modes, due to the lack of guaranteed RVO.
- In the test file, `NoDefaultCtr` is replaced with a proper iterator
type `cpp20_input_iterator<int*>` to make the test more pedantically
conforming.
[TableGen] Add support for sparse direct-lookup tables (#201158)
This change is motivated by the recent AMDGPU patch (#200241) where a
sparse direct-lookup table is generated manually to improve VOPD
eligibility lookup. The `GenericTable` only generates a direct lookup
for contiguous tables, falling back to binary search for non-contiguous
key spaces. This patch extends the `GenericTable` to support sparse tables.
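The two lookup schemes can be contrasted in plain C++. `Row`, `SparseIndex`, `buildSparse`, `lookupSparse`, and `lookupBinary` are hypothetical names, not the code TableGen emits. The sketch assumes a single integer primary key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row type; the real table rows are generated by TableGen.
struct Row {
  std::uint16_t Key;
  bool IsX;
  bool IsY;
};

// Fallback for non-contiguous keys: binary search over rows sorted by Key.
const Row* lookupBinary(const std::vector<Row>& Rows, std::uint16_t Key) {
  auto It = std::lower_bound(Rows.begin(), Rows.end(), Key,
                             [](const Row& R, std::uint16_t K) { return R.Key < K; });
  return (It != Rows.end() && It->Key == Key) ? &*It : nullptr;
}

// Sparse direct lookup: one slot per key in [MinKey, MinKey + Slots.size()),
// holding an index into Rows, or -1 for keys that have no row.
struct SparseIndex {
  std::uint16_t MinKey = 0;
  std::vector<std::int32_t> Slots;
};

SparseIndex buildSparse(const std::vector<Row>& Rows) {
  SparseIndex Idx;
  if (Rows.empty())
    return Idx;
  auto [Lo, Hi] = std::minmax_element(
      Rows.begin(), Rows.end(),
      [](const Row& A, const Row& B) { return A.Key < B.Key; });
  Idx.MinKey = Lo->Key;
  Idx.Slots.assign(static_cast<std::size_t>(Hi->Key - Lo->Key) + 1, -1);
  for (std::size_t I = 0; I < Rows.size(); ++I)
    Idx.Slots[Rows[I].Key - Idx.MinKey] = static_cast<std::int32_t>(I);
  return Idx;
}

// O(1) lookup: a range check plus one array access, instead of O(log n) compares.
const Row* lookupSparse(const SparseIndex& Idx, const std::vector<Row>& Rows,
                        std::uint16_t Key) {
  if (Key < Idx.MinKey)
    return nullptr;
  const std::size_t Offset = static_cast<std::size_t>(Key - Idx.MinKey);
  if (Offset >= Idx.Slots.size())
    return nullptr;
  const std::int32_t Slot = Idx.Slots[Offset];
  return Slot < 0 ? nullptr : &Rows[static_cast<std::size_t>(Slot)];
}
```

The sparse array spends one slot per key in the span between the smallest and largest key, so its memory cost grows with the key range rather than with the number of rows. This is why such a scheme only pays off for bounded key spaces.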
Currently it is implemented as an opt-in feature; however, the long-term
vision is to heuristically decide the best lookup scheme. Setting
`DisallowSparseTable = false` opts a table in: TableGen will emit a
sparse array as long as certain conditions are met (key space <= 4K
entries, single primary `bits<>` key, no secondary search indices,
etc.). For example:
```
def VOPDXYTable : GenericTable {
let FilterClass = "VOPDXY";
let CppTypeName = "VOPDXYInfo";
let Fields = ["VOPDXYKey", "IsX", "IsY"];
[62 lines not shown]
[mlir][build] Pass extra compiler/linker flags to `add_mlir_python_modules` (#204230)
`add_mlir_python_modules()` is used to build the nanobind runtime
library and dialect extension DSOs with a default set of compile and
link options. Projects using these modules may need to augment these
options to handle the warnings of this specific part of the build
differently or to add specific libraries to the shared objects. This
patch introduces four optional CMake variables that allow callers to
provide additional compiler and linker flags:
```
MLIR_BINDINGS_PYTHON_EXTRA_NANOBIND_COMPILE_OPTIONS
MLIR_BINDINGS_PYTHON_EXTRA_NANOBIND_LINK_LIBS
MLIR_BINDINGS_PYTHON_EXTRA_EXTENSION_COMPILE_OPTIONS
MLIR_BINDINGS_PYTHON_EXTRA_EXTENSION_LINK_LIBS
```
NFC if the new CMake variables are unset.
[flang] Treat a bare carriage return as whitespace outside literals (#206171)
Free-form (and fixed-form) source files authored or transferred on
Windows can contain a stray carriage return (0x0d or \r) in the interior
of a line, for example between a token and a trailing '&' continuation
marker:
COMMON/CM1/<CR> &
flang rejected such files with "bad character (0x0d) in Fortran token",
whereas other compilers accept them. Carriage returns that form a
clean CRLF line ending were already tolerated; only a CR in the
significant part of a line tripped the error.
The fix is to skip over the stray <CR> outside of character literals.
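The rule can be sketched as a single pass over one source line that tracks whether it is inside a character literal. `blankStrayCR` is a hypothetical helper, not flang's prescanner. It assumes literals are delimited by `'` or `"` with doubled delimiters as escapes, and it ignores comments and literals continued across lines.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: turn a stray '\r' into a blank when it appears outside
// a character literal, and leave it untouched inside one.
std::string blankStrayCR(const std::string& Line) {
  std::string Out;
  Out.reserve(Line.size());
  char Quote = 0;  // active literal delimiter (' or "), or 0 outside a literal
  for (char C : Line) {
    if (Quote != 0) {
      // A doubled delimiter ('') closes and immediately reopens the literal,
      // so simple toggling tracks it correctly.
      if (C == Quote)
        Quote = 0;
    } else if (C == '\'' || C == '"') {
      Quote = C;
    } else if (C == '\r') {
      C = ' ';  // treat the stray carriage return as ordinary whitespace
    }
    Out.push_back(C);
  }
  return Out;
}
```

Applied to the `COMMON/CM1/<CR> &` line, the carriage return simply becomes a blank before the continuation marker, while a CR inside `'...'` is preserved as literal content.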
Assisted-by: AI
[libc++] LWG4557: Remove `constexpr` from `owner_less` and `owner_before` (#191534)
Closes #189885.
The implementation was already conformant. Adding status tracking
updates.