[CIR] Fix getNewInitValue on string-literal arrays
`getNewInitValue` in `CIRGenModule.cpp` rebuilds a global's initializer when
`replaceGlobal` fixes up references after a global's type changes -- for
example when an `extern` array is referenced while still incomplete and then
completed. Its `ConstArrayAttr` branch cast `getElts()` to an `mlir::ArrayAttr`,
but a `ConstArrayAttr` built from a string literal stores its elements as a
`StringAttr`. A struct global that both points at the replaced global and has a
`char` array member therefore aborted on a failed `cast<ArrayAttr>` during
CIRGen.
`ConstArrayAttr::verify` allows only two element kinds: an `ArrayAttr` or a
`StringAttr`. A `StringAttr` holds raw 8-bit bytes and references no globals, so
there is nothing to rewrite. The fix returns the initializer unchanged for the
`StringAttr` case and `cast`s on the `ArrayAttr` path, so a future third element
kind asserts rather than silently passing through.
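The two-way dispatch described above can be sketched outside MLIR with a `std::variant`. This is only an analogy under stated assumptions: `Elements`, `rewriteElements`, and the use of symbol names as array elements are hypothetical and do not come from `CIRGenModule.cpp`.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for the two element kinds: raw string bytes, or a list
// of element initializers (modelled here as plain symbol names).
using Elements = std::variant<std::string, std::vector<std::string>>;

// Rebuild an array initializer after OldSym has been replaced by NewSym.
inline Elements rewriteElements(const Elements &Elts, const std::string &OldSym,
                                const std::string &NewSym) {
  // String payloads reference no globals, so they pass through unchanged.
  if (std::holds_alternative<std::string>(Elts))
    return Elts;

  // If a third alternative were ever added to Elements, std::get would throw
  // std::bad_variant_access here instead of silently passing it through
  // (loosely analogous to a failed cast<>).
  std::vector<std::string> Rewritten = std::get<std::vector<std::string>>(Elts);
  for (std::string &Sym : Rewritten)
    if (Sym == OldSym)
      Sym = NewSym;
  return Rewritten;
}

// Usage: a string payload is returned as-is.
inline void rewriteElementsExample() {
  Elements Str = std::string("abc");
  assert(std::get<std::string>(rewriteElements(Str, "old", "new")) == "abc");
}
```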
This surfaced while compiling CPython's deep-frozen module data (SPEC CPU 2026
714.cpython_r), where frozen objects cross-reference each other and carry string
payloads. The benchmark advances past this abort to a const-record type-identity
issue that is tracked separately.
NAS-141457 / 26.0.0-RC.1 / V-series rear-bay enclosure support via bifurcated NTG SES partition
Adds enclosure2.query support for V-series rear bays (V140, V160, V260,
V280) served by the bay-serving half of the bifurcated PEX89032 NTG
chip. The other half has no drives and is dropped from enclosure2.query
— discriminated by Array Device Slot descriptor labels ('slot01'..'slot04'
identifies the bay-serving partition; '<empty>' identifies the no-drives
half). Both halves share the same vendor / product / encid, so descriptor
labels are the only discriminator.
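A label-based discriminator of this kind might look like the following sketch. It is written in C++ purely for illustration and is not the middleware's implementation; `isSlotLabel` and `partitionOwnsBays` are hypothetical names, and the "slot plus two digits" pattern is an assumption generalized from the labels quoted above.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// True if Label has the form "slotNN" (the prefix "slot" plus two digits).
inline bool isSlotLabel(const std::string &Label) {
  return Label.size() == 6 && Label.compare(0, 4, "slot") == 0 &&
         std::isdigit(static_cast<unsigned char>(Label[4])) &&
         std::isdigit(static_cast<unsigned char>(Label[5]));
}

// Hypothetical discriminator: a partition owns bays if any of its Array Device
// Slot descriptor labels names a slot. A partition whose labels are all
// "<empty>" is treated as the no-drives half.
inline bool partitionOwnsBays(const std::vector<std::string> &Labels) {
  for (const std::string &Label : Labels)
    if (isSlotLabel(Label))
      return true;
  return false;
}

// Usage with the two label sets described in the text.
inline void partitionOwnsBaysExample() {
  assert(partitionOwnsBays({"slot01", "slot02", "slot03", "slot04"}));
  assert(!partitionOwnsBays({"<empty>", "<empty>"}));
}
```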
Changes:
- ses_enclosures2 adds _VSERIES_REAR_PRODUCTS, the
_vseries_rear_partition_owns_bays discriminator, and
_initialize_v_series_rear_enclosures (picks the bay-serving partition,
tags it slot_designation='REAR', drops the no-drives partition).
get_ses_enclosures grows a deferred_rear bucket and dispatches via the
unified _initialize_v_series_enclosures wrapper.
[21 lines not shown]
NAS-141457 / 26.0.0-RC.1 / V-series V2xx front-bay enclosure support
Adds enclosure2.query support for V2xx (V260/V280) front-bay drives,
which are served by a single Broadcom PEX89088 PCIe switch chip
partitioned into two SES VirtualSES enclosures (replacing V1xx's dual
9600w-12i4e SAS HBAs).
Key V2xx differences handled here:
- The two PEX89088 SES partitions advertise the SAME encid (SAS
address), so the V1xx encid-comparison disambiguation fails.
ses_enclosures2 now falls back to inspecting Array Device Slot element
descriptor labels ('slot01'..'slot12' identifies the NVME0 partition;
'slot13'..'slot24' identifies NVME8).
- V2xx slot indexing differs from V1xx: each partition exposes its 12
owned slots at libsg3 element keys 1-12 (NVME0) or 13-24 (NVME8),
with sysfs slot files matching the key 1:1. slot_mappings now branches
by enc.product so the V2xx table is picked for ECStream 4IXGA-SWp/s.
[12 lines not shown]
[AMDGPU] Fold constant offsets into named barrier addresses
Allow isOffsetFoldingLegal to fold a constant offset into an LDS
named-barrier global, and include the node offset when materializing the
LDS address in LowerGlobalAddress. s_barrier_signal_var on a GEP'd named
barrier now selects the immediate form, matching a bare global and GlobalISel.
With object linking the offset folds into the relocation addend.
The barrier ID is derived from the address via (addr >> 4) & 0x3F, so a
byte offset that does not land on a 16-byte barrier boundary is still
valid: it simply selects the containing barrier. No alignment assertion
is needed, and such offsets must not crash the compiler (see the
misaligned test).
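The ID arithmetic can be checked with a small standalone sketch. Only the `(addr >> 4) & 0x3F` formula comes from the text above; the helper name `barrierIdFromAddress` and the sample addresses are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: derive a named-barrier ID from an LDS byte address
// using the formula quoted above, (addr >> 4) & 0x3F.
constexpr uint32_t barrierIdFromAddress(uint32_t Addr) {
  return (Addr >> 4) & 0x3F;
}

inline void barrierIdExamples() {
  // A 16-byte-aligned address selects the barrier that starts there.
  assert(barrierIdFromAddress(0x20) == 2);
  // A misaligned address is still valid: it selects the containing barrier.
  assert(barrierIdFromAddress(0x2C) == 2);
  // The next 16-byte boundary moves on to the next barrier.
  assert(barrierIdFromAddress(0x30) == 3);
}
```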
Change-Id: I639bc723eb001573585cc05d0ad19f2773054f21
Assisted-by: Cursor
[OpenACC] append triples when materializing reduction destroy recipes (#207034)
Append the triples if they exist when materializing the destroy region.
[LLVM][Tablegen] Add Default arguments support for Intrinsics in TableGen (#198557)
This patch adds LLVM infrastructure to support default argument values
for intrinsics, modeled after C++ default arguments.
The motivation is to simplify the evolution of intrinsic signatures by
removing the hand-written AutoUpgrade boilerplate that intrinsic authors
must repeat for every new trailing parameter.
This feature extends the existing TableGen `ImmArg<>` property with an
optional `DefaultValue<V>`, written as `ImmArg<ArgIndex<N>,
DefaultValue<V>>`, so an intrinsic author can declare a default value
alongside an immediate argument. When older .ll or .bc IR is loaded,
AutoUpgrade fills in the missing trailing arguments from the
TableGen-generated default-value table; no per-intrinsic hand-written
upgrade case is required.
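The table-driven fill-in step could be sketched as follows. This is a simplified model, not the AutoUpgrade code: `DefaultTable` and `fillTrailingDefaults` are hypothetical names, and arguments are modelled as plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a generated default-value table: entry I holds the
// default for parameter I, or nullopt if that parameter has no default.
using DefaultTable = std::vector<std::optional<int64_t>>;

// Append defaults for any missing trailing arguments. Returns false and leaves
// Args unchanged if there are too many arguments or a missing parameter has no
// default.
inline bool fillTrailingDefaults(std::vector<int64_t> &Args,
                                 const DefaultTable &Defaults) {
  if (Args.size() > Defaults.size())
    return false;
  const std::size_t FirstMissing = Args.size();
  for (std::size_t I = FirstMissing; I < Defaults.size(); ++I)
    if (!Defaults[I])
      return false;
  for (std::size_t I = FirstMissing; I < Defaults.size(); ++I)
    Args.push_back(*Defaults[I]);
  return true;
}

// Usage: an older call site passes one argument; two trailing ones are filled.
inline void fillTrailingDefaultsExample() {
  std::vector<int64_t> Args = {7};
  DefaultTable Defaults = {std::nullopt, 0, 4};
  bool Ok = fillTrailingDefaults(Args, Defaults);
  assert(Ok && Args.size() == 3 && Args[1] == 0 && Args[2] == 4);
  (void)Ok;
}
```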
Link to the RFC, where this feature will be discussed:
https://discourse.llvm.org/t/rfc-default-argument-support-for-llvm-intrinsics/90839
Fix MSVC build after #206326 (#206993)
MSVC defaults to a non-UTF-8 encoding, and we added Unicode characters to
some test files, which causes CI failures.
zfs: fix SIMD defines to match OpenZFS HAVE_SIMD() macro
The OpenZFS merge 80aae8a3f8aa introduced HAVE_SIMD() which checks for
HAVE_TOOLCHAIN_* defines via simd_config.h. The kernel module Makefile
was updated, but kern.pre.mk (static kernel build) and the libzpool/libzfs
Makefiles were missed, still using the old HAVE_SSE2 etc. names. This
caused all vectorized raidz, fletcher, and blake3 implementations to be
compiled out.
[AMDGPU] Fix SIFoldOperands crash on REG_SEQUENCE with physical register (#206976)
The getRegSeqInit function crashes on REG_SEQUENCE instructions with
physical register inputs.
Since neither optimization that uses getRegSeqInit needs to handle
such REG_SEQUENCE instructions, this patch changes the function to
return nullptr, which signals that the optimization should not happen.
[Flang][OpenMP] Emit TODO for non-rectangular collapsed loop nests (#205558)
Non-rectangular loop nests (where an inner loop bound depends on an
outer IV) with collapse currently generate incorrect code that segfaults
at runtime, since OpenMPIRBuilder's collapseLoops assumes rectangular
iteration spaces.
Added a check during lowering to detect the non-rectangular case and
emit a clear "not yet implemented" error instead of silently producing
broken code.
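To see why a rectangular assumption breaks down, compare the trip counts of a triangular nest with the product of the full loop ranges. The sketch below is illustrative only: the loop shape (`j` running from `i` to `N`) and both function names are assumptions, not taken from the patch.

```cpp
#include <cassert>

// Count iterations of a triangular nest: for I in [1, N], for J in [I, N].
constexpr long triangularTripCount(long N) {
  long Count = 0;
  for (long I = 1; I <= N; ++I)
    for (long J = I; J <= N; ++J)
      ++Count;
  return Count;
}

// Trip count obtained by treating the same nest as a rectangular N x N space.
constexpr long rectangularTripCount(long N) { return N * N; }

inline void tripCountExample() {
  // For N = 10 the triangular nest runs 55 iterations, not 100, so a collapsed
  // loop sized as N * N would visit index pairs outside the original space.
  assert(triangularTripCount(10) == 55);
  assert(rectangularTripCount(10) == 100);
}
```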
Minimal reproducer:
```
program repro
implicit none
integer, parameter :: N = 10
integer :: arr(N,N), i, j
arr = 0
[23 lines not shown]
[Clang] Add FileCheck lines to atomic-auto-type codegen test (#206749)
Follow-up to #197874. The codegen test was missing FileCheck lines,
making it a no-op. Add checks that verify both __auto_type _Atomic and
_Atomic __auto_type produce correct allocas and stores.
Follow up to comment from:
https://github.com/llvm/llvm-project/pull/197874#discussion_r3496368126
[Support] Fix VersionTuple DenseMapInfo conformance (#206872)
In the C++ standard library (and the standard libraries of many other
programming languages), two values that compare equal must produce the same
hash. This requirement is common because it lets associative containers quickly
find values that might be equal by calculating the hash; if the
requirement does not hold, associative containers might not work as
expected.
The documentation of `DenseMapInfo` does not specify the same
requirement, as far as I can see, but its use in `DenseMap` relies on
it; otherwise objects that compare equal might end up in different buckets and
will not be found.
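The contract can be illustrated with a simplified type. This is a sketch under stated assumptions: `SimpleVersion` and both hash functions are hypothetical and do not mirror the real `VersionTuple` layout, and hashing only the compared fields is just one way to restore consistency, not necessarily what the patch does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type whose equality ignores one of its members.
struct SimpleVersion {
  unsigned Major = 0;
  unsigned Minor = 0;
  bool HasMinor = false; // deliberately ignored by operator==
};

inline bool operator==(const SimpleVersion &A, const SimpleVersion &B) {
  return A.Major == B.Major && A.Minor == B.Minor;
}

// Non-conforming: mixes in a field that equality ignores, so two equal values
// can hash differently.
inline std::size_t hashWithFlag(const SimpleVersion &V) {
  return (static_cast<std::size_t>(V.Major) * 31u + V.Minor) * 2u +
         (V.HasMinor ? 1u : 0u);
}

// Conforming: hashes only the fields that equality compares.
inline std::size_t hashComparedFields(const SimpleVersion &V) {
  return static_cast<std::size_t>(V.Major) * 31u + V.Minor;
}

inline void hashContractExample() {
  SimpleVersion A{1, 0, false};
  SimpleVersion B{1, 0, true};
  assert(A == B);                                         // equal values...
  assert(hashWithFlag(A) != hashWithFlag(B));             // ...different hashes
  assert(hashComparedFields(A) == hashComparedFields(B)); // contract holds
}
```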
The `DenseMapInfo<VersionTuple>` implementation of `getHashValue` used
its own hashing logic but delegated to `VersionTuple`
for equality. `VersionTuple` equality partially compares its scalar
member variables, skipping some boolean member variables, but the
`getHashValue` implementation was using those boolean member variables
[11 lines not shown]
clang/AMDGPU: Stop passing redundant -target-cpu to cc1
Now that the exact target is encoded in the triple's subarch field,
-target-cpu is redundant. This avoids polluting the resultant IR with
unwanted "target-cpu" attributes. The net result is the desired codegen
when compiling libraries for a major subarch and linking them into a
program compiled for a specific arch. For example, compiling for "gfx9-generic"
would pollute the IR with "target-cpu"="gfx9-generic", so codegen
would ultimately be performed for the generic target even after
linking into the concrete gfx9 cpu. The specialization will now be
achieved by merging the triples without the linker or optimization
passes needing to fix up function attributes.
clang/AMDGPU: Validate -target-cpu in cc1 is valid for the subarch
Restrict the reported list of valid target-cpus based on the triple's
subarch. This is more consistent with how other targets validate the
target CPU name. Currently, validation of the target name for the triple
is split between the driver and here. The driver-based diagnostic
seems to be an AMDGPU-ism in two different places (though there is one ARM
validation emitting the same diagnostic). In the future we could probably
drop those.