[flang] Fix missed access group attribute when converting FIR to LLVM dialect. (#195376)
Apply the access group attribute to the memcpy generated when lowering
fir.load/fir.store of a box, if the original FIR operation had it.
[asan] Change error to note when poison record is not found (#195669)
When `CheckPoisonRecords` fails to find a record, it's often due to the
history buffer being too small rather than a functional error in the
logic.
[GIsel] Add combine (sub a, (mul x, C)) -> (add a, (mul x, -C)) (#194282)
Copy this canonicalization from InstCombine so it can run on
post-legalized expansions. This is especially useful if the sub is a
neg.
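
The rewrite holds in fixed-width wrapping arithmetic, because adding
x times the negated constant is the same as subtracting x times the
constant. A minimal Python sketch of the identity, assuming a 32-bit
width (the width and the helper names are illustrative and not taken
from the patch):

```python
MASK = (1 << 32) - 1  # assumed 32-bit register width, for illustration only

def sub_of_mul(a, x, c):
    """Original form: a - x * c, wrapped to 32 bits."""
    return (a - x * c) & MASK

def add_of_mul_neg(a, x, c):
    """Combined form: a + x * (-c), with -c wrapped to 32 bits first."""
    neg_c = (-c) & MASK
    return (a + x * neg_c) & MASK
```

With a = 0 the original form is a plain negate of the product, and the
combined form reduces to a single multiply by the negated constant,
which is presumably why the neg case benefits most.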
CI: FreeBSD 15.1 PRERELEASE (#18490)
Update freebsd15-0s builder to freebsd15-1s and point it at the
15.1-PRERELEASE tag. The previous freebsd-15.0-STABLE images are
no longer available.
Additionally, add a freebsd15-0r stanza for the RELEASE.
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
[MLIR][AMDGPU] Add amdgpu.global_transpose_load op for gfx1200+ global memory transpose loads (#195287)
Adds a new `amdgpu.global_transpose_load` op to the AMDGPU dialect that
wraps the `global_load_tr` family of instructions introduced in RDNA4
(gfx1200+). Each thread reads a column of a matrix from global memory
and receives the corresponding transposed row in its result register.
The op is kept separate from the existing `amdgpu.transpose_load` (which
targets LDS via `ds_read_tr` on gfx950+) because the two variants target
different GPU architecture families, have different chipset
requirements, and differ in their valid (element size, num elements)
combinations — in particular the 16-bit case produces a 128-bit
(8-element) result via `global_load_tr.b128` rather than the 64-bit
(4-element) result from `ds_read_tr16.b64`.
Lowering to the existing ROCDL `global.load.tr{4,6,.}.b{64,96,128}`
intrinsics added for gfx1200+.
---------
[2 lines not shown]
execve: Add guard pages around execve KVA buffers
This helps ensure that overflows will trigger a panic instead of
silently corrupting adjacent buffers, as happened in SA-26:13.exec.
Extend kmap_alloc_wait() to support allocation of guard pages on both
sides of a KVA allocation. Modify the exec_map setup accordingly. Add
the "vm.exec_map_guard_pages" tunable to provide control over the guard
page allocations.
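
As a rough model of the idea (not the kernel code), the Python sketch
below brackets a buffer with guard pages and checks whether an address
lands in one. The page size, function names, and layout arithmetic are
assumptions made for illustration; the actual kmap_alloc_wait()
interface is not reproduced here.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only

def guarded_layout(base, size, guard_pages):
    """Return (usable_start, usable_end, total_end) for a buffer
    with guard_pages unmapped pages on each side."""
    guard_bytes = guard_pages * PAGE_SIZE
    usable_start = base + guard_bytes
    usable_end = usable_start + size
    return usable_start, usable_end, usable_end + guard_bytes

def hits_guard(addr, base, size, guard_pages):
    """True if addr falls in the leading or trailing guard region."""
    start, end, total_end = guarded_layout(base, size, guard_pages)
    return base <= addr < start or end <= addr < total_end
```

In this model, a write one byte past the usable region lands in the
trailing guard region rather than in a neighbouring buffer, which is
the property that turns silent corruption into a fault.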
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D56711
[mlir][MathToLLVM] Fix vector type checks in math.absi lowering. (#195360)
For vector types, the lowered type is LLVMArrayType not VectorType. We
should use the original result type to guide if we can do the lowering
for vectors or not.
Signed-off-by: hanhanW <hanhan0912 at gmail.com>
[mlir][SPIRV] Add named-barrier type and OpNamedBarrierInitialize / OpMemoryNamedBarrier (#195664)
Adds the SPIR-V named-barrier object (TypeNamedBarrier) along with
NamedBarrierInitialize and MemoryNamedBarrier ops, gated on the
NamedBarrier capability and SPIR-V 1.1+.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Fix long POSIX_FADV_DONTNEED for single block files
dbuf_whichblock() is not made to handle offsets beyond the block
end for single-block objects. Handle it in dmu_evict_range(),
similar to dmu_prefetch_by_dnode().
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18399
Closes #18489
[flang][semantics] Add a flag to relax some of the semantic constraints on C_LOC (#195112)
This PR adds a flag that relaxes some of the semantic constraints on
C_LOC so that it can be used more like LOC. Without the flag, behavior
is unchanged; with the flag, the constraint that the address be an
object pointer or target is removed. There are other constraints we
might consider relaxing, but I think this is a start.
[clang][NFC] Mark CWG2785 as implemented and add a test (#195547)
[CWG2785](https://wg21.link/cwg2785) clarifies that a
*requires-expression* is never type-dependent: it always has type
`bool`. That means that in a snippet like this:
```cpp
void g(void *);
template <typename T>
void f() {
g(requires { T(); });
}
```
The call to `g` should be diagnosed as invalid (`bool` is not
convertible to `void *`) even if the template is never instantiated.
Clang has done the right thing since version 10:
https://godbolt.org/z/s61rEbsfz
[flang][CUDA] Only apply implicit managed attribute when CUDA Fortran is enabled (#195353)
The implicit-managed tagging added in #175648 was intended for CUDA
Fortran allocatables. However, the gate was just
LanguageFeature::CudaManaged, so the tagging also fires on
non-CUDA-Fortran translation units when -gpu=mem:managed is in effect.
This patch adds a LanguageFeature::CUDA check so the implicit tagging
only fires for CUDA Fortran TUs (driver-set -fcuda or .cuf/.CUF source).
Adds a regression test checking that bbc -gpu=managed without -fcuda on
a .f90 source does not produce any cuf.* ops or #cuf.cuda<managed>
attributes.