[clang][bytecode] Diagnose copying empty mutable unions (#195529)
We had a special case for copy/move ctors of empty unions. Remove that.
Everything else is just so we don't regress diagnostics.
[X86] vector-reduce-* - add 32-bit test coverage to the minmax tests (#195617)
The horizontal-reduce-* tests already have 32-bit coverage but they will be retired soon.
[IR] Add require-logical-module module flag (#193502)
This module flag is optional and can be set to require the use of
logical alloca/gep instructions.
This flag has two uses:
- tell optimizations which flavor of GEP/alloca to emit
- fail loudly if a GEP/alloca is emitted in a module targeting logical.
[CIR] Replace nsw/nuw unit attrs with OverflowFlags BitEnum
Combine the separate `no_signed_wrap` and `no_unsigned_wrap` unit
properties on arithmetic ops into a single `OverflowFlags` BitEnum
(`nsw`, `nuw`). This allows combined flags to be written as
`nsw|nuw` in assembly, replaces the per-flag verification traits
with a single `OverflowFlagsRequireIntType` predicate, and folds
the two `HasAtMostOneOfAttrs` checks into one
`SatExclusiveWithOverflowFlags` predicate.
The bit layout matches `mlir::LLVM::IntegerOverflowFlags`, so
lowering casts the value directly and asserts the layout via
static_assert.
Updates IncOp/DecOp/MinusOp builders, CIRGenExprScalar, and
LowerItaniumCXXABI to the new API. Adds round-trip and
verification tests in clang/test/CIR/IR/.
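A minimal sketch of the combined-flag idea, not the actual CIR code: the two enums below are hypothetical stand-ins for the CIR `OverflowFlags` BitEnum and for `mlir::LLVM::IntegerOverflowFlags`, and their numeric values are assumed for illustration. It shows why a matching bit layout lets lowering cast directly, with a static_assert guarding the assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the CIR-side BitEnum; names and values are assumed.
enum class OverflowFlags : std::uint32_t { None = 0, NSW = 1u << 0, NUW = 1u << 1 };

// Hypothetical stand-in for mlir::LLVM::IntegerOverflowFlags; values are assumed.
enum class LLVMOverflowFlags : std::uint32_t { none = 0, nsw = 1u << 0, nuw = 1u << 1 };

constexpr std::uint32_t bits(OverflowFlags f) { return static_cast<std::uint32_t>(f); }
constexpr std::uint32_t bits(LLVMOverflowFlags f) { return static_cast<std::uint32_t>(f); }

// Combine flags the way `nsw|nuw` is written in assembly.
constexpr OverflowFlags operator|(OverflowFlags a, OverflowFlags b) {
  return static_cast<OverflowFlags>(bits(a) | bits(b));
}

// The direct cast in lowerOverflowFlags is only valid while both layouts agree.
static_assert(bits(OverflowFlags::NSW) == bits(LLVMOverflowFlags::nsw), "nsw bit mismatch");
static_assert(bits(OverflowFlags::NUW) == bits(LLVMOverflowFlags::nuw), "nuw bit mismatch");

// Lowering is a plain cast because the bit layouts match.
inline LLVMOverflowFlags lowerOverflowFlags(OverflowFlags f) {
  assert((bits(f) & ~3u) == 0 && "unknown overflow flag bit");
  return static_cast<LLVMOverflowFlags>(bits(f));
}

// Print the set flags joined by '|', or "none" when no flag is set.
inline std::string printOverflowFlags(OverflowFlags f) {
  std::string out;
  if (bits(f) & bits(OverflowFlags::NSW)) out += "nsw";
  if (bits(f) & bits(OverflowFlags::NUW)) {
    if (!out.empty()) out += "|";
    out += "nuw";
  }
  if (out.empty()) out = "none";
  return out;
}
```

If either enum gained or reordered a bit, the static_assert would fail at compile time instead of the cast silently producing the wrong flag.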
[AMDGPU] Support Wave Reduction for true-16 types - 3
Support true-16 versions of the reduction intrinsics.
Supported Ops: `and`, `or`, `xor`.
Supports only the iterative strategy; DPP is not yet
supported.
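For readers unfamiliar with the iterative strategy, here is a host-side sketch of the general idea, assuming it means visiting the active lanes one at a time and folding each lane's value into an accumulator. The function name, the 64-lane wave size, and the identity values are illustrative assumptions; this is not the AMDGPU lowering itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class ReduceOp { And, Or, Xor };

// Hypothetical model of an iterative wave reduction over 16-bit lane values.
// `execMask` has bit N set when lane N is active.
inline std::uint16_t waveReduceIterative(const std::array<std::uint16_t, 64>& lanes,
                                         std::uint64_t execMask, ReduceOp op) {
  // Start from the identity of the operation: all ones for and, zero for or/xor.
  std::uint16_t acc = (op == ReduceOp::And) ? 0xFFFF : 0;
  while (execMask != 0) {
    // Find the lowest active lane (a find-first-set on the mask).
    std::size_t lane = 0;
    while (((execMask >> lane) & 1u) == 0) ++lane;
    assert(lane < lanes.size());
    const std::uint16_t v = lanes[lane];
    switch (op) {
      case ReduceOp::And: acc = static_cast<std::uint16_t>(acc & v); break;
      case ReduceOp::Or:  acc = static_cast<std::uint16_t>(acc | v); break;
      case ReduceOp::Xor: acc = static_cast<std::uint16_t>(acc ^ v); break;
    }
    // Clear the lane just processed so the loop moves on to the next active lane.
    execMask &= execMask - 1;
  }
  return acc;
}
```

The loop runs once per active lane, which is the cost a DPP-based strategy would aim to reduce.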
[flang] avoid introducing iteration dependencies in WHERE and FORALL temporaries (#195053)
This patch improves the addressing of temporaries created for simple FORALL or WHERE constructs, such as the one below, so that they do not introduce iteration dependencies.
```
subroutine foo(p1, p2, mask)
real, pointer :: p1(:), p2(:)
logical :: mask(:)
where (mask) p1 = p2
end subroutine
```
Instead of using a stack-like temporary that uses a counter to push and fetch elements, the loop IVs are used directly to address the temporaries. This makes it easier to later vectorize or parallelize those loops.
This is only done when:
- This is not a FORALL with array expressions.
- The dynamic type is the same at every iteration.
- The WHERE and FORALL do not create loops of depth more than 15.
- If there are FORALLs, their strides are the constants 1 or -1.
[3 lines not shown]
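The difference between the two addressing schemes for the flang WHERE/FORALL temporaries can be sketched outside the compiler. The C++ below is an illustrative model only (function names are hypothetical and this is not the code flang generates): the first version fetches from the temporary through a running counter, so each iteration depends on the previous one; the second addresses the temporary with the loop index, so iterations are independent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counter-based temporary: `next` carries a dependency from one iteration to the next.
inline void whereWithStackTemp(std::vector<float>& p1, const std::vector<float>& p2,
                               const std::vector<bool>& mask) {
  assert(p1.size() == p2.size() && p1.size() == mask.size());
  std::vector<float> temp;
  for (std::size_t i = 0; i < p2.size(); ++i)
    if (mask[i]) temp.push_back(p2[i]);   // push the right-hand side values
  std::size_t next = 0;
  for (std::size_t i = 0; i < p1.size(); ++i)
    if (mask[i]) p1[i] = temp[next++];    // fetch them back in the same order
}

// Index-addressed temporary: every iteration touches only slot i.
inline void whereWithIndexedTemp(std::vector<float>& p1, const std::vector<float>& p2,
                                 const std::vector<bool>& mask) {
  assert(p1.size() == p2.size() && p1.size() == mask.size());
  std::vector<float> temp(p2.size());
  for (std::size_t i = 0; i < p2.size(); ++i)
    if (mask[i]) temp[i] = p2[i];
  for (std::size_t i = 0; i < p1.size(); ++i)
    if (mask[i]) p1[i] = temp[i];
}
```

Both versions produce the same result; the indexed one trades a possibly larger temporary for loops with no cross-iteration state.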
[clang][SYCL] Handle cdecl variadic functions for SYCL device (#194922)
SYCL doesn't allow variadic functions to be called from device code.
SYCL device compilation mostly uses targets that don't natively
support variadic functions, so we currently issue an error for a
variadic function that has the cdecl calling convention attribute even
if it is never called from the device. We also currently don't issue an
error when a variadic function is called from device code. This patch
defers the error caused by the cdecl attribute to the actual call point
and adds diagnosis of variadic function calls on the device side using
deferred diagnostics.
[AMDGPU] Support Wave Reduction for i16 types - 3
Supported Ops: `and`, `or`, `xor`.
Supports only the iterative strategy; DPP is not yet
supported.
Supports only Fake-16 versions of the lowering.
True-16 support is yet to be added.
[AMDGPU] Support Wave Reduction for true-16 types - 2
Support true-16 versions of the reduction intrinsics.
Supported Ops: `add`, `sub`.
Supports only the iterative strategy; DPP is not yet
supported.
[AMDGPU] Support Wave Reduction for i16 types - 2
Supported Ops: `add`, `sub`.
Supports only the iterative strategy; DPP is not yet
supported.
Supports only Fake-16 versions of the lowering.
True-16 support is yet to be added.
Revert [LICM] Remove unnecessary check during store hoisting (#195606)
This check is needed after all, to handle the case where the load
aliases only on the first iteration. Even with correct cross-iteration
handling in MSSA, it's legal to return an out-of-loop clobbering memory
access in this case.
Reverts https://github.com/llvm/llvm-project/pull/187529.
Fixes https://github.com/llvm/llvm-project/issues/195513.
[IR] Remove volatile from nosync (#194391)
Volatile operations are explicitly specified as not synchronizing...
> This is not Java’s “volatile” and has no cross-thread synchronization
behavior.
... and LLVM does not model them as being synchronizing anywhere, except
the definition of this attribute, which is largely unused outside the
Attributor.
The ordering requirements of volatile operations are already fully
encoded in their memory effects (unlike what is the case for
stronger-than-monotonic atomics).
Clarify that "nosync" is specifically in the sense of
"synchronizes-with" (rather than just any cross-thread communication)
and remove volatile operations from the definition.
[libc][math] Refactor isnan family to header-only (#195598)
Refactors the isnan math family to be header-only.
part of: #147386
Target Functions:
- isnan
- isnanf
- isnanl
Also adds `__LIBC_USE_BUILTIN_ISNAN` compiler feature detection
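As a rough illustration of what a header-only isnan with builtin detection can look like, here is a sketch under stated assumptions. It is not the LLVM libc implementation: the `sketch` namespace, the `SKETCH_USE_BUILTIN_ISNAN` macro, and the function names are hypothetical, and the fallback assumes IEEE 754 binary32 `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical feature-detection macro, standing in for the role of
// `__LIBC_USE_BUILTIN_ISNAN`; the real macro's definition is not shown here.
#if defined(__has_builtin)
#if __has_builtin(__builtin_isnan)
#define SKETCH_USE_BUILTIN_ISNAN 1
#endif
#endif

namespace sketch {

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");

// Bit-level fallback: NaN has all exponent bits set and a non-zero mantissa.
inline bool is_nan_bits(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return (bits & 0x7FFFFFFFu) > 0x7F800000u;
}

// Header-only entry point: use the compiler builtin when it was detected.
inline bool is_nan_f(float x) {
#ifdef SKETCH_USE_BUILTIN_ISNAN
  return __builtin_isnan(x);
#else
  return is_nan_bits(x);
#endif
}

// Small self-check: build a quiet NaN and an infinity from their bit patterns.
inline void self_check() {
  float nan_value, inf_value;
  const std::uint32_t nan_bits = 0x7FC00000u, inf_bits = 0x7F800000u;
  std::memcpy(&nan_value, &nan_bits, sizeof nan_value);
  std::memcpy(&inf_value, &inf_bits, sizeof inf_value);
  assert(is_nan_f(nan_value) && is_nan_bits(nan_value));
  assert(!is_nan_f(inf_value) && !is_nan_bits(inf_value));
}

}  // namespace sketch
```

Because everything is `inline` and lives in the header, callers need no separate object file for these functions.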
[SystemZ,test] Cover non-preemptible PLTOFF (#195600)
R_390_PLTOFF{16,32,64} against a non-preemptible (hidden) symbol takes
fromPlt(R_PLT_GOTREL) → R_GOTREL in RelocScan::process so the relocation
resolves to symbol - .got with no PLT entry. Add tests to close
the test gap from the initial SystemZ port
(fe3406e349884e4ef61480dd0607f1e237102c74).