[SPIRV] Add a `SPIRVTypeInst` type with some guardrails (#179947)
The idea behind this PR is to propose a type that we can deploy
gradually to add some guardrails and enforce invariants in the SPIRV
backend.
The PR has 3 commits:
* A first commit where the `SPIRVTypeInst` type is proposed. It's just a
wrapper around `MachineInstr` that adds an assert to check that a
`SPIRVTypeInst` defines a register with the type register class.
* A second commit that shows what the migration could look like for a
single function.
* A third commit that shows the motivation: we currently have a
`SPIRVType *TypeInst` that never defines a type, inside a function whose
intent looks very confusing.
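As a rough sketch of the shape (assuming a hypothetical `isTypeRegClass()` helper; the PR's actual check may differ):
```cpp
#include "llvm/CodeGen/MachineInstr.h"
#include <cassert>
using namespace llvm;

bool isTypeRegClass(Register Reg); // hypothetical: however the backend
                                   // identifies the type register class

// Sketch only: a thin wrapper that enforces the invariant on construction.
class SPIRVTypeInst {
  MachineInstr *MI;

public:
  explicit SPIRVTypeInst(MachineInstr *I) : MI(I) {
    assert(I && I->getOperand(0).isReg() &&
           isTypeRegClass(I->getOperand(0).getReg()) &&
           "a SPIRVTypeInst must define a register in the type register class");
  }
  MachineInstr *operator->() const { return MI; }
};
```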
clang/AMDGPU: Do not look for rocm device libs if environment is llvm
Introduce usage of the llvm environment type. This will be useful as
a switch to eventually stop depending on externally provided libraries,
and only take bitcode from the resource directory.
I wasn't sure how to handle the confusing mess of -no-* flags, so I try
to handle them all. I'm not sure --no-offloadlib makes sense for OpenCL,
since it's not really offload, but interpret it anyway. Handle
-nostdlib/-stdlib as an overridable pair.
[clang][ssaf][NFC] Refactor SerializationFormat to use macro-based field accessors (#180842)
This reduces code duplication and makes it easier to add new field
accessors.
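Roughly the following pattern (field names here are hypothetical, not the patch's actual fields):
```cpp
// Sketch only: one field list expanded twice, once for storage and once
// for the accessors, so adding a field becomes a single-line change.
#define SERIALIZATION_FIELDS(X)                                                \
  X(unsigned, Version)                                                         \
  X(bool, Compressed)

class SerializationFormat {
#define X(TYPE, NAME) TYPE NAME;
  SERIALIZATION_FIELDS(X)
#undef X

public:
#define X(TYPE, NAME)                                                          \
  TYPE get##NAME() const { return NAME; }                                      \
  void set##NAME(TYPE V) { NAME = V; }
  SERIALIZATION_FIELDS(X)
#undef X
};
```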
Assisted-By: claude
[AArch64] Avoid selecting XAR for reverse operations. (#178706)
Rotations that implement reverse operations, for example:
```c
uint64x2_t foo(uint64x2_t r) {
  return (r >> 32) | (r << 32);
}
```
are currently lowered as XAR (when available):
```gas
foo:
movi v1.2d, #0000000000000000
xar v0.2d, v0.2d, v1.2d, #32
ret
```
This is suboptimal as REV* instructions typically have higher throughput
than XAR and do not require the zero operand.
This patch combines these half-rotations into Neon or SVE REVs so that
they are no longer selected as XAR.
[CIR][NEON] Add lowering for `vnegd_s64` and `vnegh_f16` (#180597)
Add CIR lowering support for the non-overloaded NEON intrinsics
`vnegd_s64` and `vnegh_f16`.
The associated tests are shared with the existing default codegen tests:
* `neon-intrinsics.c` → `neon/intrinsics.c`
* `v8.2a-fp16-intrinsics.c` → `neon/fullfp16.c`
A new test file, `clang/test/CodeGen/AArch64/neon/fullfp16.c`, is
introduced; it is intended to eventually replace
`clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics.c`.
Since both intrinsics are non-overloaded, the CIR and default codegen
handling is moved to the appropriate switch statements. The previous
placement was incorrect.
This change also includes minor refactoring in `CIRGenBuilder.h` to
better group related hooks.
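For reference, both are scalar (non-overloaded) ACLE intrinsics; a small usage example, assuming a target with `+fullfp16` for the `f16` variant:
```cpp
#include <arm_neon.h>

// vnegd_s64 / vnegh_f16 negate a single scalar element.
int64_t neg64(int64_t a) { return vnegd_s64(a); }
float16_t negh(float16_t a) { return vnegh_f16(a); }
```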
NAS-139714 / 26.0.0-BETA.1 / Validate capabilities_state keys in container create/update (#18169)
## Context
We were missing validation for capabilities state, which meant that any
invalid value provided by a consumer would get stored in the database.
Even though such a value has no effect when used with `nsenter`, we
should not allow it to be stored in the first place.
[RISCV][CodeGen] Combine vwaddu+vabd(u) to vwabda(u)
Note that we only support SEW=8/16 for `vwabda(u)`.
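A scalar model of the combine as I read it (my sketch, not from the patch), shown at SEW=8 with a 16-bit accumulator:
```cpp
#include <cstdint>

// vwaddu(acc, vabdu(a, b)) == vwabdau(acc, a, b), element-wise.
uint16_t vwabdau_model(uint16_t acc, uint8_t a, uint8_t b) {
  uint8_t abd = a > b ? a - b : b - a; // vabdu: |a - b|
  return acc + uint16_t(abd);          // vwaddu: widening accumulate
}
```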
Reviewers: topperc, lukel97, preames
Reviewed By: topperc, lukel97
Pull Request: https://github.com/llvm/llvm-project/pull/180162
[MLIR][NVVM][NFC] Fix PTX builder class API (#180787)
Previously, `NVVM_PTXBuilder_Op` included `BasicPtxBuilderOpInterface`
as part of the default value of the `traits` parameter. This meant any
subclass that provided an explicit traits list would silently replace
the default and lose the interface, defeating the purpose of the base
class. Callers had to redundantly re-specify the interface.
[SLP] Use the correct calling convention for vector math routines (#180759)
When vectorising calls to math intrinsics such as llvm.pow, we
correctly detect and generate calls to the corresponding vector
math variant. However, we don't pick up and use the calling
convention of the vector math function. This matters for veclibs
such as ArmPL, where the aarch64_vector_pcs calling convention
can improve codegen by reducing the number of registers that
need saving across calls.
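The fix boils down to copying the declared calling convention onto the emitted call; a minimal sketch (not the exact SLP change):
```cpp
#include "llvm/IR/IRBuilder.h"

// Sketch only: emit the call to the vector library routine and copy its
// declared calling convention, e.g. aarch64_vector_pcs for ArmPL variants.
llvm::CallInst *emitVecLibCall(llvm::IRBuilderBase &Builder,
                               llvm::Function *VecFunc,
                               llvm::ArrayRef<llvm::Value *> Args) {
  llvm::CallInst *Call = Builder.CreateCall(VecFunc, Args);
  Call->setCallingConv(VecFunc->getCallingConv());
  return Call;
}
```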
[AArch64] Eliminate XTN/SSHLL for vector splats
Combine:
sext(duplane(insert_subvector(undef, trunc(X), 0), idx))
Into:
duplane(X, idx)
This avoids XTN/SSHLL instruction sequences that occur when splatting
elements from boolean vectors after type legalization, which is common
when using shufflevector with comparison results.
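A hypothetical example of the kind of source that can produce such sequences (my illustration, not taken from the patch):
```cpp
#include <arm_neon.h>

// Splat one lane of a comparison result: the boolean vector goes through
// trunc/sext during legalization, which previously left XTN/SSHLL pairs.
uint32x4_t splat_cmp_lane0(int32x4_t a, int32x4_t b) {
  uint32x4_t mask = vceqq_s32(a, b); // boolean vector
  return vdupq_laneq_u32(mask, 0);   // splat via shufflevector
}
```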
Create rde_filter_out() to optimise filter matching
rde_filter_match() now uses only the struct filter_match data for
matching; the peer info from struct filter_peer is used only by
rde_filter(). Outbound filters are per-peer, so the filter_peer check is
done during configuration of the peer, and rde_filter_out() simply calls
rde_filter_match().
OK tb@
[lldb] Step over non-lldb breakpoints (#174348)
Several languages support some sort of "breakpoint" function, which adds
ISA-specific instructions to generate an interrupt at runtime. However,
on some platforms, these instructions don't increment the program
counter. When LLDB sets these instructions, this isn't a problem: we
remove them before continuing, then re-add them after stepping over the
location. For breakpoint sequences that are part of the inferior
process, however, this doesn't happen, and users might be left unable to
continue past the breakpoint without manually adjusting the program
counter.
This patch adds logic to LLDB to intercept SIGTRAPs, inspect the bytes
of the inferior at the program counter, and if the instruction looks
like a BRK or BKPT or similar, increment the pc by the size of the
instruction we found. This unifies platform behaviour (e.g. on x86_64,
LLDB debug sessions already look like this) and improves the UX (in my
opinion, it beats manually fiddling with the program counter at every
such break).
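On AArch64, for example, the check could look roughly like this (a sketch with hypothetical helpers, not lldb's actual code):
```cpp
#include <cstdint>

uint32_t ReadInferiorInstruction(uint64_t pc); // hypothetical helper
void SetPC(uint64_t pc);                       // hypothetical helper

// AArch64 BRK #imm16 encodes as 0xD4200000 | (imm16 << 5).
constexpr uint32_t BRK_MASK = 0xFFE0001F;
constexpr uint32_t BRK_BITS = 0xD4200000;

void MaybeStepOverInferiorBreak(uint64_t pc) {
  uint32_t insn = ReadInferiorInstruction(pc);
  if ((insn & BRK_MASK) == BRK_BITS)
    SetPC(pc + 4); // BRK doesn't advance the PC; AArch64 insns are 4 bytes
}
```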
[21 lines not shown]
[libc++] Only make comparators transparent in __tree if they don't cause a conversion (#179453)
We're currently unwrapping `less<T>` even if the `key_type` isn't `T`.
This removes an implicit conversion to `const T&` when the types
mismatch. Making `less<T>` transparent in that case changes overload
resolution and can make it fail.
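A minimal illustration of the failure mode (my example, not from the PR):
```cpp
#include <map>

struct Key {
  int v;
  operator int() const { return v; }          // used by std::less<int>
  bool operator<(const Key &) const = delete; // direct Key < Key is ill-formed
};

// OK: std::less<int> converts both keys to int. Unwrapping it into a
// transparent comparator would compare Key < Key and fail to compile.
std::map<Key, int, std::less<int>> m;
```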
Fixes #179319
(cherry picked from commit 9d2303103288f6110622644f78dbd26c8bcf28d5)
[IndVarSimplify] Add safety check for getTruncateExpr in genLoopLimit (#172234)
getTruncateExpr may not always return a SCEVAddRecExpr when truncating
loop bounds. Add a check to verify the result type before casting, and
bail out of the transformation if the cast would be invalid.
This prevents potential crashes from invalid casts when dealing with
complex loop bounds.
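A simplified sketch of the guard (the patch's surrounding code differs):
```cpp
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"

// Sketch only: use dyn_cast instead of cast so a truncation that is not
// an add-recurrence bails out of the transform rather than asserting.
const llvm::SCEVAddRecExpr *truncateToAddRec(llvm::ScalarEvolution &SE,
                                             const llvm::SCEV *Bound,
                                             llvm::Type *Ty) {
  const llvm::SCEV *Trunc = SE.getTruncateExpr(Bound, Ty);
  return llvm::dyn_cast<llvm::SCEVAddRecExpr>(Trunc); // nullptr => bail out
}
```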
Co-authored-by: Michael Rowan
Resolves [#153090](https://github.com/llvm/llvm-project/issues/153090)
[lld][Hexagon] Fix R_HEX_TPREL_11_X relocation on duplex instructions (#179860)
findMaskR11() was missing handling for duplex instructions. This caused
incorrect encoding when R_HEX_TPREL_11_X relocations were applied to
duplex instructions with large TLS offsets.
For duplex instructions, the immediate bits are located at positions
20-25 (mask 0x03f00000), not in the standard positions used for
non-duplex instructions.
This fix adds the isDuplex() check to findMaskR11() to return the
correct mask for duplex instruction encodings.
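The shape of the fix, sketched (`isDuplex()` and the non-duplex-mask helper stand in for lld's actual code):
```cpp
#include <cstdint>

bool isDuplex(uint32_t insn);             // stand-in for lld's duplex check
uint32_t nonDuplexMaskR11(uint32_t insn); // hypothetical: existing lookup

uint32_t findMaskR11(uint32_t insn) {
  if (isDuplex(insn))
    return 0x03f00000; // immediate lives in bits 20-25 for duplex insns
  return nonDuplexMaskR11(insn);
}
```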
(cherry picked from commit 62d018b87a161bb2797c1ed03a482ffcdc8b162c)