[VPlan] Always set flags for overflowing ops etc via VPIRFlags. (#179138)
Enforce that all VPInstructions set the correct OpType of the VPIRFlags.
Flag mismatches (e.g. VPInstruction Add without `OverflowingBinOp`
being set) can cause crashes (e.g. in CSE) or potentially miscompiles.
Add a few helpers in VPBuilder to create common instructions with
correct flags.
PR: https://github.com/llvm/llvm-project/pull/179138
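As a rough illustration of the invariant described above, the following toy model ties each opcode to the flag kind it is expected to carry and checks an instruction against it. Everything here (`Opcode`, `FlagKind`, `Flags`, `Instr`, `expectedKind`, `flagsMatch`, `createAdd`) is a hypothetical stand-in, not the real VPlan API; only the Add/`OverflowingBinOp` pairing comes from the commit message, and the rest of the mapping is assumed.

```cpp
// Toy stand-ins for illustration only; these are not the real VPlan classes.
enum class Opcode { Add, Sub, Mul, Select };
enum class FlagKind { OverflowingBinOp, Other };

struct Flags {
  FlagKind kind; // which group of flags is active
  bool nuw;      // no unsigned wrap
  bool nsw;      // no signed wrap
};

struct Instr {
  Opcode op;
  Flags flags;
};

// Assumed mapping: arithmetic opcodes carry overflow flags, the rest do not.
constexpr FlagKind expectedKind(Opcode op) {
  switch (op) {
  case Opcode::Add:
  case Opcode::Sub:
  case Opcode::Mul:
    return FlagKind::OverflowingBinOp;
  case Opcode::Select:
    return FlagKind::Other;
  }
  return FlagKind::Other;
}

// The check a verifier could run: the flag kind must agree with the opcode.
constexpr bool flagsMatch(const Instr &i) {
  return i.flags.kind == expectedKind(i.op);
}

// Builder-style helper that cannot produce a mismatched flag kind.
constexpr Instr createAdd(bool nuw, bool nsw) {
  return Instr{Opcode::Add, Flags{FlagKind::OverflowingBinOp, nuw, nsw}};
}
```

Routing construction through a helper such as `createAdd` means a caller cannot forget the flag kind, which is the kind of mismatch the commit says can crash CSE.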
[libc++][NFC] Disable std_format_spec benchmarks through lit instead of the preprocessor (#179228)
This is probably a relic from when we didn't use lit to run benchmarks.
Nowadays we should just use the lit features to disable benchmarks like
we do in any other test instead of using the preprocessor.
Revert "[SeparateConstOffsetFromGEP] Decompose constant xor operand if possible" (#179339)
A miscompile was found (see #175724), and it's complicated to fix. We're
going to revert for now, and look at reimplementing a fixed version
later.
[AMDGPU] Clear no convergence flag on operand folding. NFCI
Clear the flag. Verification fails if it is set, because only
convergent operations may have the NoConvergent flag. NFCI, because
this situation does not currently occur.
[AArch64][SME] Limit where SME ABI optimizations apply (#179273)
These were added recently with a fairly complex propagation step,
however, these optimizations can cause regressions in some cases.
This patch limits the cross-block optimizations to the simple case of
picking a state that matches all incoming blocks. If any block doesn't
match, we fall back to using "ACTIVE", the default state.
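The simplified rule can be sketched as below. This is a minimal model under stated assumptions, not the pass's actual code: `ZAState`, its `LocalSaved` and `Off` members, and `pickEntryState` are hypothetical names, and only the "ACTIVE" default comes from the commit message.

```cpp
#include <initializer_list>

// Hypothetical state names; only "ACTIVE" (the default) appears in the
// commit message.
enum class ZAState { Active, LocalSaved, Off };

// Returns the state shared by every incoming block. If any two incoming
// blocks disagree, or there are no incoming blocks, returns the default
// Active state.
constexpr ZAState pickEntryState(std::initializer_list<ZAState> incoming) {
  if (incoming.size() == 0)
    return ZAState::Active;
  const ZAState first = *incoming.begin();
  for (ZAState s : incoming)
    if (s != first)
      return ZAState::Active; // mismatch: fall back to the default
  return first;
}
```

For example, `pickEntryState({ZAState::Off, ZAState::Off})` yields `Off`, while `pickEntryState({ZAState::Off, ZAState::LocalSaved})` falls back to `Active`.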
[CIR][CUDA][HIP] Add NVPTX target info and CUDA/HIP global emission filtering (#177827)
related: #175871
This patch adds foundational infra for device-side CUDA/HIP compilation
by introducing NVPTX target info and implementing the global emission
filtering logic.
NVPTX target info, which allows us to compile against that triple:
- Add NVPTXABIInfo and NVPTXTargetCIRGenInfo classes
- Wire up nvptx and nvptx64 triples in getTargetCIRGenInfo()
- Add createNVPTXTargetCIRGenInfo() factory function
CUDA/HIP global emission filtering (most of this is boilerplate from the
AST). This basically boils down to:
- Skip host-only functions (no `__device__` attribute) when
`-fcuda-is-device`
- Skip device-only functions (device without host) on host side
[5 lines not shown]
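The two filtering rules listed above can be modeled as a small predicate. This sketch covers only those two rules; the elided lines may add further cases (kernels, for instance, are not modeled), and `FnAttrs` and `shouldEmit` are hypothetical names rather than the CIR implementation.

```cpp
// Hypothetical attribute summary for one function declaration.
struct FnAttrs {
  bool hasDevice; // declared with __device__
  bool hasHost;   // declared with __host__
};

// Decides whether a function is emitted for the current compilation side.
// cudaIsDevice models compiling with -fcuda-is-device.
constexpr bool shouldEmit(FnAttrs f, bool cudaIsDevice) {
  if (cudaIsDevice)
    return f.hasDevice; // skip host-only functions (no __device__)
  // Host side: skip device-only functions (device without host).
  return !(f.hasDevice && !f.hasHost);
}
```

Under this model a function with no attributes is treated as host-only, so it is emitted on the host side and skipped when compiling for the device.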
[AMDGPU][SROA] Unify cast chain implementations (#177945)
The AMDGPU promote alloca pass is missing a conversion link when casting
between vectors of pointers and pointers or vectors of pointers with
different number of elements. This causes codegen to crash due to
invalid casts being generated. To address this, this commit adds the
missing conversion link.
In addition to this, the commit moves the common load/store cast logic
into a new function `createLoadStoreCastChain`.
---------
Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen at amd.com>
Co-authored-by: Steffen Holst Larsen <HolstLarsen.Steffen at amd.com>
[AArch64][llvm] Remove `+xs` gating for `tlbip *nxs` instructions
A recent specification update has removed FEAT_XS gating for `tlbip *nxs`
instructions. The `tlbi *nxs` instructions remain gated on FEAT_XS.
[Polly] Use GenDT in assertion (#179433)
`DT` is always the analysis for the to-be-optimized function while
`GenDT` is the analysis of the function that we currently generate code
for, which can also be an outlined function. Here, we want to check
dominance in the generated code, hence we must use `GenDT`.
Fixes: #179135
[AArch64][llvm] Gate some `tlbip` insns with +tlbid or +d128
Change the gating of `tlbip` instructions containing `*E1IS*`, `*E1OS*`,
`*E2IS*` or `*E2OS*` to be used with `+tlbid` or `+d128`. This is because
the 2025 Armv9.7-A MemSys specification says:
```
All TLBIP *E1IS*, TLBIP*E1OS*, TLBIP*E2IS* and TLBIP*E2OS* instructions
that are currently dependent on FEAT_D128 are updated to be dependent
on FEAT_D128 or FEAT_TLBID
```
[AArch64][llvm] Remove `+d128` gating on `sysp`, `msrr` and `mrrs` instructions
Remove `+d128` gating on `sysp`, `msrr` and `mrrs` instructions.
We removed gating for `sys`, `mrs` and `msr` instructions previously,
on the basis that it doesn't add value, as it doesn't indicate that
any particular system registers or system instructions are available.
Therefore, remove `+d128` gating for these too.
(In an upcoming change, some `tlbip` instructions, which are `sysp` aliases,
will be allowed with either `+d128` or `+tlbid`. Without removing this
gating, supporting the relaxation mandated by the 2025 MemSys specification
would require some ugly workarounds in the code.
This change retains `+d128` gating for all `tlbip` instructions; a
subsequent change will loosen it to either `+d128` or `+tlbid`.)
[NFC][analyzer] Refactor switch handling in the engine (#178678)
This commit refactors `ExprEngine::processSwitch()` and related logic to
make it easier to understand and "prepare the ground" for planned
functional changes.
Unfortunately there were many idiosyncratic decisions in this part of
the engine -- e.g. `BranchNodeBuilder` does not derive from
`NodeBuilder` and doesn't use a `NodeBuilderContext`. For now I left
these skeletons in the closet, but I tried to pick the low-hanging fruit
and moved `processSwitch` a bit closer to its "big sibling"
`processBranch`.
For example I moved the initialization of the node builder into the body
of `processSwitch` because if I want to trigger `BranchCondition`
callbacks from this method (the way `processBranch` does it) I will need
to iterate over the nodes created by checkers and construct a new node
builder in each iteration.
[5 lines not shown]
[lldb][TypeSystemClang] Remove mostly unused is_complex output parameter to IsFloatingPointType (#178906)
Depends on:
* https://github.com/llvm/llvm-project/pull/178904
(only last commit is relevant for the review)
This is part of a patch series to clean up the
TypeSystemClang::IsFloatingPointType API. The `is_complex` parameter is
rarely checked. This patch introduces a `CompilerType::IsComplexType`
API which callers that previously checked `is_complex` can use instead.
This will also allow us to remove `CompilerType::IsFloat`, which is just
`IsFloatingPointType` that ignores the `is_complex` parameter.
Attributor: Add -light options to -attributor-enable flag (#179346)
Add light, module-light, and cgscc-light options. This just
supplements the existing flag to use the light variants of the
pass in place of the full versions.
Way back when attributor-light was added in 400fde92963588ae2b,
there was no way to change the pass pipeline to use it. There
were some benchmarks posted, but I don't see precisely how it
was benchmarked in the pipeline.
I'm also surprised this option is only additive, and doesn't remove
FunctionAttrs. If this is to be the option to drive the enablement,
I would expect it to not run the old passes.