[WebAssembly][GlobalISel] Fix legalizeCustom return value for Helper.lower() (#191345)
Helper.lower() returns a LegalizerHelper::LegalizeResult enum where
UnableToLegalize=2, which implicitly converts to true (success). Compare
against LegalizerHelper::Legalized instead so that legalization failures
are correctly reported.
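
The pitfall can be reproduced outside LLVM with a small sketch. The enum below is a hypothetical stand-in for `LegalizerHelper::LegalizeResult`; only `UnableToLegalize=2` comes from the message above, and the other enumerator values and all function names are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for LegalizerHelper::LegalizeResult. Only
// UnableToLegalize == 2 is stated in the commit message; the other
// values are assumed for this sketch.
enum LegalizeResult { AlreadyLegal = 0, Legalized = 1, UnableToLegalize = 2 };

// Hypothetical lowering step that always fails.
inline LegalizeResult lowerAlwaysFails() { return UnableToLegalize; }

// Buggy shape: an unscoped enum converts implicitly to bool, so every
// nonzero enumerator, including UnableToLegalize, becomes true.
inline bool legalizeCustomBuggy() { return lowerAlwaysFails(); }

// Fixed shape: compare explicitly against the success value.
inline bool legalizeCustomFixed() { return lowerAlwaysFails() == Legalized; }

// Small usage check: the buggy form reports success for a failed lowering.
inline void checkLegalizeExample() {
  assert(legalizeCustomBuggy());
  assert(!legalizeCustomFixed());
}
```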
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[Passes][Inliner] Add separate optsize inlinehint threshold (#191213)
PGO pre-inlining wants to set a different inlinehint threshold when
optimizing for size. Currently this is done by adjusting the InlineHint
threshold based on the pipeline optimization level.
Replace this with a separate OptSizeInlineHint threshold that is applied
based on attributes instead.
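
A minimal sketch of attribute-driven threshold selection, assuming a simplified model: the struct, the function name, and the numeric values are hypothetical and are not the fields or defaults of LLVM's `InlineParams`.

```cpp
#include <cassert>

// Hypothetical, simplified threshold set. Field names are illustrative.
struct Thresholds {
  int Default;     // no relevant attributes
  int Hint;        // inlinehint only
  int OptSize;     // optimizing for size only
  int OptSizeHint; // inlinehint while optimizing for size
};

// Pick a threshold purely from attributes, never from the pipeline
// optimization level.
inline int pickThreshold(const Thresholds &T, bool HasInlineHint,
                         bool OptForSize) {
  if (OptForSize)
    return HasInlineHint ? T.OptSizeHint : T.OptSize;
  return HasInlineHint ? T.Hint : T.Default;
}

// Usage check with made-up values.
inline void checkThresholdExample() {
  const Thresholds T{200, 300, 50, 80};
  assert(pickThreshold(T, /*HasInlineHint=*/true, /*OptForSize=*/true) == 80);
  assert(pickThreshold(T, /*HasInlineHint=*/true, /*OptForSize=*/false) == 300);
}
```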
[WebAssembly] Fix: fixCallUnwindMismatches after fixCatchUnwindMismatches (#187484)
`fixCallUnwindMismatches()` adds an extra try block around call sites
with incorrect unwind targets. `fixCatchUnwindMismatches()` handles
catch blocks that have incorrect next unwind destinations. Previously we
ran `fixCallUnwindMismatches()` first and then ran
`fixCatchUnwindMismatches()`. The problem is that
`fixCatchUnwindMismatches()` wraps entire try blocks which can change
the unwind destination of the calls inside. If the calls had an
incorrect unwind target to begin with, they will be wrapped already and
so the outer wrapping won't alter their unwind target. However, if they
start out with a correct unwind target, they are not wrapped, and
`fixCatchUnwindMismatches()` can then change that target to an incorrect one.
The fix is to run `fixCatchUnwindMismatches()` first.
`fixCallUnwindMismatches()` never messes up the result of
`fixCatchUnwindMismatches()` so this is the correct order.
Resolves #187302
[2 lines not shown]
[Passes][Inliner] Handle optsize/minsize via attributes only (#190168)
InlineParams already has separate thresholds for OptSize/MinSize
functions that get applied based on the corresponding function
attributes. As such, we should not also be changing the DefaultThreshold
based on the pipeline Os/Oz levels.
[LV] Use -force-target-supports-masked-memory-ops on target agnostic tail folding tests. NFC (#191181)
It's a good bit easier to read tail folding tests if masked memory ops
are allowed. This adds -force-target-supports-masked-memory-ops to tests
where we aren't explicitly trying to test predicated replicate regions.
[Support] Factor PatternMatch m_Combine(And|Or), m_Isa (NFC) (#190753)
Introduce a new PatternMatchHelpers with a variant of m_Combine(And|Or)
and m_Isa to share across the IR PatternMatch,
ScalarEvolutionPatternMatch, and VPlanPatternMatch. m_Combine(And|Or)
has been generalized to be variadic. Planned follow-ups include
factoring the specific-value matcher.
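
To illustrate what a variadic combinator looks like, here is a toy sketch over plain `int` values. It is not the code from PatternMatchHelpers: the matcher types and the `combineAnd`/`combineOr` names are hypothetical, and only the idea of folding over any number of sub-matchers is taken from the description above.

```cpp
#include <cassert>
#include <tuple>

// Toy matchers over int. Each exposes match(), loosely mirroring the
// shape of a pattern matcher.
struct IsPositive { bool match(int V) const { return V > 0; } };
struct IsEven { bool match(int V) const { return V % 2 == 0; } };
struct LessThan { int Bound; bool match(int V) const { return V < Bound; } };

// Variadic "and": matches when every sub-matcher matches.
template <typename... Ms> struct CombineAnd {
  std::tuple<Ms...> Matchers;
  bool match(int V) const {
    return std::apply(
        [V](const auto &...M) { return (M.match(V) && ...); }, Matchers);
  }
};

// Variadic "or": matches when at least one sub-matcher matches.
template <typename... Ms> struct CombineOr {
  std::tuple<Ms...> Matchers;
  bool match(int V) const {
    return std::apply(
        [V](const auto &...M) { return (M.match(V) || ...); }, Matchers);
  }
};

template <typename... Ms> CombineAnd<Ms...> combineAnd(Ms... M) {
  return {std::tuple<Ms...>(M...)};
}
template <typename... Ms> CombineOr<Ms...> combineOr(Ms... M) {
  return {std::tuple<Ms...>(M...)};
}

// Usage check: three matchers combined in one call.
inline void checkCombineExample() {
  assert(combineAnd(IsPositive{}, IsEven{}, LessThan{10}).match(4));
  assert(!combineAnd(IsPositive{}, IsEven{}, LessThan{10}).match(12));
}
```

The fold expressions require C++17. A binary-only combinator would need nesting to express the same three-way match.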
[TTI] Add BasicTTIImpl cost model for llvm.masked.{u,s}{div,rem} (#191240)
This implements a generic cost model for the intrinsics added in
#189705. It costs them equivalently to the current expansion, i.e. an
unmasked divide + select.
[AMDGPU] Improve max3/min3 formation for tree-structured reductions (#190734)
The existing `performMinMaxCombine` forms `max3` by matching
`max(max(a,b), c) -> max3(a,b,c)`. For tree reductions like
`max(max(a,b), max(c,d))`, this produces `max3(a, b, max(c,d))`, placing
`max3` on top. At the next tree level, `max(max3, max3)` cannot combine
because the 3-op opcode (`FMAXIMUM3`) differs from the 2-op opcode
(`FMAXIMUM`).
This patch:
1. Adds a tree combine: `max(max(a,b), max(c,d)) -> max(max3(a,b,c),
d)`, keeping a 2-op node on top that enables further combining.
2. Defers the existing combine when the operand is a tree node whose
children can still be combined, ensuring inner tree levels are optimized
before outer levels consume them.
Deferral is skipped when neither child has a single use, since the inner
combine cannot fire in that case.
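
The algebraic identity behind the tree combine can be checked with a small sketch. It uses plain integers and hypothetical helper names; it does not model SelectionDAG nodes, use counts, or floating-point NaN semantics.

```cpp
#include <algorithm>
#include <cassert>

// Three-operand maximum, standing in for a max3 node.
inline int max3(int A, int B, int C) { return std::max(std::max(A, B), C); }

// Original tree shape: max(max(a,b), max(c,d)).
inline int treeMax(int A, int B, int C, int D) {
  return std::max(std::max(A, B), std::max(C, D));
}

// Shape after the tree combine: max(max3(a,b,c), d). The top node is
// still a two-operand max, so the next tree level can combine again.
inline int treeMaxCombined(int A, int B, int C, int D) {
  return std::max(max3(A, B, C), D);
}

// Usage check: both shapes compute the same value.
inline void checkMax3Example() {
  assert(treeMax(1, 7, 3, 5) == treeMaxCombined(1, 7, 3, 5));
  assert(treeMax(-4, -9, -2, -6) == treeMaxCombined(-4, -9, -2, -6));
}
```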
Fix: LCOMPILER-1652
[PAC][clang] Use `Error` behavior for ptrauth module flags (#189923)
The previous use of `Min` for the `ptrauth-elf-got` and
`ptrauth-sign-personality` module flags risked silently weakening
security during module merging. `Min` was chosen to mimic the behavior
of the `sign-return-address*` family of module flags, but that behavior
does not make sense here.
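
The difference between the two merge behaviors can be sketched as follows. This is a simplified model with hypothetical names, not the IR linker's implementation: it only covers two modules that both carry the flag.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Simplified model of two module-flag merge behaviors.
enum class MergeBehavior { Min, Error };

// Returns the merged flag value, or std::nullopt when the merge is
// rejected because the two modules disagree.
inline std::optional<unsigned> mergeFlag(MergeBehavior B, unsigned Lhs,
                                         unsigned Rhs) {
  switch (B) {
  case MergeBehavior::Min:
    // Silently keeps the smaller value, so 1 merged with 0 becomes 0.
    return std::min(Lhs, Rhs);
  case MergeBehavior::Error:
    // Disagreement is surfaced instead of being resolved silently.
    if (Lhs != Rhs)
      return std::nullopt;
    return Lhs;
  }
  return std::nullopt;
}

// Usage check: Min drops the enabled flag, Error refuses to merge.
inline void checkMergeExample() {
  assert(mergeFlag(MergeBehavior::Min, 1, 0).value() == 0u);
  assert(!mergeFlag(MergeBehavior::Error, 1, 0).has_value());
}
```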
[NFC][lldb] Extract Do{Dis}EnableBreakpoint into helper functions (#191136)
Re-using this code will be important in an upcoming patch.
This commit also greatly simplifies the comments in the function.
[OpenMP][NFC] Refactor Non-contiguous Update Tests (#190923)
This PR refactors the non-contiguous update tests, addressing a TODO
raised in one of the comments in the related PR. All tests are now
prefixed with `strided_update`, and the offload tests have a dedicated
sub-directory.
[NFC][AMDGPU]: expose IGLPStrategyID in AMDGPUIGroupLP.h (#191340)
Move IGLPStrategyID and its enumerators into llvm::AMDGPU in the public
header so other translation units can share the immediate encoding.
[mlir][NVVM] Add InferTypeOpInterface to NVVM MBarrier ops with deterministic result types (#188173)
Add result type inference to 5 NVVM ops whose result types can be fully
determined from their operands and attributes. This enables the Python
binding generator to emit `results=None` as a default parameter,
removing the need for callers to pass explicit result types.
Ops with optional results (using `InferTypeOpAdaptorWithIsCompatible`):
- `MBarrierArriveOp`: i64 for non-cluster pointers, no result for
shared_cluster
- `MBarrierArriveDropOp`: same as above
- `MBarrierArriveExpectTxOp`: same, plus no result when predicate is set
- `MBarrierArriveDropExpectTxOp`: same as MBarrierArriveOp
- `BarrierOp`: i32 when reductionOp is present, no result otherwise
The optional-result ops use a permissive `isCompatibleReturnTypes` that
allows omitting the result, preserving backward compatibility with the
existing zero-result assembly form.
[8 lines not shown]
[SelectionDAG] Salvage debuginfo when combining load and z|s ext instrs. (#188544)
Reland 2b958b9ee24b8ea36dcc777b2d1bcfb66c4972b6
Salvage debuginfo when combining load and z|s ext instrs.
SelectionDAG uses the DAGCombiner to fold a load followed by a sext into
a single sign-extending load instruction. For example, on x86 we will see that
```
%1 = load i32, ptr @GlobArr
#dbg_value(i32 %1, !43, !DIExpression(), !52)
%2 = sext i32 %1 to i64, !dbg !53
```
is converted to:
```
%0:gr64_nosp = MOVSX64rm32 $rip, 1, $noreg, @GlobArr, $noreg, debug-instr-number 1, debug-location !51
```
[14 lines not shown]
[CIR][Lowering] Handle address space cast in GlobalViewAttr lowering (#190197)
Upstreaming clangIR PR: https://github.com/llvm/clangir/pull/2099
This PR fixes the GlobalViewAttr LLVM lowering to use AddrSpaceCastOp
when the source and destination address spaces differ.
This fixes crashes when lowering globals referenced across address
spaces, such as AMDGPU globals in addrspace(1) referenced from
llvm.compiler.used arrays.
[DWARFYAML] Begin DWARFv5 debug_line support (#191167)
This patch adds enough support to generate a correct basic v5 header
(llvm-dwarfdump complains it can't find DW_LNCT_path, but I wouldn't say
it's strictly required). Directory and file name counts use relatively
complex encodings, so I'm leaving those for separate patch(es). For now,
I'm hardcoding the relevant fields to zero.
[libc] Implement accept(2) on linux (#191203)
The implementation follows the same patterns as the other socket
functions (this was mostly done using AI).
I've extended the connect test to test accepting connections as well
(and renamed it accordingly).
[DA] Fix overflow of findBoundsALL in BanerjeeTest
Fix signed overflow handling in `findBounds*` for the Banerjee test.
The previous implementation computed bounds using `getMinusSCEV` and
`getMulExpr` without checking for signed overflow, which could produce
incorrect bounds when coefficients have extreme values.
- Add `mulSCEVNoSignedOverflow` helper function that checks for
multiplication overflow before computing the result
- Use `minusSCEVNoSignedOverflow` and `mulSCEVNoSignedOverflow` in
`findBounds*` to safely compute bounds, returning `nullptr`
when overflow would occur
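
The overflow-checked arithmetic can be sketched on plain 64-bit integers. This is an analogy under stated assumptions, not the DependenceAnalysis code: the real helpers operate on SCEV expressions and return `nullptr`, whereas the hypothetical functions below use `std::optional` and `int64_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Multiply two signed 64-bit values, returning std::nullopt instead of
// overflowing. All range checks use division, which cannot overflow here.
inline std::optional<int64_t> mulNoSignedOverflow(int64_t A, int64_t B) {
  constexpr int64_t Max = std::numeric_limits<int64_t>::max();
  constexpr int64_t Min = std::numeric_limits<int64_t>::min();
  if (A == 0 || B == 0)
    return 0;
  if (A > 0) {
    if (B > 0 ? A > Max / B : B < Min / A)
      return std::nullopt;
  } else {
    if (B > 0 ? A < Min / B : B < Max / A)
      return std::nullopt;
  }
  return A * B;
}

// Subtract B from A, returning std::nullopt instead of overflowing.
inline std::optional<int64_t> subNoSignedOverflow(int64_t A, int64_t B) {
  constexpr int64_t Max = std::numeric_limits<int64_t>::max();
  constexpr int64_t Min = std::numeric_limits<int64_t>::min();
  if (B > 0 && A < Min + B)
    return std::nullopt;
  if (B < 0 && A > Max + B)
    return std::nullopt;
  return A - B;
}

// Usage check: extreme coefficients are rejected rather than wrapped.
inline void checkOverflowExample() {
  assert(mulNoSignedOverflow(3, 4).value() == 12);
  assert(!mulNoSignedOverflow(std::numeric_limits<int64_t>::max(), 2));
}
```

A caller computing a bound would stop and report "no bound" as soon as either helper returns an empty result.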
[clang][Serialization] Serialize DiagStateOnPushStack to fix pragma d… (#190420)
**Serialize DiagStateOnPushStack to fix pragma diagnostic push/pop
across PCH boundary**
`DiagStateOnPushStack` was not serialized in PCH files, causing `#pragma
clang diagnostic pop` to emit a spurious "no matching push" warning when
the corresponding push was in the preamble. This shows up in clangd,
which splits files into a preamble (compiled to PCH) and the main file
body, so the push/pop stack was lost during the PCH round-trip.
Serialize and deserialize DiagStateOnPushStack in
`WritePragmaDiagnosticMappings`/`ReadPragmaDiagnosticMappings` so that
unmatched pushes from a preamble are correctly restored.
Fixes https://github.com/clangd/clangd/issues/1167