[Support] Optimize parallel `TaskGroup` (#189196)
Two improvements to reduce `TaskGroup::spawn()` overhead:
1. Replace mutex-based `Latch::inc()` with atomic `fetch_add`. `dec()`
retains the mutex to prevent a race where `sync()` observes Count==0
and destroys the Latch while `dec()` is still running.
2. Pass `Latch&` through `Executor::add()` so the worker calls `dec()`
directly, eliminating the wrapper lambda that previously captured
both the user's callable and the Latch reference. This avoids one
`std::function` construction and potential heap allocation per spawn.
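The asymmetry in item 1 can be illustrated with a minimal sketch. `SketchLatch` and its `count()` accessor are hypothetical stand-ins, not the actual `Latch` in LLVM's Support library; the sketch only assumes the behavior described above (lock-free `inc()`, mutex-protected `dec()`).

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the Latch described above; not the LLVM class.
class SketchLatch {
  std::atomic<uint32_t> Count{0};
  mutable std::mutex Mutex;
  mutable std::condition_variable Cond;

public:
  // inc() can never release a waiter, so an atomic increment is enough.
  void inc() { Count.fetch_add(1, std::memory_order_acq_rel); }

  // dec() keeps the mutex: sync() cannot observe Count == 0 and return
  // (letting the owner destroy the latch) while dec() still holds the lock.
  void dec() {
    std::lock_guard<std::mutex> Lock(Mutex);
    if (Count.fetch_sub(1, std::memory_order_acq_rel) == 1)
      Cond.notify_all();
  }

  // Block until every inc() has been matched by a dec().
  void sync() const {
    std::unique_lock<std::mutex> Lock(Mutex);
    Cond.wait(Lock,
              [&] { return Count.load(std::memory_order_acquire) == 0; });
  }

  // Accessor added only for illustration.
  uint32_t count() const { return Count.load(std::memory_order_acquire); }
};
```

Because `sync()` and `dec()` contend on the same mutex, `sync()` can only return after the final `dec()` has released the lock, which is the property the commit message relies on.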
[DA] Check nsw flags for addrecs in gcd MIV test
This patch adds a check that every addrec in the gcd MIV test has the
nsw flag. If any of them lacks it, the analysis bails out. The check is
necessary because the subsequent steps of the gcd MIV test assume that
the addrecs don't wrap.
Signed-off-by: Ruoyu Qiu <cabbaken at outlook.com>
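As a rough illustration of why the bail-out matters, here is a simplified GCD-style test over constant steps. `AddRecSketch` and `gcdTestProvesIndependence` are hypothetical names, and the real DependenceAnalysis code operates on SCEV expressions rather than plain integers; this is only a sketch under those assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <optional>
#include <vector>

// Hypothetical, simplified model of an addrec: a constant step plus a flag
// saying whether the recurrence is known not to wrap (nsw).
struct AddRecSketch {
  int64_t Step;
  bool HasNSW;
};

// Returns true if the GCD test proves independence, false if it cannot,
// and std::nullopt if the test bails out.
std::optional<bool>
gcdTestProvesIndependence(const std::vector<AddRecSketch> &Recs,
                          int64_t Delta) {
  int64_t G = 0;
  for (const AddRecSketch &R : Recs) {
    // The divisibility argument below is only sound without wrapping.
    if (!R.HasNSW)
      return std::nullopt;
    G = std::gcd(G, R.Step);
  }
  if (G == 0)
    return std::nullopt; // No usable steps; nothing can be concluded here.
  // If the GCD of the steps does not divide Delta, no integer solution exists.
  return Delta % G != 0;
}
```

With steps 4 and 6 the GCD is 2, so an odd `Delta` proves independence, while a single addrec without nsw makes the sketch give up before any arithmetic is done.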
[APINotes][unsafe-buffer-usage] Add [[clang::unsafe_buffer_usage]] support in APINotes
Support the ``[[clang::unsafe_buffer_usage]]`` attribute in APINotes, e.g.,
```
Functions:
- Name: myUnsafeFunction
UnsafeBufferUsage: true
```
rdar://171859135
[Pseudoprobe][MachO] Enable pseudo probes emission for MachO (#185758)
Enable pseudo probes emission for MachO. Due to the 16 character limit
on MachO segment and section names, the sections will be
`__PSEUDO_PROBE,__probes` and `__PSEUDO_PROBE,__probe_descs`.
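The name-length constraint can be checked at compile time. The helper `fitsMachOName` is hypothetical and exists only for this sketch; the `.pseudo_probe_desc` name is included as an assumed ELF-style name for contrast.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Mach-O stores segment and section names in fixed 16-byte fields.
constexpr std::size_t MachONameLimit = 16;

// Hypothetical helper: true if Name fits in a Mach-O name field.
constexpr bool fitsMachOName(std::string_view Name) {
  return Name.size() <= MachONameLimit;
}

// The names chosen by this patch all fit.
static_assert(fitsMachOName("__PSEUDO_PROBE"));
static_assert(fitsMachOName("__probes"));
static_assert(fitsMachOName("__probe_descs"));

// An ELF-style name (assumed here for contrast) is 18 characters long.
static_assert(!fitsMachOName(".pseudo_probe_desc"));
```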
[SLP] Replace TrackedToOrig DenseMap with parallel SmallVector in reduction
Replace the DenseMap<Value*, Value*> TrackedToOrig with a SmallVector<Value*>
indexed in parallel with Candidates. This avoids hash-table overhead for the
tracked-value-to-original-value mapping in horizontal reduction processing.
Fixes #189686
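The data-structure change can be sketched as follows. `Value`, `ReductionCandidates`, and the method names are hypothetical, and `std::vector` stands in for `SmallVector`; the sketch only shows the parallel-index idea, not the SLP vectorizer code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llvm::Value.
struct Value {
  int Id;
};

// Instead of a side map from tracked value to original value, keep a
// second vector whose element I belongs to Candidates[I].
struct ReductionCandidates {
  std::vector<Value *> Candidates;    // tracked values
  std::vector<Value *> TrackedToOrig; // TrackedToOrig[I] pairs with Candidates[I]

  void add(Value *Tracked, Value *Orig) {
    Candidates.push_back(Tracked);
    TrackedToOrig.push_back(Orig);
    assert(Candidates.size() == TrackedToOrig.size() && "vectors out of sync");
  }

  // Lookup is a plain index, with no hashing or probing.
  Value *origFor(std::size_t I) const { return TrackedToOrig[I]; }
};
```

The trade-off is that callers must already know the candidate's index, which holds when the mapping is only consulted while iterating over the candidates.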
[VPlan] Stop outerloop vectorization from vectorizing nonvector intrinsics (#185347)
In outer-loop VPlan, avoid emitting vector intrinsic calls for intrinsics
without a vector form. In VPRecipeBuilder, detect missing vector intrinsic
mapping and emit scalar handling instead of a vector call.
Also fix an assertion failure when `llvm.pseudoprobe` is treated as a
`WIDEN-INTRINSIC` in VPlan's native path.
Reproducer: https://godbolt.org/z/GsPYobvYs
[NFC][LLVM] Simplify `TypeInfoGen` in Intrinsics.td (#189278)
Eliminate `MappingRIdx` by making it an identity function. Currently,
`MappingRIdx` is used to map the index of an `llvm_any*` type in an
intrinsic type signature to its overload index. Eliminating this mapping
means that dependent types in LLVM intrinsic definitions (like
`LLVMMatchType` and its subclasses) should use the overload index to
reference the overload type that they depend on (and not the index within
the `llvm_any*` subset of overloaded types).
See
https://discourse.llvm.org/t/rfc-simplifying-intrinsics-type-signature-iit-info-generation-encoding-in-intrinsicemitter-cpp/90383
[RISCV][P-ext] Support i32 ushlsat on RV32. (#189730)
We have a sshl instruction on RV32 in the 0.21 spec. Unfortunately,
we don't have a SSLLI instruction, but we can put a constant shift
amount in a register.
[lldb][Platform] Handle LoadScriptFromSymFile per-module FileSpec (#189696)
This patch changes the `Platform::LocateXXX` functions to return a map
from `FileSpec` to a `LoadScriptFromSymFile` enum value.
This is needed for https://github.com/llvm/llvm-project/pull/188722,
where I intend to set `LoadScriptFromSymFile` per-module.
By default the `Platform::LocateXXX` functions set the value to whatever
the target's `target.load-script-from-symbol-file` setting currently is. In
https://github.com/llvm/llvm-project/pull/188722 we'll allow overriding
this per-target setting on a per-module basis.
Drive-by:
* Added logging when we fail to load a script.
[TargetLowering] Replace always true if with an assert. NFC (#189750)
We already returned for UADDSAT/USUBSAT leaving SADDSAT/SSUBSAT as the
only opcodes that can get here.
[compiler-rt][asan] Forward fix for free_aligned_sized_mismatch.cpp (#189760)
Mark this test as UNSUPPORTED on Android since Android's libc doesn't
seem to support aligned_alloc.
[RISCV][P-ext] Rename simm8_unsigned/simm10_unsigned used by PLUI/PLI. NFC (#188808)
Replace unsigned with plui or pli_b to better indicate their usage.
Templatize the render function and rename it addSExtImm instead of
addSImm*Unsigned.
[mlir][MemRef] Migrate memref dialect alias op folding to interface (#187168)
This PR updates FoldMemRefAliasOps / --fold-memref-alias-ops to use
the new IndexedAccessOpInterface and IndexedMemCopyOpInterface, and
implements those interfaces for the relevant operations in the memref
dialect.
This is a reordering of the changes planned in #177014 and #177016 to
make them more testable.
There are no behavior changes expected for how memref.load and
memref.store behave within the alias ops folding pass, though support
for new operations, like memref.prefetch, has been added.
Some error messages have been updated because certain laws of
memref.load/memref.store have been moved to IndexedAccessOpInterface.
Assisted-by: Claude 4.6 (helped deal with some of the boilerplate in the
rewrite patterns and with extracting the patch)