[InstSimplify] Consider `dereferenceable(N)` when simplifying pointer equalities (#203867)
Extend `computePointerICmp` to leverage the `dereferenceable(N)` attribute
when simplifying pointer equality comparisons. Per attribute semantics,
an argument pointer marked as such cannot be a one-past-the-end pointer
to some object, thus it cannot equal the start of an adjacent object.
This lets us prove inequality between a `dereferenceable` argument and
storage allocated within the function.
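As a source-level illustration (a sketch, not taken from the patch): Clang typically lowers a C++ reference parameter to a pointer argument carrying `dereferenceable(N)`, so a comparison like the one below is the kind of pattern this fold targets. The function name `refersToLocal` is hypothetical, and whether a particular compiler version folds it is not guaranteed.

```cpp
#include <cassert>

// Hypothetical example. `p` is a reference, which Clang typically lowers to a
// pointer argument marked `dereferenceable(4)` (assuming a 4-byte int).
bool refersToLocal(const int &p) {
  int local = 0; // storage allocated within the function
  // `&p` refers to a live int owned by the caller, so it is a distinct object
  // from `local` and the two addresses cannot compare equal.
  return &p == &local;
}
```

With the fold in place, the comparison can be simplified to `false` without relying on later passes.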
Fixes: https://github.com/llvm/llvm-project/issues/200511.
[Test] Remove test creating invalid assume operand bundles (#203945)
This was creating random assume operand bundles, using unsupported
attributes, and using invalid arguments for supported ones.
Rather than trying to salvage this test, delete it and the API it tests.
[mlir][OpenMP] Translate reductions on taskloop
Add LLVM IR translation for reduction and in_reduction clauses on omp.taskloop.context.
For taskloop reduction, emit the implicit taskgroup reduction setup and map each generated task to runtime-provided private reduction storage through __kmpc_task_reduction_get_th_data. For in_reduction, use the same runtime lookup path with a null descriptor to join an enclosing task reduction context.
Unsupported byref, cleanup, and two-argument initializer forms remain diagnosed.
Add MLIR translation tests for the supported taskloop reduction and in_reduction cases.
[clangd][nfc] Avoid type erasure for local recursive callbacks (#203042)
Four local clangd callbacks use std::function only to call themselves.
Switch to local structs and static functions to avoid std::function
type-erasure and copy-support machinery.
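The pattern can be sketched as follows. This is an illustrative example rather than code from clangd; `Node`, `sampleTree`, `sumWithStdFunction`, and `sumWithLocalStruct` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Node {
  int Value;
  std::vector<Node> Children;
};

// Small tree used for illustration: 1 -> {2, 3 -> {4}}.
Node sampleTree() {
  return Node{1, {Node{2, {}}, Node{3, {Node{4, {}}}}}};
}

// Before: the lambda is stored in a std::function only so it can call itself.
int sumWithStdFunction(const Node &Root) {
  std::function<int(const Node &)> Visit = [&](const Node &N) -> int {
    int Total = N.Value;
    for (const Node &C : N.Children)
      Total += Visit(C);
    return Total;
  };
  return Visit(Root);
}

// After: a local struct with a static member function recurses directly,
// with no type erasure involved.
int sumWithLocalStruct(const Node &Root) {
  struct Visitor {
    static int visit(const Node &N) {
      int Total = N.Value;
      for (const Node &C : N.Children)
        Total += visit(C);
      return Total;
    }
  };
  return Visitor::visit(Root);
}
```

Both functions return the same sum for the same tree; only the dispatch mechanism differs.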
In matched Release AArch64 builds, the four object files shrink by 8,152
bytes and 131 relocations; linked clangd shrinks by 3,872 bytes
unstripped and 16 bytes stripped, with __text down 360 bytes,
__DATA_CONST,__const down 208 bytes, unwind data down 32 bytes, and 21
fewer dyld fixups.
Work towards #202616
AI tool disclosure: Co-authored with OpenAI Codex.
[AMDGPU] Track VALU instructions separately for WMMA coexecution hazards (#202523)
WMMA coexecution hazards can only be resolved by VALU instructions, not
S_NOPs. Track VALU/WMMA instructions separately so the scheduler can
accurately determine stall cycles.
[libc] Generate a stub for libpthread.a (#200908)
Several build systems / existing scripts assume that pthread functions
are exposed through a separate library (`libpthread.so` / `libpthread.a`)
and thus use `-lpthread` flag explicitly. Since llvm-libc puts all the
pthread functions into the regular `libc`, teach the CMake build rules
to produce an empty static archive `libpthread.a` for compatibility
purposes.
[VPlan] Simplify reverse(reverse(x)) -> x (#199057)
This is a version of #196900 that performs the simplification as a
separate transform.
We need to add an additional `vp.splice.right(vp.splice.left(poison, x,
evl), poison, evl) -> x` simplification to avoid leftover splices
whenever reverses are removed in an EVL tail-folded loop.
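A toy model of the `reverse(reverse(x)) -> x` rewrite, assuming a simplified expression tree. The `Expr` type, the `simplify` function, and the sample nodes are hypothetical and do not reflect VPlan's recipe classes or pattern matchers.

```cpp
#include <cassert>

// Toy expression node; illustrative only, not a VPlan type.
struct Expr {
  enum Kind { Leaf, Reverse } K;
  const Expr *Operand; // null for Leaf
};

// Strip pairs of nested reverses at the root: reverse(reverse(x)) -> x.
const Expr *simplify(const Expr *E) {
  while (E->K == Expr::Reverse && E->Operand->K == Expr::Reverse)
    E = E->Operand->Operand;
  return E;
}

// Sample nodes: X, reverse(X), reverse(reverse(X)), reverse(reverse(reverse(X))).
static const Expr SampleX{Expr::Leaf, nullptr};
static const Expr SampleR1{Expr::Reverse, &SampleX};
static const Expr SampleR2{Expr::Reverse, &SampleR1};
static const Expr SampleR3{Expr::Reverse, &SampleR2};
```

An even number of nested reverses collapses to the operand, while an odd number leaves a single reverse.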
Co-authored-by: Madhur Amilkanthwar <madhura at nvidia.com>
[llvm] Replace unordered_{map,set} with Dense{Map,Set} in llvm tools (#204058)
std::unordered_map is slow. Switch the remaining local maps and sets in
the command-line tools (llvm-profgen, llvm-profdata, llvm-objdump,
llvm-exegesis, llvm-xray, llvm-remarkutil) to DenseMap/DenseSet.
[lld][LoongArch] Fix range checking of R_LARCH_*_PCADD_HI20 relocations on 64-bit (#183233)
According to the la-abi-specs, the `R_LARCH_*_PCADD_HI20` relocations
are also used on 64-bit LoongArch. Fix the range checking accordingly.
[mlir][OpenMP] Translate task_reduction on taskgroup
Add LLVM IR translation support for the task_reduction clause on
omp.taskgroup.
The translation builds task-reduction descriptors for the listed reduction
variables and emits the runtime initialization before the taskgroup body.
The reducer init and combiner callbacks are generated from the corresponding
omp.declare_reduction regions.
This patch keeps taskloop reduction and in_reduction translation unsupported;
those remain follow-up work. Unsupported task_reduction forms are diagnosed
instead of being lowered incorrectly.
Add MLIR translation tests for taskgroup task_reduction, multiple reducers,
plain taskgroup translation, and remaining unsupported cases.
[libc] Add TIOCGWINSZ and struct winsize support (#203919)
Added support for the TIOCGWINSZ ioctl command.
* Defined struct winsize in llvm-libc-types.
* Defined TIOCGWINSZ in linux/sys-ioctl-macros.h.
* Exposed struct_winsize and TIOCGWINSZ in sys/ioctl.yaml.
* Added struct_winsize proxy header in hdr/types/struct_winsize.h.
* Added a unit test in test/src/sys/ioctl/linux/ioctl_test.cpp.
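For context, a minimal sketch of how a caller might use the new definitions on Linux. The helper name `terminalColumns` is hypothetical and is not part of this patch.

```cpp
#include <cassert>
#include <sys/ioctl.h>
#include <unistd.h>

// Returns the number of columns of the terminal attached to `fd`, or -1 if
// the ioctl fails (for example, when `fd` is invalid or not a terminal).
int terminalColumns(int fd) {
  struct winsize ws{};
  if (ioctl(fd, TIOCGWINSZ, &ws) == -1)
    return -1;
  return ws.ws_col;
}
```

A typical call would be `terminalColumns(STDOUT_FILENO)`, with the caller falling back to a default width when the result is -1.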
Assisted-by: Automated tooling, human reviewed.
[mlir][emitc] Add a common type converter (#203763)
MemRef type conversion is currently implemented as part of the memref
dialect lowering pass, which means e.g. that func-to-emitc cannot lower
functions taking MemRef types as arguments.
This patch refactors the existing type conversions in EmitC's lowering
passes into a structure similar to the LLVM dialect by adding a common
EmitC type converter and using it across dialect-specific EmitC lowering
passes and the generic convert-to-emitc pass.
Assisted-by: Copilot
[clang-format][NFC] Don't always rebuild clang-format-check-format (#203828)
Instead, check the format of clang-format source only if the built
clang-format binary or one of the source files is newer.
[llvm] Replace unordered_set<std::string> with StringSet (#204048)
std::unordered_set<std::string> without a pointer-stability requirement
can use StringSet: it avoids per-TU hashtable instantiations and the
std::string temporary at StringRef lookup sites (~3-4 KiB smaller .text
for llc/opt).
[ELF] Support multiple PT_GNU_RELRO when SECTIONS is used without PHDRS (#203675)
When a SECTIONS command interleaves relro and non-relro sections, the
relro region is split into discontiguous runs. lld has emitted an error
since https://reviews.llvm.org/D40359:
    error: section: <name> is not contiguous with other relro sections
This is overly strict: while glibc only honors the first PT_GNU_RELRO,
other loaders (e.g. Bionic and FreeBSD rtld-elf) protect every
PT_GNU_RELRO segment.
Emit one PT_GNU_RELRO segment for each contiguous run of relro sections.
Track the boundary section so that `createPhdrs` starts a fresh PT_LOAD
at each relro->non-relro transition, as before.
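The "one segment per contiguous run" rule can be sketched with a toy model. This is not lld's code; `relroRuns` is a hypothetical helper that treats the output sections as a flat list of relro flags.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy model: IsRelro[i] says whether the i-th section in output order is
// relro. Returns the half-open index range [begin, end) of every contiguous
// run; in this model each run would get its own PT_GNU_RELRO segment.
std::vector<std::pair<std::size_t, std::size_t>>
relroRuns(const std::vector<bool> &IsRelro) {
  std::vector<std::pair<std::size_t, std::size_t>> Runs;
  std::size_t I = 0, N = IsRelro.size();
  while (I < N) {
    if (!IsRelro[I]) {
      ++I;
      continue;
    }
    std::size_t Begin = I;
    while (I < N && IsRelro[I])
      ++I;
    Runs.push_back({Begin, I});
  }
  return Runs;
}
```

For the layout relro, relro, non-relro, relro, this yields two runs, `[0, 2)` and `[3, 4)`, where the old behavior would have reported an error.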
Consumers that don't expect multiple PT_GNU_RELRO should check the
output themselves.
[llvm] Replace unordered_set<T *> with SmallPtrSet<T *, 0> (#204051)
std::unordered_set is slow. For pointer sets without a pointer-stability
or iterator-stability requirement, use SmallPtrSet<T *, 0> for a smaller
code size.
[TTI] Add missing no-cost coroutine intrinsics (#203816)
These intrinsics are lowered in the CoroCleanup pass and don't represent
actual code. This patch adds them to the no-cost list so they do not
contribute to the cost of inlining and optimization.
[flang][mlir] Add flang to mlir lowering for groupprivate (#180934)
This PR implements the Flang frontend lowering for the OpenMP
`groupprivate` directive.
Changes:
- Update genOMP handler for OpenMPGroupprivate in OpenMP.cpp to generate
`omp.groupprivate` MLIR operation.
- Add clause processing for groupprivate variables
- Add test cases for groupprivate lowering
Co-Authored-By: Claude <noreply at anthropic.com>
[RISCV] Consider known leading zeros in narrowIndex for gather/scatter. (#203970)
If there are enough leading zeros for the shift amount, then
we can do the shift in the narrow type.
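The arithmetic behind this check can be illustrated on a concrete value. The sketch below is a simplification: the actual transform reasons about known bits at compile time, and the helper names `leadingZeros64` and `shiftFitsInNarrowType` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of leading zero bits in a 64-bit value (64 for zero).
unsigned leadingZeros64(std::uint64_t V) {
  unsigned N = 0;
  for (std::uint64_t Mask = 1ULL << 63; Mask != 0 && (V & Mask) == 0;
       Mask >>= 1)
    ++N;
  return N;
}

// True if `Index << ShAmt` still fits in `NarrowBits` bits, meaning the shift
// can be done in the narrow type without losing bits.
// Assumes NarrowBits <= 64.
bool shiftFitsInNarrowType(std::uint64_t Index, unsigned ShAmt,
                           unsigned NarrowBits) {
  unsigned ActiveBits = 64 - leadingZeros64(Index);
  return ActiveBits + ShAmt <= NarrowBits;
}
```

For example, a 14-bit index shifted left by 2 fits exactly in 16 bits, while a full 16-bit index shifted by 2 does not.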
[AMDGPU] Set WMMA source-operand reuse bits in SIPreEmitPeephole
gfx1250 WMMA instructions can set matrix_a_reuse / matrix_b_reuse bits
that keep the A or B source operand in a high-temporality state in the
VALU source-operand cache, so a later WMMA reusing the same registers
hits in the cache instead of re-reading the register file.
Add a late, post-RA peephole in the existing pre-emit peephole pass that
scans each basic block and, for every WMMA, sets the A/B reuse bit when
one of the next few WMMAs reuses the same physical registers as its A or B
operand and those registers are not redefined in between.
Stale sticky entries in the cache are cleared when a register is used in
an instruction without a reuse bit being set. Therefore, the final WMMA
use of the same source should not set the bit.
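A toy model of the scan described above, under several assumptions that are not from the patch: instructions are a flat list, each operand is a single integer register, a later WMMA counts as a reuse if it reads the register in either the A or B position, and the lookahead window is a parameter. `Inst`, `reusedLater`, `setReuseBits`, and `processedSample` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction; not LLVM's MachineInstr.
struct Inst {
  bool IsWMMA;
  int SrcA, SrcB; // matrix A/B source registers (WMMA only, else -1)
  int Def;        // register written by this instruction
  bool ReuseA, ReuseB;
};

Inst makeWMMA(int A, int B, int D) { return Inst{true, A, B, D, false, false}; }
Inst makeVALU(int D) { return Inst{false, -1, -1, D, false, false}; }

// True if one of the next `Lookahead` WMMAs after index I reads `Reg` as an
// A or B operand before `Reg` is redefined.
bool reusedLater(const std::vector<Inst> &Block, std::size_t I, int Reg,
                 unsigned Lookahead) {
  unsigned Seen = 0;
  for (std::size_t J = I + 1; J < Block.size() && Seen < Lookahead; ++J) {
    const Inst &Next = Block[J];
    if (Next.IsWMMA) {
      ++Seen;
      if (Next.SrcA == Reg || Next.SrcB == Reg)
        return true;
    }
    if (Next.Def == Reg) // redefined before any reuse
      return false;
  }
  return false;
}

// Set the reuse bits on every WMMA in the block. The last WMMA that uses a
// given source never sets the bit, because no later reuse is found.
void setReuseBits(std::vector<Inst> &Block, unsigned Lookahead) {
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (!Block[I].IsWMMA)
      continue;
    Block[I].ReuseA = reusedLater(Block, I, Block[I].SrcA, Lookahead);
    Block[I].ReuseB = reusedLater(Block, I, Block[I].SrcB, Lookahead);
  }
}

// Sample block: two WMMAs sharing register 1 as A, a VALU redefining
// register 3, then a WMMA reading the new value of register 3.
std::vector<Inst> processedSample() {
  std::vector<Inst> Block = {makeWMMA(1, 2, 10), makeWMMA(1, 3, 11),
                             makeVALU(3), makeWMMA(4, 3, 12)};
  setReuseBits(Block, /*Lookahead=*/2);
  return Block;
}
```

In the sample block only the first WMMA sets its A reuse bit: the second WMMA is the final use of register 1, and its B register is redefined before the next WMMA reads it.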