[VPlan] Simplify reverse(reverse(x)) -> x (#199057)
This is a version of #196900 that performs the simplification as a
separate transform.
We need to add an additional
`vp.splice.right(vp.splice.left(poison, x, evl), poison, evl) -> x`
simplification to avoid leftover splices whenever reverses are removed
in an EVL tail-folded loop.
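The core rewrite can be illustrated on a toy expression tree. This is a minimal sketch, not VPlan code. `Expr`, `makeLeaf`, `makeReverse` and `simplify` are hypothetical names, and only the reverse(reverse(x)) -> x pattern is modeled (the EVL splice pattern is not).

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy IR node (hypothetical): either a named leaf value or reverse(operand).
struct Expr {
  enum class Op { Leaf, Reverse };
  Op op;
  std::string name;              // leaf name (Leaf only)
  std::shared_ptr<Expr> operand; // reversed value (Reverse only)
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr makeLeaf(std::string name) {
  return std::make_shared<Expr>(Expr{Expr::Op::Leaf, std::move(name), nullptr});
}

ExprPtr makeReverse(ExprPtr value) {
  return std::make_shared<Expr>(Expr{Expr::Op::Reverse, "", std::move(value)});
}

// Bottom-up peephole: simplify the operand first, then fold
// reverse(reverse(x)) -> x if the simplified operand is itself a reverse.
ExprPtr simplify(const ExprPtr &e) {
  if (e->op == Expr::Op::Leaf)
    return e;
  ExprPtr inner = simplify(e->operand);
  if (inner->op == Expr::Op::Reverse)
    return inner->operand; // two reverses cancel
  return makeReverse(inner);
}
```

Rewriting bottom-up means an odd chain of reverses collapses to a single reverse and an even chain to the original value.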
Co-authored-by: Madhur Amilkanthwar <madhura at nvidia.com>
[llvm] Replace unordered_{map,set} with Dense{Map,Set} in llvm tools (#204058)
std::unordered_map is slow. Switch the remaining local maps and sets in
the command-line tools (llvm-profgen, llvm-profdata, llvm-objdump,
llvm-exegesis, llvm-xray, llvm-remarkutil) to DenseMap/DenseSet.
[lld][LoongArch] Fix range checking of R_LARCH_*_PCADD_HI20 relocations on 64-bit (#183233)
According to the la-abi-specs, the `R_LARCH_*_PCADD_HI20` relocations
are also used on 64-bit LoongArch. Fix the range checking accordingly.
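A range check of this kind can be sketched as follows, assuming the HI20 part is paired with a signed 12-bit low part, so the usual `(offset + 0x800) >> 12` split applies and the rounded offset must fit in a signed 32-bit value. `HiLo` and `splitPcaddOffset` are hypothetical names, not lld's. The sketch takes a 64-bit offset because on 64-bit LoongArch the distance from PC to target need not fit this window.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical result of splitting a PC-relative offset into hi20/lo12.
struct HiLo {
  int32_t hi20; // signed 20-bit upper part (shifted left by 12 at runtime)
  int32_t lo12; // signed 12-bit lower part
};

// Returns the split if `offset` is reachable by hi20 + lo12, else nullopt.
std::optional<HiLo> splitPcaddOffset(int64_t offset) {
  // Adding 0x800 rounds so that lo12 lands in [-2048, 2047].
  int64_t rounded = offset + 0x800;
  if (rounded < std::numeric_limits<int32_t>::min() ||
      rounded > std::numeric_limits<int32_t>::max())
    return std::nullopt; // out of range: an error in a real linker
  // Arithmetic shift (guaranteed since C++20; what GCC/Clang do in practice).
  int32_t hi = static_cast<int32_t>(rounded >> 12);
  int32_t lo = static_cast<int32_t>(offset - static_cast<int64_t>(hi) * 4096);
  return HiLo{hi, lo};
}
```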
[libc] Add TIOCGWINSZ and struct winsize support (#203919)
Added support for the TIOCGWINSZ ioctl command.
* Defined struct winsize in llvm-libc-types.
* Defined TIOCGWINSZ in linux/sys-ioctl-macros.h.
* Exposed struct_winsize and TIOCGWINSZ in sys/ioctl.yaml.
* Added struct_winsize proxy header in hdr/types/struct_winsize.h.
* Added a unit test in test/src/sys/ioctl/linux/ioctl_test.cpp.
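Typical use of the newly exposed interface, as a sketch against the standard Linux `<sys/ioctl.h>` API (`ioctl`, `TIOCGWINSZ`, and `struct winsize` with `ws_row`/`ws_col`). `TermSize` and `queryTerminalSize` are hypothetical helpers.

```cpp
#include <cassert>
#include <optional>
#include <sys/ioctl.h>
#include <unistd.h>

// Hypothetical convenience type holding the terminal dimensions.
struct TermSize {
  unsigned short rows;
  unsigned short cols;
};

// Queries the window size of the terminal behind `fd`.
// Returns nullopt if `fd` is not a terminal or the ioctl fails.
std::optional<TermSize> queryTerminalSize(int fd) {
  struct winsize ws {};
  if (ioctl(fd, TIOCGWINSZ, &ws) != 0)
    return std::nullopt; // e.g. ENOTTY for pipes and regular files
  return TermSize{ws.ws_row, ws.ws_col};
}
```

On a descriptor that is not a terminal, such as a pipe, the ioctl fails (typically with ENOTTY) and the helper returns `std::nullopt`.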
Assisted-by: Automated tooling, human reviewed.
[mlir][emitc] Add a common type converter (#203763)
MemRef type conversion is currently implemented as part of the memref
dialect lowering pass, which means e.g. that func-to-emitc cannot lower
functions taking MemRef types as arguments.
This patch refactors the existing type conversions in EmitC's lowering
passes into a structure similar to the LLVM dialect's, adding a common
EmitC type converter and using it across the dialect-specific EmitC
lowering passes and the generic convert-to-emitc pass.
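The shape of the refactoring (one converter that owns every type rule and is shared by all lowering passes) can be sketched with a toy converter over type names. `SharedTypeConverter` and `makeEmitCConverter` are hypothetical; MLIR's own base class for this is `mlir::TypeConverter`, and the specific mappings below (for example `memref<4xf32>` to `!emitc.array<4xf32>`) are illustrative only.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Toy stand-in for a shared type converter: an ordered list of rules.
class SharedTypeConverter {
public:
  using Rule = std::function<std::optional<std::string>(const std::string &)>;

  void addConversion(Rule rule) { rules.push_back(std::move(rule)); }

  // Tries rules in registration order; the first rule that matches wins.
  std::optional<std::string> convert(const std::string &type) const {
    for (const Rule &rule : rules)
      if (auto result = rule(type))
        return result;
    return std::nullopt; // no rule applies: type is not convertible
  }

private:
  std::vector<Rule> rules;
};

// Single place that registers all rules; every lowering pass would call it.
SharedTypeConverter makeEmitCConverter() {
  SharedTypeConverter c;
  c.addConversion([](const std::string &t) -> std::optional<std::string> {
    if (t == "index")
      return std::string("!emitc.size_t");
    return std::nullopt;
  });
  c.addConversion([](const std::string &t) -> std::optional<std::string> {
    const std::string prefix = "memref<";
    if (t.rfind(prefix, 0) == 0 && t.back() == '>')
      return "!emitc.array<" + t.substr(prefix.size()); // keeps "4xf32>"
    return std::nullopt;
  });
  c.addConversion([](const std::string &t) -> std::optional<std::string> {
    if (t == "i32" || t == "f32")
      return t; // already legal, kept as-is
    return std::nullopt;
  });
  return c;
}
```

Because a function-lowering pass and a memref-lowering pass would both call the same `makeEmitCConverter()`, a function signature containing a MemRef converts identically in both.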
Assisted-by: Copilot
[clang-format][NFC] Don't always rebuild clang-format-check-format (#203828)
Instead, check the format of the clang-format sources only if the built
clang-format binary or one of the source files is newer than the last check.
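The rebuild condition is the usual make-style out-of-date check: rerun only if a stamp from the previous run is missing or older than any input. This is a minimal C++17 sketch with `std::filesystem`; the stamp-file approach and the `needsRecheck` name are assumptions, not necessarily how the build rule implements it.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Make-style out-of-date check (hypothetical helper): returns true when the
// stamp left by the previous check is missing or older than any input.
bool needsRecheck(const fs::path &stamp, const std::vector<fs::path> &inputs) {
  if (!fs::exists(stamp))
    return true; // never checked before
  const fs::file_time_type stampTime = fs::last_write_time(stamp);
  for (const fs::path &input : inputs)
    if (fs::last_write_time(input) > stampTime)
      return true; // binary or a source file changed since the last check
  return false;
}
```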
[llvm] Replace unordered_set<std::string> with StringSet (#204048)
std::unordered_set<std::string> without a pointer-stability requirement
can use StringSet: it avoids per-TU hashtable instantiations and the
std::string temporary at StringRef lookup sites (~3-4 KiB smaller .text
for llc/opt).
[ELF] Support multiple PT_GNU_RELRO when SECTIONS is used without PHDRS (#203675)
When a SECTIONS command interleaves relro and non-relro sections, the
relro region is split into discontiguous runs. Since
https://reviews.llvm.org/D40359, lld has reported an error:
error: section: <name> is not contiguous with other relro sections
This is overly strict: while glibc only honors the first PT_GNU_RELRO,
other loaders (e.g. Bionic and FreeBSD rtld-elf) protect every
PT_GNU_RELRO segment.
Emit one PT_GNU_RELRO segment for each contiguous run of relro sections.
Track the boundary section so that `createPhdrs` starts a fresh PT_LOAD
at each relro->non-relro transition, as before.
Consumers that don't expect multiple PT_GNU_RELRO should check the
output themselves.
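Grouping sections into PT_GNU_RELRO runs is a single linear scan. The sketch below uses a hypothetical `OutSec` with a precomputed relro flag in place of lld's output sections. Each run would become one PT_GNU_RELRO, and the section at index `last + 1` is the boundary where a fresh PT_LOAD would start.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified output section: a name and whether it is relro.
struct OutSec {
  std::string name;
  bool isRelro;
};

// Inclusive index range of one contiguous run of relro sections.
struct RelroRun {
  std::size_t first;
  std::size_t last;
};

// Returns one run per maximal contiguous sequence of relro sections.
std::vector<RelroRun> computeRelroRuns(const std::vector<OutSec> &secs) {
  std::vector<RelroRun> runs;
  for (std::size_t i = 0; i < secs.size(); ++i) {
    if (!secs[i].isRelro)
      continue;
    if (!runs.empty() && runs.back().last + 1 == i)
      runs.back().last = i; // extends the current run
    else
      runs.push_back({i, i}); // starts a new run after a non-relro gap
  }
  return runs;
}
```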
[llvm] Replace unordered_set<T *> with SmallPtrSet<T *, 0> (#204051)
std::unordered_set is slow. For pointer sets without a pointer-stability
or iterator-stability requirement, use SmallPtrSet<T *, 0> for smaller
code size.
[TTI] Add missing no-cost coroutine intrinsics (#203816)
These intrinsics are lowered in the CoroCleanup pass and don't represent
actual code. This patch adds them to the no-cost list so they do not
inflate the cost estimates used by inlining and other optimizations.
[flang][mlir] Add flang to mlir lowering for groupprivate (#180934)
This PR implements the Flang frontend lowering for the OpenMP
`groupprivate` directive.
Changes:
- Update the genOMP handler for OpenMPGroupprivate in OpenMP.cpp to
  generate the `omp.groupprivate` MLIR operation.
- Add clause processing for groupprivate variables
- Add test cases for groupprivate lowering
Co-Authored-By: Claude <noreply at anthropic.com>
[RISCV] Consider known leading zeros in narrowIndex for gather/scatter. (#203970)
If the index has enough known leading zeros to absorb the shift
amount, the shift can be done in the narrow type.
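The condition can be written as simple bit arithmetic. This sketch assumes the narrowed index is zero-extended back to the wide type (as for unsigned indices); `canShiftInNarrowType` is a hypothetical helper, not the RISC-V backend's code. A wide value with at least `KnownLZ` leading zeros fits in `WideBits - KnownLZ` bits, and shifting left by `ShAmt` needs `ShAmt` more.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: can (X << ShAmt) be computed in NarrowBits without losing
// bits, given that X (WideBits wide) has at least KnownLZ leading zeros?
bool canShiftInNarrowType(unsigned KnownLZ, unsigned ShAmt, unsigned WideBits,
                          unsigned NarrowBits) {
  if (KnownLZ > WideBits || NarrowBits > WideBits)
    return false; // malformed query
  // Significant bits of X, plus the bits added by the shift, must fit.
  return WideBits - KnownLZ + ShAmt <= NarrowBits;
}
```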
[AMDGPU] Track VALU instructions separately for WMMA coexecution hazards
WMMA coexecution hazards can only be resolved by VALU instructions, not
S_NOPs. Track VALU/WMMA instructions separately so the scheduler can
accurately determine stall cycles.
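The distinction matters when counting how far a hazard has been resolved. In this toy model, only VALU instructions issued since the most recent WMMA count toward the required separation, so S_NOPs leave the remaining requirement unchanged. The `Kind` enum, the `required` count, and `remainingVALUs` are hypothetical and are not the AMDGPU hazard recognizer's API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical instruction categories for the toy model.
enum class Kind { WMMA, VALU, SNop, Other };

// Given recently issued instructions (oldest first), returns how many more
// VALU instructions must issue before a consumer of the most recent WMMA is
// safe, if `required` VALUs must separate them. Non-VALUs do not count.
int remainingVALUs(const std::vector<Kind> &issued, int required) {
  int valus = 0;
  for (auto it = issued.rbegin(); it != issued.rend(); ++it) {
    if (*it == Kind::WMMA)
      return std::max(0, required - valus);
    if (*it == Kind::VALU)
      ++valus; // only VALUs resolve the coexecution hazard
  }
  return 0; // no WMMA in the window: nothing to resolve
}
```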
[AMDGPU] Set WMMA source-operand reuse bits in SIPreEmitPeephole
gfx1250 WMMA instructions can set matrix_a_reuse / matrix_b_reuse bits
that keep the A or B source operand in a high-temporality state in the
VALU source-operand cache, so a later WMMA reusing the same registers
hits in the cache instead of re-reading the register file.
Add a late, post-RA peephole in the existing pre-emit peephole pass that
scans each basic block and, for every WMMA, sets the A/B reuse bit when
one of the next few WMMAs reuses the same physical registers as its A or B
operand and those registers are not redefined in between.
Stale sticky entries in the cache are cleared when a register is used
in an instruction without the reuse bit set. Therefore, the final WMMA
use of a given source should not set the bit, so that its entry is released.
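The scan can be sketched on a simplified instruction model. Assumptions: each source operand is a single register ID rather than a register tuple, a later WMMA must read the register in the same (A or B) slot, and the lookahead `window` counts WMMAs. `MInstr`, `wmma`, `valu`, `reusedSoon` and `setReuseBits` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified machine instruction (hypothetical).
struct MInstr {
  bool isWMMA = false;
  int srcA = -1, srcB = -1; // single physical register IDs (simplification)
  std::vector<int> defs;    // registers written by the instruction
  bool reuseA = false, reuseB = false;
};

MInstr wmma(int a, int b, int d) {
  MInstr m;
  m.isWMMA = true;
  m.srcA = a;
  m.srcB = b;
  m.defs = {d};
  return m;
}

MInstr valu(int d) {
  MInstr m;
  m.defs = {d};
  return m;
}

static bool defines(const MInstr &mi, int reg) {
  return std::find(mi.defs.begin(), mi.defs.end(), reg) != mi.defs.end();
}

// True if one of the next `window` WMMAs after index i reads `reg` in the
// same slot before any instruction redefines it.
static bool reusedSoon(const std::vector<MInstr> &bb, std::size_t i, int reg,
                       bool slotA, unsigned window) {
  if (defines(bb[i], reg))
    return false; // the WMMA itself overwrites its source
  unsigned seen = 0;
  for (std::size_t j = i + 1; j < bb.size() && seen < window; ++j) {
    const MInstr &mi = bb[j];
    if (mi.isWMMA) {
      ++seen;
      if ((slotA ? mi.srcA : mi.srcB) == reg)
        return true; // read happens before this instruction's own defs
    }
    if (defines(mi, reg))
      return false; // redefined in between: cached value is stale
  }
  return false;
}

void setReuseBits(std::vector<MInstr> &bb, unsigned window) {
  for (std::size_t i = 0; i < bb.size(); ++i) {
    if (!bb[i].isWMMA)
      continue;
    bb[i].reuseA = reusedSoon(bb, i, bb[i].srcA, /*slotA=*/true, window);
    bb[i].reuseB = reusedSoon(bb, i, bb[i].srcB, /*slotA=*/false, window);
  }
}
```

Because a bit is only set when a later WMMA reuses the register, the last WMMA in a chain of reuses leaves it clear, which matches the requirement that the final use not set it.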
[AMDGPU] Update f8f6f4-wmma hazard tests regarding matrix format, NFC (#204037)
The MIR tests need to map each matrix format suffix to the correct
register size. For example, the 'F4' format needs the v8i32 register class.
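The mapping follows from per-lane arithmetic. Assuming a 16x128 A matrix (as in a 16x16x128 f8f6f4 WMMA) spread over a wave32, each lane holds 64 elements, so the operand size scales with the element width. The suffix strings and the helpers `bitsPerElement` and `dwordsPerLane` are hypothetical; the F4 result agrees with the v8i32 requirement for the 'F4' format.

```cpp
#include <cassert>
#include <string>

// Hypothetical: element width in bits for each matrix format suffix.
unsigned bitsPerElement(const std::string &fmt) {
  if (fmt == "F8")
    return 8;
  if (fmt == "F6")
    return 6;
  if (fmt == "F4")
    return 4;
  return 0; // unknown suffix
}

// 32-bit registers per lane needed to hold an M x K source matrix that is
// distributed evenly across a wave of `waveSize` lanes.
unsigned dwordsPerLane(unsigned bits, unsigned m, unsigned k,
                       unsigned waveSize) {
  unsigned bitsPerLane = m * k * bits / waveSize;
  return (bitsPerLane + 31) / 32; // round up to whole dwords
}
```

With these assumptions, F8 needs 16 dwords per lane, F6 needs 12, and F4 needs 8 (a v8i32 value).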