[RISCV] Add tied destination constraint to CustomSiFiveVMACC. (#179567)
As the name suggests, these are multiply-accumulate instructions and
thus they have 3 sources.
[Clang][WebAssembly] Fix WASM tables to allow `__funcref` function pointers (#178720)
Allows __funcref pointers to be used as the element type for WASM tables
in Clang (static, global, zero-length arrays of a reference type).
Modifies `QualType::isWebAssemblyFuncrefType` to correctly look at the
addrspace of the pointee, rather than the pointer type.
Related: #140933
[mlir][shard,mpi] Fixing lowering allgather shard->mpi->llvm (#178870)
`shard.allgather` concatenates along a specified gather-axis. However,
`mpi.allgather` always concatenates along the first dimension and there
is no MPI operation that allows gathering along an arbitrary axis.
Hence, if gather-axis!=0, we need to create a temporary buffer where we
gather along the first dimension and then copy from that buffer to the
final output along the specified gather-axis. This is far from ideal.
Along the way, also:
- fixing computation of memref size in mpitollvm
- adding a simple canonicalization pattern for comm_size for easier
debugging
- adding more tests
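The gather-then-copy workaround described above can be sketched on plain row-major buffers. This is only an illustration under simplifying assumptions (every rank holds the same shape, `int` elements, ranks simulated in one process); `gatherAlongFirstDim` and `copyToGatherAxis` are hypothetical names, not functions from the patch or from MPI.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for an allgather that can only concatenate along dimension 0:
// the per-rank buffers are laid out one after another in rank order.
std::vector<int> gatherAlongFirstDim(const std::vector<std::vector<int>> &perRank) {
  std::vector<int> temp;
  for (const auto &local : perRank)
    temp.insert(temp.end(), local.begin(), local.end());
  return temp;
}

// Copies the rank-major temporary buffer into an output whose gather axis
// is `numRanks` times longer than the per-rank `shape[axis]`.
std::vector<int> copyToGatherAxis(const std::vector<int> &temp,
                                  const std::vector<std::size_t> &shape,
                                  std::size_t axis, std::size_t numRanks) {
  assert(axis < shape.size());
  std::size_t outer = 1, inner = 1;
  for (std::size_t d = 0; d < axis; ++d)
    outer *= shape[d];
  for (std::size_t d = axis + 1; d < shape.size(); ++d)
    inner *= shape[d];
  const std::size_t chunk = shape[axis] * inner; // contiguous run per outer index
  const std::size_t localSize = outer * chunk;   // elements held by one rank
  assert(temp.size() == numRanks * localSize);

  std::vector<int> out(temp.size());
  for (std::size_t r = 0; r < numRanks; ++r)
    for (std::size_t o = 0; o < outer; ++o)
      std::copy_n(temp.data() + r * localSize + o * chunk, chunk,
                  out.data() + (o * numRanks + r) * chunk);
  return out;
}
```

For gather-axis 0 the copy degenerates to the identity, which is why the temporary buffer is only needed when gather-axis!=0.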
[CodeGen] Remove unused first operand of SUBREG_TO_REG (#179690)
The first input operand of SUBREG_TO_REG was an immediate that most
targets set to 0. In practice it had no effect on codegen. Remove it.
[AArch64][llvm] Pre-commit tests for enabling streaming with +fprcvt
Add pre-commit tests for enabling streaming with +fprcvt. Because I've
added a `+sve,+neon,+fullfp16,+fprcvt -force-streaming-compatible` line
to the test files, this required a small change to prevent an assert.
[RISCV] Add C/Zcf/Zcd/Zce implication rules to subtarget construction. (#179615)
This ensures the feature bits and RISCVSubtarget flags match what
RISCVISAInfo would do.
I'm not excited about the code duplication, but I need to set the
RISCVSubtarget flags along with calling ToggleFeature. I'll think about
how to improve this.
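As a rough illustration of what such implication rules look like, here is a toy model over a set of extension names. It assumes the Zc* relationships as stated in the RISC-V ISA manual (C implies Zca, plus Zcf on RV32 with F and Zcd with D; Zce implies Zca, Zcb, Zcmp, Zcmt, plus Zcf on RV32 with F). The function name `applyCompressedImplications` is hypothetical, and this is not the code from the patch or from RISCVISAInfo.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model only: expands C and Zce into the Zc* extensions they imply.
// `exts` is assumed to already contain "f" and/or "d" where applicable.
std::set<std::string> applyCompressedImplications(std::set<std::string> exts,
                                                  unsigned xlen) {
  const bool hasF = exts.count("f") != 0;
  const bool hasD = exts.count("d") != 0;

  if (exts.count("zce")) {
    for (const char *implied : {"zca", "zcb", "zcmp", "zcmt"})
      exts.insert(implied);
    if (xlen == 32 && hasF)
      exts.insert("zcf"); // Zcf only exists on RV32
  }

  if (exts.count("c")) {
    exts.insert("zca");
    if (xlen == 32 && hasF)
      exts.insert("zcf");
    if (hasD)
      exts.insert("zcd");
  }
  return exts;
}
```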
[llvm-readelf] --unwind: Support DW_EH_PE_sdata8 encoding (#179152)
... for both eh_frame_ptr_enc and table_enc fields when parsing the
PT_GNU_EH_FRAME program header (which contains .eh_frame_hdr). This is
needed for large binaries where offsets exceed the 32-bit range.
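For context, the low nibble of a DW_EH_PE encoding byte selects the value format, so supporting sdata8 amounts to reading an 8-byte signed value where sdata4 reads 4 bytes. A minimal sketch, assuming little-endian data and covering only three formats; `readEncodedValue` is a hypothetical helper and not llvm-readelf's implementation, and the caller is still expected to apply the pcrel/datarel base from the high nibble.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Value-format constants (low nibble of a DW_EH_PE encoding byte).
constexpr std::uint8_t DW_EH_PE_udata4 = 0x03;
constexpr std::uint8_t DW_EH_PE_sdata4 = 0x0b;
constexpr std::uint8_t DW_EH_PE_sdata8 = 0x0c;

// Reads `n` bytes as a little-endian unsigned integer.
static std::uint64_t readLE(const unsigned char *p, std::size_t n) {
  assert(n <= 8);
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < n; ++i)
    v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
  return v;
}

// Decodes one value at `offset` and advances it. Returns std::nullopt for
// an unsupported format or when the buffer is too short.
std::optional<std::int64_t> readEncodedValue(const unsigned char *data,
                                             std::size_t size,
                                             std::size_t &offset,
                                             std::uint8_t encoding) {
  std::size_t width = 0;
  bool isSigned = false;
  switch (encoding & 0x0f) {
  case DW_EH_PE_udata4: width = 4; isSigned = false; break;
  case DW_EH_PE_sdata4: width = 4; isSigned = true; break;
  case DW_EH_PE_sdata8: width = 8; isSigned = true; break;
  default: return std::nullopt;
  }
  if (offset > size || size - offset < width)
    return std::nullopt;

  const std::uint64_t raw = readLE(data + offset, width);
  offset += width;
  if (width == 4 && isSigned) // sign-extend 32-bit values
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(raw));
  return static_cast<std::int64_t>(raw);
}
```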
The sdata8 encoding has been tested on an executable
generated by lld patched with
https://github.com/llvm/llvm-project/pull/179089
```
% cat a.cc
#include <stdio.h>
int main() { try { throw 1; } catch (...) { puts("a"); } }
% cat a.lds
SECTIONS
{
. = SIZEOF_HEADERS;
[39 lines not shown]
[clang][modules] Allow specifying thread-safe module cache (#179510)
This PR adds a new member to `CompilerInstance::ThreadSafeCloneConfig` to
allow using a different `ModuleCache` instance in the cloned
`CompilerInstance`. This is done so that the original and the clone
can't concurrently work on the same `InMemoryModuleCache`, which is not
thread safe. This will be made use of shortly from the dependency
scanner along with the single-module-parse-mode to compile modules
asynchronously/concurrently.
This also fixes an old comment that incorrectly claimed that
`CompilerInstance`'s constructor is responsible for finalizing
`InMemoryModuleCache` buffers, which is no longer the case.
AMDGPU: Strip sign bit operations on llvm.amdgcn.trig.preop uses
The instruction ignores the sign bit, so we can find the magnitude source.
The real library use has a fabs input which this avoids.
stripSignOnlyFPOps should probably go directly into PatternMatch in some
form.
AMDGPU: Implement computeKnownFPClass for llvm.amdgcn.trig.preop (#179026)
Surprisingly this doesn't consider the special cases, and literally
just extracts the exponent and proceeds as normal.
AMDGPU: Fix incorrect fold of undef for llvm.amdgcn.trig.preop (#179025)
We were folding undef inputs to qnan which is incorrect. The instruction
never returns nan. Out of bounds segment select will return 0, so fold
undef segment to 0.
[NVPTX][NFC] Update fence.py and cmpxchg.py to generate ptxas-sm_XY and ptxas-isa-X.Y checks in RUN lines (#179378)
The cmpxchg-sm*.ll, fence*.ll files were manually updated to include
version checks. Modifying the generator scripts so that they will
correctly generate the version checks.
Fixes the issue raised in
https://github.com/llvm/llvm-project/pull/176078#issuecomment-3792304497
that led to
https://github.com/llvm/llvm-project/commit/acff9fa4dba2e39da73227d835dfd12be434645e.
(Thanks @vvereschaka!)
When I regenerated cmpxchg tests, I ended up overwriting the ptxas-sm
checks, because the generator script does not have them. Added comments
in the tests explaining that they should not be modified manually.
[SystemZ][z/OS] Reverse the order of instructions to save and restore CSRs (#179540)
Reverse the order of instructions to save and restore CSRs so that the
instruction on the lower-numbered register goes first.
[mlir][gpu] Make barrier elimination address-space aware (#178101)
Upgrade the barrier elimination pass to account for the address spaces
of accessed memory when deciding which barriers to eliminate. In
particular, a loop that only reads and writes global memory that has a
workgroup-memory-fencing barrier inside of it will now have that barrier
marked for elimination, as the global memory traffic is not being
synchronized by the barrier.
The pass is also adjusted to ignore barriers whose memory fencing list
is [], as those do not synchronize memory; otherwise the logic in this
pass could incorrectly remove them after proving that they order no
memory accesses.
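The address-space check can be pictured as a set intersection. The sketch below is a deliberately simplified model, not the pass's implementation: the `AddressSpace` bit values and `isBarrierRedundant` are hypothetical, and it ignores the conflict analysis a real pass has to perform on accesses that do fall in a fenced address space.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bitmask encoding of GPU address spaces.
enum AddressSpace : std::uint8_t {
  Global = 1 << 0,
  Workgroup = 1 << 1,
  Private = 1 << 2
};
using AddressSpaceSet = std::uint8_t;

// Returns true when the barrier fences memory but none of the surrounding
// accesses touch a fenced address space. A barrier with an empty fence
// list is left alone, since it is not there to synchronize memory.
bool isBarrierRedundant(AddressSpaceSet fenced, AddressSpaceSet accessed) {
  if (fenced == 0)
    return false;
  return (fenced & accessed) == 0;
}
```

With this model, a workgroup-fencing barrier surrounded only by global-memory traffic is reported as redundant, matching the loop example above.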
---------
Co-authored-by: Jakub Kuderski <kubakuderski at gmail.com>
[mlir][emitc] Update the `WrapFuncInClassPass` pass (#179184)
Update the `WrapFuncInClassPass` pass so that, by default, the generated
method is named `operator()()` rather than `execute()`. This makes the
pass more generic, instead of catering to specific users expecting an
`execute()` method.
To preserve the original behaviour, add a new pass option to override
the method name: `func-name`. For example:
```bash
mlir-opt file.mlir -wrap-emitc-func-in-class=func-name=execute
```
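To illustrate why `operator()()` is the more generic default, here is a hand-written sketch of the two method-naming styles. The class names, fields and bodies are hypothetical and are not actual output of the pass; only the method names come from the description above.

```cpp
#include <cassert>

// Hypothetical class in the default style: the method is operator()().
struct ModelDefault {
  int input = 0;
  int output = 0;
  void operator()() { output = input + 1; }
};

// Hypothetical class in the style produced with func-name=execute.
struct ModelExecute {
  int input = 0;
  int output = 0;
  void execute() { output = input + 1; }
};

// Generic code can invoke any callable without knowing a method name.
template <typename Callable> void runOnce(Callable &callable) { callable(); }
```

`runOnce` accepts `ModelDefault` directly, whereas `ModelExecute` needs a caller that knows to call `execute()`.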
Additionally, make a couple of small editorial changes:
* Rename `populateFuncPatterns` to `populateWrapFuncInClass` to make it
clear that the corresponding pattern is specific to the
`WrapFuncInClass` pass.
* Remove `// CHECK: module {` to reduce test noise.
[2 lines not shown]
[Flang][mlir][OpenMP] Add affinity clause to omp.task and Flang lowering (#179003)
- Add MLIR OpenMP affinity clause
- Lower flang task affinity to mlir
- Emit TODO for iterator modifier and update negative test
[flang] Add getFIRToLLVMPassOptions helper function (#179293)
Extract `FIRToLLVMPassOptions` initialization into a helper function,
allowing other code to construct pass options from pipeline
configuration without duplication.
---------
Co-authored-by: Delaram Talaashrafi <dtalaashrafi at rome5.pgi.net>
[libc++] Refactor formatter_int.bench.cpp to not use CartesianProduct (#179483)
The CartesianProduct machinery is incredibly expensive and makes it
trivial to add significant amounts of benchmarks which may not actually
serve much of a purpose. This patch doesn't remove any of the actual
benchmarks, but explicitly lists the benchmarks previously generated via
the CartesianProduct machinery. Still, the benchmarks run ~2x faster.
Fixes #178458
[SystemZ][z/OS] Set R5 as not restored. (#179666)
R5 (environment register) should not be restored. This is missing in the
code.
Add it back and also add a test to verify it.