[clang][NFC] Mark CWG1336 as implemented and add a test (#196000)
[CWG1336](https://wg21.link/cwg1336) clarifies that, as long as it isn't
explicit, a constructor is still a converting constructor even if it has
multiple parameters. Clang appears to have implemented this since 3.1:
https://godbolt.org/z/919zdMd3h (I checked a few versions following 3.1
as well, and didn't notice any regressions).
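As a minimal illustration of the rule (not the test added by this commit), the sketch below uses hypothetical types `Point` and `ExplicitPoint` to show that a non-explicit two-parameter constructor takes part in copy-list-initialization, while the explicit one would be rejected:

```cpp
#include <cassert>

// Hypothetical type for illustration: the two-parameter constructor is not
// explicit, so it is a converting constructor.
struct Point {
  int x;
  int y;
  Point(int x, int y) : x(x), y(y) {}
};

// Same shape, but explicit: not a converting constructor.
struct ExplicitPoint {
  int x;
  int y;
  explicit ExplicitPoint(int x, int y) : x(x), y(y) {}
};

int sum(Point p) { return p.x + p.y; }

// The braced list copy-list-initializes the parameter through Point(int, int).
int sumOfBraced() { return sum({1, 2}); }

// Copy-list-initialization also applies in a return statement.
Point makePoint() { return {3, 4}; }

// ExplicitPoint bad = {1, 2};  // ill-formed: copy-list-initialization would
//                              // select an explicit constructor
```

Calling `sumOfBraced()` returns 3, and `makePoint()` yields a `Point` with `x == 3` and `y == 4`.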
[MLIR] Fix use-after-scope when interchanging ploops (#196076)
getInductionVars returns a SmallVector, so going through zip+reverse
gets us a dangling reference. Quite a footgun.
Found by asan.
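The general shape of this footgun can be sketched without MLIR. The names below (`getValues`, `ReversedView`, `reversedOf`) are hypothetical stand-ins and not the LLVM/MLIR helpers; the sketch only assumes a by-value accessor combined with a non-owning range adaptor:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an accessor that returns a container by value.
std::vector<int> getValues() { return {1, 2, 3}; }

// Hypothetical non-owning reverse view: it stores only a reference.
struct ReversedView {
  const std::vector<int> &ref;
  std::vector<int>::const_reverse_iterator begin() const { return ref.rbegin(); }
  std::vector<int>::const_reverse_iterator end() const { return ref.rend(); }
};

ReversedView reversedOf(const std::vector<int> &v) { return ReversedView{v}; }

// Risky pattern (deliberately not executed here):
//   for (int x : reversedOf(getValues())) { ... }
// At least before C++23, the temporary vector is destroyed at the end of the
// range initializer, so the view refers to a dead object inside the loop.

// Safe pattern: keep the container alive in a named local first.
int firstOfReversedSafe() {
  std::vector<int> values = getValues();
  for (int x : reversedOf(values))
    return x;  // first element of the reversed sequence
  return 0;    // only reached if the container is empty
}
```

`firstOfReversedSafe()` returns 3 because the named local outlives the loop.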
[AIX][Clang][Driver] Fix OBJECT_MODE bug on AIX (#193550)
If `--target` is specified it should take precedence over `OBJECT_MODE`.
This is important, for example, for lit tests which want to specify an
explicitly 32-bit or 64-bit triple on AIX, or they may get the wrong bit
mode depending on the environment they run in.
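The intended precedence can be sketched as a small stand-alone function. This is a hypothetical helper, not the driver code: `resolveBitMode`, its parameters, and the handling of only the "32" and "64" values of `OBJECT_MODE` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper. targetBits is 32 or 64 when --target gave an explicit
// triple, and 0 when no target was specified. objectMode is the value of the
// OBJECT_MODE environment variable, or nullptr when it is unset.
int resolveBitMode(int targetBits, const char *objectMode, int defaultBits) {
  // An explicit target always wins over the environment.
  if (targetBits == 32 || targetBits == 64)
    return targetBits;
  // Otherwise fall back to OBJECT_MODE (only "32" and "64" are modelled here).
  if (objectMode) {
    if (std::strcmp(objectMode, "32") == 0)
      return 32;
    if (std::strcmp(objectMode, "64") == 0)
      return 64;
  }
  return defaultBits;
}
```

With this ordering, a lit test that passes an explicit 32-bit triple gets 32-bit mode even when the environment sets `OBJECT_MODE=64`.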
[Support] Move UndefPoisonKind enum to a shared header (#195523)
This patch moves the **`UndefPoisonKind`** enum to a shared header in
`llvm/include/llvm/Support/UndefPoison.h` to resolve the dependency
issues identified in #194818.
Changes:
- Created the new header `llvm/include/llvm/Support/UndefPoison.h`.
- Removed duplicate local definitions from
`llvm/lib/Analysis/ValueTracking.cpp` and
`llvm/lib/CodeGen/GlobalISel/Utils.cpp`.
[Driver][HIP/SPIRV] Fix crash when llvm-link is executed.
A design limitation causes flags to be forwarded to llvm-link when they
should not be. This commit fixes the issue by sanitizing the arguments
forwarded to llvm-link.
This may happen when clang-linker-wrapper eventually calls clang.
Crash reproducer is here: https://gcc.godbolt.org/z/rxvWcvan3.
The fix is based on MrSidims's old PR (#183492).
Co-authored-by: Dmitry Sidorov <18708689+MrSidims at users.noreply.github.com>
Co-authored-by: Manuel Carrasco <manuel.carrasco at amd.com>
[PowerPC] Remove duplicate patterns for atomic_swap (#195936)
The definitions and implementations of atomic_load_* and atomic_swap are
basically the same. Changing how the operations are enumerated makes it
possible to remove the separate patterns for atomic_swap.
[mlir][spirv] Improve verification for SPIR-V TOSA ops (#195624)
Add shape and attribute verification for several SPIR-V TOSA ops:
reductions, FFT2D, RFFT2D, MatMul, Clamp, Concat, and Resize.
Add negative parser/verification tests for the new checks.
Signed-off-by: Davide Grohmann <davide.grohmann at arm.com>
[AArch64] Match vector neg(and X, 1) as CMTST (#194833)
AArch64 already recognizes vector icmp/sext forms such as
sext(icmp ne (and X, C), 0) as CMTST.
However, for bit-zero mask idioms, the middle-end can canonicalize the
expression to sub 0, (and X, 1). This produces a 0/-1 vector mask, but
currently lowers to and+neg instead of CMTST.
Recognize vector neg(and X, splat(1)) / sub 0, (and X, splat(1)) as a
CMTST idiom.
The match is intentionally limited to exact splat(1). For example,
neg(and X, 2) produces 0/-2, not a 0/-1 mask, and is not equivalent to
CMTST.
Fixes #107093.
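The equivalence, and why it is limited to splat(1), can be checked with a scalar model of a single 8-bit lane. This is only an illustrative sketch; `cmtstLane`, `negAndLane`, and `matchesCmtstForAll` are hypothetical helper names, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one CMTST lane: all ones if (a & b) != 0, otherwise zero.
int8_t cmtstLane(int8_t a, int8_t b) {
  return (a & b) != 0 ? static_cast<int8_t>(-1) : static_cast<int8_t>(0);
}

// Scalar model of one lane of sub 0, (and X, C).
int8_t negAndLane(int8_t x, int8_t c) {
  return static_cast<int8_t>(0 - (x & c));
}

// True if neg(and x, c) equals cmtst(x, c) for every 8-bit value of x.
bool matchesCmtstForAll(int8_t c) {
  for (int v = -128; v <= 127; ++v) {
    int8_t x = static_cast<int8_t>(v);
    if (negAndLane(x, c) != cmtstLane(x, c))
      return false;
  }
  return true;
}
```

`matchesCmtstForAll(1)` holds, while `matchesCmtstForAll(2)` does not: for x = 2 the neg/and form gives -2 where CMTST gives -1.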
[HLSL] Allow __builtin_hlsl_resource_getpointer to take no indices (#195151)
In preparation for adding ConstantBuffer<T>, we will need to be able to
access the base pointer for the data that the constant buffer resource
handle is pointing to. This is done by:
1. Making the index operand in __builtin_hlsl_resource_getpointer
optional.
2. Modifying the codegen for __builtin_hlsl_resource_getpointer to emit a
call to resource.getbasepointer when no index is provided.
3. Adding resource.getbasepointer for the dx and spv targets.
Another issue is that the address space for the pointer returned by
__builtin_hlsl_resource_getpointer is not always hlsl_device any more.
Changes are made to get the correct address space based on the resource
class of the handle.
Note that we cannot implement codegen for
[17 lines not shown]
[BOLT][AArch64] Refuse to run Stoke analysis on AArch64 (#195878)
`--stoke` and `--stoke-out` yield an UNIMPLEMENTED crash on AArch64.
Stoke is fundamentally an X86 pass.
- Add a non-X86 guard
- Add the error message to unsupported-passes.test.
[gn] Add +x bit on scripts missing it (#196064)
rg -l '#!' llvm/utils/gn/build/*.py | xargs chmod +x
No effective behavior change. Makes it easier to run these scripts
manually.
[Instrumentor] Add Instrumentor pass (#138958)
This commit adds the basic infrastructure for the Instrumentor pass, which
allows instrumenting code in a simple and customizable way. This commit
adds support for instrumenting load and store instructions. The
Instrumentor can be configured with a JSON file that describes what
should be instrumented, or can be used programmatically from another
pass.
The default JSON config file can be found in:
`llvm/test/Instrumentation/Instrumentor/default_config.json`. More
information about Instrumentor in the
[RFC](https://discourse.llvm.org/t/rfc-introducing-instrumentor-easily-customizable-code-instrumentation/86020).
This is only a squash commit of several contributions to the
Instrumentor. The authors and contributors of this pass are:
- Johannes Doerfert @jdoerfert
- Kevin Sala @kevinsala
[7 lines not shown]
[clang][test] Add `%clang_cc1_cg_arm64_neon` substitution (#188547)
Add a LIT substitution `%clang_cc1_cg_arm64_neon` expanding to:
```shell
clang -cc1 -internal-isystem <path> \
-triple arm64-none-linux-gnu \
-target-feature +neon -o -
```
This invocation is repeated across multiple tests. Introducing a
substitution reduces duplication, shortens RUN lines, and ensures
consistency across `clang -cc1` invocations.
Shorter RUN lines also make test-specific flags easier to spot.
[mlir][gpu] Reject conflicting async operands on gpu.launch_func (#196012)
Reject gpu.launch_func ops that have both async dependencies and an
explicit async object.
[mlir] Use custom mlir::Complex type for non-float complex numbers (#191821)
Instantiating std::complex for types where std::is_floating_point<T> is
false is not allowed, and triggers warnings when building with MSSTL. This
patch fixes those warnings by introducing an mlir::Complex type, which
is a typedef to std::complex when T satisfies is_floating_point, and a
custom complex type otherwise.
The std::complex implementation from libc++ has been used as a guide for
implementing the custom type.
Fixes #65255
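The selection between the two representations can be sketched as follows. This is a simplified illustration and not the code added by the patch: `SimpleComplex`, `ComplexSelector`, and `ComplexOf` are hypothetical names, and the custom type here supports only construction, accessors, and addition.

```cpp
#include <cassert>
#include <complex>
#include <type_traits>

// Hypothetical minimal complex type for non-floating-point element types.
template <typename T>
class SimpleComplex {
public:
  constexpr SimpleComplex(T re = T(), T im = T()) : re_(re), im_(im) {}
  constexpr T real() const { return re_; }
  constexpr T imag() const { return im_; }
  friend constexpr SimpleComplex operator+(SimpleComplex a, SimpleComplex b) {
    return SimpleComplex(a.re_ + b.re_, a.im_ + b.im_);
  }

private:
  T re_;
  T im_;
};

// The primary template picks the custom type; the partial specialization
// picks std::complex, so std::complex<T> is only ever named for
// floating-point T.
template <typename T, bool IsFloat = std::is_floating_point<T>::value>
struct ComplexSelector {
  using type = SimpleComplex<T>;
};
template <typename T>
struct ComplexSelector<T, true> {
  using type = std::complex<T>;
};

template <typename T>
using ComplexOf = typename ComplexSelector<T>::type;

static_assert(std::is_same<ComplexOf<double>, std::complex<double>>::value,
              "floating-point element types map to std::complex");
static_assert(std::is_same<ComplexOf<int>, SimpleComplex<int>>::value,
              "other element types map to the custom type");
```

Using a partial specialization rather than a single conditional alias keeps `std::complex<int>` from appearing anywhere in the program, which is one way to stay clear of the problematic instantiation.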
[OpenMP][offload] Inline target reductions
Significantly reduces register usage and removes register spilling in
`offload/test/offloading/multiple-reductions.cpp`, for example.
Provides a speedup of up to 5-10x for many reductions in such larger
setups.
[OpenMP][offload] Add enhanced cross-team reduction test
Tests different patterns of OpenMP cross-team reductions, for multiple
data types.
If run with `LIBOMPTARGET_INFO=16`, shows current register spilling due
to dispatch jump chains (which grow for every reduction in the same
translation unit) for indirect function calls in the reduction runtime.