[RISCV] Add codegen patterns to support short forward branches with immediates (#185643)
This is a follow-up to #182456. This PR adds support for short forward
branches where the branches come from the Qualcomm uC `Xqcibi` extension.
[libclc][NFC] Rename three .inc files to avoid name conflicts (#186384)
Follow-up of 9b96ebc. There are binary_def.inc and unary_def.inc in the
header directory.
- clc_ep.inc -> clc_ep_decl.inc
- relational/binary_def.inc -> relational/relational_binary_def.inc
- relational/unary_def.inc -> relational/relational_unary_def.inc
[NFC][Support] Don't test UB in Caching.WriteAfterCommit (#186532)
The test expects a crash after commit, which is essentially a null
dereference. Just check that it's nullptr directly.
Fixes asan/ubsan buildbot.
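As an illustration of the idea only, here is a minimal sketch of checking a released pointer instead of writing through it. `FakeCachedStream`, `commit()` and `isClosedAfterCommit()` are hypothetical stand-ins invented for this sketch; they are not the types or helpers used by the actual test.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a stream wrapper that gives up its buffer on commit.
struct FakeCachedStream {
  std::unique_ptr<std::string> OS = std::make_unique<std::string>();

  // After commit the underlying buffer is released, leaving OS null.
  void commit() { OS.reset(); }
};

// Reports whether the stream has been closed, without dereferencing OS.
bool isClosedAfterCommit(const FakeCachedStream &S) { return S.OS == nullptr; }

// Usage: assert on the pointer value rather than expecting a crash from *OS.
void exampleNullCheck() {
  FakeCachedStream S;
  assert(!isClosedAfterCommit(S));
  S.commit();
  assert(isClosedAfterCommit(S));
}
```

Comparing against `nullptr` is well defined, so sanitizer builds have nothing to flag, whereas writing through the null pointer is undefined behavior.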
[clang][DirectX] Specify element-aligned vectors in TargetInfo (#185954)
Add a bit to TargetInfo to specify that vectors are element-aligned
rather than naturally aligned. This is needed to match DirectX's Data
Layout in LLVM.
Note that this removes the `Opts.HLSL` early exit from
`checkDataLayoutConsistency` so that we actually get these checks when
compiling HLSL. This check looks like it was put there because of
similarity between OpenCL and HLSL, but it isn't actually necessary.
Resolves #123968
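To make the difference between the two alignment policies concrete, here is a small standalone sketch. It assumes that "naturally aligned" means the vector's byte size rounded up to a power of two and that element sizes are themselves powers of two; the function names are hypothetical and are not part of TargetInfo.

```cpp
#include <cassert>
#include <cstdint>

// Round V up to the next power of two (returns 1 for V <= 1).
constexpr std::uint64_t powerOf2Ceil(std::uint64_t V) {
  std::uint64_t P = 1;
  while (P < V)
    P <<= 1;
  return P;
}

// Natural alignment (assumed): total vector size rounded up to a power of two.
constexpr std::uint64_t naturalVectorAlign(std::uint64_t ElemBytes,
                                           std::uint64_t NumElems) {
  return powerOf2Ceil(ElemBytes * NumElems);
}

// Element alignment: the vector is only as aligned as a single element.
constexpr std::uint64_t elementVectorAlign(std::uint64_t ElemBytes,
                                           std::uint64_t /*NumElems*/) {
  return ElemBytes;
}

// Usage: a vector of four 4-byte floats under each policy.
void exampleVectorAlign() {
  assert(naturalVectorAlign(4, 4) == 16);
  assert(elementVectorAlign(4, 4) == 4);
}
```

Under these assumptions a vector of three 4-byte elements would also be 16-byte aligned naturally but only 4-byte aligned under the element-aligned policy.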
[libc] Reference the proper namespaced variables in the GPU header
Summary:
These linked to the extern "C" versions which did not exist in test
builds.
[ScalarEvolution] Limit recursion in getRangeRef for PHI nodes. (#152823)
Restrict PHI nodes that getRangeRef is allowed to recursively examine so
we don't need a "visited" set. And fix createSCEVIter so it creates all
the relevant SCEV nodes before getRangeRef tries to examine them.
The tests that are affected have induction variables that aren't
AddRecs. (Other cases are theoretically affected, but don't seem to show
up in our tests.)
Fix callee type generation (#186272)
The callee_type metadata is expected to be a list of generalized type
metadata by the IR verifier. But for indirect calls with internal
linkage the type metadata is just an integer. Avoid including them in
callee_type metadata.
This will reduce the precision of the generated call graph, as the edges to internal linkage functions whose addresses were taken will no longer be present. We need to handle this in the future.
[lldb] Enable SanitizersAllocationTraces=tagged in darwin-mte-launcher (#186326)
Collect allocation traces for tagged memory when using the
`darwin-mte-launcher` to help debug MTE crashes.
[lldb] Add support for the darwin-mte-launcher to lldb-dotest (#186319)
Add support for the `darwin-mte-launcher` to `lldb-dotest` when LLDB is
configured to run the tests under MTE.
Remove unicode character from AttrDocs.td (#186521)
PR #185225 introduced a single unicode character, which is the only
unicode character in this file. Change this to an ASCII/Latin1 letter.
[LoopIdiomVectorize] Preserve address space in FindFirstByte (#185226)
Fixes #185188
Use SearchStart->getType() instead of Builder.getPtrTy() so that
pointer-typed PHI nodes preserve the address space of the original
pointers.
Assisted-by: Claude (Anthropic)
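The failure mode can be modelled without LLVM. The toy types below are hypothetical and only mimic the relevant behaviour: a "default pointer type" helper that always yields address space 0 (the default for `Builder.getPtrTy()` when called with no argument), versus reusing the type of the incoming pointer.

```cpp
#include <cassert>

// Toy model of an opaque pointer type: only the address space matters here.
struct ToyPtrType {
  unsigned AddrSpace;
};

// Toy value carrying a pointer type, standing in for SearchStart.
struct ToyValue {
  ToyPtrType Ty;
};

// Mimics asking for a pointer type with no argument: always address space 0.
ToyPtrType defaultPtrTy() { return ToyPtrType{0}; }

// Old behaviour in this model: the PHI type ignores the incoming pointer.
ToyPtrType phiTypeFromDefault(const ToyValue &) { return defaultPtrTy(); }

// Fixed behaviour in this model: the PHI type is taken from the incoming pointer.
ToyPtrType phiTypeFromValue(const ToyValue &SearchStart) { return SearchStart.Ty; }

// Usage: a pointer in address space 3 keeps its address space only in the fixed variant.
void exampleAddrSpace() {
  ToyValue SearchStart{ToyPtrType{3}};
  assert(phiTypeFromDefault(SearchStart).AddrSpace == 0);
  assert(phiTypeFromValue(SearchStart).AddrSpace == 3);
}
```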
[flang] Fix SELECT TYPE in OpenACC construct (#186511)
A routine in Semantics/resolve-directives.cpp was overwriting a symbol
table pointer in a parse tree Name, thereby removing the AssocEntity
with the correct type for a TYPE IS or CLASS IS clause that had been
placed there. I don't really understand why resolve-directives has to
overwrite symbol table pointers in the first place, but it definitely
shouldn't be replacing these.
[mlir] Replace MLIR_ENABLE_ROCM_CONVERSIONS with LLVM_HAS_AMDGPU_TARGET (#182652)
`LLVM_HAS_NVPTX_TARGET` is already defined in `llvm/Config/Targets.h`
and used to gate NVPTX-related code in MLIR. The same macro exists for
AMDGPU as `LLVM_HAS_AMDGPU_TARGET`, but MLIR defined its own
`MLIR_ENABLE_ROCM_CONVERSIONS` variable for this purpose. This PR
removes `MLIR_ENABLE_ROCM_CONVERSIONS` and replaces it with
`LLVM_HAS_AMDGPU_TARGET`, bringing parity with the NVPTX target.
---------
Co-authored-by: William Moses <gh at wsmoses.com>
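A rough sketch of what gating on these macros can look like follows. The fallback `#define`s exist only so the snippet compiles outside an LLVM build, and `enabledGpuBackends()` is a hypothetical helper, not an MLIR function.

```cpp
#include <cassert>
#include <string>
#include <vector>

// In an LLVM build these come from llvm/Config/Targets.h. The fallbacks
// below are an assumption so that this sketch compiles standalone.
#ifndef LLVM_HAS_NVPTX_TARGET
#define LLVM_HAS_NVPTX_TARGET 0
#endif
#ifndef LLVM_HAS_AMDGPU_TARGET
#define LLVM_HAS_AMDGPU_TARGET 0
#endif

// Hypothetical helper: lists the GPU backends this build was configured with.
std::vector<std::string> enabledGpuBackends() {
  std::vector<std::string> Out;
#if LLVM_HAS_NVPTX_TARGET
  Out.push_back("NVPTX");
#endif
#if LLVM_HAS_AMDGPU_TARGET
  Out.push_back("AMDGPU");
#endif
  return Out;
}

// Usage: the result has one entry per enabled target macro.
void exampleGating() {
  const std::size_t Expected =
      (LLVM_HAS_NVPTX_TARGET ? 1u : 0u) + (LLVM_HAS_AMDGPU_TARGET ? 1u : 0u);
  assert(enabledGpuBackends().size() == Expected);
}
```

Using the same style of macro for both targets means the two code paths are gated symmetrically, with no MLIR-specific configuration variable to keep in sync.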
[LoopUnroll] Remove `computeUnrollCount()`'s return value (#184529)
`computeUnrollCount()`'s return value is used to communicate whether
unrolling was explicitly requested. However, each of
`computeUnrollCount()`'s two callers can compute this directly:
- `LoopUnrollAndJamPass` already checks for loop unrolling metadata
[before calling
`computeUnrollCount()`](https://github.com/llvm/llvm-project/blob/43dbcdea98f5bb04ae967bdd81ece2d2144f4661/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp#L308).
The return value only added handling for `-unroll-count`, a testing flag
with no UnrollAndJam test coverage.
- `tryToUnrollLoop()` can use `PragmaInfo(L).ExplicitUnroll` directly at
the `setLoopAlreadyUnrolled()` call site.
- In all but one case where `computeUnrollCount()` explicitly `return`s
`false` instead of `ExplicitUnroll`, `UP.Count = 0` is set. This causes
`tryToUnrollLoop()` to early-exit before reaching
`setLoopAlreadyUnrolled`.
- The remaining case that `return`s false, but does not set `UP.Count =
[7 lines not shown]
[HLSL][DirectX] Add `transpose` HLSL intrinsic and DXIL lowering of `llvm.matrix.transpose` (#186263)
Fixes #184922
- [x] Implement `transpose` clang builtin in `Builtins.td`
- [x] Link `transpose` clang builtin with `hlsl_alias_intrinsics.h`
- [x] Add sema checks for `transpose` to `CheckHLSLBuiltinFunctionCall`
in `SemaHLSL.cpp`
- [x] Add codegen for `transpose` to `EmitHLSLBuiltinExpr` in
`CGHLSLBuiltins.cpp`
- `transpose` lowers to the `llvm.matrix.transpose` intrinsic
- [x] Add codegen tests to
`clang/test/CodeGenHLSL/builtins/transpose.hlsl`
- [x] Add sema tests to
`clang/test/SemaHLSL/BuiltIns/transpose-errors.hlsl`
- [x] Implement lowering of the `llvm.matrix.transpose` intrinsic in the
DXIL backend in `DXILIntrinsicExpansion.cpp`
- The intrinsic lowers to a shufflevector like in DXC
https://hlsl.godbolt.org/z/Gj959q6sq
[3 lines not shown]
[libc] Support ls in printf (#178841)
Add support for %ls in printf by calling the internal string converter,
and add a relevant end-to-end sprintf test. Additionally, modify the
printf parser to recognize the length modifier. This also disables wide
string support on Windows and other unsupported platforms.
Co-authored-by: shubhe25p <shubhp at mbm3a24.local>
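For readers unfamiliar with the conversion, the sketch below shows what `%ls` does, using the host C library's `snprintf` rather than LLVM libc. `formatWide` is a hypothetical helper, and the example assumes a host libc that can convert ASCII wide characters in the default locale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats a wide string into a narrow buffer via the %ls conversion.
// Returns an empty string if the conversion fails.
std::string formatWide(const wchar_t *WS) {
  char Buf[64];
  int N = std::snprintf(Buf, sizeof(Buf), "[%ls]", WS);
  if (N < 0)
    return std::string(); // encoding error reported by snprintf
  std::size_t Len = static_cast<std::size_t>(N);
  if (Len >= sizeof(Buf))
    Len = sizeof(Buf) - 1; // output was truncated to fit the buffer
  return std::string(Buf, Len);
}

// Usage: the 'l' length modifier makes %s consume a wchar_t string.
void exampleWidePrintf() { assert(formatWide(L"abc") == "[abc]"); }
```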
[AArch64][llvm] Rewrite the TLBI multiclass to be much clearer (NFC)
The `tlbi` multiclass is really doing four jobs at once: base TLBI,
synthesized nXS, optional TLBIP, and synthesized TLBIP nXS. Also,
`needsreg` and `optreg` are really just a 3-state operand policy in
disguise. Likewise, the PLBI multiclass has this same issue.
Change `needsreg` and `optreg` into a combined fake enum, so it's
clearer whether the instruction takes no register operand, a required
register operand or an optional register operand.
This improves on my original change 66e8270e8.
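The "two booleans versus one 3-state policy" point can be sketched in C++ as an analogy; the real change is in TableGen. The enum and function names are hypothetical, and the mapping from `needsreg`/`optreg` to the three states is an assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical 3-state operand policy replacing two independent booleans.
enum class RegOperand { None, Required, Optional };

// Assumed mapping from the old flags: two booleans give four combinations,
// but only three are meaningful. Here an "optional" flag without "needs
// register" is treated as having no register operand.
RegOperand fromFlags(bool NeedsReg, bool OptReg) {
  if (!NeedsReg)
    return RegOperand::None;
  return OptReg ? RegOperand::Optional : RegOperand::Required;
}

// Usage: each meaningful flag combination maps to exactly one policy.
void exampleOperandPolicy() {
  assert(fromFlags(false, false) == RegOperand::None);
  assert(fromFlags(true, false) == RegOperand::Required);
  assert(fromFlags(true, true) == RegOperand::Optional);
}
```

With a single enum the invalid fourth combination cannot be expressed at all, which is the readability gain the commit describes.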
[coro] [async] There needs to be a one-to-one correspondence between the async resume function value and the suspend intrinsic (#186436)
We need to mark both the async.resume intrinsic function and the
suspend.async function as not duplicatable. The async.resume function
models the continuation after a suspend. It makes no sense to lack a
one-to-one correspondence between the two: if the suspend intrinsic is
cloned, the matching async.resume intrinsic call must be cloned as well.
rdar://172130181
https://github.com/swiftlang/swift/issues/87719
[CodeGen] Drop uses of BranchInst (#186391)
Largely a straightforward replacement with occasional simplifications.
For AMDGPU, I assumed that unconditional branches are always uniform and
therefore "simplified"/changed AMDGPUAnnotateUniformValues to only
annotate conditional branches.
Target-specific FastISel only selects conditional branches;
unconditional branches are already handled by the non-target-specific
code.