[profcheck] Disable verification of selects on vector conditions. (#167973)
We don't currently support profile metadata on selects where the condition is a vector.
Issue #147390
[Support][Jobserver][Tests] Simplify default executor init and make (#165264)
jobserver tests deterministic
- Replace call-once wrapper in Parallel.cpp with a function-local static
default executor.
- Rework Jobserver tests for parallelFor/parallelSort to run in a fresh
subprocess. The parent test spawns the current test binary with a gtest
filter selecting a child test, ensuring the child process initializes
the default executor after setting parallel::strategy =
jobserver_concurrency() and after setting up a FIFO-backed jobserver
proxy. This makes the tests reliable and independent from prior executor
initialization in the combined SupportTests binary.
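The initialization-order dependency described above can be illustrated with a
minimal sketch. The names `Strategy`, `Executor`, and `getDefaultExecutor` are
hypothetical stand-ins, not the actual code in Parallel.cpp. A function-local
static is constructed once, on first call, so any setting it reads must be in
place before that first call:

```cpp
#include <string>

// Hypothetical global setting, standing in for parallel::strategy.
std::string Strategy = "default";

// Hypothetical executor that captures the strategy when it is constructed.
struct Executor {
  std::string StrategyAtInit;
  Executor() : StrategyAtInit(Strategy) {}
};

// Function-local static: constructed exactly once, on the first call.
// Since C++11 this initialization is also thread-safe.
Executor &getDefaultExecutor() {
  static Executor Exec;
  return Exec;
}
```

In this sketch, changing `Strategy` after the first call to
`getDefaultExecutor()` has no effect on the existing executor, which is
presumably why the tests need a fresh process rather than a shared test binary.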
AMDGPU: Constrain readfirstlane operand when writing to m0
Fixes another verifier error after introducing AV registers.
Also fixes not clearing the subregister index if there was
one.
[Hexagon] Implement isUsedByReturnOnly (#167637)
Prior to this patch, libcalls inserted by the `SelectionDAG` legalizer
could never be tailcalled. The eligibility of libcalls for tail calling
is partly determined by checking
`TargetLowering::isInTailCallPosition` and comparing the return type of
the libcall and the caller. `isInTailCallPosition` in turn calls
`TargetLowering::isUsedByReturnOnly` (which always returns false if not
implemented by the target).
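As a rough illustration of the dependency chain above, here is a simplified
model. All class names are hypothetical and the signatures do not match LLVM's
real `TargetLowering` API; other eligibility conditions are omitted. It only
shows how a default hook that returns false blocks tail calls until a target
overrides it:

```cpp
// Simplified model; not LLVM's API.
struct ToyTarget {
  virtual ~ToyTarget() = default;

  // Default hook: conservatively answers "no", as for a target that does
  // not implement it.
  virtual bool isUsedByReturnOnly() const { return false; }

  // The tail-position check consults the target hook.
  bool isInTailCallPosition() const { return isUsedByReturnOnly(); }

  // Part of the eligibility check: tail position and matching return types.
  bool canTailCallLibcall(bool ReturnTypesMatch) const {
    return isInTailCallPosition() && ReturnTypesMatch;
  }
};

// A target that implements the hook. A real implementation would inspect
// the uses of the node; this toy version simply answers "yes".
struct ToyTargetWithHook : ToyTarget {
  bool isUsedByReturnOnly() const override { return true; }
};
```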
AMDGPU: Constrain readfirstlane operand to vgpr_32
When inserting a readfirstlane, ensure the operand constraint
is respected. If the source register was an av_* class, the
verifier would fail.
Fixes regression after c7019c7eda6629ae99eb95aa1ee9e1f8249a4f49
RuntimeLibcalls: Move VectorLibrary handling into TargetOptions
This fixes the -fveclib flag getting lost on its way to the backend.
Previously this was its own cl::opt with a random boolean. Move the
flag handling into CommandFlags with other backend ABI-ish options,
and have clang directly set it, rather than forcing it to go through
command line parsing.
Prior to de68181d7f, codegen used TargetLibraryInfo to find the vector
function. Clang has special handling for TargetLibraryInfo, where it would
directly construct one with the vector library in the pass pipeline.
RuntimeLibcallsInfo currently is not used as an analysis in codegen, and
needs to know the vector library when constructed.
RuntimeLibraryAnalysis could follow the same trick that TargetLibraryInfo is
using in the future, but a lot more boilerplate changes are needed to thread
that analysis through codegen. Ideally this would come from an IR module flag,
and nothing would be in TargetOptions. For now, it's better for all of these
sorts of controls to be consistent.
DeclareRuntimeLibcalls: Use RuntimeLibraryAnalysis
Also add boilerplate to have a live instance when running
opt configured from CommandFlags / TargetOptions.
[CIR] Upstream l-value emission for ExprWithCleanups (#167938)
This adds the necessary handler for emitting an l-value for an
ExprWithCleanups expression.
[mlir][ROCDL] Refactor wmma intrinsics to use attributes not operands where possible (#167041)
The current implementation of the WMMA intrinsic ops as they are defined
in the ROCDL tablegen is incorrect. They represent as operands what
should be attributes such as `clamp`, `opsel`, `signA/signB`. This
change performs a refactoring to bring it in line with what we expect.
---------
Signed-off-by: Muzammiluddin Syed <muzasyed at amd.com>
[dfsan] Fix Endianness issue (#162881)
Fix an endianness issue with getting the 4 shadow bytes corresponding to the
first origin pointer.
---------
Co-authored-by: anoopkg6 <anoopkg6 at github.com>
[AArch64] Optimize extending loads of small vectors
Reduces the total number of loads and the number of moves between SIMD
registers and general-purpose registers.
[sanitizer-common] [Darwin] Fix overlapping dyld segment addresses (attempt 2) (#167800)
This re-lands #166005, which was reverted due to the issue described in
#167797.
There are 4 small changes:
- Fix LoadedModule leak by calling Clear() on the modules list
- Fix internal_strncpy calls that are not null-terminated
- Improve test to accept the dylib being loaded from a different path
than compiled `{{.*}}[[DYLIB]]`
- strcmp => internal_strncmp
This should not be merged until after #167797.
rdar://163149325
[offload-arch] Fix amdgpu-arch crash on Windows with ROCm 7.1 (#167695)
The tool was crashing on Windows with ROCm 7.1 due to two issues: misuse
of hipDeviceGet which should not be used (it worked before by accident
but was undefined behavior), and ABI incompatibility from
hipDeviceProp_t struct layout changes between HIP versions where the
gcnArchName offset changed from 396 to 1160 bytes.
The fix removes hipDeviceGet and queries properties directly by device
index. It defines separate struct layouts for R0600 (HIP 6.x+) and R0000
(legacy) to handle the different memory layouts correctly.
An automatic API fallback mechanism tries R0600, then R0000, then the
unversioned API until one succeeds, ensuring compatibility across
different HIP runtime versions. A new --hip-api-version option allows
manually selecting the API version when needed.
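The fallback order described above could be sketched as follows. This is an
illustrative model only: `ApiCandidate` and `queryArchName` are hypothetical
names, and the real tool calls HIP runtime entry points rather than the
callbacks used here.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical wrapper around one versioned property query. Query returns
// true and fills ArchName on success.
struct ApiCandidate {
  std::string Name;
  std::function<bool(int DeviceIndex, std::string &ArchName)> Query;
};

// Try each candidate in order and stop at the first one that succeeds.
// On success, UsedApi records which candidate produced the answer.
bool queryArchName(const std::vector<ApiCandidate> &Candidates,
                   int DeviceIndex, std::string &ArchName,
                   std::string &UsedApi) {
  for (const ApiCandidate &C : Candidates) {
    if (C.Query(DeviceIndex, ArchName)) {
      UsedApi = C.Name;
      return true;
    }
  }
  return false;
}
```

With candidates listed as R0600, R0000, then the unversioned API, the first
query that succeeds determines which struct layout is used for that device.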
Additional improvements include enhanced error handling with
hipGetErrorString, verbose logging throughout the detection process, and
[3 lines not shown]
[clang-format] Recognize Verilog DPI export and import (#165595)
The directives should not change the indentation level. Previously the
program erroneously added an indentation level when it saw the
`function` keyword.
[Polly] Introduce PhaseManager and remove LPM support (#125442) (#167560)
Reapply of a22d1c2225543aa9ae7882f6b1a97ee7b2c95574. Using this PR for
pre-merge CI.
Instead of relying on any pass manager to schedule Polly's passes, add
Polly's own pipeline manager which is seen as a monolithic pass in
LLVM's pass manager. Polly's former passes are now phases of the new
PhaseManager component.
Relying on LLVM's pass manager (the legacy as well as the New Pass
Manager) to manage Polly's phases was never a good fit. The
PhaseManager resolves the following problems:
* Polly passes were modifying analysis results, in particular RegionInfo
and ScopInfo. This means that there was not just one unique and
"definite" analysis result: the actual result depended on which analyses
ran prior, and the pass manager was not allowed to throw away cached
analyses, or prior SCoP optimizations would have been forgotten. The LLVM
[27 lines not shown]
[clang-format] Align trailing comments for function parameters (#164458)
before
```C++
void foo(int name, // name
         float name, // name
         int name) // name
{}
```
after
```C++
void foo(int name,   // name
         float name, // name
         int name)   // name
{}
```
[5 lines not shown]
[AMDGPU] Prioritize allocation of low 256 VGPR classes
If we have 1024 VGPRs available, we need to prioritize allocation of the
register classes whose operands can only use the low 256 registers.
Notably, these are the scale operands of V_WMMA_SCALE instructions.
Otherwise large tuples will be allocated first and take all the low
registers, so we would have to spill to make room for these
scale registers.
Allocation priority by itself does not eliminate spilling completely
in large kernels, although it helps to some degree. Increasing the spill
weight of a restricted class on top of it also helps.
[CIR][NFC] Add missing code markers for Dtor_VectorDeleting (#167969)
This adds some minimal code to mark locations where handling is needed
for Dtor_VectorDeleting type dtors, which were added in
https://github.com/llvm/llvm-project/pull/165598
This is not a comprehensive mark-up of the missing code, as some code
will be needed in places where the surrounding function has larger
missing pieces in CIR currently.
This fixes a warning for an uncovered switch case that was causing CI
builds to fail.