Clang: Deprecate float support from __builtin_elementwise_max (#180885)
The following builtins now cover the floating-point cases:
__builtin_elementwise_maxnum
__builtin_elementwise_maximum
__builtin_elementwise_maximumnum
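For readers unfamiliar with the three replacements, here is a portable scalar sketch of how they are generally understood to differ in NaN and signed-zero handling. The `model_*` helpers are hypothetical and are not the Clang builtins; the behavior shown is an assumption based on the usual maxnum / maximum / maximumNumber definitions, so check the Clang language extensions documentation before relying on it.

```cpp
#include <cmath>

// Hypothetical scalar helpers; these are NOT the Clang builtins, only a
// portable model of the per-element NaN and signed-zero behavior.

// maxnum-like: if exactly one operand is NaN, return the other one.
// The relative order of -0.0 and +0.0 is left unspecified here.
inline float model_maxnum(float a, float b) { return std::fmax(a, b); }

// maximum-like: NaN propagates, and -0.0 is treated as less than +0.0.
inline float model_maximum(float a, float b) {
  if (std::isnan(a)) return a;
  if (std::isnan(b)) return b;
  if (a == b) return std::signbit(a) ? b : a;  // prefers +0.0 over -0.0
  return a > b ? a : b;
}

// maximumnum-like: a NaN operand is ignored when the other is a number,
// and -0.0 is treated as less than +0.0.
inline float model_maximumnum(float a, float b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  if (a == b) return std::signbit(a) ? b : a;  // prefers +0.0 over -0.0
  return a > b ? a : b;
}
```

The practical difference shows up only for NaN and signed-zero inputs; for ordinary finite values all three models agree.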
[libc][math][c23] implement C23 `acospif` math function (#183661)
Implement the C23 `acospif` math function (single-precision `acospi`)
using the header-only approach followed since #147386.
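As a reminder of what the function computes, `acospi(x)` is `acos(x) / pi`, so results lie in [0, 1]. The helper below is a hypothetical reference sketch for sanity-checking values; it is not the LLVM libc implementation and makes no claim about its accuracy guarantees.

```cpp
#include <cmath>

// Hypothetical reference helper, not the LLVM libc implementation.
// acospi(x) is defined as acos(x) / pi.
inline float acospif_reference(float x) {
  const double pi = 3.14159265358979323846;
  // Compute in double, then narrow the result to float.
  return static_cast<float>(std::acos(static_cast<double>(x)) / pi);
}
```

Inputs outside [-1, 1] yield NaN here because `std::acos` does.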
[libc][math] Refactor floor family to header-only (#182194)
Refactors the floor math family to be header-only.
Closes https://github.com/llvm/llvm-project/issues/182193
Target Functions:
- floor
- floorbf16
- floorf
- floorf128
- floorf16
- floorl
[AMDGPU] Remove extra pipes from load-saddr-offset-imm.ll (#183874)
This test uses opt to run instcombine and then pipes the result into llc,
whose output is piped into FileCheck. Before this patch, the test also
piped the source file into llc, which caused issues with a downstream
test executor that executes the lines in bash. These extra pipes don't
make sense anyway, so remove them.
[clang] NFC: remove unused / untested workaround in pack deduction
This snippet was part of what was introduced in 130cc445e46836b28defdce03b1adfdb16ddcf41
However, none of the existing tests require it, including the tests added in
that commit.
One of those tests had a FIXME which was fixed when we switched
frelaxed-template-template-args on by default as well.
[CUDA] Allow `extern __shared__` on non-array types
NVCC allows `extern __shared__` on any type, not just incomplete arrays.
This is commonly used in CUDA libraries like NCCL to overlay a struct on
dynamically-allocated shared memory:
extern __shared__ ncclShmemData ncclShmem;
Previously, Clang rejected this with a hard error and did not add
`CUDASharedAttr` to the VarDecl. This caused a cascade: `IdentifyTarget()`
classified the variable as host-side, and any device code referencing it
got a spurious "reference to __host__ variable in __device__ function"
error.
Downgrade the error to a default-ignored warning (`-Wcuda-extern-shared`)
and always add `CUDASharedAttr` so the variable is correctly classified as
device-side. The old `err_cuda_extern_shared` is preserved for potential
future use.
[AMDGPU][SIInsertWaitcnts] Move VCCZ workaround code out of the way (#182619)
This is a cleanup patch that moves the VCCZ specific workaround code
from `SIInsertWaitcnts::insertWaitcntInBlock()` to a separate class and
refactors it a bit to make it easier to read.
The end result is a simpler `insertWaitcntInBlock()`.
Should be NFC.
[CIR][NFC] Move some builtin tests to the CodeGenBuiltins folder (#183607)
This moves a few tests that were created in the wrong location. Also
changes the names of some test files to maintain consistency.
Fix profile metadata propagation in InstCombine select folding
Propagate profile metadata when folding select instructions with logical AND/OR conditions and when canonicalizing SPF to intrinsics. This fixes profile verification failures in Transforms/InstCombine/select-and-or.ll.
[SLP]Fix operand reordering when estimating profitability of operands
The operands need to be swapped for a single instruction, not for the same
lane of the first and second instructions in the list.
[CMake] Propagate dependencies to OBJECT libraries in `add_llvm_library` (#183541)
Previously, transitively inherited calls to
`target_include_directories(foo SYSTEM ...)` were being squashed into a
flat list of includes, effectively stripping off `-isystem` and
unintentionally forwarding warnings from such dependencies.
To correctly propagate `SYSTEM` dependencies, use
`target_link_libraries` to forward the parent target's link dependencies
to the OBJECT library (similar to the `_static` flow below). Unlike a
flat `target_include_directories`, this lets CMake resolve transitive
SYSTEM include directories through the proper dependency chain.
Note that `target_link_libraries` on an OBJECT library propagates all
usage requirements, not just includes. This also brings in transitive
`INTERFACE_COMPILE_DEFINITIONS`, `INTERFACE_COMPILE_OPTIONS`, and
`INTERFACE_COMPILE_FEATURES`. This is arguably more correct, as the
OBJECT library compiles the same sources and should see the same flags.
[13 lines not shown]
[Hexagon] Define __HVX_IEEE_FP__ when -mhvx-ieee-fp is enabled (#183829)
Add a __HVX_IEEE_FP__ define when the compiler is invoked with the
-mhvx-ieee-fp flag.
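A minimal sketch of how user code might test for the macro. Only `__HVX_IEEE_FP__` comes from this commit; the helper name `hvx_ieee_fp_enabled` is hypothetical.

```cpp
// Sketch: feature-test the macro described above. The helper name is
// hypothetical; only __HVX_IEEE_FP__ comes from the commit.
constexpr bool hvx_ieee_fp_enabled() {
#ifdef __HVX_IEEE_FP__
  return true;   // compiled with -mhvx-ieee-fp
#else
  return false;  // flag absent, or a compiler that does not define the macro
#endif
}
```

Code that needs an IEEE-conformant HVX path can branch on this at compile time, for example with `if constexpr (hvx_ieee_fp_enabled())`.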
[mlir][GPU] Add ValueBoundsOpInterface to gpu.subgroup_broadcast (#183848)
This commit adds a ValueBoundsOpInterface to gpu.subgroup_broadcast,
matching its integer range interface implementation, so that affine
analysis can peek through subgroup broadcast ops.
[CIR] Fix dominance problems with values defined in cleanup scopes (#183810)
We currently encounter dominance verification errors when a value is
defined inside a cleanup scope but used outside the scope. This occurs
when forceCleanup() is used to exit a cleanup scope while a variable is
holding a value that was created in the scope body. Classic codegen
solved this problem by passing a list of values to spill and reload to
forceCleanup(). This change implements that same solution for CIR.
I have also aligned the ScalarExprEmitter::VisitExprWithCleanups
implementation with that of classic codegen, eliminating an extra
lexical scope. This causes temporary allocas to be created at the next
higher existing lexical scope, but I think that's OK since they would be
hoisted there anyway by a later pass.
[cmake] Disable -Wdangling-pointer on GCC 12+ (#183593)
GCC 12 started warning on the RAII DAGUpdateListener pattern in
SelectionDAG.h (storing `this` in the constructor). It's a false
positive -- suppress it the same way we handle -Wno-dangling-reference
(GCC 13+) and -Wno-stringop-overread (GCC 11+).
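For context, the pattern in question looks roughly like the sketch below: a listener stores `this` in a longer-lived object in its constructor and removes it again in its destructor. `Graph` and `UpdateListener` are hypothetical stand-ins, not the actual SelectionDAG classes.

```cpp
// Minimal model of an RAII listener that registers itself on construction.
// Graph and UpdateListener are hypothetical stand-ins, not LLVM classes.
struct UpdateListener;

struct Graph {
  UpdateListener *listeners = nullptr;  // head of an intrusive stack
};

struct UpdateListener {
  Graph &graph;
  UpdateListener *const next;  // previously registered listener, if any

  explicit UpdateListener(Graph &g) : graph(g), next(g.listeners) {
    // `this` is stored in a longer-lived object here, which is the kind of
    // store the commit message says GCC 12 warns about.
    graph.listeners = this;
  }

  ~UpdateListener() {
    // Unregistering in the destructor keeps the stored pointer from
    // outliving the listener, provided listeners are destroyed in LIFO order.
    graph.listeners = next;
  }

  UpdateListener(const UpdateListener &) = delete;
  UpdateListener &operator=(const UpdateListener &) = delete;
};
```

Because the destructor always restores the previous head, the stored pointer is never left dangling when listeners are scoped locals, which is why the warning is a false positive for this pattern.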
[lldb] Change the way the shlib directory helper is set (#183637)
This PR changes the way we set the shlib directory helper. Instead of
setting it while initializing the Host plugin, we register it when
initializing the Python plugin. The motivation is that the current
approach is incompatible with the dynamically linked script
interpreters, as they will not have been loaded at the time the Host
plugin is initialized.
The downside of the new approach is that we set the helper after having
initialized the Host plugin, which theoretically introduces a small
window where someone could query the helper before it has been set.
Fortunately the window is pretty small and limited to when we're
initializing plugins, but it's less "pure" than what we had previously.
That said, I think it balances out with removing the plugin include.
[NFC] Fix use-after-free: track TargetLibraryAnalysis in BasicAAResult invalidation (#183852)
`BasicAAResult` holds a reference to `TargetLibraryInfo` but its
`invalidate()` function did not check `TargetLibraryAnalysis`. When the
pass manager destroyed and re-created `TLI` (e.g. during `CGSCC`
invalidation or `FAM.clear()`), `BasicAAResult` survived with a dangling
`TLI` reference.
This was exposed by #157495, which added `aliasErrno()`, the first code
path that dereferences `TLI` from `BasicAAResult` during the `CGSCC`
pipeline, causing an access violation (AV) when compiling Rust's core
library on Arm64 Windows.
This change adds `TargetLibraryAnalysis` to the invalidation check so
`BasicAAResult` is properly invalidated when its `TLI` reference becomes
stale.
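The underlying rule can be illustrated with a toy model: a cached result must report itself invalid whenever any analysis it holds a reference to is not preserved. Everything below is hypothetical (names, string keys, the `CachedResult` type) and is not the LLVM pass manager API.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model of dependency-aware invalidation; all names are hypothetical
// and this is not the LLVM pass manager API.
struct CachedResult {
  // Analyses whose results this cached result holds references to.
  std::vector<std::string> dependencies;

  // Returns true if this result must be discarded, i.e. if any analysis it
  // references is missing from the preserved set.
  bool invalidate(const std::set<std::string> &preserved) const {
    for (const std::string &dep : dependencies)
      if (preserved.count(dep) == 0)
        return true;
    return false;
  }
};
```

In this model, a result that forgets to list one of its dependencies survives an invalidation that destroys that dependency and is left holding a stale reference; adding the missing entry makes `invalidate` return true in that case.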