[SLP] Update test against const-folding (#202532)
223ef1f3 ([IRBuilder] ConstFold unary intrinsics, #200496) made a lot of
test updates to SLPVectorizer. The tests were written a long time ago,
and it is unclear what their intent was, but at least update the one
test to replace constants with arguments, where the intent is clear.
[AMDGPU] Clamp load_monitor scope to minimum SCOPE_SE (#198245)
The load_monitor instructions monitor L2 cache lines and therefore
require at least SCOPE_SE to ensure the L2 cache is hit. The current
memory model requires the user to ensure that the specified scope is
such that it results in at least SCOPE_SE, otherwise the behaviour is
undefined. Instead, we now clamp the emitted scope at a minimum of
SCOPE_SE, so that the undefined behaviour is converted into a
performance loss instead.
Assisted-By: Claude Opus 4.6
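The clamping described in the load_monitor change above amounts to taking the wider of the requested scope and SCOPE_SE. A minimal sketch, assuming a hypothetical `Scope` enum ordered from narrowest to widest; the enumerator names other than SE, their numeric order, and the function name are illustrative assumptions, not the backend's actual code:

```cpp
#include <algorithm>

// Hypothetical scope enum for illustration, ordered narrowest to widest.
enum class Scope : unsigned { CU = 0, SE = 1, DEV = 2, SYS = 3 };

// Returns the scope to emit: the requested scope, but never narrower than SE.
constexpr Scope clampLoadMonitorScope(Scope Requested) {
  return std::max(Requested, Scope::SE);
}
```

A request narrower than SE is widened, which may cost performance but stays well defined; wider requests pass through unchanged.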
[StringMap] Replace tombstone deletion with TAOCP 6.4 Algorithm R (#202103)
StringMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every find/insert must inspect.
Switch to linear probing with Knuth TAOCP 6.4 Algorithm R deletion,
similar to DenseMap #200595.
erase now relocates the following entries to close the hole. StringMap
buckets are pointers to heap-allocated entries, so only the pointers
(and the parallel hash array) move. References and pointers to entries
remain valid, but iterators are invalidated.
Depends on #202237 and #202520
Aided by Claude Opus 4.8
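The deletion scheme in the StringMap change above can be sketched on a toy table. This is not LLVM's StringMap: `ToyStringSet`, `toyHash`, and the fixed capacity are hypothetical names and simplifications chosen for illustration, and the hash is deliberately weak so that collisions are easy to provoke. The sketch probes upward, whereas Knuth's presentation probes downward; the relocation idea is the same.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy open-addressing string set: linear probing, no tombstones.
class ToyStringSet {
  std::vector<std::unique_ptr<std::string>> Buckets; // null means empty
  std::vector<std::size_t> Hashes;                   // parallel hash array
  std::size_t Mask;
  std::size_t Size = 0;

  // Deliberately weak hash (string length) so same-length keys collide.
  static std::size_t toyHash(const std::string &Key) { return Key.size(); }

  // Returns the bucket holding Key, or the first empty bucket on its path.
  std::size_t findIndex(const std::string &Key, std::size_t H) const {
    std::size_t I = H & Mask;
    while (Buckets[I] && !(Hashes[I] == H && *Buckets[I] == Key))
      I = (I + 1) & Mask;
    return I;
  }

public:
  explicit ToyStringSet(std::size_t PowerOfTwoCapacity)
      : Buckets(PowerOfTwoCapacity), Hashes(PowerOfTwoCapacity),
        Mask(PowerOfTwoCapacity - 1) {
    assert(PowerOfTwoCapacity >= 2 && (PowerOfTwoCapacity & Mask) == 0);
  }

  // Returns a pointer to the heap-allocated entry. The sketch never rehashes.
  const std::string *insert(const std::string &Key) {
    std::size_t H = toyHash(Key);
    std::size_t I = findIndex(Key, H);
    if (!Buckets[I]) {
      assert(Size + 1 < Buckets.size() && "sketch keeps one bucket empty");
      Buckets[I] = std::make_unique<std::string>(Key);
      Hashes[I] = H;
      ++Size;
    }
    return Buckets[I].get();
  }

  bool contains(const std::string &Key) const {
    return Buckets[findIndex(Key, toyHash(Key))] != nullptr;
  }

  bool erase(const std::string &Key) {
    std::size_t Hole = findIndex(Key, toyHash(Key));
    if (!Buckets[Hole])
      return false;
    Buckets[Hole].reset();
    --Size;
    // Walk the rest of the cluster. An entry at J moves into the hole when
    // the hole lies on its probe path, i.e. its home bucket is at least as
    // far behind J (cyclically) as the hole is.
    for (std::size_t J = (Hole + 1) & Mask; Buckets[J]; J = (J + 1) & Mask) {
      std::size_t Home = Hashes[J] & Mask;
      if (((J - Home) & Mask) >= ((J - Hole) & Mask)) {
        Buckets[Hole] = std::move(Buckets[J]); // only the pointer moves
        Hashes[Hole] = Hashes[J];
        Hole = J;
      }
    }
    return true;
  }
};
```

Because only the bucket pointers and the parallel hashes are shuffled, a pointer obtained from `insert` keeps pointing at the same heap entry after an unrelated `erase`, while any bucket index held across the erase may now refer to a different entry.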
[InstCombine] Fix infinite combine loop in evaluateInDifferentType (#202572)
The implementation assumes that all original uses inside visited
instructions would get removed as part of changing the type. However,
this is not true for uses in select conditions, as only the value
operands change type in that case. Bail out if we encounter uses in
select conditions to avoid this.
Fixes https://github.com/llvm/llvm-project/issues/173148.
[NewPM][AArch64][GlobalISel] Port AArch64PostLegalizerCombiner to NewPM (#194156)
This is a standard port.
Updates some (but not all) tests to verify the NewPM path is working.
[AMDGPU] Improve the description of asyncmark semantics
- The semantics of asyncmarks are now defined purely in terms of sequences,
without referring to the implementation.
- The examples incorrectly used (post)dominance. Fixed that with wording in
terms of asyncmark sequences.
[X86] Hygon Processors Initial enablement (#187622)
This patch adds initial support for several Hygon architectures.
The Hygon architectures include:
- c86-4g-m4
- c86-4g-m6
- c86-4g-m7
This patch includes:
- Added Hygon architectures CPU targets recognition in Clang and LLVM
- Added Hygon architectures to target parser and host CPU detection
- Updated compiler-rt CPU model detection for Hygon architectures
- Added Hygon architectures to various optimizer tests
- Added scheduler models for Hygon architectures CPU targets
Revert "[HLSL] Set visibility of cbuffer global variables to internal" (#202538)
Reverts llvm/llvm-project#200312
Breaks several buildbots, e.g.,
https://lab.llvm.org/buildbot/#/builders/203/builds/48531
Co-authored-by: Nikolas Klauser <nikolasklauser at berlin.de>
[mlir][bufferization] Drop TensorLikeType::getBufferType() (#201350)
Replace TensorLikeType::getBufferType() with
options.unknownTypeConverterFn() hook. Make the hook work with
tensor-like and buffer-like types (instead of builtins) to maintain the
same behaviour at the API boundary level and still allow user types to
be properly supported.
Historically, an attempt to support user types within the one-shot
bufferization framework was made. As part of it,
TensorLikeType::getBufferType() was introduced to allow user-provided
types to customize bufferization. However, the whole affair proved to be
overly complex: there is an interface with customization points for
user-provided tensors, and options-based (not sufficient) implementation
for builtin tensors. On top of this, there was always a
function-specific hook to customize function-level behaviour further. As
a result of this, users would need to implement two different mechanisms
on their end: interface implementation + option hooks.
[19 lines not shown]
[clang][bytecode] Fix shifting by negative IntAP values (#202505)
The negation of a negative value didn't necessarily result in a positive
value. Fix that by giving it one more bit of precision.
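The underlying arithmetic issue can be shown with fixed-width integers. A minimal sketch, assuming 8-bit values stand in for an arbitrary-precision integer of some width; the function names are hypothetical, and `std::int16_t` is used only because it is the nearest built-in type with at least one extra bit:

```cpp
#include <cstdint>

// Negation at the original width: wraps for the minimum value. Done through
// unsigned arithmetic so the sketch avoids signed-overflow undefined behaviour.
constexpr std::int8_t negateSameWidth(std::int8_t V) {
  return static_cast<std::int8_t>(
      static_cast<std::uint8_t>(0u - static_cast<std::uint8_t>(V)));
}

// Negation after widening: every negative 8-bit value becomes positive.
constexpr std::int16_t negateWidened(std::int8_t V) {
  return static_cast<std::int16_t>(-static_cast<std::int16_t>(V));
}
```

With 8 bits, the minimum value -128 has no positive counterpart, so same-width negation yields -128 again; with one more bit, 128 is representable.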
[OFFLOAD][L0] Add wait events for AsyncQueue memFill (#202287)
Fix an issue where memFill operations were not chained properly with respect to prior operations.
[Loads] Migrate isDereferenceable APIs to SimplifyQuery (#202553)
These take the usual set of analysis parameters, so we can encapsulate
them using SimplifyQuery.
[Clang][HIP] Include `__clang_cuda_math_forward_declares.h` before `<cmath>` (#201563)
In HIP, `constexpr` functions are treated as both `__host__` and
`__device__`.
A new version of the MS STL shipped with the build tools version
14.51.36231 has `constexpr` definitions for some `cmath` functions when
the compiler in use is Clang (this gets worse when C++23 is in use).
These definitions conflict with the `__device__` declarations we provide
in the header wrappers.
There is a workaround for this: We do not mark `constexpr` functions
[_that are defined in a system header_](https://github.com/llvm/llvm-project/blob/03127a03860b9d8cb440fe8f51c00647f45eb8be/clang/lib/Sema/SemaCUDA.cpp#L877)
as `__host__` and `__device__` if there is a previous `__device__`
declaration.
[14 lines not shown]
[libc++][vector] Test `[[nodiscard]]` applied to `vector::iterator` (#202262)
Adds test coverage.
`[[nodiscard]]` applied in:
- #198489
- #198492
Towards #172124
Co-authored-by: Hristo Hristov <zingam at outlook.com>
[clang][bytecode] Remove `InterpFrame::ThisPointerOffset` (#202322)
Replace it with a `uint8_t` representing some bool flags about the
function. This reduces the size of a frame from 88 to 80 bytes.
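Packing several booleans into one byte can be sketched as follows. The flag names, the `FrameFlags` struct, and the `withFlag` helper are hypothetical and chosen only for illustration; the commit message does not say which flags are stored:

```cpp
#include <cstdint>

// Hypothetical per-function flags, one bit each.
enum FrameFlag : std::uint8_t {
  HasThisPointer = 1u << 0,
  HasRVOPointer = 1u << 1,
};

// One byte of storage holds up to eight independent boolean flags.
struct FrameFlags {
  std::uint8_t Bits = 0;
  constexpr void set(FrameFlag F) { Bits |= F; }
  constexpr bool test(FrameFlag F) const { return (Bits & F) != 0; }
};

// Convenience helper: returns a copy of Flags with one more flag set.
constexpr FrameFlags withFlag(FrameFlags Flags, FrameFlag F) {
  Flags.set(F);
  return Flags;
}
```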
[SCEVExpander] Don't expand a UDiv with a possibly-poison divisor (#202378)
SCEVExpander::isSafeToExpand only checks that the divisor isKnownNonZero,
which ignores the possibility of poison. For the following divisor:
```
%ct = call i32 @llvm.cttz.i32(i32 %x, i1 true)
%divisor = add i32 %ct, 1
...
%rem = urem i32 1, %divisor
```
The urem may be hoisted unsafely.
Fix by also checking that the divisor isGuaranteedNotToBePoison.
Fixes https://github.com/llvm/llvm-project/issues/202028
[RFC][AMDGPU] Add BARRIER address space
Add a new BARRIER address space for global variables that represent the barrier IDs in GFX12.5.
These barrier addresses just have values corresponding 1-1 to barrier IDs. They are still implemented on top of LDS, but the offsetting happens during an addrspacecast to generic, not whenever the barrier GV is used.
The motivation for this is to make the relation between LDS and barrier GVs explicit in the compiler. It does add a bit more complexity, but that complexity was already there, just hidden by pretending barrier GVs were actual LDS.
[NFC][AMDGPU] Generalize some LDS MemoryUtils
In preparation for upcoming work, I need some functions used by the LDS lowering
system to work on any GV. I removed the LDS specific queries inside these functions
and replaced them with functors passed by the caller, so these utility functions can be reused.
I also cleaned up a few things that weren't up to code, such as lowercase variable names.
[NFCI][clang] Allow overriding any global variable address space
Allow the target to change the AS of a global variable at will, not just whenever Clang cannot assign one.
This enables the next patch that will specialize LDS GVs for barriers as a separate address space.
[mlir][Interfaces] Document completeness requirement of `RegionBranchOpInterface` (#202018)
Document that interface implementations must report all possible control
flow edges. Failure to report a possible edge may break
analyses/transformations/APIs such as
`RegionBranchOpInterface::isRepetitiveRegion`.