[AMDGPU][SIInsertWaitcnts][NFC] Replace Wait.combined() with simple assignment (#179142)
Wait is initialized with all ~0s and by the time it reaches the updated
line it still holds the same value. So Wait.combined(AllZeroWait) is
effectively combining all ~0s with AllZeroWait and given that combined()
returns the min() of the two it should always return AllZeroWait.
So this patch replaces the assignment with `= AllZeroWait` to make it
easier to read.
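The identity argument above can be checked with a minimal sketch. `WaitSketch` and its two fields are hypothetical stand-ins, not the pass's actual type; the only property carried over from the message is that `combined()` takes the field-wise minimum and that the default value is all ~0s.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the pass's wait-count struct. The field names
// are illustrative and are not taken from SIInsertWaitcnts.
struct WaitSketch {
  unsigned LoadCnt = ~0u;  // ~0u stands for "no wait required"
  unsigned StoreCnt = ~0u;

  // Field-wise minimum: the smaller (stricter) count wins.
  WaitSketch combined(const WaitSketch &Other) const {
    WaitSketch R;
    R.LoadCnt = std::min(LoadCnt, Other.LoadCnt);
    R.StoreCnt = std::min(StoreCnt, Other.StoreCnt);
    return R;
  }

  bool operator==(const WaitSketch &O) const {
    return LoadCnt == O.LoadCnt && StoreCnt == O.StoreCnt;
  }
};

// Combining a default (all ~0) value with X yields X, because
// min(~0u, x) == x for every unsigned x.
inline bool combineWithDefaultIsIdentity(const WaitSketch &X) {
  WaitSketch Default;
  return Default.combined(X) == X;
}
```

Under these assumptions a plain assignment and `Default.combined(X)` are interchangeable, which is the property the NFC change relies on.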
[HLSL] Stop redeclaration of resources types in HLSLExternalSemaSource (#178808)
When declaring HLSL resource types, we defer the completion of those
declarations to `HLSLExternalSemaSource::CompleteType`. This can cause the
parser to declare the resources twice, as in
https://hlsl.godbolt.org/z/aT4b1Goeb, triggering an assert.
This patch fixes the issue by removing resources from the list of
pending completions once they are declared.
Fix: #153619
Fix profile metadata propagation for umax in InstCombine
Synthesize branch weights for select instructions created from umax intrinsics to satisfy profile verification requirements.
[HLSL][Codegen][NFC] Simplify intrinsic picking (#179300)
A pattern developed of giving WaveActive intrinsics their own helpers
because some wave intrinsics on SPIRV lack a signed/unsigned variant.
In the case of Min and Max, the variants exist on both DirectX and SPIRV,
so we can do away with a specialized helper.
[libc++] Implement a type-safe iterator for optional (#154239)
Create a new `__capacity_aware_iterator` iterator type which wraps an
existing iterator, takes its container as a template parameter, and
encodes the maximum number of elements the container can hold. The main
objective is to prevent iterator mixups between different containers
(e.g. `vector`).
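A minimal sketch of the idea, not libc++'s actual `__capacity_aware_iterator`: the class name, the tag types `OptionalLike` and `FixedBufferLike`, and the capacity values are all hypothetical. It only illustrates how adding the container type and capacity as template parameters turns two wrappers around the same raw iterator into unrelated types.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>

// Hypothetical sketch: wraps Iter and tags it with its container type and
// the maximum number of elements that container can hold.
template <class Iter, class Container, std::size_t MaxElements>
class CapacityAwareIterSketch {
  Iter It;

public:
  explicit CapacityAwareIterSketch(Iter I) : It(I) {}

  typename std::iterator_traits<Iter>::reference operator*() const {
    return *It;
  }
  CapacityAwareIterSketch &operator++() {
    ++It;
    return *this;
  }
  friend bool operator==(const CapacityAwareIterSketch &A,
                         const CapacityAwareIterSketch &B) {
    return A.It == B.It;
  }
  friend bool operator!=(const CapacityAwareIterSketch &A,
                         const CapacityAwareIterSketch &B) {
    return !(A == B);
  }
};

// Placeholder container tags, used only to make the iterator types differ.
struct OptionalLike {};
struct FixedBufferLike {};

using OptIter = CapacityAwareIterSketch<int *, OptionalLike, 1>;
using BufIter = CapacityAwareIterSketch<int *, FixedBufferLike, 8>;

// Same underlying pointer type, but the wrappers are unrelated types, so
// mixing them up is rejected at compile time.
static_assert(!std::is_same<OptIter, BufIter>::value,
              "iterators of different containers must be distinct types");
static_assert(!std::is_convertible<BufIter, OptIter>::value,
              "and must not convert into one another");
```

With a bare `int *` as the iterator, both containers would hand out the same type and a mixup would compile silently.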
[lld-macho] Fix branch relocations with addends to target actual function (#177430)
When a branch relocation has a non-zero addend (e.g., `bl _func+16`),
the linker was incorrectly computing `stub_address + addend` instead of
`function_address + addend`. This caused the branch to land in the wrong
location (past the stub section) rather than at the intended interior
point of the function.
The fix checks for non-zero addends on branch relocations and uses the
actual symbol VA in those cases. This makes sense semantically—branching
to an interior offset implies reliance on the original function's
layout, which an interposed replacement wouldn't preserve anyway.
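The rule described above can be sketched as follows. `SymSketch`, its fields, and `branchTargetVA` are hypothetical names for illustration and do not mirror lld's data structures; the sketch only encodes the decision "non-zero addend means use the function's own address".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical symbol model; not lld's actual Symbol class.
struct SymSketch {
  uint64_t FunctionVA; // address of the function body
  uint64_t StubVA;     // address of its stub, meaningful only if InStubs
  bool InStubs;        // whether calls are normally routed through a stub
};

// Returns the address a branch relocation should resolve to.
inline uint64_t branchTargetVA(const SymSketch &S, int64_t Addend) {
  // An interior offset only makes sense relative to the real function,
  // so a non-zero addend bypasses the stub.
  if (Addend != 0 || !S.InStubs)
    return S.FunctionVA + static_cast<uint64_t>(Addend);
  return S.StubVA;
}
```

For a symbol at 0x1000 with a stub at 0x8000, `bl _func+16` resolves to 0x1010 in this sketch rather than 0x8010, while a plain `bl _func` still goes through the stub.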
Added test `arm64-branch-addend-stubs.s` that verifies the correct
behavior using `-flat_namespace` (which makes local symbols interposable
and thus routed through stubs).
[Assisted-by](https://t.ly/Dkjjk): Cursor IDE + claude-opus-4.5-high
[AMDGPU] Return two MMOs for load-to-lds and store-from-lds intrinsics
Accurately represent both the load and the store part of those
intrinsics.
The test changes seem to be mostly fairly insignificant changes caused by
subtly different scheduler behavior.
commit-id:0269189c
[CodeGen] Refactor targets to override the new getTgtMemIntrinsic overload (NFC)
This is a fairly mechanical change. Instead of returning true/false,
we either keep the Infos vector empty or push one entry.
commit-id:c7770af6
[CodeGen] Add getTgtMemIntrinsic overload for multiple memory operands (NFC)
There are target intrinsics that logically require two MMOs, such as
llvm.amdgcn.global.load.lds, which is a copy from global memory to LDS,
so there's both a load and a store to different addresses.
Add an overload of getTgtMemIntrinsic that produces intrinsic info in a
vector, and implement it in terms of the existing (now protected)
overload.
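A rough sketch of how such a vector-producing overload can be layered on the existing bool-returning one. All type and class names here (`MemInfoSketch`, `LoweringSketch`, and the two target classes) are hypothetical, and the parameter list is simplified; the real interface takes different arguments.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified stand-in for the intrinsic info record.
struct MemInfoSketch {
  bool IsLoad = false;
  bool IsStore = false;
};

class LoweringSketch {
public:
  virtual ~LoweringSketch() = default;

  // New-style overload: appends zero or more entries. The default
  // implementation forwards to the legacy overload below.
  virtual void getTgtMemIntrinsic(std::vector<MemInfoSketch> &Infos,
                                  unsigned ID) const {
    MemInfoSketch Info;
    if (getTgtMemIntrinsic(Info, ID))
      Infos.push_back(Info);
  }

protected:
  // Legacy overload: true means "Info was filled in".
  virtual bool getTgtMemIntrinsic(MemInfoSketch &, unsigned) const {
    return false;
  }
};

// A target that still overrides only the legacy overload.
class LegacyTargetSketch : public LoweringSketch {
protected:
  bool getTgtMemIntrinsic(MemInfoSketch &Info, unsigned ID) const override {
    if (ID != 1)
      return false;
    Info.IsLoad = true;
    return true;
  }
};

// A target whose intrinsic 1 is a copy: one load entry and one store entry.
class CopyTargetSketch : public LoweringSketch {
public:
  void getTgtMemIntrinsic(std::vector<MemInfoSketch> &Infos,
                          unsigned ID) const override {
    if (ID != 1)
      return;
    MemInfoSketch Load, Store;
    Load.IsLoad = true;
    Store.IsStore = true;
    Infos.push_back(Load);
    Infos.push_back(Store);
  }
};

// Callers go through the base interface and only see the vector overload.
inline std::vector<MemInfoSketch> collectInfos(const LoweringSketch &T,
                                               unsigned ID) {
  std::vector<MemInfoSketch> Infos;
  T.getTgtMemIntrinsic(Infos, ID);
  return Infos;
}
```

In this sketch an empty vector plays the role of the old `false` return, and a target can migrate simply by overriding the public overload instead of the protected one.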
GlobalISel and SelectionDAG paths are updated to support multiple MMOs.
The main part of this change is supporting multiple MMOs in
MemIntrinsicNodes.
Converting the backends to use the new overload is a fairly mechanical step
that is done in a separate change, in the hope of reducing merge pain
during review and for downstreams. A later change will then enable
using multiple MMOs in AMDGPU.
commit-id:b4a924aa
[AutoUpgrade] Prevent deletion of call if uses still exist (#177606)
Calls to llvm.x86.sse2.pshuflw were being deleted because of an invalid
vector type, even though uses still existed. Add checks that prevent
deleting a call while it still has uses, and ensure that when
eraseFromParent() is called, it is called after replaceAllUsesWith().
Fixes: #176674
Reapply "[lldb] Add FP conversion instructions to IR interpreter" (#179022)
This reapplies #175292 with the fixed test. The original test used
integer types with different bit widths on different platforms.
----- Original message:
This allows expressions that use these conversions to be executed when
JIT is not available.
[RISCV] Split RISCVLSUMOP tablegen class for type safety. NFC
Since loads and stores have overlapping encodings we should have
different classes to make sure they stay separate.
bar syntax and only print input if different from output.
Breaks update_test_checks Function Attrs comment check in the rare
case where the modes mismatch.
IR: Promote "denormal-fp-math" to a first class attribute
Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first
class denormal_fpenv attribute. Previously the query for the effective
denormal mode involved two string attribute queries with parsing. I'm
introducing more uses of this, so it makes sense to convert this
to a more efficient encoding. The old representation was also awkward
since it was split across two separate attributes. The new encoding
just stores the default and float modes as bitfields, largely avoiding
the need to consider if the other mode is set.
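As an illustration of the bitfield idea only: the sketch below packs a default mode pair and a float mode pair into one byte. The bit layout, the type names, and the helper functions are assumptions made for this example and are not the attribute's actual encoding.

```cpp
#include <cassert>
#include <cstdint>

// Four mode kinds fit in two bits each.
enum class ModeKind : uint8_t {
  IEEE = 0,
  PreserveSign = 1,
  PositiveZero = 2,
  Dynamic = 3
};

// One (output, input) mode pair.
struct ModePair {
  ModeKind Output;
  ModeKind Input;
};

// Assumed layout: bits [1:0] default output, [3:2] default input,
// [5:4] float output, [7:6] float input.
inline uint8_t encodeFPEnv(ModePair Default, ModePair F32) {
  return static_cast<uint8_t>(static_cast<unsigned>(Default.Output) |
                              (static_cast<unsigned>(Default.Input) << 2) |
                              (static_cast<unsigned>(F32.Output) << 4) |
                              (static_cast<unsigned>(F32.Input) << 6));
}

inline ModePair decodeDefault(uint8_t Bits) {
  return {static_cast<ModeKind>(Bits & 3u),
          static_cast<ModeKind>((Bits >> 2) & 3u)};
}

inline ModePair decodeF32(uint8_t Bits) {
  return {static_cast<ModeKind>((Bits >> 4) & 3u),
          static_cast<ModeKind>((Bits >> 6) & 3u)};
}
```

Both pairs are always present in the packed value, so a query for the float mode is a shift and a mask rather than a second string lookup with a fallback.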
The syntax in the common cases looks like this:
`denormal_fpenv(preservesign,preservesign)`
`denormal_fpenv(float: preservesign,preservesign)`
`denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)`
I wasn't sure about reusing the float type name instead of adding a
new keyword. It's parsed as a type but only accepts float. I'm also
debating switching the name to subnormal to match the current
[18 lines not shown]
[clang] __builtin_os_log_format has incorrect PrintfFormat Attribute argument (#178320)
The format string is the 2nd argument of __builtin_os_log_format, thus
has index 1 instead of 0 in 0-based indexing.
The incorrect format attribute argument causes false positive
-Wunsafe-buffer-usage-in-format-attr-call warnings.
rdar://169043228