[Support] Avoid misguided FreeBSD hack (#177508)
FreeBSD doesn't do anything wrong here, it just happens to define and
use a struct thread in its own headers. The problems arise because here
in LLVM we have using namespace llvm prior to including system headers,
which is bad practice for precisely this reason. If we instead play by
the rules and defer our using namespace llvm until after we've included
the system headers then we no longer need this hack.
This hack is particularly problematic because it has been conditional on
__FreeBSD__ since 9093ba9f7ee5 ("[Support] Include Support/thread.h
before api implementations (#111175)"). On non-FreeBSD platforms,
Threading.inc can reference anything in Support/thread.h, and the errors
only show up on FreeBSD, which is precisely what happened in 64be34c562a2
("Enable using threads on z/OS (#171847)").
Deferring the using namespace llvm until after Threading.inc is included
may introduce build failures on untested platforms, where unqualified
identifiers will need to be replaced with qualified ones by prepending
llvm::.
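A minimal sketch of the ordering described above, not the actual LLVM
sources. The global `struct thread` stands in for the FreeBSD declaration,
and the namespace `mylib` and the functions `describe` and `demo` are
hypothetical names playing the role of llvm and its code. With the using
directive placed after the "system" declaration, the only cost is that an
ambiguous name has to be qualified explicitly.

```cpp
#include <string>

// Stand-in for a system header that declares its own global `thread`.
struct thread { int td_tid; };

// Hypothetical project namespace playing the role of llvm.
namespace mylib {
struct thread { std::string name; };
inline std::string describe(const thread &t) {
  return "mylib thread " + t.name;
}
} // namespace mylib

// Deferred until after the "system header" content above.
using namespace mylib;

// An unqualified `thread` would be ambiguous from here on, so each use
// names the one it means.
inline std::string demo() {
  mylib::thread t{"worker"};
  ::thread sys{42};
  return describe(t) + " / system tid " + std::to_string(sys.td_tid);
}
```

The design point is that the system declaration is never seen with the
project namespace already pulled in, so the header itself compiles the same
way it would in any other program.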
AMDGPU: Ignore type legality in isFAbsFree (#177630)
This treats it as free on targets without legal f16. This
matches the existing logic in fneg, and they should be the same.
The test changes are mostly neutral with a few improvements.
[HIP] Pass HIP library directly and refactor (#176019)
Summary:
Currently we pass `-L` and `-l` to get the HIP library. Because we are
attached to a single HIP installation it's far better to pass it by
filename. This is because the `-L` could be out of order with other user
libraries and those could override it. If someone uses HIP with a
specific ROCm installation they most likely want that library, otherwise
incompatibilities can occur. This is still overridable with command line
flags if users want to pass a different one for some reason.
This PR also refactors the handling to be more generic for future
additions.
[profcheck] Fix profile metadata propagation for Large Integer operations (#175862)
This PR improves the propagation of profile metadata within the
ExpandIRInsts pass. When lowering large integer division operations, the
pass now ensures that branch weights are correctly attached to the
generated control flow, preventing the loss of profile data during IR
expansion.
This PR improves signed and unsigned division/remainder for non-native
bit widths (e.g., `sdiv/udiv i129`, `srem/urem i129`) and implements
heuristic-based branch weight labeling, using established heuristics for
edge cases such as division-by-zero guards and magnitude comparisons
between dividends and divisors.
It also adds detailed comments within the expansion logic to explain the
rationale behind specific branch weight choices and the underlying
mathematical invariants.
Please refer to the implementation details in the source code for the
[2 lines not shown]
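As a source-level analogy for the guarded control flow described above, here
is a sketch of a shift-subtract unsigned division on 64-bit values. It is
not the code ExpandIRInsts generates: the names `DivResult` and
`udivrem_sketch` are hypothetical, returning zeros for a zero divisor is an
assumption made so the sketch is total, and the C++20 `[[unlikely]]`
attribute only stands in for the idea of attaching a branch weight to a
guard.

```cpp
#include <cstdint>

struct DivResult {
  std::uint64_t quot;
  std::uint64_t rem;
};

// Restoring (shift-subtract) division with two early-exit guards.
inline DivResult udivrem_sketch(std::uint64_t n, std::uint64_t d) {
  // Division-by-zero guard, hinted as the rare path (assumption).
  if (d == 0) [[unlikely]] {
    return {0, 0};
  }
  // Magnitude comparison: a dividend smaller than the divisor needs no loop.
  if (n < d) {
    return {0, n};
  }
  std::uint64_t q = 0;
  std::uint64_t r = 0;
  for (int bit = 63; bit >= 0; --bit) {
    // Remember the bit shifted out of r so the comparison stays correct
    // when the divisor uses the top bit.
    bool carry = (r >> 63) != 0;
    r = (r << 1) | ((n >> bit) & 1);
    if (carry || r >= d) {
      r -= d; // wraps to the right value when carry is set
      q |= std::uint64_t{1} << bit;
    }
  }
  return {q, r};
}
```

For instance, `udivrem_sketch(100, 7)` takes the loop path, while
`udivrem_sketch(5, 9)` exits through the magnitude comparison.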
[SystemZ] Implement ctor/dtor emission via @@SQINIT and .xtor sections (#171476)
This patch implements support for constructors/destructors by
introducing the `@@SQINIT` section and emitting `.xtor.<priority>`
sections within the SystemZ AsmPrinter and in the GOFF object lowering
layer.
AMDGPU: Remove dead code configuring f16 is_fpclass (#177626)
isTypeLegal can never be true here. The register classes
are registered at the end of the target lowering constructor,
and in the subclasses.
[NFCI][AMDGPU] Fix the predicate `HasDsSrc2Insts` (#177621)
I'm not sure why the predicate has a `!`, and more surprisingly,
removing it doesn't change anything.
[clang][test] Fix builtin-rotate.c failure on ARM32 (#177290)
Replace unsigned __int128 with unsigned _BitInt(128) since __int128 is
not supported on ARM 32-bit targets.
Fixes https://lab.llvm.org/buildbot/#/builders/79/builds/2754
[VectorCombine] foldShuffleOfBinops - failure to track OperandValueInfo (#171934)
Resolves #170500.
Implemented a static mergeInfo helper that returns the common
TTI::OperandValueInfo data.
Added the common OperandValueInfo `Op0Info` and `Op1Info` to the NewCost
calculation.
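One way such a "common info" merge could look, as a sketch only. The types
`ValueKind`, `ValueProps` and `OperandInfoSketch` and the function
`mergeInfoSketch` are hypothetical stand-ins, not the LLVM definitions, and
the rule used here (keep a field only when both inputs agree, otherwise fall
back to the least specific value) is an assumption about what "common" means
rather than the behavior of the patch.

```cpp
// Hypothetical stand-ins for an operand's kind and properties.
enum class ValueKind { AnyValue, UniformValue, UniformConstant, NonUniformConstant };
enum class ValueProps { None, PowerOf2, NegatedPowerOf2 };

struct OperandInfoSketch {
  ValueKind kind;
  ValueProps props;
};

// Keep only what both operands agree on; otherwise use the generic value.
inline OperandInfoSketch mergeInfoSketch(const OperandInfoSketch &a,
                                         const OperandInfoSketch &b) {
  OperandInfoSketch out;
  out.kind = (a.kind == b.kind) ? a.kind : ValueKind::AnyValue;
  out.props = (a.props == b.props) ? a.props : ValueProps::None;
  return out;
}
```

Under this assumed rule the merged info never claims more than either input
does, so a cost query based on it stays conservative.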
[RISCV][llvm-objdump] Support --symbolize-operands (#166656)
This adds support for `--symbolize-operands`, so that local references
are turned back into labels by objdump, which makes it easier to tell
what is going on with a linked object.
When using `--symbolize-operands`, branch target addresses are elided
and only the referenced symbol is printed:
```
# Without --symbolize-operands
0: 04a05263 blez a0, 0x44 <.text+0x44>
...
40: fd1ff06f j 0x10 <.text+0x10>
44: 00000613 li a2, 0x0
# With --symbolize-operands
0: 04a05263 blez a0, <L3>
[4 lines not shown]
```
[AMDGPU][GFX1250] Optimize s_wait_xcnt for back-to-back atomic RMWs
This patch optimizes the insertion of s_wait_xcnt instruction for
sequences of atomic read-modify-write (RMW) operations in the
SIInsertWaitcnts pass. The Memory Legalizer conservatively inserts a
soft xcnt instruction before each atomic RMW operation as part of PR
168852, which is correct given the nature of atomic operations.
However, for back-to-back atomic RMWs, only the first s_wait_xcnt is
necessary; dropping the rest improves runtime performance. This patch tracks atomic
RMW blocks within each basic block and removes redundant soft xcnt
instructions, keeping only the first wait in each sequence. An atomic
RMW block continues through subsequent atomic RMWs and non-memory
instructions (e.g., ALU operations) but is broken by CU-scoped memory
operations, atomic stores, or basic block boundaries.
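The tracking rule above can be sketched on a simplified model of one basic
block. This is not the SIInsertWaitcnts implementation: the enum `Kind` and
the function `keepSoftWaits` are hypothetical, and each instruction is
reduced to one of four categories taken from the description.

```cpp
#include <cstddef>
#include <vector>

// Simplified instruction categories for one basic block.
enum class Kind { AtomicRMW, NonMemory, CUScopedMemory, AtomicStore };

// For each instruction, report whether a soft xcnt wait placed in front of
// it would be kept. Only the first atomic RMW of each run keeps its wait.
inline std::vector<bool> keepSoftWaits(const std::vector<Kind> &block) {
  std::vector<bool> keep(block.size(), false);
  bool inRun = false; // true while inside an atomic RMW block
  for (std::size_t i = 0; i < block.size(); ++i) {
    switch (block[i]) {
    case Kind::AtomicRMW:
      keep[i] = !inRun; // first RMW of the run keeps the wait
      inRun = true;
      break;
    case Kind::NonMemory:
      break; // ALU-style instructions do not end the run
    case Kind::CUScopedMemory:
    case Kind::AtomicStore:
      inRun = false; // these break the run
      break;
    }
  }
  return keep;
}
```

Calling the function once per basic block models the rule that a run never
crosses a block boundary, since `inRun` starts out false on every call.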
[SPIR-V] Implement sample and sample_clamp intrinsics for HLSL resources (#177234)
This patch implements the `sample` and `sample_clamp` intrinsics for
HLSL resources in the SPIR-V backend. It adds the necessary intrinsic
definitions in `IntrinsicsDirectX.td` and `IntrinsicsSPIRV.td`, and
implements the instruction selection logic in
`SPIRVInstructionSelector.cpp`.
Key changes:
- Added `int_dx_resource_sample` and `int_dx_resource_sample_clamp`
  intrinsics.
- Added `int_spv_resource_sample` and `int_spv_resource_sample_clamp`
  intrinsics.
- Implemented `selectSampleIntrinsic` to handle
  `OpImageSampleImplicitLod` generation.
- Added `ResourceDimension` enum in `DXILABI.h` and `HLSLResource.h`.
- Added a new test case
  `llvm/test/CodeGen/SPIRV/hlsl-resources/Sample.ll` to verify the
  implementation.