[InstCombine] Allow simplifying FP selects of cmpxchg instructions. (#181977)
We already simplify selects that test the flag returned by a cmpxchg and
select between the value the cmpxchg loaded and the compare operand.
This patch extends the fold to FP (and vector) compare-exchange
operations, where the compare operand and loaded value are bitcast.
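
To illustrate why such a select is redundant, here is a minimal C++ sketch that models the pattern outside of LLVM. It assumes the select has the form `success ? compare : loaded`. The names `cmpxchgFloat`, `selectForm`, `foldedForm` and `bitsOf` are hypothetical helpers, not part of InstCombine. The FP value is exchanged through its integer bits, mirroring the bitcasts mentioned above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: reinterpret a float as its 32-bit pattern and back.
inline std::uint32_t bitsOf(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}
inline float floatOf(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Models the two results of a cmpxchg: the loaded value and the success flag.
struct CmpXchgResult {
  float loaded;
  bool success;
};

// FP compare-exchange performed on the integer representation.
inline CmpXchgResult cmpxchgFloat(std::atomic<std::uint32_t> &mem,
                                  float expected, float desired) {
  const std::uint32_t expectedBits = bitsOf(expected);
  std::uint32_t observed = expectedBits;
  bool ok = mem.compare_exchange_strong(observed, bitsOf(desired));
  // On success the observed bits equal the compare operand's bits;
  // this invariant is what makes the select redundant.
  assert(!ok || observed == expectedBits);
  return {floatOf(observed), ok};
}

// Unsimplified pattern: pick the compare operand on success, else the load.
inline float selectForm(const CmpXchgResult &r, float expected) {
  return r.success ? expected : r.loaded;
}

// Simplified pattern: the loaded value alone is bitwise identical.
inline float foldedForm(const CmpXchgResult &r) { return r.loaded; }
```

Comparing the two forms by bit pattern rather than with `==` avoids surprises with NaN and signed zero, which is also why the sketch exchanges the integer bits.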
[CodeGen] Introduce MIR-level target-independent rematerialization helper (#177080)
This introduces a `Rematerializer` class that identifies register
rematerialization opportunities within a machine function and provides
an API to easily perform those rematerializations with a high level of
control. Its key feature is its ability to model relationships between
rematerializable registers and rematerialize arbitrarily complex groups
of registers at once to specific locations. The class comment describes
the underlying model in detail.
This includes unit tests for the class to both verify its correct
behavior and showcase its current rematerialization capabilities.
This hopefully can be a step toward addressing long-standing
rematerialization limitations in LLVM backends. In the future, the goal
is to pair this support with generic or target-dependent strategies for
picking the best rematerialization opportunities to perform to achieve
some kind of objective (e.g., a specific register pressure target in
scheduling regions). As a concrete example, I intend to use this in the
AMDGPU scheduler to help in reducing spilling and/or increasing
occupancy in kernels.
[AMDGPUEmitPrintf] Use CreatePtrDiff() (#182283)
Use CreatePtrDiff() to emit the pointer subtraction, which will use
ptrtoaddr instead of ptrtoint.
Add a conservative cast to i64, as the return value of CreatePtrDiff is
no longer guaranteed to be an i64.
[NVPTXCtorDtorLowering] Removing unnecessary pointer arithmetic (#182269)
This code was computing `begin + ((end - begin) exact/ 8) * 8`, which is
a very complicated way to spell `end`.
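
The identity can be checked with a small C++ sketch. `roundTripEnd` is a hypothetical helper that mirrors the removed computation on byte pointers; it is not the pass's code. The assertion stands in for the `exact` flag, which in LLVM IR means the division leaves no remainder.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: begin + ((end - begin) exact/ 8) * 8.
inline const char *roundTripEnd(const char *begin, const char *end) {
  std::ptrdiff_t bytes = end - begin;
  // Stand-in for `exact`: the byte distance must be a multiple of 8.
  assert(bytes % 8 == 0 && "exact division requires a multiple of 8");
  std::ptrdiff_t count = bytes / 8; // number of 8-byte entries
  return begin + count * 8;         // scales straight back to `end`
}
```

Because the division is exact, multiplying by 8 recovers the original byte distance, so the result is always `end`.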
[clang][DebugInfo] Add virtuality call-site target information in DWARF. (#167666)
Given the test case:
  struct CBase {
    virtual void foo();
  };
  void bar(CBase *Base) {
    Base->foo();
  }
and using '-emit-call-site-info' with llc, the following DWARF
is produced for the indirect call 'Base->foo()':
1$: DW_TAG_structure_type "CBase"
...
2$: DW_TAG_subprogram "foo"
...
[18 lines not shown]
[mlir][tosa] Refactor convolution infer return type (#178869)
Lots of logic was repeated for Conv2D, Conv3D and Conv2DBlockScaled ops.
This commit factors out common logic to reduce code duplication.
In doing so, a bug in calculating the bias shape was also fixed. Since
DepthwiseConv2D and TransposeConv2D were fixed independently, this
commit fixes #175765.
[X86] knownbits-vpmadd52.ll - replace extended unicode character with regular ascii (#182278)
Stops update_llc_test_checks.py from complaining about, or unnecessarily changing, the file.
[flang][OpenMP] Initial support for DEPTH clause
The semantic checks do not check any conditions on the associated loop
nest (such as actual depth or whether it is a perfect nest).
Lowering will emit a not-implemented-yet message.
[libclc] Completely remove ENABLE_RUNTIME_SUBNORMAL option (#182125)
Summary:
This option isn't really used, and removing it simplifies the code. I
could go further and remove this content entirely, as it all returns
`false`, but I figured this was an easier change to do first.
---------
Co-authored-by: Wenju He <wenju.he at intel.com>