[InstCombine] Allow simplifying FP selects of cmpxchg instructions. (#181977)
We already simplify selects that test the flag returned by a cmpxchg and
select between the value the cmpxchg loaded and the compare operand.
This patch extends the fold to FP (and vector) compare-exchange
operations, where the compare operand and loaded value are bitcast.
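As a rough source-level illustration of why such a fold is sound (a sketch, not the InstCombine code itself): when the exchange succeeds, the loaded value is bit-identical to the compare operand, so a select on the success flag between the two always yields the loaded value. The helper names below are hypothetical, and the sketch assumes the select picks the compare operand on success.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: move bits between float and a 32-bit integer,
// mirroring the bitcasts around an FP compare-exchange.
static std::uint32_t bitsOf(float F) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

static float floatOf(std::uint32_t Bits) {
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}

// Models the { loaded value, success flag } pair a cmpxchg returns.
struct CmpXchgResult {
  std::uint32_t Loaded;
  bool Success;
};

static CmpXchgResult cmpXchg(std::atomic<std::uint32_t> &Mem,
                             std::uint32_t Compare, std::uint32_t Desired) {
  std::uint32_t Observed = Compare;
  // On failure, compare_exchange_strong stores the loaded value in Observed.
  // On success, Observed still equals Compare, which is what was loaded.
  bool Success = Mem.compare_exchange_strong(Observed, Desired);
  return {Observed, Success};
}

// Unsimplified form: select(success, compare operand, loaded value).
static float selectForm(std::atomic<std::uint32_t> &Mem, float Compare,
                        float Desired) {
  CmpXchgResult R = cmpXchg(Mem, bitsOf(Compare), bitsOf(Desired));
  return R.Success ? Compare : floatOf(R.Loaded);
}

// Simplified form: the loaded value alone, bitcast back to float.
static float foldedForm(std::atomic<std::uint32_t> &Mem, float Compare,
                        float Desired) {
  CmpXchgResult R = cmpXchg(Mem, bitsOf(Compare), bitsOf(Desired));
  return floatOf(R.Loaded);
}
```

Both forms return the same bits whether the exchange succeeds or fails, which is the property the fold relies on.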
.github: support all stable branches
If this eventually poses a problem for unsupported branches we can fix
them directly.
Sponsored by: Innovate UK
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D53838
stddef.h: centralize definition of offsetof()
Move to sys/_offsetof.h and use __builtin_offsetof() instead of
__offsetof to avoid reintroducing sys/cdefs.h pollution in stddef.h.
This has the side effect of allowing sys/stddef.h to be included after
stddef.h, which can happen in compatibility headers.
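A minimal sketch of the idea, assuming a GCC- or Clang-compatible compiler that provides __builtin_offsetof(). The macro name sketch_offsetof and the Packet struct are hypothetical and are not the contents of sys/_offsetof.h:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a header that defines offsetof() directly in
// terms of the compiler builtin, without relying on any other macro.
#define sketch_offsetof(type, member) __builtin_offsetof(type, member)

// Illustrative struct, not taken from the commit.
struct Packet {
  char tag;
  int length;
  double payload;
};

// The builtin agrees with the standard offsetof() from <cstddef>.
static_assert(sketch_offsetof(Packet, tag) == 0,
              "first member sits at offset 0");
static_assert(sketch_offsetof(Packet, length) == offsetof(Packet, length),
              "builtin matches the standard macro");
```

Because the definition depends only on the builtin, a header written this way needs nothing from sys/cdefs.h.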
Effort: CHERI upstreaming
Sponsored by: DARPA, AFRL
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D55307
stddef.h: add ptraddr_t
I'd missed that stddef.h is standalone and isn't a copy of sys/stddef.h
in my initial merge.
Effort: CHERI upstreaming
Reviewed by: kib
Sponsored by: Innovate UK
Fixes: dca634d1544b ("new type: ptraddr_t")
Differential Revision: https://reviews.freebsd.org/D55305
[CodeGen] Introduce MIR-level target-independent rematerialization helper (#177080)
This introduces a `Rematerializer` class that identifies register
rematerialization opportunities within a machine function and provides
an API to easily perform those rematerializations with a high level of
control. Its key feature is its ability to model relationships between
rematerializable registers and rematerialize arbitrarily complex groups
of registers at once to specific locations. The class comment describes
the underlying model in detail.
This includes unit tests for the class to both verify its correct
behavior and showcase its current rematerialization capabilities.
This hopefully can be a step toward addressing long-standing
rematerialization limitations in LLVM backends. In the future, the goal
is to pair this support with generic or target-dependent strategies for
picking the best rematerialization opportunities to perform to achieve
some kind of objective (e.g., a specific register pressure target in
scheduling regions). As a concrete example, I intend to use this in the
AMDGPU scheduler to help in reducing spilling and/or increasing
occupancy in kernels.
[AMDGPUEmitPrintf] Use CreatePtrDiff() (#182283)
Use CreatePtrDiff() to emit the pointer subtraction, which will use
ptrtoaddr instead of ptrtoint.
Add a conservative cast to i64 as the return value of CreatePtrDiff is
no longer guaranteed to be an i64.
[NVPTXCtorDtorLowering] Removing unnecessary pointer arithmetic (#182269)
This code was computing `begin + ((end - begin) exact/ 8) * 8`, which is
a very complicated way to spell `end`.
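The identity can be checked with a small source-level model (a sketch only: roundTripEnd is a hypothetical name, and the assertion stands in for the precondition that an exact division carries):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the removed computation:
//   begin + ((end - begin) exact/ 8) * 8
// An exact division assumes the byte distance is a multiple of 8.
static const std::uint64_t *roundTripEnd(const std::uint64_t *Begin,
                                         const std::uint64_t *End) {
  const char *B = reinterpret_cast<const char *>(Begin);
  const char *E = reinterpret_cast<const char *>(End);
  std::ptrdiff_t Bytes = E - B;
  assert(Bytes % 8 == 0 && "exact division requires a multiple of 8");
  std::ptrdiff_t Count = Bytes / 8; // number of 8-byte entries
  // Dividing by 8 and multiplying by 8 cancel out, giving back End.
  return reinterpret_cast<const std::uint64_t *>(B + Count * 8);
}
```

For any range whose byte length is a multiple of 8, the function returns End unchanged, so the arithmetic adds nothing.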
[clang][DebugInfo] Add virtuality call-site target information in DWARF. (#167666)
Given the test case:
  struct CBase {
    virtual void foo();
  };
  void bar(CBase *Base) {
    Base->foo();
  }
and using '-emit-call-site-info' with llc, the following DWARF
is produced for the indirect call 'Base->foo()':
1$: DW_TAG_structure_type "CBase"
...
2$: DW_TAG_subprogram "foo"
...
[18 lines not shown]
[mlir][tosa] Refactor convolution infer return type (#178869)
Lots of logic was repeated for Conv2D, Conv3D and Conv2DBlockScaled ops.
This commit factors out common logic to reduce code duplication.
In doing so, a bug in calculating the bias shape was also fixed. Since
DepthwiseConv2D and TransposeConv2D were fixed independently, this
commit fixes #175765.