[NVPTX] Add commutativity to SETP instructions to enable MachineCSE of inverted predicates
Inverted predicates can be used freely in PTX (e.g. `@!p bra`). If we can
invert a predicate and CSE the generating instruction, we avoid computing
the inverse separately.
Teach the NVPTX commuteInstructionImpl that SETP instructions can be
inverted, so that a SETP can be CSEd with an earlier SETP that computes
the inverted form. The branch users of the predicate are inverted as well
to preserve correctness.
Currently the SETP inversion is only allowed if all users are branches.
Future work can extend this to sel and not instructions.
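A minimal C++ sketch of the inversion step, under a simplified model: `CmpMode`, `Setp`, `User`, `invertIntegerCmp`, `invertFloatCmp` and `tryInvertSetp` are hypothetical names, not the NVPTX backend's types or helpers, and signed/unsigned integer variants are omitted. It shows why an ordered float compare inverts to its unordered complement (NaN must make the inverse true) and how the all-users-are-branches rule lets each branch absorb the inversion.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified comparison modes (not NVPTX::PTXCmpMode).
// Eq..Ge are integer or ordered-float compares; the *u forms are unordered.
enum class CmpMode { Eq, Ne, Lt, Le, Gt, Ge, Equ, Neu, Ltu, Leu, Gtu, Geu, Num, Nan };

// Integer compares: !(a < b) is (a >= b), and so on.
CmpMode invertIntegerCmp(CmpMode M) {
  switch (M) {
  case CmpMode::Eq: return CmpMode::Ne;
  case CmpMode::Ne: return CmpMode::Eq;
  case CmpMode::Lt: return CmpMode::Ge;
  case CmpMode::Ge: return CmpMode::Lt;
  case CmpMode::Le: return CmpMode::Gt;
  case CmpMode::Gt: return CmpMode::Le;
  default:
    assert(false && "not an integer comparison mode");
    return M;
  }
}

// Float compares: an ordered compare is false when either operand is NaN,
// so its inverse must be true for NaN, i.e. the unordered complement.
CmpMode invertFloatCmp(CmpMode M) {
  switch (M) {
  case CmpMode::Eq:  return CmpMode::Neu;
  case CmpMode::Neu: return CmpMode::Eq;
  case CmpMode::Ne:  return CmpMode::Equ;
  case CmpMode::Equ: return CmpMode::Ne;
  case CmpMode::Lt:  return CmpMode::Geu;
  case CmpMode::Geu: return CmpMode::Lt;
  case CmpMode::Le:  return CmpMode::Gtu;
  case CmpMode::Gtu: return CmpMode::Le;
  case CmpMode::Gt:  return CmpMode::Leu;
  case CmpMode::Leu: return CmpMode::Gt;
  case CmpMode::Ge:  return CmpMode::Ltu;
  case CmpMode::Ltu: return CmpMode::Ge;
  case CmpMode::Num: return CmpMode::Nan;
  case CmpMode::Nan: return CmpMode::Num;
  }
  return M;  // unreachable: every enumerator is handled above
}

enum class UserKind { Branch, Other };  // Other: e.g. a sel or not user

struct User {
  UserKind Kind;
  bool Negated = false;  // true: the branch tests !p instead of p
};

struct Setp {
  CmpMode Mode;
  bool IsFloat;
  std::vector<User *> Users;
};

// Invert S in place only if every user is a branch; each branch then flips
// the sense of its test, so control flow is unchanged.
bool tryInvertSetp(Setp &S) {
  for (const User *U : S.Users)
    if (U->Kind != UserKind::Branch)
      return false;  // bail out before mutating anything
  S.Mode = S.IsFloat ? invertFloatCmp(S.Mode) : invertIntegerCmp(S.Mode);
  for (User *U : S.Users)
    U->Negated = !U->Negated;
  return true;
}
```

Because every user is checked before anything is rewritten, a SETP with even one non-branch user is left untouched in this sketch.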
Made-with: Cursor
Revert "move cmp modes into td and update users"
This reverts commit 5950d9fcd6b2053e71929972b89cc983ce2cccaa, restoring
the hand-written PTXCmpMode enum in NVPTX.h and the switch-based
implementations of invertIntegerCmpMode, invertScalarFloatCmpMode,
NVPTXInstPrinter::printCmpMode and NVPTXDAGToDAGISel::getPTXCmpMode.
The TableGen GenericTable migration consolidated the comparison-mode
data but at the cost of an extra .inc file, an ODR-driven split between
NVPTXCodeGen and NVPTXDesc, and indirection through a generated lookup
where the local switches were already self-contained. Reverting until a
broader cleanup of NVPTX::PTXCmpMode is taken on as part of a larger
refactor.
Co-authored-by: Cursor <cursoragent at cursor.com>
[RISCV] Fix inconsistent usage of ValVT and LocVT in CC_RISCV_Impl. NFCI (#195368)
I think all of our checks should be against LocVT. If LocVT differs from
ValVT, the location has already been changed and we should be acting on
the changed type. For the most part, I don't think that happens for
RISC-V.
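A minimal C++ sketch of the reasoning, assuming a hypothetical `ArgLoc` record and a made-up i8-to-i64 promotion; `VT`, `ArgLoc`, `sizeInBytes` and `stackSlotBytes` are illustrative names, not LLVM's calling-convention API. Once the location has been promoted, a check keyed on ValVT would size the slot for the original 1-byte value even though the location now holds 8 bytes.

```cpp
#include <cassert>

// Hypothetical value types; a stand-in for LLVM's MVT, not the real API.
enum class VT { i8, i16, i32, i64 };

unsigned sizeInBytes(VT T) {
  switch (T) {
  case VT::i8:  return 1;
  case VT::i16: return 2;
  case VT::i32: return 4;
  case VT::i64: return 8;
  }
  assert(false && "unknown value type");
  return 0;
}

// A value's original type (ValVT) and the type of the location it was
// assigned to (LocVT). They differ once the location has been promoted.
struct ArgLoc {
  VT ValVT;
  VT LocVT;
};

// Decisions about the location must use LocVT: after promotion the slot
// holds a LocVT-sized value, not a ValVT-sized one.
unsigned stackSlotBytes(const ArgLoc &A) { return sizeInBytes(A.LocVT); }
```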
[NFC][SSAF][EntityPointerLevel] Move EntityID-to-EPL map serialization to the EPL module (#193092)
Factor out the serialization of `std::map<EntityId,
EntityPointerLevelSet>` to `EntityPointerLevelFormat.h`.
---------
Co-authored-by: Balázs Benics <benicsbalazs at gmail.com>
Co-authored-by: Jan Korous <jkorous at apple.com>
Merge tag 'wq-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
- Fix devm_alloc_workqueue() passing a va_list as a positional arg to
the variadic alloc_workqueue() macro, which garbled wq->name and
skipped lockdep init on the devm path. Fold both noprof entry points
onto a va_list helper.
Also annotate the helper with __printf(1, 0).
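A minimal C++ analogy of the fix pattern, assuming hypothetical functions `make_name`, `devm_make_name` and `make_name_va` that stand in for alloc_workqueue(), devm_alloc_workqueue() and the va_list helper; only the printf-style name formatting is modelled (no lockdep, no devm bookkeeping). Both variadic entry points collect their arguments and forward a single va_list to one helper, and that helper carries the format attribute that the kernel's `__printf(1, 0)` macro expands to.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// The single va_list helper both entry points fold onto. format(printf, 1, 0)
// is what the kernel's __printf(1, 0) expands to: argument 1 is the format
// string, and 0 means the arguments arrive as a va_list, so the compiler can
// only check the format string itself at this level.
__attribute__((format(printf, 1, 0)))
static std::string make_name_va(const char *fmt, std::va_list args) {
  char buf[64];
  int n = std::vsnprintf(buf, sizeof(buf), fmt, args);  // truncates safely
  assert(n >= 0 && "formatting failed");
  (void)n;
  return std::string(buf);
}

// Variadic entry point: gathers its arguments into a va_list and forwards it.
__attribute__((format(printf, 1, 2)))
std::string make_name(const char *fmt, ...) {
  std::va_list args;
  va_start(args, fmt);
  std::string name = make_name_va(fmt, args);
  va_end(args);
  return name;
}

// Second entry point (the "devm" path in this analogy). Passing its va_list
// on to the variadic entry point instead, as in make_name(fmt, args), would
// make the va_list itself the first variadic argument and garble the name
// (formally undefined behaviour). Forwarding to the _va helper avoids that.
__attribute__((format(printf, 1, 2)))
std::string devm_make_name(const char *fmt, ...) {
  std::va_list args;
  va_start(args, fmt);
  std::string name = make_name_va(fmt, args);
  va_end(args);
  return name;
}
```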
* tag 'wq-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Annotate alloc_workqueue_va() with __printf(1, 0)
workqueue: fix devm_alloc_workqueue() va_list misuse
fix: wlan: set ifq maxlen before ether_ifattach
Include ifq_var.h and call ifq_set_maxlen() to initialize the send
queue length before attaching. Without this the default queue length
is 0, which can cause packet drops on busy VAPs.
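A toy C++ model of why the ordering matters, assuming a hypothetical `SendQueue`/`Interface` pair; these names and `setupVap` are illustrative, not the BSD ifq API, and the two calls in `setupVap` merely stand in for ifq_set_maxlen() and ether_ifattach(). In this simplified model a zero limit rejects every packet; the commit itself only says a zero length can cause drops on busy VAPs, so treat the model's behaviour as an illustration rather than a description of the real queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model of an interface send queue; not the BSD ifq API.
struct SendQueue {
  std::size_t MaxLen = 0;  // zero unless explicitly set
  std::deque<int> Packets;
  std::size_t Drops = 0;

  bool enqueue(int Pkt) {
    if (Packets.size() >= MaxLen) {  // always true while MaxLen == 0
      ++Drops;
      return false;
    }
    Packets.push_back(Pkt);
    return true;
  }
};

struct Interface {
  SendQueue Snd;
  bool Attached = false;

  // Once attached, the stack may start handing packets to the queue.
  void attach() {
    assert(Snd.MaxLen > 0 && "size the send queue before attaching");
    Attached = true;
  }
};

// The fixed ordering: size the queue first, then attach.
void setupVap(Interface &Ifp, std::size_t QueueLen) {
  Ifp.Snd.MaxLen = QueueLen;  // stands in for ifq_set_maxlen()
  Ifp.attach();               // stands in for ether_ifattach()
}
```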
[CIR] Use SymbolUserMap in applyReplacements to fix quadratic behavior (#195883)
applyReplacements() previously called replaceAllSymbolUses() for each
replacement, which walks the entire module every time — O(R × M) for R
replacements and M operations. For C++ programs with heavy template
instantiation (e.g., Eigen), this quadratic behavior dominated compile
time.
Replace the per-replacement module walk with a single SymbolUserMap
built once (O(M)), then use replaceAllUsesWith() which scopes each
replacement to only the actual user operations. The debug-only
verifyPointerTypeArgs helper is also updated to reuse the map.
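A minimal C++ sketch contrasting the two strategies, assuming a hypothetical flat module of `Op`s that each reference at most one symbol by name; `Op`, `Replacements`, `replaceNaive` and `replaceWithUserMap` are illustrative and are not MLIR's `SymbolUserMap` API. The naive version walks the whole module once per replacement (O(R × M)); the map-based version builds a symbol-to-users index once and then touches only actual users.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical module: each operation references at most one symbol by name.
struct Op {
  std::string SymRef;  // empty when the op references no symbol
};

using Replacements = std::vector<std::pair<std::string, std::string>>;

// One full module walk per replacement: O(R * M).
void replaceNaive(std::vector<Op> &Module, const Replacements &Repls) {
  for (const auto &R : Repls)
    for (Op &O : Module)
      if (O.SymRef == R.first)
        O.SymRef = R.second;
}

// Build a symbol -> users map once (O(M)), then each replacement touches
// only the operations that actually use the old symbol.
void replaceWithUserMap(std::vector<Op> &Module, const Replacements &Repls) {
  std::unordered_map<std::string, std::vector<Op *>> Users;
  for (Op &O : Module)
    if (!O.SymRef.empty())
      Users[O.SymRef].push_back(&O);

  for (const auto &R : Repls) {
    auto It = Users.find(R.first);
    if (It == Users.end())
      continue;
    std::vector<Op *> Moved = std::move(It->second);
    Users.erase(It);
    for (Op *O : Moved) {
      assert(O->SymRef == R.first);
      O->SymRef = R.second;
    }
    // Re-file the rewritten users under the new name so that a later
    // replacement of that name still reaches them, as the naive walk would.
    std::vector<Op *> &Dest = Users[R.second];
    Dest.insert(Dest.end(), Moved.begin(), Moved.end());
  }
}
```

Re-filing moved users under their new name is a design choice of this sketch so that chained replacements (a to b, then b to c) give the same result as the naive walk.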
Measured on Eigen's basicstuff.cpp (356 lines, heavy template
instantiation): compile time dropped from 20m29s to 1m2s (20x speedup).
CIR-to-classic ratio improved from 117x to 7.2x.
Made with [Cursor](https://cursor.com)
Co-authored-by: Cursor <cursoragent at cursor.com>
[CIR] Add pass_object_size hidden parameter support (#191482)
Emit the hidden `i64` parameter that
`__attribute__((pass_object_size(N)))` requires. At call sites the size
is constant-folded when possible (e.g. `&a` → 4) and falls back to
`cir.objsize` / `@llvm.objectsize` otherwise (e.g. VLAs).
On the callee side, `buildFunctionArgList` now creates an
`ImplicitParamDecl` for each annotated parameter so that
`emitBuiltinObjectSize` can load the passed size instead of re-computing
it.
This also fixes the `llvm_unreachable("NYI")` in
`RequiredArgs::getFromProtoWithExtraSlots` and the `errorNYI` in
`appendParameterTypes` / `arrangeFreeFunctionLikeCall` that fired
whenever `hasExtParameterInfos()` was true.
New test: `clang/test/CIR/CodeGen/pass-object-size.c` (CIR / LLVM /
OGCG).
[5 lines not shown]
Merge tag 'cgroup-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- During v6.19, cgroup task unlink was moved from do_exit() to after the
final task switch to satisfy a controller invariant. That left the kernel
seeing tasks past exit_signals() longer than userspace expected, and
several v7.0 follow-ups tried to bridge the gap by making rmdir wait for
the kernel side. None held up.
The latest is an A-A deadlock when rmdir is invoked by the reaper of
zombies whose pidns teardown the rmdir itself is waiting on, which
points at the synchronizing approach being fundamentally wrong.
Take a different approach: drop the wait, leave rmdir's user-visible
side returning as soon as cgroup.procs is empty, and defer the css
percpu_ref kill that drives ->css_offline() until the cgroup is fully
depopulated.
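A toy C++ model of the new ordering, assuming two counters: tasks still listed in cgroup.procs and tasks the kernel has not yet unlinked. `Cgroup`, `rmdir`, `taskExitSignals`, `taskUnlinked` and `CssKilled` are hypothetical names; the real code uses css percpu_refs and ->css_offline(), which this sketch only stands in for. rmdir returns without waiting, and the css kill happens later, when the last linked task goes away.

```cpp
#include <cassert>

// Toy model of the deferred css kill; not the kernel's cgroup structures.
struct Cgroup {
  int VisibleTasks = 0;    // tasks still listed in cgroup.procs
  int LinkedTasks = 0;     // tasks not yet unlinked (includes exiting ones)
  bool Removed = false;    // rmdir has completed
  bool CssKilled = false;  // stands in for the percpu_ref kill driving ->css_offline()

  // rmdir's user-visible side: succeeds once cgroup.procs is empty, no waiting.
  bool rmdir() {
    if (VisibleTasks != 0)
      return false;
    Removed = true;
    maybeKillCss();
    return true;
  }

  // A task passes exit_signals(): it leaves cgroup.procs but stays linked
  // until after its final task switch.
  void taskExitSignals() {
    assert(VisibleTasks > 0);
    --VisibleTasks;
  }

  // The deferred unlink after the final task switch.
  void taskUnlinked() {
    assert(LinkedTasks > 0);
    --LinkedTasks;
    maybeKillCss();
  }

  // Kill the css only once the cgroup is removed and fully depopulated.
  void maybeKillCss() {
    if (Removed && LinkedTasks == 0)
      CssKilled = true;
  }
};
```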
[16 lines not shown]
driver: add Apple SMC driver ported from FreeBSD asmc
Port of the FreeBSD asmc(4) driver to DragonFlyBSD with the following
adaptations:
- kmalloc/kfree instead of malloc/free
- ksnprintf instead of snprintf
- lockmgr() 2-arg form (no whandle)
- taskqueue_start_threads() with ncpu=-1 arg
- sys/bus_resource.h instead of machine/resource.h
- contrib/dev/acpica paths for acpi.h/accommon.h
- u_long instead of rman_res_t
Driver layout: sys/dev/apple/smc/
smc.c - probe/attach/detach, module glue
smc_io.c - ISA port I/O backend
smc_mmio.c - MMIO/T2 backend
smc_sysctl.c - sysctl handlers for fans, temps, SMS, light sensors
smc.h - shared types, macros, prototypes