[NFC][MC] Initialize all fields of DebugName::Parameters in default constructor (#202701)
Initialize both fields, **Flags** and **NameLength**, of the
**DebugNameHeader** structure.
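The general pattern can be sketched with default member initializers. The struct names and field types below are hypothetical stand-ins, not LLVM's actual definitions:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; the real LLVM names, field types, and layout may
// differ. Default member initializers give every field a defined value even
// when the enclosing object is default-constructed.
struct DebugNameHeaderSketch {
  std::uint32_t Flags = 0;       // field type assumed
  std::uint32_t NameLength = 0;  // field type assumed
};

struct ParametersSketch {
  DebugNameHeaderSketch Header;  // initialized via the member initializers
  ParametersSketch() = default;
};

// Without the "= 0" initializers, reading these fields from a
// default-initialized ParametersSketch would be undefined behavior.
inline bool hasDefinedDefaults() {
  ParametersSketch P;
  return P.Header.Flags == 0 && P.Header.NameLength == 0;
}
```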
[X86] Record the enclosed register in X86DomainReassignment::buildClosure (#202534)
buildClosure recorded the seed register Reg in the function-wide
EnclosedEdges map on every worklist iteration instead of CurReg, the
register actually being added to the closure. EnclosedEdges therefore
only ever contained the seed of each closure.
The driver loop in runOnMachineFunction skips registers already present
in EnclosedEdges before starting a new closure. Because only seeds were
recorded, every non-seed member of an already-built closure looked like
a fresh seed, so a redundant closure was built for it and then
immediately discarded by the EnclosedInstrs cross-closure check. The
emitted code is unchanged; the pass just performed redundant work
proportional to closure size.
Key EnclosedEdges by CurReg so each enclosed register is recorded once.
This was found as part of @jlebar's X86 LLVM bug hunt / FuzzX effort:
[2 lines not shown]
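To illustrate the bug, here is a heavily simplified, hypothetical model of the worklist loop and its driver. Registers are plain ints, and `Graph`, `ClosureBuilder`, and `run` are invented for illustration; this is not the LLVM implementation. The function-wide map is keyed by `CurReg`, as in the fix:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical model: each register maps to the registers that must share
// its closure. Names echo the commit message; this is not LLVM's code.
using Graph = std::map<int, std::vector<int>>;

struct ClosureBuilder {
  const Graph &G;
  std::map<int, unsigned> EnclosedEdges;  // function-wide: register -> closure ID
  unsigned ClosuresBuilt = 0;

  explicit ClosureBuilder(const Graph &InG) : G(InG) {}

  void buildClosure(unsigned ID, int Reg) {
    std::set<int> Visited;  // per-closure visited set
    std::vector<int> Worklist{Reg};
    while (!Worklist.empty()) {
      int CurReg = Worklist.back();
      Worklist.pop_back();
      if (!Visited.insert(CurReg).second)
        continue;
      // Record CurReg (the register being added), not the seed Reg.
      EnclosedEdges[CurReg] = ID;
      auto It = G.find(CurReg);
      if (It != G.end())
        for (int Next : It->second)
          Worklist.push_back(Next);
    }
    ++ClosuresBuilt;
  }

  // Driver loop: only start a closure from a register no closure encloses yet.
  void run() {
    for (const auto &Entry : G)
      if (EnclosedEdges.count(Entry.first) == 0)
        buildClosure(ClosuresBuilt, Entry.first);
  }
};
```

In this model, writing `EnclosedEdges[Reg] = ID` instead would record only seeds. For a chain 1-2-3 plus an isolated register 4, `run()` would then build four closures instead of two.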
[clang][bytecode] Add an on-by-default `CanFail` flag to opcodes (#203671)
Several opcodes can never fail, so add a flag that lets them declare
that they always return `true`.
This simplifies the generated code from e.g.
```c++
PRESERVE_NONE
static bool Interp_Activate(InterpState &S, CodePtr &PC) {
  if (!Activate(S, PC))
    return false;
#if USE_TAILCALLS
  MUSTTAIL return InterpNext(S, PC);
#else
  return true;
#endif
}
```
[12 lines not shown]
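The following is a hypothetical sketch of how an opcode emitter could consult an on-by-default `CanFail` flag. The `emitHandler` function is invented for illustration and is not clang's actual generator. The exact shape of the simplified output is also an assumption: here the call is kept and only the result check is dropped.

```cpp
#include <cassert>
#include <string>

// Hypothetical emitter, not clang's actual opcode generator. It prints the
// handler wrapper for one opcode; CanFail defaults to true, so opcodes that
// do not opt out keep the original result check.
std::string emitHandler(const std::string &Name, bool CanFail = true) {
  std::string Out = "PRESERVE_NONE\n";
  Out += "static bool Interp_" + Name + "(InterpState &S, CodePtr &PC) {\n";
  if (CanFail) {
    // The opcode may report failure, so propagate it.
    Out += "  if (!" + Name + "(S, PC))\n";
    Out += "    return false;\n";
  } else {
    // The opcode always returns true, so the check would be dead code.
    Out += "  " + Name + "(S, PC);\n";
  }
  Out += "#if USE_TAILCALLS\n"
         "  MUSTTAIL return InterpNext(S, PC);\n"
         "#else\n"
         "  return true;\n"
         "#endif\n"
         "}\n";
  return Out;
}
```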
[CIR][AMDGPU] Adds lowering for amdgcn extended image sample/gather4 builtins (#201761)
Add support for lowering the AMDGPU `__builtin_amdgcn_image_sample/gather4`
builtins to ClangIR.
This follows the existing Clang-to-LLVM-IR lowering in
`clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp`.
Upstreaming clangIR PR:
[llvm/clangir#2083](https://github.com/llvm/clangir/pull/2083)
clang/AMDGPU: Split out target ID flags in TranslateArgs. (#203750)
Change how xnack and sramecc are processed. Introduce
-mxnack/-mno-xnack and -msramecc/-mno-sramecc flags.
When the target is first parsed in TranslateArgs, synthesize
the appropriate flag for the toolchain. This avoids
special-case feature-string fixups in getAMDGPUTargetFeatures,
and also avoids an extra parse of the target ID.
In the future this will also simplify tracking these ABI
modifiers in a module flag.
As a side effect, these flags can also be used to override the case
where the target ID has no xnack or sramecc specifier. They do not
fully replace the target ID syntax, as there is no way to represent
compiling both modes for the same subtarget.
I didn't bother trying to forward these flags on the main command
line without being specified to the offload device, but I suppose
[2 lines not shown]
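The parsing code itself is not shown here. As a rough, hypothetical sketch, this is how the `:feature+`/`:feature-` suffixes of a target ID such as `gfx90a:sramecc+:xnack-` could be mapped to the new flags. `TargetIDParts` and `splitTargetID` are invented names, and the error handling is deliberately minimal:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch, not clang's TranslateArgs code: split an AMDGPU
// target ID into the processor name and the equivalent
// -m[no-]sramecc / -m[no-]xnack flags.
struct TargetIDParts {
  std::string Processor;
  std::vector<std::string> Flags;
};

std::optional<TargetIDParts> splitTargetID(const std::string &ID) {
  TargetIDParts Result;
  std::string::size_type Pos = ID.find(':');
  Result.Processor = ID.substr(0, Pos);
  if (Result.Processor.empty())
    return std::nullopt;
  while (Pos != std::string::npos) {
    std::string::size_type Next = ID.find(':', Pos + 1);
    // If Next is npos, the count is huge and substr clamps it to the end.
    std::string Feature = ID.substr(Pos + 1, Next - Pos - 1);
    if (Feature.size() < 2)
      return std::nullopt;
    char Sign = Feature.back();
    std::string Name = Feature.substr(0, Feature.size() - 1);
    if ((Name != "xnack" && Name != "sramecc") || (Sign != '+' && Sign != '-'))
      return std::nullopt;
    // '+' enables the mode and '-' disables it; a feature that is absent
    // stays unspecified, so no flag is synthesized for it.
    Result.Flags.push_back(Sign == '+' ? "-m" + Name : "-mno-" + Name);
    Pos = Next;
  }
  return Result;
}
```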
[libc++] Make the body of println(FILE*) dependent on the template parameter to avoid template instantiation (#200996)
Make the function parameter of the `std::print` call inside the
`std::println` overload taking `FILE*` dependent on the template
parameter to avoid eager instantiation.
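The technique relies on two-phase name lookup. A call whose argument has a dependent type is only resolved when the enclosing template is instantiated. A non-dependent call, by contrast, is resolved when the template definition is parsed, and that can pull in the callee early. Below is a minimal sketch of the pattern. `dependent_type`, `expensive_print`, and `my_println` are hypothetical stand-ins for libc++'s internals and for `std::print`, not its actual spelling:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: ::type is always T, but because dependent_type could
// be specialized for some Dummy, the compiler must treat it as dependent.
template <class T, class Dummy>
struct dependent_type {
  using type = T;
};

// Hypothetical stand-in for an expensive-to-instantiate function template.
template <class Stream>
int expensive_print(Stream stream, const char *text) {
  return std::fputs(text, stream);
}

// The argument's type depends on Dummy, so this is a dependent call:
// overload resolution for expensive_print is deferred until my_println<Dummy>
// itself is instantiated.
template <class Dummy = void>
int my_println(std::FILE *stream) {
  using Stream = typename dependent_type<std::FILE *, Dummy>::type;
  return expensive_print(static_cast<Stream>(stream), "\n");
}
```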
[mlir][python] Fix segfault at interpreter shutdown with entered contexts
The thread-local context stack (`PyThreadContextEntry::getStack()`)
holds `nb::object` references to Python Context, Location, and
InsertionPoint objects. When a Context is entered via `__enter__` but
never exited before the interpreter shuts down, these references
cause a segfault during process teardown.
The crash sequence:
1. User calls `ctx.__enter__()`, pushing a frame onto the
`static thread_local vector<PyThreadContextEntry>`.
2. The script ends; CPython runs `Py_FinalizeEx()` which tears down
the interpreter (clears modules, destroys remaining objects).
3. `main()` returns.
4. The C runtime destroys static/thread_local storage. On the main
thread, thread_local variables have the same destruction timing
as static storage — they are destroyed *after* main() returns.
5. The vector destructor runs, and each `PyThreadContextEntry`'s
`nb::object` members call `Py_DECREF` — but the interpreter is
[8 lines not shown]
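The actual fix is in the lines not shown above. The sketch below is a self-contained model of the teardown-order hazard only, and every name in it is hypothetical. The liveness guard it uses is one possible mitigation, not necessarily what this commit does. In real bindings such a check might call `Py_IsInitialized()`; here a plain flag stands in for the interpreter:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the Python interpreter.
struct FakeRuntime {
  bool Alive = true;  // cleared to simulate Py_FinalizeEx()
  int RefCount = 0;
};

inline FakeRuntime &runtime() {
  // Statics are destroyed after the main thread's thread_locals.
  static FakeRuntime R;
  return R;
}

// Owns one reference, like an nb::object, and releases it on destruction.
class Handle {
public:
  Handle() { ++runtime().RefCount; }
  Handle(const Handle &) = delete;
  Handle &operator=(const Handle &) = delete;
  ~Handle() {
    // Releasing into a finalized runtime is the crash described above; this
    // guard skips the release (leaking the reference) once it is gone.
    if (runtime().Alive)
      --runtime().RefCount;
  }
};

// Per-thread stack; on the main thread it is destroyed after main() returns.
inline std::vector<std::unique_ptr<Handle>> &contextStack() {
  static thread_local std::vector<std::unique_ptr<Handle>> Stack;
  return Stack;
}
```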
[clang][bytecode] Override constant context state in CallVar (#203747)
We already do this for regular calls, so do it for variable calls as
well. Also remove two comments that are no longer meaningful.
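The interpreter code is not shown here. The general save/override/restore pattern can be sketched with an RAII guard. Every name below is hypothetical, and the state that clang's bytecode interpreter actually overrides may look different:

```cpp
#include <cassert>

// Hypothetical interpreter state; the field name is illustrative only.
struct InterpStateSketch {
  bool InConstantContext = false;
};

// RAII guard: overrides the flag for the duration of a call and restores
// the previous value on every exit path, including early returns.
class ConstantContextOverride {
public:
  ConstantContextOverride(InterpStateSketch &State, bool Value)
      : S(State), Saved(State.InConstantContext) {
    S.InConstantContext = Value;
  }
  ~ConstantContextOverride() { S.InConstantContext = Saved; }
  ConstantContextOverride(const ConstantContextOverride &) = delete;
  ConstantContextOverride &operator=(const ConstantContextOverride &) = delete;

private:
  InterpStateSketch &S;
  bool Saved;
};

// Hypothetical call paths: applying the same guard in both keeps the
// regular-call and CallVar-style paths consistent.
bool callRegular(InterpStateSketch &S, bool Override) {
  ConstantContextOverride Guard(S, Override);
  return S.InConstantContext;  // value seen while the callee would run
}

bool callVar(InterpStateSketch &S, bool Override) {
  ConstantContextOverride Guard(S, Override);
  return S.InConstantContext;
}
```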