[fir] Lower to llvm int constants with appropriately typed int attrs (#195861)
When lowering FIR operations to LLVM integer constants, we used to always
generate `llvm.mlir.constant`s with an i64 integer attribute, regardless
of the width of the constant type. This caused some LLVM-dialect-level
folding to hit assertions.
Fix this by generating the appropriately typed integer attributes
matching the constant type.
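The mismatch described above can be illustrated with a small standalone model. This is only a sketch: `IntAttr`, `ConstantOp`, and the helper functions are hypothetical stand-ins, not the MLIR or Flang API, and the real folders check far more than a bit width.

```cpp
#include <cstdint>

// Hypothetical stand-in for an integer attribute that carries its own width.
struct IntAttr {
  unsigned width;
  std::int64_t value;
};

// Hypothetical stand-in for a constant op: a result width plus an attribute.
struct ConstantOp {
  unsigned resultWidth;
  IntAttr attr;
};

// Old behaviour as described in the message: the attribute is always 64 bits.
ConstantOp makeConstantAlwaysI64(unsigned resultWidth, std::int64_t v) {
  return ConstantOp{resultWidth, IntAttr{64, v}};
}

// Fixed behaviour: the attribute width matches the constant's result type.
ConstantOp makeConstantTyped(unsigned resultWidth, std::int64_t v) {
  return ConstantOp{resultWidth, IntAttr{resultWidth, v}};
}

// The kind of invariant a folder may assert on before combining constants.
bool isWellTyped(const ConstantOp &op) {
  return op.attr.width == op.resultWidth;
}
```

In this model an i32 constant built the old way fails `isWellTyped`, while the typed variant passes for every width.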
[ELF,test] Cover --why-live mark() paths in MarkLive (#196007)
Add cases that exercise the non-parallel mark() loop reached only when
TrackWhyLive is true: cNamedSections.lookup in resolveReloc (__libc_atexit
via __start_/__stop_), the nextInSectionGroup fallthrough, and the
.eh_frame personality CIE relocation processed by scanEhFrameSection.
MarkLive.cpp coverage on check-lld-elf goes 90.88% -> 92.18% regions,
84.15% -> 86.04% branches.
[MLIR][XeGPU] Clean up the temporary layout usage in XeGPU test (#195739)
This PR cleans up the XeGPU test to remove the temporary layout usage.
Distribution and unrolling tests no longer take a temporary layout from
the operation or the TensorDescriptor, since the recovery process does not
honor the temporary layout and depends only on the anchor layout.
It also refactors the layout function implementation by removing the
recursive loops in getDistributeLayoutAttr(), and fixes two issues
surfaced by the test cleanup: it adds layout recovery support for the
Extract/Insert op and for the tensor descriptor type.
[LoopFusion] Document LoopFusion Pass (#192926)
The LoopFusion pass, currently disabled by default, lacks documentation. This patch is the first attempt to document the flow and current limitations.
Assisted by : Claude Opus 4.6
[LiveDebugValues] Avoid SmallSet for dead registers (#195841)
transferRegisterDef builds a list of dead registers and removes open ranges for
debug locations that use those registers. This list used a SmallSet, so each
insert also does uniquing in the hot per-instruction path. This showed up under
SmallSet<Register, 32>::insertImpl on profiles of sqlite on aarch64-O0-g.
Using a SmallVector instead and uniquing in collectIDsForRegs improves
compile-time.
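The trade-off can be sketched in standalone C++. This is a minimal illustration under assumptions: `RegId`, `noteDeadReg`, and `uniqueRegs` are hypothetical names, `std::vector` stands in for SmallVector, and the sort-then-unique step is just one way to deduplicate, not necessarily what collectIDsForRegs does.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a register identifier.
using RegId = std::uint32_t;

// Hot per-instruction path: append only, duplicates are allowed here.
void noteDeadReg(std::vector<RegId> &deadRegs, RegId reg) {
  deadRegs.push_back(reg);
}

// Consumer side: deduplicate once, after the whole list has been built.
std::vector<RegId> uniqueRegs(std::vector<RegId> regs) {
  std::sort(regs.begin(), regs.end());
  regs.erase(std::unique(regs.begin(), regs.end()), regs.end());
  return regs;
}
```

The point of the design is that the uniquing cost is paid once per list rather than on every insert.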
CTMark geomean:
- stage1-O0-g: -0.35%
- stage1-aarch64-O0-g: -0.72%
- stage2-O0-g: -0.27%
https://llvm-compile-time-tracker.com/compare.php?from=c9d713aa48a714d20b8502d06b9feb24829e6f22&to=6c0d4aafb9e325259c88577d148ac13c643ea993&stat=instructions%3Au
Assisted-by: codex
[RegAlloc] consider urgent evict in evictInterference (#192631)
The following assertion fires, crashing the compiler, on programs with
high register pressure that use inline assembly:
```
assert((ExtraInfo->getCascade(Intf->reg()) < Cascade ||
        VirtReg.isSpillable() < Intf->isSpillable()) &&
       "Cannot decrease cascade number, illegal eviction");
```
It should account for the case where an urgent eviction may result in the
cascade being less than `ExtraInfo->getCascade(Intf->reg())`.
---------
Co-authored-by: Matt Arsenault <arsenm2 at gmail.com>
[CIR][NFC] Upstream mem2reg.cir from incubator (#194517)
Upstream `mem2reg.cir` from incubator.
Check that stack slots are promoted away after CFG flattening.
Partially addresses #156747.
[RISCV] Rename and invert UseGPRForF16_F32/UseGPRForF64. (#195971)
Rename to AllowFPR. We used to set these flags when we ran out of FPRs,
but we haven't for a while. I think rephrasing as allow FPR is a bit
clearer.
[RFC][Docs] Clarify brace omission for single-line bodies
Update the Coding Standards brace guidance to emphasize that braces should be
omitted only for simple bodies that do not wrap across multiple physical lines.
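A short illustration of the distinction, as a sketch only: the function names are hypothetical and the example reflects the wording of this message, not text quoted from the Coding Standards.

```cpp
// Simple body that fits on one physical line: braces are omitted.
int clampToZero(int x) {
  if (x < 0)
    return 0;
  return x;
}

// The body is still a single statement, but it wraps across two physical
// lines, so under the clarified guidance it keeps its braces.
long long scaledOrFallback(int value, int scale, int fallback) {
  if (scale != 0) {
    return static_cast<long long>(value) * static_cast<long long>(scale) +
           static_cast<long long>(fallback);
  }
  return fallback;
}
```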
[SPIRV] Dummy implementation of the `returnaddress` and `frameaddress` intrinsics (#195976)
The SPIR-V specification does not define any operations for obtaining the
return or frame address, so a valid implementation is to produce a null
pointer.
Assisted-by: Claude Opus 4.6 <noreply at anthropic.com>
[SPIRV] Add support for SPV_KHR_abort extension (#193037)
This commit adds support for the SPV_KHR_abort extension in the SPIR-V
backend. The extension allows shaders to abort execution with a custom
message.
Assisted-by: Claude Opus 4.7 <noreply at anthropic.com>
---------
Co-authored-by: Marcos Maronas <mmaronas at amd.com>
[clang][ssaf] Add `clang-ssaf-analyzer` (#188881)
This patch introduces `clang-ssaf-analyzer`, a new SSAF tool that runs whole-program analyses over an `LUSummary` and writes the resulting `WPASuite` to an output file.
[PseudoProbe] Include function hash in descriptor COMDAT key (#190296)
The .pseudo_probe_desc section uses COMDAT to deduplicate descriptors
for the same function across translation units. On COFF, the COMDAT key
is uniquely determined by the function name. The COMDAT selection type
is EXACT_MATCH, which requires byte-identical content. This holds for
applications that strictly follow C/C++ ODR rules.
Unfortunately, we consistently observe .pseudo_probe_desc COMDAT
duplicate symbol errors on Windows (see also #177540). Most of them are
due to hash mismatches, meaning two non-internal functions with the same
name but different bodies — a violation of ODR rules. Some of these
functions are generated by the compiler (e.g., TU-local optimizations
that alter the CFG of a linkonce_odr function), and some are caused by
source code issues (e.g., different preprocessor settings or
optimization pragmas across TUs).
It is hard to fix all of them, but they seriously affect the user
experience of using pseudo probe on Windows due to the endless COFF
[12 lines not shown]
[MLIR][XeGPU] Unroll Dpasmx Op (#195179)
This PR adds support to unroll Dpasmx.
Assisted by Claude
---------
Co-authored-by: Claude Sonnet 4.5 <noreply at anthropic.com>