[LV][RISCV] Add explicit LMUL controls via computeFeasibleMaxVF
Add the components of maxVF and their support for scalable
vectorization. With this change, the default for RISC-V when LMUL is
unspecified is LMUL=4, so some tests now pass the flag that controls
the maximum LMUL in order to extend to LMUL=8 where that is requested.
[flang] Fix inline transfer for unsigned integer types (#193570)
Fix a crash when transfer is used with Fortran unsigned types. The
arith.bitcast op requires signless integer or float operands, but the
inline optimization was applying it to unsigned integer types (ui32),
causing a verification failure. Changed the guard from
mlir::isa<mlir::IntegerType> to isSignlessIntOrFloat() so unsigned
integer transfers fall through to the address-level fir.convert path
instead.
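As a rough illustration of why the guard change matters, the sketch below models the two predicates on a toy type. `ToyType`, `oldGuard` and `newGuard` are hypothetical stand-ins invented for this example, not MLIR or flang APIs; only the distinction between "any integer" and "signless integer or float" is taken from the description above.

```cpp
// Toy model of a type with a kind and a signedness; hypothetical, not mlir::Type.
enum class ToyKind { Integer, Float, Other };
enum class ToySign { Signless, Signed, Unsigned };

struct ToyType {
  ToyKind kind;
  ToySign sign; // only meaningful when kind == ToyKind::Integer
};

// Models the old guard: every integer type qualified for the bitcast path,
// including unsigned ones such as ui32.
bool oldGuard(const ToyType &t) { return t.kind == ToyKind::Integer; }

// Models the new guard: only signless integers and floats qualify, so an
// unsigned integer is rejected and takes the fallback path instead.
bool newGuard(const ToyType &t) {
  if (t.kind == ToyKind::Float)
    return true;
  return t.kind == ToyKind::Integer && t.sign == ToySign::Signless;
}
```

In this model a `ui32`-like value passes `oldGuard` but fails `newGuard`, which corresponds to the transfer falling through to the `fir.convert` path.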
This is to fix a regression reported here:
https://github.com/llvm/llvm-project/pull/191589#issuecomment-4298846795
[HLSL] Disallow `volatile` keyword (#193322)
This PR disallows the `volatile` keyword in HLSL.
The keyword has no meaning in HLSL; it is inherited from the C++
foundation that HLSL is built on.
Fixes https://github.com/llvm/llvm-project/issues/192559
It arguably falls into the category described in:
https://github.com/llvm/wg-hlsl/issues/300
Assisted by: GitHub Copilot
[CIR] Fix a dangling reference to a replaced global (#193561)
We had a bug in CIR where we were replacing a global value that was
being used to track the insertion location of the last global created.
When we erased this value while still holding a reference to it, it
caused subsequent globals to be created in a detached state, which in
turn led to crashes when lowering uses of those globals to the LLVM
dialect.
This change updates `lastGlobalOp` when the global it is referencing is
replaced.
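The shape of this bug can be sketched with a toy container. Everything below (`ToyModule`, `createGlobal`, `replaceGlobal`) is hypothetical and only mirrors the idea of a cached "last created" position that must be retargeted when the element it refers to is replaced; it is not the CIR implementation.

```cpp
#include <iterator>
#include <list>
#include <string>

// Toy module that inserts each new global right after the last one created.
class ToyModule {
public:
  using Iter = std::list<std::string>::iterator;

  Iter createGlobal(const std::string &name) {
    // Insert after the cached position, or at the end if nothing is cached.
    Iter pos = (lastGlobal == globals.end()) ? globals.end()
                                             : std::next(lastGlobal);
    lastGlobal = globals.insert(pos, name);
    return lastGlobal;
  }

  Iter replaceGlobal(Iter oldIt, const std::string &newName) {
    Iter newIt = globals.insert(oldIt, newName);
    // The fix in miniature: if the cached position refers to the global
    // being replaced, retarget it before the old element is erased.
    // Without this, lastGlobal would dangle after the erase below.
    if (lastGlobal == oldIt)
      lastGlobal = newIt;
    globals.erase(oldIt);
    return newIt;
  }

  const std::list<std::string> &all() const { return globals; }

private:
  std::list<std::string> globals;
  Iter lastGlobal = globals.end();
};
```

Creating `a` and `b`, replacing `b` with `b2`, and then creating `c` leaves the list as `a, b2, c` in this model, because the cached position follows the replacement.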
Assisted-by: Cursor / claude-4.7-opus-high
[BOLT] Fix null pointer dereference in DWP processing with split DWARF (#191474)
Fix two null pointer dereferences in BOLT's DWP processing path that
cause SIGSEGV in worker threads when -update-debug-sections is used with
a co-located .dwp file.
1. getSliceData() in updateDebugData() dereferences the result of
getContribution() without checking for null. getContribution() returns
nullptr when the requested section kind (e.g. DW_SECT_LINE) is not
present as a column in the DWP CU index. When BOLT processes a DWP where
certain section kinds are absent from the index, every worker thread
that hits this path crashes simultaneously.
2. processSplitCU() dereferences getUnitDIEbyUnit() without checking for
null. If buildDWOUnit() fails for a CU, the returned DIE* is null and
the dereference crashes.
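The first case can be sketched as a lookup that may return null and a caller that checks before dereferencing. The types and the `getSliceData` signature below are hypothetical simplifications written for this example, not BOLT's or LLVM's real interfaces, and how the actual patch reacts to a missing contribution is not described here; this sketch simply reports failure.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Toy contribution: an offset/length pair into a section's bytes.
struct ToyContribution {
  std::size_t offset;
  std::size_t length;
};

// Toy index entry: one optional contribution per section kind.
struct ToyIndexEntry {
  std::map<std::string, ToyContribution> columns;

  // Returns nullptr when the requested kind has no column in the index.
  const ToyContribution *getContribution(const std::string &kind) const {
    auto it = columns.find(kind);
    return it == columns.end() ? nullptr : &it->second;
  }
};

// Copies the slice for `kind` into `out`. Returns false, leaving `out`
// untouched, when the kind is absent or the offset is out of range.
bool getSliceData(const ToyIndexEntry &entry, const std::string &kind,
                  const std::string &section, std::string &out) {
  const ToyContribution *c = entry.getContribution(kind);
  if (c == nullptr)
    return false; // the check that was missing in the crashing path
  if (c->offset > section.size())
    return false;
  out = section.substr(c->offset, c->length);
  return true;
}
```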
Crash signature from dmesg:
```
[11 lines not shown]
```
[lldb] Eliminate linear scan in SetSectionLoadAddress (#193560)
This PR changes SectionLoadList::SetSectionLoadAddress to avoid an O(n)
linear scan when removing stale reverse-map entries. While I was there,
I did some gardening to improve the function's readability.
The change is not NFC as I also fixed a pre-existing bug where the stale
addr-to-sect entry was not removed when the new load address already
existed in the map (the ats_pos != end() branch).
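A minimal sketch of the idea, assuming a forward map (section to address) and a reverse map (address to section): because the forward map already records the section's old address, the stale reverse entry can be found by key instead of by scanning. `ToySectionLoadList` and its members are hypothetical names for illustration and do not reproduce lldb's `SectionLoadList`.

```cpp
#include <cstdint>
#include <map>
#include <string>

class ToySectionLoadList {
public:
  // Records that `section` is loaded at `addr`.
  // Returns true if the mapping changed.
  bool setLoadAddress(const std::string &section, std::uint64_t addr) {
    auto fwd = sectToAddr.find(section);
    if (fwd != sectToAddr.end()) {
      if (fwd->second == addr)
        return false; // already loaded at this address
      // Remove the stale reverse entry via a keyed lookup on the old
      // address rather than scanning the reverse map for this section.
      auto stale = addrToSect.find(fwd->second);
      if (stale != addrToSect.end() && stale->second == section)
        addrToSect.erase(stale);
      fwd->second = addr;
    } else {
      sectToAddr.emplace(section, addr);
    }
    // The removal above happens whether or not `addr` is already a key in
    // the reverse map. This toy leaves any previous owner's forward entry
    // alone and just overwrites the reverse entry.
    addrToSect[addr] = section;
    return true;
  }

  // Returns the section recorded at `addr`, or an empty string if none.
  std::string sectionAt(std::uint64_t addr) const {
    auto it = addrToSect.find(addr);
    return it == addrToSect.end() ? std::string() : it->second;
  }

private:
  std::map<std::string, std::uint64_t> sectToAddr;
  std::map<std::uint64_t, std::string> addrToSect;
};
```

Moving a section onto an address that is already present in the reverse map exercises the branch mentioned above: the old address no longer resolves to the moved section afterwards.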
[llvm-objdump][offload] Fix offload bundle decompressing (#192729)
The flag that enables decompression of offload bundles was not passed
along to the OffloadBundleFatBin class, so the --offloading option did
not decompress compressed objects.
[HLSL] Reuse temporaries of aggregate types in list initialization (#191605)
When aggregate types appear as _prvalues_ in HLSL initializer lists, convert them to _xvalues_ and wrap them in `OpaqueValueExpr` so the temporaries can be reused across all element accesses. This allows code generation to avoid emitting redundant copies of the same aggregate temporary.
This should be especially helpful once support for constructors on user-defined structs is removed, and initializer lists will be the primary mechanism for struct initialization.
A similar optimization may also be applicable to vector and matrix types. However, their current code generation path does not yet support handling `OpaqueValueExpr` in initializer lists.
[PreISelIntrinsicLowering] Expand all unary elementwise intrinsics (#193552)
This extends the set of scalable-typed unary intrinsics that can be
expanded so that it matches the entire set of builtin elementwise
routines provided by clang. Support for the binary ones will follow in a
separate patch.
Note that the lowering quality is terrible, particularly when the libc
entry for the scalar routine doesn't preserve vector registers (e.g.
RISC-V default). This is a functional fix to avoid crashes when trying
to codegen these, nothing more.
Written by Claude, sanity checked by me.
[Clang][AST] Introduce `ExplicitInstantiationDecl` to preserve source info and fix diagnostic locations (#191658)
This is the initial fix for
https://github.com/llvm/llvm-project/issues/191442, following the
discussion at
https://github.com/llvm/llvm-project/issues/115418#issuecomment-2467017012.
- Fix #21040
- Fix #52659
- Fix #115418
- Fix #14230
- Fix #21133
### Description
This PR introduces a new AST node, `ExplicitInstantiationDecl`, to
systematically fix the long-standing issue of missing or incorrect
source location information for explicit template instantiations.
[53 lines not shown]
[flang][OpenMP] Support user-defined declare reduction with derived types (#190288)
Fix lowering of `!$omp declare reduction` for intrinsic operators applied
to user-defined derived types (e.g., `+` on `type(t)`). Previously, this
hit a TODO in `ReductionProcessor::getReductionInitValue` because the code
tried to compute an init value for a non-predefined type, when it should
instead use the initializer region from the `DeclareReductionOp`.
This fixes issue #176278: [Flang][OpenMP] Compilation error when
type-list in declare reduction directive is derived type name.
The root cause was a naming mismatch: `genOMP` for
`OpenMPDeclareReductionConstruct` used a raw operator string (e.g., "Add")
as the reduction name, while `processReductionArguments` at the use site
computed a canonical name via `getReductionName` (e.g.,
"add_reduction_byref_rec__QFTt"). The `lookupSymbol` in
[76 lines not shown]
AMDGPU: Skip last corrections in afn f64 reciprocal
Device libs has a fast reciprocal macro that is close
to the fast division expansion, but skips the last terms
compared to the full division.
The basic reciprocal handling produces output identical to this
macro. The negative reciprocal case has different fneg placement
and smaller code size, but I believe the result should be the same.
[CIR] Fix lowering of strings in constant array attributes (#193553)
There was code in the CIR CXXABILowering pass that was assuming
ConstArrayAttr::getElts() would return an ArrayAttr. This isn't true in
the case of string constants with trailing zeros, so we had a crash in a
mlir::cast<> call. The problem only appeared when a string array
appeared in the same initializer as a type that required CXXABI-specific
lowering, such as a member pointer.
This change fixes the CXXABILowering to simply keep the existing string
attribute, which is known to be legal for the purposes of that pass.
Assisted-by: Cursor / claude-4.7-opus-high
[LangRef] inline asm: the instructions are treated opaquely (#157080)
This wasn't true until recently, but
https://github.com/llvm/llvm-project/issues/156571 got fixed to make it
true.
I was not entirely sure where to put this; for now I made it a new
paragraph fairly early on in the inline asm docs.
IR: Allow !fpmath metadata on homogeneous float structs (#193537)
This matches the logic for fast math flags / nofpclass, and allows
marking llvm.sincos calls with !fpmath.
[SLP]Fix scheduling of copyable bundle with commutative op used outside parent PHI
The previous (V, Op) pair insert was a no-op since V is unique per iteration.
Replace it with a hasOneUse() fast path plus a check that bails only when I
has a user outside the grandparent PHI's Scalars. Uses within the same
vectorized PHI are tracked by the existing dep machinery; an external user
(e.g. a scalar PHI in a different block) is what trips scheduleBlock's
"must be scheduled at this point" assertion.
Fixes #193315.
Pull Request: https://github.com/llvm/llvm-project/pull/193566
[CIR] Support guard COMDAT for weak linkage in LoweringPrepare (#193274)
Static locals inside inline functions get `linkonce_odr` linkage, and
their guard variables need their own COMDAT groups so the linker can
deduplicate them across TUs. We were hitting an NYI error for this case
in `LoweringPrepare`.
The fix is straightforward: set `guard.setComdat(true)`, which makes
`LowerToLLVM` create a per-symbol COMDAT selector — the same thing
classic codegen does at `ItaniumCXXABI.cpp:2798`.
I ran into this while trying to compile the Bullet physics engine
through CIR. Functions like `btMatrix3x3::getIdentity()` use this
pattern (return a reference to a function-local static from an inline
member function), and 6 of the 121 source files were failing because of
it. With this fix, all 121 compile cleanly.
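For reference, the source pattern in question looks roughly like the following. `ToyMatrix3` is a hypothetical reduction written for this example, not Bullet's `btMatrix3x3`. The constructor is deliberately not `constexpr`, so a compiler will typically initialize the static at first use and protect it with a guard variable; an optimizer may still fold this away.

```cpp
// Member functions defined inside the class body are implicitly inline, so
// the function-local static below can be emitted in every translation unit
// that calls getIdentity(); the linker is expected to keep a single copy of
// both the static and its guard variable.
struct ToyMatrix3 {
  float m[3][3];

  // Non-constexpr constructor building a diagonal matrix.
  explicit ToyMatrix3(float d)
      : m{{d, 0.0f, 0.0f}, {0.0f, d, 0.0f}, {0.0f, 0.0f, d}} {}

  // Returns a reference to a function-local static from an inline member
  // function: the pattern that needs a COMDAT for the guard.
  static const ToyMatrix3 &getIdentity() {
    static const ToyMatrix3 identity(1.0f);
    return identity;
  }
};
```

Every call returns a reference to the same object, which is why the static and its guard need to be deduplicated across translation units rather than duplicated per object file.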
Made with [Cursor](https://cursor.com)