[MLIR][Python] Fix AffineIfOp insertion point (#171957)
This bug was introduced by #108323, where the loc and ip were not
properly set. It may lead to errors when the operations are not linearly
inserted into the IR.
[libc++][format] Applied `[[nodiscard]]` to more classes (#170808)
`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.
- https://libcxx.llvm.org/CodingGuidelines.html
Some classes in `<format>` were already annotated. This patch annotates
the remaining ones.
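As an illustration of the guideline, here is a minimal sketch of `[[nodiscard]]` on query-style member functions. The `Cursor` class is hypothetical and is not part of `<format>`; it only shows where the attribute goes and which functions are left unannotated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical class for illustration only; not a libc++ type.
class Cursor {
public:
  explicit Cursor(std::string Text) : Text(std::move(Text)) {}

  // Pure queries: ignoring the result is almost certainly a mistake,
  // so the compiler is asked to warn when the value is discarded.
  [[nodiscard]] std::size_t remaining() const noexcept {
    return Text.size() - Pos;
  }
  [[nodiscard]] bool atEnd() const noexcept { return Pos == Text.size(); }

  // Called for its side effect, so it is not annotated.
  void advance() noexcept {
    assert(!atEnd() && "cannot advance past the end");
    ++Pos;
  }

private:
  std::string Text;
  std::size_t Pos = 0;
};
```

With this in place, a statement such as `C.atEnd();` on its own line is diagnosed by compilers that implement the attribute, while `C.advance();` is not.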
[RISCV] Add an OperandType to VMaskOp. NFC (#171926)
Use that instead of register class to detect the mask operand in
lowerRISCVVMachineInstrToMCInst.
There are other instructions like vmerge and vadc that have a VMV0
operand that isn't optional and do not reach this code. Having a
dedicated marker for the optional mask is more precise.
[RISCV] Use VMV0 instead of VMaskOp in masked vector pseudoinstructions. NFC (#171924)
VMaskOp primarily exists for parsing/printing in the MC layer where the
mask is optional. The vector pseudos are split into masked and unmasked
versions. The mask is always required for the masked version.
[IR] Optimize `PHINode::removeIncomingValue()` by swapping with the last incoming value.
Add an optional argument `KeepIncomingOrder` that defaults to true. When `KeepIncomingOrder` is false,
the new implementation simply moves the last incoming value and block into the position of the element being removed.
This improves compile time for PHI nodes with many predecessors.
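The swap-with-last idea can be sketched on a plain vector. This is not the LLVM implementation: `Incoming` and `removeIncoming` are hypothetical stand-ins, and the sketch assumes the swap only applies when the incoming order does not need to be kept.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one incoming (value, block) pair of a PHI node.
struct Incoming {
  int Value;
  int Block;
};

// Removes the entry at Idx.
// KeepOrder == true:  later entries shift down by one, O(n).
// KeepOrder == false: the last entry is moved into the hole, O(1).
inline void removeIncoming(std::vector<Incoming> &In, std::size_t Idx,
                           bool KeepOrder = true) {
  assert(Idx < In.size() && "index out of range");
  if (KeepOrder) {
    In.erase(In.begin() + static_cast<std::ptrdiff_t>(Idx));
    return;
  }
  In[Idx] = In.back(); // harmless self-assignment when Idx is the last entry
  In.pop_back();
}
```

The unordered path touches a constant number of entries regardless of how many predecessors there are, which is where the saving for large PHI nodes would come from.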
[AMDGPU] DS loop wait relaxation -- more test cases and improvements to handle them (4/4)
Add handling for same-iteration use/overwrite of DS load results:
- Track DS load destinations and detect when results are used or
overwritten within the same iteration
- Compute FloorWaitCount for WMMAs that only use flushed loads
Add bailout for tensor_load_to_lds and LDS DMA writes after barrier
Add negative test based on profitability criteria
Assisted-by: Cursor / claude-4.5-opus-high
[Clang] Remove the early-check for anonymous struct in ShouldDeleteSpecialMember (#171799)
That check doesn't seem very useful. For non-dependent context records,
ShouldDeleteSpecialMember is called when checking implicitly defined
member functions, before the anonymous flag which the check relies on is
set. (One could notice that in ParseCXXClassMemberDeclaration,
ParseDeclarationSpecifiers ends up calling
ShouldDeleteSpecialMember, while the flag is only set later in
ParsedFreeStandingDeclSpec.)
For dependent contexts, this check actually breaks correctness: since we
don't create those special members until the template is instantiated,
their deletion checks are skipped because of the anonymity.
There's only one regression, in an ObjC test about notes; we are more
explanatory now.
Fixes https://github.com/llvm/llvm-project/issues/167217
SROA: Recognize llvm.protected.field.ptr intrinsics.
When an alloca slice's users include llvm.protected.field.ptr intrinsics
and their discriminators are consistent, drop the intrinsics in order
to avoid unnecessary pointer signing and authentication operations.
Reviewers: nikic
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/151650
[HLSL] Add the DXC matrix orientation flags (#171550)
Fixes #58676
- Make /Zpr and /Zpc turn on the -fmatrix-memory-layout= row-major and
column-major flags
- Add the new DXC driver flags to Options.td
- Error in the HLSL toolchain when both flags are specified
- Add the new error diagnostic to DiagnosticDriverKinds.td
- Propagate the flag via the Clang toolchain
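The flag handling listed above can be sketched roughly as follows. This is a standalone illustration, not the Clang driver code: `resolveMatrixLayout`, `layoutName`, `LayoutResult`, and the error text are all hypothetical, and the real driver works with its own option and diagnostic types.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

enum class MatrixLayout { RowMajor, ColumnMajor };

// Hypothetical helper: spells the value passed to -fmatrix-memory-layout=.
inline std::string layoutName(MatrixLayout L) {
  switch (L) {
  case MatrixLayout::RowMajor:
    return "row-major";
  case MatrixLayout::ColumnMajor:
    return "column-major";
  }
  assert(false && "unknown matrix layout");
  return "";
}

struct LayoutResult {
  std::optional<MatrixLayout> Layout; // empty if neither flag was given
  std::string Error;                  // non-empty if the flags conflict
};

// Maps /Zpr and /Zpc to a layout and rejects the combination of both.
inline LayoutResult resolveMatrixLayout(const std::vector<std::string> &Args) {
  bool SawZpr = false, SawZpc = false;
  for (const std::string &Arg : Args) {
    if (Arg == "/Zpr")
      SawZpr = true;
    else if (Arg == "/Zpc")
      SawZpc = true;
  }
  if (SawZpr && SawZpc) // hypothetical wording, not the real diagnostic
    return {std::nullopt, "/Zpr and /Zpc cannot be used together"};
  if (SawZpr)
    return {MatrixLayout::RowMajor, ""};
  if (SawZpc)
    return {MatrixLayout::ColumnMajor, ""};
  return {std::nullopt, ""};
}
```

Returning the conflict as data rather than printing it keeps the sketch testable; the actual patch reports it through a driver diagnostic.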
[flang][docs] Remove stale inline links to Intel and IBM compiler option
Remove all inline links to Intel and IBM compiler options from the
comparison tables, as these links have become stale (Intel links
redirect to generic pages, IBM links redirect to PDF-only pages).
Option names are preserved for readability. The Data sources section
still contains links to the main documentation pages.
Details:
- Removed 43 Intel compiler option links
- Removed 35 IBM compiler option links
- Removed 2 stale links in notes section
- Updated documentation text accordingly
Fixes #171464
---------
Co-authored-by: Tarun Prabhu <tarun at lanl.gov>
AMDGPU/PromoteAlloca: Refactor into analysis / commit phases (#170512)
This change is motivated by the overall goal of finding alternative ways
to promote allocas to VGPRs. The current solution is effectively limited
to allocas whose size matches a register class, and we can't keep adding
more register classes. We have some downstream work in this direction,
and I'm currently looking at cleaning that up to bring it upstream.
This refactor paves the way to adding a third way of promoting allocas,
on top of the existing alloca-to-vector and alloca-to-LDS. Much of the
analysis can be shared between the different promotion techniques.
Additionally, the idea behind splitting the pass into an analysis
phase and a commit phase is that it ought to allow us to more easily
make better "big picture" decisions about which allocas to promote, and
how, in the future.