[clang] fix crash related to missing source locations for converted template arguments
This adds a way to attach source locations to trivially created template
arguments such as packs, or converted expressions when there is no
expression anymore.
This also avoids crashes due to missing source locations.
In a few places where this matters, we already create expressions
from the converted arguments, but this requires access to Sema,
whereas currently creating trivial typelocs only requires access to
the ASTContext.
So this creates a new storage kind for TemplateArgumentLocs, where
a single SourceLocation is stored, embedded in the pointer where
possible.
As a drive-by, strengthen asserts by enforcing that TemplateArgumentLocs
are created with the right kinds of locations.
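As a rough illustration of storing a single location inside a pointer-sized
word, here is a minimal sketch of a tagged word. It is not Clang's actual
layout: the tag width, the kind values, and all function names below are
assumptions made for this example only.

```python
# Hypothetical sketch of a tagged pointer-sized word. This is NOT the real
# TemplateArgumentLoc storage; names and bit widths are assumptions.
TAG_BITS = 2
TAG_MASK = (1 << TAG_BITS) - 1
KIND_POINTER = 0   # the word holds an aligned pointer
KIND_LOCATION = 1  # the word holds a raw source location


def pack_pointer(addr: int) -> int:
    """Tag an address; the low bits must be free (4-byte alignment)."""
    assert addr & TAG_MASK == 0, "pointer must be at least 4-byte aligned"
    return addr | KIND_POINTER


def pack_location(raw_loc: int) -> int:
    """Embed a 32-bit raw location above the tag bits of a 64-bit word."""
    assert 0 <= raw_loc < (1 << 32)
    return (raw_loc << TAG_BITS) | KIND_LOCATION


def kind_of(word: int) -> int:
    """Return the storage kind encoded in the low tag bits."""
    return word & TAG_MASK


def unpack_location(word: int) -> int:
    """Recover the raw location from a word tagged as KIND_LOCATION."""
    assert kind_of(word) == KIND_LOCATION
    return word >> TAG_BITS
```

The sketch assumes a 64-bit word, so a 32-bit location plus tag always fits;
on a narrower host the location would presumably need separate storage, which
is what "where possible" suggests.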
[2 lines not shown]
[MLIR][XeVM] Update HandleVectorExtractPattern (#186247)
isExtractContiguousSlice:
- Check if mask size is not greater than the vector size of the operand.
- Check if mask values do not exceed vector size.
HandleVectorExtractPattern:
- Narrow the scope of matching to:
  - A source shuffle doing a contiguous extract.
  - A source shuffle with at least the same mask size.
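The checks listed for isExtractContiguousSlice can be sketched roughly as
follows. This is a standalone model, not the MLIR implementation: the function
name, the list-of-ints mask representation, and the treatment of an empty mask
are assumptions.

```python
def is_extract_contiguous_slice(mask, vec_size):
    """Return True if `mask` selects a contiguous run of one operand vector.

    Hypothetical model: `mask` is a list of element indices and `vec_size`
    is the number of elements in the operand vector.
    """
    # The mask must be non-empty (assumption) and no larger than the vector.
    if not mask or len(mask) > vec_size:
        return False
    # Every mask value must index into the operand vector.
    if any(i < 0 or i >= vec_size for i in mask):
        return False
    # Indices must increase by exactly one from each element to the next.
    return all(b == a + 1 for a, b in zip(mask, mask[1:]))
```

For example, `is_extract_contiguous_slice([2, 3, 4, 5], 8)` accepts a
4-element slice starting at index 2, while `[0, 2]` is rejected as
non-contiguous.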
[CIR] Fix CFG flattening for loops with cleanup in special regions (#187369)
If a loop required a cleanup scope in the condition or step region of
the loop, we crashed during CFG flattening because the flattening of the
cleanup scope created multiple blocks in the region, but we were
assuming there would only be one block.
This change updates the CFG flattening code to look for the
cir.condition or cir.yield operation in the last block of the region.
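The idea of looking in the last block instead of assuming a single block can
be sketched as below. Regions are modeled as plain lists of blocks and blocks
as lists of operation names; this representation and the function name are
assumptions for illustration, not the CIR API.

```python
def find_region_terminator(region_blocks,
                           expected=("cir.condition", "cir.yield")):
    """Return the terminator op name found in the last block of a region.

    Hypothetical model: a region is a list of blocks, and each block is a
    non-empty list of operation names ending in its terminator.
    """
    if not region_blocks:
        raise ValueError("region has no blocks")
    # Flattening a cleanup scope may have split the region into several
    # blocks, so inspect the last block rather than assuming there is one.
    last_block = region_blocks[-1]
    terminator = last_block[-1]
    if terminator not in expected:
        raise ValueError(f"unexpected terminator: {terminator}")
    return terminator
```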
[MLIR][XeVM] Add truncf and mma_mx op. (#180055)
truncf op converts 16-bit floats to 8-bit or 4-bit floats.
mma_mx op performs a cooperative matrix multiply-accumulate on
8-bit or 4-bit float types with 8-bit scale values.
[VPlan] Fix masked_cond expansion.
masked_cond is used to combine early-exit conditions with masks from
predicate. The early-exit condition should only be evaluated if the mask
is true. Emit the mask first, to avoid incorrect poison propagation.
Fixes https://github.com/llvm/llvm-project/issues/187061.
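Why operand order matters can be shown with a small model of poison
semantics. The sketch assumes the combined condition behaves like a
select-style logical AND with the mask as the first operand; whether the patch
emits exactly that form is an assumption, and `POISON` and all function names
are made up for this example.

```python
POISON = object()  # stand-in for an LLVM poison value


def bitwise_and(a, b):
    """Model of `and`: poison in either operand poisons the result."""
    if a is POISON or b is POISON:
        return POISON
    return a and b


def select(cond, if_true, if_false):
    """Model of `select`: only the condition and the chosen arm matter."""
    if cond is POISON:
        return POISON
    return if_true if cond else if_false


def masked_cond(mask, cond):
    """Evaluate `cond` only when `mask` is true (mask comes first)."""
    return select(mask, cond, False)
```

With a false mask and a poison early-exit condition, `masked_cond` yields
`False`, whereas `bitwise_and` would propagate the poison.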
[MLIR][Python] Add optional emit reset to exportSMTLIB (#187366)
Previously, the MLIR Python binding `smt.export_smtlib(...)` always
emitted `(reset)` at the end of the SMT-LIB string as a solver terminator.
This PR adds an option to suppress the trailing `(reset)`, as downstream
users such as the Python z3 module don't need it.
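The behavior described above can be modeled with a small stand-in function.
This is not the real binding: the commit text does not state the option's
name, so `emit_reset` and the list-of-commands input are assumptions for this
sketch.

```python
def export_smtlib(commands, emit_reset=True):
    """Join SMT-LIB commands into one string, optionally ending in (reset).

    Hypothetical stand-in for the real exporter; `emit_reset` is an assumed
    parameter name, defaulting to the previous always-emit behavior.
    """
    lines = list(commands)
    if emit_reset:
        lines.append("(reset)")
    return "\n".join(lines) + "\n"
```

A caller that feeds the result to a solver API which manages its own state
would pass `emit_reset=False`.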
[RISCV] Fix IDiv/IRem scheduling data for RV32 cores that use the SiFive7 model (#187331)
The integer division and remainder instructions on a 32-bit core that
uses the SiFive7 scheduling model should have the same latency and
throughput as their word counterparts on a 64-bit SiFive7 core.
This patch fixes those scheduling entries by adding a new SchedPred that
predicates on `Feature64Bit` to toggle the SchedVariant that is attached
to the affected integer division / remainder instructions.
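The selection logic can be pictured with the following sketch. It only models
which timing class an instruction would pick; the class labels `"word"` and
`"dword"`, the opcode spelling, and the function name are assumptions, and no
real latency numbers are implied.

```python
def div_sched_class(opcode, has_feature_64bit):
    """Pick a timing class for an integer divide/remainder opcode.

    Hypothetical model: "word" stands for the timing used by the *W
    instructions on a 64-bit core, "dword" for the full 64-bit timing.
    """
    # Word-sized variants (e.g. DIVW, REMUW) always use the word timing.
    if opcode.endswith("W"):
        return "word"
    # Otherwise the predicate on the 64-bit feature selects the variant:
    # a 32-bit core divides 32-bit values, so it gets the word timing.
    return "dword" if has_feature_64bit else "word"
```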
[AMDGPU] Regenerate codegen tests to check extra stuff at end of line (#187325)
Regenerate checks after two recent commits that added extra text at the
end of assembly lines without causing the existing checks to fail.
- #179414 added "nv" to loads and stores on GFX1250.
- #185774 added "msbs" comments on setreg instructions.
[LSR] skip ephemeral IV users when collecting IV chains (#187282)
IVUsers records ephemeral values used only by `llvm.assume` as IV
operands in the Processed set. As a result, `CollectChains` picks them
up and builds unnecessary increment chains. Fix this by checking
`IVUsers::isEphemeral` before collecting the chains.
Fixes #187270
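In outline, the fix amounts to filtering before chain collection. The sketch
below uses plain Python values for IV users and a caller-supplied predicate in
place of `IVUsers::isEphemeral`; the function name and data model are
assumptions for illustration.

```python
def collect_chain_candidates(iv_users, is_ephemeral):
    """Return the IV users that should be considered for increment chains.

    Hypothetical model: `iv_users` is any iterable of user objects and
    `is_ephemeral` plays the role of the IVUsers::isEphemeral query.
    """
    # Values that only feed llvm.assume generate no code, so building an
    # increment chain for them is wasted work; skip them up front.
    return [user for user in iv_users if not is_ephemeral(user)]
```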
[CIR][NFC] Remove NYI checks in ternary with cleanup (#186870)
We added those checks to emit an error message in this edge case when
CleanupScopeOp is used, until it could be fixed. It is now fixed, so
we no longer need to keep the NYI checks.
[AMDGPU] Updated getMaxNumAGPRs to use getMaxNumVectorRegs.
Removed other variants of getMaxNumAGPRs. So with this patch, there
is only one way to get the maximum number of AGPRs. If the client
provides a target occupancy, that value will be used. Otherwise,
the function-level attributes for waves-per-eu are used. In both
cases, the utility uses getMaxNumVectorRegs.