[LV] NFCI: Move extend optimization to transformToPartialReduction. (#182860)
The reason for doing this in `transformToPartialReduction` is so that we
can create the VPExpressions directly when transforming reductions into
partial reductions (to be done in a follow-up PR).
I also intend to see whether we can merge the in-loop reductions with partial
reductions, so that there will be no need for the separate
`convertToAbstractRecipes` VPlan Transform pass.
[AMX][NFC] Match pseudo name with isa (#182235)
Add the missing suffix so that the pseudo's name matches the intended ISA
instruction.
https://github.com/llvm/llvm-project/pull/168193 switched from
`TILEMOVROWrre` to `TILEMOVROWrte`, but the pseudo kept its old name. This
patch renames `PTILEMOVROWrre` to `PTILEMOVROWrte` to reflect the correct
ISA version, even though the pseudo does not actually have any tile
register.
---------
Co-authored-by: mattarde <mattarde at intel.com>
[Clang][NFCI] Make program state GDM key const pointer (#183477)
This commit makes the GDM key in ProgramState a constant pointer. This
is done to better reflect the intention of the key as a unique
identifier for the data stored in the GDM, and to prevent the use of the
storage pointed to by the key as global state.
Signed-off-by: Steffen Holst Larsen <sholstla at amd.com>
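A key whose only job is to be a unique identity can be sketched in plain C++. This is a generic illustration under stated assumptions, not Clang's actual ProgramState/GDM API: `Key`, `uniqueKey`, `StateStore`, and the tag types are hypothetical. Each key is the address of a per-tag anchor object. Because the key type is a pointer-to-const, code holding a key cannot write through it and turn the anchor into hidden global state.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical key type: a pointer-to-const, so holders of a key cannot
// write through it and use the anchor's storage as global state.
using Key = const void *;

// Hypothetical helper: one anchor object per tag type. Only the anchor's
// address is used; its contents are never read or written.
template <typename Tag> Key uniqueKey() {
  static char Anchor;
  return &Anchor;
}

// Hypothetical tag types standing in for two kinds of per-state data.
struct TaintTag {};
struct LockTag {};

// Minimal stand-in for a generic data map keyed by identity.
class StateStore {
  std::map<Key, std::string> Data;

public:
  void set(Key K, std::string V) {
    assert(K != nullptr && "keys must be non-null anchors");
    Data[K] = std::move(V);
  }

  // Returns nullptr if nothing is stored under K.
  const std::string *get(Key K) const {
    auto It = Data.find(K);
    return It == Data.end() ? nullptr : &It->second;
  }
};
```

Each tag type gets its own anchor, so keys for different kinds of data never collide, and the key carries no data of its own.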
Lower strictfp vector rounding operations similar to default mode
Previously the strictfp rounding nodes were lowered by unrolling to
scalar operations, which hurts performance. #180480 partially fixed this
issue; this change continues that work and implements optimized lowering
for v4f16 and v8f16.
[VectorCombine][X86] Ensure we recognise free sign extends of vector comparison results (#183575)
Unless we're working with AVX512 mask predicate types, sign extending a
vXi1 comparison result back to the width of the comparison source types
is free.
VectorCombine::foldShuffleOfCastops - pass the original CastInst in the
getCastInstrCost calls to track the source comparison instruction.
Fixes #165813
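A minimal scalar model of why the extension is free. It assumes SSE2/AVX2-style compares such as `pcmpgtd`, which write all-ones or all-zeros into each lane at the full element width. That bit pattern is exactly `sext i1` to the element width, so no extra instruction is needed. AVX512 compares instead write one bit per lane into a mask register, which is why that case is excluded. The names `compareGreater`, `sextI1ToI32`, and `sextIsFree` are hypothetical; this is not VectorCombine code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4i32 = std::array<int32_t, 4>;

// Simulates an SSE2-style signed 32-bit lane compare (pcmpgtd semantics
// assumed): a lane is all-ones (-1) when A > B and all-zeros otherwise,
// at the full 32-bit width of the compared elements.
V4i32 compareGreater(const V4i32 &A, const V4i32 &B) {
  V4i32 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = A[I] > B[I] ? -1 : 0;
  return R;
}

// Scalar model of `sext i1 to i32`: true -> -1 (all ones), false -> 0.
int32_t sextI1ToI32(bool Bit) { return Bit ? -1 : 0; }

// Checks that sign-extending each i1 compare result back to 32 bits yields
// exactly the bits the vector compare already produced, i.e. the sext
// needs no extra instruction on such targets.
bool sextIsFree(const V4i32 &A, const V4i32 &B) {
  V4i32 Mask = compareGreater(A, B);
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I] != sextI1ToI32(A[I] > B[I]))
      return false;
  return true;
}
```

For example, comparing `{5, -1, 7, 0}` against `{3, 2, 7, -4}` gives the lanes `{-1, 0, 0, -1}`, which is the same as sign-extending the i1 results `{1, 0, 0, 1}`.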
AMDGPU: Skip last corrections in afn f64 reciprocal
The device libs have a fast reciprocal macro that is close to the fast
division expansion but skips the last terms computed by the full
division.
The basic reciprocal handling produces output identical to this macro.
The negative reciprocal case has different fneg placement and smaller
code size, but I believe it should produce the same result.
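For readers unfamiliar with these expansions, the common pattern is a hardware reciprocal seed refined by Newton-Raphson steps, where a full division adds a final correction term. The sketch below is a generic illustration, not the device-libs macro or the AMDGPU lowering. The float seed, the two refinement steps, and the hypothetical `FinalCorrection` switch are assumptions chosen for clarity.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch, not the device-libs macro: approximate 1/B by
// refining a single-precision seed with Newton-Raphson steps. If
// FinalCorrection is set, one more residual correction is applied, in the
// spirit of the extra terms a full division expansion computes.
double approxReciprocal(double B, bool FinalCorrection) {
  // The float seed below is only meaningful inside float's normal range.
  assert(std::isfinite(B) && std::fabs(B) > 1e-37 && std::fabs(B) < 1e37);

  // Stand-in for a hardware reciprocal approximation (about 24 bits).
  double R = static_cast<double>(1.0f / static_cast<float>(B));

  // Each Newton-Raphson step roughly doubles the number of correct bits:
  //   E = 1 - B*R   (a single rounding via fma)
  //   R = R + R*E
  for (int Step = 0; Step < 2; ++Step) {
    double E = std::fma(-B, R, 1.0);
    R = std::fma(R, E, R);
  }

  // Optional last correction; a fast (afn) path would skip this.
  if (FinalCorrection) {
    double E = std::fma(-B, R, 1.0);
    R = std::fma(R, E, R);
  }
  return R;
}
```

Skipping the final correction trades a small amount of accuracy for fewer instructions, which is the kind of trade-off afn permits.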
[VPlan] Don't drop NUW flag on tail folded canonical IVs (#183301)
After #183080 vscale can no longer be a non-power of 2, which means the
canonical IV can no longer overflow with tail folding and scalable
vectors. Therefore we don't need to drop the NUW flag.
Removing IVUpdateMayOverflow is left to a separate PR, since doing so
removes further runtime checks.
[Clang][AMDGPU] Change __fp16 to _Float16 in GFX1250 WMMA/SWMMAC builtin definitions (#183493)
Change the type signature of `gfx1250 WMMA/SWMMAC` builtins from
`__fp16` to `_Float16` in the tablegen builtin definitions.
[mlirbc] Switch generator to enable writes with failures. (#182464)
Previously one had to have a matching case per entry (e.g., one could
use a printer predicate, but the assumption was that one would never
fall back), and the generated writer always returned success.
[Clang][docs] Fix proposal number typo for P1847R4 (#183671)
This PR fixes a typo in `clang/www/cxx_status.html`.
The link text for the feature "Make declaration order layout mandated"
incorrectly referred to **P1874R4**, while the actual URL
(https://wg21.link/p1847r4) and the feature name correctly point to
**P1847R4**.
This change corrects the displayed text to match the proposal number.
Revert "[Sema] Fix crash on invalid operator template-id (#181404)" (#183682)
Reverts llvm/llvm-project#181404
(c056d7c5d6ea076b38fa937c54ab44ce2e5a95e1) because of a post-commit CI
failure.
[lldb] Don't add remap entries for empty segments (#183651)
There are some binaries in the shared cache with a zero-length segment,
or with segments that get mapped to lldb address 0 to indicate a failure.
Do not add entries for those to the VirtualDataExtractor's LookupTable;
they are not readable.
rdar://171106338
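A minimal sketch of the filtering rule. It uses hypothetical `Segment` and `RemapEntry` types and helper functions rather than lldb's actual VirtualDataExtractor API. Segments with zero size or mapped to address 0 are skipped when building the lookup table, so lookups into them simply fail instead of yielding unreadable ranges.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical segment description; not lldb's actual types.
struct Segment {
  uint64_t FileOffset;
  uint64_t Address;
  uint64_t Size;
};

// Hypothetical table entry mapping [Start, End) to a file offset.
struct RemapEntry {
  uint64_t Start, End, FileOffset;
};

std::vector<RemapEntry> buildLookupTable(const std::vector<Segment> &Segs) {
  std::vector<RemapEntry> Table;
  for (const Segment &S : Segs) {
    // Zero-length segments cover no bytes, and address 0 signals a failed
    // mapping; neither is readable, so neither gets an entry.
    if (S.Size == 0 || S.Address == 0)
      continue;
    assert(S.Address + S.Size > S.Address && "segment range wraps around");
    Table.push_back({S.Address, S.Address + S.Size, S.FileOffset});
  }
  return Table;
}

// Translates Addr to a file offset; returns false if Addr is unmapped.
bool lookup(const std::vector<RemapEntry> &Table, uint64_t Addr,
            uint64_t &Off) {
  for (const RemapEntry &E : Table) {
    if (Addr >= E.Start && Addr < E.End) {
      Off = E.FileOffset + (Addr - E.Start);
      return true;
    }
  }
  return false;
}
```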
[MLIR][XeGPU] XeGPU Layout adds support for fractional-subgroup-size vector (#183434)
This PR enhances the layout assignment for XeGPU load/store operations
to handle vector sizes smaller than the subgroup size, for example a
vector[4] with lane_data=[1], lane_layout=[4], and inst_data=[4].
Fractional-subgroup-size vector support is required for the
cross-subgroup reduction case. The number of subgroups participating in
a reduction can be small, so each subgroup may need to reduce a small
vector, often a fraction of the subgroup size.
Most layout-based subgroup distribution patterns support
fractional-subgroup-size vectors without change, except for a few:
reduction, insert/extract, and constant. We don't expect ND operations
(like load_nd/store_nd/dpas) to accept fractional-subgroup-size vectors.
Revert "[mlir-tblgen] Remove `namespace {}` around OpDocGroup (#182721)" (#183458)
Reverts #182721; it is no longer needed after #183457.
It was a workaround for #182720.
This reverts commit a0f344f69d7eb5d87dd78c628a196a3a7440e792.
[SafeStack] Allow -fsanitize-minimal-runtime with -fsanitize=safestack (#183644)
SafeStack does not require a full sanitizer runtime, so it should be
compatible with the minimal runtime flag.
[mlir][vector] Fix fold result for empty vector.mask with no results (#180345)
This PR fixes `foldEmptyMaskOp` to return `failure` when folding an
empty vector.mask whose terminator has no operands. Previously this case
returned success without producing any folded results, which violates
the folding contract. Fixes #177825.
[DenseMap] Add memory barrier for sanitizers in getInlineBuckets/getLargeRep (#183457)
Add a compiler memory barrier to prevent optimizations from triggering
false positives on partially poisoned buckets in (HW)ASan.
Fixes #182720.