RenameIndependentSubregs: try to only implicit def used subregs (#167486)
Attempt to only define used subregisters when creating IMPLICIT_DEF fix
ups for live interval subranges. This avoids the appearance at the MIR
level of entire (wide) registers becoming live rather than relying only
on transient LiveIntervals dead definitions for unused subregisters.
[AMDGPU] Fixed crash in getLastMIForRegion when the region is empty. (#168653)
PreRARematStage builds region live-outs if GCN trackers are enabled. If
rematerialization leads to empty regions, this can cause a crash because
of dereference of an invalid iterator in getLastMIForRegion. The fix is
to skip calling getLastMIForRegion for empty regions.
This patch fixes another bug in the same code region. getLastMIForRegion
calls skipDebugInstructionsBackward which may immediately return the
RegionEnd if it is not the begin instruction and it is a non-debug
instruction. That would imply considering an instruction that is outside
the relevant region. The fix is to always pass the previous of RegionEnd
to skipDebugInstructionsBackward.
This bug was found while using GCN trackers on the existing LIT test
machine-scheduler-sink-trivial-remats.mir. Here's the assertion failure.
llvm-project/llvm/include/llvm/ADT/ilist_iterator.h:168:
llvm::ilist_iterator<OptionsT, IsReverse, IsConst>::reference
[4 lines not shown]
[MLIR][Vector] Add unroll pattern for vector.shape_cast (#167738)
This PR adds pattern for unrolling shape_cast given a targetShape. This
PR is a follow up of #164010 which was very general and was using
inserts and extracts on each element (which is also
LowerVectorShapeCast.cpp is doing).
After doing some more research on use cases, we (me and @Jianhui-Li )
realized that the previous version in #164010 is unnecessarily generic
and doesn't fit our performance needs.
Our use case requires that targetShape is contiguous in both source and
result vector.
This pattern only applies when contiguous slices can be extracted from
the source vector and inserted into the result vector such that each
slice remains in vector form with targetShape (and not decompose to
scalars). In these cases, the unrolling proceeds as:
vector.extract_strided_slice -> vector.shape_cast (on the slice
unrolled) -> vector.insert_strided_slice
[MLIR][Conversion] XeGPU to XeVM: Use adaptor for getting base address from memref. (#168610)
adaptor already lowers memref to base address.
Conversion patterns should use it instead of generating code to get base
address from memref.
[clang][deps] Enable calling `DepScanFile::getBuffer()` repeatedly (#168789)
This PR makes it possible to call `getBuffer()` on `DepScanFile` (a
`llvm::vfs::File`) repeatedly. Previously, this function would return a
moved-from `unique_ptr`. This doesn't fix any existing bugs, I
discovered this while experimenting with the VFSs in the scanner. Note
that the returned instances of `llvm::MemoryBuffer` are non-owning and
share the underlying buffer storage.
[CIR] Upstream CIR codegen for `lzcnt` and `tzcnt` x86 builtins (#168479)
Support CIR codegen for x86 builtins `__builtin_ia32_lzcnt` and
`__builtin_ia32_tzcnt`.
[AMDGPU] Prioritize allocation of low 256 VGPR classes (#167978)
If we have 1024 VGPRs available we need to give priority to the
allocation of these registers where operands can only use low 256.
That is noteably scale operands of V_WMMA_SCALE instructions.
Otherwise large tuples will be allocated first and take all low
registers, so we would have to spill to get a room for these
scale registers.
Allocation priority itself does not eliminate spilling completely
in large kernels, although helps to some degree. Increasing spill
weight of a restricted class on top of it helps.
[CIR] Add CxxCTorAttr, CxxDTorAttr, CxxAssignAttr, CxxSpecialMemberAttr to cir::FuncOp (#167975)
This PR adds a special member attribute to `cir::FuncOp`. This attribute
is also present in the incubator repo. Additionally, I added a
"is_trivial" flag, to mark trivial members. I think that might be useful
when trying to replace calls to the copy constructor with memcpy for
example, but please let me know your thoughts on this. [Here in the
incubator
repo](https://github.com/llvm/clangir/blob/823e943d1b9aaba0fc46f880c5a6ac8c29fc761d/clang/lib/CIR/Dialect/Transforms/LoweringPrepare.cpp#L1537-L1550)
this function is called `LowerTrivialConstructorCall`, but I don't see a
check that ensures the constructor is actually trivial.
[CIR] Upstream isfpclass op (#166037)
Ref commit in incubator: ee17ff67f3e567585db991cdad1159520c516bb4
There is a minor change in the assumption for emitting a direct callee.
In incubator, `bool hasAttributeNoBuiltin = false`
(`llvm-project/clang/lib/CIR/CodeGen/CIRGenExpr.cpp:1671`), while in
upstream, it's true, therefore, the call to finite(...) is not converted
to a builtin anymore.
Fixes #163892
[ConstantFolding] Add constant folding for scalable vector interleave intrinsics. (#168668)
We can constant fold interleave of identical splat vectors to a larger
splat vector.
[clang][DebugInfo] Mark _BitInt's as reconstitutable when emitting -gsimple-template-names (#168383)
Depends on:
* https://github.com/llvm/llvm-project/pull/168382
As of recent, LLVM includes the bit-size as a `DW_AT_bit_size` (and as
part of `DW_AT_name`) of `_BitInt`s in DWARF. This allows us to mark
`_BitInt`s as "reconstitutable" when compiling with
`-gsimple-template-names`. We still only omit template parameters that
are `<= 64` bit wide. So support `_BitInt`s larger than 64 bits is not
part of this patch.
[OpenACC] Make sure 'link' gets the right node in the AST with ASE
Another miss when working through 'link', we didn't properly handle
giving the whole array-section expression or array index expression,
instead allowed it to only get the decl-ref-expr. This patch makes
sure we don't add the wrong thing.
[SystemZ] Fix linux s390x main can't bootstrap itself on SanitizerSpecialCaseList.cpp #168088 (#168779)
This test has long call chain in recursion. Search tree can be pruned
early by swapping CC test and recursive simplifyAssumingCCVal.
Fixes: https://github.com/llvm/llvm-project/issues/168088
Co-authored-by: anoopkg6 <anoopkg6 at github.com>
[lit] Add LIT_CURRENT_TESTCASE environment variable when running tests (#168762)
I'm not aware of any way for `%run` wrapper scripts like
`iosssim_run.py`
([ref](https://github.com/llvm/llvm-project/blob/d2c7c6064259320def7a74e111079725958697d4/compiler-rt/test/sanitizer_common/ios_commands/iossim_run.py#L4))
to know what testcase they are currently running. This can be useful if
these wrappers need to create a (potentially remote) temporary directory
for each test case.
This adds the `LIT_CURRENT_TESTCASE` environment variable to both the
internal shell and the external shell, containing the full name of the
current test being run.
[llvm][DebugInfo] Add support for _BitInt in DWARFTypePrinter (#168382)
As of recent, LLVM includes the bit-size as a `DW_AT_bit_size` (and as
part of `DW_AT_name`) of `_BitInt`s in DWARF. This allows us to mark
`_BitInt`s as "reconstitutable" when compiling with
`-gsimple-template-names`. However, before doing so we need to make sure
the `DWARFTypePrinter` can reconstruct template parameter values that
have `_BitInt` type. This patch adds support for printing
`DW_TAG_template_value_parameter`s that have `_BitInt` type. Since
`-gsimple-template-names` only omits template parameters that are `<=
64` bit wide, we don't support `_BitInt`s larger than 64 bits.
[HLSL][TableGen] Add `__hlsl_resource_t` to known built-in function types (#163465)
This change adds resource handle type `__hlsl_resource_t` to the list of types recognized in the Clang's built-in functions prototype string.
HLSL has built-in resource classes and some of them have many methods, such as
[Texture2D](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/sm5-object-texture2d).
Most of these methods will be implemented by built-in functions that will take resource handle as an argument. This change enables us to move from generic `void(...)` prototype string for these methods and explicit argument checking in `SemaHLSL.cpp` to a prototype string with explicit argument types. Argument checking in `SemaHLSL.cpp` can be reduced to handle just the rules that cannot be expressed in the prototype string (for example verifying that the offset value in `__builtin_hlsl_buffer_update_counter` is `1` or `-1`).
In order to make this work, we now allow conversions from attributed resource handle type such as `__hlsl_resource_t [[hlsl::resource_class(UAV)]] [[hlsl::contained_type(float)]]` to a plain non-attributed `__hlsl_resource_t` type.
[OpenACC] Fix crash when checking an section in a 'link' clause (#168783)
I saw this while doing lowering, we were not properly looking into the
array sections for the variable. Presumably we didn't do a good job of
making sure we did this right when making this extension, and missed
this spot.
AMDGPU: Handle invariant loads when considering if a load can be scalar
Doesn't touch the globalisel version because the handling
there looks a bit broken.
Reapply "DAG: Allow select ptr combine for non-0 address spaces" (#168292)
This reverts commit 6d5f87fc4284c4c22512778afaf7f2ba9326ba7b.
Previously this failed due to treating the unknown MachineMemOperand
value as known uniform.
AMDGPU: Fix treating divergent loads as uniform
Avoids regression which caused the revert 6d5f87fc42.
This is a hack on a hack. We currently have isUniformMMO,
which improperly treats unknown source value as known uniform.
This is hack from before we had divergence information in the
DAG, and should be removed. This is the minimum change to avoid
the regression; removing the aggressive handling of the unknown
case (or dropping isUniformMMO entirely) are more involved fixes.
[NFC][TableGen] Use `IfGuardEmitter` in CallingConvEmitter (#168763)
Use `IfGuardEmitter` in CallingConvEmitter. Additionally refactor the
code a bit to extract duplicated code to emit the CC function prototype
into a helper function.
[OpenACC][CIR] Fix atomic-capture single-line-postfix (#168717)
In my last patch, it became clear during code review that the postfix
operation was actually a read THEN update, not update/read like other
single line versions. It wasn't clear at the time how much additional
work this would be to make postfix work correctly (and they are a bit of
a 'special' thing in codegen anyway), so this patch adds some
functionality to sense this and special-cases it when generating the
statement info for capture.
Fix build breakage from: #167948 (#168781)
It appears that this broke the build by not using the 'correct' name for
the expression. This is probably something that crossed in review.
[VPlan] Collect FMFs for in-loop reduction chain in VPlan. (NFC)
Replace retrieving FMFs for in-loop reduction via underlying instruction
+ legal by collecting the flags during reduction chain traversal in
VPlan.
[CIR] Upstream CIR await op (#168133)
This PR upstreams `cir.await` and adds initial codegen for emitting a
skeleton of the ready, suspend, and resume branches. Codegen for these
branches is left for a future PR. It also adds a test for the invalid
case where a `cir.func` is marked as a coroutine but does not contain a
`cir.await` op in its body.