[RISCV] Incorporate scalar addends to extend vector multiply accumulate chains (#168660)
Previously, the following:
%mul0 = mul nsw <8 x i32> %m00, %m01
%mul1 = mul nsw <8 x i32> %m10, %m11
%add0 = add <8 x i32> %mul0, splat (i32 32)
%add1 = add <8 x i32> %add0, %mul1
lowered to:
vsetivli zero, 8, e32, m2, ta, ma
vmul.vv v8, v8, v9
vmacc.vv v8, v11, v10
li a0, 32
vadd.vx v8, v8, a0
After this patch, now lowers to:
li a0, 32
vsetivli zero, 8, e32, m2, ta, ma
vmv.v.x v12, a0
[12 lines not shown]
llvm: Disable copy for SingleThreadExecutor (#168782)
This is a workaround for the MSVC compiler, which attempts to generate a
copy assignment operator implementation for classes marked as
`__declspec(dllexport)`. Explicitly marking the copy assignment operator
as deleted works around the problem.
DevCom ticket:
https://developercommunity.microsoft.com/t/Classes-marked-with-__declspecdllexport/11003192
[lldb] Don't enable the Limited C API with Python 3.13 and SWIG 4.4.0 (#169065)
Don't automatically enable the Limited C API when we're targeting Python
3.13 or later in combination with SWIG 4.4.0 due to a bug in the latter.
SWIG Issue: https://github.com/swig/swig/issues/3283
SWIG PR: https://github.com/swig/swig/pull/3285
[AST] Construct iterator_range with the conversion constructor (NFC) (#169004)
This patch simplifies iterator_range construction with the conversion
constructor.
[scudo] Small cleanup of memory tagging code part 2. (#168807)
Make the systemSupportsMemoryTagging() function return even on system
that don't support memory tagging. This avoids the need to always check
if memory tagging is supported before calling the function.
Modify iterateOverChunks() to call useMemoryTagging<>(Options) to
determine if mte is supported. This already uses the cached check of
systemSupportsMemoryTagging() rather than directly calling that
function.
Updated the code that calls systemSupportsMemoryTagging().
[flang] Use hlfir.cmpchar for SELECT CASE of charsSelect case hlfir cmpchar (#168476)
For SELECT CASE with character selector, instead of allways calling
runtime comparison function, emit hlfir.cmpchar. This has different
behaviors at different optimization levels: at -O0, it still emits
flang-rt call, but at higher optimization levels it does inline
comparison. Modify test/Lower/select-case-statement.f90 to test both
comparison cases.
[lldb] Add MockMemory class for dwarf expression testing (#168467)
This change unifies the way that we specify mocked memory to make it
easy to control the process and target memory contents for unit tests.
We add a MockMemory class that can be used in dwarf expression testing
to specify the output of the `ReadMemory` function.
The MockMemory class is built on a map that maps a `(address, size)`
pair to a vector of bytes that is `size` bytes long and contains the
memory contents for that `address`.
The MockProcessWithMemRead and MockTarget classes are updated to use the
new MockMemory interface. The MockProcessWithMemRead class was renamed
to MockProcess and the old MockProcess was deleted. The old MockProcess had
and ReadMemory implementation that returned the value `i & 0xff` for reading the
address `i` and was easily be replaced with the MockMemory object.
The CreateTestContext function now takes optional values for process memory and
target memory and uses those to create the mock objects.
[ARM] Restore hasSideEffects flag on t2WhileLoopSetup (#168948)
ARM relies on deprecated TableGen behavior of guessing instruction
properties from patterns (`def ARM : Target` doesn't have
`guessInstructionProperties` set to false).
Before #168209, TableGen conservatively guessed that `t2WhileLoopSetup`
has side effects because the instruction wasn't matched by any pattern.
After the patch, TableGen guesses it has no side effects because the
added pattern uses only `arm_wlssetup` node, which has no side effects.
Add `SDNPSideEffect` to the node so that TableGen guesses the property
right, and also `hasSideEffects = 1` to the instruction in case ARM ever
sets `guessInstructionProperties` to false.
X86: Stop overriding getRegClass
This function should not be virtual; making this virtual was
an AMDGPU hack that should be removed not spread to other
backends.
This does not need to be overridden to reserve registers. The
register reservation mechanism is orthogonal to to the register
class constraints of the instruction, this should be reporting
the underlying instruction constraint. The registers are separately
reserved, so they will be removed from the allocation order anyway.
If the actual class needs to change based on the subtarget,
it should probably generalize the LookupPtrRegClass mechanism.
This was added by #70958. The new tests there for the class are
probably not useful anymore. These instead should compile to the
end and try to stress the allocation behavior.
[OpenMP][OMPIRBuilder] Use runtime CC for runtime calls (#168608)
Some targets have a specific calling convention that should be used for
generated calls to runtime functions.
Pass that down and use it.
Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
[acc][flang] Implement acc interface for tracking type descriptors (#168982)
FIR operations that use derived types need to have type descriptor
globals available on device when offloading. Examples of this can be
seen in `CUFDeviceGlobal` which ensures that such type descriptor uses
work on device for CUF.
Similarly, this is needed for OpenACC. This change introduces a new
interface to the OpenACC dialect named
`IndirectGlobalAccessOpInterface` which can be attached to operations
that may result in generation of accesses that use type descriptor
globals. This functionality is needed for the `ACCImplicitDeclare` pass
that is coming in a follow-up change which implicitly ensures that all
referenced globals are available in OpenACC compute contexts.
The interface provides a `getReferencedSymbols` method that collects all
global symbols referenced by an operation. When a symbol table is
provided, the implementation for FIR recursively walks type descriptor
globals to find all transitively referenced symbols.
[13 lines not shown]
[AMDGPU] Handle AV classes in SIFixSGPRCopies::processPHINode (#169038)
Fix a problem exposed by #166483 using AV classes in more places.
`isVectorRegister` only accepts registers of VGPR or AGPR classes.
`hasVectorRegisters` additionally accepts the combined AV classes.
Fixes: #168761
AMDGPU: Stop implementing shouldCoalesce (#168988)
Use the default, which freely coalesces anything it can.
This mostly shows improvements, with a handful of regressions.
The main concern would be if introducing wider registers is more
likely to push the register usage up to the next occupancy tier.
Revert "[MLIR][GPU] subgroup_mma fp64 extension" (#169049)
Reverts llvm/llvm-project#165873
The revert is triggered by a failing integration test on a couple of
buildbots.