[InferAddressSpaces] Fix bad `addrspacecast` insertion for phinode (#163528)
The IR verifier will crash if any instruction is located before a
phi node. In some corner cases, the `infer-address-spaces` pass tried
to insert an `addrspacecast` before a phi node. Since the operand
pointer (the phi node's incoming value) has already been determined to
be in `NewAS` by the pass, it is safe to insert the `addrspacecast`
immediately after the point where the operand is defined.
Co-authored-by: Kerang Mao <krmao at birentech.com>
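The placement rule for the `addrspacecast` can be illustrated with a toy
model in plain C++. This is only a sketch and does not use the LLVM API:
`Inst`, `Block`, `phisAreGrouped` and `insertCastAfterDef` are hypothetical
names, and skipping past trailing phi nodes when the definition is itself a
phi is an assumption of the sketch, not something stated in the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for IR; these are NOT LLVM classes.
struct Inst {
  std::string name;
  bool isPhi;
};
using Block = std::vector<Inst>;

// Verifier-like rule: all phi nodes must be grouped at the top of a block.
bool phisAreGrouped(const Block &bb) {
  bool seenNonPhi = false;
  for (const Inst &i : bb) {
    if (i.isPhi && seenNonPhi)
      return false;
    if (!i.isPhi)
      seenNonPhi = true;
  }
  return true;
}

// Insert a cast of `def` right after its definition. If `def` is itself a
// phi (assumption of this sketch), skip the remaining phis so the cast never
// lands between them. Returns false if `def` is not found in the block.
bool insertCastAfterDef(Block &bb, const std::string &def) {
  assert(phisAreGrouped(bb) && "input block must be well formed");
  for (std::size_t i = 0; i < bb.size(); ++i) {
    if (bb[i].name != def)
      continue;
    std::size_t pos = i + 1;
    while (pos < bb.size() && bb[pos].isPhi)
      ++pos;
    bb.insert(bb.begin() + static_cast<std::ptrdiff_t>(pos),
              Inst{def + ".cast", false});
    return true;
  }
  return false;
}
```

Inserting at the definition keeps the cast out of the phi group of the block
that uses the value, which is the property the verifier checks.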
[AArch64][SME] Handle SME state around TLS-descriptor calls (#155608)
This patch ensures we switch out of streaming mode before TLS-descriptor
calls. ZA state will also be preserved when using the new SME ABI
lowering (`-aarch64-new-sme-abi`).
Fixes #152165
[AArch64] Remove FEAT_TME assembly and ACLE support
The Transactional Memory Extension (TME) was introduced as part of
Armv9-A but has not been adopted by the ecosystem. This mirrors what
Arm has observed with similar extensions in other architectures.
Therefore, remove FEAT_TME assembly and ACLE code from LLVM, because
support for TME has now been officially withdrawn, as noted here:
```
FEAT_TME is withdrawn from all future versions of Arm®
Architecture Reference Manual for A-profile architecture.
```
referenced in Known Issue D24093, documented here:
https://developer.arm.com/documentation/102105/lb-05/
[TableGen][NFCI] Change TableGenMain() to take function_ref.
It was switched from a function pointer to std::function in commit
f675ec6165ab6add5e57cd43a2e9fa1a9bc21d81
("TableGen: Make 2nd arg MainFn of TableGenMain(argv0, MainFn) optional."),
but that commit does not mention any particular reason for the change.
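For readers unfamiliar with the idea, a non-owning callable reference can be
sketched in standard C++ as below. This is an illustration only:
`FunctionRef` is a hypothetical, simplified class in the spirit of
`llvm::function_ref`, and `runMain` with its `bool(std::string &)` callback
is an invented stand-in, not the real `TableGenMain()` signature. The sketch
supports lambdas and other function objects, not plain function pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

template <typename Fn> class FunctionRef;

// Hypothetical minimal non-owning reference to a callable. It stores a
// pointer to the callable plus a trampoline that knows its real type, so it
// never copies or allocates. The callable must outlive the FunctionRef.
template <typename Ret, typename... Args> class FunctionRef<Ret(Args...)> {
  void *callable = nullptr;
  Ret (*trampoline)(void *, Args...) = nullptr;

  template <typename Callable> static Ret call(void *c, Args... args) {
    return (*static_cast<Callable *>(c))(std::forward<Args>(args)...);
  }

public:
  FunctionRef() = default;
  FunctionRef(std::nullptr_t) {}

  // Excluded for FunctionRef itself so that copying keeps working.
  template <typename Callable,
            typename = typename std::enable_if<!std::is_same<
                typename std::decay<Callable>::type, FunctionRef>::value>::type>
  FunctionRef(Callable &&c)
      : callable(const_cast<void *>(static_cast<const void *>(&c))),
        trampoline(&call<typename std::remove_reference<Callable>::type>) {}

  explicit operator bool() const { return trampoline != nullptr; }

  Ret operator()(Args... args) const {
    assert(trampoline && "calling an empty FunctionRef");
    return trampoline(callable, std::forward<Args>(args)...);
  }
};

// Hypothetical driver with an optional callback. The callback is only used
// during the call, so a non-owning reference is enough. The callback returns
// true on error, and the driver maps that to exit code 1.
int runMain(std::string &out, FunctionRef<bool(std::string &)> mainFn = nullptr) {
  if (!mainFn) {
    out = "default backend";
    return 0;
  }
  return mainFn(out) ? 1 : 0;
}
```

A caller can pass a lambda directly, for example
`runMain(out, [](std::string &o) { o = "custom"; return false; })`; the
temporary lambda lives until the call returns, which is all the reference
needs.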
[TableGen] Split *GenRegisterInfo.inc.
Reduces memory usage compiling backend sources, most notably for
AMDGPU by ~98 MB per source on average.
AMDGPUGenRegisterInfo.inc is tens of megabytes in size now, and
is even larger downstream. At the same time, it is included in
nearly all backend sources, typically just for a small portion of
its content, resulting in compilation being unnecessarily
memory-hungry, which in turn stresses buildbots and wastes their
resources.
Splitting .inc files also helps avoid extra ccache misses
where changes in .td files don't cause changes in all parts of
what previously was a single .inc file.
It is thought that rather than building on top of the current
single-output-file design of TableGen, e.g., using `split-file`,
it would be preferable to recognise the need for multi-file
[2 lines not shown]
[AMDGPU] Rematerialize VGPR candidates when SGPR spills to VGPR over the VGPR limit
Before, when selecting candidates to rematerialize, we would only
consider SGPR candidates when there was an excess of SGPR registers.
Failing to eliminate the excess would result in spills to VGPRs.
This is normally not an issue, unless spilling to VGPRs results in
excess VGPRs.
This patch does 2 things:
* It relaxes the GCNRPTarget success criteria: now we accept regions
where we spill SGPRs to VGPRs, as long as this does not end up in
excess VGPRs.
* It changes isSaveBeneficial to consider the excess VGPRs (which
include the SGPRs that would be spilled to VGPRs).
With these changes, the compiler rematerializes VGPRs when the excess
SGPRs would result in VGPR excess.
[4 lines not shown]
[mlir][tosa] Allow int64 index tensors in gather/scatter (#167894)
This commit ensures that gather and scatter operations with int64 index
tensors can be created. This aligns with the EXT_INT64 extension.
[GlobalISel] Add support for value/constants as inline asm memory operand (#161501)
InlineAsmLowering rejected inline assembly with memory reference inputs
if the values passed to the inline asm weren't pointers. The DAG
lowering however handled them just fine.
This patch updates InlineAsmLowering to store such values on the stack,
and then use the stack pointer as the "indirect" version of the operand.
[libc++] proper guarding for locale usage in filesystem on Windows (#165470)
- Resolves build issues when localization support is disabled on
Windows.
- Resolves dependencies on localization in filesystem header
implementations.
Related PR #164602
Fixes #164074
Destroy tasks as they are run in the thread pool (#167852)
Without this, RAII objects held in a task's captures are not
destroyed at the point where the task is run. If those objects
in turn interact with the thread pool itself, chaos ensues. This comes
up quite naturally with RAII objects used for synchronization, such as
RAII-powered latches or guards that release a mutex.
A unit test is added that directly checks that the logic of the
thread pool continues to hold even with an RAII object in the captures.
This isn't the only failure mode (a deadlock due to mutexes in the
captures can also occur), but it seemed the easiest to test.
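The effect of destroying a task right after running it can be shown with a
single-threaded sketch in standard C++. It is not the thread pool code from
this change: `Guard`, `makeTask` and `runAll` are hypothetical names, the
"worker idle" log entry stands in for a worker waiting for more work, and
the non-eager branch is an assumed model of a worker that keeps the last
task alive.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <string>
#include <utility>
#include <vector>

using Task = std::function<void()>;

// Hypothetical RAII object that records when it is destroyed.
struct Guard {
  std::vector<std::string> &log;
  explicit Guard(std::vector<std::string> &l) : log(l) {}
  ~Guard() { log.push_back("guard released"); }
};

// Builds a task whose captures own a Guard through a shared_ptr.
Task makeTask(std::vector<std::string> &log) {
  auto guard = std::make_shared<Guard>(log);
  return [guard, &log] { log.push_back("task ran"); };
}

// Runs all queued tasks. With destroyEagerly set, each task (and therefore
// everything it captured) is destroyed as soon as it has run; otherwise the
// last task stays alive in `current` until it is overwritten or the
// function returns.
void runAll(std::queue<Task> &tasks, std::vector<std::string> &log,
            bool destroyEagerly) {
  Task current;
  while (!tasks.empty()) {
    current = std::move(tasks.front());
    tasks.pop();
    assert(current && "queued tasks must be callable");
    current();
    if (destroyEagerly)
      current = nullptr; // drop the callable and its captures now
    log.push_back("worker idle"); // stands in for waiting for more work
  }
}
```

With eager destruction the guard is released before the worker goes idle;
without it, the guard outlives the task, which is the kind of ordering that
can stall code waiting on an RAII-released latch or mutex.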
[LV] Explicitly disable in-loop reductions for AnyOf and FindIV. nfc (#163541)
Currently, in-loop reductions for AnyOf and FindIV are not supported.
They were implicitly blocked. This happened because
RecurrenceDescriptor::getReductionOpChain could not detect their
recurrence chain. The reason is that RecurrenceDescriptor::getOpcode was
set to Instruction::Or, but the recurrence chains of AnyOf and FindIV do
not actually contain an Instruction::Or.
This patch explicitly disables in-loop reductions for AnyOf and FindIV
instead of relying on getReductionOpChain to implicitly prevent them.
[libc++] Reorganize and fix the libc++ CI dockerfiles (#167530)
Instead of having one large Dockerfile building multiple images with
relatively confusing inheritance, explicitly have three standalone
Dockerfiles each building one image. Then, tie the three images together
using the docker-compose file which explicitly versions the base image
used by the Android and the Github Actions images.
[Linalg] Add basic infra to add matchers for linalg.*conv*/*pool* ops (#163724)
-- This commit includes the basic infra/utilities to add matchers for
linalg.*conv*/*pool* ops - such that given a `linalg.generic` op it
identifies which linalg.*conv*/*pool* op it is.
-- It adds a few representative linalg.*conv*/*pool* ops to demo the
matchers' capability and does so as part of the
`linalg-specialize-generic-ops` pass.
-- The goal is to address the aim of
[[RFC] Op explosion in
Linalg](https://discourse.llvm.org/t/rfc-op-explosion-in-linalg/82863)
iteratively for `*conv*/*pooling*` ops.
-- This is part-1 of a series of PRs aimed to add matchers for
Convolution ops.
-- For further details, refer to
https://github.com/llvm/llvm-project/pull/163374#pullrequestreview-3341048722
Signed-off-by: Abhishek Varma <abhvarma at amd.com>
[LoongArch] Override `isLSRCostLess` to set `Insns` as the first priority
Similar to several other targets, this commit overrides
`isLSRCostLess` to make the instruction count the first priority
when the LSR pass compares costs.
In addition, `NumRegs` now accounts for the extra temporary register
that may be needed. This matches RISC-V; see the rationale in
https://github.com/llvm/llvm-project/pull/92296.
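An "instruction count first" comparison of this kind can be sketched as a
lexicographic compare in standard C++. This is not the LoongArch
implementation: the `LSRCost` struct is a hypothetical stand-in whose field
names are modeled on `TargetTransformInfo::LSRCost`, and both the ordering of
the lower-priority fields and the "one extra register when base adds are
present" rule are assumptions of the sketch.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical cost record; field names are modeled on
// TargetTransformInfo::LSRCost, but this is not the LLVM type.
struct LSRCost {
  unsigned Insns;
  unsigned NumRegs;
  unsigned AddRecCost;
  unsigned NumIVMuls;
  unsigned NumBaseAdds;
  unsigned ImmCost;
  unsigned SetupCost;
  unsigned ScaleCost;
};

// Returns true if C1 is cheaper than C2, comparing Insns first.
bool isLSRCostLess(const LSRCost &C1, const LSRCost &C2) {
  // Assumption: adding up base registers inside the loop needs one extra
  // temporary register, so count it in the register pressure.
  unsigned C1NumRegs = C1.NumRegs + (C1.NumBaseAdds != 0);
  unsigned C2NumRegs = C2.NumRegs + (C2.NumBaseAdds != 0);
  return std::tie(C1.Insns, C1NumRegs, C1.AddRecCost, C1.NumIVMuls,
                  C1.NumBaseAdds, C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
         std::tie(C2.Insns, C2NumRegs, C2.AddRecCost, C2.NumIVMuls,
                  C2.NumBaseAdds, C2.ScaleCost, C2.ImmCost, C2.SetupCost);
}

// Usage: fewer instructions wins even when more registers are needed.
void usageExample() {
  LSRCost fewInsns{2, 5, 0, 0, 0, 0, 0, 0};
  LSRCost fewRegs{3, 3, 0, 0, 0, 0, 0, 0};
  assert(isLSRCostLess(fewInsns, fewRegs));
  assert(!isLSRCostLess(fewRegs, fewInsns));
}
```

When two solutions have the same instruction count, the adjusted register
count breaks the tie, so a solution that needs base adds is treated as using
one more register than its raw `NumRegs` suggests.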
[libcxx] [doc] Document the supported target versions of Windows (#167845)
The llvm-mingw toolchains default to `_WIN32_WINNT=0x601`, so this
configuration is covered by our CI build matrix.
[libcxx] [doc] Update the docs about LIBCXX_ENABLE_FILESYSTEM (#167843)
Since 1939eb3dc2330af6fb9609a7c3bd5276e127c9ce, std::filesystem is
enabled by default in MSVC builds too.