[libclc] compile w/o linking builtins with SPIRV backend (#176732)
As we're only building a single file, there is no need to link. This
avoids a dependency on spirv-link when we're using the native SPIRV
backend.
[CodeGen] Set pseudo probe desc comdat symbol to external for COFF (#176706)
lld-link performs COMDAT section deduplication only when the COMDAT
symbol is external.
[X86AsmBackend] Check fixup value overflow (#176827)
GNU Assembler has generic error checking for overflowed fixup values:
```
y.s:5: Error: value of 8000000000000000 too large for field of 4 bytes at 0000000000000004
```
In contrast, for a long time we have only had an assertion that may fail.
https://reviews.llvm.org/D70652 improved the status by adding an
overflow check for PC-relative fixups, but missed other cases (#116899).
This patch improves the overflow check to resemble GAS.
For `.long x`, GAS accepts `x` if its value is in the range `(-2**32,
2**32)`. This design allows `.long x` to work regardless of signedness.
When a symbol is involved, GAS supports both `.long sym-0xffffffff` and
`.long sym+1`, as well as `.long sym+0xffffffff` and `.long sym-1`.
However, `.long sym+0x100000000` is rejected in favor of `.long sym+0`.
[13 lines not shown]
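The GAS acceptance range described above can be sketched as a small predicate. This is only an illustration of the stated range `(-2**N, 2**N)` for an N-bit field; the function name `fitsFixupField` is hypothetical and is not taken from the patch, and the actual check in X86AsmBackend may differ.

```cpp
#include <cstdint>

// Hypothetical helper: returns true if Value lies in the open range
// (-2**N, 2**N), where N = 8 * SizeInBytes. A value in that range can be
// encoded in the field as either a signed or an unsigned quantity.
constexpr bool fitsFixupField(int64_t Value, unsigned SizeInBytes) {
  // Every int64_t value fits in a field of 8 bytes or more.
  if (SizeInBytes >= 8)
    return true;
  const int64_t Limit = int64_t(1) << (8 * SizeInBytes);
  return Value > -Limit && Value < Limit;
}
```

Under this model, an addend of `0xffffffff` or `-0xffffffff` fits a 4-byte field, while `0x100000000` does not.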
[AMDGPU] si-peephole-sdwa: Handle V_PACK_B32_F16_e64 (WIP)
Change si-peephole-sdwa to eliminate V_PACK_B32_F16_e64 instructions
by changing the second operand to write to the upper word of the
destination directly.
[AMDGPU] Enable ISD::{FSIN,FCOS} custom lowering to work on v2f16 (#176382)
Currently ISD::FSIN and ISD::FCOS of type MVT::v2f16 are legalized by
first expanding and then using a custom lowering on the resulting f16
instructions. This ordering prevents using packed math variants of the
instructions introduced by the legalization (e.g. the multiplication) and
makes it difficult to deal with the resulting IR in peephole
optimizations (e.g. si-peephole-sdwa).
Change the legalization action for ISD::FSIN and ISD::FCOS of type
MVT::v2f16 to Custom and change the custom trig lowering to deal
with vectors.
[Offload][Tests] Non-contiguous_update_to_tests (#169623)
PR #144635 enabled non-contiguous updates for both `update from` and
`update to` clauses, but tests for `update to` were missing. This PR
adds those missing tests to ensure coverage.
[update_mc_test_checks] Support --show-inst output
This is useful to check that the correct registers were used in cases
where different register classes use the same name in asm input/output.
Pull Request: https://github.com/llvm/llvm-project/pull/174011
[NFCI][AMDGPU] Use X-macro to reduce boilerplate in `GCNSubtarget.h`
`GCNSubtarget.h` contained a large amount of repetitive code following the pattern `bool HasXXX = false;` for member declarations and `bool hasXXX() const { return HasXXX; }` for getters. This boilerplate made the file unnecessarily long and harder to maintain.
This patch introduces an X-macro pattern `GCN_SUBTARGET_HAS_FEATURE` that consolidates 129 simple subtarget features into a single list. The macro is expanded twice: once in the protected section to generate member variable declarations, and once in the public section to generate the corresponding getter methods. This reduces the file by approximately 265 lines while preserving the exact same API and functionality. Features with complex getter logic or inconsistent naming conventions are left as manual implementations for future improvement.
Ideally, these could be generated by TableGen using `GET_SUBTARGETINFO_MACRO`, similar to the X86 backend. However, `AMDGPU.td` has several issues that prevent direct adoption: duplicate field names (e.g., `DumpCode` is set by both `FeatureDumpCode` and `FeatureDumpCodeLower`), and inconsistent naming conventions where many features don't have the `Has` prefix (e.g., `FlatAddressSpace`, `GFX10Insts`, `FP64`). Fixing these issues would require renaming fields in `AMDGPU.td` and updating all references, which is left for future work.
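A minimal sketch of the X-macro pattern described above, assuming a hypothetical list macro `DEMO_SUBTARGET_HAS_FEATURE` and made-up feature names (`Foo`, `Bar`, `Baz`). The real `GCN_SUBTARGET_HAS_FEATURE` list and its exact form are not reproduced here.

```cpp
// Hypothetical feature list. Each entry X(Name) stands for one simple
// boolean subtarget feature.
#define DEMO_SUBTARGET_HAS_FEATURE(X)                                          \
  X(Foo)                                                                       \
  X(Bar)                                                                       \
  X(Baz)

class DemoSubtarget {
protected:
  // First expansion: one `bool HasXXX = false;` member per feature.
#define DEMO_DECL_MEMBER(Name) bool Has##Name = false;
  DEMO_SUBTARGET_HAS_FEATURE(DEMO_DECL_MEMBER)
#undef DEMO_DECL_MEMBER

public:
  // Second expansion: one `bool hasXXX() const` getter per feature.
#define DEMO_DECL_GETTER(Name)                                                 \
  bool has##Name() const { return Has##Name; }
  DEMO_SUBTARGET_HAS_FEATURE(DEMO_DECL_GETTER)
#undef DEMO_DECL_GETTER

  // Hypothetical setter, present only so the sketch can be exercised.
  void enableBar() { HasBar = true; }
};
```

Adding a feature then means adding a single `X(Name)` line to the list, and the member and getter stay in sync by construction.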
[AMDGPU] Pre-commit test for WMMA NOP hoisting optimization (#176745)
Add test showing current behavior where V_NOP instructions for WMMA
coexecution hazards are inserted inside loop bodies at the use-site. A
future patch will hoist these NOPs to loop preheaders to reduce
redundant execution.
---------
Co-authored-by: Christudasan Devadasan <christudasan.devadasan at amd.com>
[Linalg] Support i1 data type in matchConvolutionOpOfType utility (#176704)
-- Extend bodyMatcherForConvolutionOps to recognize arith.ori/arith.andi
for i1 element types (in addition to add/mul for integer/float types)
for accumulation and multiplication.
-- Similarly, extend bodyMatcherForSumPoolOps to recognize arith.ori for
i1 accumulation (in addition to add for integer/float types).
Signed-off-by: Abhishek Varma <abhvarma at amd.com>
[clang][bytecode] Handle corner condition for sign negation (#176390)
RHS = -RHS works for most cases. However, the behaviour when RHS is
INTXX_MIN is undefined. In this particular case, we should use
INTXX_MAX instead.
Fixes #176271.
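The corner case can be illustrated on plain signed integers. This is a sketch under the assumption that the minimum value is replaced by the maximum as described above; the helper name `negateOrMax` is hypothetical, and the bytecode interpreter operates on its own integer types rather than on these.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: negates RHS, but maps the minimum value to the
// maximum value, because -MIN is not representable in a signed type.
template <typename T>
constexpr T negateOrMax(T RHS) {
  static_assert(std::is_signed<T>::value, "expects a signed integer type");
  if (RHS == std::numeric_limits<T>::min())
    return std::numeric_limits<T>::max();
  return static_cast<T>(-RHS);
}
```

For example, `negateOrMax<int32_t>(INT32_MIN)` yields `INT32_MAX`, while every other input is negated as usual.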
[Clang] Check enable_if attribute without delayed diagnostics (#176080)
We ensure immediate access control checking when evaluating the
enable_if attribute to rule out inaccessible constructors during
potential overload resolution, treating them as SFINAE errors rather
than hard errors. This behavior is more consistent with the nature of
the enable_if attribute.
Compared to the last patch, we now avoid switching the DC directly
because there are cases where we're checking enable_if attribute within
a lambda and getCurLambda() requires a lambda context to distinguish
from template instantiation.
This reapplies #175899
Fixes https://github.com/llvm/llvm-project/issues/175895
[Clang] Ensure a lambda DeclContext in BuildLambdaExpr (#176319)
Since 5f9630b388, we only remove the LSI after the evaluation context is
popped. The TreeTransform of immediate functions may call getCurLambda,
which requires both the paired LSI and the lambda DeclContext. In
TransformLambdaExpr, we already switched the context, but this is not
the case when parsing a lambda expression.
No release note, as this is a regression from 22.
Fixes https://github.com/llvm/llvm-project/issues/176045
workflows/release-lit: Update workflow and enable trusted publishing with pypi (#174907)
This makes some small improvements to the workflow, including using
more modern Python packaging modules, and enables trusted publishing
for PyPI. This will allow us to publish lit packages to PyPI without
needing an access token.
This action also now uses the pypi environment which will only publish
files when triggered by an llvm-* tag.