[TableGen] Slightly improve error location for a fatal error
I was hitting this error and the error location was pointing to the
register class definition instead of the incorrect InstAlias. Pass in
the InstAlias location to make it easier to debug.
Happens with `def : InstAlias<"foo", (Inst X0)>`, where `Inst` takes
a RegClassByHwMode operand that is not necessarily satisfied by
register X0. Similar problem with the CompressPat backend.
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/170790
[MCAsmStreamer] Print register names in --show-inst mode
Passing the context to `Inst.dump_pretty()` allows printing symbolic
register names instead of `<MCOperand Reg:1234>` in the output.
I plan to use this in future RVY test cases where we have register
classes with the same name in assembly syntax, but different underlying
register enum values. Printing the name of the enum value makes it
easier to test that we selected the correct register.
Reviewed By: lenary
Pull Request: https://github.com/llvm/llvm-project/pull/171252
[Clang][CUDA] Add support for SM_88, SM_110, and SM_110a architectures (#170258)
This patch adds support for new GPU architectures introduced in CUDA
13.0 in Clang:
- SM_88: Ampere architecture variant
- SM_110/SM_110a: Blackwell architecture variants
Additionally, this patch deprecates SM_101/SM_101a support for CUDA 13.0
and later versions. The SM_101 architecture is superseded by SM_110 and
is no longer supported by CUDA 13.0+ toolchain components.
[Clang][counted_by] Correct signed counted_by values
If the 'counted_by' value is signed, we will incorrectly allow accesses
when the value is negative. This has obvious bad effects as it will
allow accessing a huge swath of unallocated memory.
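A minimal sketch of the failure mode, not Clang's actual implementation: the
function names are hypothetical, and the fixed variant assumes that a negative
count is treated as an empty array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a bounds check against a signed 'counted_by' field.
// Buggy: converting a negative count to an unsigned type yields a huge bound,
// so almost any index passes the check.
bool inBoundsBuggy(std::int32_t Count, std::size_t Index) {
  return Index < static_cast<std::size_t>(Count);
}

// Fixed (assumption): a negative count is treated as zero elements.
bool inBoundsFixed(std::int32_t Count, std::size_t Index) {
  std::size_t Bound = Count < 0 ? 0 : static_cast<std::size_t>(Count);
  return Index < Bound;
}
```

With `Count == -1`, the buggy check accepts index 1000, while the fixed check
rejects every index.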
Also clarify and rearrange the parameters to make them more
perspicuous.
Fixes: 170987
Fix test outputting to test dir (#171255)
The test introduced in #171118 has `llc` inadvertently writing its
output into the same directory as the test file itself. Most build bots
don't clean up the local git repo between builds: the repo is assumed
not to be written to by build + test, and new commits are applied on top
(for build performance reasons). As a result, the output produced by the
aforementioned PR is treated as a test from then on, by all bots. Since
it's missing `RUN` lines, we get errors, for example
https://lab.llvm.org/buildbot/#/builders/108/builds/20674
This patch fixes the `llc` line and also removes the generated `.s`
file, so that bot maintainers don't all have to restart their bots. The
cleanup is then removed in #171256.
[LLVM/CodeGen] Use the correct address space when building structor tables. (#171247)
No in-tree target exercises this, but it's needed for CHERI, and I
believe its correctness is verifiable by inspection.
Co-authored-by: Alex Richardson <alexrichardson at google.com>
[RISCV] Don't unroll vectorized loops with vector operands (#171089)
We disabled unrolling for vectorized loops in #151525, but that
PR only checked the instruction type.
Some loops contain no instruction with a vector type but still
perform vector operations (such as the memset zero test in the
precommit test).
Here we check the operands as well to cover these cases.
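The difference between the two checks can be sketched with a simplified
model. This is not the RISCV backend code: `Value`, `Inst`, and all function
names are hypothetical stand-ins, and the example loop assumes that the memset
case boils down to a store whose result is void but whose stored value is a
vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for IR values and instructions.
struct Value {
  bool IsVector;
};

struct Inst {
  bool ResultIsVector;         // type of the instruction's own result
  std::vector<Value> Operands; // values the instruction consumes
};

// Check that only looks at each instruction's result type.
bool hasVectorResult(const std::vector<Inst> &Loop) {
  for (const Inst &I : Loop)
    if (I.ResultIsVector)
      return true;
  return false;
}

// Check that also looks at operand types, so a void-typed store of a
// vector value is still recognized as a vector operation.
bool hasVectorResultOrOperand(const std::vector<Inst> &Loop) {
  for (const Inst &I : Loop) {
    if (I.ResultIsVector)
      return true;
    for (const Value &V : I.Operands)
      if (V.IsVector)
        return true;
  }
  return false;
}

// A loop body modelled on a vectorized memset-to-zero: one store with a
// void result, a vector value operand, and a scalar pointer operand.
std::vector<Inst> makeMemsetLikeLoop() {
  Inst Store{false, {Value{true}, Value{false}}};
  return {Store};
}
```

For the memset-like loop, the result-only check reports no vector operation,
while the check that also inspects operands does.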
[OpenCL] Add missing Intel extensions to OpenCLExtensions.def (#169875)
Add the following extensions:
cl_intel_bfloat16_conversion
cl_intel_subgroup_buffer_prefetch
cl_intel_subgroup_local_block_io
cl_intel_subgroups_char
cl_intel_subgroups_long
This allows targets to expose these extensions via
getSupportedOpenCLOpts and ensures macros are defined when enabled.
[AArch64] Fix missing register definitions in homogeneous epilog lowering (#171118)
The lowering for HOM_Epilog did not transfer explicit register defs from
the pseudo-instruction to the generated helper calls. MachineVerifier
would complain if a following tail call uses one of the restored CSRs.
This scenario occurs in code generated by the Swift compiler, where X20
is used to pass swiftself.
This patch fixes the issue by adding the missing defs back to the helper
call as implicit defs.
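As a rough illustration of the idea, not the actual AArch64 lowering code:
`Operand`, `Instr`, and both functions below are a hypothetical model, and
register numbers are arbitrary placeholders rather than real AArch64 enum
values.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of a machine operand and instruction.
struct Operand {
  unsigned Reg;
  bool IsDef;
  bool IsImplicit;
};

struct Instr {
  std::vector<Operand> Operands;
};

// Copy every explicit register def of Pseudo onto Call as an implicit def,
// so later users of those registers see a definition on the helper call.
void transferDefsAsImplicit(const Instr &Pseudo, Instr &Call) {
  for (const Operand &Op : Pseudo.Operands)
    if (Op.IsDef && !Op.IsImplicit)
      Call.Operands.push_back(
          Operand{Op.Reg, /*IsDef=*/true, /*IsImplicit=*/true});
}

// Returns true if I defines Reg, explicitly or implicitly.
bool definesReg(const Instr &I, unsigned Reg) {
  for (const Operand &Op : I.Operands)
    if (Op.IsDef && Op.Reg == Reg)
      return true;
  return false;
}
```

In this model, a helper call that receives the pseudo's defs counts as
defining the restored register, which is the property a verifier would look
for before a following tail call reads it.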