[mlir][SPIRV] Add alignment calculation to support `PhysicalStorageBuffer` with vector types (#187698)
This allows lowering `memref.load`/`store` operations on
`PhysicalStorageBuffer`-typed resources with the underlying type being a
vector type. This improves support for the `PhysicalStorageBuffer`
capability in pipelines that use the Vector dialect for distribution.
Signed-off-by: Artem Gindinson <gindinson at roofline.ai>
[clang-tidy] Speed up `bugprone-suspicious-semicolon` (#187558)
```txt
                   ---User Time---  --System Time--  --User+System--  ---Wall Time---  --- Name ---
Status quo:        0.4743 (100.0%)  0.3802 (100.0%)  0.8546 (100.0%)  0.8567 (100.0%)  bugprone-suspicious-semicolon
With this change:  0.0103 (100.0%)  0.0027 (100.0%)  0.0130 (100.0%)  0.0133 (100.0%)  bugprone-suspicious-semicolon
```
This continues the trend of one registered `anyOf` matcher being slower
than each of its matchers registered separately (see #178829 for a
previous example).
(This PR also changes the traversal mode, but I only saw a small speedup
from that. Most of it came from registering the matchers separately.)
This check wasn't super expensive to begin with, but the speedup is
still pretty nice.
[NFC][clang] Remove dead code in HandleCXXModuleDirective (#187737)
Remove the dead code in `Preprocessor::HandleCXXModuleDirective`.
Signed-off-by: yronglin <yronglin777 at gmail.com>
[NVPTX] Print param space sub-qualifiers where supported (#187350)
Print param space sub-qualifiers (`param::entry` and `param::func`) for
PTX 8.3+, as described in the [PTX ISA
docs](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parameter-state-space).
This requires threading the `MCSubtargetInfo` through the inst printer,
which is done by setting `PassSubtarget = 1` on the asm writer.
Emitting the full space avoids the need for ptxas to infer it, improving
readability and, more importantly, preventing potential bugs if valid LLVM
IR transformations were to move a load from ADDRESS_SPACE_ENTRY_PARAM
into a device function.
AMDGPU/GlobalISel: RegBankLegalize rules for pops_exiting_wave_id (#187778)
Merge the rule with groupstaticsize and switch to the fast uniform rule,
since both of these intrinsics are uniform with no inputs.
[AMDGPU][GlobalISel][NFC] Change mbcnt test to use new-reg-bank-select (#187772)
The amdgcn_mbcnt_lo and amdgcn_mbcnt_hi intrinsics already have
RegBankLegalize rules but the test was not converted to use
new-reg-bank-select yet.
[libclc] Replace llvm-dis with llvm-nm in check-external-funcs.test (#187190)
llvm-nm is covered by extra_deps in runtime build when
LLVM_INCLUDE_TESTS is true.
[libc][docs][NFC] Restructure Getting Started guide and update Build Concepts. (#187701)
Restructured the Getting Started guide into a numbered step-by-step path
for easier readability. Added a Hello World verification step to confirm
build integrity after build completion.
Additionally, updated build_concepts.rst and the Getting Started guide
to clarify that Overlay Mode is intended for augmenting the system's C
library rather than incremental adoption.
[lldb] Support arm64e Objective-C signing in the expression evaluator (#187765)
When targeting arm64e, ISA pointers, class_ro_t pointers, and interface
selectors are signed in Objective-C. This PR adds support for that in
the expression evaluator.
[clang][AST] Fix assertion in `getFullyQualifiedType` for AutoType (#186105)
getFullyQualifiedType() asserts "Unhandled type node" when the input
QualType is an AutoType.
This was exposed by clang-repl's value printer:
```
clang-repl> namespace N { struct D {}; }
clang-repl> auto x = N::D(); x // asserts
```
Strip AutoType early before the type-specific handling.
(cherry picked from commit 86c4e96856a645a4015adf0e4d1a779e5662c6ca)
[MLIR][XeGPU] Enhance XeGPU lane layout to support "wrap-around" distribution (#186958)
This PR extends XeGPU lane layout to support wrap-around distribution,
enabling replication of lane-level tensor tiles across all lanes when
the tile size matches lane_data along a given dimension. Previously,
distribution required the tile size to exceed the number of lanes ×
lane_data for even partitioning.
This PR also refactors layout attribute interface functions:
computeDistributedShape() computes the distributed vector shape and is
shared by work-to-subgroup and subgroup-to-lane distribution, which
follow the same distribution rule (even or wrap-around).
computeStaticDistributedCoords() computes compile-time distributed
coordinates of sub-tiles per subgroup/lane. It is the compile-time
counterpart of computeDistributedCoords() and is used by
isCompatibleWith().
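The two distribution rules described above (even and wrap-around) can be sketched per dimension as follows. This is only an illustration under assumptions: `distributedShapeSketch` and its parameters are hypothetical names, the divisibility check stands in for whatever condition the real `computeDistributedShape()` uses, and nothing here is taken from the MLIR sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the per-dimension distribution rule; not the
// actual XeGPU implementation.
std::optional<std::vector<std::int64_t>>
distributedShapeSketch(const std::vector<std::int64_t> &shape,
                       const std::vector<std::int64_t> &laneLayout,
                       const std::vector<std::int64_t> &laneData) {
  assert(shape.size() == laneLayout.size() &&
         shape.size() == laneData.size());
  std::vector<std::int64_t> result;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    assert(laneLayout[i] > 0 && laneData[i] > 0);
    const std::int64_t span = laneLayout[i] * laneData[i];
    if (shape[i] % span == 0) {
      // Even distribution (assumed rule): each lane owns an equal slice.
      result.push_back(shape[i] / laneLayout[i]);
    } else if (shape[i] == laneData[i]) {
      // Wrap-around: the tile matches lane_data, so every lane holds a
      // replica of the same tile along this dimension.
      result.push_back(laneData[i]);
    } else {
      // Neither rule applies; the layout cannot distribute this shape.
      return std::nullopt;
    }
  }
  return result;
}
```

For instance, with a lane layout of `{1, 16}` and lane data of `{1, 1}`, a `{16, 16}` tile takes the even path in both dimensions, while a `{16, 1}` tile takes the wrap-around path in the second dimension.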
[lldb] Add mechanism for auto-loading Python scripts from pre-configured paths (#187031)
Depends on:
* https://github.com/llvm/llvm-project/pull/187229
(only second commit and onwards are relevant)
This patch implements the base infrastructure described in this [RFC re.
Moving libc++ data-formatters out of
LLDB](https://discourse.llvm.org/t/rfc-lldb-moving-libc-data-formatters-out-of-lldb/89591)
The intention is to provide vendors with a way to pre-configure a set of
paths that LLDB can automatically ingest formatter scripts from.
Three main changes:
1. Adds a CMake variable `LLDB_SAFE_AUTO_LOAD_PATHS` which is a
semicolon-separated list of paths. This is intended to be set by
vendors when building LLDB for distribution.
2. Adds a setting that only exists in asserts mode called
[28 lines not shown]
[compiler-rt] Add bitmask to fix warning (#187812)
After #186881 was merged, the gcc libc bots started complaining about the
conversion from u8 to a 2-bit integer being unsafe (see:
https://lab.llvm.org/buildbot/#/builders/131/builds/42788). This PR
adds a bitmask that fixes the warning.
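A minimal sketch of this kind of fix, assuming a 2-bit bit-field assigned from an 8-bit value. `FlagsSketch` and `setKind` are hypothetical names for illustration and are not the compiler-rt code touched by this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record with a 2-bit field; not the actual compiler-rt type.
struct FlagsSketch {
  std::uint8_t kind : 2;
  std::uint8_t rest : 6;
};

// Store only the low two bits of `value`. The explicit mask makes the
// truncation visible to the reader and to the compiler, which is a common
// way to address narrowing-conversion warnings on bit-field assignments.
inline void setKind(FlagsSketch &f, std::uint8_t value) {
  f.kind = value & 0x3u;
}
```

Values wider than two bits are truncated to their low two bits rather than rejected, so a caller that needs range checking would have to validate before calling.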
[Runtimes] Fix Unix Makefiles race between runtimes-build and EXTRA_TARGETS (#187634)
In our downstream, a non-runtime target depends on a libclc EXTRA_TARGET,
and we observe a race condition in parallel builds: both runtimes-build
(full build, no lock) and the libclc EXTRA_TARGET (triggered by the
non-runtime target, FileLock) build concurrently, leading to a corrupt
libclc library.
This exposes a limitation in the ExternalProject EXTRA_TARGET design:
EXTRA_TARGETS in llvm_ExternalProject_Add only depend on
${name}-configure, not ${name}-build. This makes EXTRA_TARGETS unsafe as
dependencies of a non-runtime target.
Fix: Add a locked BUILD_COMMAND to ExternalProject_Add for the Unix
Makefiles generator, using the same cmake.lock as EXTRA_TARGETS. This
serializes runtimes-build with all EXTRA_TARGETS under one lock.
With this PR, a non-runtime target can depend on a specific
EXTRA_TARGET, rather than needing to depend on the umbrella runtimes
[9 lines not shown]
[AMDGPU][SIInsertWaitcnts][NFC] SGPRInfo: Move score selection logic closer (#186518)
Selecting the score in SGPRInfo used to require an index obtained by
calling getSgprScoresIdx(), which is defined in a different class.
This patch moves the score selection logic into SGPRInfo, which makes
the interface simpler and more intuitive.
Since SGPRInfo contains only two scores, this patch also replaces the
score array with individual score variables.
Should be NFC.
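The shape of this refactoring can be sketched as below. All names are hypothetical (`CounterKindSketch`, `SgprInfoSketch`, `ScoreA`, `ScoreB`); the actual counter kinds and member names in SIInsertWaitcnts are not shown in this message and are not reproduced here.

```cpp
#include <cassert>

// Hypothetical counter kinds standing in for the two scores SGPRInfo tracks.
enum class CounterKindSketch { A, B };

// Sketch: two named score members replace a two-element array, and the
// selection logic lives inside the struct instead of requiring an index
// computed by another class.
struct SgprInfoSketch {
  unsigned ScoreA = 0;
  unsigned ScoreB = 0;

  // Return a mutable reference to the score for the given counter kind.
  unsigned &score(CounterKindSketch K) {
    return K == CounterKindSketch::A ? ScoreA : ScoreB;
  }

  // Read-only overload for const contexts.
  unsigned score(CounterKindSketch K) const {
    return K == CounterKindSketch::A ? ScoreA : ScoreB;
  }
};
```

Callers then write `Info.score(Kind)` directly, with no separate index lookup.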
[libc] Fix function prototypes for <threads.h> C11 header. (#187808)
Fix return types and/or function arguments of several functions:
* mtx_destroy
* tss_delete
* thrd_exit