[CIR] Fix reference alignment to use pointee type (#186667)
getNaturalTypeAlignment on a reference type returned pointer alignment
instead of pointee alignment. Pass the pointee type with
forPointeeType=true to match traditional codegen's
getNaturalPointeeTypeAlignment behavior. Fix applies to both argument
and return type attribute construction paths.
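The distinction between pointer and pointee alignment can be illustrated in plain C++. This is a minimal sketch and not the CIR code itself; the type `Wide` and the helpers `pointerAlign` and `pointeeAlign` are hypothetical names introduced here for illustration only.

```cpp
#include <cstddef>

// Hypothetical over-aligned type, used only for illustration.
struct alignas(64) Wide {
  char bytes[64];
};

// Alignment of the pointer object itself; this is the kind of value the
// commit describes as being wrongly reported for a reference type.
constexpr std::size_t pointerAlign() { return alignof(Wide *); }

// Alignment of the referenced object; this is what an alignment attribute
// on a reference parameter or return value is meant to describe.
constexpr std::size_t pointeeAlign() { return alignof(Wide); }

// In standard C++, alignof applied to a reference type yields the
// alignment of the referenced type, not of a pointer.
static_assert(alignof(Wide &) == alignof(Wide),
              "a reference reports the alignment of the referenced type");
```

On common targets a pointer is aligned to 4 or 8 bytes, so the two helpers are expected to disagree for `Wide`.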
InstCombine: Fold out nanless canonicalize pattern
Pattern match a wrapper around llvm.canonicalize which
weakens the semantics to not require quieting signaling
NaNs. Depending on the denormal mode and FP type, we can
either drop the pattern entirely or reduce it only to
a canonicalize call. I'm inventing this pattern to deal
with LLVM's lax canonicalization model in math library
code.
The math library code currently has explicit checks for
the denormal mode, and conditionally canonicalizes the
result if there is flushing. Semantically, this could be
directly replaced with a simple call to llvm.canonicalize,
but doing so would incur an additional cost when using
standard IEEE behavior. If we do not care about quieting
a signaling NaN, this should be a no-op unless the denormal
mode may flush. This will allow replacement of the
conditional code with a zero cost abstraction utility
[17 lines not shown]
[OpenMP][flang] Fix crash in host offload
Guard `getGridValue` in `OMPIRBuilder` to avoid reaching the
`unreachable` in `getGridValue` when offloading to host device without
an explicit num_threads clause.
Reproducer (`-fopenmp -fopenmp-targets=x86_64-unknown-linux-gnu`):
```
program test
implicit none
!$omp target
!$omp end target
end program test
```
(Note: the linker still fails, but that's another issue.)
[clang][AST] Preserve qualifiers in getFullyQualifiedType for AutoType (#187717)
A previous change (86c4e96) did not preserve qualifiers attached to the
AutoType QualType when the type was deduced.
For an AutoType after `getDeducedType()`, qualifiers from the original
QualType were dropped. Preserve and reapply them to the deduced type.
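As a language-level illustration of why the qualifiers matter, the sketch below shows a deduced `auto` variable whose declared type carries a `const` that is not part of the deduced type itself. The commit does not say which qualifiers or which declaration triggered the issue, so the `const auto` declaration and the helper `keepsConst` are assumptions made here for illustration.

```cpp
#include <type_traits>

namespace N { struct D {}; }

// `auto` deduces to N::D; the const qualifier comes from the declaration
// and sits on top of the deduced type.
const auto x = N::D();

// The fully qualified spelling of x's type is expected to keep the
// qualifier, i.e. `const N::D` rather than just `N::D`.
static_assert(std::is_same<decltype(x), const N::D>::value,
              "the declared type of x keeps its const qualifier");

// Hypothetical helper: reports whether the declared type is const.
constexpr bool keepsConst() {
  return std::is_const<decltype(x)>::value;
}
```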
[mlir][SPIRV] Add alignment calculation to support `PhysicalStorageBuffer` with vector types (#187698)
This allows lowering `memref.load`/`store` operations on
`PhysicalStorageBuffer`-typed resources with the underlying type being a
vector type. This improves support for the `PhysicalStorageBuffer`
capability in pipelines that use the Vector dialect for distribution.
Signed-off-by: Artem Gindinson <gindinson at roofline.ai>
[clang-tidy] Speed up `bugprone-suspicious-semicolon` (#187558)
```txt
                  ---User Time---   --System Time--   --User+System--   ---Wall Time---   --- Name ---
Status quo:       0.4743 (100.0%)   0.3802 (100.0%)   0.8546 (100.0%)   0.8567 (100.0%)   bugprone-suspicious-semicolon
With this change: 0.0103 (100.0%)   0.0027 (100.0%)   0.0130 (100.0%)   0.0133 (100.0%)   bugprone-suspicious-semicolon
```
This continues the trend of one registered `anyOf` matcher being slower
than registering each of its matchers separately (see #178829 for a
previous example).
(This PR also changes the traversal mode, but I only saw a small speedup
from that. Most of it came from registering the matchers separately.)
This check wasn't super expensive to begin with, but the speedup is
still pretty nice.
[NFC][clang] Remove dead code in HandleCXXModuleDirective (#187737)
Remove the dead code in `Preprocessor::HandleCXXModuleDirective`.
Signed-off-by: yronglin <yronglin777 at gmail.com>
[NVPTX] Print param space sub-qualifiers where supported (#187350)
Print param space sub-qualifiers (`param::entry` and `param::func`) for
PTX 8.3+, as described in the [PTX ISA
docs](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parameter-state-space).
This requires threading the `MCSubtargetInfo` through the inst printer,
which is done by setting `PassSubtarget = 1` on the asm writer.
Emitting the full space avoids the need for ptxas to infer it, improving
readability and, more importantly, preventing potential bugs if valid LLVM
IR transformations were to move a load from ADDRESS_SPACE_ENTRY_PARAM
into a device function.
AMDGPU/GlobalISel: RegBankLegalize rules for pops_exiting_wave_id (#187778)
Merge the rule with groupstaticsize and switch to the fast uniform rule,
since both of these intrinsics are uniform with no inputs.
[AMDGPU][GlobalISel][NFC] Change mbcnt test to use new-reg-bank-select (#187772)
The amdgcn_mbcnt_lo and amdgcn_mbcnt_hi intrinsics already have
RegBankLegalize rules but the test was not converted to use
new-reg-bank-select yet.
[libclc] Replace llvm-dis with llvm-nm in check-external-funcs.test (#187190)
llvm-nm is covered by extra_deps in runtime build when
LLVM_INCLUDE_TESTS is true.
[libc][docs][NFC] Restructure Getting Started guide and update Build Concepts. (#187701)
Restructured the Getting Started guide into a numbered step-by-step path
for easier readability. Added a Hello World verification step to confirm
build integrity after build completion.
Additionally, updated build_concepts.rst and the Getting Started guide
to clarify that Overlay Mode is intended for augmenting the system's C
library rather than incremental adoption.
[lldb] Support arm64e Objective-C signing in the expression evaluator (#187765)
When targeting arm64e, ISA pointers, class_ro_t pointers, and interface
selectors are signed in Objective-C. This PR adds support for that in
the expression evaluator.
[clang][AST] Fix assertion in `getFullyQualifiedType` for AutoType (#186105)
getFullyQualifiedType() asserts "Unhandled type node" when the input
QualType is an AutoType.
This was exposed by clang-repl's value printer:
```
clang-repl> namespace N { struct D {}; }
clang-repl> auto x = N::D(); x // asserts
```
Strip AutoType early before the type-specific handling.
(cherry picked from commit 86c4e96856a645a4015adf0e4d1a779e5662c6ca)