[GlobalOpt] Preserve Address Space when recreating GV (#171211)
Fix for GlobalOpt pass: preserve Address Space when recreating GV
This fix prevents dropping `addrspace(1)` in the following code snippet
(see modified LIT-test) for `@llvm.compiler.used`:
Before global-opt
```
@_ZM2C = internal addrspace(1) global %struct.FakeDeviceGlobal zeroinitializer, align 8
@_ZL1C = internal addrspace(1) global %struct.FakeDeviceGlobal zeroinitializer, align 8
@llvm.compiler.used = appending addrspace(1) global [2 x ptr addrspace(4)] [ptr addrspace(4) addrspacecast (ptr addrspace(1) @_ZM2C to ptr addrspace(4)), ptr addrspace(4) addrspacecast (ptr addrspace(1) @_ZL1C to ptr addrspace(4))]
```
After global-opt
```
@_ZM2C = internal addrspace(1) global %struct.FakeDeviceGlobal zeroinitializer, align 8
@_ZL1C = internal addrspace(1) global %struct.FakeDeviceGlobal zeroinitializer, align 8
[7 lines not shown]
[AArch64] Define apple-m5/a19 CPUs. (#171187)
A19 and M5 were released in fall 2025.
They add several features on top of M4/A18:
- MTE, CSSC, HBC
- SME2p1, SMEB16B16, SMEF16F16
- SPECRES2
This also bumps apple-latest to apple-m5.
AMDGPU/PromoteAlloca: Refactor into analysis / commit phases
This change is motivated by the overall goal of finding alternative ways
to promote allocas to VGPRs. The current solution is effectively limited
to allocas whose size matches a register class, and we can't keep adding
more register classes. We have some downstream work in this direction,
and I'm currently looking at cleaning that up to bring it upstream.
This refactor paves the way to adding a third way of promoting allocas,
on top of the existing alloca-to-vector and alloca-to-LDS. Much of the
analysis can be shared between the different promotion techniques.
Additionally, the idea behind splitting the pass into an analysis
phase and a commit phase is that it ought to make it easier to take
better "big picture" decisions in the future about which allocas to
promote and how.
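The analysis / commit split described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the pass's actual code: `AllocaInfo`, `Candidate`, `Strategy`, `analyze`, `commit`, and the size thresholds are all hypothetical names and values invented for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical promotion strategies (not the pass's real types).
enum class Strategy { None, Vector, LDS };

// Toy stand-in for an alloca: a name and a size in bytes.
struct AllocaInfo {
  std::string Name;
  uint32_t SizeInBytes;
};

// Output of the analysis phase: a recorded decision, no changes made yet.
struct Candidate {
  AllocaInfo Alloca;
  Strategy Choice;
};

// Analysis phase: look at every alloca and record what could be done.
// The thresholds are made up; the real pass uses different criteria.
std::vector<Candidate> analyze(const std::vector<AllocaInfo> &Allocas) {
  std::vector<Candidate> Result;
  for (const AllocaInfo &A : Allocas) {
    Strategy S = Strategy::None;
    if (A.SizeInBytes <= 64)
      S = Strategy::Vector;
    else if (A.SizeInBytes <= 1024)
      S = Strategy::LDS;
    Result.push_back({A, S});
  }
  return Result;
}

// Commit phase: apply the recorded decisions. Here "applying" just appends
// a description to Log; the return value is the number of promotions.
unsigned commit(const std::vector<Candidate> &Candidates,
                std::vector<std::string> &Log) {
  unsigned NumPromoted = 0;
  for (const Candidate &C : Candidates) {
    if (C.Choice == Strategy::None)
      continue;
    Log.push_back(C.Alloca.Name +
                  (C.Choice == Strategy::Vector ? " -> vector" : " -> LDS"));
    ++NumPromoted;
  }
  return NumPromoted;
}
```

Because `analyze` returns the full candidate list before anything is changed, a caller could reorder or filter the candidates between the two phases, which is where a "big picture" decision would fit.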
commit-id:138f5985
[flang][OpenMP] Store list of expressions in InitializerT (#170923)
The INITIALIZER clause holds a stylized expression that can be
instantiated with different types. Currently, the InitializerT class
only holds one expression, which happens to correspond to the first type
in the DECLARE_REDUCTION type list.
Change InitializerT to hold a list of expressions instead, one for each
type. Keep the lowering code unchanged by picking the first expression
from the list.
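The shape of this change can be sketched with a toy model. The types below are simplified stand-ins and not the real flang definitions: `Expr`, the member name `v`, and the helper `pickForLowering` are assumptions made for illustration only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for an expression; the real type is far richer.
struct Expr {
  std::string Text;
};

// Sketch of an initializer holding one expression per type in the
// DECLARE_REDUCTION type list, instead of a single expression.
struct InitializerT {
  std::vector<Expr> v;
};

// Hypothetical helper mirroring "keep the lowering code unchanged by
// picking the first expression from the list".
const Expr &pickForLowering(const InitializerT &Init) {
  assert(!Init.v.empty() && "expected at least one initializer expression");
  return Init.v.front();
}
```

With two types in the list, `v` would hold two expressions, and lowering would still see only the first one, matching the behavior before the change.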
[libunwind] Make sure libunwind test dependencies are installed before running tests (#171474)
This patch adds an installation step where we install libc++ in a fake
installation tree before testing libunwind. This is necessary because
some configurations (in particular "generic-merged") require libc++ to
be installed, since the libunwind tests are actually linking libc++.so
in which libc++abi.a and libunwind.a have been merged.
Without this, we were actually failing to find `libc++.so` to link
against and then linking against whatever system library we'd find in
the provided search directories. While this happens to work in the
current CI configuration, this breaks down when updating to newer build
tools.
[lldb] convert jit-loader_rtdyld_elf.test to an API test (#170333)
This patch converts the `jit-loader_rtdyld_elf.test` test from a Shell
test to an API test.
This test is timing out in CI on Windows, and the hang cannot be
reproduced locally. Converting it to an API test allows us to
instrument it better in order to trace the failure.
[X86] Use vectorized i256 bit counts when we know the source originated from the vector unit (#171589)
Currently we only permit AVX512 lowering of i256 CTTZ/CTLZ when the
source is loadable, since GPR->FPU transition costs would otherwise
outweigh the vectorization benefit.
This patch checks for other cases where the source can avoid the GPR: a
mayFoldToVector helper checks for a bitcast originally from a vector
type, as well as constant values and the original mayFoldLoad check.
There will be other cases for the mayFoldToVector helper, but I've just
used this for CTTZ/CTLZ initially.
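The three cases the helper covers can be summarized with a toy predicate. This is a sketch only and does not use the real SelectionDAG API: `SourceKind` and `mayFoldToVectorSketch` are hypothetical names, and the real helper inspects DAG nodes rather than an enum.

```cpp
// Hypothetical classification of where an i256 value comes from.
enum class SourceKind {
  Load,              // value can be loaded directly (the original check)
  Constant,          // constant value
  BitcastFromVector, // bitcast of a value that was originally a vector
  Other              // anything else, assumed to live in GPRs
};

// Returns true when the source is assumed to reach the vector unit
// without a GPR->FPU transfer, so a vectorized bit count could pay off.
bool mayFoldToVectorSketch(SourceKind Kind) {
  switch (Kind) {
  case SourceKind::Load:
  case SourceKind::Constant:
  case SourceKind::BitcastFromVector:
    return true;
  case SourceKind::Other:
    return false;
  }
  return false;
}
```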
[AArch64] support `.arch_extension` for features that the CLI also accepts (#169999)
Fixes https://github.com/llvm/llvm-project/issues/146866
The CLI and `.arch_extension` use a different list of features, and some
features that the CLI supports cannot currently be toggled using
`.arch_extension`. This PR fixes that, adding support for
`.arch_extension` for the following features:
- `dit`
- `brbe`
- `bti`
- `fcma`
- `jscvt`
- `pauth`
- `ssve`
- `wfxt`
The issue discusses that it is unfortunate that command line flag
[7 lines not shown]