[AMDGPU][ASAN] Move allocas to entry block in amdgpu-sw-lower-lds pass (#190772)
The `amdgpu-sw-lower-lds` pass inserts a workitem-0 check, malloc, and
barrier before the original entry block, creating a new entry block.
This pushes the original allocas into a non-entry block, causing LLVM to
treat them as dynamic allocas.
The AMDGPU backend generates incorrect flat addresses for addrspacecasts
of dynamic allocas at -O0, causing memory faults when ASan is enabled
with LDS.
This PR hoists constant-size allocas to the new entry block so they
remain static.
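The hoisting step can be illustrated with a toy model. This is only a sketch in Python, not the pass's actual C++ code: `Alloca`, `Block`, and `hoist_static_allocas` are hypothetical names, instructions are plain strings, and an alloca counts as "constant-size" when its size is an `int`.

```python
from dataclasses import dataclass, field


@dataclass
class Alloca:
    name: str
    size: object  # int models a constant size, str models a runtime value


@dataclass
class Block:
    name: str
    insts: list = field(default_factory=list)


def hoist_static_allocas(new_entry, old_entry):
    """Move constant-size allocas from old_entry to the front of new_entry.

    Allocas with a runtime size stay where they are (they are dynamic anyway).
    """
    hoisted, kept = [], []
    for inst in old_entry.insts:
        is_static = isinstance(inst, Alloca) and isinstance(inst.size, int)
        (hoisted if is_static else kept).append(inst)
    old_entry.insts = kept
    new_entry.insts = hoisted + new_entry.insts
    return hoisted


# New entry block as created by the lowering (placeholder instruction names).
new_entry = Block("entry", ["workitem0.check", "malloc", "barrier"])
# Original entry block, now a non-entry block.
old_entry = Block("orig.entry", [Alloca("a", 4), Alloca("b", "n"), "store"])

hoist_static_allocas(new_entry, old_entry)
```

After the call, `a` sits at the top of the new entry block, while the runtime-sized `b` stays in the original block.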
py-packaging: updated to 26.1
26.1 - 2026-04-14
Features:
* PEP 783: add handling for Emscripten wheel tags in
* PEP 803: add handling for the ``abi3.abi3t`` free-threading tag in
* PEP 735: add ``packaging.dependency_groups`` module, based on the ``dependency-groups`` package in
* Add the ``packaging.direct_url`` module in
* Add the ``packaging.errors`` module in
* Add ``SpecifierSet.is_unsatisfiable`` using ranges (new internals that will be expanded in future versions) in
* Add ``create_compatible_tags_selector`` to select compatible tags in
* Add a ``key`` argument to ``SpecifierSet.filter()`` in
* Support ``&`` and ``|`` for ``Marker``'s in
* Normalize ``Version.__replace__`` and add ``Version.from_parts`` in
* Add an option to validate compressed tag set sort order in ``parse_wheel_filename`` in
Behavior adaptations:
[70 lines not shown]
[AMDGPU] Report only local per-function resource usage when object linking is enabled
With object linking enabled, the linker aggregates resource usage across TUs
via `.amdgpu.info`, so compile-time pessimism and call-graph propagation
duplicate the linker's work or pollute its inputs.
In this mode, skip the per-callsite conservative bumps in
`AMDGPUResourceUsageAnalysis` and assign each resource symbol in
`AMDGPUMCResourceInfo` a concrete local constant instead of building call-graph
max/or expressions.
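The difference between the two modes can be sketched with a toy model. This is an illustration in Python, not the LLVM implementation: `FuncInfo`, `propagated_usage`, `local_usage`, and the `num_vgpr`/`uses_vcc` fields are assumed names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FuncInfo:
    num_vgpr: int        # illustrative register count
    uses_vcc: bool       # illustrative boolean resource flag
    callees: list = field(default_factory=list)


def propagated_usage(name, funcs, _seen=None):
    """Call-graph mode: max of counts and OR of flags over reachable callees."""
    seen = set() if _seen is None else _seen
    if name in seen:
        # Already accounted for; also guards against recursion cycles.
        return 0, False
    seen.add(name)
    info = funcs[name]
    vgpr, vcc = info.num_vgpr, info.uses_vcc
    for callee in info.callees:
        callee_vgpr, callee_vcc = propagated_usage(callee, funcs, seen)
        vgpr = max(vgpr, callee_vgpr)
        vcc = vcc or callee_vcc
    return vgpr, vcc


def local_usage(name, funcs):
    """Object-linking mode: a concrete local constant per resource."""
    info = funcs[name]
    return info.num_vgpr, info.uses_vcc


funcs = {
    "kernel": FuncInfo(num_vgpr=8, uses_vcc=False, callees=["helper"]),
    "helper": FuncInfo(num_vgpr=32, uses_vcc=True),
}
```

In this model, `propagated_usage("kernel", funcs)` folds in the helper's usage, whereas `local_usage("kernel", funcs)` reports only what the kernel itself uses and leaves aggregation to the linker.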