[AMDGPU][docs] Remove abandoned augmentation-related changes (#204420)
These haven't been carried forward in the DWARF committee proposal, and
we don't expect them to be standardized (at least in the form presented
here). Drop them to avoid confusion.
Change-Id: I60dd6ffb5df1bb63d132733466ecf3d697f79276
[VPlan] Narrow interleave groups with distinct live-in operands. (#203778)
Extend narrowInterleaveGroups so bundles with live-ins can be narrowed
by using BuildVector for the operands.
This only applies to fixed VFs: for scalable VFs the number of original
iterations processed by the narrowed plan depends on vscale, so a fixed
per-field vector cannot be built.
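As a rough illustration, the kind of source loop this could apply to stores distinct loop-invariant values into the interleaved fields of a struct. This is an assumption for illustration only: the `Pair` type and `fill_pairs` function are hypothetical and not taken from the patch or its tests.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical two-field record; consecutive elements interleave x and y.
struct Pair {
  int x;
  int y;
};

// p and q are loop-invariant (live-in) and may differ. The two stores
// form an interleaved store group with one member per field. With a
// fixed VF, a single vector {p, q} covers one original iteration.
void fill_pairs(Pair *out, std::size_t n, int p, int q) {
  assert(out != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i) {
    out[i].x = p;
    out[i].y = q;
  }
}
```

Whether the vectorizer actually narrows such a loop depends on the target, the cost model, and the chosen VF.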
Another missing piece for
https://github.com/llvm/llvm-project/issues/128062
On a large IR corpus based on C/C++ workloads (32k modules), this
triggers in ~38 modules.
PR: https://github.com/llvm/llvm-project/pull/203778
[analyzer] Bring unix.cstring.UninitializedRead checker out of alpha (#196292)
There have been recent improvements (#186802) and fixes (#191061)
related to this checker. The reports are no longer noisy, as evaluated
on 14 OS projects.
---------
Co-authored-by: Donát Nagy <donat.nagy at ericsson.com>
[RISCV][P-ext] Fold (PSRL/PSRA (concat (trunc (PSRL X, C1)), (trunc (PSRL Y, C1))), C2). (#204659)
into (concat (trunc (PSRL/PSRA X, C1+C2)), (trunc (PSRL/PSRA Y,
C1+C2))), if C1 is equal to the number of bits discarded by the truncate.
We recently added this for a single truncate. This expands it to
concatenated truncates.
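The identity behind the fold can be checked per element with scalar arithmetic. The sketch below assumes 32-bit elements truncated to 16 bits, so C1 = 16. All function names are hypothetical and model the semantics only, not the RISC-V P-extension nodes themselves.

```cpp
#include <cassert>
#include <cstdint>

// Portable arithmetic shift right for 32-bit values (0 <= s < 32).
int32_t sra32(int32_t v, unsigned s) {
  return v < 0 ? ~(~v >> s) : (v >> s);
}

// trunc16(srl(x, 16)): the high half of x.
uint16_t high_half(uint32_t x) { return static_cast<uint16_t>(x >> 16); }

// Logical form: srl(trunc16(srl(x, 16)), c2) == trunc16(srl(x, 16 + c2)).
bool srl_fold_holds(uint32_t x, unsigned c2) {
  assert(c2 < 16);
  const uint16_t before = static_cast<uint16_t>(high_half(x) >> c2);
  const uint16_t after = static_cast<uint16_t>(x >> (16 + c2));
  return before == after;
}

// Arithmetic form: sra(trunc16(srl(x, 16)), c2) == trunc16(sra(x, 16 + c2)).
bool sra_fold_holds(int32_t x, unsigned c2) {
  assert(c2 < 16);
  const int hi = high_half(static_cast<uint32_t>(x));
  const int32_t t = (hi ^ 0x8000) - 0x8000; // sign-extend the 16-bit value
  const uint16_t before = static_cast<uint16_t>(sra32(t, c2));
  const uint16_t after = static_cast<uint16_t>(sra32(x, 16 + c2));
  return before == after;
}
```

The arithmetic case works because the sign bit of the truncated value is the sign bit of the original element when C1 equals the number of discarded bits.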
Assisted-by: Claude
[clang][Mach-O] Add an option to force UNWIND_*_MODE_DWARF compact unwind info (#204005)
The new value extends the existing option: `-femit-dwarf-unwind=dwarf-only`. This is
primarily intended as a testing mechanism to ensure coverage on the
DWARF-only parts of the unwinder, where previously the compact unwinder
would have taken care of most functions.
[clang-format] Fix crash on malformed operator input (#199098)
Fixes the remaining clang-format crash case after #199100 landed.
The problematic input is:
```cpp
{ operator } a
```
When annotating `operator`, clang-format should stop scanning at `}`
instead of consuming it and disturbing brace scope tracking. This change
also adds a no-crash regression test.
[AMDGPU] Introduce WMMACoexecutionHazards target feature (#204654)
gfx1250, gfx1251 and gfx12-5-generic have this feature, but gfx1310
does not have it.
[clang-format] Fix annotation of alternative operator and (#199112)
I now annotate `and` as TT_BinaryOperator before the pointer/reference
heuristic. I left `bitand` alone since, like `&`, it can still be a
reference.
Fixes #199027.
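For readers unfamiliar with alternative tokens, here is a small sketch of the two spellings discussed above. The function names are hypothetical; the code only shows why `bitand` is ambiguous for a formatter in the same way `&` is.

```cpp
#include <cassert>

// `and` used as a binary operator, the case this change annotates.
bool both(bool a, bool b) { return a and b; }

// `bitand` used as a binary operator, equivalent to `a & b`.
int mask(int a, int b) { return a bitand b; }

// `bitand` used as a reference declarator, equivalent to `int &x`.
void bump(int bitand x) { x += 1; }

// Small self-check exercising all three uses.
void self_check() {
  int v = 41;
  bump(v);
  assert(v == 42);
  assert(both(true, true) and not both(true, false));
  assert(mask(6, 3) == 2);
}
```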
[mlir][OpenMP] Translate reductions on taskloop (#199670)
This patch adds LLVM IR translation for `reduction` and `in_reduction`
clauses on `omp.taskloop.context`.
For `taskloop reduction`, the lowering emits the implicit taskgroup
reduction setup, builds the task-reduction descriptor array, and maps
each generated task to runtime-provided private reduction storage
through `__kmpc_task_reduction_get_th_data`.
For `taskloop in_reduction`, the lowering uses the same runtime lookup
path with a null descriptor, allowing the runtime to find the enclosing
task-reduction context.
Unsupported byref, cleanup-region, and two-argument initializer forms
remain diagnosed.
### Stack / review order
[18 lines not shown]
[AMDGPU] Fix 64->32 bit division corner case (#204469)
Do not implement 64-bit signed division with 32-bit division if operands
are only constrained to a 32-bit signed range.
-2147483648/-1 != -2147483648/1, but their lower 32 bits are identical.
32-bit division cannot generate the correct result for both sets of
operands. Only use 32-bit division if operands are constrained to a
31-bit signed range.
Bug appears in both AMDGPUCodeGenPrepare.cpp and AMDGPUISelLowering.cpp.
Tested in https://github.com/llvm/llvm-test-suite/pull/428.
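The corner case can be reproduced with plain integer arithmetic. The sketch below is a model of the narrowing, not the AMDGPU code itself: `fits_signed`, `narrowed_sdiv`, and `narrowing_is_exact` are hypothetical helpers, and the wrapping 32-bit result is modeled by hand because `INT32_MIN / -1` is undefined behavior in C++.

```cpp
#include <cassert>
#include <cstdint>

// True if v is representable as a signed integer of the given bit width.
bool fits_signed(int64_t v, unsigned bits) {
  assert(bits >= 1 && bits <= 63);
  const int64_t lim = int64_t{1} << (bits - 1);
  return v >= -lim && v < lim;
}

// Model of "perform the 64-bit signed division in 32 bits": truncate the
// operands, divide, and sign-extend the quotient.
int64_t narrowed_sdiv(int64_t a, int64_t b) {
  assert(fits_signed(a, 32) && fits_signed(b, 32) && b != 0);
  const int32_t a32 = static_cast<int32_t>(a);
  const int32_t b32 = static_cast<int32_t>(b);
  if (a32 == INT32_MIN && b32 == -1)
    return INT32_MIN; // the 32-bit quotient wraps
  return a32 / b32;
}

// True if the narrowed division reproduces the full 64-bit quotient.
bool narrowing_is_exact(int64_t a, int64_t b) {
  return narrowed_sdiv(a, b) == a / b;
}
```

With operands limited to a 31-bit signed range, the most negative dividend is -1073741824, whose negation still fits in 32 bits, so the wrap cannot occur.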
---------
Signed-off-by: John Lu <John.Lu at amd.com>
[DWARFLinker] Fix data race on the global parallel strategy (#204642)
DWARFLinkerImpl::link() assigned the process-global
llvm::parallel::strategy on entry. dsymutil runs link() concurrently,
one call per architecture of a universal binary, so those assignments
race. An inconsistent strategy can route per-compile-unit cloning onto a
thread that is not an llvm::parallel ThreadPoolExecutor worker, where
the per-thread allocators call getThreadIndex().
This manifested itself as an assertion failure, but otherwise results in
an out-of-bounds access.
```
Assertion failed: ((threadIndex != UINT_MAX) && "getThreadIndex() must be called from a thread created by " "ThreadPoolExecutor"), function getThreadIndex, file Parallel.h, line 51.
```
The assert is non-deterministic and needs more than one architecture to
reproduce.
[5 lines not shown]
[GlobalISel] Add `or_and_xor_to_or` pattern from SelectionDAG (#204614)
PR #201108 was merged and then reverted due to a failing test. This PR
fixes the tests that failed.
[lldb] Don't enable Objective-C in expressions on unsupported formats (#204639)
Evaluating any expression against a WebAssembly target aborted LLDB:
```
(lldb) expr (int)sizeof(Point)
LLVM ERROR: Objective-C support is unimplemented for object file format
```
WebAssembly can't JIT expressions (RuntimeDyld doesn't support the Wasm
object format, so ProcessWasm sets CanJIT to false), but it can handle
simple expressions that can be IR interpreted.
When setting up the expression's language options, LLDB speculatively
enables Objective-C, which triggers the fatal error because Objective-C code
generation only supports Mach-O, ELF, and COFF.
Add ObjCLanguageRuntime::IsSupportedForArchitecture and disable
Objective-C in the expression's language options when the target's
[2 lines not shown]