[FileCheck] Add a diff output option for FileCheck (#187120)
This patch adds a `--diff` flag to FileCheck to improve the readability
of traditional FileCheck output, which can be difficult for humans to
parse, especially when dealing with multiple substitutions or large
input files with many mismatches and additional context. This feature
provides a more familiar, scannable format for developers by rendering
mismatches as diffs.
There are two diff modes, split and unified, each with a substitution
and a no-substitution variant. To make review easier, this patch only
implements unified mode with no substitution.
Functional description of the PR:
`getDiffContext` -
It provides the surrounding context for the mismatch. It uses the
`SourceMgr` to find the line using `LineNo`. Once it finds the pointer,
it scans forward until it hits a newline (`\n` or `\r`) or the end of the
[67 lines not shown]
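The line lookup described for `getDiffContext` can be sketched roughly as follows. This is a minimal standalone illustration, not the patch's code: `lineAt` is a hypothetical helper that works on a `std::string_view` instead of `SourceMgr`, and it assumes 1-based line numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not from the patch): return the text of the 1-based
// line `lineNo` in `buffer`, without its line terminator. Returns an empty
// view if the buffer has fewer lines than requested.
std::string_view lineAt(std::string_view buffer, std::size_t lineNo) {
  assert(lineNo >= 1 && "line numbers are assumed to be 1-based");
  std::size_t pos = 0;
  // Skip the lineNo - 1 lines that precede the requested one.
  for (std::size_t i = 1; i < lineNo; ++i) {
    std::size_t nl = buffer.find('\n', pos);
    if (nl == std::string_view::npos)
      return {};
    pos = nl + 1;
  }
  // Scan forward until a newline, a carriage return, or the end of the buffer.
  std::size_t end = buffer.find_first_of("\r\n", pos);
  if (end == std::string_view::npos)
    end = buffer.size();
  return buffer.substr(pos, end - pos);
}
```

Stopping at either `\r` or `\n` keeps the terminator out of the returned text for both Unix and Windows line endings.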
[OFFLOAD] Introduce OpenMP math wrappers for SPIRV backend (#192139)
This PR is a first step toward introducing OpenMP math wrappers for the
SPIRV backend.
As a first step, only APIs that either map to an existing SPIRV API or
have a straightforward implementation are introduced.
[clang] Improve diagnostics for `__builtin_align` builtins with floating/member pointer operands (#192650)
Improve diagnostics for `__builtin_align_up`, `__builtin_align_down`,
and `__builtin_is_aligned` when the first operand has an invalid type.
Clang already emits `err_typecheck_expect_scalar_operand` for
unsupported operands, but the message is generic. This patch adds
follow-up notes to clarify three common invalid cases:
* floating point operands ("floating point types are not allowed here")
* C++ member pointer operands ("member pointers are not allowed here")
* function pointer operands ("function pointers are not allowed here")
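To make the three rejected operand kinds concrete, here is a small sketch of valid integer usage next to the invalid forms, which are left in comments so the example still compiles. The wrapper names `alignUp` and `isAligned` are hypothetical, and the fallback branch exists only so the snippet builds on compilers without these builtins.

```cpp
#include <cassert>
#include <cstdint>

// Detect the Clang builtins; other compilers take the portable fallback.
#if defined(__has_builtin)
#if __has_builtin(__builtin_align_up) && __has_builtin(__builtin_is_aligned)
#define HAVE_ALIGN_BUILTINS 1
#endif
#endif

// Hypothetical wrapper: round `value` up to a multiple of `alignment`,
// which must be a power of two.
inline std::uint64_t alignUp(std::uint64_t value, std::uint64_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#ifdef HAVE_ALIGN_BUILTINS
  return __builtin_align_up(value, alignment);
#else
  return (value + alignment - 1) & ~(alignment - 1);
#endif
}

// Hypothetical wrapper: true if `value` is a multiple of `alignment`.
inline bool isAligned(std::uint64_t value, std::uint64_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#ifdef HAVE_ALIGN_BUILTINS
  return __builtin_is_aligned(value, alignment);
#else
  return (value & (alignment - 1)) == 0;
#endif
}

// First operands of the kinds listed above, which Clang rejects
// (`S` and `someFunction` are placeholder names):
//   __builtin_align_up(1.5f, 8);             // floating point operand
//   __builtin_is_aligned(&S::member, 8);     // C++ member pointer operand
//   __builtin_align_down(&someFunction, 8);  // function pointer operand
```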
[LoongArch] Select `V{AND,OR,XOR,NOR}I.B` for bitwise with byte splat immediates (#192217)
The `V{AND,OR,XOR,NOR}I.B` instructions operate on byte elements and
accept an 8-bit immediate. However, when the same byte splat constant is
used with wider vector element types (e.g. v8i16, v4i32, v2i64),
instruction selection currently falls back to materializing the constant
in a temporary register:
```
vrepli.b -1
vxor.v
```
even though the immediate form is available:
```
vxori.b 255
```
[11 lines not shown]
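The notion of a byte splat constant in a wider element can be illustrated with a small check. This is a sketch under assumptions, not the selection code from the patch: `byteSplat` is a hypothetical helper operating on a plain integer rather than on LLVM's constant types.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper (not from the patch): if every byte in the low `bits`
// bits of `value` is identical, return that byte, which could then serve as
// the 8-bit immediate of a byte-element instruction.
std::optional<std::uint8_t> byteSplat(std::uint64_t value, unsigned bits) {
  assert(bits >= 8 && bits <= 64 && bits % 8 == 0);
  const std::uint8_t first = static_cast<std::uint8_t>(value & 0xFF);
  for (unsigned shift = 8; shift < bits; shift += 8) {
    if (static_cast<std::uint8_t>((value >> shift) & 0xFF) != first)
      return std::nullopt;
  }
  return first;
}
```

For example, a 16-bit element holding `0xFFFF` is a splat of the byte `0xFF`, while `0x00FF` is not a byte splat.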
Revert "[lldb] Rally around triple rather than arch in the API tests (#191416)" (#192763)
Temporarily reverting while we look at the TestMacCatalyst.py and
TestRosetta.py failures introduced by this PR, to unblock the CI.
This reverts commit 86397f49c7725f35a51517a8290cb4207c97771d.
[BPF] Handle aliases in CodeGenModule::EmitExternalDeclaration. Fixes #192365 (#192374)
Adds handling of global aliases in
CodeGenModule::EmitExternalDeclaration. This fixes a clang crash on some
real code, see llvm#192365.
[lldb] Store the dummy target in the selected execution context (#190496)
Store the dummy target in the selected execution context. There's no
reason for everybody to have to independently fall back to the dummy
target.
[mlir][tensor] Remove unit-stride restriction in InsertSliceOp folding (#192600)
This PR replaces manual offset/size resolution with `affine::mergeOffsetsSizesAndStrides`, simplifying the code and extending subview-of-subview folding to support non-unit strides.
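The arithmetic behind folding a slice of a slice with non-unit strides can be sketched for one dimension with static values. This only illustrates the composition rule; it is not the MLIR utility itself, and `Slice1D` and `compose` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// One-dimensional slice description: `size` elements starting at `offset`,
// taking every `stride`-th element of the source.
struct Slice1D {
  std::int64_t offset;
  std::int64_t size;
  std::int64_t stride;
};

// Compose `inner` (a slice of the result of `outer`) into a single slice of
// the original source. Element i of `inner` is element
// inner.offset + i * inner.stride of `outer`, which is source index
// outer.offset + (inner.offset + i * inner.stride) * outer.stride.
Slice1D compose(Slice1D outer, Slice1D inner) {
  assert(inner.size == 0 ||
         inner.offset + (inner.size - 1) * inner.stride < outer.size);
  return {outer.offset + inner.offset * outer.stride, inner.size,
          outer.stride * inner.stride};
}
```

With unit strides this reduces to adding the two offsets. With other strides, the inner offset must be scaled by the outer stride and the strides multiply.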
[UBSan][test] Make aggregate alignment test precise for Darwin
Darwin adds an alignment check on dest, which was causing the test to fail.
rdar://120802910
[NewPM] Port AArch64RedundantCondBranch to the new pass manager (#190897)
Adds a new pass manager pass for AArch64RedundantCondBranch:
- Refactors base logic into an Impl class
- Renames old pass with the "Legacy" suffix
- Adds the new pass manager pass using refactored logic
- Updates existing .mir tests to also test with the new pass manager.
Context and motivation in
https://llvm.org/docs/NewPassManager.html#status-of-the-new-and-legacy-pass-managers
[NVPTX] Constant fold blockDim when reqntid is specified (#191575)
Currently, NVPTX cannot fold the `ntid.x/y/z` intrinsic calls into constant
values when `reqntid` is specified, which prevents further optimization
of the code.
Therefore, this change extends the `NVVMIntrRange` pass to:
- Tighten `ntid.x/y/z` intrinsic calls to a single-value range, which can be
constant folded by a later InstCombine pass
- Tighten `tid.x/y/z` range attributes to use per-dimension `reqntid`
bounds
- Treat a `.reqntid` that exceeds hardware limits as garbage-in/garbage-out
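The range tightening listed above can be sketched per dimension as follows. This is one reading of the description, not the pass's code: `Range`, `ntidRange`, `tidRange`, and `isSingleValue` are hypothetical names, and ranges are assumed to be half-open.

```cpp
#include <cassert>
#include <cstdint>

// Half-open value range [lo, hi).
struct Range {
  std::uint32_t lo;
  std::uint32_t hi;
};

// With a required block size in one dimension, the block dimension read by
// `ntid` can only be that value, so its range holds a single element.
Range ntidRange(std::uint32_t reqntid) {
  assert(reqntid >= 1 && "a block dimension is at least 1");
  return {reqntid, reqntid + 1};
}

// The thread index in the same dimension lies in [0, reqntid).
Range tidRange(std::uint32_t reqntid) {
  assert(reqntid >= 1 && "a block dimension is at least 1");
  return {0, reqntid};
}

// A range with exactly one element can be replaced by a constant.
bool isSingleValue(Range r) { return r.hi - r.lo == 1; }
```

For a required size of 128 in x, this gives `[128, 129)` for `ntid.x` and `[0, 128)` for `tid.x`. Only the former is a single-value range, so only it is a candidate for constant folding.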
[AMDGPU] Report only local per-function resource usage when object linking is enabled
With object linking, the linker aggregates resource usage across TUs via
`.amdgpu.info`, so compile-time pessimism and call-graph propagation duplicate
the linker's work or pollute its inputs.
In this mode, skip the per-callsite conservative bumps in
`AMDGPUResourceUsageAnalysis` and assign each resource symbol in
`AMDGPUMCResourceInfo` a concrete local constant instead of building call-graph
max/or expressions.
[AMDGPU] Add `.amdgpu.info` section for per-function metadata
AMDGPU object linking requires the linker to propagate resource usage
(registers, stack, LDS) across translation units. To support this, the compiler
must emit per-function metadata and call graph edges in the relocatable object
so the linker can compute whole-program resource requirements.
This PR introduces a `.amdgpu.info` ELF section using a tagged, length-prefixed
binary format. Each entry is encoded as:
```
[kind: u8] [len: u8] [payload: <len> bytes]
```
A function scope is opened by an `INFO_FUNC` entry (containing a symbol
reference), followed by per-function attributes (register counts, flags, private
segment size) and relational edges (direct calls, LDS uses, indirect call
signatures). String data such as function type signatures is stored in a
companion `.amdgpu.strtab` section.
[4 lines not shown]
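A tagged, length-prefixed stream of the `[kind: u8] [len: u8] [payload]` shape can be written and read back with very little code. The sketch below is generic and makes no claim about the actual kind values or payload layouts of `.amdgpu.info`; `Entry`, `appendEntry`, and `parseEntries` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One decoded entry: a kind tag and its raw payload bytes.
struct Entry {
  std::uint8_t kind;
  std::vector<std::uint8_t> payload;
};

// Append one entry as [kind: u8] [len: u8] [payload: len bytes].
void appendEntry(std::vector<std::uint8_t> &out, std::uint8_t kind,
                 const std::vector<std::uint8_t> &payload) {
  assert(payload.size() <= 255 && "length must fit in one byte");
  out.push_back(kind);
  out.push_back(static_cast<std::uint8_t>(payload.size()));
  out.insert(out.end(), payload.begin(), payload.end());
}

// Decode all entries in `in`. Returns false if the stream is truncated.
bool parseEntries(const std::vector<std::uint8_t> &in,
                  std::vector<Entry> &entries) {
  std::size_t pos = 0;
  while (pos < in.size()) {
    if (in.size() - pos < 2)
      return false; // Header is incomplete.
    const std::uint8_t kind = in[pos];
    const std::uint8_t len = in[pos + 1];
    pos += 2;
    if (in.size() - pos < len)
      return false; // Payload is incomplete.
    const auto first = in.begin() + static_cast<std::ptrdiff_t>(pos);
    entries.push_back({kind, std::vector<std::uint8_t>(first, first + len)});
    pos += len;
  }
  return true;
}
```

Because each entry carries its own length, a reader can skip kinds it does not recognize by advancing past `len` bytes, which is a common reason to choose this kind of encoding.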
[AMDGPU][GlobalISel] Add regbank support for cvt_scalef32_sr_pk_f6_f116/32 intrinsics (#192745)
This patch adds register bank legalization rules for
cvt_scalef32_sr_pk_f6_f116/32 intrinsics in the AMDGPU GlobalISel
pipeline.