[CodeGen] Preserve big-endian trunc in concat_vectors (#190701)
A transform from `concat_vectors(trunc(scalar), undef)` to
`scalar_to_vector(scalar)` is only equivalent for little-endian targets.
On big-endian, that would put the extra upper bytes ahead of the desired
truncated bytes. This problem was seen on Rust s390x in [RHEL-147748].
[RHEL-147748]: https://redhat.atlassian.net/browse/RHEL-147748
Assisted-by: Claude Code
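The endianness dependence can be checked with a small byte-level model. The sketch below is an illustration only, not LLVM code: it assumes a 32-bit scalar truncated to 16 bits, models the vector as raw bytes in memory order, and uses hypothetical helper names.

```python
def lane0_via_trunc(scalar: int) -> int:
    """Lane 0 of concat_vectors(trunc(scalar), undef): the low 16 bits."""
    return scalar & 0xFFFF


def lane0_via_scalar_to_vector(scalar: int, byteorder: str) -> int:
    """Lane 0 when the full 32-bit scalar is placed at the start of the
    vector and the result is reinterpreted as 16-bit lanes."""
    raw = scalar.to_bytes(4, byteorder)        # bytes of the untruncated scalar
    return int.from_bytes(raw[:2], byteorder)  # first 16-bit lane


scalar = 0x11223344
for order in ("little", "big"):
    same = lane0_via_scalar_to_vector(scalar, order) == lane0_via_trunc(scalar)
    print(f"{order}: transform preserves lane 0 -> {same}")
```

In this model the little-endian case yields the truncated value `0x3344` in lane 0, while the big-endian case yields the upper half `0x1122`, which is the mismatch described above.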
[AMDGPU][MLIR][NFC] moved enc computation to a dedicated method (#189339)
Adapt `GlobalPrefetchOp` for projects like Triton that do not use
`memref`s but can still use the enums exposed by the AMDGPU dialect.
Such projects can benefit from calling a static method that converts
a set of enums to the `i32` value expected by the AMDGCN backend.
Also renamed `TemporalHint` to `LoadTemporalHint`, because store
operations (for example, `buffer_store`) have their own temporal hints
with slightly different enum values (e.g., `WB` (write-back) instead of
`LU`).
[mlir][OpenMP] Separate OutlinableInterface from taskloop LoopWrapper (#188068)
Separate taskloop context and loop lowering into different operations.
This allows us to have separate operations representing the outlinable
interface and the loop wrapper interface so that there is somewhere
better than the loop body to put task-local allocations:
```
omp.taskloop.context {
  llvm.alloca ...
  omp.taskloop {
    omp.loop_nest ... {
      ...
    }
  }
  omp.terminator
}
```
[11 lines not shown]
[lldb] Skip local variable declarations at start of Wasm function (#190093)
In WebAssembly, a function starts with a number of local variable
declarations, sometimes called a function header. These declarations are
*not* instructions, but they are considered to be part of the function,
meaning we can't just pretend like the function starts on the first
instruction. Instead, we treat them like a prologue, albeit one that you
cannot disassemble or set a breakpoint on.
With this PR, we now correctly disassemble the function, matching the
output of `objdump`, and breakpoints resolve to the first instruction.
Fixes #189960
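To make the function header concrete, here is a minimal sketch of how the offset of the first instruction could be computed from a function body in the WebAssembly binary format. It is not the LLDB implementation. It assumes the input starts right after the body-size field and that every value type is encoded as a single byte; the function names are hypothetical.

```python
def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def first_instruction_offset(body: bytes) -> int:
    """Offset of the first instruction within a function body.

    Assumes `body` starts right after the body-size field and that every
    value type is encoded as a single byte.
    """
    group_count, pos = read_uleb128(body, 0)
    for _ in range(group_count):
        _locals_in_group, pos = read_uleb128(body, pos)
        pos += 1  # skip the one-byte value type
    return pos


# Two groups of locals (1 x i32, 2 x i64), then `local.get 0` and `end`.
body = bytes([0x02, 0x01, 0x7F, 0x02, 0x7E, 0x20, 0x00, 0x0B])
print(first_instruction_offset(body))  # 5
```

The five bytes before offset 5 are the local declarations, which is the region treated like a prologue.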
[XeVM] Refactor the SPIR-V generation to use SPIR-V backend API. (#189494)
Currently, we use two different approaches to generate SPIR-V, depending
on the compilation target. If the compilation target is `assembly/isa`,
the MLIR interface `translateToISA` is used to convert an LLVM module to
SPIR-V text. For other cases (`bin/fatbin` compilation targets), the
SPIR-V backend API is used to generate a SPIR-V binary.
The SPIR-V backend API is more powerful, as it lets one pass the
necessary extensions, which is a must when using any advanced or
vendor-specific SPIR-V features.
This PR stops using the MLIR API and consolidates on the SPIR-V backend
API.
It also ensures that SPIR-V generated from the MLIR side is always in
binary format (for both the XeVM target and the SPIR-V target).
[bazel] Make nanobind link on macOS (#190687)
Previously, the mlir libraries that are marked as shared didn't link on
macOS because undefined symbols are an error by default. This uses
nanobind's list of acceptable undefined Python symbols to make these link.
[lldb] Support comparing FileSpec against Python strings (#190690)
We got a bug report where someone was iterating over the modules and
wanted to verify that the module name was empty and noticed it didn't
trigger.
```
for module in target.module_iter():
    if module.file is None or module.file == "":
        pass  # Do something
```
My initial hypothesis was that we were somehow skipping modules, but
upon further investigation, it was the string comparison that was the
culprit. The reporter (reasonably) expected the `file` property to
return a string, but in reality it returns an SBFileSpec.
This could be avoided by explicitly comparing with an empty FileSpec,
but that seems needlessly tedious.
[9 lines not shown]
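The general Python mechanism for this kind of comparison is an `__eq__` method that accepts strings. The sketch below uses a hypothetical stand-in class, `FileSpecLike`, and is not the lldb binding. How the PR itself implements the comparison is not shown in this excerpt.

```python
class FileSpecLike:
    """Hypothetical stand-in for a file-spec object; not the lldb class."""

    def __init__(self, path: str = ""):
        self.path = path

    def __eq__(self, other):
        if isinstance(other, str):
            # Compare against the string form of the path.
            return self.path == other
        if isinstance(other, FileSpecLike):
            return self.path == other.path
        return NotImplemented

    def __hash__(self):
        # Defining __eq__ removes the default hash, so restore one.
        return hash(self.path)


spec = FileSpecLike()
print(spec == "")            # True
print(spec == "/bin/ls")     # False
```

Returning `NotImplemented` for unrelated types lets Python fall back to its default comparison instead of raising.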
[SPARC][IAS] Make 64-bit instructions available in 32-bit mode on V9 (#187534)
When the ISA level is V9, 64-bit instruction definitions should be available
even if no patterns currently use them.
This should allow usage of 64-bit instructions, like `sllx`/`srlx`, in inline
assembly snippets in a source file otherwise intended to target V9 processors
running in 32-bit mode, as found in, for example, the Linux kernel.
[lldb][AIX] Enable NativeProcessAIX Manager for lldb-server (#190173)
This PR is part of porting LLDB to AIX. Reference discussions: [llvm
discourse](https://discourse.llvm.org/t/port-lldb-to-ibm-aix/80640) and
https://github.com/llvm/llvm-project/issues/101657.
The complete set of changes is in this draft:
- https://github.com/llvm/llvm-project/pull/102601
Description:
This change enables proper integration of AIX processes with lldb-server,
ensuring correct loading and handling of AIX target architectures.
It also retrieves the target process architecture from the host and
configures NativeProcessAIX accordingly.
[mlir][OpenMP] Fix taskloop outlined step handling (#190198)
The outlined taskloop preheader still used the original function's
casted step value when computing the canonical loop trip count. When
lb/ub/step were defined outside the taskloop body, the outlined function
ended up referring to an instruction from another function, which
crashed LLVM IR verification and finalization.
Reload the task step from the outlined task shareds, alongside lb and
ub, and use that value for the trip-count division. Update the MLIR
taskloop checks and add a regression for outer-scope variable bounds.
Fortran reproducer:
```
subroutine test(lb, ub, step)
  integer :: i, lb, ub, step
  !$omp taskloop
  do i=lb,ub,step
[6 lines not shown]
```
[InstSimplify] Fix Compilation Hang in simplifyExtractValueInst (#190279)
Jump Threading can create self-referential insertvalues which are
allowed by the verifier in unreachable code. These self-referential
insertvalues cause the compilation to hang in simplifyExtractValueInst.
This PR adds a check to break out of the loop if it detects it is a
self-referential insertvalue and adds the reproducer's bitcode as a
test.
Fixes: https://github.com/llvm/llvm-project/issues/187381
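The hang and the guard can be illustrated with a toy model. This is a sketch only: `InsertValue` and `find_inserted_value` are hypothetical names, the model supports a single index, and the actual check in `simplifyExtractValueInst` may be shaped differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class InsertValue:
    """Toy model of `insertvalue agg, val, idx` (single index only)."""
    index: int
    value: object
    agg: Optional["InsertValue"] = None


def find_inserted_value(node, index):
    """Walk the aggregate chain looking for the value stored at `index`."""
    while node is not None:
        if node.index == index:
            return node.value
        if node.agg is node:
            # Self-referential insertvalue: stop instead of looping forever.
            return None
        node = node.agg
    return None


loop = InsertValue(index=0, value="x")
loop.agg = loop  # models the self-reference seen in unreachable code
print(find_inserted_value(loop, 1))  # None
```

Without the identity check, the walk over `loop` would never terminate when the requested index is absent.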
[pdb] Store symbol names without null terminators in PublicsStreamTest (#190790)
This catches any bugs where code assumes these names are null-terminated.
This would have caught (at least in ASan builds) #163755 and the bug
fixed in #190133.
[flang] Disambiguate derived component accesses in AliasAnalysis. (#189516)
This change introduces an AccessPath representation inside the
AliasAnalysis Source object that tracks the sequence of named component
accesses and pointer/allocatable dereferences from the root variable to
the queried memory location. The access path is built during the
backward walk in getSource and enables more precise alias analysis for
Fortran derived types.
Previously, accesses to different components of the same derived-type
variable (such as x%a and x%b) were reported as MayAlias
because the analysis could not distinguish them once they traced back
to the same origin. With the access path, the analysis can now identify
when two accesses diverge at a named component step
and return NoAlias for disjoint subobjects.
This patch does not get rid of `followingData` and `isData` completely.
Assisted by Claude.
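The divergence rule can be sketched as a comparison of two step sequences. This is a simplified model, not the flang implementation: the tuple encoding of steps and the function name `alias_result` are assumptions, and both paths are assumed to start at the same root variable.

```python
def alias_result(path_a, path_b):
    """Compare two access paths that start at the same root variable.

    Each step is a tuple: ("component", name) or ("deref",).
    """
    for step_a, step_b in zip(path_a, path_b):
        if step_a == step_b:
            continue
        if step_a[0] == "component" and step_b[0] == "component":
            # Different named components of the same object are disjoint.
            return "NoAlias"
        return "MayAlias"
    # One path is a prefix of the other (or they are equal): stay conservative.
    return "MayAlias"


x_a = (("component", "a"),)
x_b = (("component", "b"),)
print(alias_result(x_a, x_b))  # NoAlias
print(alias_result(x_a, x_a))  # MayAlias
```

In this model `x%a` and `x%b` diverge at their first step, while identical or prefix paths keep the conservative answer.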
[flang] Enable speculation of fir.convert with memref<> type. (#190413)
Such `fir.convert`s may appear after FIRToMemref conversion and it would
be good to be able to speculate them.
[llvm-mca][RISC-V] Remove duplicated use of SP from `c.addi4spn` (#189980)
The `c.addi4spn` instruction implicitly uses the X2 (SP) register, but in
addition to being present in the Uses list, it is also modeled as an
explicit operand with the SP register class. This duplication causes
missed bypasses in llvm-mca when the instruction needs to read the SP
value written by a previous instruction.
For example, on a `sifive-u74` CPU, the following timeline excerpt
shows that the `c.addi4spn` is issued 2 cycles later than expected
given the GPR bypass:
```
Timeline view:
Index 012345678
[0,0] DeeE . . mv sp, a0
[0,1] . DeeE . addi a1, sp, 12
```
[5 lines not shown]
[libc] Fix return code after rewriting GPU printf support (#190797)
Summary:
The previous code blindly accumulated the return values without checking
whether they were errors. printf returns `-1` on failure and fwrite
returns the number successfully written. Because we split these up, we
need to handle that correctly.
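The error handling described above can be sketched as follows. This is a hypothetical model rather than the libc code: `combine_results` and the `write` callback are illustrative names, `write` stands in for an fwrite-like call that returns the number of characters written, and a short write is assumed to count as failure.

```python
def combine_results(chunks, write):
    """Sum per-chunk write counts, returning -1 as soon as one write fails."""
    total = 0
    for chunk in chunks:
        written = write(chunk)
        if written != len(chunk):
            # Short write: report failure instead of adding a partial count.
            return -1
        total += written
    return total


print(combine_results(["ab", "cde"], lambda s: len(s)))  # 5
print(combine_results(["ab", "cde"], lambda s: 0))       # -1
```

Checking each result before adding it keeps a failed write from being folded into a plausible-looking character count.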
HIPSPV: a fix for Assertion `isFilename() && "Invalid accessor."' failed (#187655)
AFAICT, this assertion failure was introduced by #181870 and #182930.
These PRs introduced linker options that got passed down to
HIPSPV::Linker which wasn't prepared for any non-file inputs.
Fixed by ignoring non-file arguments.