[NFC][VPlan] Add initial tests for future VPlan-based stride MV
I tried to include both the features that the current
LoopAccessAnalysis-based transformation supports (e.g., trunc/sext of
stride) and cases where the current implementation behaves poorly,
e.g., https://godbolt.org/z/h31c3zKxK, as well as some other potentially
interesting scenarios I could imagine.
There are two test files with the same content. One is for the VPlan dump change of
the future transformation alone (I'll update `-vplan-print-after` in the next
PR), the other is for the full vectorizer pipeline. The latter has two `RUN:`
lines:
* No multiversioning, so the next PR diff can show the transformation itself
* Stride multiversioning performed in LAA, so that we can compare the future
VPlan-based transformation vs old behavior.
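For readers new to the term, stride multiversioning can be pictured at the source level as a runtime check on the stride that guards a specialized loop. The sketch below is a hand-written analogy only; it is not what LAA or VPlan emits, and the name `scaleStrided` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: doubles `n` elements of `data`, stepping by `stride`.
void scaleStrided(std::vector<int> &data, std::size_t n, std::size_t stride) {
  if (stride == 1) {
    // Versioned loop: the stride is known to be 1 here, so the accesses
    // are contiguous and a vectorizer can use plain wide loads/stores.
    for (std::size_t i = 0; i < n; ++i)
      data[i] *= 2;
    return;
  }
  // Fallback loop: arbitrary runtime stride.
  for (std::size_t i = 0; i < n; ++i)
    data[i * stride] *= 2;
}
```

Both versions compute the same result; only the knowledge available to the optimizer differs between them.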
[ConstraintElim] Do not model negative nuw-only GEP offset as signed. (#203620)
decomposeGEP added the GEP's constant offset to the unsigned
decomposition using its signed value (getSExtValue()). For a GEP that
only carries nuw (without nusw/inbounds), the indices must be
interpreted as unsigned.
Alive2 proof of the miscompile: https://alive2.llvm.org/ce/z/7G8uE3
PR: https://github.com/llvm/llvm-project/pull/203620
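To illustrate why the signed reading matters, here is a toy model that uses an 8-bit offset instead of the real index width. The helper names `sextOffset` and `zextOffset` are hypothetical and this is not the ConstraintElimination code; it only shows that the same bits give different values under the two interpretations.

```cpp
#include <cassert>
#include <cstdint>

// Toy 8-bit offset read as signed (the sign-extending view).
std::int64_t sextOffset(std::uint8_t bits) {
  return static_cast<std::int8_t>(bits);
}

// The same bits read as unsigned (the zero-extending view).
std::uint64_t zextOffset(std::uint8_t bits) { return bits; }
```

For the bit pattern `0xFF`, `sextOffset` yields -1 while `zextOffset` yields 255. The two readings agree only while the top bit is clear, e.g. for `0x7F`.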
[NFC][VPlan] Split `makeMemOpWideningDecisions` into subpasses
The idea is to have handling of strided memory operations (either from
https://github.com/llvm/llvm-project/pull/147297 or for VPlan-based
multiversioning for unit-strided accesses) done after some mandatory
processing has been performed (e.g., some types **must** be scalarized)
but before legacy CM's decision to widen (gather/scatter) or scalarize
has been committed.
In the longer term, we can uplift all other memory widening decisions to
be done here directly at the VPlan level. I expect this structure would also
be beneficial for that.
[clang-cl] Add cl compiler build deterministic options for compatibility. (#194779)
Added the following options to clang-cl:
* `/experimental:deterministic`
The original CL option enables warnings on use of the
non-deterministic macros `__DATE__`, `__TIME__` and `__TIMESTAMP__` and
provides a few additional operations to help produce deterministic
output:
- sets the .obj COFF header timestamp (offset 4) to a hash based on the path
to the source file.
- removes the host name from the hash generation for anonymous namespaces and
lambdas.
- zeroes PE timestamps, when passed to the linker.
- sets PDB Guid field to `{FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}`.
- sets PDB Signature field to `1`.
Currently `clang-cl` does not use a hostname to generate the symbols
[23 lines not shown]
[mlir][memref] Use access interfaces in address extraction (#198421)
Rework extract-address-computation patterns to use
IndexedAccessOpInterface for direct memref accesses and
VectorTransferOpInterface update hooks for transfer ops.
For now, these rewrites are limited to operations that declare in-bounds
indices (so no vector.load currently), so that we always create valid
`memref.subview` ops.
As a consequence of this PR, the Memref dialect no longer depends on the
GPU and NVGPU dialects.
Note: if you used this pass downstream, you may need to start registering the external interface implementations for IndexedAccessOpInterface on the GPU, NVGPU, etc. dialects.
AI: Codex wrote the first draft, I simplified it a bunch and made sure
the names of internal functions made sense.
---------
Co-authored-by: Codex <codex at openai.com>
[NFC][LLVM] Drop redundant verifier checks for masked load/stores (#204359)
All current verifier checks for masked load/store intrinsics are
redundant as they are covered by the generic intrinsic signature
verification. Drop them.
Rename verifier `intrinsic-bad-arg-type.ll` lit test to
`masked-load-store.ll` and extend it to cover cases corresponding to the
dropped checks to verify that generic intrinsic signature verification
will still flag them.
AMDGPU/Tests: Remove redundant explicit data layouts from AMDGPU tests (#203447)
These all look like either cargo culting of outdated requirements or
test cases that were not fully reduced. Since the data layout evolves
over time with new address spaces being added, it seems good practice to
avoid hard-coding it in tests that don't need it.
[NFC][LLVM] Tighten overload types for `@llvm.get.active.lane.mask` (#204356)
Change return type to `llvm_any_vector_int_ty` and the 2 argument types
to `llvm_any_scalar_int_ty`.
[AArch64] Fix swapped operands in tryFoldCselToFMaxMin (#203230)
Swapped operands treat NaN the wrong way. Make sure we only
use the matching direction when converting to fminnm/fmaxnm.
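As a source-level illustration of why operand order matters, the sketch below compares a select-based minimum with `std::fmin`, which returns the non-NaN operand when exactly one input is NaN. `selectMin` is a hypothetical helper, not the AArch64 lowering code, and the sketch assumes default (non-fast-math) floating-point semantics.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Select-based minimum: yields `b` whenever the comparison is false,
// which includes every comparison that involves a NaN.
double selectMin(double a, double b) { return a < b ? a : b; }
```

With `nan = std::numeric_limits<double>::quiet_NaN()`, `selectMin(nan, 1.0)` returns 1.0 but `selectMin(1.0, nan)` returns NaN, whereas `std::fmin` returns 1.0 for both orders. A select therefore matches the min-number behavior for only one operand order.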
[RISCV][P-ext] Replace v4i8/v2i16 mul and v4i8 mulh patterns on RV32 with custom lowering. (#204382)
Instead of emitting 2 instructions from an isel pattern, custom lower to
PWMUL(S)(U)+SRL+TRUNC.
This only slightly reduces the number of isel patterns today, but we
will need the WMUL patterns for intrinsics. Exposing the SRL+TRUNC to
DAG combine may allow additional optimizations.
Assisted-by: Claude Sonnet 4.6
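The widening-multiply, shift, truncate sequence can be modeled per 8-bit lane as below. This is a scalar sketch of the arithmetic only, with hypothetical helper names; it does not reproduce the actual DAG nodes or instruction selection.

```cpp
#include <cassert>
#include <cstdint>

// Signed high half: widen to 16 bits, multiply, logical shift right by 8,
// then truncate back to 8 bits.
std::int8_t mulhsLane(std::int8_t a, std::int8_t b) {
  auto wide = static_cast<std::uint16_t>(static_cast<std::int16_t>(a) *
                                         static_cast<std::int16_t>(b));
  return static_cast<std::int8_t>(static_cast<std::uint8_t>(wide >> 8));
}

// Unsigned high half: the same steps with an unsigned widening multiply.
std::uint8_t mulhuLane(std::uint8_t a, std::uint8_t b) {
  auto wide = static_cast<std::uint16_t>(static_cast<std::uint16_t>(a) *
                                         static_cast<std::uint16_t>(b));
  return static_cast<std::uint8_t>(wide >> 8);
}

// Plain multiply: truncate the widened product to its low 8 bits.
std::uint8_t mulLane(std::uint8_t a, std::uint8_t b) {
  auto wide = static_cast<std::uint16_t>(static_cast<std::uint16_t>(a) *
                                         static_cast<std::uint16_t>(b));
  return static_cast<std::uint8_t>(wide);
}
```

For example, `mulhsLane(-128, -128)` is 64 because the 16-bit product is 0x4000, and `mulhuLane(255, 255)` is 254 because the product is 0xFE01.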
[CtxProf] emit fatal usage error when flatten-prethinlink runs without guid metadata (#194383)
`ctx-prof-flatten-prethinlink` can call `AssignGUIDPass::getGUID()` on
defined functions even when the GUID metadata is missing. In that case,
LLVM currently asserts on missing `!guid` metadata.
Replace the assertion with `reportFatalUsageError()` so the pass fails
with a clear user-facing error, and add a regression test for invoking
only `ctx-prof-flatten-prethinlink`.
Fixes #194185
[InstCombine] Register manually created assumes in the AssumptionCache. (#204416)
Instructions inserted via Result->insertInto() bypass the IRBuilder
inserter that would otherwise register new @llvm.assume calls in the
AssumptionCache.
Register the assume when inserting such a Result, mirroring what the
IRBuilder inserter does.
PR: https://github.com/llvm/llvm-project/pull/204416
[lldb][test] Don't export all symbols when building Wasm test inferiors (#204426)
WASI.rules linked test inferiors with -Wl,--export-all and the
reactor-style -Wl,--no-entry. I cargo-culted them and neither flag is
needed, as our test inferiors all have a main and a crt1-provided _start.
The --export-all flag was also harmful: for a no-argument main(),
wasi-libc renames the user's main to __original_main and emits a small
`main` trampoline. The --export-all flag caused that trampoline to become an
exported `main` symbol, so `break main` resolved to two locations.
Drop both flags but keep --allow-undefined for inferiors that reference
runtime-provided symbols.
[ProfileData] Add traits for on-disk function offset hash table (NFC) (#202110)
This patch introduces serialization helper classes (traits) for the
on-disk chained hash table that will be used to index function offsets
in the SecFuncOffsetTable section:
- FuncOffsetHashTableWriterInfo (in SampleProfWriter.h) for writing.
- FuncOffsetHashTableInfo (in SampleProfReader.h) for reading.
These traits map a 64-bit function name GUID to a 32-bit byte offset
pointing into the SecLBRProfile section. This index structure is
intended to replace the flat layout of the SecFuncOffsetTable section
in the upcoming v104 format. This will allow the compiler to query the
offset of a function sample without having to parse the entire
SecFuncOffsetTable section at startup.
While these two trait classes share identical boilerplate for type
definitions and key comparison, they are kept separate to maintain
clean interface separation between the reader and writer headers. We
[9 lines not shown]
[LifetimeSafety] Propagate loans through the comma operator (#204379)
VisitBinaryOperator had no comma case, so a comma expression carried
none of its right operand's loans and a borrow used via a comma result
(e.g. `g = (f(), p)`) was silently dropped. Flow the RHS's origin into
the result.
Assisted-by: Claude Opus 4.8
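A minimal C++ illustration of the language rule involved: the value of a comma expression is its right operand, so a pointer passed through one still refers to the original object. The names `viaComma` and `sideEffects` are hypothetical, and this sketch does not exercise the analysis itself.

```cpp
#include <cassert>

int sideEffects = 0;
void f() { ++sideEffects; }

// `f()` is evaluated and discarded; the result of the comma expression
// is the right operand `p`, so the returned pointer still borrows from
// whatever `p` pointed to.
const int *viaComma(const int *p) { return (f(), p); }
```

Given `int x = 7;`, `viaComma(&x)` returns `&x`, which is why the result of the comma expression needs to carry the same loans as `p`.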
[libc++] Create issues in batch in libcxx/utils/conformance (#204428)
Also improve the output as a drive-by. Creating issues in batch makes it
easier to automate the creation of these issues.
Assisted by Claude, tweaked and reviewed by hand
[libc] Implement _SC_ARG_MAX, _SC_OPEN_MAX, and _SC_PHYS_PAGES (#204364)
This commit adds the sysinfo syscall wrapper and implements the
_SC_ARG_MAX, _SC_OPEN_MAX, and _SC_PHYS_PAGES options in sysconf.
- Added sysinfo inline syscall wrapper.
- Implemented _SC_ARG_MAX, _SC_OPEN_MAX, and _SC_PHYS_PAGES.
- Added integration tests.
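From the caller's side, the three options are queried through the standard `sysconf` entry point. The sketch below runs against whatever libc the host provides, not LLVM libc specifically, and assumes a POSIX-like system that defines `_SC_PHYS_PAGES` (a common extension). The helper `physicalMemoryBytes` is hypothetical.

```cpp
#include <cassert>
#include <unistd.h>

// Thin wrapper; sysconf returns -1 when an option is unsupported or
// has no determinate limit.
long querySysconf(int name) { return sysconf(name); }

// Hypothetical helper: total physical memory in bytes, or -1 on failure.
long long physicalMemoryBytes() {
  long pages = sysconf(_SC_PHYS_PAGES);
  long pageSize = sysconf(_SC_PAGESIZE);
  if (pages < 0 || pageSize < 0)
    return -1;
  return static_cast<long long>(pages) * pageSize;
}
```

Typical calls are `querySysconf(_SC_ARG_MAX)` and `querySysconf(_SC_OPEN_MAX)`; the values returned depend on the host and its resource limits.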
[LifetimeSafety] Support C Language in LifetimeSafety (#203270)
There are a few constraints that make supporting C a bit cumbersome:
* C assignment expressions are rvalues, unlike C++ assignment
expressions. The analysis has to account for the different origin shape
of the assignment result by stripping an origin from `LHSExpr`.
* Function addresses in C do not need lifetime tracking. Taking `&f`
should not create origins because functions do not have local object
lifetime (unlike in C++).
* GNU C permits `void*` subscripting/pointer arithmetic. Expressions
like `bytes[0]` (where `bytes` is `void*`) have type `void` and do not
produce an addressable object with origins, even though `void*` itself
can carry pointer origins.
* Some C subscripts, such as vector subscripts, are not GLValues, so
they do not have storage origins to track.
* `va_arg(ap, array_type)` is undefined behavior, so we skip it instead
of trying to model origins for it.
* C does not have a spelling for `[[gsl::Owner]]` / `[[gsl::Pointer]]`,
[21 lines not shown]