[libc][stdlib] Simplify getenv_test by using strcmp instead of custom helper (#163055)
Replace the custom `my_streq` helper function with LLVM libc's
`inline_strcmp` utility from `src/string/memory_utils/inline_strcmp.h`.
Changes:
- Remove 18-line custom `my_streq` implementation
- Use `inline_strcmp` with a simple comparator lambda for string comparisons
- Replace `my_streq(..., nullptr)` checks with direct `== nullptr` comparisons
- Maintain identical test coverage while reducing code duplication
Benefits:
- Uses existing, well-tested LLVM libc infrastructure
- Clearer test assertions with standard comparison functions
- More concise code (reduced from ~48 to ~33 lines)
- Consistent with LLVM libc coding practices
[6 lines not shown]
[AArch64] Generalize CSEL a, b, cc, SUBS(SUB(x,y), 0) -> CSEL a, b, cc, SUBS(x,y) transform to peephole (#167527)
This transform should have never been done in ISel in the first place.
It should have been done in peephole, but a few cases were missing.
[AMDGPU] Lower S_ABSDIFF_I32 to VALU instructions (#167691)
Added support for lowering the scalar S_ABSDIFF_I32 instruction to
equivalent VALU operations.
[MLIR] [Python] `ir.Value` is now generic in the type of the value it holds (#166148)
This makes it similar to `mlir::TypedValue` in the MLIR C++ API and
allows users to be more specific about the values they produce or
accept.
Co-authored-by: Maksim Levental <maksim.levental at gmail.com>
[ROCDL] Added missing s.get.named.barrier.state op (gfx1250) (#167876)
This patch introduces some missing s.get.named.barrier.state
instructions in the ROCDL dialect
[VPlan] Simplify ExplicitVectorLength(%AVL) -> %AVL when AVL <= VF (#167647)
[`llvm.experimental.get.vector.length`](https://llvm.org/docs/LangRef.html#id2399)
has the property that if the AVL (%cnt) is less than or equal to VF
(%max_lanes) then the return value is just AVL.
This patch uses SCEV to simplify this in optimizeForVFAndUF, and adds
`ExplicitVectorLength` to
`VPInstruction::opcodeMayReadOrWriteFromMemory` so it gets removed once
dead.
[MLIR Attr] Allow LocationAttr to be used as an operation attribute (#167690)
Enables locations to be used as operation attributes.
In contrast to the implicit source location every operation carries
(`Operation::getLoc()`)—which may be fused or modified during
transformations—a `LocationAttr` used as an operation attribute has
explicit semantics defined by the operation itself.
For example, in our Zig-like language frontend (where types are
first-class values), we use a location attribute on struct type
operations to store the declaration location, which is part of the
type's semantic identity. Using an explicit attribute instead of
`Operation::getLoc()` ensures this semantic information is preserved
during transformations.
[Float2Int] Make sure the CFP can be represented in the integer type (#167699)
When `convertToInteger` fails, the integer result is undefined. In this
case, we cannot use it in the subsequent steps.
Close https://github.com/llvm/llvm-project/issues/167627.
[MLIR] Replace LLVM_Type in bar.warp.sync and cp.async ops with I32 (#167826)
This patch replaces generic `LLVM_Type` with specific `I32` type in NVVM
operations.
`NVVM_SyncWarpOp`: Change mask parameter from `LLVM_Type` to `I32`.
`NVVM_CpAsyncOp`: Change cpSize parameter from `Optional<LLVM_Type>` to
`Optional<I32>`.
Signed-off-by: Dharuni R Acharya <dharunira at nvidia.com>
[X86] Add widenBuildVector to create a wider build vector if the scalars are mergeable (#167667)
See if each pair of scalar operands of a build vector can be freely
merged together - typically if they've been split for some reason by
legalization.
If so, create a new build vector node with double the scalar size but half
the element count, reducing codegen complexity and potentially allowing
further optimization.
I did look at performing this generically in DAGCombine, but we don't
have as much control over when a legal build vector can be folded -
another generic fold would be to handle this on insert_vector_elt pairs,
but again legality checks could be limiting.
Fixes #167498
[Headers][X86] Update FMA3/FMA4 scalar intrinsics to use __builtin_elementwise_fma and support constexpr (#154731)
Now that #152455 is done, we can make all the scalar FMA intrinsics wrap
__builtin_elementwise_fma, which also allows constexpr evaluation.
The main difference is that FMA4 intrinsics guarantee that the upper
elements are zero, while FMA3 passes through the destination register
elements like older scalar instructions.
Fixes #154555
[AArch64] Use SVE fdot for partial.reduce.fadd for NEON types. (#167856)
We only seem to use the SVE fdot for fixed-length vector types when they
are larger than 128 bits, whereas we can also use it for 128-bit vectors
if SVE2p1/SME2 is available.
Revert "[DAG] Fold (umin (sub a b) a) -> (usubo a b); (select usubo.1 a usubo.0)" (#167854)
Reverts llvm/llvm-project#161651 due to downstream reports of bad codegen.
[AArch64] Fix SVE FADDP latency on Neoverse-N3 (#167676)
This patch fixes the latency of the SVE FADDP instruction for
Neoverse-N3. The latency of SVE FADDP (floating-point arithmetic,
min/max pairwise) should be 3, as per the N3 SWOG.
[CodeGen] Use VirtRegOrUnit where appropriate (NFCI) (#167730)
Use it in `printVRegOrUnit()`, `getPressureSets()`/`PSetIterator`,
and in functions/classes dealing with register pressure.
Static type checking revealed several bugs, mainly in MachinePipeliner.
I'm not very familiar with this pass, so I left a bunch of FIXMEs.
There is one bug in `findUseBetween()` in RegisterPressure.cpp, also
annotated with a FIXME.