[MLIR][NVVM] Add sqrt Ops (#197422)
Adds two NVVM dialect ops covering all 14 floating-point `sqrt` forms:
- `nvvm.sqrt` -- IEEE-compliant sqrt with explicit rounding mode
(`sqrt.<RM>[.ftz].{f32,f64}`), 12 forms.
- `nvvm.sqrt.approx` -- fast approximate sqrt (`sqrt.approx[.ftz].f32`),
2 forms; uses the `NVVM_F32UnaryApproxOp` base class.
The two ops are split because the rounded forms require an explicit rounding mode and support both f32 and f64, while the approx forms have no rounding mode and are f32-only.
[flang-rt][test] Fix write01.f90 missing LD_LIBRARY_PATH (introduced in #187662)
The test binary was run without setting LD_LIBRARY_PATH, causing
libflang_rt.runtime.so to not be found at runtime. Match the pattern
used by exec.f90 and ctofortran.f90.
Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
[libc][math] Fix pow() subnormal base exponent computation (#198134)
For subnormal inputs, get_exponent() returns -1023. The code subtracted
64 after normalizing but didn't recompute e_x from the normalized value.
This set e_x to -1087 for every subnormal.
To fix, compute e_x from the normalized value.
powf() doesn't have this bug because it adds
`x_u >> FloatBits::FRACTION_LEN` to ex, where
`x_u = FloatBits(x).uintval()` and `x` is the normalized value. Added
subnormal base tests for powf to show that it works fine as-is.
Fixes #197212.
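The exponent mix-up described above can be sketched in a few lines. This is a
minimal illustration assuming IEEE-754 binary64, not the actual libc code:
`raw_exponent`, `exponent_buggy`, and `exponent_fixed` are hypothetical names,
and the scaling by 2^64 mirrors the "subtracted 64 after normalizing" step in
the message.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for get_exponent(): the unbiased exponent field of
// a binary64 value. Returns -1023 for every subnormal (and for zero).
inline int raw_exponent(double x) {
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return static_cast<int>((bits >> 52) & 0x7FF) - 1023;
}

// Buggy variant: the exponent is read before normalization and only
// adjusted by the scale factor, so every subnormal yields -1023 - 64.
inline int exponent_buggy(double x) {
  int e = raw_exponent(x);
  if (e == -1023) {        // subnormal input (nonzero assumed)
    x = std::ldexp(x, 64); // normalize by scaling with 2^64
    e -= 64;               // always -1087, regardless of x
  }
  return e;
}

// Fixed variant: recompute the exponent from the normalized value, then
// undo the scaling.
inline int exponent_fixed(double x) {
  int e = raw_exponent(x);
  if (e == -1023) {
    x = std::ldexp(x, 64);
    e = raw_exponent(x) - 64;
  }
  return e;
}
```

With the smallest positive subnormal, 2^-1074, the buggy variant reports -1087
while the fixed one reports -1074; normal inputs are unaffected in both.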
[llvm-debuginfo-analyzer] Fix missed 'else' (LVCodeViewReader / LVDWARFReader) (#192923)
Issues found by the PVS-Studio static analyzer.
LVCodeViewReader.cpp. PR for #170117.
4. Potential UB: manipulation of an invalid object.
The PVS-Studio warning: V519 The 'FeaturesValue' variable is assigned
values twice successively.
The original patch caused tests to fail with a linker error and was reverted.
This PR includes the original (#188578) plus the necessary edit in CMakeLists.txt.
Committed on behalf of @Seraphimt
[CUF] Fix CompilerGeneratedNamesConversion renaming managed companion globals
CUFAddConstructor creates a companion pointer global (e.g. foo.managed.ptr)
for each non-allocatable managed variable. When CompilerGeneratedNamesConversion
ran after CUFAddConstructor, it replaced the dots with 'X',
so CUFOpConversionLate could no longer find the companion by name and fell back
to CUFGetDeviceAddress with the wrong host pointer, causing cudaErrorInvalidSymbol.
Fix: mark the companion global with a cuf.managed_ptr unit attribute in
CUFAddConstructor and skip it in CompilerGeneratedNamesConversionPass.
Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
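The effect of the fix can be sketched with a toy model. Everything here is a
simplified assumption rather than the flang implementation: `Global` and
`convertCompilerGeneratedName` are hypothetical names, the `managedPtr` flag
stands in for the `cuf.managed_ptr` unit attribute, and the renaming is reduced
to the dot-to-'X' replacement mentioned above.

```cpp
#include <algorithm>
#include <string>

// Hypothetical model of a global symbol. `managedPtr` plays the role of the
// cuf.managed_ptr unit attribute set by CUFAddConstructor.
struct Global {
  std::string name;
  bool managedPtr;
};

// Simplified name conversion: replace every '.' with 'X', but leave marked
// companion globals alone so later passes can still look them up by name.
inline void convertCompilerGeneratedName(Global &g) {
  if (g.managedPtr)
    return; // skip the companion pointer global
  std::replace(g.name.begin(), g.name.end(), '.', 'X');
}
```

An unmarked `foo.managed.ptr` becomes `fooXmanagedXptr`, which is the situation
in which the by-name lookup failed; a marked one keeps its original name.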
[BPF] Remove getMaxNumArgs() from BPFTargetTransformInfo (#198223)
The function getMaxNumArgs() hardcoded the maximum number of function
arguments to 5. LLVM now supports more than 5 arguments with stack
argument support. Remove this leftover.
[mlir][linalg] Add splat transpose canonicalization patterns (#195991)
All elements in a dense splat are identical, so transposing it changes only
the shape while preserving the value. Add a pattern that replaces a
`linalg.transpose` of a splat constant with an `arith.constant` of the
transposed result shape.
Assisted-by: Cursor (GPT-5.5)
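The reasoning behind the pattern can be illustrated outside MLIR. The sketch
below is not the canonicalization itself: `SplatConstant` and `transposeSplat`
are hypothetical names, and it assumes the convention that result dimension
`i` takes its size from input dimension `perm[i]`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a dense splat: one value repeated over a shape.
struct SplatConstant {
  std::vector<std::int64_t> shape;
  double value;
};

// Transposing a splat never moves distinct data, so only the shape is
// permuted; the single splat value is carried over unchanged.
inline SplatConstant transposeSplat(const SplatConstant &in,
                                    const std::vector<std::size_t> &perm) {
  assert(perm.size() == in.shape.size() && "permutation rank mismatch");
  SplatConstant out{{}, in.value};
  out.shape.reserve(perm.size());
  for (std::size_t p : perm)
    out.shape.push_back(in.shape.at(p)); // result dim i = input dim perm[i]
  return out;
}
```

For a splat of shape 2x3x4 and permutation [2, 0, 1], the result is a splat of
shape 4x2x3 holding the same value, which is why a single constant of the
transposed shape can replace the transpose.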
[Clang] Instantiate ParmVarDecls on-demand for FunctionParmPackExpr (#196919)
This was missed when we implemented CWG2369; their instantiations
should be built in place when they are needed.
Fixes #173086
[MemoryBuiltins] Consistently infer and use MallocFamily
MallocFamily (the enum and StringRef) are used alongside AllocFnsTy.
The latter is picked up from the tables while the former is encoded in
the IR. While they should be merged at some point (see TODO), this
commit makes sure we consistently initialize the MallocFamily String and
pass it to users.
[Instrumentor] Add call instrumentation support
We can now instrument call instructions and extract information about
the arguments, (de)allocation, intrinsic kind, etc.
[DA] Delete early return in accumulateCoefficientsGCD (NFCI) (#197935)
This patch resolves one TODO comment in `accumulateCoefficientsGCD`
regarding an early return. I think this early return doesn't change the
final result because:
- The presence/absence of this early return can only affect whether
`CurLoopCoeff` is set.
- Regardless of the value of `CurLoopCoeff`, if `RunningGCD` equals 1, the
result of the caller-side while loop doesn't change.
Deleting this early return is somewhat beneficial, because it allows us
to merge `analyzeCoefficientsForGCD` into this function.
[MemoryBuiltins][NFC] Allow users to retrieve detailed (de)allocation info
There are some helpers to inspect a value or call, but not all
information about the (de)allocation is made available outside of
MemoryBuiltins.cpp. The two new functions allow users a more in-depth
view of (de)allocations through a single API. To help with this, we now
read the alloc_align attribute to provide better alignment information
to users. alloc-family is used as well. Two new helpers provide argument
numbers, rather than values.
[clang-tidy] Fix crash in misc-static-initialization-cycle (#198155)
This commit fixes `misc-static-initialization-cycle` crashing on `catch
(...)`.
Catch-all handlers have no exception declaration, so traversal of
`CXXCatchStmt` can call `TraverseDecl(nullptr)`. The check previously
passed that null pointer to `DeclContext::containsDecl`. This commit
fixes the problem by adding a null guard.
Closes #198150
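The shape of the null guard can be sketched with stand-in types. This does not
use the real Clang classes: `Decl`, `DeclContext`, and `isDeclInContext` below
are hypothetical simplifications that only model a lookup requiring a non-null
argument.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical stand-ins for the Clang AST types named above.
struct Decl {
  int id;
};

struct DeclContext {
  std::unordered_set<const Decl *> decls;

  // Models a lookup that must not be handed a null pointer.
  bool containsDecl(const Decl *d) const {
    assert(d && "containsDecl requires a non-null Decl");
    return decls.count(d) != 0;
  }
};

// Guarded query: a catch-all handler has no exception declaration, so the
// traversal may hand us nullptr; treat that as "not contained".
inline bool isDeclInContext(const DeclContext &ctx, const Decl *d) {
  if (!d)
    return false;
  return ctx.containsDecl(d);
}
```

With the guard in place, a null declaration from `catch (...)` is answered
without ever reaching the lookup.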
[compiler-rt] Fix StackDepot benchmark thread barrier (#197633)
Use Param.Threads (number of worker threads) as barrier threshold
instead of Param.UniqueThreads (boolean that controls input generation).
This also silences the
[-Wbool-integral-comparison](https://github.com/llvm/llvm-project/pull/194180)
warning I'm working on.
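Why the threshold matters can be shown with a single-threaded simulation. This
is a sketch under stated assumptions, not the benchmark code: `BenchParam`
only mirrors the two fields named above, and `CountingBarrier` and
`arrivalsUntilRelease` are hypothetical helpers.

```cpp
// Hypothetical mirror of the two benchmark parameters mentioned above.
struct BenchParam {
  int Threads;        // number of worker threads
  bool UniqueThreads; // controls input generation only
};

// Minimal counting barrier: it releases once `threshold` arrivals are seen.
class CountingBarrier {
  int threshold_;
  int arrived_ = 0;

public:
  explicit CountingBarrier(int threshold) : threshold_(threshold) {}
  // Records one arrival and reports whether the barrier is now released.
  bool arrive() { return ++arrived_ >= threshold_; }
};

// Simulates `workers` arrivals in sequence and returns how many were needed
// before the barrier released, or -1 if it never did.
inline int arrivalsUntilRelease(int threshold, int workers) {
  CountingBarrier barrier(threshold);
  for (int i = 1; i <= workers; ++i)
    if (barrier.arrive())
      return i;
  return -1;
}
```

With four workers, a threshold taken from the boolean converts to 1 and
releases on the first arrival, whereas a threshold taken from the thread count
waits for all four.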