[clang][FlowSensitive] Do a quick check and bail early for massive CFGs (#186808)
Bail out early if visiting each reachable basic block once would
already exceed the MaxBlockVisits limit. In that case, actually running
the dataflow analysis would hit the limit anyway, but only after
wasting a lot of time.
Another possibility is that we run out of memory (OOM) and the process
crashes. We've seen examples of CFGs whose block count is 2-8x the
visit limit. Those examples also have lots of `Locs`, which we track in
MapVectors for each BB. Since the maps do not share memory across BBs,
this leads to non-linear memory usage and OOMing before hitting the max
visit limit. With this, we can avoid OOMing, and at least get some
results for the other CFGs in the TU, instead of losing all results from
the process crashing.
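A minimal sketch of the early bail-out idea, written in Python rather than the actual C++ implementation. The names `count_reachable_blocks` and `should_bail_early` and the successor-map representation of the CFG are hypothetical; only the MaxBlockVisits limit comes from the description above.

```python
from collections import deque

def count_reachable_blocks(successors, entry):
    """Count blocks reachable from `entry` with a breadth-first walk."""
    seen = {entry}
    worklist = deque([entry])
    while worklist:
        block = worklist.popleft()
        for succ in successors.get(block, ()):
            if succ not in seen:
                seen.add(succ)
                worklist.append(succ)
    return len(seen)

def should_bail_early(successors, entry, max_block_visits):
    """True if visiting every reachable block once would already exceed the limit."""
    return count_reachable_blocks(successors, entry) > max_block_visits

# Toy CFG: 0 -> 1 -> 2 -> 3, with block 4 unreachable from the entry.
cfg = {0: [1], 1: [2], 2: [3], 3: [], 4: [0]}
print(should_bail_early(cfg, 0, max_block_visits=3))
print(should_bail_early(cfg, 0, max_block_visits=10))
```

The walk touches each block at most once and keeps no per-block analysis state, so it stays cheap even for CFGs that would be too large to analyze.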
Merge tag 'nfsd-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix cache_request leak in cache_release()
- Fix heap overflow in the NFSv4.0 LOCK replay cache
- Hold net reference for the lifetime of /proc/fs/nfs/exports fd
- Defer sub-object cleanup in export "put" callbacks
* tag 'nfsd-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix heap overflow in NFSv4.0 LOCK replay cache
sunrpc: fix cache_request leak in cache_release
NFSD: Hold net reference for the lifetime of /proc/fs/nfs/exports fd
NFSD: Defer sub-object cleanup in export put callbacks
[flang] [flang-rt] Subscript overrun could occur in namelists during a READ command. (#176959)
NOTE: This is a new pull request, as the prior one didn't have labels
properly applied.
If a bad subscript is provided in a namelist record, the
HandleSubscripts() routine can read subscripts without bound. This patch
ensures that a read will not go beyond the rank of the expected
variable.
The failure will then be captured in the return status (IOSTAT) of the
READ.
The small test demonstrates the failure before and after the fix.
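The bounds check can be illustrated with a small Python sketch. This is not the flang-rt code: `parse_subscripts`, its arguments, and the integer status values are hypothetical stand-ins for HandleSubscripts() and the IOSTAT result.

```python
IOSTAT_OK = 0
IOSTAT_BAD_SUBSCRIPT = 1  # hypothetical nonzero status, not a real flang-rt code

def parse_subscripts(text, rank):
    """Parse "(i,j,...)" and reject more subscripts than the variable's rank."""
    inner = text.strip()
    if not (inner.startswith("(") and inner.endswith(")")):
        return IOSTAT_BAD_SUBSCRIPT, []
    subscripts = []
    for field in inner[1:-1].split(","):
        if len(subscripts) >= rank:
            # Stop here instead of going past the rank of the variable.
            return IOSTAT_BAD_SUBSCRIPT, []
        try:
            subscripts.append(int(field))
        except ValueError:
            return IOSTAT_BAD_SUBSCRIPT, []
    return IOSTAT_OK, subscripts

print(parse_subscripts("(1,2)", rank=2))
print(parse_subscripts("(1,2,3)", rank=2))
```

The key point is that the rank check happens before each subscript is stored, so a malformed record produces an error status rather than an overrun.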
---------
Co-authored-by: Kevin Wyatt <kwyatt at hpe.com>
[clang] Backport: use canonical arguments for checking function template constraints
Backport from #186889
This is a partial revert of #161671, restoring the original behaviour
where the canonical template arguments are used for function template
constraint checking in diagnostics.
This reverts the fix from #183010, which attempted to fix #182344
but caused regressions. These regressions now have test cases
included.
The attempt at #183010 is flawed because in the general case we can't
check satisfaction for constraints which have unsubstituted template
arguments, even if they don't affect the canonical type (i.e. they are
purely syntactical), because these types can still turn out to be
invalid after substitution.
[20 lines not shown]
[lld][ELF] Fix crash when relaxation pass encounters synthetic sections
In LoongArch and RISC-V, the relaxation pass iterates over input sections
within executable output sections. When a linker script places a synthetic
section (e.g., .got) into such an output section, the linker would crash
because synthetic sections do not have the relaxAux field initialized.
The relaxAux data structure is only allocated for non-synthetic sections
in initSymbolAnchors. This patch adds the necessary null checks in the
relaxation loops (relaxOnce and finalizeRelax) to skip sections that
do not require relaxation.
A null check is also added to elf::initSymbolAnchors to ensure the
subsequent sorting of anchors is safe.
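A rough Python model of the null check, assuming a simplified `Section` type with an optional `relax_aux` field. Both names are hypothetical illustrations; the real code is C++ in lld and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Section:
    name: str
    relax_aux: Optional[list] = None  # None models a synthetic section such as .got

def sections_to_relax(sections):
    """Return only the sections that carry relaxation data."""
    return [sec for sec in sections if sec.relax_aux is not None]

secs = [
    Section(".text", relax_aux=[]),
    Section(".got"),  # synthetic: no relaxation data was allocated
    Section(".text.hot", relax_aux=[]),
]
print([sec.name for sec in sections_to_relax(secs)])
```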
Fixes: #184757
Reviewers: MaskRay
[3 lines not shown]
[libc++] Fix iostream size ABI break (#185839)
In #124103 we changed the size of various iostream objects, which turns
out to be ABI breaking when compiling non-PIE code.
This ABI break is safe to fix, since for any programs allocating more
memory for the iostream objects, the remaining bytes are simply unused
now.
Fixes #185724
(cherry picked from commit c1d26c3c25106be2bc5b2b5a440faa5b93488de5)
[AArch64] Ensure FPR128 callee-save stack offsets are aligned (#184314)
This was benign for Linux targets (as when dividing by the scale the
offset would be correctly truncated), so only resulted in failures with
`-DLLVM_ENABLE_ASSERTIONS=On`. On Windows, this was a miscompile as the
lack of alignment would result in the FPR128 callee-save getting
assigned to the same offset as the previous GPR.
Fixes: #183708
(cherry picked from commit 327f1adef8df6afc07f6c88cfa380c97399af3dc)
[lldb][bytecode] Compile pick ops using unsigned literal (#187376)
The `pick` op requires an unsigned integer index. Use the `u` suffix
when generating `pick` operations in the Python->formatter-bytecode
compiler.
[clang][CUDA] Define _NV_RSQRT_SPECIFIER for glibc-2.42/cuda-13.2 compatibility (#185701)
CUDA-13.2 defines _NV_RSQRT_SPECIFIER to make its headers compilable
with glibc 2.42+. However, clang does not include the header that
defines the macro, and has to define it itself.
(cherry picked from commit ab048ac6c0339c631ea4a1064b675318867a3853)
[flang] Use integer arith.max/min operations for max/min lowering. (#186466)
arith.maxsi/maxui/minsi/minui are more concise than cmp+select
and probably allow more folding, so we should use them in Flang lowering.
[clang] fix crash related to missing source locations for converted template arguments
This adds a way to attach source locations to trivially created template
arguments such as packs, or converted expressions when there is no
expression anymore.
This also avoids crashes due to missing source locations.
In a few places where this matters, we already create expressions
from the converted arguments, but this requires access to Sema,
whereas creating trivial typelocs currently only requires access to
the ASTContext.
So this creates a new storage kind for TemplateArgumentLocs, where
a single SourceLocation is stored, embedded in the pointer where
possible.
As a drive-by, strengthen asserts by enforcing that TemplateArgumentLocs
are created with the right kinds of locations.
[2 lines not shown]
[MLIR][XeVM] Update HandleVectorExtractPattern (#186247)
isExtractContiguousSlice:
- Check if mask size is not greater than the vector size of the operand.
- Check if mask values do not exceed vector size.
HandleVectorExtractPattern:
- Narrow the scope of matching to:
  - Source shuffle doing contiguous extract
  - Source shuffle with at least the same mask size.
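The two mask checks listed for isExtractContiguousSlice can be sketched as follows. This is an illustrative Python model, not the MLIR C++ code; the function name `is_extract_contiguous_slice` and the reading of "do not exceed vector size" as "index is less than the vector size" are assumptions.

```python
def is_extract_contiguous_slice(mask, vector_size):
    """Check that `mask` selects a contiguous, in-bounds run of elements."""
    # The mask must be non-empty and no larger than the operand vector.
    if not mask or len(mask) > vector_size:
        return False
    # Every mask value must index into the operand vector (assumed bound).
    if any(idx < 0 or idx >= vector_size for idx in mask):
        return False
    # Contiguous means each index is exactly one past the previous one.
    return all(b == a + 1 for a, b in zip(mask, mask[1:]))

print(is_extract_contiguous_slice([2, 3, 4, 5], vector_size=8))
print(is_extract_contiguous_slice([6, 7, 8], vector_size=8))
```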
[CIR] Fix CFG flattening for loops with cleanup in special regions (#187369)
If a loop required a cleanup scope in the condition or step region of
the loop, we crashed during CFG flattening because the flattening of the
cleanup scope created multiple blocks in the region, but we were
assuming there would only be one block.
This change updates the CFG flattening code to look for the
cir.condition or cir.yield operation in the last block of the region.
[MLIR][XeVM] Add truncf and mma_mx op. (#180055)
truncf op converts 16-bit floats to 8-bit or 4-bit floats.
mma_mx op does cooperative matrix multiply-accumulate on
8-bit or 4-bit float types with an 8-bit scale value.
[VPlan] Fix masked_cond expansion.
masked_cond is used to combine early-exit conditions with masks from
predication. The early-exit condition should only be evaluated if the mask
is true. Emit the mask first, to avoid incorrect poison propagation.
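Why operand order matters can be shown with a toy Python model of poison. It assumes the combined condition is emitted as a select-style logical and, which the message does not state explicitly; `POISON`, `bitwise_and`, and `logical_and` are hypothetical helpers, not VPlan APIs.

```python
POISON = object()  # toy stand-in for an LLVM poison value

def bitwise_and(a, b):
    """Like `and i1`: poison in either operand poisons the result."""
    if a is POISON or b is POISON:
        return POISON
    return a and b

def logical_and(first, second):
    """Like `select i1 first, i1 second, i1 false`: `second` only matters if `first` is true."""
    if first is POISON:
        return POISON
    return second if first else False

mask, cond = False, POISON  # lane is masked off; its exit condition is poison
print(logical_and(mask, cond) is POISON)  # mask first
print(logical_and(cond, mask) is POISON)  # condition first
print(bitwise_and(mask, cond) is POISON)  # plain and
```

With the mask as the first operand, a masked-off lane yields a plain false, whereas the other two forms let the poison from the unevaluated condition through.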
Fixes https://github.com/llvm/llvm-project/issues/187061.
[MLIR][Python] Add optional emit reset to exportSMTLIB (#187366)
Previously, MLIR's Python binding `smt.export_smtlib(...)` always
emitted `(reset)` at the end of the SMT-LIB string as a solver terminator.
This PR adds an option to suppress this trailing `(reset)`, as downstream
users like the Python z3 module don't need it.
[RISCV] Fix IDiv/IRem scheduling data for RV32 cores that use the SiFive7 model (#187331)
The integer division and remainder instructions on a 32-bit core that
uses the SiFive7 scheduling model should have the same latency and
throughput as their word counterparts on a 64-bit SiFive7 core.
This patch fixes those scheduling entries by adding a new SchedPred that
predicates on `Feature64Bit` to toggle the SchedVariant that is attached
to the affected integer division / remainder instructions.