[SLP] Add test to demonstrate %stride = %x * %const issue (#194735)
For runtime-strided loads/stores, the current approach doesn't recognize cases where the stride is the result of a multiplication.
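As an illustration only, the kind of scalar code this test is about can be written in C++ as four loads spaced by a runtime stride computed as `x * 3`. The function name `sumStrided` and the constant 3 are hypothetical; the actual test is LLVM IR and is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical source-level shape of the pattern: four loads whose addresses
// are spaced by a runtime stride that is itself x * const (here const = 3).
int sumStrided(const int *a, std::ptrdiff_t x) {
  const std::ptrdiff_t stride = x * 3; // corresponds to %stride = %x * %const
  return a[0] + a[stride] + a[2 * stride] + a[3 * stride];
}
```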
[VPlan] Strip WidenStore handling in narrowToSingleScalars (#184765)
Although the code path appears to be reached in a few cases, it does not
seem to make any changes, so the utility of the transform is in
question.
Fix metadirective implicit-nothing candidate ordering
Preserve whether a metadirective variant was explicitly
specified so selection can distinguish explicit nothing
from an omitted directive variant. Order explicit candidates
before implicit nothing candidates when invoking the OpenMP
context scorer, matching the metadirective tie-break rule.
Add standalone and begin/end metadirective regression tests
where an implicit nothing candidate appears before an
otherwise-tied explicit directive variant.
Reference:
OpenMP 5.0 [2.3.4] says that if multiple when clauses have
compatible context selectors with the same highest score, and
at least one of them specifies a directive variant, "the first
directive variant specified in the lexical order of those when
clauses" replaces the metadirective.
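The tie-break rule can be sketched as follows. This is a simplified model, not the compiler's implementation: `Candidate`, `selectVariant`, and the integer scores are hypothetical, and the input is assumed to be listed in lexical order. Explicit candidates (including an explicit `nothing`) are stably moved ahead of implicit `nothing` candidates, and the first candidate with the highest score wins.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of a metadirective `when` clause after context matching.
struct Candidate {
  int lexicalIndex;     // position of the clause in source order
  int score;            // score computed by the context scorer
  bool explicitVariant; // true if the clause spells out a variant, even `nothing`
};

// `cands` must be in lexical order. Returns the lexical index of the selected
// clause, or std::nullopt if there are no compatible candidates.
std::optional<int> selectVariant(std::vector<Candidate> cands) {
  // Explicit candidates first; stable_partition keeps lexical order in each group.
  std::stable_partition(cands.begin(), cands.end(),
                        [](const Candidate &c) { return c.explicitVariant; });
  const Candidate *best = nullptr;
  for (const Candidate &c : cands)
    if (!best || c.score > best->score) // strict '>' keeps the earliest on ties
      best = &c;
  if (!best)
    return std::nullopt;
  return best->lexicalIndex;
}
```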
Turn lease delete into a one-to-one operation, requiring one IP and optionally one type. Since multiple parameters are required for IPv6 leases, we can no longer batch deletions or mix IP address families.
Fix trait-property mapping and improve metadirective tests
- In processTraitProperties, restrict the device_isa___ANY fallback to
only isa selectors. Unknown properties under arch, kind, or vendor
now produce an invalid trait so the variant does not match. Previously,
device={arch("neon")} would incorrectly match via ISA target-feature
checking.
- Add metadirective-nothing tests for OpenMP version >= 5.1.
- Add explicit -triple flags to ISA tests so AArch64 features run
under an aarch64 triple and x86 features under an x86_64 triple.
- Split device={arch()} tests into metadirective-device-arch.f90.
- Add omp.terminator checks for begin/end metadirective match cases.
- Remove begin-metadirective.f90 TODO test (now supported).
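The restricted fallback in the first bullet can be modelled roughly as below. The enum names, `classifyProperty`, and the `known` set are hypothetical; only the rule itself (fall back to a target-feature check under `isa` only) comes from the description.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical trait-selector kinds under `device={...}`.
enum class TraitSelector { Arch, Isa, Kind, Vendor };
enum class TraitResult { Known, IsaFeatureCheck, Invalid };

// A property missing from the selector's known set may fall back to an ISA
// target-feature check only under `isa` (analogous to device_isa___ANY).
// Under `arch`, `kind`, or `vendor` it becomes an invalid trait, so the
// variant cannot match.
TraitResult classifyProperty(TraitSelector sel, const std::string &prop,
                             const std::set<std::string> &known) {
  if (known.count(prop))
    return TraitResult::Known;
  if (sel == TraitSelector::Isa)
    return TraitResult::IsaFeatureCheck;
  return TraitResult::Invalid;
}
```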
Assisted with Copilot
[AArch64][GlobalISel] Tighten up some legal types (#194785)
This tightens up some of the legal types from scalar any types to the
correct integer or floating-point types. Some are still not changed, such
as trunc and zext/sext. Type-independent operations like loads, stores,
vector operations, selects, etc. all still correctly use scalar any types.
[flang] Avoid exponential traversal in deep type extensions (#191955)
`HasDestruction()` and `IsFinalizable()` walked component iterators that
already descend into parent scopes, and then also recursed through
derived-type components. With deep type extension chains, that caused
the same inheritance structure to be traversed repeatedly and compile
time to grow exponentially.
Iterate only over the current type scope instead. The scope contains the
type's own components plus its parent component, so the existing
recursion through derived-type components still handles inheritance
without double traversal.
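A small counting model shows why the old traversal was exponential. It assumes a simplified chain `T0 <- T1 <- ... <- Tn` in which each type has only its parent component; the functions are hypothetical and do not mirror flang's actual iterators. Under these assumptions the old shape performs 2^n visits for `Tn`, while the new shape performs n + 1.

```cpp
#include <cassert>
#include <cstdint>

// Old shape: the component walk for Ti already yields the parent components
// of Ti and of every ancestor scope, and each one is then recursed into.
std::uint64_t visitsOld(int i) {
  std::uint64_t count = 1;     // visit Ti itself
  for (int j = i; j >= 1; --j) // scopes of Ti, T(i-1), ..., T1
    count += visitsOld(j - 1); // recurse into the parent component's type
  return count;
}

// New shape: walk only Ti's own scope (its parent component plus its own
// components) and let the recursion handle the rest of the chain.
std::uint64_t visitsNew(int i) {
  std::uint64_t count = 1;
  if (i >= 1)
    count += visitsNew(i - 1);
  return count;
}
```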
Add a regression test with a deep type extension hierarchy.
Co-authored-by: Sairudra More <moresair at pe31.hpc.amslabs.hpecorp.net>
__HAVE_PMAP_PHYSSEG is an old-68k-pmap construct, so put it and the
associated declarations into <m68k/pmap_motorola.h>, and remove it from
all of the m68k vmparam.h files.
Centralize the definitions of MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ,
and MAXSSIZ across all m68k platforms. Notable callouts:
- Default values for the 68010 come directly from sun2, but will be
  suitable for any additional 68010 systems that may appear in the future.
- The Sun3 MMU dictates more conservative limits than the rest of the
  68020+ crowd.
- Amiga is the odd one out and keeps the previous values (it has an
  odd-ball USRSTACK, too).
[clangd] Resolve __builtin_offsetof designator components precisely (#194407)
Building on the new TraverseOffsetOfNode hook in RecursiveASTVisitor and
the OffsetOfNode DynTypedNode kind, teach SelectionTree, FindTarget, and
the explicit-references collector to address each designator component
individually. Cursor positions inside a nested designator (for example
the 'B' in __builtin_offsetof(A, B.c)) now resolve to the corresponding
field instead of always picking the innermost component.
- SelectionTree: wrap each OffsetOfNode visit in traverseNode so it
becomes a selectable node alongside its enclosing OffsetOfExpr.
- FindTarget::allTargetDecls: resolve OffsetOfNode (Field kind) to its
FieldDecl, and drop the OffsetOfExpr fallback so non-component
selections do not guess a field target.
- ExplicitReferenceCollector: emit one ReferenceLoc per component via a
new VisitOffsetOfNode hook, replacing the manual component loop in
refInStmt.
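For reference, each component of a designator such as `B.c` names a distinct field, and the overall offset is the sum of the per-component offsets. The types below are hypothetical, chosen to match the `__builtin_offsetof(A, B.c)` example, and the snippet assumes Clang or GCC, which provide `__builtin_offsetof`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types: `B` is a field of `A`, and `c` is a field of B's type.
struct Inner {
  int a;
  int c;
};
struct A {
  int x;
  Inner B;
};

// The designator `B.c` has two components, `B` (A::B) and `c` (Inner::c),
// each referring to its own field declaration.
constexpr std::size_t offsetOfBC = __builtin_offsetof(A, B.c);
static_assert(offsetOfBC == __builtin_offsetof(A, B) + __builtin_offsetof(Inner, c),
              "offset of B.c is the sum of its component offsets");
```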
Tests:
[12 lines not shown]
[AMDGPU][MC] update USER_SGPR_COUNT bits for GFX1250 (#192579)
When working on a Triton kernel with a tensor descriptor created on the
host side, we hit the error message `amdgpu_user_sgpr_count smaller than than implied by enabled user SGPRs`.
After some debugging, we found that `USER_SGPR_COUNT` is not updated for
GFX125; this patch updates USER_SGPR_COUNT according to
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-compute-pgm-rsrc2-gfx6-gfx12-table.
On GFX125, COMPUTE_PGM_RSRC2::USER_SGPR_COUNT is 6 bits wide. The MC
helper S_00B84C_USER_SGPR only masks to 5 bits; when the true user SGPR
count is 32 or more, the masked value wraps (e.g. 32 -> 0).
`AMDGPUAsmPrinter` then emits `.amdhsa_user_sgpr_count` with a value of 0,
which disagrees with the count implied by the enabled user SGPRs
(including kernarg preload), and assembling the llc output with `llvm-mc`
finally fails in `AMDGPUAsmParser`.
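The wrap-around is plain bit masking. The helper names below are hypothetical and model only the field width, not the field's position within COMPUTE_PGM_RSRC2.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers modelling only the width of USER_SGPR_COUNT.
constexpr std::uint32_t maskUserSgprCount5(std::uint32_t n) { return n & 0x1Fu; } // 5-bit field
constexpr std::uint32_t maskUserSgprCount6(std::uint32_t n) { return n & 0x3Fu; } // 6-bit field

// With 32 user SGPRs the 5-bit mask wraps to 0, while the 6-bit mask keeps 32.
static_assert(maskUserSgprCount5(32) == 0, "5-bit field wraps");
static_assert(maskUserSgprCount6(32) == 32, "6-bit field holds 32");
```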
---------
Co-authored-by: Shilei Tian <i at tianshilei.me>
[clang][Fuchsia] Factor getFuchsiaDefines out of FuchsiaTargetInfo class (#194775)
Most of the template class's getOSDefines definition is not
template-dependent, so move it to a shared subroutine that's
outside the header file and reused by all the FuchsiaTargetInfo
instantiations.
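The refactoring follows a common pattern: move the template-independent body into an ordinary function and have every instantiation call it. The sketch below uses hypothetical names (`getFuchsiaDefinesSketch`, `FuchsiaTargetInfoSketch`, `DefineList`) and a single illustrative macro rather than Clang's actual `MacroBuilder` interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the object that collects predefined macros.
using DefineList = std::vector<std::string>;

// Non-template helper: compiled once instead of once per instantiation.
void getFuchsiaDefinesSketch(DefineList &out) {
  out.push_back("__Fuchsia__"); // illustrative macro only
}

// Template wrapper: each instantiation forwards to the shared helper.
template <typename Base>
struct FuchsiaTargetInfoSketch : Base {
  void getOSDefines(DefineList &out) const { getFuchsiaDefinesSketch(out); }
};

// Hypothetical per-architecture bases.
struct X86Base {};
struct AArch64Base {};
```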
[mlir][xevm] Fix greedy rewriter crash in HandleVectorExtractPattern matches shuffles on block arguments (#192213)
`HandleVectorExtractPattern` could report `success()` without rewriting
the IR when `llvm.shufflevector` extracted a contiguous slice from a
**block argument** (no defining op). The greedy rewriter’s expensive
checks then aborted with *“pattern returned success but IR did not
change”*.
The pattern only performs work when the shuffle’s operand is defined by
another op (`FPExt`, `FPTrunc`, `bitcast`, nested `shufflevector`, or
`load`). For operands like function arguments, `getDefiningOp()` is
null, so nothing was rewritten, yet the function still fell through to
`return success()` without changing the IR, which crashes when
`MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` is enabled. With this fix,
`mlir-opt --convert-xevm-to-llvm --split-input-file
mlir/test/Conversion/XeVMToLLVM/xevm_mx-to-llvm.mlir` no longer hits the
fatal error.
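The contract the expensive checks enforce can be modelled without MLIR: a pattern may report success only if it changed the IR. In the sketch below, `SourceKind`, `PatternOutcome`, and `matchAndRewriteSketch` are hypothetical stand-ins, not MLIR APIs; `BlockArgument` represents any operand whose `getDefiningOp()` is null.

```cpp
#include <cassert>

// Hypothetical model of where the shuffle's source operand comes from.
enum class SourceKind { BlockArgument, FPExt, FPTrunc, Bitcast, ShuffleVector, Load, Other };

struct PatternOutcome {
  bool success;   // what the pattern reports to the driver
  bool irChanged; // whether it actually modified the IR
};

// Report success only on the paths that rewrite; fail everywhere else.
PatternOutcome matchAndRewriteSketch(SourceKind src) {
  switch (src) {
  case SourceKind::FPExt:
  case SourceKind::FPTrunc:
  case SourceKind::Bitcast:
  case SourceKind::ShuffleVector:
  case SourceKind::Load:
    return {true, true}; // a rewrite happens on these paths
  case SourceKind::BlockArgument: // the old code fell through to success() here
  case SourceKind::Other:
    break;
  }
  return {false, false};
}

// The invariant the greedy driver's expensive checks verify.
bool satisfiesDriverContract(PatternOutcome o) { return !o.success || o.irChanged; }
```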
Assisted-by: Cursor (Composer 2)
Fix memory corruption bugs in BSM record parsing
fetch_newgroups_tok(3): clamp group count to AUDIT_MAX_GROUPS before the
loop to prevent a stack buffer overflow when a crafted record specifies
more than 16 groups.
fetch_execarg_tok(3), fetch_execenv_tok(3): add a bounds check at the
top of the string-walking loop to prevent an out-of-bounds read when the
previous string's nul byte is the last byte of the record buffer.
fetch_sock_unix_tok(3): clamp the memchr search length to the number of
bytes remaining in the buffer to prevent an out-of-bounds read on short
tokens. Also clamp slen to sizeof(path) to prevent a one-byte overflow
when no nul byte is found within the path data.
fetch_socket_tok: fix copy-paste error where the remote address was
written into l_addr instead of r_addr.
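The two clamping ideas can be sketched as below. This is not the libbsm code: `copyGroups`, `copyPath`, and their parameters are hypothetical, `kAuditMaxGroups` is assumed to be 16 as the description implies, and the path clamp reserves one byte for the terminator, which may differ from the exact clamp used in the fix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed value, matching "more than 16 groups" in the description.
constexpr std::size_t kAuditMaxGroups = 16;

// Copy at most kAuditMaxGroups entries, whatever count the record claims,
// and never read past the entries actually present in the source buffer.
std::size_t copyGroups(std::uint16_t claimed, const std::uint32_t *src,
                       std::size_t srcAvail, std::uint32_t (&dst)[kAuditMaxGroups]) {
  std::size_t n = std::min<std::size_t>({claimed, kAuditMaxGroups, srcAvail});
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i];
  return n;
}

// Search for the terminator only within the bytes that remain, and cap the
// length so the copied bytes plus a terminator fit in `path`.
std::size_t copyPath(const char *p, std::size_t remaining, char *path, std::size_t pathSize) {
  assert(pathSize > 0);
  const void *nul = std::memchr(p, '\0', remaining);
  std::size_t slen = nul ? static_cast<std::size_t>(static_cast<const char *>(nul) - p)
                         : remaining;
  slen = std::min(slen, pathSize - 1); // leave room for the terminator
  std::memcpy(path, p, slen);
  path[slen] = '\0';
  return slen;
}
```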
Previously reported by: @haginara
[13 lines not shown]
mac_seeotheruids: allow specificgid to be a list of groups
The specificgid functionality has historically allowed only a single
group to be exempt, but in practice one might want a few services to
be exempt for various reasons. From a security perspective, we probably
don't want to encourage unrelated users to be grouped together solely for
this purpose, as that creates a single point of shared access that could
be used for nefarious purposes.
Normalize the group list as we do cr_groups to allow linear rather than
quadratic matching; we just need to account for the difference between
FreeBSD 15.0+, where cr_groups contains only supplementary groups, and
earlier versions, where cr_groups[0] is the egid and the rest is
sorted.
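Once both group lists are sorted, a membership test can walk them in a single merge-style pass instead of comparing every pair. The sketch assumes plain ascending lists of a hypothetical `GroupId` type and ignores the pre-15.0 layout where cr_groups[0] holds the egid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using GroupId = unsigned int; // hypothetical stand-in for gid_t

// Linear-time check for any common group; both inputs must be sorted ascending.
bool anyCommonGroup(const std::vector<GroupId> &a, const std::vector<GroupId> &b) {
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    if (a[i] == b[j])
      return true;
    if (a[i] < b[j])
      ++i; // advance the list with the smaller current value
    else
      ++j;
  }
  return false;
}
```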
Reviewed by: csjp, des (earlier version)
Sponsored by: Klara, Inc.
(cherry picked from commit b675ff8eedc9ac93cdf1cfe33185b7a1a027df37)
[RISCV] Rename rvp-ext-rv32/64.ll to rvp-simd-32/64.ll. Shorten check prefixes. NFC (#194770)
The rv32/rv64 in these file names referred to the length of the vector
types, yet the rvp-ext-rv32.ll test has both rv32 and rv64 RUN lines.
Rename the files to make this clearer.
I want to add rv32 RUN lines to rvp-simd-64.ll, but we need to fix
some crashes first.
cron: log when a crontab path is too long
Log via syslog when snprintf truncates the crontab path, instead of
silently skipping the entry.
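Truncation is detectable from snprintf's return value, which is the length the full string would have had. The sketch below, with the hypothetical helper `buildCrontabPath`, returns false on truncation; the syslog call the commit adds is only indicated by a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Build "<dir>/<name>" into buf and report whether it fit without truncation.
bool buildCrontabPath(char *buf, std::size_t size, const char *dir, const char *name) {
  int n = std::snprintf(buf, size, "%s/%s", dir, name);
  if (n < 0 || static_cast<std::size_t>(n) >= size) {
    // Per the commit, cron now logs via syslog at this point.
    return false;
  }
  return true;
}
```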
Signed-off-by: Christos Longros <chris.longros at gmail.com>
Reviewed by: bcr, kevans
Differential Revision: https://reviews.freebsd.org/D56235
[CodeGen] Use SmallMapVector for SpillPlacement::Node::Links (#194653)
Previously, `SpillPlacement::Node::Links` was implemented as a
`SmallVector` of `(Weight, BundleNo)` pairs.
This patch replaces the `SmallVector` with a `SmallMapVector<unsigned,
BlockFrequency, 4>`, which stores `(BundleNo, Weight)` pairs. This
allows for more efficient lookups and weight accumulations when multiple
links to the same bundle are added.
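The behaviour described (one entry per bundle, with weights accumulated on repeated links) can be approximated with standard containers. `LinkMapSketch` is hypothetical and only mimics the insertion-ordered map-vector idea; it is not LLVM's `SmallMapVector`, and `std::uint64_t` stands in for `BlockFrequency`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Insertion order is kept in `entries`; `index` maps a bundle number to its
// slot so repeated links to the same bundle update a single entry.
class LinkMapSketch {
public:
  void addLink(unsigned bundle, std::uint64_t weight) {
    auto res = index.emplace(bundle, entries.size());
    if (res.second)
      entries.emplace_back(bundle, weight);
    else
      entries[res.first->second].second += weight; // accumulate, don't append
  }
  const std::vector<std::pair<unsigned, std::uint64_t>> &links() const { return entries; }

private:
  std::unordered_map<unsigned, std::size_t> index;
  std::vector<std::pair<unsigned, std::uint64_t>> entries;
};
```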