[AArch64] Optimize lowering of i1 vector reduction (#187912)
This PR optimizes AArch64 code generation for comparing bitcast <N x
i1> results with 0 or -1 using eq/ne. For scenarios where only "all
zeros or all ones" matters, the backend no longer prioritizes
materializing packed bitmasks and using vector reduction methods like
addv/umaxv/uminv. Instead, it expands the boolean vector into a regular
integer vector and performs the comparison via a more direct scalar
comparison path. For 64-bit and 128-bit cases that can be folded down to
64-bit, a lighter instruction sequence like fmov + cmp/cmn + cset/csel
is generated.
Functionally, this brings several benefits:
- Optimizes the `setcc(bitcast(vNi1), 0/-1)` and
`select_cc(bitcast(vNi1), 0/-1, ...)` patterns.
- In scenarios like <8 x i1>, a zero/all-one check can be performed
directly on the entire comparison result, avoiding additional vector
[6 lines not shown]
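As a rough illustration of the "all zeros or all ones" check described above, the following C++ sketch models the idea in scalar code. It assumes each i1 lane is widened to a full byte (0x00 or 0xFF) and that the eight bytes are then viewed as one 64-bit value. The helper names are hypothetical, and this is not the backend's actual lowering code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of an <8 x i1> vector: every true lane becomes
// the byte 0xFF, every false lane becomes 0x00, packed into 64 bits.
constexpr std::uint64_t expandLanes(const std::array<bool, 8> &lanes) {
  std::uint64_t bits = 0;
  for (std::size_t i = 0; i < lanes.size(); ++i)
    if (lanes[i])
      bits |= std::uint64_t{0xFF} << (8 * i);
  return bits;
}

// "No lane is set" is a single compare of the 64-bit value against 0.
constexpr bool allLanesFalse(const std::array<bool, 8> &lanes) {
  return expandLanes(lanes) == 0;
}

// "Every lane is set" is a single compare of the 64-bit value against -1.
constexpr bool allLanesTrue(const std::array<bool, 8> &lanes) {
  return expandLanes(lanes) == ~std::uint64_t{0};
}
```

Under this model the whole-vector test needs no packed bitmask or reduction, only one 64-bit compare, which is the shape a sequence like fmov + cmp/cmn + cset can handle.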
adds `<cstdlib>` to `ProgramStack.cpp` (#194249)
Building `LLVMSupport` with libc++ causes Clang to error because
`malloc` and `free` aren't declared before they're used in
`ProgramStack.cpp`.
hwpmc: Add IBS capability control policy
Reject unsupported AMD IBS and PMU control bits before programming the
MSRs.
Initialize IBS fetch/op allow masks from CPUID feature bits and validate
user-provided IBS control values against those masks. Keep the
load-latency filter dependency on L3MissOnly, but avoid decoding fields
that are already constrained by the mask.
Apply the same reserved-bit policy to the AMD PMU raw-config path by
checking core, L3, and data fabric configs against subclass-specific
masks.
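The allow-mask policy above can be sketched roughly as follows. This is standalone C++ rather than kernel C, and the feature bits, control bits, and function names are all hypothetical; they do not reflect the real CPUID or MSR layouts, only the "derive a mask, then reject anything outside it" idea.

```cpp
#include <cerrno>
#include <cstdint>

// Hypothetical feature and control bits, for illustration only.
constexpr std::uint32_t kFeatBasic = 1u << 0;
constexpr std::uint32_t kFeatExtra = 1u << 1;
constexpr std::uint64_t kCtlEnable = 1ull << 0;
constexpr std::uint64_t kCtlExtra = 1ull << 1;

// Build the set of control bits that the advertised features permit.
constexpr std::uint64_t buildAllowMask(std::uint32_t features) {
  std::uint64_t allow = 0;
  if (features & kFeatBasic)
    allow |= kCtlEnable;
  if (features & kFeatExtra)
    allow |= kCtlExtra;
  return allow;
}

// Reject any value that sets a bit outside the allow mask, so unsupported
// or reserved bits are caught before anything is written to hardware.
constexpr int validateControl(std::uint64_t value, std::uint64_t allow) {
  return (value & ~allow) != 0 ? EINVAL : 0;
}
```

Because the mask already excludes unsupported bits, later code does not need to decode each field individually to know it is safe.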
Fix the IBS CPUID feature bit definitions used by the policy.
Reviewed by: mhorne, Ali Mashtizadeh <ali at mashtizadeh.com>
Sponsored by: AMD
Signed-off-by: Andre Silva <andasilv at amd.com>
Pull Request: https://github.com/freebsd/freebsd-src/pull/2140
hwpmc: Add extra_mask sysctls per counter type
Expose kern.hwpmc.{ibs_fetch,ibs_op,amd_core,amd_l3,amd_df}_extra_mask
as RWTUN uint64s that OR into the CPUID-derived allow mask at
validation time. Default 0, so the strict policy applies unless an
administrator opts bits back in — intended for testing the wrmsr_safe
path in PR #2157.
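A minimal sketch of how such an extra mask could widen the check, assuming the validation is a plain bitwise test. The function names are hypothetical and the code is standalone C++, not the driver's kernel C.

```cpp
#include <cstdint>

// Hypothetical model: the administrator-supplied mask is ORed into the
// CPUID-derived mask each time a value is validated.
constexpr std::uint64_t effectiveAllowMask(std::uint64_t cpuidMask,
                                           std::uint64_t extraMask) {
  return cpuidMask | extraMask;
}

// A control value passes only if it sets no bit outside the effective mask.
constexpr bool isControlAllowed(std::uint64_t value, std::uint64_t cpuidMask,
                                std::uint64_t extraMask) {
  return (value & ~effectiveAllowMask(cpuidMask, extraMask)) == 0;
}
```

With an extra mask of 0 the effective mask equals the CPUID-derived one, which matches the strict default described above.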
Reviewed by: mhorne, Ali Mashtizadeh <ali at mashtizadeh.com>
Sponsored by: AMD
Signed-off-by: Andre Silva <andasilv at amd.com>
Pull Request: https://github.com/freebsd/freebsd-src/pull/2140
hwpmc_ibs: Add external error handling
Add EXTERR_CAT_HWPMC_IBS to the external error categories and replace generic
EINVAL returns in ibs_allocate_pmc() with EXTERROR() calls that provide
detailed error messages.
This will be augmented with additional cases in the near future.
Reviewed by: mhorne
Sponsored by: AMD
Signed-off-by: Andre Silva <andasilv at amd.com>
Pull Request: https://github.com/freebsd/freebsd-src/pull/2134
[SLP][NFC] Skip large mostly-trivial trees with up to one vector compute node
Track non-load/store, non-PHI, non-split vector compute nodes separately
in isTreeTinyAndNotFullyVectorizable. Allow skipping a tree when it
contains at most one such vector node and at most one load/store node,
provided the tree is large enough relative to their combined count
(VectorizableTree.size() > LimitTreeSize * (StoreLoadNodes.size() +
VectorNodes.size())).
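The new skip condition can be restated as a small predicate. This is only a sketch of the inequality quoted above, with plain counts in place of the real containers; the function name is hypothetical, and the actual isTreeTinyAndNotFullyVectorizable may apply further checks that are not modeled here.

```cpp
#include <cstddef>

// Hypothetical restatement of the condition: at most one vector compute node,
// at most one load/store node, and a tree that is large relative to the
// combined count of those nodes.
constexpr bool isLargeMostlyTrivialTree(std::size_t treeSize,
                                        std::size_t storeLoadNodes,
                                        std::size_t vectorNodes,
                                        std::size_t limitTreeSize) {
  return vectorNodes <= 1 && storeLoadNodes <= 1 &&
         treeSize > limitTreeSize * (storeLoadNodes + vectorNodes);
}
```

For example, with an assumed limit of 10, a 40-node tree with one load/store node and one vector compute node satisfies 40 > 10 * 2 and is skipped, while a 20-node tree with the same node counts is not.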
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/194703
[Clang] Simplify target specific implementations in gpuintrin.h (#194669)
Summary:
Previously we had a three-way dance where the top-level defined an impl
then the lower level defined potentially an override and then we
stitched those together.
This simplifies that, instead just defining a macro that says whether
or not we need it. This works because the target-specific portions are
always included first, so this basically says "Do we need to make a
default version".
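The pattern might look roughly like the sketch below. All macro and function names are hypothetical and are not taken from gpuintrin.h; both "headers" are shown in one file, in the order they would be included.

```cpp
// ---- Target-specific part (hypothetical), always included first ----
// This target supplies its own sketch_scale ...
constexpr int sketch_scale(int x) { return x << 1; }
// ... and asks for the generic fallback of sketch_bump.
#define SKETCH_NEED_DEFAULT_BUMP 1

// ---- Generic part, included afterwards ----
// A default is emitted only when the target said it needs one.
#ifdef SKETCH_NEED_DEFAULT_SCALE
constexpr int sketch_scale(int x) { return x * 2; }
#endif

#ifdef SKETCH_NEED_DEFAULT_BUMP
constexpr int sketch_bump(int x) { return x + 1; }
#endif
```

Each function ends up with exactly one definition, and the generic part never has to stitch an implementation and an override together.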
[WebAssembly] Remove unnecessary code after #187484 (NFC) (#194188)
This code was written under the assumption that
`fixCallUnwindMismatches` runs before `fixCatchUnwindMismatches`, but
that has not been true since #187484.
[CIR] Add implicit return zero handling (#194490)
This adds code to emit a store of zero to the return value for functions
that are identified as having the implicit return zero property (such as
main() in C programs).
Assisted-by: Cursor / claude-opus-4.7-thinking-xhigh
Revert "Reland "[llvm-profgen] Add support for ETM trace decoding"" (#194695)
Reverts llvm/llvm-project#194465
Caused tools/llvm-profgen/etm-opencsd.test to fail on some systems.
*/*: Replace CONFIGURE_ARGS with MESON_ARGS
Meson-based ports that use CONFIGURE_ARGS "work" because MESON_ARGS is
appended; however, the framework and documentation expect MESON_ARGS
to be used for Meson-based ports.
PR: 294808
Approved by: blanket
Mk/Uses/meson.mk: Silence warning during do-configure stage
Add setup to CONFIGURE_ARGS to silence the following warning:
"WARNING: Running the setup command as `meson [options]` instead of
`meson setup [options]` is ambiguous and deprecated."
While here, add a safeguard for when CONFIGURE_ARGS is used instead of
MESON_ARGS.
PR: 294808
Reviewed by: desktop (arrowd), previous iteration
[clang] fix crash with c-style casts involving dependent member-pointer types
A dependent member-pointer type doesn't necessarily have a class declaration.
This simplifies the check performed in a helper for diagnosing a cast which removes qualifiers,
so it doesn't rely on this assumption.
Fixes #194524
[ARM][MVE] Change MVE getSetCCResultType to always use i1 vectors (#192531)
This mirrors what SVE does on the AArch64 side, where because we have
predicate vectors we can use them for all types, and let legalization handle
where it is not the case. This fixes a clmul expansion that was failing to lower
the resulting intrinsic.
textproc/treemd: update to 0.5.11
[0.5.11] - 2026-04-28
Fixed
Toggle details no-op after section navigation - In interactive mode, pressing Enter on certain <details> blocks reported "✓ Toggled details" but produced no visible change. InteractiveState::element_states is keyed only by ElementId { block_idx, sub_idx }, so a previous section's Table state at a given block_idx silently blocked a fresh Details from initializing at the same key (the indexer used HashMap::entry().or_insert(), a no-op when present). toggle_details then matched no Details variant and silently failed. Indexer now overwrites stale wrong-variant entries while preserving same-section toggle state. Regression test added.
--filter and --level ignored in --tree mode - CLI now honors both flags when rendering the tree output (c3c3fcd)
--at-line not wired up; -s mismatched formatted headings - --at-line resolves to the enclosing heading; section selection (-s) now matches headings that contain inline formatting (36c4e60)
Changed
Upgraded all dependencies to latest - Refreshed clap_complete 4.6.2 → 4.6.3, mermaid-rs-renderer 0.2.1 → 0.2.2, turbovault-parser 1.4.0 → 1.4.1, turbovault-core 1.4.0 → 1.4.1, open 5.3.3 → 5.3.4, plus transitive refreshes (plist, wasm-bindgen, tokio, libc, js-sys, cc, etc.)
Tests
Added end-to-end CLI integration suite covering --tree, --list, --filter, --level, --at-line, and -s (471d9d5)
Added coverage for JSON output builder and config loading (ef250da)
Added coverage for document tree/search and palette command matching (f185c4b)
[3 lines not shown]
[HIP][MacOS] Mach-O support and Darwin toolchain fixes (#183991)
This PR adds support for HIP on macOS: Mach-O section naming, Darwin
host toolchain initialization guards, and HIPSPV behavior when Darwin is
the host.
This has been verified using chipStar on macOS via the PoCL OpenCL
implementation.
## Uninitialized target workaround
Darwin’s toolchain is only initialized when its own TranslateArgs runs.
For HIP/CUDA device jobs, Darwin is used as the HostTC and never gets
its args translated, so its target stays uninitialized. The new checks
avoid asserting on that uninitialized state. A better long-term fix is
to initialize Darwin earlier (see the FIXME in Driver.cpp
BuildJobsForAction).
- [ ] Initialize Darwin toolchain during construction instead of lazily
in TranslateArgs. See Driver.cpp BuildJobsForAction FIXME.
[2 lines not shown]