[llvm-pdbutil] Add DXContainer support to `llvm-pdbutil dump` (#200485)
This patch adds a `--dxcontainer` option that attempts to parse a
`DXContainer` from the stream 5 data (generated by DirectX tools) of a PDB
file and, if successful, dumps basic information about it. If no
`DXContainer` can be parsed, the tool reports that it is not present in the file.
[mlir][MemRefToLLVM] fix incorrect `nuw` on `GEP/mul` when lowering `memref.load/store` with negative strides (#204309)
`MemRefToLLVM` was unconditionally emitting `getelementptr inbounds|nuw`
(and consequently `mul overflow<nsw,nuw>` on every intermediate index
computation inside `getStridedElementPtr`) for all `memref.load` and
`memref.store` lowerings.
This is _unsound_ when any stride is negative or dynamic.
`getStridedElementPtr` propagates `GEPNoWrapFlags::nuw` to
`IntegerOverflowFlags::nuw` on every intermediate `llvm.mul` and
`llvm.add` it emits. With a negative stride (e.g. `-1`, which is
`2^64-1` unsigned), an access like index=5 produces `mul nuw 5,
(2^64-1)`, which unsigned-overflows and yields poison per LangRef —
regardless of whether the final offset happens to be non-negative.
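The overflow described above can be checked with plain integer arithmetic. The following is a minimal sketch, not LLVM or MLIR code: the helper names `mul_nuw` and `mul_wrapping` are hypothetical, and `None` is used here only as a stand-in for poison.

```python
BITS = 64
MASK = (1 << BITS) - 1

def mul_nuw(a, b):
    """Model `mul nuw` on i64; None stands in for poison on unsigned wrap."""
    ua, ub = a & MASK, b & MASK      # reinterpret both operands as unsigned i64
    product = ua * ub
    if product > MASK:               # the unsigned result does not fit in 64 bits
        return None
    return product

def mul_wrapping(a, b):
    """Plain wrapping multiply, with the result reinterpreted as signed i64."""
    r = (a * b) & MASK
    return r - (1 << BITS) if r >> (BITS - 1) else r

index, stride = 5, -1                # stride -1 is 2^64-1 when read as unsigned
print(mul_nuw(index, stride))        # None: the flagged multiply is poison
print(mul_wrapping(index, stride))   # -5: the unflagged multiply is well defined
```

In this model the same multiplication is well defined without the flag and poison with it, which is why the flag has to be dropped rather than the arithmetic changed.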
This issue came up in the discussion in PR #202118. Thanks to
@banach-space for the detailed discussion.
This PR hopefully concludes the path to fix the regression related to
[6 lines not shown]
AMDGPU: Use module flags to control xnack and sramecc
This ensures these ABI details are encoded in the IR module
rather than depending on external state from command-line flags.
Previously, these were encoded as function-level subtarget features.
The code object output was a single target ID directive implied
by the global subtarget. The backend would previously check if a
function's subtarget feature mismatched the global subtarget. This
is avoided by making xnack and sramecc module-level properties from
the start. This also provides proper linker compatibility
enforcement, moving the error point earlier.
The old encoding was also an abuse of the subtarget feature system.
Subtarget features are a bitvector, and later features in the string
can override earlier ones. The old handling added a special case
where explicit settings were preserved: ordinarily +feature,-feature
should result in the feature being disabled, but +xnack,-xnack would
preserve the explicit "-xnack" state, which differs from the absence
of any xnack setting.
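To illustrate why a bitvector cannot carry this distinction, here is a small sketch under stated assumptions. It is not the LLVM subtarget feature parser: `parse_features`, `parse_tristate`, and the state names `"on"`, `"off"`, and `"unspecified"` are all hypothetical.

```python
def _items(feature_string):
    # Split a comma-separated feature string, skipping empty entries.
    return [item for item in feature_string.split(",") if item]

def parse_features(feature_string):
    """Ordinary bitvector semantics: the last mention of a feature wins."""
    enabled = set()
    for item in _items(feature_string):
        sign, name = item[0], item[1:]
        if sign == "+":
            enabled.add(name)
        elif sign == "-":
            enabled.discard(name)
        else:
            raise ValueError(f"feature must start with + or -: {item!r}")
    return enabled

def parse_tristate(feature_string, name):
    """Hypothetical tri-state view: explicit on, explicit off, or never mentioned."""
    state = "unspecified"
    for item in _items(feature_string):
        if item[1:] == name:
            state = "on" if item[0] == "+" else "off"
    return state

# As a bitvector, "+xnack,-xnack" and "" are indistinguishable...
print(parse_features("+xnack,-xnack") == parse_features(""))
# ...but the special case needed to tell them apart.
print(parse_tristate("+xnack,-xnack", "xnack"), parse_tristate("", "xnack"))
```

A module-level property can store the three states directly, so no special case in the feature string handling is needed.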
[25 lines not shown]
[clang][bytecode][NFC] Mark results as non-empty when taking a value (#204568)
This was missing, so every EvaluationResult ended up marked as empty
even though its APValue was set. Nobody noticed because `takeAPValue()`
lacked an `assert(!empty())`.
NAS-141435 / 27.0.0-BETA.1 / Fix public API method docstrings (RST, doc references, JSON-RPC examples) (#19157)
## Summary
Fix docutils errors and poor RST rendering on several API docs pages,
and document conventions for public method docstrings in CLAUDE.md.
This PR does NOT represent a sweep of every public method docstring for
formatting improvements. Its primary aim is to fix all docutils errors,
format JSON embedded in descriptions, and fix other improper RST
formatting. Other improvements like RST admonitions are limited in scope
to these methods.
## Motivation
Public method docstrings are rendered into the API reference
documentation. They need to be written with the generated docs and
general API consumers in mind rather than developer comfort. We should
move away from mixing styles: Google/NumPy-style
[47 lines not shown]
databases/mongodb70: re-enable python3.12
The patch was disabled temporarily because it was in the way of the recent upgrades.
NB: a test build was already running when this PR came in.
- Remove jobs_unsafe from a flavour. The build timed out with it set.
- No PORTREVISION bump, as the package contents do not change.
PR: 296127
jimtcl: bump BUILDLINK_API_DEPENDS to 0.83nb1.
The current devel/openocd (0.12.0) fails to compile with jimtcl < 0.83,
but since the BUILDLINK_API_DEPENDS was 0.80nb1, you would not notice
the failure unless you had an older version of jimtcl already installed.
So, jimtcl is now 0.83nb1, BUILDLINK_API_DEPENDS set to match, and openocd
bumped to 0.12.0nb1 to force the correct dependency to be recorded.
graphics/mesa-{dri,libs}: Fix vaapi for AMD
VA bits end up being compiled into the libgallium.so giant library, which is
shipped by mesa-libs. This means that we should make libva an unconditional
build dep for mesa-libs. In turn, it makes no sense to disable VA in mesa-dri.
Luckily, libva is a pretty thin dep.
Reported by: flo
[libc++] Use public os_sync API instead of private __ulock on newer Apple platforms (#202519)
The atomic wait and wake implementation on Apple platforms currently
relies on `__ulock_wait` and `__ulock_wake`, which are private kernel
APIs. This is a problem for anyone shipping apps through the App Store
since Apple flags private symbol usage during review.
Starting with macOS 14.4 and iOS 17.4, Apple ships public replacements
through `os_sync_wait_on_address` and `os_sync_wake_by_address_any/all`
in `<os/os_sync_wait_on_address.h>`. These cover the same functionality
and are documented, stable, and safe for App Store submissions.
This patch makes use of the public APIs instead of the private ones
whenever the underlying OS permits it.
This takes over #182947.
Fixes #182908
Fixes #146142
Co-authored-by: Bbn08 <atrancendentbeing at gmail.com>
[libc] Include linux headers to get ioctl macros (#204555)
Linux has many existing ioctls and keeps adding them, so a
hand-maintained list would always be out of date. Additionally, some
ioctls have architecture specific numbers (some in a very subtle way --
by having the number depend on the size of a structure).
asm/ioctls.h and linux/sockios.h are pretty clean, and are already
included by glibc, so we can just do the same to get the latest
definitions.
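As a rough illustration of how a number can depend on a structure size, the sketch below reimplements the generic Linux `_IOC()` encoding in Python. The shift values follow the generic layout (8-bit number, 8-bit type, 14-bit size, 2-bit direction); some architectures use different widths, which is a second source of per-architecture numbers. The ioctl itself (`'x'`, number 1) is hypothetical and exists only for this example.

```python
import struct

# Generic _IOC() field layout; not valid for every architecture.
NRSHIFT, TYPESHIFT, SIZESHIFT, DIRSHIFT = 0, 8, 16, 30
IOC_READ = 2  # generic value of _IOC_READ

def ioc(direction, type_char, nr, size):
    """Pack direction, type, number and argument size into one request code."""
    return ((direction << DIRSHIFT) | (size << SIZESHIFT)
            | (ord(type_char) << TYPESHIFT) | (nr << NRSHIFT))

def ior(type_char, nr, size):
    """Equivalent of _IOR(type, nr, T), with sizeof(T) passed explicitly."""
    return ioc(IOC_READ, type_char, nr, size)

# Hypothetical ioctl whose argument is a single C `long`:
size_32bit = struct.calcsize("<i")   # 4 bytes where long is 32 bits
size_64bit = struct.calcsize("<q")   # 8 bytes where long is 64 bits

print(hex(ior("x", 1, size_32bit)))
print(hex(ior("x", 1, size_64bit)))
```

The two request codes differ only in the size field, which is the kind of subtle difference a hand-maintained list is likely to get wrong.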
sysutil/u-boot-rpi*: zap CONFIG_ENV_FAT_DEVICE_AND_PART
This is no longer needed with modern U-Boot and it's inaccurate for the
modern RPi. Leave the config var around for now as a hint in case
someone cares, but empty it out to avoid breaking things.
PR: 268630
Approved by: uboot (manu)
[AMDGPU] Mark all instructions in WWM region as convergent
Mark instructions between ENTER_STRICT_WWM and EXIT_STRICT_WWM as
convergent, so they don't get moved out of the whole wave mode region
(see the licm-wwm.mir test). This doesn't automagically fix all our
woes, since things can still be moved out of the region before we even
run si-wqm, but there are rumours about moving WWM formation earlier
anyway.
This is not a substitute for proper WWM support - in particular, this
would inhibit most optimizations inside WWM regions with complex control
flow. Right now most WWM is relatively limited in size and complexity,
so I think this is acceptable until we get a more principled solution.
I haven't thought too much about whether or not we need this for WQM as
well.
Assisted by: Claude Sonnet
commit-id:9204c7e2
[AMDGPU][doc] Refactor Barrier Execution Model
Remove everything that has to do with named barriers and put it in a series of model extensions specific to /sbarrier/named-barriers.
I had to change a few things to make it fit, in summary:
Base Model:
* Stylistic changes that make it easier to refer to specific rules. Each rule is in a rubric instead of a bullet point.
* (-) No longer defines `barrier-mutually-exclusive`
* (-) No longer defines barrier `join` and any associated rule.
New named barrier extensions:
* Define "named barrier" as a sub-type of barrier objects. This makes barrier-mutually-exclusive redundant.
* Define barrier join as an op that can exclusively be done on `named barrier objects`.
* Define rules relating to join and its ordering with other barrier operations
Following these changes, the target tables changed a bit as well.
[2 lines not shown]
[libomp] Parse OMP_DEFAULT_DEVICE with new device trait parser
... but do not yet expose the new functionality to the user. This is a
backward-compatible update that will be followed by the move to
the OpenMP 6.0 semantics as defined in 4.3.8.
AMDGPU: Add subtarget feature for controllable xnack modes
This replaces the previously removed xnack-any-only feature
with its inverse, xnack-on-off-modes. All pre-gfx12.5 xnack
targets support the controllable mode. Explicitly set xnack
settings are ignored, the same way as is done for xnack requests
on other unsupported targets.
Repurpose MIFlag::NoConvergent
The NoConvergent MIFlag allows us to mark specific instances of
convergent (as indicated by their MCID) MachineInstrs as not convergent.
Sometimes it's useful to do the opposite as well - mark certain
instances of instructions that are not normally convergent as
convergent (for instance inside WWM regions on AMDGPU).
This patch renames the NoConvergent flag to OverrideConvergence. This
can be set to communicate that if the opcode is usually convergent, then
this particular instance of it isn't, and the other way around. When
changing the opcode of an instruction, we first check if the new opcode
has the same "convergence" as the old one - if it does, then we preserve
the flag, otherwise we clear it since we can get the correct convergence
from the opcode now.
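The flag semantics described above can be sketched as follows. This is a toy model, not the MachineInstr implementation: the `Instr` class, the opcode table, and the method names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical opcode table: True means the opcode is convergent by default.
OPCODE_CONVERGENT = {"BARRIER": True, "ADD": False, "SUB": False}

@dataclass
class Instr:
    opcode: str
    override_convergence: bool = False

    def is_convergent(self):
        # The flag flips whatever the opcode says by default.
        return OPCODE_CONVERGENT[self.opcode] != self.override_convergence

    def set_opcode(self, new_opcode):
        # Keep the flag only if old and new opcodes agree on convergence;
        # otherwise clear it, since the new opcode already gives the answer.
        if OPCODE_CONVERGENT[new_opcode] != OPCODE_CONVERGENT[self.opcode]:
            self.override_convergence = False
        self.opcode = new_opcode

add = Instr("ADD", override_convergence=True)   # e.g. an ADD inside a WWM region
print(add.is_convergent())
add.set_opcode("SUB")                           # same default: flag preserved
print(add.override_convergence, add.is_convergent())
add.set_opcode("BARRIER")                       # default differs: flag cleared
print(add.override_convergence, add.is_convergent())
```

In this model the instruction stays convergent across both opcode changes, first because of the flag and then because of the opcode itself.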
Assisted by: Claude Sonnet
commit-id:93c99000
[X86] Fix stale kill flag when folding VPMOV*2M + KMOV to VMOVMSK (#204342)
tryCompressVPMOVPattern folds VPMOV*2M + KMOV into a single VMOVMSK in
place: it changes the KMOV's opcode and repoints its source operand from
the mask k-register to the XMM source via MachineOperand::setReg().
setReg() only changes the register number and keeps the operand's other
flags, so the kill flag computed for the mask ("killed $k0") is reused
for the XMM source. When the source is still live this marks it killed,
which the machine verifier reports as a use of an undefined register.
We should instead use the kill flag from the VPMOV's source operand. The
forward scan already guarantees the source is not redefined between the
VPMOV and the KMOV, so the VPMOV's flag is correct at the relocated
read.
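A toy model of the bug and the fix, assuming only what the description states about `setReg()` (it changes the register and keeps the other flags). The `Operand` class and the two fold helpers are hypothetical and do not reflect the actual X86 pass code.

```python
from dataclasses import dataclass

@dataclass
class Operand:
    reg: str
    is_kill: bool = False

    def set_reg(self, reg):
        # Like the setReg() behaviour described above: only the register
        # changes, the kill flag is left untouched.
        self.reg = reg

def fold_buggy(vpmov_src, kmov_src):
    """Repoint the KMOV source at the XMM register, keeping the mask's kill flag."""
    kmov_src.set_reg(vpmov_src.reg)

def fold_fixed(vpmov_src, kmov_src):
    """Repoint the operand and take the kill flag from the VPMOV's source."""
    kmov_src.set_reg(vpmov_src.reg)
    kmov_src.is_kill = vpmov_src.is_kill

# The XMM source is still live after the VPMOV; the mask is killed by the KMOV.
buggy = Operand("k0", is_kill=True)
fold_buggy(Operand("xmm0", is_kill=False), buggy)
print(buggy)   # xmm0 is wrongly marked killed

fixed = Operand("k0", is_kill=True)
fold_fixed(Operand("xmm0", is_kill=False), fixed)
print(fixed)   # xmm0 is correctly left live
```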
Found via @jlebar's X86 LLVM bug-hunt / FuzzX effort:
https://github.com/SemiAnalysisAI/FuzzX/tree/master/x86/bugs/043-compress-evex-vpmov-srcvec-clobber-kmov
cc @jlebar