[ELF] Add target-specific relocation scanning for x86 (#178846)
Implement scanSection/scanSectionImpl for i386 and x86-64 to
* enable devirtualization of getRelExpr calls
* eliminate abstraction overhead for the PLT-to-PCRel optimization and TLS
relocations
* optimize for R_X86_64_PC32 and R_X86_64_PLT32, which account for 95% of
the relocations in `lld/ELF/**/*.o` files.
* enable future optimization to remove `loc` from `getRelExpr` (only
used by X86.cpp `R_386_GOT32[X]`)
at the cost of more boilerplate.
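As a rough illustration of the devirtualization (a toy model with simplified
types and illustrative names, not lld's actual classes or the committed code):

    #include <cstdint>
    #include <vector>

    // Toy sketch: the real scanSectionImpl is templated over ELFT and the
    // relocation type; names and structure here are illustrative only.
    enum RelType : uint32_t { R_X86_64_PC32 = 2, R_X86_64_PLT32 = 4 };
    struct Reloc { RelType type; uint64_t offset; };

    struct X86_64 {
      // Non-virtual here, so the call below is statically bound and can
      // be inlined into the scan loop instead of going through a vtable.
      bool isPcRelExpr(RelType t) const {
        return t == R_X86_64_PC32 || t == R_X86_64_PLT32; // 95% fast path
      }
      void scanSectionImpl(const std::vector<Reloc> &rels) const {
        for (const Reloc &r : rels)
          if (isPcRelExpr(r.type)) {
            // handle PC-relative relocs without a virtual getRelExpr call
          }
      }
    };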
TLS relocation handling is inlined into scanSectionImpl. Also,
- Remove getTlsGdRelaxSkip
- Replace TLS-optimization-specific expressions:
- R_RELAX_TLS_GD_TO_LE, R_RELAX_TLS_LD_TO_LE, R_RELAX_TLS_IE_TO_LE →
[16 lines not shown]
[RISCV] Remove RISCVISD::WMACC*. Match during isel. NFC (#181197)
I think we may want to be able to fold ADDD nodes independently of the MUL
in some cases, for example turning NSRAI into NSRARI.
If we fold ADDD into WMACC, we would need to be able to extract it again.
Keeping the nodes separate avoids this.
Code change was assisted by AI.
[IndVarSimplify] Add safety check for getTruncateExpr in genLoopLimit (#181296)
getTruncateExpr may not always return a SCEVAddRecExpr when truncating
loop bounds. Add a check to verify the result type before casting, and
bail out of the transformation if the cast would be invalid.
This prevents potential crashes from invalid casts when dealing with
complex loop bounds.
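A minimal sketch of the added guard (names like IVInit and TruncTy are
illustrative, not necessarily those used in genLoopLimit):

    #include "llvm/Analysis/ScalarEvolution.h"
    #include "llvm/Analysis/ScalarEvolutionExpressions.h"
    using namespace llvm;

    // getTruncateExpr may fold the truncation to something other than an
    // add-rec, so dyn_cast and bail out rather than crash on cast<>.
    static const SCEVAddRecExpr *
    truncateToAddRec(ScalarEvolution &SE, const SCEV *IVInit, Type *TruncTy) {
      const SCEV *Trunc = SE.getTruncateExpr(IVInit, TruncTy);
      return dyn_cast<SCEVAddRecExpr>(Trunc); // nullptr => skip the transform
    }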
Co-authored-by: Michael Rowan
Resolves https://github.com/llvm/llvm-project/issues/153090
[MLIR][XeGPU][TransformOps] set_op_layout_attr supports setting anchor layout (#172542)
Changes `transform.xegpu.set_op_layout_attr` to support xegpu anchor
layouts. By default, if the `result` and `operand` bool arguments are unset,
this transform op sets the op's anchor layout if the op supports it
(otherwise it emits a silenceable failure).
In contrast to the earlier implementation, setting the operand layout
now requires setting the new `operand` argument.
nvmm(4): Extract out nvmm_x86_internal.h from nvmm_x86.h
Similar to nvmm_internal.h, extract the kernel-only bits from nvmm_x86.h
and put them into a separate 'nvmm_x86_internal.h'.
nvmm(4): Enable selective CR0 write intercept in the SVM backend
Similar to the VMX backend [1], enable selective CR0 write intercept in
the SVM backend to force CR0_ET/CR0_NE to 1, and CR0_CD/CR0_NW to 0.
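The bit forcing amounts to the following (a sketch with illustrative names;
the CR0 bit values are per the x86 manuals, and the surrounding VMCB
handling is omitted):

    #include <cstdint>

    constexpr uint64_t CR0_ET = 0x00000010; // extension type, forced to 1
    constexpr uint64_t CR0_NE = 0x00000020; // native x87 errors, forced to 1
    constexpr uint64_t CR0_NW = 0x20000000; // not write-through, forced to 0
    constexpr uint64_t CR0_CD = 0x40000000; // cache disable, forced to 0

    // On an intercepted selective CR0 write, sanitize the value the guest
    // asked for so caches stay enabled even if firmware disables them.
    uint64_t sanitize_cr0(uint64_t guest_val) {
        guest_val |=  (CR0_ET | CR0_NE);
        guest_val &= ~(CR0_CD | CR0_NW);
        return guest_val;
    }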
This addresses the severe performance issue observed in UEFI guests [2].
When booting a DragonFly installation ISO on my AMD 3700X, it previously
took 50-60 seconds from VM power-on before the kernel started loading,
and around 17 *minutes* to reach the login prompt. Even when the guest
OS was otherwise idle, the qemu process consumed 40-50% CPU.
Note that the selective CR0 write intercept is enabled only when the CPU
supports the DecodeAssists feature, as the intercept handling relies on
the decoded information provided in EXITINFO1. A diagnostic message is
printed in svm_ident() when DecodeAssists is unavailable.
Meanwhile, rename 'VMCB_CTRL_INTERCEPT_CR0_SPEC' to
'VMCB_CTRL_INTERCEPT_CR0_SEL' to better align with
'VMCB_EXITCODE_CR0_SEL_WRITE'.
[9 lines not shown]
[CIR][LoweringPrepare] Emit guard variables for static local initialization (#179828)
This implements the lowering of static local variables with the Itanium C++ ABI
guard variable pattern in LoweringPrepare. This is initial support, with
errorNYI covering everything that hasn't been implemented yet.
When a GlobalOp has the static_local attribute and a ctor region, this pass:
1. Creates a guard variable global (mangled name from AST)
2. Inserts the guard check pattern at each GetGlobalOp use site (see the
sketch after this list):
- Load guard byte with acquire ordering
- If zero, call __cxa_guard_acquire
- If acquire returns non-zero, inline the ctor region code
- Call __cxa_guard_release
3. Clears the static_local attribute and ctor region from the GlobalOp
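Expressed as roughly equivalent source-level C++ (a sketch of the emitted
shape; the real guard global is the mangled _ZGV* symbol and the exact
guard type is ABI-specific):

    #include <cstdint>

    extern "C" int  __cxa_guard_acquire(uint64_t *);
    extern "C" void __cxa_guard_release(uint64_t *);

    static uint64_t guard; // stands in for the mangled _ZGV* guard global

    void use_site() {
      // Fast path: test the first guard byte with acquire ordering.
      if (__atomic_load_n(reinterpret_cast<uint8_t *>(&guard),
                          __ATOMIC_ACQUIRE) == 0) {
        if (__cxa_guard_acquire(&guard) != 0) {
          // ... inlined ctor region initializes the static local ...
          __cxa_guard_release(&guard);
        }
      }
    }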
Once the new design doc lands, I'll add more information there.
nvmm(4): Tweak os_atomic_load_uint() to use relaxed semantics
The original NetBSD code uses atomic_load_relaxed(), so this macro
should be "atomic_load_int()", i.e., without the "acquire" semantics.
Also, relaxed semantics suffice for the current use cases.
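In std::atomic terms the distinction is just the memory order (illustrative,
not the kernel macros themselves):

    #include <atomic>

    std::atomic<unsigned> word{0};

    // atomic_load_int(): a plain relaxed load, no ordering guarantee.
    unsigned load_relaxed() { return word.load(std::memory_order_relaxed); }

    // An acquire load additionally orders later accesses after it; the
    // old macro implied this stronger (and here unneeded) semantics.
    unsigned load_acquire() { return word.load(std::memory_order_acquire); }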
[clang][bytecode] Add a path to MemberPointers (#179050)
Add a path and an `IsDerivedMember` member to our `MemberPointer` class.
Fix base-to-derived/derived-to-base casts. Add tests and unit tests,
since regular tests allow a lot and we want to check the path size
exactly.
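Roughly this shape (hypothetical field names and types; clang's actual
interpreter class differs in detail):

    #include <vector>

    // Hypothetical sketch, not clang's real MemberPointer: the path
    // records base/derived steps so casts can be checked exactly.
    struct MemberPointer {
      const void *Dcl = nullptr;     // the member being pointed to
      std::vector<unsigned> Path;    // derived-to-base / base-to-derived steps
      bool IsDerivedMember = false;  // set by a base-to-derived cast
    };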
graphics/py-pycairo: rename and update to 1.29.0
Changelog: https://pycairo.readthedocs.io/en/latest/changelog.html#v1-29-0
Notably, it adds experimental support for free-threaded Python (3.13t, soon 3.14t).
Port and PKGNAME renamed to match Python package metadata, and to
properly build with USE_PYTHON=pep517. Test suite also now exposed.
Remove PORTSCOUT since pycairo does not follow the even-odd version
split.