clang/AMDGPU: Do not look for rocm device libs if environment is llvm (#180922)
Introduce usage of the llvm environment type. This will be useful as
a switch to eventually stop depending on externally provided libraries,
and only take bitcode from the resource directory.
I wasn't sure how to handle the confusing mess of -no-* flags. Try
to handle them all. I'm not sure --no-offloadlib makes sense for OpenCL
since it's not really offload, but interpret it anyway.
[libc] Add RPC helpers for dispatching functions to the host (#179085)
Summary:
The RPC interface is useful for forwarding functions. This PR adds
helper functions for doing a completely bare forwarding of a function
from the client to the server. This is intended to facilitate
heterogeneous libraries that implement host functions on the GPU (like
MPI or Fortran).
[HLSL] Implement Sample* methods for Texture2D (#179322)
This commit implements the following methods:
- SampleBias
- SampleCmp
- SampleCmpLevelZero
- SampleGrad
- SampleLevel
They are added to the Texture2D resource type. All overloads are
implemented except those with the `status` argument.
Part of https://github.com/llvm/llvm-project/issues/175630
Assisted-by: Gemini
---------
Co-authored-by: Helena Kotas <hekotas at microsoft.com>
[ExpandIRInsts] Support saturating fptoi (#179710)
Add support for expanding fptosi.sat and fptoui.sat via IR expansions.
Similar to fptosi/fptoui we would get legalization errors otherwise.
The previous expansion for fptosi/fptoui was already saturating -- but
those instructions do not actually require saturation, and the
implementation of the saturation was incorrect in lots of ways. What
this PR does is:
* For fptosi, remove the unnecessary saturation handling.
* For fptoui, remove the unnecessary saturation handling and sign
multiplication.
* For fptosi.sat, use the previous saturation handling with fixes: We
need to map NaNs to 0 and the saturation condition on the exponent was
incorrect. (I'm performing the NaN check via fcmp -- there's no
requirement to do everything bitwise here.)
* For fptoui.sat, use a variation of the signed saturation handling:
Negative values need to go to zero and we saturate to unsigned max.
Proofs: https://alive2.llvm.org/ce/z/Xv9FNd
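The saturating semantics described above can be modelled in a few lines of Python. This is only a reference sketch of what fptosi.sat and fptoui.sat are expected to return (NaN maps to 0, out-of-range values clamp); it is not the IR expansion from the patch, and the function names `fptosi_sat` and `fptoui_sat` are hypothetical.

```python
import math


def fptosi_sat(x: float, bits: int) -> int:
    """Reference model of a saturating float-to-signed-int conversion."""
    if math.isnan(x):
        return 0  # NaN maps to 0
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    if x <= lo:
        return lo  # also covers -inf
    if x >= hi:
        return hi  # also covers +inf
    return math.trunc(x)  # round toward zero


def fptoui_sat(x: float, bits: int) -> int:
    """Reference model of a saturating float-to-unsigned-int conversion."""
    if math.isnan(x):
        return 0
    hi = (1 << bits) - 1
    if x <= 0:
        return 0  # negative values go to zero
    if x >= hi:
        return hi  # saturate to unsigned max
    return math.trunc(x)
```

A model like this is handy as an oracle when spot-checking an expansion against a handful of edge cases (NaN, infinities, values just past the range).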
[flang][NFC] Converted five tests from old lowering to new lowering (part 17) (#180869)
Tests converted from test/Lower: goto-do-body.f90, mixed_loops.f90,
while_loop.f90
From test/Lower/forall: degenerate.f90, forall-2.f90
Add function for atomic_replace
This commit adds a helper function to use openat2 and
renameat2 to atomically replace files whilst providing
symlink race resistance.
NOTE: temp_dir and target have to be on the same filesystem
and neither path can contain symlinks.
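A rough sketch of the same idea using only the Python standard library follows. Python does not expose openat2 or renameat2 directly, so this swaps in directory file descriptors with O_NOFOLLOW and os.rename(); that only refuses a symlink in the final path component, which is weaker than what the commit describes. The name `atomic_replace`, its parameters, and the temp-file naming are assumptions for illustration, not the helper added by the commit.

```python
import os


def atomic_replace(temp_dir: str, target_dir: str, name: str,
                   data: bytes, mode: int = 0o644) -> None:
    """Write data to a temp file, then rename it over target_dir/name.

    Stand-in technique: dir_fd-relative calls plus O_NOFOLLOW, not
    openat2/renameat2. Both directories must be on the same filesystem.
    """
    dir_flags = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW
    tmp_fd = os.open(temp_dir, dir_flags)
    try:
        dst_fd = os.open(target_dir, dir_flags)
        try:
            tmp_name = f".{name}.{os.getpid()}.tmp"  # hypothetical naming
            fd = os.open(
                tmp_name,
                os.O_WRONLY | os.O_CREAT | os.O_EXCL | os.O_NOFOLLOW,
                mode,
                dir_fd=tmp_fd,
            )
            try:
                with os.fdopen(fd, "wb") as f:
                    f.write(data)
                    f.flush()
                    os.fsync(f.fileno())  # data on disk before the rename
                # rename() within one filesystem replaces the target atomically
                os.rename(tmp_name, name, src_dir_fd=tmp_fd, dst_dir_fd=dst_fd)
            except BaseException:
                try:
                    os.unlink(tmp_name, dir_fd=tmp_fd)
                except FileNotFoundError:
                    pass
                raise
        finally:
            os.close(dst_fd)
    finally:
        os.close(tmp_fd)
```

Working relative to already-opened directory descriptors means a later swap of a parent path component cannot redirect the rename, which is the race the note above is concerned with.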
qlnxe: Overhaul setting the multicast MAC filters
When operating the multicast MAC filters, the current usage of
ECORE_FILTER_ADD and ECORE_FILTER_REMOVE is rather misleading.
ECORE_FILTER_ADD reads "adding new filter", but it actually removes
any existing filters and then adds a new one. ECORE_FILTER_REMOVE
reads "removing a filter", but it actually removes all filters.
Let's use ECORE_FILTER_REPLACE and ECORE_FILTER_FLUSH instead to
avoid confusion.
In the current implementation, only one MAC address is passed to
ecore_sp_eth_filter_mcast() and any previously installed filters are
removed, hence it breaks the multicast function. That can be observed
by either assigning new IPv6 addresses to the interface or making the
interface a member of a lagg(4) interface with the LACP aggregation
protocol. Fix that by calculating the multicast filter bins directly
from the multicast MAC addresses and replacing the filters every time
the bins change.
[21 lines not shown]
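As an illustration of the "compute bins, replace only on change" approach described in this message, here is a small Python model. Everything in it is an assumption made for the sketch: the bin count of 256, the use of zlib.crc32 as the hash, and the names `mcast_bin`, `compute_bins`, and `McastFilter`. The driver's real hash and data structures are not shown in this log.

```python
import zlib

NUM_BINS = 256  # assumed bin count, for illustration only


def mcast_bin(mac: bytes) -> int:
    """Map a MAC address to a bin index (stand-in hash, not the hardware's)."""
    return zlib.crc32(mac) & (NUM_BINS - 1)


def compute_bins(macs) -> int:
    """Build a bitmap with one bit set per bin used by any address."""
    bins = 0
    for mac in macs:
        bins |= 1 << mcast_bin(mac)
    return bins


class McastFilter:
    """Tracks the installed bitmap and reprograms only when it changes."""

    def __init__(self) -> None:
        self.bins = 0
        self.programmed = 0  # number of simulated hardware updates

    def update(self, macs) -> bool:
        new_bins = compute_bins(macs)
        if new_bins == self.bins:
            return False  # nothing changed, skip the replace
        self.bins = new_bins  # models a "replace all filters" operation
        self.programmed += 1
        return True
```

Because the bitmap is derived from the complete address list each time, joining a second group can no longer evict the first one, which is the failure mode the message describes.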
qlnxe: Avoid reinitializing the interface when it is already initialized
qlnx_init_locked() unconditionally uninitializes the interface, so it
actually reinitializes the interface. The init routine qlnx_init() is
called by the net stack to initialize the interface when it is assigned
its first inet or inet6 address. The ioctl SIOCSIFADDR for the first
inet6 address is handled by ether_ioctl(), so the interface is
reinitialized regardless of whether it was already initialized.
Add a driver status check to avoid reinitializing. The further plan is
to remove the SIOCSIFADDR ioctl from the driver and let ether_ioctl()
handle it.
Reviewed by: kbowling
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D54887
(cherry picked from commit c10e6bc0f0079e90cb484323ad71d437f1882422)
(cherry picked from commit 8731ff4871d5397bae65bf184c44629a52c0e97b)
qlnxe: Refactor setting the promiscuous and allmulti mode
There are two entry points to set the promiscuous and allmulti mode.
One is the ioctl, and the other is the init routine. Given they share
almost identical logic, refactor a little to make the code clearer.
While here, for the ioctl, translate the error to EINVAL to avoid
confusing the net stack.
Reviewed by: kbowling
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D54890
(cherry picked from commit 45b1718fadae7d56051ba04ef9d7a175a602a226)
(cherry picked from commit b8d2c1c367465506b66a1696483caec1d04b2ea0)
qlnxe: Let ether_ioctl() handle SIOCSIFADDR ioctl
Since the change [1], the init routine qlnx_init() works as intended.
Let ether_ioctl() handle SIOCSIFADDR to simplify the code.
Combined with the change [1], this shall be a better fix for PR 287445.
[1] c10e6bc0f007 qlnxe: Avoid reinitializing the interface when it is already initialized
PR: 287445
Reviewed by: kbowling
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D54888
(cherry picked from commit 4012b63889e40bb877bc0e4c8da1792bce472c08)
(cherry picked from commit 0f383f74b7398161c12a290e50b060baf45d2800)
qlnxe: Remove a pointless copy back from the link-layer address
On ifnet attaching, ether_ifattach() makes the link-layer address by
shadow copying the ha->primary_mac. Well, the link-layer address will
not be altered during attaching, thus it is pointless to copy it back.
No functional change intended.
Reviewed by: kbowling
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D54883
(cherry picked from commit 4ac3081b282800158df7abe93f307d76e1b5b808)
(cherry picked from commit 23ffd1650cc431e762387d384ede99ae085bc130)
qlnxe: Fix setting the unicast MAC filter of RX path
When an Ethernet interface is added to lagg(4) as a child interface, its
type, aka if_type, is changed from IFT_ETHER to IFT_IEEE8023ADLAG.
Changing the link-layer address of the lagg(4) interface is propagated
to all child interfaces, hence the drivers of child interfaces shall not
presume that the type of the interface stays unchanged.
Meanwhile, on initializing, an ifnet has been fully attached and is
guaranteed to have a non-null link-layer address, so stop NULL checking
it.
Reviewed by: kbowling
Fixes: 792226e53023 qlnxe: Allow MAC address override
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D54885
(cherry picked from commit f250852c9a0c1021c3be4b498e27cfc7b42a81db)
(cherry picked from commit 6d138e958ffb318595eec29b910cada414e2f86d)
[AArch64] Lower factor-of-2 interleaved stores to STNP (#177938)
This patch prioritizes lowering to `stnp` over `st2` for store
instructions marked !nontemporal.
From a performance perspective, we should conservatively prioritize STNP
lowering for non-temporal stores. NT stores currently require explicit
usage of the `__builtin_nontemporal_store()` intrinsic, so I think it's
reasonable to assume the developer explicitly intends to optimize
D-cache usage of some hot non-temporal code. They can roll back if it
doesn't help.
The cost is a few extra instructions in code size (so we predicate on
not optimizing for code size), a few extra fast instructions to execute,
a few extra short dependency chains (which OOO execution should commonly
handle), I-cache alignment effects, and a few extra registers. In the
future we may be able to approximate a cost model to select by.
[3 lines not shown]
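To see why a factor-of-2 interleaved store can be expressed as a pair store at all, the following Python model treats vectors as lists. It assumes the lowering interleaves the two inputs with zip1/zip2-style shuffles before storing the pair; that sequence is an assumption about the patch, not something this log confirms, and the helper names are hypothetical.

```python
def st2(a, b):
    """Memory image of an interleaved store: a0 b0 a1 b1 ..."""
    out = []
    for x, y in zip(a, b):
        out += [x, y]
    return out


def zip1(a, b):
    """Interleave the low halves of a and b."""
    h = len(a) // 2
    return st2(a[:h], b[:h])


def zip2(a, b):
    """Interleave the high halves of a and b."""
    h = len(a) // 2
    return st2(a[h:], b[h:])


def stnp(lo, hi):
    """Memory image of a pair store: first register, then second."""
    return list(lo) + list(hi)


a = ["a0", "a1", "a2", "a3"]
b = ["b0", "b1", "b2", "b3"]
# Both sequences leave the same bytes in memory.
assert st2(a, b) == stnp(zip1(a, b), zip2(a, b))
```

The two extra shuffles are one plausible source of the added instructions and registers that the cost discussion above mentions.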
clang/AMDGPU: Remove dead code in RocmInstallationDetector (#180920)
The defaulted constructor argument isn't used anywhere, so
this path is unreachable.
[lldb][windows] switch to using std::string instead of std::wstring in Python setup (#180786)
This patch changes the return type of methods returning `std::wstring`
to `std::string` in `PythonPathSetup.cpp`.
This follows lldb's style of converting to `std::wstring` at the last
moment.
[Hexagon] Fix signed constant creation in EmitVAArgFromMemory (#180385)
Use ConstantInt::getSigned instead of ConstantInt::get when creating a
negative alignment mask in EmitVAArgFromMemory. This is the same fix as
commit 8546294db95d (PR #176115) which addressed the issue in
EmitVAArgForHexagonLinux.
Added a test case that exercises the EmitVAArgFromMemory alignment path
using a struct that is both >8 bytes (to trigger EmitVAArgFromMemory)
and has 8-byte alignment (to trigger the alignment masking code).
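The arithmetic behind a negative alignment mask can be sketched in Python. This models the usual round-up idiom `(addr + align - 1) & -align` on a fixed-width integer and shows why a value like -8 is representable as a signed 32-bit constant but not as an unsigned one once widened to 64 bits. The helper names are hypothetical, and the exact expression used in EmitVAArgFromMemory is an assumption here.

```python
def align_up(addr: int, align: int, bits: int = 32) -> int:
    """Round addr up to a multiple of align (a power of two), modulo 2**bits."""
    mask = (1 << bits) - 1
    neg_align = -align & mask  # two's-complement encoding of -align
    return (addr + align - 1) & neg_align


def fits_unsigned(value: int, bits: int) -> bool:
    """True if value is representable as an unsigned bits-wide integer."""
    return 0 <= value < (1 << bits)


def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed bits-wide integer."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


# -8 passed through an unsigned 64-bit parameter becomes 2**64 - 8,
# which no longer fits in 32 bits when interpreted as unsigned.
as_u64 = -8 & ((1 << 64) - 1)
assert not fits_unsigned(as_u64, 32)
# Interpreted as signed, -8 fits a 32-bit type without truncation.
assert fits_signed(-8, 32)
```

This is the distinction a signed constant constructor makes explicit: the mask is the same bit pattern either way, but only the signed interpretation fits the narrower type.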
Add atomic_write context manager as drop-in replacement for open(filename, 'w')
Adds a reusable utility to middlewared.utils.io that writes to a temporary
file and atomically renames it to the target path on successful exit.
If an exception occurs, the temp file is cleaned up and the original
file remains unchanged.
Refactors truenas-grub.py to use the new atomic_write utility.
https://claude.ai/code/session_01TxV6JvrChiY9jG5GmLZndU
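A minimal sketch of such a context manager is shown below. It is not the code from middlewared.utils.io: the signature, the `perms` parameter, and the temp-file prefix are assumptions, and it relies on tempfile.mkstemp() plus os.replace() from the standard library.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_write(path, mode="w", perms=0o644):
    """Yield a file object; rename it over path only on a clean exit."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on
    # one filesystem and is therefore atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, mode) as f:
            yield f
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp_path, perms)
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, drop the temp file and leave the original alone.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Usage mirrors `open(filename, 'w')`: `with atomic_write("/tmp/example.cfg") as f: f.write("key=value\n")`. The path in that usage line is only an example.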