Merge tag 'wq-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
- Improve workqueue stall diagnostics: dump all busy workers (not just
running ones), show wall-clock duration of in-flight work items, and
add a sample module for reproducing stalls
- Fix POOL_BH vs WQ_BH flag namespace mismatch in pr_cont_worker_id()
- Rename pool->watchdog_ts to pool->last_progress_ts and related
functions for clarity
* tag 'wq-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Rename show_cpu_pool{s,}_hog{s,}() to reflect broadened scope
workqueue: Add stall detector sample module
workqueue: Show all busy workers in stall diagnostics
workqueue: Show in-flight work item duration in stall diagnostics
workqueue: Rename pool->watchdog_ts to pool->last_progress_ts
workqueue: Use POOL_BH instead of WQ_BH when checking pool flags
Merge tag 'cgroup-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- Hide PF_EXITING tasks from cgroup.procs to avoid exposing dead tasks
that haven't been removed yet, fixing a systemd timeout issue on
PREEMPT_RT
- Call rebuild_sched_domains() directly in CPU hotplug instead of
deferring to a workqueue, fixing a race where online/offline CPUs
could briefly appear in stale sched domains
* tag 'cgroup-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Don't expose dead tasks in cgroup
cgroup/cpuset: Call rebuild_sched_domains() directly in hotplug
[mlir] Replace MLIR_ENABLE_ROCM_CONVERSIONS with LLVM_HAS_AMDGPU_TARGET (#182652)
`LLVM_HAS_NVPTX_TARGET` is already defined in `llvm/Config/Targets.h`
and used to gate NVPTX-related code in MLIR. The same macro exists for
AMDGPU as `LLVM_HAS_AMDGPU_TARGET`, but MLIR defined its own
`MLIR_ENABLE_ROCM_CONVERSIONS` variable for this purpose. This PR
removes `MLIR_ENABLE_ROCM_CONVERSIONS` and replaces it with
`LLVM_HAS_AMDGPU_TARGET`, bringing parity with the NVPTX target.
---------
Co-authored-by: William Moses <gh at wsmoses.com>
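A minimal sketch of the gating pattern described above, assuming `LLVM_HAS_AMDGPU_TARGET` is defined to 0 or 1 by `llvm/Config/Targets.h`. The fallback definition and the helper name `demo_rocm_conversions_available` are hypothetical and exist only so the snippet compiles outside an LLVM build tree.

```c
/* Assumption: in a real build this macro comes from llvm/Config/Targets.h.
 * The fallback below is only for compiling this sketch standalone. */
#ifndef LLVM_HAS_AMDGPU_TARGET
#define LLVM_HAS_AMDGPU_TARGET 0
#endif

/* Hypothetical helper: reports whether AMDGPU-dependent code was compiled in. */
static int demo_rocm_conversions_available(void) {
#if LLVM_HAS_AMDGPU_TARGET
  return 1; /* AMDGPU target is built; ROCm-related code can be enabled */
#else
  return 0; /* AMDGPU target is absent; ROCm-related code is compiled out */
#endif
}
```

Gating on the LLVM-provided macro means the check follows the configured target list, with no separate MLIR-side variable to keep in sync.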
[LoopUnroll] Remove `computeUnrollCount()`'s return value (#184529)
`computeUnrollCount()`'s return value is used to communicate whether
unrolling was explicitly requested. However, each of
`computeUnrollCount()`'s two callers can compute this directly:
- `LoopUnrollAndJamPass` already checks for loop unrolling metadata
[before calling
`computeUnrollCount()`](https://github.com/llvm/llvm-project/blob/43dbcdea98f5bb04ae967bdd81ece2d2144f4661/llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp#L308).
The return value only added handling for `-unroll-count`, a testing flag
with no UnrollAndJam test coverage.
- `tryToUnrollLoop()` can use `PragmaInfo(L).ExplicitUnroll` directly at
the `setLoopAlreadyUnrolled()` call site.
- In all but one case where `computeUnrollCount()` explicitly `return`s
`false` instead of `ExplicitUnroll`, `UP.Count = 0` is set. This causes
`tryToUnrollLoop()` to early-exit before reaching
`setLoopAlreadyUnrolled`.
- The remaining case that `return`s false, but does not set `UP.Count =
[7 lines not shown]
[HLSL][DirectX] Add `transpose` HLSL intrinsic and DXIL lowering of `llvm.matrix.transpose` (#186263)
Fixes #184922
- [x] Implement `transpose` clang builtin in `Builtins.td`
- [x] Link `transpose` clang builtin with `hlsl_alias_intrinsics.h`
- [x] Add sema checks for `transpose` to `CheckHLSLBuiltinFunctionCall`
in `SemaHLSL.cpp`
- [x] Add codegen for `transpose` to `EmitHLSLBuiltinExpr` in
`CGHLSLBuiltins.cpp`
- `transpose` lowers to the `llvm.matrix.transpose` intrinsic
- [x] Add codegen tests to
`clang/test/CodeGenHLSL/builtins/transpose.hlsl`
- [x] Add sema tests to
`clang/test/SemaHLSL/BuiltIns/transpose-errors.hlsl`
- [x] Implement lowering of the `llvm.matrix.transpose` intrinsic in the
DXIL backend in `DXILIntrinsicExpansion.cpp`
- The intrinsic lowers to a shufflevector like in DXC
https://hlsl.godbolt.org/z/Gj959q6sq
[3 lines not shown]
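As an illustration of how a transpose can be expressed as a single shuffle, the sketch below computes a shuffle mask for a matrix flattened into a vector. It assumes the column-major layout used by the `llvm.matrix.*` intrinsics; `demo_transpose_mask` is a hypothetical helper, not the code in `DXILIntrinsicExpansion.cpp`.

```c
#include <assert.h>
#include <stddef.h>

/* Fill mask[] so that out[i] = in[mask[i]] transposes a rows x cols matrix
 * stored column-major as a flat vector. mask must hold rows * cols entries. */
static void demo_transpose_mask(size_t rows, size_t cols, int *mask) {
  assert(rows > 0 && cols > 0);
  for (size_t r = 0; r < rows; ++r) {
    for (size_t c = 0; c < cols; ++c) {
      /* Input element (r, c) sits at index c * rows + r. In the
       * cols x rows result it becomes element (c, r), stored at
       * index r * cols + c. */
      mask[r * cols + c] = (int)(c * rows + r);
    }
  }
}
```

For a 2x3 input this yields the mask 0, 2, 4, 1, 3, 5: each row of the input becomes a column of the result.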
[libc] Support ls in printf (#178841)
Add support for %ls in printf by calling the internal string converter,
and add a relevant end-to-end sprintf test. Additionally, modify the
printf parser to recognize the length modifier. This also disables wide
string support on Windows and other unsupported platforms.
Co-authored-by: shubhe25p <shubhp at mbm3a24.local>
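For reference, a small usage sketch of the `%ls` conversion through the standard `snprintf` interface. It runs against whatever C library the host provides and does not exercise LLVM libc internals; `demo_format_wide` is a hypothetical wrapper name.

```c
#include <stdio.h>
#include <wchar.h>

/* Format a wide string into a narrow buffer using the %ls conversion.
 * Returns the value reported by snprintf (the number of bytes that would
 * be written, or a negative value on a conversion error). */
static int demo_format_wide(char *buf, size_t size, const wchar_t *ws) {
  return snprintf(buf, size, "%ls", ws);
}
```

With plain ASCII input such as `L"abc"`, the buffer ends up holding the narrow string `abc`.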
Merge tag 'sched_ext-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Fix data races flagged by KCSAN: add missing READ_ONCE()/WRITE_ONCE()
annotations for lock-free accesses to module parameters and dsq->seq
- Fix silent truncation of upper 32 enqueue flags (SCX_ENQ_PREEMPT and
above) when passed through the int sched_class interface
- Documentation updates: scheduling class precedence, task ownership
state machine, example scheduler descriptions, config list cleanup
- Selftest fix for format specifier and buffer length in
file_write_long()
* tag 'sched_ext-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Use WRITE_ONCE() for the write side of scx_enable helper pointer
sched_ext: Fix enqueue_task_scx() truncation of upper enqueue flags
[7 lines not shown]
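The truncation described above can be reproduced in isolation. This is a minimal sketch, assuming a 64-bit flag word in which the preempt flag occupies bit 32; the `DEMO_*` constants and helper names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a low flag and a flag in the upper 32 bits. */
#define DEMO_ENQ_WAKEUP  (1ULL << 0)
#define DEMO_ENQ_PREEMPT (1ULL << 32)

/* Models flags travelling through an int-sized parameter: only the low
 * 32 bits survive the narrowing. */
static uint64_t demo_pass_through_int(uint64_t enq_flags) {
  int narrowed = (int)(uint32_t)enq_flags;
  return (uint64_t)(uint32_t)narrowed;
}

/* Models flags travelling through a 64-bit parameter: nothing is lost. */
static uint64_t demo_pass_through_u64(uint64_t enq_flags) {
  return enq_flags;
}
```

Passing `DEMO_ENQ_PREEMPT | DEMO_ENQ_WAKEUP` through the int path returns only `DEMO_ENQ_WAKEUP`, with no diagnostic at run time, while the 64-bit path keeps both flags.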
[AArch64][llvm] Rewrite the TLBI multiclass to be much clearer (NFC)
The `tlbi` multiclass is really doing four jobs at once: base TLBI,
synthesized nXS, optional TLBIP, and synthesized TLBIP nXS. Also,
`needsreg` and `optreg` are really just a 3-state operand policy in
disguise. Likewise, the PLBI multiclass has this same issue.
Change `needsreg` and `optreg` into a combined fake enum, so it's
clearer whether the instruction takes no register operand, a required
register operand or an optional register operand.
This improves on my original change 66e8270e8.
[coro] [async] There needs to be a one-to-one correspondence between the async resume function value and the suspend intrinsic (#186436)
We need to mark both the async.resume intrinsic function and the
suspend.async function as not duplicatable. The async.resume function
models the continuation after a suspend, so it makes no sense to lack a
one-to-one correspondence between the two: if the suspend intrinsic is
cloned, the matching async.resume intrinsic call must be cloned too.
rdar://172130181
https://github.com/swiftlang/swift/issues/87719
[CodeGen] Drop uses of BranchInst (#186391)
Largely a straightforward replacement with occasional simplifications.
For AMDGPU, I assumed that unconditional branches are always uniform and
therefore "simplified"/changed AMDGPUAnnotateUniformValues to only
annotate conditional branches.
Target-specific FastISel only selects conditional branches,
unconditional branches are already handled by the non-target-specific
code.
Remove unicode character from AttrDocs.td
PR #185225 introduced a single Unicode character, which is the only
Unicode character in this file. Change this to an ASCII/Latin1 letter.
update to pokerth-2.0.6, from Josh Grosse (maintainer)
small tweaks, move WANTLIB to the usual location, drop RelWithDebInfo
as DEBUG_PACKAGES sets this automatically on debug pkgs archs
[flang][openacc][cuda] Fix array section and implicit device attribute (#186513)
When CUDA Fortran is enabled, the copySymbolBinding block in
genACCHostDataOp did not handle ArrayElement designators (e.g.,
use_device(a(:,:,i))), causing a crash in getDataOperandBaseAddr
due to the copied symbol missing its IR binding.
[lldb] Fix liblldb linkage in libllvm build after 5eaf19a15129 (#186515)
Referencing libSupportHTTP under LINK_LIBS of add_lldb_library() pulls
in the archive even in a build configuration with
LLVM_LINK_LLVM_DYLIB=On, where libSupportHTTP is part of libLLVM. This
patch moves it to LINK_COMPONENTS to fix the issue.
EC2: Don't use unicode in boot loader
The boot loader menu is disabled by default in EC2, but if it is ever
turned on, the default (unicode) output breaks EC2's web interface to
the serial console.
Set loader_menu_frame="ascii" instead.
MFC after: 3 days
Sponsored by: Amazon
Update version of external mbedtls to v4
Build option does not currently work, because the security/mbedtls4
package does not enable MBEDTLS_THREADING_PTHREAD and
MBEDTLS_THREADING_C.