[CIR] Include union tail pad in getTypeSizeInBits (#198361)
Padded CIR unions (e.g. libstdc++ `std::string` SSO layout) carry a
trailing byte-array member so the record matches the AST layout size.
`RecordType::getTypeSizeInBits` returned only the size of the
largest-aligned member and ignored that tail, so the CIR view of the
union was 8 bytes
smaller than what `LowerToLLVM` emits. Parent structs then picked up
a spurious trailing pad via `insertPadding`, arrays of those structs
used the wrong stride, and heap allocations could be overrun (Eigen's
`array_of_string` hits this directly).
The fix adds the padding member's size when the union is marked
`padded`, so struct size, GEP strides, and `new T[n]` allocation sizes
match OGCG. Regression test models the SSO-shaped record and checks
the 96-byte `new` for three elements.
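The size arithmetic behind this fix can be sketched in a few lines. This is a rough model, not the CIR implementation: the `Member` class, `union_size_bits`, and the concrete SSO member sizes (a 16-byte local buffer, an 8-byte capacity field, an 8-byte tail pad, and a 32-byte string) are assumptions chosen to match the 8-byte shortfall and the 96-byte allocation mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    size_bytes: int
    align_bytes: int


def union_size_bits(members, padded, include_tail_pad=True):
    """Model of a union's size in bits.

    Assumption: when `padded` is set, the last entry of `members` is the
    trailing byte-array pad and the storage member is the most-aligned of
    the remaining members.
    """
    payload = members[:-1] if padded else members
    storage = max(payload, key=lambda m: m.align_bytes)
    size = storage.size_bytes
    if padded and include_tail_pad:
        size += members[-1].size_bytes  # the part the old code ignored
    return size * 8


# Hypothetical SSO-shaped union: char buf[16] vs. size_t capacity, plus pad.
sso_union = [
    Member("local_buf", 16, 1),
    Member("capacity", 8, 8),
    Member("tail_pad", 8, 1),
]

old_bits = union_size_bits(sso_union, padded=True, include_tail_pad=False)
new_bits = union_size_bits(sso_union, padded=True)

# Assumed string layout: pointer (8) + length (8) + union.
string_size_bytes = 8 + 8 + new_bits // 8
alloc_bytes = 3 * string_size_bytes
```

Under these assumptions the old computation reports 64 bits for the union and the fixed one reports 128 bits, which gives a 32-byte record and a 96-byte allocation for three elements.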
[OpenCL] Add Intel subgroup buffer prefetch and local block I/O builtins (#199258)
Add cl_intel_subgroup_buffer_prefetch and cl_intel_subgroup_local_block_io
declarations to OpenCLBuiltins.td and cover them with header-free SPIR
tests.
This keeps the generated OpenCL builtins in sync with opencl-c.h for the
Intel subgroup buffer prefetch and local block I/O extensions.
Per the cl_intel_subgroup_local_block_io specification, the _ui local
aliases (intel_sub_group_block_read_ui*, intel_sub_group_block_write_ui*
with __local pointer) are declared under FuncExtIntelSubgroupLocalBlockIO
alone, without a char/short/long prerequisite. A dedicated test
(intel-subgroup-local-block-io-ui-without-char-short-long.cl) verifies
that they resolve when only cl_intel_subgroup_local_block_io is active.
[6 lines not shown]
[OpenCL] Fix image2d_t qualifier for intel_sub_group_block_write_ui (#199232)
The intel_sub_group_block_write_ui[2,4,8] overloads for image2d_t were
declared with a read_only qualifier, both in opencl-c.h and in
OpenCLBuiltins.td. A write operation cannot target a read_only image,
and the base intel_sub_group_block_write together with the analogous
_us, _uc and _ul aliases all correctly use write_only image2d_t.
Per the cl_intel_subgroups_short [1], cl_intel_subgroups_char [2] and
cl_intel_subgroups_long [3] specifications, the _ui aliases are added
"for naming consistency [...] There is no change to the description or
behavior of these functions" relative to the cl_intel_subgroups base,
which uses write_only image2d_t for writes.
The typo was introduced in b833bf6ae14f and preserved across all
[18 lines not shown]
ZTS: update sanity.run file
Several of the tests included in the sanity.run file are no
longer quick. In fact, the pyzfs tests can take over 5 minutes
to run, which exceeds the default timeout and results in the
tests being killed.
Perform a little housekeeping and drop any test which takes more
than 10 seconds to run. This brings things back a little closer
to the original intent of having a battery of useful test cases
which can be run in ~10 minutes.
ZFS-CI-Type: quick
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #18576
[offload] Use device memory for the multithreaded kernel launch test (#199132)
This commit modifies the multithreaded kernel launch test to use device
memory instead of managed memory. The test is reported to fail
intermittently on systems where concurrent managed memory access is
not supported. This is the case for NVIDIA devices that do not support
CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS.
The concept of concurrent and coherent managed memory access should
be exposed to liboffload users somehow, e.g., by adding it as a device
property, so it is clear what execution patterns are allowed with
managed memory. However, this test only exercises concurrent kernel
launches. This commit fixes it until we decide how to proceed with the
guarantees for that type of allocation.
[SCEV] Fold zext(C+A)<nsw> -> (sext(C) + zext(A))<nsw> if possible. (#142599)
Simplify zext(C+A)<nsw> -> (sext(C) + zext(A))<nsw> if
* zext (C + A)<nsw> >=s 0 and
* A >=s V.
For now this is limited to cases where the first operand is a constant,
so the SExt can be folded to a new constant. This can be relaxed in the
future.
The initial version checks for non-negativity manually to limit
compile-time, supporting only A = smax(C2, ..) where C2 >= abs(C).
Alive2 proof of the general pattern and the test changes in zext-nuw.ll
(times out in the online instance but verifies locally)
https://alive2.llvm.org/ce/z/_BtyGy
PR: github.com/llvm/llvm-project/pull/142599
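The restricted form of the fold can be sanity-checked by brute force on small bit widths. The sketch below is not the SCEV code and does not replace the Alive2 proof. It assumes an i8 source and an i16 destination, models a violated `<nsw>` as "no claim" by returning `None`, and uses the initial-version restriction A >= abs(C) as the precondition. The function names are hypothetical.

```python
def fold_holds(c, a, src_bits=8, dst_bits=16):
    """Compare zext(c + a) with sext(c) + zext(a) for signed inputs.

    Returns None when c + a overflows the signed source range, since the
    <nsw> flag makes that case poison and the fold promises nothing.
    """
    lo, hi = -(1 << (src_bits - 1)), (1 << (src_bits - 1)) - 1
    total = c + a
    if not lo <= total <= hi:
        return None
    src_mask = (1 << src_bits) - 1
    dst_mask = (1 << dst_bits) - 1
    lhs = total & src_mask                           # zext(c + a)
    rhs = ((c & dst_mask) + (a & src_mask)) & dst_mask  # sext(c) + zext(a)
    return lhs == rhs


def check_restricted_case(src_bits=8, dst_bits=16):
    """Exhaustively test every (c, a) pair with a >= abs(c)."""
    lo, hi = -(1 << (src_bits - 1)), (1 << (src_bits - 1)) - 1
    # Skip c == lo because abs(lo) is not representable in src_bits.
    for c in range(lo + 1, hi + 1):
        for a in range(abs(c), hi + 1):
            if fold_holds(c, a, src_bits, dst_bits) is False:
                return False
    return True


restricted_ok = check_restricted_case()
counterexample = fold_holds(-1, 0)  # a < abs(c), so the precondition fails
```

With C = -1 and A = 0 the two sides differ (255 versus 65535 in this model), which illustrates why the lower bound on A is needed.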
[clang-doc][nfc] Silence tidy warning about anonymous namespace (#198071)
clang-tidy complains that we should prefer static over the anonymous
namespace, despite the API being static in addition to being in the
anonymous namespace. We can silence the diagnostic by simply removing
the namespace declaration.
[MLIR] Fix mlir-doc build, add missing "-dialect nvgpu" (#199279)
The build was broken with:
> when more than 1 dialect is present, one must be selected via '-dialect'
unit/test_zap: a trivial ZAP unit test suite
This commit adds the bones of a unit test suite for the ZAP subsystem.
The actual tests themselves don't do much, just ZAP creation and
destruction and basic KV ops. At this point it's intended to be enough to
demonstrate what tests under this framework would look like.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18564
unit: dnode/dbuf/dmu_tx mocks
Some simple initial mocks for key DMU structures. It's hard to say this
early how generalisable these are; however, they are enough for the ZAP
unit tests (next commit).
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18564
unit: a unit testing framework
This commit establishes a unit test framework for OpenZFS, and
integrates it into the build.
It includes:
- the "munit" unit test framework (munit.c, munit.h)
- some light extensions to munit and glue for OpenZFS (unit.c, unit.h)
- make targets for running tests and generating coverage reports
- a document explaining the what, how and why
This is a first step; I expect we will extend all of this as we use it
in more places and gain experience with it.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18564
Merge tag 'spi-fix-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"Another batch of driver fixes from Johan fixing error handling paths,
plus another from Felix. We also have a new device ID added in the DT
bindings for SpacemiT K3"
* tag 'spi-fix-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: dt-bindings: fsl-qspi: support SpacemiT K3
spi: ti-qspi: fix use-after-free after DMA setup failure
spi: sprd: fix error pointer deref after DMA setup failure
spi: qup: fix error pointer deref after DMA setup failure
spi: mtk-snfi: Fix resource leak in mtk_snand_read_page_cache()
spi: ep93xx: fix error pointer deref after DMA setup failure
Reapply [SimplifyCFG] Extend jump-threading to allow live local defs (#197850)
Restore "Extend jump-threading to allow live local defs" #135079. Long
compilation time with reduce.cu in hipcub/warp was partially addressed
in #195744. Compilation time for reduce.cu with this PR (after #195744)
is 6 minutes 40 seconds. Without #195744, compilation time was several
hours.
Long compilation time in reduce.cu was only exposed by jump-threading.
In my view the primary causes were inlining, SROA tripling the IR
code size, and SSA updating 26K phi-nodes resulting in an O(N^2) search
for duplicates. #195744 limits phi search times.
This reverts commit a76750e6de6aba2223097dc505578556ec245d50.
---------
Signed-off-by: John Lu <John.Lu at amd.com>
Merge tag 'regulator-fix-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of fixes here, one very minor Kconfig fix and a fix for a
nasty issue with error reporting in the tps65219 driver"
* tag 'regulator-fix-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: tps65219: fix irq_data.rdev not being assigned
regulator: Kconfig: fix a typo in help
unbound: Update to 1.24.2
Merge commit 'ec5b94f552d7cb2a9d456c67e9941bcf5e3698bf'
This is purely cosmetic as we already had the functional changes.
MFC after: 1 week
CI: run full CI when a workflow YAML changes
FULL_RUN_REGEX in generate-ci-type.py covered .github/workflows/scripts/
but not the workflow YAML files, so a PR that only edited zfs-qemu.yml
got "quick" CI and never tested its own matrix change. Add the YAML
files to the list.
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Christos Longros <chris.longros at gmail.com>
Closes #18577
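The FULL_RUN_REGEX change described above can be pictured with a short sketch. This is not the contents of generate-ci-type.py: the patterns, the `needs_full_run` helper, and every example path other than zfs-qemu.yml are assumptions made for illustration.

```python
import re

# Hypothetical pattern list: the scripts directory was already covered,
# and the second entry models the newly added workflow YAML files.
FULL_RUN_REGEX = [
    re.compile(r"\.github/workflows/scripts/.*"),
    re.compile(r"\.github/workflows/.*\.ya?ml$"),
]


def needs_full_run(changed_paths):
    """Return True if any changed path matches a full-run pattern."""
    return any(
        pattern.match(path)
        for path in changed_paths
        for pattern in FULL_RUN_REGEX
    )


yaml_only_change = needs_full_run([".github/workflows/zfs-qemu.yml"])
unrelated_change = needs_full_run(["man/man8/example.8"])
```

In this model a change that only touches zfs-qemu.yml now selects the full run, while an unrelated path still falls through to the quick type.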