[AMDGPU] Track VALU instructions separately for WMMA coexecution hazards
WMMA coexecution hazards can only be resolved by VALU instructions, not
S_NOPs. Track VALU/WMMA instructions separately so the scheduler can
accurately determine stall cycles.
[AMDGPU] Set WMMA source-operand reuse bits in SIPreEmitPeephole
gfx1250 WMMA instructions can set matrix_a_reuse / matrix_b_reuse bits
that keep the A or B source operand in a high-temporality state in the
VALU source-operand cache, so a later WMMA reusing the same registers
hits in the cache instead of re-reading the register file.
Add a late, post-RA peephole in the existing pre-emit peephole pass that
scans each basic block and, for every WMMA, sets the A/B reuse bit when
one of the next few WMMAs reuses the same physical registers as its A or B
operand and those registers are not redefined in between.
Stale sticky entries in the cache are cleared when a register is used in
an instruction without a reuse bit being set. Therefore, the final WMMA
use of the same source should not set the bit.
Merge tag 'hardening-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
- lkdtm:
- Add case to provoke a crash in EFI runtime services (Ard Biesheuvel)
- add PPC_RADIX_TLBIEL test and missed isync (Sayali Patil)
- stddef: Document designated initializer semantics for
__TRAILING_OVERLAP() (Gustavo A. R. Silva)
- strarray: drop redundant allocation, add __counted_by_ptr (Thorsten
Blum)
* tag 'hardening-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lkdtm/powerpc: add PPC_RADIX_TLBIEL test for radix MCE validation
lkdtm/powerpc: add isync after slbmte to enforce SLB update ordering
lkdtm: Add case to provoke a crash in EFI runtime services
lib/string_helpers: annotate struct strarray with __counted_by_ptr
[3 lines not shown]
Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library updates from Eric Biggers:
- Drop the last architecture-specific implementation of MD5
- Mark clmul32() as noinline_for_stack to improve codegen in some cases
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: gf128hash: mark clmul32() as noinline_for_stack
lib/crypto: powerpc/md5: Drop powerpc optimized MD5 code
Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
"Accelerate CRC64-NVME for 32-bit ARM by refactoring the arm64 NEON
intrinsics implementation to be shared by 32-bit and 64-bit.
Also apply a similar cleanup to the 32-bit ARM NEON implementation of
xor_gen(), where it now reuses code from the 64-bit implementation"
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
crypto: aegis128 - Use neon-intrinsics.h on ARM too
lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64
lib/crc: Turn NEON intrinsics crc64 implementation into common code
xor/arm64: Use shared NEON intrinsics implementation from 32-bit ARM
xor/arm: Replace vectorized implementation with arm64's intrinsics
ARM: Add a neon-intrinsics.h header like on arm64
Merge tag 'v7.2-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Drop support for off-CPU cryptography in af_alg
- Document that af_alg is *always* slower
- Document the deprecation of af_alg
- Remove zero-copy support from skcipher and aead in af_alg
- Cap AEAD AD length to 0x80000000 in af_alg
- Free default RNG on module exit
Algorithms:
- Fix vli multiplication carry overflow in ecc
- Drop unused cipher_null crypto_alg
- Remove unused variants of drbg
- Use lib/crypto in drbg
- Use memcpy_from/to_sglist in authencesn
- Allow authenc(hmac(sha{256,384}),cts(cbc(aes))) in FIPS mode
- Disallow RSA PKCS#1 SHA-1 sig algs in FIPS mode
[41 lines not shown]
Merge tag 'slab-for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- Support for "allocation tokens" (currently available in Clang 22+)
for smarter partitioning of kmalloc caches based on the allocated
object type, which can be enabled instead of the "random"
per-caller-address-hash partitioning.
It should be able to deterministically separate types containing a
pointer from those that do not (Marco Elver)
- Improvements and simplification of the kmem_cache_alloc_bulk() and
mempool_alloc_bulk() API. This includes adaptation of callers
(Christoph Hellwig)
- Performance improvements and cleanups related mostly to sheaves
refill (Hao Li, Shengming Hu, Vlastimil Babka)
[21 lines not shown]
Merge tag 'docs-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux
Pull documentation updates from Jonathan Corbet:
"Things have calmed down a bit on the docs front, with no earthshaking
changes this time around:
- Ongoing work on the Japanese and Portuguese translations
- Better integration of the MAINTAINERS file into the rendered
documents, including a search interface
- A seemingly infinite supply of fixes for typos, minor grammatical
issues, and related problems that LLMs find with abandon"
* tag 'docs-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux: (93 commits)
docs: pt_BR: Translate 3.Early-stage.rst into Portuguese
docs: pt_BR: update "Purpose of Defconfigs" section in maintainer-soc.rst
Documentation: bug-hunting.rst: fix grammar
docs/ja_JP: translate submitting-patches.rst (interleaved-replies)
[17 lines not shown]
Merge tag 'fbdev-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev updates from Helge Deller:
"Beside the removal of the Hercules monochrome ISA graphics driver and
the corresponding text console driver, there is just the typical
maintanance with smaller driver fixes and cleanups:
Removal of drivers:
- Hercules monochrome ISA graphics adapter driver (Ethan Nelson-Moore)
- Hercules mdacon console driver (Ethan Nelson-Moore)
Changes affecting many drivers at once:
- possible memory leak fixes in various drivers (Abdun Nihaal)
- many conversions to use strscpy() (David Laight)
- Use named initializers in drivers (Uwe Kleine-König)
Code fixes:
- fbcon: don't suspend/resume when vc is graphics mode (Lu Yao)
- modedb: fix a possible UAF in fb_find_mode() (Tuo Li)
[44 lines not shown]
[AMDGPU] Update f8f6f4-wmma hazard tests regarding matrix format, NFC (#204037)
Need to map the matrix format suffix to the register size correctly in
the MIR tests. For example, 'F4' format needs v8i32 register class.
Merge tag 'mmc-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC updates from Ulf Hansson:
"MMC core:
- Validate host's max_segs to fail gracefully
MMC host:
- davinci:
- Avoid potential NULL dereference in the IRQ handler
- Call mmc_add_host() in the correct order during probe
- dw_mmc-exynos:
- Increase DMA threshold for exynos7870
- renesas_sdhi:
- Add support for RZ/G2E, RZ/G2N and R-Car M3Le variants
- sdhci-msm:
- Add support for Hawi, Eliza and Shikra variants
- sdhci-of-k1:
- Add support for SD UHS-I modes
- Add support for tuning for eMMC HS200 and SD UHS-I"
[23 lines not shown]
Merge tag 'hwmon-for-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
"New drivers for the following chips:
- Analog Devices LTC4283 Swap Controller
- Analog Devices MAX20830
- Analog Devices MAX20860A
- ARCTIC Fan Controller
- Delta E50SN12051
- Luxshare LX1308
- Microchip EMC1812/13/14/15/33
- Monolithic MP2985
- Murata D1U74T PSU
New chip support added to existing drivers:
- asus-ec-sensors: Support for ROG MAXIMUS Z790 EXTREME, ROG STRIX
B850-E GAMING WIFI, and ROG STRIX B650E-E GAMING WIFI
- dell-smm: Add Dell Latitude 7530 to fan control whitelist
- nct6683: Support for ASRock Z890 Pro-A
[71 lines not shown]
Merge tag 'watchdog-for-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull watchdog updates and fixes from Guenter Roeck:
"Subsystem:
- Unregister PM notifier on watchdog unregister
- Various documentation fixes and improvements
Removed drivers:
- Remove AMD Elan SC520 processor watchdog driver
- Drop SMARC-sAM67 support
- Remove driver for integrated WDT of ZFx86 486-based SoC
New drivers:
- Driver for Andes ATCWDT200
- Driver for Gunyah Watchdog
Added support to existing drivers:
- Add "apple,t8103-wdt" and "apple,t8122-wdt" compatibles to Apple
watchdog driver
[48 lines not shown]
Merge tag 'spi-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"This has been quite a busy release, mainly due to the subsystem wide
work Johan Hovold has done to modernise resource allocation for the
subsystem on probe, the subsystem did some very clever allocation
management pre devm which didn't quite mesh comfortably with managed
allocations and made it far too easy to introduce error handling and
removal bugs.
- Cleanup and simplification of controller struct allocation, moving
everything over to devm and making the devm APIs more robust, from
Johan Hovold
- Support for spi-mem devices that don't assert chip select and
support for a secondary read command for memory mapped flashes,
some commits for this are shared with mtd.
- Support for SpacemiT K1"
[23 lines not shown]
Merge tag 'regulator-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"The development of the regulator subsystem continues to be quite
quiet, we've got several new devices, removal of one old device and
some kernel wide cleanup of platform devices but nothing in the core.
- Cleanups of platform_device_id usage
- Filling out and fixing of the description of the MediaTek MT6359
- Removal of the PCAP regulator driver, the MFD has been removed
- New device support for Qualcomm Nord RPMH, PM8109, PM8150 and
PMAU0102, and SG Micro SGM3804"
* tag 'regulator-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (23 commits)
regulator: dt-bindings: mt6311: Convert to DT schema
regulator: qcom_smd-regulator: Add PM8019
[19 lines not shown]
Merge tag 'regmap-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap update from Mark Brown:
"This time around we just have a single fix for a sparse warning"
* tag 'regmap-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap-i2c: fix sparse warning in regmap_smbus_word_write_reg16
Fix behavior of ')' used in a range when setence reaches EOF.
For a sentence spanning more than one line at the end of the file,
when the cursor is placed at the first character of any line except
for the last one, running '!)<cmd>' won't affect the last line.
From Walter Alejandro Iglesias
Merge tag 'i2c-7.2-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux
Pull i2c updates from Andi Shyti:
"This pull request is mostly made of cleanups and small infrastructure
improvements across the I2C core, drivers and bindings. It also adds
support for three drivers and a few new compatibles.
Two major cleanup across drivers and core code:
- use named initializers in device ID tables
- replace dev_err() with dev_err_probe()
Drivers:
- at24: use named initializers for arrays of i2c_device_data
- at91: add MCHP_LAN966X_PCI dependency
- cadence: add shutdown callback
- k1: enable by default on SpacemiT
- mxs: improve documentation
- qcom-geni: use pm_runtime_force_suspend/resume for system sleep
- tegra:
[44 lines not shown]
Merge tag 'pwrseq-updates-for-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull power sequencing updates from Bartosz Golaszewski:
"A set of extensions to the M.2 pwrseq driver allowing it to work with
more cards than just the one from Qualcomm we supported initially.
There's also a tweak to debugfs output and a new function that will be
used by a bluetooth driver in the next cycle.
Power Sequencing core:
- Add a helper allowing consumers to access the struct device object
associated with a pwrseq provider
- Print the power sequencing device's parent in debugfs to add more
debugging information
Driver updates:
- Extend/rework the M.2 power sequencing driver in order to allow it
[11 lines not shown]
Check for E_CLRFLAG in ecp->cmd->flags, not ecp->iflags.
Fixes a problem where an extra line is printed at the end of the
output when the "number" command is given the "l" (literal display)
flag.
From Jeremy Mates Walter Alejandro Iglesias and