ComfyUI Zluda KSampler Crash: AMD Gfx1035 GPU Fixes

Understanding the ComfyUI Zluda KSampler Crash on AMD gfx1035

Alright, guys, if you've landed here, chances are you're pulling your hair out over a nasty ComfyUI Zluda KSampler crash on your AMD Radeon GPU, specifically a gfx1035 model, while trying to generate AI art. You're not alone, and this isn't a random hiccup; it's a specific issue in how Zluda interacts with your AMD hardware and its underlying libraries. The core problem, as your logs show, is that your system misidentifies the gfx1035 GPU as a gfx1030 during Triton's architecture auto-detection. That misidentification then triggers a critical failure at the KSampler step, throwing a cryptic CUBLAS_STATUS_NOT_SUPPORTED error when attempting a cublasGemmEx operation with CUDA_R_16F. Everything looks fine at startup (Zluda detects your GPU, PyTorch initializes, your VRAM is reported correctly), and then, right when the heavy lifting of sampling begins, it crashes. The fact that this crash appears in Stability Matrix 2.15.4 but not in 2.14.3 (which used ROCm 6.2.4) strongly points to a compatibility regression in the newer Zluda integration or ROCm libraries. The other red flag is the log line :: Set TRITON_OVERRIDE_ARCH=gfx1030, emitted after your AMD Radeon(TM) Graphics is labeled an "Unknown GPU model": Zluda isn't leveraging your GPU's actual capabilities and may be issuing operations the fallback architecture simply doesn't support. Understanding this log output is our first big step in troubleshooting this frustrating ComfyUI Zluda KSampler crash.

Let's break down the key pieces of information from your logs. First: :: Auto-detecting AMD GPU architecture for Triton... Detected GPU via Windows registry: AMD Radeon(TM) Graphics :: Unknown GPU model: AMD Radeon(TM) Graphics, using default gfx1030 :: Set TRITON_OVERRIDE_ARCH=gfx1030. This is where the core issue begins. Triton, which Zluda relies on to compile and optimize deep learning kernels, fails to identify your gfx1035 architecture and falls back to a default of gfx1030. Your GPU shares a lot with gfx1030, but being forced into that older profile means features or optimizations specific to gfx1035 may be disabled, or worse, calls intended for gfx1035 are translated incorrectly. Then comes the actual crash: !!! Exception during processing !!! CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmEx( handle, opa, opb, m, n, k, alpha_ptr, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, beta_ptr, c, CUDA_R_16F, ldc, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP). This is a low-level error: a fundamental matrix-multiplication call made through CUBLAS (CUDA Basic Linear Algebra Subprograms), which Zluda translates to HIP/ROCm, is reported as not supported. The CUDA_R_16F arguments mean the operation uses 16-bit floating-point numbers (FP16, i.e. half precision). So the combination of the misidentified architecture (gfx1030), the ROCm libraries you have installed (apparently 6.4), and the requested FP16 matrix multiply breaks down somewhere in the stack. This isn't just an inconvenience; it's a hard stop that prevents ComfyUI from using your GPU for its primary task of generating images. These log details matter because they point directly at the levers we can pull: architecture overrides, library versions, and precision settings.
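To see exactly what the software stack thinks your GPU is, a quick check from inside the ComfyUI Python environment helps. Here's a minimal diagnostic sketch; it assumes you run it with the same Python (the ComfyUI/Zluda virtual environment) that Stability Matrix uses to launch ComfyUI:

```python
# Quick diagnostic: run inside the ComfyUI/Zluda Python environment to see
# what the PyTorch/Zluda layer reports for your GPU. If the name is just the
# generic "AMD Radeon(TM) Graphics" string (the same one the log shows the
# Windows registry returning), it's no surprise the auto-detection has nothing
# gfx-specific to match on and falls back to gfx1030.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("Device name :", props.name)
    print("Total VRAM  :", round(props.total_memory / 1024**3, 1), "GiB")
    print("Torch       :", torch.__version__, "| CUDA API:", torch.version.cuda)
else:
    print("No CUDA-compatible device visible - Zluda is not being picked up.")
```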

Diving Deeper into the gfx1030 vs. gfx1035 Misdetection

Alright, let's peel back another layer and talk about the gfx1030 versus gfx1035 misdetection that's causing so much grief. For those new to AMD's GPU architecture naming, the gfx103x identifiers cover different versions within the RDNA2 family. Your AMD Radeon(TM) 680M (the integrated GPU in the AMD Ryzen 7 7735HS) is indeed gfx1035. This identifier tells the underlying software, like ROCm and by extension Zluda, exactly what capabilities and optimizations your GPU possesses. When Zluda's Triton auto-detection reports Unknown GPU model: AMD Radeon(TM) Graphics, using default gfx1030, it's essentially saying, "I don't know what this specific GPU is, so I'll play it safe and assume an older, more common, but less capable version." That default gfx1030 profile may not expose all the instructions and features your gfx1035 offers, or it may select code paths that don't behave correctly on it. The impact of TRITON_OVERRIDE_ARCH=gfx1030 is significant. Imagine a top-tier sports car whose navigation system thinks it's a basic sedan: it still drives, but it can't use its advanced features or pick routes optimally. In the AI world, this means potential slowdowns, or in our case an outright ComfyUI Zluda KSampler crash, because certain operations (especially lower-precision ones like FP16) simply aren't correctly mapped or supported when running under a gfx1030 profile.
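To make that failure mode concrete, here's a rough, purely illustrative sketch of how a name-based auto-detector ends up at gfx1030. This is not Zluda's actual code, just the general shape of the logic it appears to follow based on the log output:

```python
# Illustrative only: a hypothetical reconstruction of how name-based detection
# lands on gfx1030. The key point is that a generic marketing string like
# "AMD Radeon(TM) Graphics" matches nothing specific, so the safe default wins.
KNOWN_ARCHS = {
    "Radeon RX 6800": "gfx1030",
    "Radeon RX 6700 XT": "gfx1031",
    "Radeon 680M": "gfx1035",   # what the 7735HS iGPU should map to
}

def detect_arch(registry_name: str, default: str = "gfx1030") -> str:
    for model, arch in KNOWN_ARCHS.items():
        if model.lower() in registry_name.lower():
            return arch
    # Generic names such as "AMD Radeon(TM) Graphics" fall through to here.
    return default

print(detect_arch("AMD Radeon(TM) Graphics"))  # -> gfx1030, the misdetection
print(detect_arch("AMD Radeon 680M"))          # -> gfx1035, what we actually want
```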

Now, let's talk about the role of ROCm libraries in all this, especially since you mentioned replacing them for gfx1035 in your Program Files/AMD/rocm/6.4 folder. ROCm (Radeon Open Compute platform) is AMD's answer to NVIDIA's CUDA. It's the framework that allows software like PyTorch and Zluda to directly talk to and utilize your AMD GPU. Zluda, in particular, acts as a compatibility layer, translating CUDA calls from PyTorch into ROCm/HIP calls. If your ROCm libraries are specifically tailored for gfx1035 but Triton is operating under gfx1030, you have a mismatch. It's possible that the newer ROCm 6.4 libraries, while intended for gfx1035, might have a different expected behavior or a subtle incompatibility when used with an older Triton architecture override, or perhaps Zluda's translation layer itself isn't fully updated for the specific gfx1035/ROCm 6.4 combination under the gfx1030 fallback. The fact that an older version of Stability Matrix (2.14.3) with ROCm 6.2.4 worked perfectly is a critical piece of the puzzle here. This strongly suggests that either Stability Matrix 2.15.4 introduced changes that broke compatibility, or the ROCm 6.4 libraries have an issue that surfaces when your gfx1035 is incorrectly identified, especially when it comes to those CUDA_R_16F operations that cublasGemmEx is trying to perform. The problem isn't necessarily your physical GPU; it's the software stack's interpretation and utilization of it. Pinpointing where this chain breaks – be it Zluda's auto-detection, Triton's compatibility with gfx1035 in this specific Zluda build, or the ROCm 6.4 libraries themselves in conjunction with the fallback – is key to finding a lasting solution for your ComfyUI Zluda KSampler crash.
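If you want to pin down the library side of the mismatch, one practical move is to fingerprint the ROCm folder under both the working and the broken setups and compare the output. The sketch below assumes the path from your report (Program Files/AMD/ROCm/6.4); adjust it to your actual install:

```python
# Sketch for comparing the ROCm libraries the two Stability Matrix versions
# actually see. The path is an assumption based on the report in this article;
# adjust it if your install differs. Run this under both the working
# (2.14.3 / ROCm 6.2.4) and broken (2.15.4 / ROCm 6.4) setups and diff the
# output - a mismatch between the files you replaced and what Zluda loads is a
# common source of this kind of breakage.
from pathlib import Path

rocm_bin = Path(r"C:\Program Files\AMD\ROCm\6.4\bin")  # assumed location
if rocm_bin.is_dir():
    for dll in sorted(rocm_bin.glob("*.dll")):
        print(dll.name, dll.stat().st_size)
else:
    print("ROCm folder not found at", rocm_bin)
```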

The Dreaded CUBLAS_STATUS_NOT_SUPPORTED Error

Okay, guys, let's zero in on the exact moment things fall apart: the CUBLAS_STATUS_NOT_SUPPORTED error. This isn't just a generic failure; it's a very specific message coming from what would typically be a CUDA environment (remember, Zluda translates CUDA to HIP/ROCm). CUBLAS stands for CUDA Basic Linear Algebra Subprograms, and it's NVIDIA's library for high-performance matrix computations, which are absolutely essential for deep learning models like those used in ComfyUI. When Zluda translates a request for cublasGemmEx – a function for performing general matrix multiplication – and the system throws CUBLAS_STATUS_NOT_SUPPORTED, it means the underlying hardware or its supporting libraries (in this case, your AMD GPU and its ROCm drivers) cannot perform the requested operation. The critical detail here is CUDA_R_16F. This specifies that the operation is trying to use 16-bit floating-point numbers, commonly known as FP16 or half-precision. Modern GPUs, especially your gfx1035, are designed to handle FP16 operations very efficiently, often with dedicated hardware. However, if the software stack (Zluda, Triton, ROCm) isn't correctly configured for your specific gfx1035 architecture or if it's operating under the misidentified gfx1030 profile, it might attempt to call an FP16 operation that the gfx1030 profile doesn't officially support in the way the current ROCm 6.4 libraries expect it, or perhaps the Zluda translation for this specific combination is flawed. It's a classic case of incompatible instructions or features. Your GPU can do it, but the software thinks it can't because of the layers of abstraction and misidentification. This leads directly to the ComfyUI Zluda KSampler crash, stopping your image generation dead in its tracks.
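You can reproduce this exact failure outside of ComfyUI with a tiny half-precision matrix multiply, which exercises the same cublasGemmEx path the KSampler hits. Here's a minimal sketch to run in the ComfyUI/Zluda Python environment; if it throws the same CUBLAS_STATUS_NOT_SUPPORTED error, you have a fast test case for any fix you try:

```python
# Minimal smoke test for the failing operation: an FP16 matrix multiply, which
# PyTorch routes through cublasGemmEx (translated by Zluda to the ROCm
# equivalent). Reproducing the crash here is much faster than re-running a
# full ComfyUI workflow after every tweak.
import torch

device = torch.device("cuda")
a = torch.randn(256, 256, dtype=torch.float16, device=device)
b = torch.randn(256, 256, dtype=torch.float16, device=device)

try:
    c = a @ b                      # FP16 GEMM -> cublasGemmEx with CUDA_R_16F
    torch.cuda.synchronize()
    print("FP16 matmul OK:", c.shape)
except RuntimeError as e:
    print("FP16 matmul failed:", e)

# For comparison, FP32 usually takes a different, more widely supported path:
c32 = a.float() @ b.float()
print("FP32 matmul OK:", c32.shape)
```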

This issue becomes even more telling when we factor in your own experience: the older version of Stability Matrix (2.14.3), running ROCm 6.2.4, worked fine. This single piece of information is a massive clue, guys! It tells us your hardware is perfectly capable of running ComfyUI Zluda on an AMD GPU; this is not an inherent hardware limitation. Instead, it points to a regression or incompatibility introduced with Stability Matrix 2.15.4, the ROCm 6.4 libraries, or the specific Zluda build bundled with the newer Stability Matrix. When the system detects your gfx1035 but defaults Triton to gfx1030, and that is combined with the updated ROCm 6.4 libraries, the cublasGemmEx call at CUDA_R_16F precision is no longer supported or correctly translated. This wasn't an issue with ROCm 6.2.4 and the older Zluda/Triton setup, which suggests a change in how FP16 operations are handled, or a bug in the translation layer for gfx1035 when it runs under a gfx1030 override. The fault could sit in Zluda's CUDA-to-HIP translation for this operation on gfx1035 while the GPU is being "tricked" into presenting as gfx1030, or it could be a genuine bug in the ROCm 6.4 implementation for gfx1035 when the architecture settings aren't perfectly aligned. Either way, the CUBLAS_STATUS_NOT_SUPPORTED error is the final symptom of a deeper architecture and compatibility mismatch, and the troubleshooting steps below lean heavily on this insight about the working older version.

Step-by-Step Troubleshooting for Your ComfyUI Zluda Crash

Alright, folks, now that we've dug deep into why this ComfyUI Zluda KSampler crash is happening, it's time to roll up our sleeves and get to the fixes! Don't fret, we've got a roadmap to get your generative AI workflow back on track. The good news is, since we know it used to work on an older version, we're likely dealing with a software conflict rather than a hardware fault, which is usually much easier to resolve. Our goal here is to either correctly identify your GPU, bypass the problematic gfx1030 override, or find a stable software combination that supports your AMD Radeon gfx1035. Let's start with the basics and move to more advanced solutions.

First up, Verify Your GPU and Drivers. Guys, it might sound obvious, but ensure your AMD graphics drivers are absolutely up-to-date. Head directly to AMD's official website, find your specific GPU model (AMD Radeon 680M or your 7735hs processor's integrated graphics), and download the latest recommended drivers. Sometimes, even if Windows Update says you're current, AMD's site will have newer, more optimized versions. Outdated or corrupted drivers can cause all sorts of low-level hardware communication issues that might manifest as a CUBLAS_STATUS_NOT_SUPPORTED error. A clean driver install (using DDU, Display Driver Uninstaller, if you're feeling adventurous and want to be extra thorough) can often iron out subtle conflicts. This ensures your operating system is speaking the most current and correct language to your gfx1035 GPU, which is foundational for Zluda and ROCm.

Next, and probably the most immediate fix if you're in a hurry, is Zluda/Stability Matrix Version Rollback. Since you clearly stated that Stability Matrix 2.14.3 (which used ROCm 6.2.4) worked, this is your golden ticket. The easiest path to resolve this specific ComfyUI Zluda KSampler crash is to revert to that known working configuration. If Stability Matrix allows you to switch between installed versions or download older packages, go for 2.14.3. If not, you might need to uninstall 2.15.4 and then install the older version. This isn't just a workaround; it tells us definitively that a change between 2.14.3 and 2.15.4 (or their bundled Zluda/ROCm versions) is the culprit. While it doesn't solve the underlying gfx1030 misdetection in the new version, it gets you back to generating art immediately.

Now, for a more direct approach to the architecture issue: Manual Triton Architecture Override (Correctly). The logs show :: Set TRITON_OVERRIDE_ARCH=gfx1030. We need to try and force this to gfx1035. You'll typically do this by setting an environment variable before launching ComfyUI, or by adding a launch parameter within Stability Matrix if it supports custom environment variables for packages. For example, you might add TRITON_OVERRIDE_ARCH=gfx1035 to your system environment variables or try to pass it directly via the command line that launches ComfyUI, depending on how Stability Matrix handles launch configurations. Be warned: If Zluda/Triton truly doesn't have a gfx1035 profile compiled in or expects something different, this could still lead to crashes or instability. However, it's worth trying, as it addresses the root of the misidentification. Consult Stability Matrix or Zluda documentation for the exact method to set this variable.
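If Stability Matrix doesn't expose a field for custom environment variables, a small launcher script is one way to test the override. Treat this as a sketch, not an official method: the variable name comes straight from your log, but the main.py path below is an assumption you'll need to replace with your actual ComfyUI package location, and the script should be run with the ComfyUI venv's Python:

```python
# Launcher sketch: force the Triton architecture override before ComfyUI starts.
# TRITON_OVERRIDE_ARCH is the variable shown in the log; gfx1035 is the real
# architecture of the Radeon 680M. The path below is an ASSUMPTION - point it
# at the main.py inside your Stability Matrix ComfyUI-Zluda package folder.
import os
import subprocess
import sys

env = os.environ.copy()
env["TRITON_OVERRIDE_ARCH"] = "gfx1035"   # instead of the auto-detected gfx1030

comfyui_main = r"C:\StabilityMatrix\Packages\ComfyUI-Zluda\main.py"  # assumed path
subprocess.run([sys.executable, comfyui_main], env=env)
```

Setting the variable system-wide (System Properties > Environment Variables) achieves the same thing without a wrapper, at the cost of affecting every Triton-based tool on the machine.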

Another crucial area to investigate is Check ROCm Installation and Compatibility. You mentioned replacing rocm libraries for gfx1035 in Program Files/AMD/rocm/6.4. Ensure that these libraries are not just present but are also fully compatible with the specific Zluda build in Stability Matrix 2.15.4. Sometimes, Zluda expects a very specific version of ROCm or a particular compilation of its libraries. If you've manually replaced parts of ROCm, there's a chance of mismatch. You might try completely uninstalling ROCm and letting Stability Matrix or Zluda install its preferred version, or find a Zluda build specifically tested with your gfx1035 and ROCm 6.4. Also, verify if there's an environment variable like ROCM_PATH that needs to point to your specific rocm/6.4 directory.
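To see which ROCm-related paths your environment actually exposes, a quick check like the one below helps. Whether your specific Zluda build honours ROCM_PATH or HIP_PATH is something to confirm in its documentation, so treat these variable names as the usual suspects rather than a guarantee:

```python
# Print the ROCm/HIP-related environment the process sees. Run this from the
# same environment that launches ComfyUI so the output reflects what ComfyUI
# itself would get.
import os

for var in ("ROCM_PATH", "HIP_PATH", "PATH"):
    value = os.environ.get(var, "<not set>")
    if var == "PATH":
        # Only show PATH entries that look ROCm/HIP related, to keep output short.
        hits = [p for p in value.split(os.pathsep)
                if "rocm" in p.lower() or "hip" in p.lower()]
        print("PATH (ROCm/HIP entries):", hits or "<none>")
    else:
        print(f"{var}: {value}")
```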

Finally, let's look at ComfyUI Settings & Optimizations that might offer a workaround. Your logs already mention Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention and Set vram state to: LOW_VRAM. While these are generally good optimizations, the CUBLAS_STATUS_NOT_SUPPORTED error with CUDA_R_16F points to precision. The issue might be with the system trying to use 16-bit floating point (FP16/bfloat16) operations, which are faster but more sensitive to hardware/software support. As a last resort, you might be able to force ComfyUI or PyTorch to use 32-bit floating point (FP32) operations, which are slower and use more VRAM, but are generally more universally supported. This usually involves setting torch.set_default_dtype(torch.float32) in your Python script or checking if ComfyUI has a global precision setting. This is a bit advanced and might require diving into ComfyUI's internal configuration or custom nodes, but it could bypass the cublasGemmEx crash if it's purely a precision-related incompatibility under the gfx1030 override.
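At the PyTorch level, the FP32 fallback described above looks like the sketch below. ComfyUI may also expose a precision-related launch flag in your build (check `python main.py --help` in your install rather than taking my word for it); the torch-level version is shown here because it's the part we can state with confidence:

```python
# Forcing FP32 is a blunt but broadly compatible workaround if only the FP16
# path is broken. New floating-point tensors default to FP32 after this call,
# so the GEMM below avoids the CUDA_R_16F path entirely (at the cost of speed
# and VRAM).
import torch

torch.set_default_dtype(torch.float32)   # new float tensors default to FP32

device = torch.device("cuda")
a = torch.randn(256, 256, device=device)
b = torch.randn(256, 256, device=device)
print((a @ b).dtype)                     # torch.float32 - no FP16 GEMM involved
```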

If all else fails, Community and Support are your best friends. The Stability Matrix team (LykosAI) and the Zluda developers are actively working on these kinds of issues. Provide them with your detailed logs, including the specific gfx1035 GPU, Stability Matrix version, ROCm version, and the exact steps to reproduce. The more information you give, the faster they can diagnose and fix it. Check their forums, GitHub issues, or Discord channels. You might find others with the same gfx1035 CUBLAS_STATUS_NOT_SUPPORTED problem and a ready solution.

Preventing Future ComfyUI Zluda KSampler Crashes

Alright, folks, once we get your current ComfyUI Zluda KSampler crash sorted, let's talk about how to keep these headaches from coming back. Prevention is always better than cure, especially when it comes to complex AI setups. Keeping your system stable and your creative flow uninterrupted requires a proactive approach, especially with cutting-edge software like Zluda and ever-evolving platforms like Stability Matrix. We're dealing with a rapidly advancing field where new versions, drivers, and libraries drop constantly, bringing both exciting features and potential new incompatibilities. So, let's arm ourselves with some best practices to safeguard your AMD Radeon gfx1035 setup.

First and foremost, Stay Informed. This might seem obvious, but it's crucial. Actively monitor the official channels for LykosAI (Stability Matrix), the Zluda project, and AMD's driver releases. Sign up for newsletters, follow their social media, or regularly check their GitHub repositories and forums. Developers are often quick to release patches for critical issues like your CUBLAS_STATUS_NOT_SUPPORTED error, especially when it affects a common GPU architecture like gfx1035. Being aware of upcoming updates, known bugs, or hotfixes related to ComfyUI Zluda or Triton override issues can save you hours of troubleshooting. Sometimes, a fix is just around the corner, and knowing when it arrives can prevent you from going down a rabbit hole of manual tweaks.

Second, a golden rule for any AI enthusiast: Backup Your Working Environment. Seriously, guys, before you hit that "update" button on Stability Matrix or manually upgrade any ROCm libraries, make a backup! If Stability Matrix 2.14.3 was working perfectly for you, consider making a copy of its entire installation directory. This way, if a new update breaks things (like your current gfx1035 KSampler crash), you can quickly revert to a stable version. This could mean backing up the Stability Matrix data folder, specific Zluda installations, or even creating a system restore point. For those managing multiple AI environments, virtual machines or Docker containers can also offer isolated, easily revertable setups, though that might be overkill for most users.
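If you'd rather script the backup than drag folders around in Explorer, a few lines of Python will do it. The source path below is an assumption; point it at wherever Stability Matrix keeps your working ComfyUI-Zluda package (and its data folder, if you want that covered too):

```python
# Tiny backup sketch: snapshot the working package folder before updating.
# The source path is an ASSUMPTION - adjust it to your actual install.
import shutil
from datetime import date
from pathlib import Path

src = Path(r"C:\StabilityMatrix\Packages\ComfyUI-Zluda")      # assumed location
dst = src.with_name(f"{src.name}-backup-{date.today().isoformat()}")

if src.is_dir() and not dst.exists():
    shutil.copytree(src, dst)
    print("Backed up to", dst)
else:
    print("Nothing copied - check that", src, "exists and", dst, "does not.")
```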

Third, always Test New Versions Cautiously. The excitement of new features can be intoxicating, but when dealing with a complex stack like ComfyUI, Zluda, and AMD GPUs, it's wise to be patient. Instead of immediately upgrading your main working environment, consider setting up a separate, experimental installation for new Stability Matrix versions or Zluda builds. Or, if you can't do a full separate install, at least generate a simple test image immediately after an update to ensure basic functionality, like the KSampler step, is still working. This allows you to catch breaking changes, like the gfx1030 override issue or the CUBLAS_STATUS_NOT_SUPPORTED error, without disrupting your ongoing projects.

Fourth, Understand Your Hardware. Knowing your GPU architecture inside and out, like recognizing that your AMD Radeon is gfx1035, is incredibly valuable. This knowledge helps you interpret logs like TRITON_OVERRIDE_ARCH=gfx1030 and understand why they might be problematic. Researching your GPU's specific capabilities, supported ROCm versions, and any known quirks can empower you to make more informed decisions when troubleshooting or configuring your AI software. This understanding often simplifies the process of diagnosing compatibility issues and finding targeted solutions.

Finally, Contribute to the Community. If you encounter new issues or find a novel solution, share it! The open-source nature of many of these projects thrives on community input. Reporting bugs with detailed logs (just like you did!) and clear steps to reproduce, or sharing your successful fixes, helps not just other users but also the developers. By contributing, you play a vital role in making the AI ecosystem more robust and accessible for everyone, especially those of us leveraging AMD hardware with tools like ComfyUI Zluda. Together, we can make these KSampler crashes a thing of the past and ensure a smoother, more reliable creative experience for all.