Errors and Fixes When Working with an AI Static Malware Detection Model


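This post covers two problems I hit while working with an AI static malware detection model, a GPU out-of-memory error and mismatched parameter names, and how to fix each of them.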

[AI] Static Malware Detection Model: Validation and Summary

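The samples live on one particular machine, and another model happened to be training there, so the GPU was fully occupied. For testing this model a CPU is more than enough, but when I tried to run it on the CPU I ran into problems.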

Analysis

1. model.to(device) does not affect torch.load()

At first I assumed that calling model.to was all it took to run on the CPU:

    import os
    import torch

    device = torch.device("cpu")
    model = ...
    model = model.to(device)

    model_savedir_ = ''
    if os.path.exists(model_savedir_):
        print("model load.")
        # No map_location here: this is the call that fails
        state_dict = torch.load(model_savedir_)
        model.load_state_dict(state_dict)

It turned out I was being too naive:

RuntimeError: CUDA error: out of memory CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

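The error itself is clear: the GPU ran out of memory. By my reasoning, though, I should have been running on the CPU, so I suspected torch.load(). After some searching I found that it has to be called as state_dict = torch.load(model_savedir_, map_location=device). Without map_location, torch.load() restores each tensor to the device it was saved from, which here was the already busy GPU.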
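The fix above can be sketched end to end as follows. This is a minimal sketch, not the author's actual script: the `nn.Linear` model and the temporary checkpoint path are hypothetical stand-ins so the example runs on a machine without a GPU.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the real detection model.
model = nn.Linear(4, 2)

# Write a small checkpoint so there is something to load (hypothetical path).
ckpt_path = os.path.join(tempfile.mkdtemp(), "demo.pth")
torch.save(model.state_dict(), ckpt_path)

device = torch.device("cpu")

# map_location remaps every stored tensor onto `device` while the file is
# being deserialized, so a checkpoint written from a GPU does not need CUDA.
state_dict = torch.load(ckpt_path, map_location=device)
model.load_state_dict(state_dict)

# Collect the device type of every loaded tensor.
loaded_devices = {t.device.type for t in state_dict.values()}
print(loaded_devices)
```

The point of the design is that the device is chosen at load time rather than after it: model.to(device) only moves the model's own parameters and has no say in where torch.load() places the tensors it reads from disk.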

2. Parameter names differ between GPU and CPU training

Just when I thought I was done and hit run, another error appeared:

RuntimeError: Error(s) in loading state_dict for ..model..: Missing key(s) in state_dict: "fc.weight", "fc.bias", "features.0.0.weight", "features.0.1.weight", "features.0.1.bias", "features.0.1.running_mean", "features.0.1.running_var", "features.1.conv.0.weight", "features.1.conv.1.weight", "features.1.conv.1.bias", "features.1.conv.1.running_mean", "features.1.conv.1.running_var", "features.1.conv.3.weight", "features.1.conv.4.weight", "features.1.conv.4.bias", "features.1.conv.4.running_mean", "features.1.conv.4.running_var", "features.1.conv.5.fc.0.weight", ...

As I read it, the loader cannot find the parameters it expects, so I printed the first entry of the dictionary:

    for k, v in state_dict.items():
        print(k, v)
        break

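That revealed the problem. A model trained on multiple GPUs is saved with a module. prefix in front of every parameter name, because the wrapper used for multi-GPU training (such as nn.DataParallel) stores the real model under an attribute called module. When loading on the CPU into the unwrapped model, this prefix has to be removed: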

    from collections import OrderedDict

    if os.path.exists(model_savedir_):
        print("model load.")
        state_dict = torch.load(model_savedir_, map_location=device)
        state_dict_new = OrderedDict()
        for k, v in state_dict.items():
            # Strip the leading `module.` only when it is actually present
            name = k[7:] if k.startswith("module.") else k
            state_dict_new[name] = v
        model.load_state_dict(state_dict_new)
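The renaming step can also be pulled out into a small helper. The sketch below is an illustration only: `strip_module_prefix` is a hypothetical name, and the demo dictionary uses plain integers in place of tensors so it runs without PyTorch.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading `prefix` removed from keys.

    Keys that do not start with the prefix are copied unchanged, so the
    helper is safe to call on checkpoints saved without the wrapper.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Demo with placeholder values instead of real tensors.
demo = OrderedDict([
    ("module.fc.weight", 1),
    ("module.fc.bias", 2),
    ("features.0.0.weight", 3),  # already unprefixed, left as-is
])
cleaned = strip_module_prefix(demo)
print(list(cleaned))
```

Checking for the prefix before slicing matters: cutting the first seven characters unconditionally would corrupt the names in a checkpoint that was saved from an unwrapped model.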

With that, a model trained on multiple GPUs loads on the CPU!
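To see where the prefix comes from, the sketch below wraps a model in nn.DataParallel and compares the key names. It assumes the original training script used nn.DataParallel, which the post does not state explicitly; the `nn.Linear` model is a hypothetical stand-in.

```python
import torch.nn as nn

# Hypothetical stand-in for the real detection model.
plain = nn.Linear(4, 2)

# The wrapper registers the real model as a submodule named `module`,
# which is why every key in its state_dict gains a `module.` prefix.
wrapped = nn.DataParallel(plain)

wrapped_keys = list(wrapped.state_dict().keys())
plain_keys = list(wrapped.module.state_dict().keys())
print(wrapped_keys)
print(plain_keys)
```

If the training script can be changed, saving wrapped.module.state_dict() instead of wrapped.state_dict() would likely avoid the renaming step at load time altogether.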

Afterword

That is everything for [Problem Solving] How to load a multi-GPU-trained model on the CPU. I hope it helps!

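📝 Previous post: [Problem Solving] Fixing some problems when restarting MySQL 8 in Docker a second time

💖 I'm 𝓼𝓲𝓭𝓲𝓸𝓽, and I look forward to your follow.

👍 Writing takes effort, so your support is appreciated.

🔥 Series columns: Problem Solving, AI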

