
news 2025/7/14 2:50:01


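Models trained with PyTorch store their weights in fp32. For deployment, they are commonly quantized to int8 to speed up inference. Compared with fp32, an int8 model is roughly 1/4 the size and typically runs 2 to 4 times faster.
PyTorch supports three quantization approaches:

Dynamic quantization: weights are quantized ahead of time; activations are quantized on the fly during inference.
Static quantization: both weights and activations are quantized ahead of time, using parameters obtained by calibration.
Quantization-aware training (QAT): quantization operators are inserted and the model is then trained; used mainly when static quantization does not meet the accuracy requirement.
In most cases static quantization is enough; only occasionally, when its accuracy falls short, is QAT needed for fine-tuning. This post therefore focuses on static quantization, skips the theory (to be covered separately later), and sticks to hands-on practice.
Note: the code below targets PyTorch 1.10; the quantization API changed in later releases.
Official documentation: Quantization (PyTorch 1.10 documentation)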
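To make the 1/4 size figure concrete, here is a minimal pure-Python sketch of symmetric per-tensor int8 quantization. It only illustrates the arithmetic and is not how PyTorch implements quantization internally; the helper names `quantize_int8` and `dequantize_int8` and the sample weights are assumptions made up for this example.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization (illustrative): floats -> ints in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    # typecode "b" stores each value as a signed 1-byte integer
    return array("b", q), scale


def dequantize_int8(q, scale):
    """Map the int8 values back to approximate floats."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.02, 1.0]  # hypothetical fp32 weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# typecode "f" stores each value as a 4-byte float
fp32_bytes = array("f", weights).itemsize * len(weights)
int8_bytes = q.itemsize * len(q)
print(list(q), fp32_bytes, int8_bytes)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy cost paid for the smaller storage.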

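Eager Mode vs. FX Graph Mode

PyTorch offers two ways to quantize a model. The first is eager mode (dynamic graph), the same mode used for everyday PyTorch training. Quantizing in eager mode requires manually editing the network to insert quantize and dequantize nodes before and after the operators that support quantization; its advantage is easy debugging. The second is FX graph mode (static graph), in which PyTorch inserts the quantization nodes into the computation graph automatically, so the network needs no manual changes.
Most tutorials online are based on eager mode, whose main drawback is exactly that manual editing of the network. The network in the official tutorial is a demo-style example, in which QuantStub and DeQuantStub are the quantize and dequantize nodes respectively: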

import torch


# define a floating point model where some layers could be statically quantized
class M(torch.nn.Module):
    def __init__(self):
        super(M, self).__init__()
        # QuantStub converts tensors from floating point to quantized
        self.quant = torch.quantization.QuantStub()
        self.conv = torch.nn.Conv2d(1, 1, 1)
        self.relu = torch.nn.ReLU()
        # DeQuantStub converts tensors from quantized to floating point
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        # manually specify where tensors will be converted from floating
        # point to quantized in the quantized model
        x = self.quant(x)
        x = self.conv(x)
        x = self.relu(x)
        # manually specify where tensors will be converted from quantized
        # to floating point in the quantized model
        x = self.dequant(x)
        return x

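PyTorch does not support quantization for many layers (the commonly used PReLU, for example). In eager mode, this means inserting a DeQuantStub before each unsupported layer and a QuantStub before each supported layer that follows. In my experience this is tedious and not very practical, and it disrupts the original network definition.
In FX graph mode, by contrast, we only need to call the conversion API that PyTorch provides; the original network definition file stays untouched, which I find far more practical.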

FX Graph Mode Quantization

1. Load the fp32 model and convert it to an FX graph

The quantization backend can be either 'fbgemm' or 'qnnpack': the former runs on x86, the latter on ARM.
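One way to avoid hard-coding the backend is to derive it from the CPU architecture reported by the standard library. The sketch below is an assumption about how one might do this, not something from the PyTorch documentation; `pick_quant_backend` is a hypothetical helper and the list of architecture strings is not exhaustive.

```python
import platform


def pick_quant_backend(machine=None):
    """Return 'fbgemm' for x86 CPUs and 'qnnpack' for ARM CPUs (hypothetical helper)."""
    machine = (machine or platform.machine()).lower()
    if machine in ("x86_64", "amd64", "i386", "i686", "x86"):
        return "fbgemm"
    if machine.startswith(("arm", "aarch64")):
        return "qnnpack"
    raise ValueError("no known quantization backend for {!r}".format(machine))


print(pick_quant_backend("x86_64"))
print(pick_quant_backend("aarch64"))
```

The returned string can be passed to `torch.quantization.get_default_qconfig(...)`; when running the quantized model, `torch.backends.quantized.engine` should be set to the same backend.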

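import copy

import torch
from torch.quantization import quantize_fx

# xxx stands for the saved fp32 model file
model_fp32 = torch.load(xxx)
model_fp32_quantize = copy.deepcopy(model_fp32)
# the empty key "" applies this qconfig to the whole model
qconfig_dict = {"": torch.quantization.get_default_qconfig('fbgemm')}
model_fp32_quantize.eval()
# prepare: trace the model and insert observers
model_prepared = quantize_fx.prepare_fx(model_fp32_quantize, qconfig_dict)
model_prepared.eval()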

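2. Load the calibration data and calibrate the quantization parameters

Calibration runs the prepared model on a set of calibration images and collects statistics on the weight and activation distributions, from which the quantization parameters are derived. The calibration images usually come from the training set (a few hundred, adjusted according to test results). They can be read with a PyTorch DataLoader, or you can load the images yourself and feed them to the network.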
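To show what "collecting statistics to obtain quantization parameters" can mean, here is a simplified pure-Python sketch of a min/max observer that produces an affine (scale, zero point) pair for unsigned 8-bit values. It is an illustration under simplifying assumptions, not PyTorch's observer implementation, which is more elaborate; the class name `SimpleMinMaxObserver` and the sample batches are hypothetical.

```python
class SimpleMinMaxObserver:
    """Tracks the running min/max of observed values (illustrative only)."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, values):
        # Called once per calibration batch
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))

    def qparams(self, qmin=0, qmax=255):
        # Extend the range to include 0.0 so that zero is exactly representable
        lo = min(self.min_val, 0.0)
        hi = max(self.max_val, 0.0)
        scale = (hi - lo) / (qmax - qmin) or 1.0
        zero_point = int(round(qmin - lo / scale))
        zero_point = max(qmin, min(qmax, zero_point))
        return scale, zero_point


observer = SimpleMinMaxObserver()
for batch in ([-1.0, 0.5, 2.0], [0.0, 3.0, -0.5]):  # stand-in activations
    observer.observe(batch)
scale, zero_point = observer.qparams()
print(scale, zero_point)
```

A real value `x` is then stored as `round(x / scale) + zero_point`, clamped to `[qmin, qmax]`. Because the range is fixed by the extremes seen during calibration, the calibration set should be representative of the inputs seen at deployment.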

# Read the calibration data with a DataLoader
with torch.no_grad():  # calibration only needs forward passes
    for i, (data, label) in enumerate(train_loader):
        data = data.to(torch.device("cpu:0"))
        outputs = model_prepared(data)
        print("calibrating {}".format(i))
        if i > 1000:
            break

3. Convert to a quantized model and save it

quantized_model = quantize_fx.convert_fx(model_prepared)
torch.jit.save(torch.jit.script(quantized_model), "quantized_model.pt")

Speed Test

The quantized model is used in exactly the same way as the fp32 model:

import time

import cv2
import numpy as np
import torch

torch.set_num_threads(1)

# fp32 baseline (TorchScript)
fused_model = torch.jit.load("jit_model.pt")
fused_model.eval()
fused_model.to(torch.device("cpu:0"))

img = cv2.imread("./1.png")
img_fp32 = img.astype(np.float32)
img_fp32 = (img_fp32 - 127.5) / 127.5
# HWC -> NCHW, built from the normalized image
input = torch.from_numpy(img_fp32).permute(2, 0, 1).unsqueeze(0).float()


def speed_test(model, input):
    # warm up
    for i in range(10):
        model(input)
    start = time.time()
    for i in range(100):
        model(input)
    end = time.time()
    print("model time: ", (end - start) / 100)
    time.sleep(10)


# quantized model
quantized_model = torch.jit.load("quantized_model.pt")
quantized_model.eval()
quantized_model.to(torch.device("cpu:0"))

speed_test(fused_model, input)
speed_test(quantized_model, input)

Measured on a single core, the fp32 model takes 120 ms per inference and the quantized model takes 47 ms.
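These two numbers work out to a speedup of about 2.55x, within the 2 to 4x range quoted at the start. For reference, a slightly more robust timing helper might look like the sketch below, which uses `time.perf_counter` and reports the median per-call latency so that a few slow outliers do not skew the result. The names `benchmark` and `speedup` are hypothetical and the helper works on any callable, not just a model.

```python
import statistics
import time


def benchmark(fn, arg, warmup=10, runs=100):
    """Return the median wall-clock time of fn(arg) in seconds."""
    for _ in range(warmup):
        fn(arg)  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(arg)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_s, optimized_s):
    """How many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


# Latencies reported above: 120 ms (fp32) and 47 ms (int8)
ratio = speedup(0.120, 0.047)
print("speedup: {:.2f}x".format(ratio))
```

With the models from the speed test, the call would be `benchmark(quantized_model, input)`.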

Conclusion

This post covered PyTorch post-training static quantization (PTSQ) in FX graph mode and benchmarked it on one model, with fairly good results.