Loss functions: KLDivLoss
KL Divergence
KL divergence, also known as relative entropy, measures the distance between two distributions (either discrete or continuous).
Let $p(x)$ and $q(x)$ be two probability distributions of a discrete random variable $X$. The KL divergence of $p$ with respect to $q$ is:

$$D_{KL}(p \| q)=E_{p(x)} \log \frac{p(x)}{q(x)}=\sum_{i=1}^{N} p\left(x_{i}\right) \cdot\left(\log p\left(x_{i}\right)-\log q\left(x_{i}\right)\right)$$
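As a quick illustration (a minimal sketch, not part of the original post; the two distributions below are made up), the discrete definition can be evaluated directly with a few tensor operations:

import torch

# Two hypothetical discrete distributions over three outcomes
p = torch.tensor([0.4, 0.4, 0.2])
q = torch.tensor([0.3, 0.5, 0.2])

# D_KL(p || q) = sum_i p(x_i) * (log p(x_i) - log q(x_i))
kl = torch.sum(p * (torch.log(p) - torch.log(q)))
print(kl)  # small non-negative scalar; zero only when p == q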
KLDivLoss
For a batch $D(x, y)$ of $N$ samples: $x$ is the output of the neural network, already normalized and log-transformed (i.e. log-probabilities); $y$ is the ground-truth label (interpreted as probabilities by default) and has the same shape as $x$.
The loss for the $n$-th sample, $l_{n}$, is computed as:

$$l_{n}=y_{n} \cdot\left(\log y_{n}-x_{n}\right)$$
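A minimal sketch of this element-wise term (the numbers are made up; x is assumed to be a row of log-probabilities and y a row of probabilities), checked against F.kl_div with reduction='none':

import torch
import torch.nn.functional as F

x = torch.log(torch.tensor([0.7, 0.2, 0.1]))  # log-probabilities, i.e. already log-transformed
y = torch.tensor([0.8, 0.1, 0.1])             # target probabilities

manual = y * (torch.log(y) - x)               # l_n = y_n * (log y_n - x_n)
auto = F.kl_div(x, y, reduction='none')
print(torch.allclose(manual, auto))           # True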
class KLDivLoss(_Loss):
    __constants__ = ['reduction']

    def __init__(self, size_average=None, reduce=None, reduction='mean'):
        super(KLDivLoss, self).__init__(size_average, reduce, reduction)

    def forward(self, input, target):
        return F.kl_div(input, target, reduction=self.reduction)
In PyTorch the loss is implemented by the torch.nn.KLDivLoss class shown above; the F.kl_div function can also be called directly. The size_average and reduce arguments in the code are deprecated. reduction takes one of four values, mean, batchmean, sum and none, each corresponding to a different returned value $\ell(x, y)$. The default is mean.
With $L=\left\{l_{1}, \ldots, l_{N}\right\}$ collecting the element-wise losses, the returned value is

$$\ell(x, y)=\begin{cases} L, & \text{if reduction = 'none'} \\ \operatorname{mean}(L), & \text{if reduction = 'mean'} \\ \operatorname{sum}(L)/N, & \text{if reduction = 'batchmean'} \\ \operatorname{sum}(L), & \text{if reduction = 'sum'} \end{cases}$$

Here mean(L) averages over all elements of the output, while batchmean divides the total sum by the batch size N only, so batchmean equals mean multiplied by the number of classes (this matches the numbers in the example below).
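The relationship between the reductions can be checked with a short sketch (random tensors, for illustration only): 'sum' adds every element-wise term, 'mean' divides that sum by the total number of elements, and 'batchmean' divides it by the batch size N.

import torch
import torch.nn.functional as F

x = F.log_softmax(torch.randn(4, 3), dim=1)  # made-up log-probabilities
y = F.softmax(torch.randn(4, 3), dim=1)      # made-up target probabilities

l_sum = F.kl_div(x, y, reduction='sum')
l_mean = F.kl_div(x, y, reduction='mean')
l_batchmean = F.kl_div(x, y, reduction='batchmean')

print(torch.allclose(l_mean, l_sum / x.numel()))       # True: divide by N * C
print(torch.allclose(l_batchmean, l_sum / x.size(0)))  # True: divide by N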
Example:
import torch
import torch.nn as nn
import math
# Manually reproduce the 'mean' reduction: average the element-wise
# KL terms y * (log y - x) over all elements of the output.
def validate_loss(output, target):
    val = 0
    for li_x, li_y in zip(output, target):
        for i, xy in enumerate(zip(li_x, li_y)):
            x, y = xy
            loss_val = y * (math.log(y, math.e) - x)
            val += loss_val
    return val / output.nelement()
torch.manual_seed(20)
loss = nn.KLDivLoss()
input = torch.Tensor([[-2, -6, -8], [-7, -1, -2], [-1, -9, -2.3], [-1.9, -2.8, -5.4]])
target = torch.Tensor([[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.5, 0.2, 0.3], [0.4, 0.3, 0.3]])
output = loss(input, target)
print("default loss:", output)
output = validate_loss(input, target)
print("validate loss:", output)
loss = nn.KLDivLoss(reduction="batchmean")
output = loss(input, target)
print("batchmean loss:", output)
loss = nn.KLDivLoss(reduction="mean")
output = loss(input, target)
print("mean loss:", output)
loss = nn.KLDivLoss(reduction="none")
output = loss(input, target)
print("none loss:", output)
Output:
default loss: tensor(0.6209)
validate loss: tensor(0.6209)
batchmean loss: tensor(1.8626)
mean loss: tensor(0.6209)
none loss: tensor([[1.4215, 0.3697, 0.5697],
[0.4697, 0.4503, 0.0781],
[0.1534, 1.4781, 0.3288],
[0.3935, 0.4788, 1.2588]])
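Note that the inputs in the example above are arbitrary negative numbers rather than true log-probabilities. A more typical call pattern (a minimal sketch with made-up tensors) converts the raw network output to log-probabilities with F.log_softmax first and uses the batchmean reduction:

import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.KLDivLoss(reduction='batchmean')

logits = torch.randn(4, 3)                     # hypothetical raw network output
log_probs = F.log_softmax(logits, dim=1)       # normalized and log-transformed input
target = F.softmax(torch.randn(4, 3), dim=1)   # hypothetical target probabilities

print(criterion(log_probs, target))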