from_logits=True
The difference between logits and probabilities
BinaryCrossentropy class
tf.keras.losses.BinaryCrossentropy(
from_logits=False, label_smoothing=0, reduction="auto", name="binary_crossentropy"
)
Computes the cross-entropy loss between true labels and predicted labels.
Use this cross-entropy loss for binary (0 or 1) classification applications. The loss function requires the following inputs:
y_true (true label): This is either 0 or 1.
y_pred (predicted value): This is the model's prediction, i.e., a single floating-point value which either represents a logit (i.e., value in [-inf, inf] when from_logits=True) or a probability (i.e., value in [0., 1.] when from_logits=False).
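The quoted documentation already makes the difference between a logit and a probability clear:
logit: the output lies in [-inf, inf]
probability: the output lies in [0, 1]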
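To make the distinction concrete, here is a minimal sketch in plain Python (not the Keras implementation). It assumes the usual sigmoid link between a logit and a probability, and the helper names `sigmoid`, `bce_from_prob` and `bce_from_logit` are made up for illustration. Both routes should give the same loss, which is why `from_logits` has to match what the model actually outputs.

```python
import math

def sigmoid(z):
    """Map a logit in (-inf, inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_prob(y_true, p):
    """Binary cross-entropy when the prediction is already a probability."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def bce_from_logit(y_true, z):
    """Binary cross-entropy computed directly from a logit (stable form)."""
    return max(z, 0.0) - z * y_true + math.log1p(math.exp(-abs(z)))

logit = 2.0                          # raw model output, no sigmoid applied
prob = sigmoid(logit)                # the same prediction as a probability
loss_a = bce_from_prob(1, prob)      # analogous to from_logits=False
loss_b = bce_from_logit(1, logit)    # analogous to from_logits=True
print(prob, loss_a, loss_b)
```

Passing a logit where a probability is expected (or the reverse) does not raise an error here either; it just produces a different, wrong loss value.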
- reduction: of type tf.keras.losses.Reduction; controls how the per-sample losses are combined. By default they are averaged.
- name: the name of the op.
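What a reduction does can be sketched in a few lines of plain Python. The function `reduce_losses` and its string options are hypothetical, loosely modeled on the NONE, SUM and SUM_OVER_BATCH_SIZE members of tf.keras.losses.Reduction; this is not the Keras code.

```python
def reduce_losses(per_sample, reduction="mean"):
    """Combine per-sample loss values into the value reported for a batch."""
    if reduction == "none":      # keep one loss value per sample
        return list(per_sample)
    if reduction == "sum":       # add up all per-sample losses
        return sum(per_sample)
    if reduction == "mean":      # sum divided by batch size (the default behavior)
        return sum(per_sample) / len(per_sample)
    raise ValueError(f"unknown reduction: {reduction}")

per_sample = [0.2, 0.4, 0.9]     # made-up losses for a batch of three samples
loss_none = reduce_losses(per_sample, "none")
loss_sum = reduce_losses(per_sample, "sum")
loss_mean = reduce_losses(per_sample, "mean")
print(loss_none, loss_sum, loss_mean)
```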
CategoricalCrossentropy class
tf.keras.losses.CategoricalCrossentropy(
from_logits=False,
label_smoothing=0,
reduction="auto",
name="categorical_crossentropy",
)
Computes the crossentropy loss between the labels and predictions.
Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided in a one_hot representation. If you want to provide labels as integers, please use SparseCategoricalCrossentropy loss. There should be # classes floating point values per feature.
In the snippet below, there is # classes floating point values per example. The shape of both y_pred and y_true are [batch_size, num_classes].
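In contrast to the binary case, CategoricalCrossentropy is meant for multi-class classification. The labels must be one_hot encoded. If they are integer class indices instead, use SparseCategoricalCrossentropy loss.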
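The relationship between the two label formats can be sketched in plain Python (again, not the Keras implementation). The helpers `softmax`, `categorical_ce` and `sparse_categorical_ce` are illustrative names, and the predictions are assumed to be softmax probabilities. A one_hot label and the matching integer label should yield the same loss.

```python
import math

def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    m = max(logits)                              # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_ce(one_hot, probs):
    """Cross-entropy with a one_hot label, e.g. [1, 0, 0]."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_ce(class_index, probs):
    """Cross-entropy with an integer label, e.g. 0."""
    return -math.log(probs[class_index])

probs = softmax([2.0, 1.0, 0.1])                 # one example, three classes
loss_one_hot = categorical_ce([1, 0, 0], probs)
loss_sparse = sparse_categorical_ce(0, probs)
print(loss_one_hot, loss_sparse)
```

The integer form only avoids building the one_hot vector; the quantity being computed is the same.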
keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.99, epsilon=1e-08, decay=0.0)
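- lr: float >= 0. The learning rate.
- beta_1: float, 0 < beta < 1. Usually close to 1. Exponential decay rate for the first-moment estimate.
- beta_2: float, 0 < beta < 1. Usually close to 1. Exponential decay rate for the second-moment estimate.
- epsilon: float >= 0. Fuzz factor. If None, defaults to K.epsilon(). This is a very small number that prevents division by zero in the implementation.
- decay: float >= 0. How much the learning rate decays at each update.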
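How these parameters interact is easier to see in a single update step. The sketch below follows the update rule from the Adam paper for one scalar parameter. The function `adam_step` is hypothetical, and the way decay is applied (lr / (1 + decay * iterations)) is an assumption about the legacy Keras behavior, not something taken from its source.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.99,
              epsilon=1e-8, decay=0.0):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    lr_t = lr / (1.0 + decay * (t - 1))          # assumed time-based decay
    m = beta_1 * m + (1 - beta_1) * grad         # first-moment estimate
    v = beta_2 * v + (1 - beta_2) * grad * grad  # second-moment estimate
    m_hat = m / (1 - beta_1 ** t)                # bias correction
    v_hat = v / (1 - beta_2 ** t)
    param = param - lr_t * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v

# Minimize f(x) = x^2, whose gradient is 2x, for a few steps.
x, m, v = 1.0, 0.0, 0.0
history = []
for t in range(1, 4):
    x, m, v = adam_step(x, 2 * x, m, v, t)
    history.append(x)
print(history)
```

Note that epsilon sits in the denominator next to the square root of the second-moment estimate, which is where a division by zero could otherwise occur.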
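The learning rate determines how fast learning proceeds (it can also be seen as the step size). If it is too large, the optimizer is likely to overshoot the optimum; if it is too small, optimization becomes inefficient and training takes too long. Roughly speaking, the idea is to start with a fairly large learning rate and reduce it dynamically as the number of updates grows, so that both speed and final quality are preserved. In Adam this comes from two places: the step for each parameter is scaled by running estimates of the first and second moments of its gradient, and a decay > 0 additionally shrinks the base learning rate with every update.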
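The effect of the step size can be seen on a toy problem. The sketch below uses plain gradient descent (deliberately not Adam) on f(x) = x^2; the function `gradient_descent` and the three learning rates are chosen only for illustration.

```python
def gradient_descent(lr, steps=20, x=1.0):
    """Run plain gradient descent on f(x) = x^2, whose gradient is 2x."""
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

too_large = gradient_descent(lr=1.1)     # overshoots further on every step
too_small = gradient_descent(lr=0.001)   # barely moves away from the start
reasonable = gradient_descent(lr=0.3)    # ends up very close to the optimum at 0
print(too_large, too_small, reasonable)
```

Each step multiplies x by (1 - 2 * lr), so the run diverges once that factor exceeds 1 in absolute value and crawls when it is just below 1.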