Activation Functions

Linear Activation

$f(x) = x$

ReLU

$f(x) = \max(0,\, x)$

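Most common choice for hidden layers: cheap to compute, and its gradient is 1 for every positive input, which mitigates vanishing gradients.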
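The ReLU definition above and its gradient can be sketched in a few lines. This is a minimal illustration in pure Python; the function names are hypothetical, and setting the derivative at $x = 0$ to 0 is a convention assumed here, not something the table specifies.

```python
def relu(x: float) -> float:
    """ReLU: max(0, x)."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Derivative of ReLU; the value at x == 0 is a convention (0 here)."""
    return 1.0 if x > 0 else 0.0


for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), relu_grad(x))
```

The gradient is exactly 1 on the positive side, so repeated multiplication through many layers does not shrink it the way a saturating function would.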

Sigmoid

$f(x) = \frac{1}{1+\mathrm{e}^{-x}}$

Used in the output layer for binary classification, where the output in (0, 1) is read as a probability.
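Evaluating the sigmoid formula literally can overflow for large negative inputs, because $\mathrm{e}^{-x}$ grows without bound. The sketch below shows one common way around this, assuming pure Python and a hypothetical function name: branch on the sign of $x$ so the exponential is only ever taken of a non-positive number.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        # exp(-x) <= 1 here, so no overflow is possible.
        return 1.0 / (1.0 + math.exp(-x))
    # For x < 0 use the equivalent form e^x / (1 + e^x); exp(x) < 1.
    z = math.exp(x)
    return z / (1.0 + z)


for x in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(x, sigmoid(x))
```

Both branches compute the same function; they differ only in which one stays within floating-point range.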

Tanh

$f(x) = \tanh(x) = \frac{\mathrm{e}^{x}-\mathrm{e}^{-x}}{\mathrm{e}^{x}+\mathrm{e}^{-x}}$

Used in RNN hidden layers; output range (-1, 1), zero-centered.
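Tanh is a rescaled and shifted sigmoid: $\tanh(x) = 2\,\sigma(2x) - 1$, which is why it is zero-centered while the sigmoid is not. The following sketch checks that identity numerically; it assumes pure Python, and the helper names are illustrative only.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh_via_sigmoid(x: float) -> float:
    """tanh expressed through the sigmoid: 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    # The two columns should agree up to rounding error.
    print(x, math.tanh(x), tanh_via_sigmoid(x))
```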

Leaky ReLU

$f(x) = \begin{cases} x, & x\ge 0\\[4pt] \alpha x, & x<0 \end{cases} \quad (\alpha\in(0,1))$

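Drop-in replacement for ReLU in deep networks; the nonzero slope for negative inputs keeps gradients flowing and improves training stability.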
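A small sketch of the piecewise definition above, written to show that the gradient never becomes exactly zero. The default $\alpha = 0.01$ is an assumed, commonly used value; the table only requires $\alpha \in (0, 1)$, and the function names are hypothetical.

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: x for x >= 0, alpha * x otherwise (alpha is an assumed default)."""
    return x if x >= 0 else alpha * x


def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Derivative: 1 on the non-negative side, alpha on the negative side."""
    return 1.0 if x >= 0 else alpha


for x in (-2.0, 0.0, 3.0):
    print(x, leaky_relu(x), leaky_relu_grad(x))
```

With plain ReLU the gradient at $x = -2$ would be 0; here it is $\alpha$, so a unit that receives only negative inputs can still be updated.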

Softplus

$f(x) = \ln\!\bigl(1+\mathrm{e}^{x}\bigr)$

Smooth approximation of ReLU, used where a smooth activation is needed, for example in probabilistic models.
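Computing $\ln(1+\mathrm{e}^{x})$ directly overflows for large $x$. One common rewrite is $\max(x, 0) + \ln(1 + \mathrm{e}^{-|x|})$, which is algebraically identical. The sketch below uses that form and also checks numerically that the derivative of softplus is the sigmoid; it assumes pure Python, and the names and step size are illustrative.

```python
import math


def softplus(x: float) -> float:
    """Stable softplus: max(x, 0) + log(1 + exp(-|x|))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Central-difference estimate of d/dx softplus(x) at x = 1.
h = 1e-5
numeric_grad = (softplus(1.0 + h) - softplus(1.0 - h)) / (2.0 * h)
print(numeric_grad, sigmoid(1.0))
```

For large positive $x$ the result approaches $x$, and for large negative $x$ it approaches 0, which is the sense in which softplus approximates ReLU.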

ELU

$f(x)= \begin{cases} x, & x\ge 0\\[4pt] \alpha\,(\mathrm{e}^{x}-1), & x<0 \end{cases}$

Speeds up convergence and keeps the mean output close to 0; well suited to deep networks.
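A minimal sketch of ELU, assuming pure Python and a default of $\alpha = 1$ (a common choice, not stated in the table). It uses `math.expm1` for $\mathrm{e}^{x}-1$, which is more accurate than `math.exp(x) - 1` near zero.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """ELU: x for x >= 0, alpha * (e^x - 1) otherwise (alpha is an assumed default)."""
    return x if x >= 0 else alpha * math.expm1(x)


for x in (-50.0, -1.0, 0.0, 2.0):
    print(x, elu(x))
```

Negative inputs saturate toward $-\alpha$ instead of being clipped to 0, which is what lets the average activation sit closer to zero than with ReLU.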

SELU

$f(x)=\lambda \begin{cases} x, & x\ge 0\\[4pt] \alpha\,(\mathrm{e}^{x}-1), & x<0 \end{cases}$

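Similar to ELU but scaled by $\lambda > 1$, which gives a steeper slope; used in self-normalizing networks to keep the activation distribution stable from layer to layer.
Here $\lambda\approx 1.0507$ and $\alpha\approx 1.6733$.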
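The two constants are chosen so that a standard normal input keeps mean 0 and variance 1 after passing through SELU. The sketch below checks this empirically. It assumes pure Python, uses the commonly quoted higher-precision values of the constants rather than the rounded ones above, and the sample size and seed are arbitrary choices for illustration.

```python
import math
import random

# Commonly quoted higher-precision values of the SELU constants.
SELU_LAMBDA = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772


def selu(x: float) -> float:
    """SELU: lambda * x for x >= 0, lambda * alpha * (e^x - 1) otherwise."""
    return SELU_LAMBDA * (x if x >= 0 else SELU_ALPHA * math.expm1(x))


def mean_and_variance(values):
    """Return the sample mean and (population) variance of a list of floats."""
    n = len(values)
    m = sum(values) / n
    v = sum((t - m) ** 2 for t in values) / n
    return m, v


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [selu(rng.gauss(0.0, 1.0)) for _ in range(100_000)]
mean, var = mean_and_variance(samples)
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```

The printed mean should land near 0 and the variance near 1, up to sampling noise.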

Swish

$f(x)=x\cdot\sigma(\beta x)=x\cdot\frac{1}{1+\mathrm{e}^{-\beta x}} \quad (\text{called SiLU when } \beta=1)$

Reported to outperform ReLU in deep models such as EfficientNet.
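The parameter $\beta$ interpolates between two familiar shapes: at $\beta = 0$ the function is the linear map $x/2$, and as $\beta$ grows it approaches ReLU. A minimal sketch, assuming pure Python with hypothetical function names:

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x); beta = 1 gives SiLU."""
    return x * sigmoid(beta * x)


for beta in (0.0, 1.0, 50.0):
    print(beta, [round(swish(x, beta), 4) for x in (-3.0, -1.0, 0.0, 1.0, 3.0)])
```

Unlike ReLU, Swish is not monotonic: for $\beta = 1$ it dips below zero for moderately negative inputs and then returns toward zero as $x \to -\infty$.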

GELU

$f(x)=x\cdot\Phi(x)=\frac{x}{2}\left[1+\mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]$

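Similar to Swish, with a gate based on the Gaussian CDF $\Phi$ rather than the sigmoid; widely used in Transformer models such as BERT and GPT.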
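The exact form can be evaluated with the standard error function. Many implementations also offer a tanh-based approximation, $\frac{x}{2}\bigl[1+\tanh\bigl(\sqrt{2/\pi}\,(x+0.044715\,x^{3})\bigr)\bigr]$. The sketch below compares the two on a small grid; it assumes pure Python, and the function names and grid are illustrative.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh-based approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Largest absolute gap between the two forms on [-5, 5] in steps of 0.1.
grid = [i / 10.0 for i in range(-50, 51)]
max_gap = max(abs(gelu(x) - gelu_tanh(x)) for x in grid)
print(f"max |exact - tanh approx| on [-5, 5]: {max_gap:.2e}")
```

The gap is small relative to typical activation magnitudes, which is why the approximation is often used when `erf` is expensive or unavailable.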