GELU activation
Gaussian Error Linear Unit: GELU(x) = x · Φ(x), where Φ is the Gaussian CDF.
Gaussian Error Linear Unit: GELU(x) = x · Φ(x), where Φ is the Gaussian CDF. Smoother than ReLU, used in GPT-1's feed-forward layers.
Gaussian Error Linear Unit: GELU(x) = x · Φ(x), where Φ is the Gaussian CDF.
Gaussian Error Linear Unit: GELU(x) = x · Φ(x), where Φ is the Gaussian CDF. Smoother than ReLU, used in GPT-1's feed-forward layers.