This repository was archived by the owner on Jul 7, 2023. It is now read-only.
Why is there no square root at area_temperature? #1900
Open
Description
In typical dot-product attention, the logits fed to the softmax are divided by the square root of the temperature (the key dimension $d_k$ in the standard formulation): $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(QK^\top / \sqrt{d_k}\right) V$.
However, in this code the logits are divided by the temperature itself, with no square root. Is this intended or a mistake? If it is intended, could you explain why the square root was omitted?
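To make the difference concrete, here is a minimal NumPy sketch comparing the two scalings. It is not the repository's code: `attention_weights`, `use_sqrt`, and the shapes are hypothetical names chosen for illustration, and the temperature is assumed to be a plain scalar. Dividing by the temperature directly gives a flatter softmax than dividing by its square root, and dividing by `T` directly is the same as dividing by the square root of `T**2`, so the two conventions differ only in how the temperature value is parametrized.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def attention_weights(q, k, temperature, use_sqrt):
    # Hypothetical helper: scale the dot-product logits either by
    # sqrt(temperature) or by temperature itself, then apply softmax.
    scale = np.sqrt(temperature) if use_sqrt else temperature
    logits = q @ k.T / scale
    return softmax(logits, axis=-1)

def mean_entropy(p):
    # Average entropy of the rows; higher means a flatter distribution.
    return float(-np.sum(p * np.log(p + 1e-12), axis=-1).mean())

rng = np.random.default_rng(0)
d = 64  # assumed key dimension, used here as the temperature
q = rng.standard_normal((4, d))
k = rng.standard_normal((6, d))

w_sqrt = attention_weights(q, k, d, use_sqrt=True)    # logits / sqrt(d)
w_plain = attention_weights(q, k, d, use_sqrt=False)  # logits / d

print("mean entropy with sqrt:   ", mean_entropy(w_sqrt))
print("mean entropy without sqrt:", mean_entropy(w_plain))
```

If the temperature is a tunable hyperparameter rather than a fixed dimension, the missing square root may simply amount to a different choice of that hyperparameter, but that is a guess about intent that only the authors can confirm.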