Homoglyph Detection
Definition
Comparing strings using a variety of techniques to determine if a deceptive or malicious string is being presented to a user.
How it works
A homoglyph, in this context, is a deceptive string or word that looks like a trusted word but is composed of different characters, for example goooogle.com versus google.com. Such strings are commonly found in phishing and typosquatting attacks, where the attacker exploits a human through a social engineering campaign.
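As an illustration only, the comparison described above can be sketched in Python. This is a minimal sketch under stated assumptions, not a description of any particular product: the LOOKALIKES table, the TRUSTED list, the function names, and the 0.8 threshold are all hypothetical, and the standard-library difflib ratio stands in for whatever similarity measure a real detector would use.

```python
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny look-alike table; a real deployment would
# need a far more complete mapping of visually confusable characters.
LOOKALIKES = {
    "0": "o",
    "1": "l",
    "\u0430": "a",  # Cyrillic small letter a
    "\u0435": "e",  # Cyrillic small letter ie
    "\u043e": "o",  # Cyrillic small letter o
}

# Assumed list of names that users are expected to trust.
TRUSTED = ["google.com", "example.com"]


def fold(name: str) -> str:
    """Lowercase the name and replace known look-alike characters."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in name.lower())


def closest_trusted(name: str, threshold: float = 0.8):
    """Return (trusted_name, score) if name imitates a trusted name, else None."""
    folded = fold(name)
    # Pick the trusted name most similar to the folded candidate.
    best = max(TRUSTED, key=lambda t: SequenceMatcher(None, folded, t).ratio())
    score = SequenceMatcher(None, folded, best).ratio()
    if name.lower() == best:
        return None  # exact match: this is the trusted name itself
    return (best, score) if score >= threshold else None
```

With these assumptions, closest_trusted("goooogle.com") and closest_trusted("g00gle.com") both point back to google.com, while google.com itself and unrelated names return None. Folding handles character substitution, and the similarity ratio handles inserted or repeated characters.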
Considerations
- In very large environments, processing DNS queries can be computationally expensive because of the volume of traffic generated.
- Legitimate companies and products use non-dictionary words in their names that could result in many false positives
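One possible way to soften both considerations, sketched here under assumptions rather than taken from the references, is to cache verdicts for repeated domain names and to keep an allowlist of known-legitimate look-alikes. The PROTECTED and ALLOWLIST contents, the is_suspicious name, the cache size, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from functools import lru_cache

# Assumed names worth protecting, and a hypothetical legitimate business
# whose name happens to resemble one of them.
PROTECTED = ("examplebank.com",)
ALLOWLIST = frozenset({"examplebark.com"})


@lru_cache(maxsize=100_000)
def is_suspicious(domain: str, threshold: float = 0.85) -> bool:
    """Flag domains that closely resemble, but are not, a protected name."""
    if domain in PROTECTED or domain in ALLOWLIST:
        return False  # trusted or explicitly allowed: never alert
    return any(
        SequenceMatcher(None, domain, p).ratio() >= threshold
        for p in PROTECTED
    )
```

Because DNS traffic tends to repeat the same names many times, the cache means each distinct name is scored once while it stays in the cache. The allowlist has to be maintained by hand, so it reduces false positives only for names someone has already reviewed.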
References
The following references were used to develop the Homoglyph Detection knowledge-base article.
(Note: the consideration of references does not imply specific functionality exists in an offering.)
Computer-implemented methods and systems for identifying visually similar text character strings
MITRE Comments
Text input is compared to an engine of look-alike sets of text characters. An estimate of similar characters based on the engine is conducted, and an alert is triggered if the estimated similarity is lower than a given threshold.
System and method for detecting homoglyph attacks with a siamese convolutional neural network
MITRE Comments
This patent describes a mechanism to detect homoglyph strings that involves training a Siamese convolutional neural network to compare images of strings. Strings of legitimate URLs for websites, along with known suspicious strings, are converted to images during the training process to create an index. New strings are converted to images and then compared to the index for similarity. If the string deviates beyond a threshold, an alert is triggered.