How Token Analysis works. WatchGuard XCS


Intercept Anti-Spam

Token Analysis

Token Analysis is a sophisticated method of identifying spam based on statistical analysis of mail content.

Simple text matches can lead to false positives because a word or phrase can have many meanings depending on the context. Token Analysis provides a way to accurately measure how likely any particular message is to be spam without having to specify every word and phrase.

Token Analysis achieves this by deriving, for each word or phrase, a measure of how much it contributes to the likelihood that a message is spam. This measure is based on the relative frequency of words and phrases in a large number of spam messages. From this analysis, it creates a table of tokens (words associated with spam) and associated measures of how likely a message containing them is to be spam.
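The idea of a token table built from relative frequencies can be sketched as follows. This is a minimal illustration only, not WatchGuard's algorithm: the function name build_token_table, the comparison against a sample of legitimate mail, and the scoring formula are all assumptions made for the example.

```python
from collections import Counter

def build_token_table(spam_messages, legit_messages):
    """Return {token: measure from 0 to 100}.

    Hypothetical scoring: a token's measure is its share of the combined
    per-message frequency that comes from spam. Not the product's formula.
    """
    spam_counts = Counter(tok for msg in spam_messages for tok in msg.split())
    legit_counts = Counter(tok for msg in legit_messages for tok in msg.split())
    # Avoid division by zero if one sample is empty.
    n_spam = max(len(spam_messages), 1)
    n_legit = max(len(legit_messages), 1)

    table = {}
    for tok in set(spam_counts) | set(legit_counts):
        spam_rate = spam_counts[tok] / n_spam
        legit_rate = legit_counts[tok] / n_legit
        # Every token here occurs at least once, so the sum is never zero.
        table[tok] = round(100 * spam_rate / (spam_rate + legit_rate))
    return table

# Tiny made-up samples, for illustration only.
table = build_token_table(
    ["Click here to earn millions", "Click to win"],
    ["Meeting moved to Friday", "Notes to review"],
)
print(table["Click"], table["to"], table["Friday"])
```

In this toy data a token seen only in spam receives the highest measure, a token seen equally often in both samples lands in the middle, and a token seen only in legitimate mail receives the lowest.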

When a new incoming message is received, Token Analysis analyzes the message, extracts the tokens (words and phrases), finds their measures from the table, and aggregates these measures to produce a spam metric for the message. This spam metric is the score assigned by Token Analysis to be used in the Intercept Anti-Spam decision.

Token Analysis has a built-in weighting mechanism that assigns a value between 0 and 100 to indicate whether a message is spam. A message with a low metric (closer to 0) is considered to be legitimate, while a message with a high metric (closer to 100) is considered to be spam.

Token Analysis uses three sources of data to build its run-time database:

• The initial default database based on analysis of known spam.

• Tables derived from an analysis of local legitimate mail. This is referred to as “training”.

• Training on spam identified by the Pattern Filter Spam, DNSBL, UBL, SPF, and DomainKeys Intercept components.

How Token Analysis works

Consider the following simple message:

---------------------------------------------------------------

Subject: Get rich quick!!!!

Click on http://getrichquick.com to earn millions!!!!!

---------------------------------------------------------------

Token Analysis will break the message down into the following tokens:

[Get] [rich] [quick!!!!] [Click] [on] [http://getrichquick.com] [to] [earn] [millions!!!!!]
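A tokenizer that produces this kind of breakdown could look like the sketch below. It assumes that tokens are separated by whitespace, that punctuation and URLs stay attached to their token, and that the "Subject:" header name itself is not a token. The function name tokenize is hypothetical, and the real tokenizer may apply other rules.

```python
def tokenize(message):
    """Split a message into whitespace-separated tokens.

    Assumption for this sketch: the "Subject:" header name is dropped,
    and punctuation and URLs are kept as part of their token.
    """
    tokens = []
    for line in message.splitlines():
        if line.startswith("Subject:"):
            # Keep only the header value.
            line = line[len("Subject:"):]
        tokens.extend(line.split())
    return tokens

message = (
    "Subject: Get rich quick!!!!\n"
    "\n"
    "Click on http://getrichquick.com to earn millions!!!!!"
)
tokens = tokenize(message)
print(" ".join(f"[{tok}]" for tok in tokens))
```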

Each token is looked up in the database and a spam metric is retrieved. The token “Click” has a high metric of 91, whereas the word “to” is neutral (indicating neither spam nor legitimate). These metrics are aggregated using statistical methods to give the overall score for the message of 98.
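This guide does not name the statistical method used for aggregation. As one possible illustration, the sketch below combines per-token metrics with a naive Bayes style product rule, a technique commonly used in statistical spam filters. The function name combine_metrics, the clamping limits, and the choice of rule are assumptions, so its results are not expected to reproduce the product's scores.

```python
import math

def combine_metrics(metrics):
    """Combine per-token metrics (0-100) into one message score (0-100).

    Assumed rule (not confirmed by the product documentation):
        score = P / (P + Q), where P is the product of the token
        probabilities and Q is the product of their complements.
    """
    if not metrics:
        return 50.0  # no evidence either way: treat as neutral
    # Clamp so that a single 0 or 100 cannot zero out a product.
    probs = [min(max(m / 100, 0.01), 0.99) for m in metrics]
    spam = math.prod(probs)
    legit = math.prod(1 - p for p in probs)
    return 100 * spam / (spam + legit)

# 91 for "Click" and a neutral 50 for "to" follow the text above;
# the value 80 is a made-up metric for a third token.
print(round(combine_metrics([91, 50, 80]), 1))
```

Under this rule a neutral token (50) leaves the score unchanged, while several tokens that each lean toward spam reinforce one another and push the combined score above any single metric.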

Mail messages with a spam metric of 90 or greater are very likely to be spam. Messages with lower values (50-60) are possible spam, while messages with very low values (20-25) are unlikely to be spam. The spam metric is the score that Token Analysis contributes to the final Intercept Anti-Spam decision.
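The ranges above can be expressed as a small lookup, sketched below. The function name describe_metric is hypothetical. Only the ranges stated in this section are mapped; values that fall between them are reported as not characterized rather than guessed.

```python
def describe_metric(metric):
    """Map a spam metric (0-100) to the description given in this section."""
    if metric >= 90:
        return "very likely spam"
    if 50 <= metric <= 60:
        return "possible spam"
    if 20 <= metric <= 25:
        return "unlikely to be spam"
    # This section gives no description for other values.
    return "not characterized in this section"

for value in (98, 55, 22, 75):
    print(value, describe_metric(value))
```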

Token Analysis training

When enabled, Token Analysis will always run in training mode and analyze all local mail. Local mail is assumed to be not spam and the frequency of the words found in this mail may therefore be used to modify the values supplied by WatchGuard’s master list. For example, a mortgage company may use the word “refinance” quite frequently in its regular mail. The likelihood of this word suggesting spam would therefore be reduced.
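The effect of training on a single token can be illustrated with the sketch below. The adjustment formula, the function name adjust_for_local_mail, the prior_weight parameter, and the sample numbers are all assumptions made for this example. The guide states only that frequent use of a word in local mail reduces the likelihood that the word indicates spam.

```python
def adjust_for_local_mail(master_metric, local_count, prior_weight=10.0):
    """Lower a token's metric according to how often it appears in local mail.

    Hypothetical model: the master list value counts as `prior_weight`
    pseudo-observations, and each local occurrence adds one observation
    of the token appearing in legitimate mail.
    """
    spam_evidence = master_metric / 100 * prior_weight
    legit_evidence = (1 - master_metric / 100) * prior_weight + local_count
    return 100 * spam_evidence / (spam_evidence + legit_evidence)

# Made-up values: a master metric of 85 for "refinance", seen 40 times
# in the mortgage company's own mail.
print(round(adjust_for_local_mail(85, 0), 1))
print(round(adjust_for_local_mail(85, 40), 1))
```

With no local occurrences the master value is returned unchanged. As the local count grows, the metric moves toward 0, which matches the direction of the adjustment described above.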
