Spamfilter performance

These are old archives. They are kept for historic purposes only.
Entity
Posts: 2
Joined: Wed Apr 02, 2008 10:30 pm

Spamfilter performance

Post by Entity »

Hi all,

I'm looking for a few guidelines for setting up spamfilters. We use them pretty extensively on our network, so they are taking a toll on our servers' CPUs. Are there any common guidelines for keeping the impact as low as possible? Some questions are:

- Is a short message string combined with a * better than a long message match?
- Is an OR'ed expression better than two separate matches?
- Are more complex expressions like [a-z][0-9]{2,4} better than a simple *?
- Does adding matches to the spamfilter.conf file make any difference compared to dynamically added ones?
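The answers to questions like these depend on the regex engine the ircd is built with, so measuring is more reliable than guessing. As a rough first step, a sketch like the following times candidate patterns against a sample of messages and counts how many each one matches. It uses Python's `re` module, not the ircd's own regex library, so the numbers only hint at relative cost; `MESSAGES`, `cost_per_message` and `hits` are hypothetical names, and the sample traffic is made up for illustration.

```python
import re
import timeit

# Hypothetical sample traffic; replace with messages logged from your own network.
MESSAGES = [
    "hey, how is everyone doing today?",
    "check out http://www.somestupidurl.com for free stuff",
    "anyone know how to configure a link block?",
] * 200


def cost_per_message(pattern: str, messages=MESSAGES, repeat: int = 5) -> float:
    """Best-of-`repeat` average seconds to search one message with `pattern`."""
    compiled = re.compile(pattern)  # compile once, as a server would
    timer = timeit.Timer(lambda: [compiled.search(m) for m in messages])
    best = min(timer.repeat(repeat=repeat, number=1))
    return best / len(messages)


def hits(pattern: str, messages=MESSAGES) -> int:
    """Count matching messages, to confirm two candidate patterns agree."""
    compiled = re.compile(pattern)
    return sum(1 for m in messages if compiled.search(m))


if __name__ == "__main__":
    candidates = {
        "short prefix + .*": r"somestupid.*",
        "long literal": r"http://www\.somestupidurl\.com",
        "character class": r"somestupidurl\.[a-z]{2,4}",
    }
    for label, pat in candidates.items():
        ns = cost_per_message(pat) * 1e9
        print(f"{label:20} hits={hits(pat):4d}  {ns:8.1f} ns/msg")
```

Assuming the filter is searched unanchored (anywhere in the message), a trailing `.*` never changes which messages match, so it adds work without adding precision.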

Many thanks in advance for any hints :)

Regards
SpaceDoG
Posts: 301
Joined: Mon Feb 27, 2006 5:44 am

Re: Spamfilter performance

Post by SpaceDoG »

Entity
Posts: 2
Joined: Wed Apr 02, 2008 10:30 pm

Re: Spamfilter performance

Post by Entity »

SpaceDoG, thanks for the links, but I can't find an answer to my questions there. My question was not how to add spamfilters but which factors affect their performance. Example:

Regex merged by OR:
http://www\.somestupidurl\.(com|org)

Two individual regex:
http://www\.somestupidurl\.org
http://www\.somestupidurl\.com

Which one is faster: the first, merged by OR, or the two added individually? Another example:

http://www\.somestupidurl\.\w{2,3}
http://www\.somestupidurl\.[a-z]{2,3}
http://www\.somestupidurl\.\w*
http://www\.somestupidurl\.*

All 4 have the same effect when it comes to matching, but do they also have the same effect on performance, i.e. CPU load on the ircd?
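Strictly speaking, the four patterns do not match the same set of strings: `\w` also accepts digits, underscores and uppercase letters, `{2,3}` rejects a one-letter suffix, and in the last pattern `\.*` means "zero or more literal dots", so it matches the bare host name too. Before comparing speed, it is worth checking what each variant actually catches. The sketch below does both with Python's `re` module, which is an assumption: the ircd uses its own regex library, and Python matches case-sensitively by default, which may differ from how the ircd compiles spamfilters. The probe strings and helper names are hypothetical.

```python
import re
import timeit

VARIANTS = [
    r"http://www\.somestupidurl\.\w{2,3}",
    r"http://www\.somestupidurl\.[a-z]{2,3}",
    r"http://www\.somestupidurl\.\w*",
    r"http://www\.somestupidurl\.*",
]

# Hypothetical probe strings chosen to expose differences between the variants.
PROBES = [
    "visit http://www.somestupidurl.com now",
    "visit http://www.somestupidurl.COM now",
    "visit http://www.somestupidurl.c now",
    "visit http://www.somestupidurl now",
]


def match_table(variants=VARIANTS, probes=PROBES):
    """For each variant, a tuple of booleans: does it match each probe?"""
    return {v: tuple(bool(re.search(v, p)) for p in probes) for v in variants}


def time_variant(pattern: str, text: str, number: int = 100_000) -> float:
    """Seconds for `number` searches of one precompiled pattern over `text`."""
    compiled = re.compile(pattern)
    return timeit.timeit(lambda: compiled.search(text), number=number)


if __name__ == "__main__":
    for pattern, row in match_table().items():
        print(pattern.ljust(40), row)
    sample = "lol " * 50 + "http://www.somestupidurl.com"
    for pattern in VARIANTS:
        print(pattern.ljust(40), f"{time_variant(pattern, sample):.3f}s")
```

With these probes, only the `.com` line is caught by all four; the `[a-z]` variant misses the uppercase `COM`, and only the last variant catches the bare host name.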

I'm looking for some general guidelines on how to optimize spamfilter regexes for performance.
Jobe
Official supporter
Posts: 1180
Joined: Wed May 03, 2006 7:09 pm
Location: United Kingdom

Re: Spamfilter performance

Post by Jobe »

Well, for your "two separate versus one merged" question, the merged one is faster, on the grounds that it's parsed once, whereas the two separate ones require it to be parsed twice, once for each.
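The "parsed once versus parsed twice" argument is easiest to see on a message that matches neither URL, which is typically the case for most traffic: the merged pattern scans the text once, while two separate filters scan it once each. The sketch below puts both approaches side by side using Python's `re` module rather than the ircd's regex engine (an assumption, so treat any timings as indicative only); `check_merged` and `check_separate` are hypothetical helper names.

```python
import re
import timeit

# Patterns from the thread: one alternation versus two separate filters.
MERGED = re.compile(r"http://www\.somestupidurl\.(com|org)")
SEPARATE = [
    re.compile(r"http://www\.somestupidurl\.org"),
    re.compile(r"http://www\.somestupidurl\.com"),
]


def check_merged(text: str) -> bool:
    """One scan of the message with a single alternation."""
    return MERGED.search(text) is not None


def check_separate(text: str) -> bool:
    """One scan per filter; stops at the first filter that matches."""
    return any(p.search(text) for p in SEPARATE)


if __name__ == "__main__":
    clean = "just chatting about nothing in particular " * 10  # no match: worst case
    for fn in (check_merged, check_separate):
        secs = timeit.timeit(lambda: fn(clean), number=100_000)
        print(f"{fn.__name__:15} {secs:.3f}s")
```

Both functions flag exactly the same messages; only the number of passes over a non-matching message differs.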