Spamfilter help

Posted: Wed Apr 14, 2004 9:43 am
by zEkE
I'm trying to figure out how to design a regex that blocks website advertising, with exceptions.
I had one (lost..) that worked with mIRC's $regex identifier, but once moved to Unreal, it didn't.
Basically it needs to block http://*, with the exception of http://www.blarg.net*

I think once I can get that, I can do any others of the same nature...

Any help would be appreciated, Thanks

Posted: Wed Apr 14, 2004 11:15 am
by AngryWolf
I don't know the solution, but I think you shouldn't block all URLs. Your users might want to send their friends addresses of webpages with documentation, tutorials, forums and so on, none of which belongs in the spam category. I think you should be more careful with spamfilters.

Posted: Wed Apr 14, 2004 10:33 pm
by codemastr
Hmm, something tells me you used the negative look-ahead assertion (?!stuff-here) ?

Unfortunately, that's a Perl extension. I will, however, tell the author of the regex library we use (TRE) to consider adding this. It sounds useful to me.
As of right now, I can't think of any other way to do it with the current system. I'll ask the author as well if he can think of a temporary solution for you.
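
For reference, here is a minimal sketch of what the look-ahead version looks like in an engine that does support it. Python's `re` module is used purely for illustration (it is not the library Unreal uses), and the helper name `is_blocked` is hypothetical:

```python
import re

# Negative look-ahead: match "http://" only when it is NOT
# immediately followed by "www.blarg.net".
BLOCK_URLS = re.compile(r"http://(?!www\.blarg\.net)")

def is_blocked(message: str) -> bool:
    """Return True if the message contains a URL outside the allowed site."""
    return BLOCK_URLS.search(message) is not None
```

With this sketch, `is_blocked("visit http://spam.example.com")` is true, while a message whose only link starts with http://www.blarg.net is let through.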

Posted: Thu Apr 15, 2004 4:33 am
by zEkE
AngryWolf: The decision is not mine. I'm trying to change existing spamfilters. While I too disagree with them being there, I'm still under those who require them. I'm trying to change them so that users can at least discuss internal network issues without problem.

The rule goes that ads must be approved by network staff, and the way the spamfilters are being used enforces that. Whether I like it or not, I have to abide by the decisions of those above me.

Thanks codemastr, it would be really useful to have the functionality, and just because I want to use it to block/exclude websites doesn't mean others won't find better ways to use it.

Posted: Thu Apr 15, 2004 9:35 pm
by codemastr
Ok, I talked with him. He agreed to put this on his TODO list. However, he gave me a very, very ugly regexp that should have the same effect:

http://([^w]|w[^w]|ww[^w]|www[^.]|www\.[^b]|www\.b[^l]|www\.bl[^a]|www\.bla[^r]|www\.blar[^g]|www\.blarg[^.]|www\.blarg\.[^n]|www\.blarg\.n[^e]|www\.blarg\.ne[^t])

Something like that hideously ugly regexp should do what you want :)
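
The pattern above follows a mechanical rule: one branch per character of the allowed host, each matching the host up to that character followed by anything else. A rough sketch of generating it, assuming Python 3.7+ for illustration (the function name `exclude_prefix_pattern` is made up here, and Python's `re` stands in for TRE):

```python
import re

def exclude_prefix_pattern(scheme: str, allowed: str) -> str:
    """Build a look-ahead-free regex: `scheme` followed by text that
    diverges from `allowed` at some character position."""
    branches = []
    for i, ch in enumerate(allowed):
        # Literal text that still agrees with the allowed host so far.
        prefix = re.escape(allowed[:i])
        # Inside [...] only a few characters need escaping.
        cls = "\\" + ch if ch in "\\]^-" else ch
        branches.append(f"{prefix}[^{cls}]")
    return re.escape(scheme) + "(" + "|".join(branches) + ")"

pattern = re.compile(exclude_prefix_pattern("http://", "www.blarg.net"))
```

Note that every branch needs one diverging character after the matching part, so a message ending in a cut-off address such as http://ww is not caught by this form, whereas a look-ahead would catch it.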

Posted: Fri Apr 16, 2004 12:50 am
by zEkE
rofl
wow!
I'll agree that's ugly. I'll give it a try, thanks
:):)

Posted: Fri Apr 16, 2004 8:03 am
by AngryWolf
Pretty nice logic, even if the regex is ugly. :-)