Spamfilter help

These are old archives. They are kept for historic purposes only.
zEkE
Posts: 111
Joined: Wed Apr 14, 2004 9:30 am
Location: Harrisonburg, VA

Post by zEkE »

I'm trying to figure out how to design a regex that blocks website advertising, with exceptions.
I had one (since lost) that worked with mIRC's $regex identifier, but it didn't work once I moved it to Unreal.
Basically, it needs to block http://*, with the exception of http://www.blarg.net*

I think once I can get that, I can do any others of the same nature...

Any help would be appreciated, Thanks
NetAdmin - irc.unitedchristianchat.net
http://www2.i-al.net/ircbots/
AngryWolf
Posts: 554
Joined: Sat Mar 06, 2004 10:53 am
Location: Hungary

Post by AngryWolf »

I don't know the solution, but I don't think you should block all URLs. Your users might want to send their friends links to documentation, tutorials, forums and so on, none of which is spam. I think you should be more careful with spamfilters.
codemastr
Former UnrealIRCd head coder
Posts: 811
Joined: Sat Mar 06, 2004 8:47 pm
Location: United States

Post by codemastr »

Hmm, something tells me you used the negative look-ahead assertion, (?!stuff-here)?

Unfortunately, that's a Perl extension. I will, however, ask the author of the regex library we use (TRE) to consider adding it. It sounds useful to me.
As of right now, I can't think of any other way to do it with the current system. I'll ask the author as well if he can think of a temporary solution for you.
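For reference, here is a minimal sketch of what the negative look-ahead version looks like in an engine that does support it. It uses Python's re module purely as a stand-in for a Perl-style engine; it is not something the spamfilter accepts at the moment. The function name is_blocked_lookahead and the sample messages are made up for illustration.

```python
import re

# Perl-style negative look-ahead: match "http://" only when it is NOT
# immediately followed by the whitelisted host www.blarg.net.
# Assumption: Python's re stands in for a look-ahead-capable engine here.
BLOCK_EXCEPT_BLARG = re.compile(r"http://(?!www\.blarg\.net)")


def is_blocked_lookahead(message: str) -> bool:
    """Return True if the message contains a non-whitelisted http:// URL."""
    return BLOCK_EXCEPT_BLARG.search(message) is not None


if __name__ == "__main__":
    for msg in ("visit http://spam.example.com now",
                "see http://www.blarg.net/help",
                "no links here"):
        print(msg, "->", is_blocked_lookahead(msg))
```

A message that contains both an allowed and a disallowed URL is still blocked, because the search only needs one offending match.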
-- codemastr
zEkE
Posts: 111
Joined: Wed Apr 14, 2004 9:30 am
Location: Harrisonburg, VA

Post by zEkE »

AngryWolf: The decision is not mine. I'm trying to change existing spamfilters. While I too disagree with them being there, I still answer to those who require them. I'm trying to change them so that users can at least discuss internal network issues without problems.

The rule is that ads must be approved by network staff, and the way the spamfilters are being used enforces that. Whether I like it or not, I have to abide by the decisions of those above me.

Thanks codemastr, it would be really useful to have the functionality, and just because I want to use it to block/exclude websites doesn't mean others won't find better ways to use it.
NetAdmin - irc.unitedchristianchat.net
http://www2.i-al.net/ircbots/
codemastr
Former UnrealIRCd head coder
Posts: 811
Joined: Sat Mar 06, 2004 8:47 pm
Location: United States

Post by codemastr »

Ok, I talked with him. He agreed to put this on his TODO list. However, he gave me a very, very ugly regexp that should have the same effect:

http://([^w]|w[^w]|ww[^w]|www[^.]|www\.[^b]|www\.b[^l]|www\.bl[^a]|www\.bla[^r]|www\.blar[^g]|www\.blarg[^.]|www\.blarg\.[^n]|www\.blarg\.n[^e]|www\.blarg\.ne[^t])

Something like that hideously ugly regexp should do what you want :)
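The pattern above follows a mechanical rule: for each position in the allowed host, emit the characters before that position as a literal, followed by "anything except the expected character". A rough sketch of that rule follows, in case someone needs the same thing for another host. It is written in Python only to check the logic; the helper names (block_except_prefix, is_blocked) are hypothetical, and Python's re is not TRE, so test the generated pattern on the server before relying on it.

```python
import re

# Characters that need a backslash when used as literals outside a class.
_META = set(".^$*+?()[]{}|\\")


def _literal(text: str) -> str:
    """Backslash-escape regex metacharacters in a literal string."""
    return "".join("\\" + c if c in _META else c for c in text)


def _not_char(c: str) -> str:
    """Build a negated class such as [^w]; refuse characters that would
    need engine-specific escaping inside a bracket expression."""
    if c in "]\\":
        raise ValueError("unsupported character in allowed prefix: %r" % c)
    return "[^" + c + "]"


def block_except_prefix(scheme: str, allowed: str) -> str:
    """Return a look-ahead-free pattern matching `scheme` followed by
    anything that diverges from `allowed` at some position."""
    branches = [_literal(allowed[:i]) + _not_char(c)
                for i, c in enumerate(allowed)]
    return _literal(scheme) + "(" + "|".join(branches) + ")"


PATTERN = block_except_prefix("http://", "www.blarg.net")
BLOCKER = re.compile(PATTERN)


def is_blocked(message: str) -> bool:
    """Return True if the generated pattern matches anywhere in message."""
    return BLOCKER.search(message) is not None


if __name__ == "__main__":
    print(PATTERN)
```

For "http://" and "www.blarg.net" this reproduces the regexp quoted above. One limitation of the approach: every branch needs at least one character that differs from the allowed host, so a message that ends partway through the allowed name (for example one ending in "http://www.bl") is not matched.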
-- codemastr
zEkE
Posts: 111
Joined: Wed Apr 14, 2004 9:30 am
Location: Harrisonburg, VA

Post by zEkE »

rofl
wow!
I'll agree that's ugly. I'll give it a try, thanks
:):)
NetAdmin - irc.unitedchristianchat.net
http://www2.i-al.net/ircbots/
AngryWolf
Posts: 554
Joined: Sat Mar 06, 2004 10:53 am
Location: Hungary

Post by AngryWolf »

Pretty nice logic, even if the regex is ugly. :-)