ErfWiki talk:SpamWars

So You're Not An Ad-Man and You Want To Fight Spam

Just a few things for all the new spam-fighters: first, we already have a tag to slap on the spam-bots and their pages. Just hit 'em with [[Category:Spammer]], and they'll show up on a page we've already made for it.

Second, we've been having a few waves of what I've been calling "Cheerleaders". They'll hit random pages and either insert or replace content with meaningless fluff like "OMG ur so smatr". Rather than mark the page for deletion, if you could just tag the spammer on their user talk page and undo the edits, that'd be great.

Third... thank you so much. The more people who help, the easier this becomes. --No one in particular 15:53, 1 May 2011 (UTC)

Captchas & Filters

  • STrRedWolf here from Stalag '99. You should add the ReCaptcha plugin to MediaWiki, which cuts out about 99% of the spam. I'm using it on my Canmephian Library Wiki.
  • Archaic here, webmaster of Bulbagarden / Bulbapedia, founder of the Nintendo Independent Wiki Alliance, and a big fan of the comic. Heard you're having some spam issues, so I checked how some of our network partners deal with similar problems. I see you've already got ConfirmEdit and SpamBlacklist, but you might want to consider using QuestyCaptcha for your ConfirmEdit, which makes it a lot stronger than the math questions. Just base the questions off the comic and you should be good to go - http://www.mediawiki.org/wiki/Extension:ConfirmEdit#QuestyCaptcha
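For reference, a minimal LocalSettings.php sketch of that QuestyCaptcha setup. The question/answer pair is a made-up placeholder, and the exact require_once file names vary by ConfirmEdit version, so check the extension page before copying:

require_once "$IP/extensions/ConfirmEdit/ConfirmEdit.php";
require_once "$IP/extensions/ConfirmEdit/QuestyCaptcha.php";
$wgCaptchaClass = 'QuestyCaptcha';

# Questions only readers of the comic would answer without research.
$wgCaptchaQuestions[] = array(
    'question' => 'Name the comic this wiki is about',  # placeholder
    'answer'   => 'Erfworld',                           # placeholder
);

# Challenge account creation as well as suspicious edits.
$wgCaptchaTriggers['createaccount'] = true;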

Also consider getting the following extensions:

  • Abuse Filter: http://www.mediawiki.org/wiki/Extension:AbuseFilter
  • AntiSpoof: http://www.mediawiki.org/wiki/Extension:AntiSpoof
  • Title Blacklist: http://www.mediawiki.org/wiki/Extension:Title_Blacklist
  • TorBlock: http://www.mediawiki.org/wiki/Extension:TorBlock

The AbuseFilter extension seems most useful for the spam you guys are getting. You could prevent non-autoconfirmed accounts from creating pages that consist of a title followed by an external link, which seems to be the form of all the recent spam (spot-checked in recent changes). Someone'd hafta brush up on their regex, though. Pages that match ==<center>\[[A-Za-z0-9/_:\.\?+]+ <big>'''<u>[A-Za-z][A-Za-z\s]*</u>'''</big>\]</center>== might work; you'd need to double-check, it's a first shot.
Another thing that seems like it'd help: building some tags users can use to notify admins when things need to be deleted. Something simple like {{delete}} would work best, which would categorize pages into, say, Category:Pages to be deleted (a bare-bones sketch follows this thread). Anything besides the category is somewhat optional, but a box at the top where they can put in a reason for deletion would also help. I'd suggest ripping code out of similar Wikipedia templates for the boxes. Cheers, everyone. 184.36.87.8 14:38, 1 May 2011 (UTC)
Me and Rpeh both wrote up the template at the same time. I saved mine over his, even though it's much uglier and crappier, 'cuz it seemed like he just stole the code from Wikipedia's {{db-meta}}, which wouldn't work all that well without the other templates it builds off of. So, template and category are working, use them as you all like. Cheers. Lifebaka 15:21, 1 May 2011 (UTC)
I went back to the previous version. I took the code from here, not directly from Wikipedia. The version I copied doesn't use anything from WP so there's no problem with it. rpeh •TCE 15:48, 1 May 2011 (UTC)
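For reference, the working guts of such a {{delete}} template can be tiny; a wikitext sketch (box styling omitted; the optional {{{1}}} parameter carries the reason):

<includeonly>'''This page has been marked for deletion.''' Reason: {{{1|none given}}}
[[Category:Pages to be deleted]]</includeonly>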
  • I don't have the webmastery credentials that these other guys do, but I just wanted to point you here for info direct from MediaWiki on fighting spam, in case you haven't seen it yet. Also, I strongly suggest not permitting anonymous editing (such as I'm doing here). --Dachannien
  • Solutions we use over at wiki.pcgen.org - we locked down editing to logged-in users only; that was the first step. The second step was to make it so you had to request a user account from an admin. Spammers were getting to be too annoying to allow open editing or even open registration. We got tired of killing new accounts.


This is for MediaWiki 1.6:


# Prevent new user registrations by anyone
$wgGroupPermissions['*']['createaccount'] = false;

# Restrict anonymous editing / tools showing
$wgShowIPinHeader = false;

# Stop anonymous editing
$wgGroupPermissions['*']['edit'] = false;

# Anonymous users can't create talk pages
$wgGroupPermissions['*']['createtalk'] = false;

# Anonymous users can't create pages
$wgGroupPermissions['*']['createpage'] = false;

That would appear to be a huge overreaction. If people can't even create accounts, you're going to lose 99% of your potential editors. Rpeh 14:36, 1 May 2011 (UTC)


Spam Solution

We had exactly the same spam problem on the Oblivion Mod Wiki and the FancyCaptcha module for ConfirmEdit stopped it dead. Try that one. Rpeh 14:35, 1 May 2011 (UTC)

  • Ever thought of making registration mandatory (with a CAPTCHA or similar) for editing? --An unregistered user

Suggestions from MediaWiki's page on combating spam

MediaWiki has a page on combating spam. Some of the good suggestions from that page:

  • Add a spam regex using $wgSpamRegex, matching any content that should not appear in a valid page; this includes markup used only by spammers (such as any attempt to hide links, or many types of HTML formatting), and keywords and phrases used only by spammers. (A sketch appears after this list.)
  • Use the ConfirmEdit extension, to force a CAPTCHA on any users attempting to add a new external link, as well as users attempting to register a new account. It looks like this wiki may already do that, using SimpleCaptcha, but you might want to use one of the stronger CAPTCHA modules, as I strongly suspect spammers have automated ways to solve SimpleCaptcha. Try MathCaptcha to make it use images, or ReCaptcha to use that external service. I'd recommend against using QuestyCaptcha, as it only seems to support a fixed list of questions, which won't stop spammers for long.
  • Use an IP address blacklist of known spam IPs, which tracks spammers from other wikis and forums.

--JoshTriplett 02:33, 2 May 2011 (UTC)
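A sketch of what such a $wgSpamRegex might look like in LocalSettings.php. The patterns are illustrative guesses at common spam markers, not a tested blacklist; tune them to the spam you actually see:

# Reject any edit whose text matches this pattern.
$wgSpamRegex = '/overflow\s*:\s*(auto|hidden)'      # CSS scroll-box trick for hiding links
             . '|display\s*:\s*none'                # invisible-content trick
             . '|\b(viagra|cialis|phentermine)\b'   # example keyword blacklist
             . '/i';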

http://www.projecthoneypot.org/ can give an indication of whether an IP address has a track record of such behaviour as comment spamming. It is dynamic, not a static blacklist, which is better practice. Feeding data back to such projects can also be a deterrent to spammers.
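Project Honey Pot's lookup side ("http:BL") works over DNS, so checking a visitor is cheap. A rough PHP sketch, with a placeholder access key (you get a real one by registering; the response layout below is from their docs as I recall it, so verify before relying on it):

# Look up an IP against http:BL (dnsbl.httpbl.org).
$key  = 'myaccesskey';                  # placeholder - use your own key
$ip   = '91.143.58.1';
$name = $key . '.' . implode('.', array_reverse(explode('.', $ip))) . '.dnsbl.httpbl.org';
$answer = gethostbyname($name);
if ($answer !== $name) {                # gethostbyname() returns the input unchanged on failure
    # Answer is 127.<days since last activity>.<threat score>.<visitor type bitmask>
    list($ignored, $days, $threat, $type) = explode('.', $answer);
    # e.g. block the edit or demand a CAPTCHA when $threat is high
}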

A couple of other ideas

Consider whether your page content actually needs to contain URLs at all. Intra-wiki links can be done with wiki tags, so if external URLs are rare in your content then blocking page edits which create pages containing them may be more useful than annoying. Humans are good at reading a partial URL and putting it back together, but it defeats any benefit for a link-farmer if their bots can't create URLs that are accessible to a search-engine spider.
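If you go that route, two LocalSettings.php switches get most of the way there; a sketch (untested here - the first line assumes ConfirmEdit is installed):

# Challenge any edit that adds a new external link (ConfirmEdit trigger):
$wgCaptchaTriggers['addurl'] = true;

# ...or, more bluntly, reject raw external URLs in page text entirely:
$wgSpamRegex = '/https?:\/\//i';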

Use your robots.txt file to make it harder for link-farmers to benefit. Putting the NOFOLLOW meta tag on your wiki pages will also keep Google from giving link credit for them. Most of the spam is badly behaved robots trying to impress Google's well-behaved one - you can't control the badly behaved bots directly, but by keeping better control of the well-behaved ones you can remove their prize.
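On the nofollow side, MediaWiki makes this a single switch; a LocalSettings.php sketch (it defaults to on in recent versions, so this is mostly about confirming nobody turned it off):

# Add rel="nofollow" to external links so they pass no search-engine credit.
$wgNoFollowLinks = true;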

N.B. The current robots.txt set-up means the spammer still wins even when you "delete" a spam page. You're allowing search engines to index your page history pages, including the old revision with the spammer's full text. Similar issue for page excerpts on "recent changes" page.
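A robots.txt sketch that addresses this, assuming the common setup where readable pages live under /wiki/ and everything else (histories, diffs, old revisions) is served through /index.php (MediaWiki generally marks special pages such as recent changes noindex on its own, but it's worth checking):

User-agent: *
Disallow: /index.php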

Note, however, that spammers will continue spamming regardless of whether their spam has any benefit. --JoshTriplett 06:55, 2 May 2011 (UTC)

Unlikely to persist forever. In my experience there are two rewards - search engine rank and feeling superior to the site owner. When both dry up, there are plenty of other targets out there. Resources are not infinite for the spammers any more than they are for the defenders. They can certainly afford to waste some and to use some on spite, but they're not really any less concerned with utility vs futility of their actions than the defenders are.

All your current spam contains this string: MjE3fHwxMzA0MTQ1NDIzfHwxOTUyfHwoRU5HSU5FKSBNZWRpYVdpa2k

This may be the tag that (when a spider reaches the site that is the object of the spam) confirms the origin of the successful link-spam. Such mechanisms are used to help automate the process - sites that allow spam to generate hits can be automatically re-used for the next spamming cycle and sites that generate no hits drop from the list.
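Supporting that theory: the string is plain Base64. Decoding it (a one-line PHP sketch) yields a delimited record that names the wiki engine, with what looks like a Unix timestamp from 30 April 2011 in the second field - right when this wave started:

# Decode the tracking string carried by all the spam pages:
echo base64_decode('MjE3fHwxMzA0MTQ1NDIzfHwxOTUyfHwoRU5HSU5FKSBNZWRpYVdpa2k');
# Prints: 217||1304145423||1952||(ENGINE) MediaWiki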

After a few hours of rapid reversal of these spams, the link farming has been replaced with a human vandal. This is not uncommon: while the spammer is investigating the reasons behind a fall in productivity, they often meddle to test the environment, for amusement or spite, and to muddy the waters and confuse the defenders. This phase does not generally last very long before some automated actions resume, or the spammer decides not to bother with this target further.

Signs of automation of vandalism: change descriptions set to a random alphabetic sequence involving capital letters. Clearly distinguishable from patterns formed by keyboard-bashing, which tend to cluster around the home keys and have a high incidence of repetition. This is often the final phase of the encounter, but without technical or policy changes the site will eventually be found again.

Some of the latest addresses are interesting: 91.143.58.1 (AS41822) seems to belong to honest-to-badness criminals. It's also running an open proxy on port 80.
