ErfWiki:SpamWars

From ErfWiki


Revision as of 06:21, 13 May 2011

Branched off from main ErfWiki:Maintenance Portal page. (It's a Wiki. You don't like it, move it back.) Abb3w 22:15, 7 May 2011 (UTC)


From Maintenance Portal

  • As long as pages can be edited by anyone without a captcha-protected login, spambots can do whatever they want
  • Add spambot IPs to [[Category:Spammer]]
    • No IP is being used more than once. All this accomplishes is blacklisting a massive pile of random IPs. --ChroniclerC 21:49, 2 May 2011 (UTC)
      • The spambots are like that, but the wandering vandal I spent a few hours reverting last night does double back over his IPs a bit, probably using a limited number of proxies. Slapping those down would deal with that part of the problem eventually. --Pickled Tink 02:49, 7 May 2011 (UTC)
  • List of steps recommended by MediaWiki's manual:
    • Requiring user logins to edit pages
    • Requiring email and CAPTCHA validations of user creation
    • Requiring CAPTCHA for edits, from users who are not well known.
    • Blocking edits which add specific key words or external links
    • Blocking usernames and page title patterns that are commonly used by bots
    • Blocking registration using known spam domains (mail.ru)
    • Using several blacklist services
    • Cleanup scripts that revert changes caused by recently identified spammers
  • List of steps not recommended by MediaWiki's manual
    • Don't use MediaWiki for low-volume/maintenance wikis with anonymous editing policies

--- Question here, and I'm not sure where else to put this: Do you have an autorevert script in mind or already set up? Do you need help with one? Is there a specific place we could discuss this? A thread on the forum seems like a good idea, but I don't see one. --- -- Oh, and if I could post this, then a spammer can post whatever they want, too. Requiring user registration, and hassling contributors until they're recognized as not spammers, might drive off some users, but spam drives many more people off. --68.3.56.10 00:08, 3 May 2011 (UTC)

-- Over at the unofficial Exalted wiki, I found that the most successful fix for spam was switching to a straight type-in-the-word captcha on account creation. reCAPTCHA and logic puzzles are pretty much cracked, but the spammers are generally not targeting single wikis, so something like 'type in the name of the wiki' cut my spam down to zero. - Xyphoid

I would suggest something more specific: the KH Wiki asks you to type in the name of the main protagonist, so something like that would be wise. Like "Who is the Perfect Warlord? Parson Gotti." 24.13.125.86 05:12, 7 May 2011 (UTC)
I'll second this approach - spambots don't target sites in particular, they're just designed to crack common captchas, so site-unique captchas tend to kill them dead. They won't keep out deliberate spammers, of course, but they're far rarer than bots these days. --Tommy 21:13, 7 May 2011 (UTC)
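A minimal sketch of the site-specific question idea above, assuming the answer is checked server-side as plain text. The `QUESTIONS` table and the function names are hypothetical and only illustrate the matching; they are not part of any MediaWiki extension.

```python
# Hypothetical question/answer table; a real wiki would keep this in its config
# and rotate the questions when spam starts getting through.
QUESTIONS = {
    "Who is the Perfect Warlord?": {"parson gotti"},
}


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so sloppy typing still matches."""
    return " ".join(answer.lower().split())


def check_answer(question: str, answer: str) -> bool:
    """Return True only if the answer matches one accepted for this question."""
    accepted = QUESTIONS.get(question, set())
    return normalize(answer) in accepted
```

Normalizing case and spacing keeps the check forgiving for human registrants while still requiring knowledge that a generic bot will not have.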

-- Another ham-handed, but simple, method would be to create a bot that automatically bans any new user with a username between five and eight characters in length (this is true of almost every single bot), and leave a note on the registration page that this happens. I merely add it here because the lazy option must always be presented. --Pickled Tink 11:21, 7 May 2011 (UTC)
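A rough sketch of that length rule, assuming usernames arrive as plain strings. The function names are made up for illustration; the 5 to 8 range is the one suggested above.

```python
def looks_like_bot_name(username: str, low: int = 5, high: int = 8) -> bool:
    """True if the name length falls in the range suggested for spambot names."""
    return low <= len(username) <= high


def flag_short_names(usernames):
    """Return the usernames the length rule would ban, in their original order."""
    return [name for name in usernames if looks_like_bot_name(name)]


# Names of people who signed comments on this page, as a sanity check.
signers = ["Abb3w", "Tommy", "Charles", "Xyphoid",
           "Baumgeist", "ChroniclerC", "Pickled Tink"]
flagged = flag_short_names(signers)
```

Run against the signatures on this page, the rule flags Abb3w, Tommy, Charles and Xyphoid, which is why the note on the registration page would be essential.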


-- I see you're running Apache. Do you have access to change Apache or is this shared hosting? If you can change your Apache configs and install modules I would highly suggest installing mod_security and then ASL Lite (free ruleset for modsec). http://www.gotroot.com Won't stop all your spam but it would cut down some of the nastier stuff. --

Strategery Suggestions

For Manual Hunters

There are probably more effective ways, in the long run. However, there's something to be said for the quality of wetware AI. So, if you care to start hunting through Uncategorized Pages or (ick) all the Recent Changes, you can. Once you find pages with Spam....

  1. Check the page history
    1. If the page is newly created, blank and label with {{delete}} to mark for speedy deletion.
    2. If the page existed before
      1. It may have been only vandalized once since last clean
        1. The history page has a handy "undo" link
        2. To help distinguish yourself from spambots, change "revision" to "vandalism" or variant thereof.
        3. You might also flag it as a minor edit
      2. It may have been vandalized more than once by spam-bot
        1. Start wading back through the changelog to find the last-clean variant
        2. Click timestamp link for that variant
        3. Edit, and give some manner of reverty-description for the summary
  2. Go back to that history, dude
    1. Find the spammer's change
    2. Click the link for the username or IP that made the offending change
    3. Check the associated user and/or talk pages
      1. See if there's already indication there that they're in [[Category:Spammer]] or [[Category:Banned]]
      2. If not, check the ban log to see if they're already banned
      3. Mark already banned accounts/IPs by adding {{banned}} to their talk and/or user page
      4. Mark accounts/IPs to be drawn to the Banhammer's attention by adding {{spammer}} to their talk and/or user page
    4. Check the user contributions; they may have more SPAM to their "credit"
  3. Go back to manual hunting...
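For anyone who wants to script the "wade back through the changelog" step, here is a minimal sketch. It assumes the page history is already available as a newest-first list, like the history page shows it; the `Revision` type and the spammer set are hypothetical stand-ins, not a real bot API.

```python
from typing import NamedTuple, Optional, Sequence, Set


class Revision(NamedTuple):
    rev_id: int   # hypothetical revision identifier
    user: str     # username or IP that made the edit


def last_clean_revision(history: Sequence[Revision],
                        spammers: Set[str]) -> Optional[Revision]:
    """Return the newest revision not made by a known spammer.

    `history` must be ordered newest first. Returns None when every
    revision is by a spammer, i.e. the page was created as spam.
    """
    for rev in history:
        if rev.user not in spammers:
            return rev
    return None
```

A `None` result corresponds to the "newly created" case, where the page gets {{delete}} instead of a revert. A human should still eyeball the chosen revision, since a good-faith edit made on top of spam would look clean to this check.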

For Bot Coding

...err... yeah, someone get on that

  • Install pywikipediabot (needs Python)
    • use python delete.py -cat:"Candidates for speedy deletion" -always to mass delete all pages in that category. Maybe there should be a category just for spam.
    • use spamremove.py spamsite.com to find all pages containing spamsite.com and remove the spam link
    • maybe there are other useful bots in that package too
  • and, for Erf's sake, start using MediaWiki:Spam-blacklist. Spam will decrease if they can't spam the same link twice. --Baumgeist 18:26, 11 May 2011 (UTC)
  • So far I have found 2 types of spam.
    • 1) replaces the text of a page with a random engrish compliment. All of these have a long stream of random letters in the comment field. A bot could probably flag all edits with gibberish for comments as spam.
    • 2) creates a new page that links to another website. The comments for these page creations all begin with Created page with '==<center>. Beginning a page with a centered level 2 heading (==) instead of a left justified level 1 heading (=) appears to be unique to the spambot. A bot could probably tag any edit that creates a page with this layout as spam.
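The two patterns above could be checked with something like the following sketch. It assumes the bot already has each edit's summary text and knows whether the edit created a page. The 15-letter threshold and the label strings are guesses for illustration, not measured values.

```python
import re
from typing import Optional

# Summary prefix seen on the spambot's page creations (type 2 above).
NEW_PAGE_PREFIX = "Created page with '==<center>"

# Assumed shape of a "gibberish" summary: one unbroken run of 15+ letters.
GIBBERISH_RE = re.compile(r"[A-Za-z]{15,}")


def classify_edit(comment: str, is_new_page: bool) -> Optional[str]:
    """Return a spam label for the edit, or None if neither pattern matches."""
    text = comment.strip()
    if is_new_page and text.startswith(NEW_PAGE_PREFIX):
        return "spam-new-page"
    if GIBBERISH_RE.fullmatch(text):
        return "spam-gibberish-comment"
    return None
```

A long single-word summary from a real editor would also trip the gibberish rule, so it is probably safer to have the bot tag edits for review than to revert them outright.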

For Wiki Work

Spotted... Abb3w 02:46, 8 May 2011 (UTC)

KittenAuth

I've got it! Use Kitten Auth. Normally it shows a bunch of pictures of different animals and asks you to select which one is the kitten, but you can customise it.

So you just load a heap of images of Erfworld characters, particularly Wanda, and ask people to select the picture of Wanda. Should stop even the human operators unless they actually know the comic, and it's simple enough to switch to a different Erfworld character like Ansom if they start to figure it out. http://www.mediawiki.org/wiki/Extension:KittenAuth --Charles 01:52, 8 May 2011 (UTC)

Looks like KittenAuth was effective for about 15 hours, uncustomized. Abb3w 05:31, 10 May 2011 (UTC)
Another possibility: Take your KittenAuth image library and make LOTS of versions of each picture (subtle changes to a couple of pixels, just recompressing it, etc). It looks like one of the more common ways of bypassing it involves building URL-path or MD5 hashes of the good images. More images is better. More "good" images means less chance that an already-used (and identified) "good" image will be presented in a given selection run. The core KittenAuth library has already been fingerprinted by a lot of spammers, so use your own custom pictures, and switch them out whenever the success rate starts falling. 68.116.159.234 21:11, 10 May 2011 (UTC)
You could probably generate the images on the fly. Have a large set of base images (and preferably more than one question, so that the correct answer is not always the same for a given set of base images). Then, when you need to display the auth images, randomly alter the base images (say, by applying some distorting filter) to generate the images to display on that particular login attempt. The image generation would use a bit of CPU, so a downside is that it would be a sweet target for DoS attacks.
You can probably reduce the amount of CPU required by the above system by using a small set of "good" images, loaded from the disk in uncompressed format, then adding some live text over them (which is fast when your source image is already uncompressed and no anti-aliasing is used), and JPEG-encoding the result on the fly. If you only modify the good image, a spammer who is really intent on breaking the wiki will have to fingerprint all the other images, then select the one that does not match any known fingerprint. To render statistical analysis unfeasible, you will need to run a timed script that generates a whole new set of bad images every so often (with the same system of adding text to the existing ones and saving them); once a day should be sufficient, unless the attacker is REALLY determined. This saves server CPU power with only a small cost in effectiveness. Incidentally, if you do add KittenAuth, you may want to make all images greyscale, for the benefit of color-blind users. 88.147.16.72 19:41, 11 May 2011 (UTC)
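To illustrate why the "lots of versions" suggestion defeats hash fingerprinting, here is a minimal sketch. It treats an uncompressed pixel buffer as plain bytes and flips the lowest bit of two random bytes per variant. The function names are hypothetical, and real use would still need to re-encode each variant as an image file and serve it under a fresh URL.

```python
import hashlib
import random
from typing import List


def fingerprint(data: bytes) -> str:
    """MD5 hex digest, the kind of fingerprint a spammer might record."""
    return hashlib.md5(data).hexdigest()


def make_variants(pixels: bytes, count: int, seed: int = 0) -> List[bytes]:
    """Return `count` near-identical copies of `pixels`, each with a unique MD5.

    Each copy has the lowest bit flipped in two randomly chosen bytes,
    a change that is invisible to the eye but alters the hash.
    """
    rng = random.Random(seed)
    seen = {fingerprint(pixels)}          # never reuse the original's hash
    variants: List[bytes] = []
    while len(variants) < count:
        buf = bytearray(pixels)
        for pos in rng.sample(range(len(buf)), 2):
            buf[pos] ^= 1                 # flip the least significant bit
        candidate = bytes(buf)
        digest = fingerprint(candidate)
        if digest not in seen:            # skip accidental duplicates
            seen.add(digest)
            variants.append(candidate)
    return variants
```

Generating the variants ahead of time in a timed job, rather than per request, would keep the CPU cost away from the login path, in line with the DoS concern raised above.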