WikiApiary talk:Operations/2013/April

Audit Bee Change
Audit Bee got hung up trying to audit Ubuntu-Forum Wiki. It seems that this install has been modified to hide its version information. When queried via the API it returns:

"generator":"MediaWiki MediaWiki"

Audit Bee was throwing an exception on this and never auditing it. I made a change so that this will no longer stop Audit Bee. You will see a log message that looks like this:

Ubuntu-Forum Wiki Unable to determine version from MediaWiki MediaWiki. Auditing without confirming any flags. Operator please check.

This allows the audit process to not get hung up, even though it really cannot be audited automatically.

🐝 thingles (talk) 01:23, 3 April 2013 (UTC)


 * Ouch, I suspected this might happen. I had hoped that Bumble Bee would find the version somewhere. Still, it had a good cause since Bumble Bee is better than ever. :) --[[kgh]] (talk) 20:50, 4 April 2013 (UTC)
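
The fallback discussed in this thread could look roughly like the sketch below. It is only an illustration in Python and not Audit Bee's actual code: the function names `parse_generator` and `audit_warning` and the regular expression are assumptions made for the example. Only the log text is taken from the message quoted above.

```python
import re

# Assumption: generator strings normally look like "MediaWiki 1.20.3"
# or "MediaWiki 1.21rc4"; anything else is treated as "version hidden".
GENERATOR_RE = re.compile(r"^MediaWiki (\d+)\.(\d+)(?:\.(\d+))?")


def parse_generator(generator):
    """Return (major, minor, patch), or None when no version can be found."""
    match = GENERATOR_RE.match(generator)
    if match is None:
        return None
    major, minor, patch = match.groups()
    # A missing patch level (e.g. "1.21rc4") is reported as 0.
    return (int(major), int(minor), int(patch) if patch is not None else 0)


def audit_warning(site_name, generator):
    """Return a log line instead of raising when the version is hidden.

    Returns None when the version could be parsed and no warning is needed.
    """
    if parse_generator(generator) is not None:
        return None
    return (f"{site_name} Unable to determine version from {generator}. "
            "Auditing without confirming any flags. Operator please check.")
```

With this shape, a site such as Ubuntu-Forum Wiki produces a warning for the operator rather than an exception that stops the whole audit run.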

More memory
I was excited to see today's announcement that Linode had doubled the memory on their instances. I requested the upgrade right away, which caused 15 minutes of unavailability, but WikiApiary is now back with 2x the memory. I expect this will keep WikiApiary a little bit snappier, and I will be tweaking some memory settings further once I see how things shake out.



🐝 thingles (talk) 01:09, 11 April 2013 (UTC)


 * I believe things are more energetic now. :) --[[kgh]] (talk) 17:38, 12 April 2013 (UTC)

Upgraded to 1.20.3
I finally upgraded my farm to 1.20.3. I also updated the majority of extensions via git. If you notice anything off please let me know. 🐝 thingles (talk) 02:09, 11 April 2013 (UTC)

Wikia?
I added a bunch of Farm:Wikia wikis tonight for fun. I’m curious what people think of adding Wikia sites en masse. If nothing special is done it would totally skew extension counts. It would also make the WikiApiary storage and polling needs increase a lot. Thoughts? 🐝 thingles (talk) 03:59, 11 April 2013 (UTC)


 * We should definitely start to separate statistics and counts, since it is more and more interesting to see what the others, i.e. wikis other than farms, do in the wikiverse. This provides e.g. "much" better information on which extensions are actually used rather than just installed and probably not used (probably the case for farms). What about starting out with "smaller" farms like Shoutwiki etc. rather than Wikia, which seems to be just too big at the moment? --[[kgh]] (talk) 14:30, 11 April 2013 (UTC)

Language categories
I modified Template:General siteinfo to set categories for websites with the language that a wiki is set to. I mostly like to stick with semantic properties, and Property:Has language exists but I think categories will be useful for visitors and queries. Additionally the categories use the language name where the property is using the language code. The category pages all just transclude Template:Language category so they can easily be added and modified. They are all subcategories of Category:Languages. The names of the categories are derived from the language codes using Template:Language label. There are still a number of language codes identified in Special:WantedCategories that I didn't put mappings in Template:Language label for yet. If others want to fill the rest out feel free, or I'll get the rest later. 🐝 thingles (talk) 02:46, 12 April 2013 (UTC)


 * Cool idea and never forget the job queue ;) --[[kgh]] (talk) 17:03, 12 April 2013 (UTC)


 * That's a good point [[kgh]]! To help out, I added a near-realtime display of the job queue length in Operations (diff). Big win using Extension:External Data! :-) 🐝 thingles (talk) 17:15, 12 April 2013 (UTC)


 * That's great. I could have thought about it in the first place. For admins the job queue length is really good to have, though one really has to know what triggered the jobs to be able to interpret it better. Yeah, I should have a closer look at Extension:External Data. I guess I underestimated it. --[[kgh]] (talk) 17:34, 12 April 2013 (UTC)

Defunct overrides everything else
By the way, I noticed some recent edits and wanted to highlight that when you mark a site as defunct, you do not need to uncheck the other flag fields (active, validated, etc.). Defunct is an override for everything else, so if a site is marked defunct it will not be considered active, User:Bumble Bee will not bother with it, etc., regardless of what its active setting is. Just FYI. 🐝 thingles (talk) 17:21, 12 April 2013 (UTC)


 * Oops, I did this all the time. :| Thanks for the re-vaccination. :) --[[kgh]] (talk) 17:35, 12 April 2013 (UTC)
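
The precedence described in this thread can be expressed in a few lines. The sketch below is an illustration only, written in Python: the function name `should_collect` and the dictionary keys are hypothetical and are not taken from the bot's source.

```python
def should_collect(flags):
    """Return True only when the bot should poll the site.

    `flags` is a hypothetical dict such as
    {"active": True, "validated": True, "defunct": False}.
    """
    # Defunct overrides everything else, so it is checked first and the
    # other flags are never consulted for a defunct site.
    if flags.get("defunct", False):
        return False
    return bool(flags.get("active", False))
```

Because the defunct check comes first, leaving "active" checked on a defunct site has no effect on whether it is polled.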

Bug and bad behavior fixed for extension names
Saw a nasty issue with Beauzons.com today. It looked like User:Bumble Bee was broken but the real issue was that this site had an extension with the name. This presented two problems. First, it caused SMW to throw an exception when processing the subobject declaration which kept the page itself from even rendering right because of the. This is the problem I saw, as autoedits to the Semantic Forms API failing and throwing an exception. Changing the  to a   removed that. However, the  and   signs are illegal as well. So, I made a helper function to filter these out and this should be all fixed now. Illegal characters are just being dropped, and equals is being changed to a dash. I fixed Beauzons.com/Extensions (change) by hand for now, Bumble Bee currently isn't able to reach the target. 🐝 thingles (talk) 18:42, 14 April 2013 (UTC)


 * Since this change Bumble Bee has been creating a lot of new extension pages with titles which include the URL, and no URL listed.--Ete (talk) 15:10, 15 April 2013 (UTC)


 * Thanks. Yeah, this diff shows what's happening. Those pages were previously invalid properties. I'll add code to attempt to unwrap these wikilinks hiding in titles. Ugh. :-\ 🐝 thingles (talk) 15:32, 15 April 2013 (UTC) PS: After I fix that, more new extension pages will be created and then the bad ones can be deleted.
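
A helper of the kind described in this thread (drop illegal characters, turn an equals sign into a dash) might look like the sketch below. It is not Bumble Bee's code. The function name `clean_extension_name` is hypothetical, and the set of illegal characters is an assumption: it uses the characters MediaWiki rejects in page titles, which may differ from the set the fix above actually filters.

```python
# Assumption: characters that MediaWiki does not allow in page titles.
ILLEGAL_CHARS = set("#<>[]{}|")


def clean_extension_name(name):
    """Drop illegal characters and replace '=' with '-'."""
    cleaned = []
    for ch in name:
        if ch == "=":
            # Equals is changed to a dash rather than dropped.
            cleaned.append("-")
        elif ch not in ILLEGAL_CHARS:
            cleaned.append(ch)
        # Illegal characters fall through and are silently dropped.
    return "".join(cleaned).strip()
```

Note that simply dropping brackets is what leads to the problem Ete reports: a name that was a wikilink containing a URL keeps the URL text in the resulting title, so link unwrapping has to happen before this step.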

Even more SMWInfo
I just saw an email on the semediawiki-user mailing list highlighting SMWInfo properties that were new to me. I tested the call on WikiApiary and indeed I get valid results (I'm running 1.9 alpha):

{
  "info": {
    "propcount": 1264452,
    "usedpropcount": 129,
    "declaredpropcount": "111",
    "proppagecount": 118,
    "querycount": "52833",
    "querysize": "51359",
    "conceptcount": "8",
    "subobjectcount": "72581"
  }
}

I'm going to be modifying User:Bumble Bee soon to collect these new items: querycount, querysize, conceptcount, subobjectcount.

Comparing these new values to Collect Semantic MediaWiki usage setup it looks like querycount and conceptcount are now in the new SMWInfo call, although the value I have for querycount is different than this. Querysize looks to be a sum of all sizes. Subobjectcount is a welcome addition that I didn't have in mine. Thanks to mwjames for adding these in response to bug 46458! 🐝 thingles (talk) 15:11, 15 April 2013 (UTC)


 * Okay, I just pushed a change (commit and commit) for User:Bumble Bee to request and store the four new stats that Extension:Semantic MediaWiki 1.9 returns. Here is a screenshot to show this in the apiary_db. This isn't available for graphing yet. The PHP data accessors for the graphing need to be completely rethought and merged into one script. That can come later; at least the data is getting collected now. 🐝 thingles (talk) 01:26, 16 April 2013 (UTC)
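
One detail worth noting in the SMWInfo response quoted in this thread is that several counts arrive as JSON strings ("52833") while others are plain numbers. The sketch below shows one way a collector could pick out the four new items and normalize them to integers. It is an illustration in Python, not the code from the commits mentioned above; the names `NEW_STATS`, `SAMPLE_RESPONSE` and `extract_new_stats` are made up for the example, and the sample data is copied from the response shown earlier.

```python
import json

# The four items named above as new additions to SMWInfo.
NEW_STATS = ("querycount", "querysize", "conceptcount", "subobjectcount")

# Sample taken from the response quoted in this thread.
SAMPLE_RESPONSE = """
{"info": {"propcount": 1264452, "usedpropcount": 129,
          "declaredpropcount": "111", "proppagecount": 118,
          "querycount": "52833", "querysize": "51359",
          "conceptcount": "8", "subobjectcount": "72581"}}
"""


def extract_new_stats(response_text):
    """Return the new SMWInfo stats as integers, skipping absent keys."""
    info = json.loads(response_text).get("info", {})
    # int() accepts both 129 and "129", so mixed types are normalized.
    return {key: int(info[key]) for key in NEW_STATS if key in info}
```

Skipping absent keys means the same function can be run against wikis on older Semantic MediaWiki versions that do not report these items.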



No more Oakleys or Raybans
You may have noticed some spam user accounts being created matching "Rayban*" and "Oakley*". I just modified the wiki to deny any registration starting with those strings.

function DenyRegistrationByUsername( $user, &$message ) {
    // getName() is a method on the User object, so it needs parentheses.
    $username = $user->getName();
    // Reject any username that starts with one of the spam prefixes.
    if ( preg_match( '/^Rayban/', $username ) || preg_match( '/^Oakley/', $username ) ) {
        $message = 'The username ' . $username . ' is banned on this wiki.';
        return false;
    }
    return true;
}
$wgHooks['AbortNewAccount'][] = 'DenyRegistrationByUsername';

If you see anything weird let me know. I did a test and confirmed it works. If other patterns show up I can easily modify. The next step might also be to feed these to fail2ban and block the IP addresses that attempt to register these. 🐝 thingles (talk) 03:39, 19 April 2013 (UTC)

SMW changes for better stats!
Just FYI, check out Bug 46458. MWJames has made some more changes in SMWInfo that will enable even more stats for Semantic MediaWiki sites! So awesome! 🐝 thingles (talk) 14:07, 19 April 2013 (UTC)

Suspend a site?
As I've watched some  activity and looked at sites that are in error myself, I've been thinking that there might need to be something less severe than marking a site as defunct. For example, User:Kghbln marked BromWiki (en) as defunct, appropriately so, as their API is returning PHP errors. However, the wiki itself is up. Marking it defunct is a reasonable thing to do, since otherwise this wiki will generate errors for the foreseeable future. This makes me wonder whether there should be an option to suspend a site.

For example, with BromWiki (en) another option would be to suspend checking the site for 7 days. Or 14 days. Or 2 months. Then let User:Bumble Bee check again and see if they have fixed things. Marking as defunct will take the site out forever. Perhaps this is too complex? Another approach would be to have User:Audit Bee check defunct sites very infrequently to see if they have "unfunct" themselves. I've got reservations about doing that though. Thoughts? 🐝 thingles (talk) 11:31, 21 April 2013 (UTC)


 * I think the "active" marker could serve this purpose. As soon as I uncheck "active", Bumble Bee should stop checking the website for x days, probably a fortnight. "Defunct" is indeed more for wikis which are no longer there at all. These could also be revisited infrequently to be sure, but we should not worry about them too much. --[[kgh]] (talk) 11:23, 22 April 2013 (UTC)


 * Yep, I think splitting defunct is a good idea. Additionally, perhaps the split can be automated, so long as the bot can load up the main page and check if it gives an error. API not working->check main page, if only API is down mark as API unavailable, those with both down marked as down. Then have the sites with issues checked once every 1 hour*number of errors in a row^2 (or similar formula, perhaps using the time since last working to avoid needing to record number of errors), so a site would still get rechecked automatically a few times after being taken down, but would not be constantly checked once it's been down for a while.--ete (talk) 15:47, 22 April 2013 (UTC)
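
The recheck schedule proposed in the last reply can be made concrete with a small sketch. This is only an illustration in Python of the formula as written in that comment, read as 1 hour × (number of errors in a row)². The other reading, where the whole product is squared, gives the same numbers when the unit is one hour. The function name `recheck_delay` is hypothetical.

```python
from datetime import timedelta


def recheck_delay(consecutive_errors):
    """Delay before the next check: 1 hour * (errors in a row) ** 2."""
    if consecutive_errors < 1:
        # No failures recorded, so no extra delay is applied.
        return timedelta(0)
    return timedelta(hours=consecutive_errors ** 2)
```

Under this reading a site is rechecked 1 hour after its first failure, 4 hours after the second and 9 hours after the third. After 24 failures in a row the gap has grown to 576 hours, i.e. 24 days, so a long-dead site is polled only rarely without ever being dropped completely.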

Welcome Backup Bee!
You may have noticed that I added some new properties related to backing up websites. This is all very experimental! This morning I hacked on User:Backup Bee for a while and I have him to a functional state for at least one backup type. Take a look at his code if you wish. Comments welcome. Only the "Snapshot (text)" backup option is supported right now. This bot will only run against a site once a week using the Current day by hour segments groupings, see Backup schedule for just backups. You can also see Backup Bee's log file. I've got this bot running in a debug mode right now to see how he is going. You'll see him write to a "Backup log" subpage on wikis that he backs up to (e.g., Wiki thing/Backup log). Would anyone be willing to volunteer to test a restore of the dump file? Let me know if you would do that. Please do not enable this on a wiki with more than 10,000 or so pages right now. Exciting stuff!

''Huge credit to wikiteam who made their dumpgenerator.py code available. This is doing the hard work with User:Backup Bee just directing it.''

🐝 thingles (talk) 16:38, 21 April 2013 (UTC)


 * Very cool. I may be able to test backups at some point (and would be fine with data from my wikis being used for any tests), but don't have shell access to either wiki right now (should get it for the larger wiki soon, but it has 21.6k pages) and have a few other tasks to do first once I get shell. Would it be okay to enable backups for my smaller (1.6k page) wiki anyway?--ete (talk) 15:47, 22 April 2013 (UTC)


 * Yeah, go ahead and enable it on the small ones. I need more wikis to test with. Note that only "Snapshot (text)" works right now so pick that. 🐝 thingles (talk) 17:49, 22 April 2013 (UTC)


 * I could do tests with CAcert in Berlin using SMW. --&#91;&#91;kgh&#93;&#93; (talk) 14:40, 23 April 2013 (UTC)


 * CAcert in Berlin has been backed up for the first time, see CAcert in Berlin/Backup log. The resulting file is less than 100k, and you can download it here. Please share how it goes. 🐝 thingles (talk) 20:39, 23 April 2013 (UTC)


 * Some wikis and farms will have their own regular backups; perhaps a syntax for linking for them might be useful? It would probably not be feasible to backup larger sites remotely on a regular basis, nor do I think many administrators would welcome such attempts. GreenReaper (talk) 20:14, 23 April 2013 (UTC)

Planning upgrade to MediaWiki 1.21
I'm crazy excited to get upgraded to MediaWiki 1.21. I'm considering installing the 1.21rc4 candidate. I'm particularly looking forward to diving into Extension:Scribunto. I think WikiApiary may benefit from some Lua capabilities. Any comments on the upgrade? Have any of you done it? I know from Statistics that only 3 sites are using 1.21rcX so I'm guessing nobody here has done it yet. :-) 🐝 thingles (talk) 16:55, 22 April 2013 (UTC) PS: It's such a bummer that over 60% of the wikis monitored are 1.17.x and older. :-\


 * In case you cannot wait until mid-May you could try to install MW 1.21. WMF is already beyond this RC so ... Cheers --[[kgh]] (talk) 14:46, 23 April 2013 (UTC)


 * I'd just grab the REL1_21 branch from git using something like:

git clone https://gerrit.wikimedia.org/r/p/mediawiki/core.git
cd core
git checkout -b REL1_21 origin/REL1_21
 * You can then use  to keep it up to date. When REL1_22 is out, check that out instead.
 * As for 1.17, I'm guessing many people are happy with it. It works well, there have been few compelling reasons to upgrade, and you have to switch from SVN to Git, which is an additional hassle along with the normal extension and patch tweaks. Incidentally, the "1.17.x and lower" graph on the statistics page seems a little off - I think you need to adjust it to order 1.9 below 1.17 (and 1.16, etc. - I'm sure there are plenty on older versions that aren't represented in the graph; clearly lots are still on 1.9!). GreenReaper (talk) 19:45, 23 April 2013 (UTC)

Changed account creation from Questy to Recaptcha
WikiApiary (and all of my wikis) have recently had a big jump in spambot registrations. For now, I decided to switch to using reCAPTCHA on account registration. I've left everything else as it was, so as soon as a user performs email confirmation they will not need to solve a captcha. Effectively, this means that the captcha is only for registration. If you see any issues please let me know. If you think reCAPTCHA is a terrible solution I'm all ears on other approaches. 🐝 thingles (talk) 11:54, 23 April 2013 (UTC)


 * Hmm, I am a bit worried about reCAPTCHA. So far I have had much better results with Questy. Probably changing and hardening the set of questions is a way, e.g. "Type in the third letter of the third word." or "Enter the result of five plus three instead of 8+9" or "Type in ZGL in reverse order" and nasty things like this. Cheers --[[kgh]] (talk) 14:44, 23 April 2013 (UTC)


 * I saw two registrations come through after ReCAPTCHA. So, I went back to Questy but am trying something dynamic. Using these two for now to see how it works out.

$wgCaptchaQuestions[] = array(
    'question' => "What day of the week is it at Greenwich Mean Time (GMT) right now?",
    'answer' => gmdate("l")
);
$wgCaptchaQuestions[] = array(
    'question' => "In 24-hour format, what hour is it in Greenwich Mean Time (GMT) right now?",
    'answer' => gmdate("G")
);


 * Thoughts? I'll probably add some spelled-out-number ones as well. 🐝 thingles (talk) 15:08, 23 April 2013 (UTC)


 * Also be sure you're making full use of DNSBLs. WikiFur currently uses  (our server is in Europe, you might want to reorder these for shortest ping times from your server). GreenReaper (talk) 19:30, 23 April 2013 (UTC)


 * Thanks for the suggestion. I do the stopforumspam import weekly. I just added and enabled the $wgDnsBlacklistUrls too. I've probably put in enough stuff now that I should just sit tight and see what happens. :-) 🐝 thingles (talk) 20:29, 23 April 2013 (UTC)


 * I also just added another one that I think will prove very effective.

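// Random 8-character hex string that is shown in the question.
$myChallengeString = substr( md5( uniqid( mt_rand(), true ) ), 0, 8 );
// 1-based position of the character the user has to provide.
$myChallengeIndex = rand( 0, 7 ) + 1;
$myChallengePositions = array( 'first', 'second', 'third', 'fourth', 'fifth', 'sixth', 'seventh', 'eighth' );
$myChallengePositionName = $myChallengePositions[$myChallengeIndex - 1];
$wgCaptchaQuestions[] = array(
    // The question has to display the sequence, otherwise it cannot be answered.
    'question' => "Please provide the $myChallengePositionName character from the sequence {$myChallengeString}:",
    'answer' => $myChallengeString[$myChallengeIndex - 1]
);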


 * I tested this and it seems that Extension:ConfirmEdit works fine with the answer being dynamic like this. If you are so inclined I'd love people to create some accounts to make sure that I haven't broken anything in the process of making this much stronger. Just share the username so the account can be deleted. My testing on another wiki with these rules suggests they work fine. Questy appears to store the answer each time so the randomness appears to work fine. 🐝 thingles (talk) 20:29, 23 April 2013 (UTC)