WikiApiary talk:Operations/2013/March

API Redirects
Had a problem in segment 8 with Wikimedia Labs. The site was issuing a redirect to Wikitech and messing up the API request in the process, resulting in a return value that was causing User:Bumble Bee to throw an exception. I've deactivated Wikimedia Labs since Wikitech already existed. I've also made a note that User:Bumble Bee should not follow redirects. I'll add some code so that the bot issues a warning if it is given a redirect, and skips the site. Thingles (talk) 15:10, 1 March 2013 (UTC)
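
A minimal sketch of the "do not follow redirects, warn and skip" idea described above, using only the Python standard library. The names `NoRedirect` and `fetch_api` are hypothetical and are not taken from the actual Bumble Bee code; the sketch only shows one way a bot could notice a redirect instead of parsing whatever the redirect target returns.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so a 3xx surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "no follow-up request"; urllib then falls
        # through to its default error handler, which raises HTTPError.
        return None


def fetch_api(url, opener=None):
    """Return ("ok", body), ("redirect", target) or ("error", reason)."""
    opener = opener or urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=15) as resp:
            return ("ok", resp.read())
    except urllib.error.HTTPError as err:
        if 300 <= err.code < 400:
            # Caller can log a warning and skip the site.
            return ("redirect", err.headers.get("Location"))
        return ("error", "HTTP %d" % err.code)
    except urllib.error.URLError as err:
        return ("error", str(err.reason))
```

A caller would treat a `"redirect"` result as "log a warning with the target URL and move on to the next site".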

Audit Bee activated
User:Audit Bee was activated this morning and is auditing 10 websites at the beginning of each hour. We'll see how this works, deal with any bugs, and depending on how it goes may increase the rate of audit. Thingles (talk) 15:51, 2 March 2013 (UTC)


 * Early runs look fine, audit bee seems to be doing a good job. I'm increasing the audits to 10 sites every 30 minutes. Thingles (talk) 19:24, 2 March 2013 (UTC)


 * Things seem to be going okay and I'm not super patient so I've updated it to run every 15 minutes. That seems like a fine state to work through the thousands of sites in the backlog. Bumble Bee instances don't get overly backed up and it staggers sites on different intervals for updating. Thingles (talk) 02:38, 3 March 2013 (UTC)

Some sites activated by audit bee wrongly
I just fixed a bug in audit bee diff that was causing it to activate sites that it failed to audit. This is now resolved. Expect some errors to show up in Bot log for sites that were activated. You can tell that this was the case because audit bee activated and validated them, but did not set them as audited. Thingles (talk) 17:12, 3 March 2013 (UTC)

More than 1M property values set
Today WikiApiary exceeded the 1,000,000 property values mark and is now one of the 10 biggest known Semantic MediaWiki installations. --[[kgh]] (talk) 23:25, 7 March 2013 (UTC)



New status: Defunct
There is a new flag for websites, Property:Is defunct. This flag is used to indicate that a site should not be checked by User:Audit Bee (or frankly any robots). The Web site is probably no longer available. It may also be used to indicate that the API is defunct: where a working wiki is not allowing API access, use defunct to get rid of the errors in Bot log from attempting to connect to it. Thingles (talk) 20:42, 9 March 2013 (UTC)

First step to new graphs
I took the first step to getting the new graphs online. My plan is that websites will have three graphs displayed, and the user can control the defaults for those three slots as well as change them and create a new window to leave a graph up on the screen. To do this, I've wired Widget:Website graphs to use the Javascript code that is in the WikiApiary code repo. I also added the PHP, HTML and JS stuff into  in that repo (previously this only had the Python robots in it). If you're reading this and feeling like making the PHP and JS better, your help would be greatly appreciated.

A nice improvement from this is that the graphs on the Web site page resize with your browser. Finally. :-) Thingles (talk) 20:45, 9 March 2013 (UTC)

Initial audit complete!
All sites have been audited (or marked defunct)!

Completed audit 0 sites 0 succeeded 0 failed

Audit Bee will continue to run every 30 minutes to conduct new audits (Concept:Websites never audited) as well as refresh expired audits (Concept:Websites expired audit). I modified the script though so that it will make no entries in Bot log if there are no audits performed (previously it reported it did 0 audits, as shown above). Thingles (talk) 13:02, 10 March 2013 (UTC)

Tried Semantic Drilldown
You may notice I did some activity in the Filter namespace trying to get Extension:Semantic Drilldown working. I added two filters (Filter:Defunct and Filter:Bot segment) and even set values for them, but it seems that Drilldown still queries the database and every property to build the user interface? I'm not sure, it just times out now. I'm actually not even sure that Drilldown works with SQLStore3 and the newest Extension:Semantic MediaWiki so it may all be a moot point. It would be a nice way, particularly for operators, to slice and dice the wikis. I'm going to leave it be for now. Thingles (talk) 14:01, 10 March 2013 (UTC)

DST Rippling through
This is interesting. Now that DST has flipped, we are seeing timeoffset values all getting updated. example Thingles (talk) 03:19, 11 March 2013 (UTC)

Bumble Bee Managing Error State Completely
User:Bumble Bee is now properly updating Property:In error when he detects a website is having issues. He will also increment that error status on subsequent failures, and finally clear the error when the site is responding again. You will see messages in Special:RecentChanges like:


 * recording error (example)
 * incrementing error count to 5 (example)
 * clearing error (example)

The intended behavior is that when Bumble Bee fails to get a response the first time, he will set Property:In error to true and Property:Has error count to 1. Property:Has error message will be set to the error message and Property:Has error date will be set to the current time in UTC.

On each repeated attempt to talk to that site that generates an error Bumble Bee will update Property:Has error count and Property:Has error message. He will not update Property:Has error date. That date is intended to be when the error event started.

When (or if) Bumble Bee does successfully reach the site on another attempt, it will set Property:In error to false, but will leave the other fields as they are so that operators can see that information as they wish. But the flag for being in error state is false. If this cycle repeats, the fields will be reset for the new error.

Or at least that is how it is supposed to behave. If you see otherwise, please let me know. The code to do this is in this commit.
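
The error-state rules described above can be sketched as a small state update. This is only an illustration under stated assumptions: the function name `record_result` and the use of a plain dict are hypothetical, and the real logic lives in the commit mentioned above. The dict keys mirror the property names used on the wiki.

```python
from datetime import datetime, timezone


def record_result(props, error_message=None, now=None):
    """Update a site's error properties after one collection attempt.

    `props` is a plain dict keyed by property name; `error_message` is
    None when the site responded normally.
    """
    now = now or datetime.now(timezone.utc)
    if error_message is None:
        # Success: clear only the flag, keep count/message/date for operators.
        props["In error"] = False
    elif props.get("In error"):
        # Repeated failure: bump count and message, keep the original date.
        props["Has error count"] = props.get("Has error count", 0) + 1
        props["Has error message"] = error_message
    else:
        # First failure of a new error event: reset all fields.
        props["In error"] = True
        props["Has error count"] = 1
        props["Has error message"] = error_message
        props["Has error date"] = now
    return props
```

Keeping the date untouched on repeated failures is what lets an operator read Property:Has error date as the start of the outage.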

This is a pretty huge milestone. Now User:Audit Bee is doing all auditing and activating, and User:Bumble Bee is indicating errors and clearing them. The only remaining step is for User:Audit Bee to deactivate a site that has been in error for too long (2 weeks?).

Thingles (talk) 02:05, 12 March 2013 (UTC)


 * After the excitement of a nasty bug that put Bumble Bee in a total tailspin it's cool to see that the sites in error count has dropped from 209 to 104 from the clearing of error status when contact is re-established! Thingles (talk) 04:34, 12 March 2013 (UTC)


 * 84 now. Thingles (talk) 05:30, 12 March 2013 (UTC)


 * Well, this is just fantastic, it means that we can feed the bees almost unlimitedly and increase the honey production at will! The human work will still be manageable.
 * I think it's time to link this from the extension pages on mediawiki.org; if the 504 errors below are temporary and we don't risk killing the site, I'll do it shortly. --Nemo 06:46, 12 March 2013 (UTC)

Skin collection disabled
I've temporarily disabled skin collection because the randomized order was causing unnecessary revisions. I'll bring it back online when I've implemented sorting to ensure that we won't get thousands of silly edits. Thingles (talk) 03:43, 12 March 2013 (UTC)
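
A rough sketch of the sorting fix mentioned above: if the skins are always written in a sorted order, the generated page text only changes when the set of skins actually changes. The function name `render_skins`, the assumption that each skin is a dict with a `code` key, and the bullet-list output format are all hypothetical.

```python
def render_skins(skins):
    """Render skin records as text lines in a stable, sorted order."""
    # Sort case-insensitively so the output does not depend on the order
    # in which the remote wiki happened to return its skins.
    ordered = sorted(skins, key=lambda skin: skin.get("code", "").lower())
    return "\n".join("* %s" % skin.get("code", "") for skin in ordered)
```

Comparing the rendered text with the current page text before saving would then skip the edit entirely when nothing changed.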

Fixed major bug
Wow. If you were just trying to use the site in the last hour it was terrible. I introduced a bug in my recent bot work that caused Bumble Bee to no longer update the timestamp that it last checked a Web site for stats. After a while it was trying to update nearly all websites every time it ran! Bad! This commit fixes the problem. Whew! Will take a few runs to normalize but will return to good behavior shortly. Thingles (talk) 04:16, 12 March 2013 (UTC)
 * I got an error 504 on a page, are things still normalising? --Nemo 06:39, 12 March 2013 (UTC)
 * It took a couple of hours for things to normalize. It's possible that was still while you were looking at it. Here is what happened to CPU during that period. Note graph time is US/Central. Thingles (talk) 14:57, 12 March 2013 (UTC)
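
To illustrate the kind of bug described in this thread, here is a minimal sketch of a "is this site due for a check" loop. It is not the actual Bumble Bee code; `is_due`, `run_once`, and the `last_checked` / `interval` keys are hypothetical names. The point is the single line that records the check time: if it is skipped, every site looks due on every run.

```python
from datetime import datetime, timedelta, timezone


def is_due(last_checked, interval_minutes, now):
    """A site is due if it was never checked or its interval has elapsed."""
    if last_checked is None:
        return True
    return now - last_checked >= timedelta(minutes=interval_minutes)


def run_once(sites, now, collect):
    """Collect stats for every due site and return the names checked."""
    checked = []
    for site in sites:
        if is_due(site.get("last_checked"), site["interval"], now):
            collect(site)
            # Forgetting this update makes the site due again on every run.
            site["last_checked"] = now
            checked.append(site["name"])
    return checked
```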



Linked to from Mediawiki.org!
Mediawiki.org extension pages are now directly linking to WikiApiary extension pages to show usage! (see diff) Very exciting and big thanks to Nemo_bis for putting this in place! Thingles (talk) 17:21, 14 March 2013 (UTC)


 * FYI, I added some nice logic in MediaWiki:Noarticletext to deal with cases where the extension pagename on Mediawiki.org doesn't match the name that the extension itself reports. It will now show list of matches if there is no direct match using the URL that the extension has declared for itself. To see this, go to http://www.mediawiki.org/wiki/Extension:Uniwiki_Layouts or http://www.mediawiki.org/wiki/Extension:NaturalLanguageList and click on the "Check usage" link that goes to WikiApiary. Thingles (talk) 20:09, 14 March 2013 (UTC)


 * Yeah, thanks go to Nemo! This will be of great benefit for the MediaWiki community (users, admins, developers). I have already seen your Noarticletext tweak in action. Great! --[[kgh]] (talk) 20:25, 14 March 2013 (UTC)

Extension Name Conflict
I could use some feedback on this. When debugging some linking from Mediawiki.org on extensions I was looking at the Google Maps extension. If you click through to WikiApiary, it takes you to Extension:Google Maps, but this isn't right. There is another extension using that name. The real Google Maps extension is at Extension:Google Maps Extension. Since the page exists, the helpful searching doesn't help out. And since there is a page I can't just redirect it without blowing away the existing page (which granted, is a non-real extension used on a single wiki). I expect this will happen rarely, but will from time to time. Thoughts on how to deal with this? Disambiguation page or assume that people want the real Extension:Google Maps Extension and send them there? Thingles (talk) 20:13, 14 March 2013 (UTC)


 * There are several examples for this along the way. This is one of the reasons why I introduced notes for extensions. In this case the note could be: "Do not confuse this extension with the Google Maps Extension". --[[kgh]] (talk) 20:22, 14 March 2013 (UTC)


 * Nemo_bis was wise enough to put an override parameter. So, I fixed this example with this diff. It doesn't alleviate the issue around name conflicts, but at least now people that click from mw.o will be sent to the right place. We should keep this in mind as mapping is tweaked, we can control the inbound URL. Thingles (talk) 20:32, 14 March 2013 (UTC)


 * Edit conflict: The same problem is sometimes around with different names of the extension in the respective repo. This was however not really a true name conflict. A true one would have been if there is the same name for two different extensions. I think I remember three or four cases of this ugly kind. But again we should probably use the notes there too since it is not a widespread problem. --[[kgh]] (talk) 20:41, 14 March 2013 (UTC)

 * You can consider identifying extensions by their URLs when provided, not names. For example, Extension:EmbedVideo (http://traditio-ru.org/wiki/EmbedVideo) and Extension:MathJax (http://traditio-ru.org/wiki/MathJax_for_MediaWiki) used by Traditio are not mw:Extension:EmbedVideo and mw:Extension:MathJax. The former forked long ago, the latter has been independent from the beginning. Alexander Mashin talk 02:45, 19 March 2013 (UTC)
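
One way the URL-based identification suggested here could look, as a sketch only. `extension_key` is a hypothetical helper, not part of any existing bot: it prefers the URL an extension declares for itself and falls back to the name when no usable URL is given.

```python
from urllib.parse import urlsplit


def extension_key(name, url=None):
    """Build an identity key from the declared URL, else from the name."""
    if url:
        parts = urlsplit(url.strip())
        if parts.netloc:
            # Host plus path, ignoring scheme and any trailing slash.
            return parts.netloc.lower() + parts.path.rstrip("/")
    return name.strip().lower()
```

With this, two extensions that both call themselves "EmbedVideo" but declare different home pages end up with different keys.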

Extension Version Details
I've added a new function to Bumble Bee (diff) that will attempt to determine version details for extensions. You'll see these new fields coming into the  pages over the next few days. Here is a diff as an example. These are not currently being picked up by Template:Extension in use, I'll do that later. Thingles (talk) 12:21, 15 March 2013 (UTC) After watching the Lua talk last night I was dreaming of not doing this in Bumble Bee, and being able to do it in a Lua module, but I'll have to wait a bit for that. :-)

Whacky extension name
Just saw this on Elder Scrolls Wiki/Extensions:

Ugh. Going to need to modify User:Bumble Bee to unpack that stuff. If anyone sees other stuff like that in extension names let me know. I already unpack this kind of thing for author names. Thingles (talk) 01:29, 18 March 2013 (UTC)


 * Wikia is an expert for this. I guess there are about 10 to 15 of this kind there. Basically this weirdness may happen with every field for extension. I have also seen people linking versions to some page. My suggestion would be to let Bumble Bee just ignore this and import nothing when it comes to names, versions (and credits). --[[kgh]] (talk) 14:18, 18 March 2013 (UTC)

Problems with Form:Website
Special:RecentChanges shows a lot of issues with the website form. I keep getting an error message when saving modifications to these sites and having to do a raw edit. Very confused why this just started happening. Any info or ideas? Please share. Thingles (talk) 10:51, 18 March 2013 (UTC)
 * Sorry for not checking the output of the form; I filled the required fields (correctly, I believe) and didn't see any error or warning. --Nemo 11:24, 18 March 2013 (UTC)
 * You shouldn't have to check the output. :-\ Thingles (talk) 20:19, 18 March 2013 (UTC)
 * Hmm..., so far I am not experiencing any problems with the website form and my results seem fine. --[[kgh]] (talk) 14:13, 18 March 2013 (UTC)
 * Me neither, I've used the form multiple times and it's fine but we seem to be the exception. Thingles (talk) 20:19, 18 March 2013 (UTC)


 * This is more and more concerning. I'm not sure why this happens with all the new users? In fact, the first pages that they are saving shouldn't even be possible. Fields that Semantic Forms says are mandatory are not present. [[kgh]], would you mind doing a test and creating a new account and trying to add? I'll try with my User:Jthingelstad account later. See these two page histories and the errors: history for Alpha Centauri Wiki and history for Bulbapedia. I thought maybe this was happening because people hadn't confirmed their email? But User:Nemo_bis has the same behavior and he's been around and long confirmed. Thingles (talk) 20:19, 18 March 2013 (UTC)
 * Hi, I've had this three times and can give any information about what I've been doing you think will be helpful to debug. I'm on Firefox and email confirmed, the first site I uploaded an image and thought that was the problem, but apparently not. Everything seems to be going smoothly, enter info to form, hit save, then you even see the correct template code (with urls, settings, etc) and the option to save again, but when you hit save the second time it takes you to a numbered page with an empty template, none of the info you put in before. hmm.. I think it may well be the captcha. It blocks the original save, so you see the template, then the secondary save is no longer pointed at the right page? And that would explain why you guys can't reproduce it.--Ete (talk) 01:15, 19 March 2013 (UTC)
 * Aha! I bet it is the (damn) captcha. Are you answering the Captcha? And if so, what answer you giving? I’m guessing the captcha is perhaps case sensitive and not liking your response? Maybe I should only have the captcha for account registration? Thingles (talk) 01:24, 19 March 2013 (UTC)
 * I am answering it correctly ("wikiapiary", I'm assuming that's right because it lets me save with that), but it is only offered after the first save, by which time I think the normal saving process is interrupted. Try switching to captcha for account making only, that seems likely to solve it. I'll be around to test tomorrow if no one else has by then.--Ete (talk) 01:30, 19 March 2013 (UTC)


 * Okay. I just updated the settings with:

# Skip CAPTCHA for people who have confirmed emails
$wgGroupPermissions['emailconfirmed']['skipcaptcha'] = true;
$ceAllowConfirmedEmail = true;


 * So, you shouldn't be required to captcha with confirmed email. If you can try adding a page and see what happens that would be great. Thingles (talk) 01:44, 19 March 2013 (UTC)
 * Note that this is not sustainable for long, spambots are able to confirm email. When it's confirmed being the captcha, please file another bug... --Nemo 06:29, 19 March 2013 (UTC)
 * Bug 46342 filed. 🐝 thingles (talk) 01:30, 20 March 2013 (UTC)
 * Tested, it's working now. The captcha should be made available on the form page to fix this, or perhaps use some other method of preventing spambots (an unchanging question is also not super secure).--Ete (talk) 17:06, 19 March 2013 (UTC)

Handling Flapping Sites
Now that User:Bumble Bee is updating error status of sites as it happens, we are seeing a number of sites that flap in error state a lot. See the history for Bionic Commando Database for example. It is in error as much as it is out. I've considered adding another property for the total number of errors a site has had ever, so we can easily handle sites that flap a lot. This isn't a huge problem, it just means there are a lot of edits. Some ideas:


 * I could back-off the collection frequency as error rate goes up. (double check every time for each error?)
 * I could track the total number of errors and then sites that flap just are deactivated.

Other ideas? Thoughts? Thingles (talk) 18:25, 18 March 2013 (UTC)
 * hm, backing off collection rate would still mean roughly the same % of checks to those sites would end up being labeled as in error. And deactivating statistics for sites like that seems unnecessary, unless they're consuming an unreasonable amount of bot time. The statistics are still useful even if a large portion of attempted collections don't work.


 * The edits don't seem like a big enough problem to do much about, but perhaps to reduce them if a site goes down often but repeatedly comes back up, set a property (unstable site? flapping site?) which indicates that the bot should not report errors for it with an edit, and only mark the site as inactive if its statistics are not updated for several days?--Ete (talk) 21:13, 20 March 2013 (UTC)
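
For reference, the first idea in this thread (backing off the collection frequency as errors accumulate) could be sketched as a doubling interval with a cap. The function name, the doubling rule, and the one-day cap are assumptions for illustration, not a description of how Bumble Bee schedules checks.

```python
def next_interval(base_minutes, consecutive_errors, max_minutes=1440):
    """Double the check interval per consecutive error, up to a cap."""
    return min(base_minutes * 2 ** consecutive_errors, max_minutes)
```

With a 15 minute base, three consecutive errors would push the next check out to 120 minutes, and the interval drops back to 15 as soon as the error count is reset. As noted above, this reduces the number of edits but not the share of checks that fail.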

Double the cores
Linode announced all servers now have 8 cores (after a required reboot). I did a quick reboot and the host that WikiApiary is on now has a full 8 cores. I will be very curious to see how this affects WikiApiary. User:Bumble Bee uses a very parallelized approach to doing work, so these additional cores may help a lot. I'm not sure if things will seem snappier, but if they do let me know. I'll look at graphs in a few hours to see how things look and share any news. 🐝 thingles (talk) 20:29, 19 March 2013 (UTC)

So far it looks like there is just more idle to use up in the future. I would think that there should be less contention though which may result in better perceived performance. I'm curious if any of you using WikiApiary find it fast, slow or just normal? 🐝 thingles (talk) 01:19, 20 March 2013 (UTC)


 * Cool, should help as this site tracks more wikis. WikiApiary already seems pretty fast to me, though you know what would be a good place to check? WikiApiary :). Maybe it'd be cool to have the mean or mode response time for the last week along the top bar with Edit Index and other useful metrics, and a little thing saying how this compares to the other wikis? "This wiki responded faster than 83% of tracked wikis in the last week".--Ete (talk) 21:20, 20 March 2013 (UTC)


 * This indeed helped a lot. Editing pages is much faster now, even on transatlantic lines. :) --[[kgh]] (talk) 20:27, 26 March 2013 (UTC)

Order extensions alphabetically?
On the site display, it would be handy for comparing installed extensions between wikis if the extensions were sorted alphabetically.--Ete (talk) 16:29, 20 March 2013 (UTC)

Rerunning Pavlo's crawler
A large number of the wikis were imported from Pavlo's dataset on wikis generated in 2008, but I'm sure we'd be able to get a whole lot more if someone reran it now. The source code is freely available, but my terrible internet and lack of programming ability to update anything if it goes wrong means I'm badly placed to do this. It'd be excellent if the manual work of adding wikis could be mostly replaced with a bot, at least for a vast majority of wikis. thingles likes the idea, but seems to want to focus on improving other bits of WikiApiary himself, so maybe someone else is up for doing it?--Ete (talk) 23:13, 20 March 2013 (UTC)

Splitting Defunct?
It'd be useful, especially if the plan is to notify wiki managers of issues, to separate out sites which are entirely not working from those which just have API issues that prevent the bees from working. Sites which just have API issues we could notify by automatically editing a page (main page talk? we'd need to have a human help bypass anti-bot checks), and sites which are entirely unavailable would be listed separately (and if we have XML backups available (yay thingles likes that idea), listed as a site we can restore). It'd also give clarity for those editing properties, "Down" and "API unavailable" are much more clear than "Defunct" to me.--Ete (talk) 23:32, 20 March 2013 (UTC)

A clear link to this page/other main discussion hub
I find it annoyingly well hidden (operations too), putting a link in the sidebar or on the main page would be handy and help encourage users to get involved or at least see what's happening.--Ete (talk) 23:37, 20 March 2013 (UTC)


 * There is one to Operations in the Operators section of the sidebar. 🐝 thingles (talk) 01:38, 22 March 2013 (UTC)


 * Dynamic link for Ops Talk added. 🐝 thingles (talk) 02:23, 22 March 2013 (UTC)


 * Okay, that should help. I'd recommend having a link from the main page as well though.--Ete (talk) 17:24, 22 March 2013 (UTC)

Spammed
I think it'd be handy to be able to identify sites which are constantly spammed without control (stricter definition, 10+ undeleted week old clearly spam pages?) for notifications. This could be done by tag or operator flag, I'm thinking tag may be better but sometimes spammed sites may not show up as inactive simply due to spambot edits so they won't be automatically marked inactive (and maybe will be automatically marked active?). So maybe there's a case for operator flag and human checking every once in a while for whether there's any non-spam activity.--Ete (talk) 01:01, 22 March 2013 (UTC)


 * I would like to consider what if anything should be done with these. A perfect example of this is OLAP. This wiki is so horrendously owned by spammers. It looks like it's on fire with edits and activity, but it's a total mess. Would love to hear thoughts on this. Very related to this. I've been noodling having User:Bumble Bee pull the  log for WikiApiary sites (new users on OLAP) and put this in the Apiary DB. If I did this, I could probably start to automatically know if a site is spammed or at least develop a likelihood of that. 🐝 thingles (talk) 01:52, 22 March 2013 (UTC)


 * I noticed that today - almost all of the highly active (page/user change), smaller wikis are active due to spam. It's a shame! Needs something like the Google Webmaster Tools warnings you get for an insecure certificate. GreenReaper (talk) 03:39, 22 March 2013 (UTC)


 * My suggestion would be to have Notify Bee edit Talk:Main Page and/or email the first user created on the wiki, with some human help past any captchas (though sites which have major spam problems fortunately tend to be those without captchas, or with extremely simple math ones). It would inform them of the situation and give details on how to prevent spam (what to upgrade/configure, etc).--Ete (talk) 17:24, 22 March 2013 (UTC)

New feature: Semantic usage!
I added a new capability to WikiApiary this morning that I've been wanting since I very first started building this. Check out Collect Semantic MediaWiki usage setup. This generates graphs of SMW usage for a wiki. Check out the bottom four graphs for WikiApiary. The data just started collecting, it will be fun to see this grow. I also think this may be interesting to the SMW authors to see common result formats and how extensively queries are used in Semantic MediaWiki websites. 🐝 thingles (talk) 15:22, 23 March 2013 (UTC)


 * Okay, I changed this dramatically and moved away from YAML and am instead using JSON on the target pages. This is working much better already. I pinged User:Kghbln and User:Garrickvanburen to update the wikitext on their wikis. I updated Traditio myself since I can do that anonymously. I'll watch this. You can see the Bumble Bee code is cleaner now, and the page will be much less fragile to individual wiki idiosyncrasies. Thanks for testing this with me guys. All three of you implemented it and had three different problems. I'm hoping this version is much cleaner. 🐝 thingles (talk) 01:44, 27 March 2013 (UTC)

More resilient JSON parsing
I just put in a change for User:Bumble Bee that will make him more forgiving of the JSON returned from sites he talks to. Previously I was taking the API response in JSON and directly parsing it. I'm now running it through a regex (as you can see in the diff) that will look for a JSON block inside of the return. This will hopefully help out some of the  errors we see. Here is an example of what one of those returns was:

Warning: Cannot modify header information - headers already sent in /usr/share/mediawiki/includes/api/ApiFormatBase.php on line 107

{"query":{"statistics":{"pages":1161,"articles":274,"views":72987,"edits":7371,"images":247,"users":851,"admins":3,"jobs":0}}}

Warning: Cannot modify header information - headers already sent in /usr/share/mediawiki/includes/api/ApiMain.php on line 253

Warning: Cannot modify header information - headers already sent in /usr/share/mediawiki/includes/api/ApiMain.php on line 254

For better or for worse, with the changes I just put in this will now find the JSON block in the return and work without throwing an error. :-) PS: I had a bug in this code on the first run that marked a couple hundred sites in error but they are clearing out now. This change is big, since it's in the core library that the bots use. If you see anything suspicious please post here. 🐝 thingles (talk) 02:55, 27 March 2013 (UTC)
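
A minimal sketch of the "find the JSON block inside the return" approach described above. The regex and the name `extract_json` are assumptions for illustration; the actual pattern is in the diff mentioned in the post. This version greedily takes everything from the first `{` to the last `}` and hands it to the JSON parser.

```python
import json
import re

# Greedy match from the first "{" to the last "}" across line breaks.
JSON_BLOCK = re.compile(r"\{.*\}", re.DOTALL)


def extract_json(raw):
    """Parse the JSON object embedded in a noisy API response."""
    match = JSON_BLOCK.search(raw)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))
```

Applied to the example return above, this skips the PHP warnings and yields the statistics dict. It would still fail if the surrounding noise itself contained braces, so it is a forgiving fallback rather than a full parser.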