WikiApiary talk


From WikiApiary, monitoring the MediaWiki universe


First pull request!

A new milestone for WikiApiary. The very first pull request was made to some of the code on GitHub. Very small change, but great to see someone looking at it and contributing! Thanks Philip Becker! 🐝 thingles (talk) 17:01, 8 May 2013 (UTC)

Cool! --[[kgh]] (talk) 17:21, 8 May 2013 (UTC)

Upcoming server maintenance

Just a note to let people know that sometime soon I expect to move the database for WikiApiary to another server. Right now everything for WikiApiary and all of my other wikis runs on a single box and the load is getting a bit much. I expect the process to take about 2 hours during which I'll be setting $wgReadOnly and suspending all bots. 🐝 thingles (talk) 19:37, 8 May 2013 (UTC)

Status update on this. I have a 2nd host fully provisioned and set up in Linode now to move the database for WikiApiary (and my entire farm) to. Currently I have everything on a single host. The new DB host is using a 64-bit OS and I'm also using this as an opportunity to move from MySQL to MariaDB (see also Wikipedia adopts MariaDB). The host is set up, MariaDB is running and now I just need to migrate the databases and switch configs to use it. Of course, that's the delicate part. :-) 🐝 thingles (talk) 12:35, 11 May 2013 (UTC)
db server is running and active. MariaDB setup and running. I'm going to do a full export/import rather than simply copying the data files over so that I can also defragment the tables in the process. 🐝 thingles (talk) 19:56, 11 May 2013 (UTC)
I was happy to see that there are about five other SMW sites that are using MariaDB as well. I made Semantic statistics/Databases to check this out. Hopefully there aren't any surprises once I make the switch. 🐝 thingles (talk) 19:58, 11 May 2013 (UTC)
I am a bit surprised that Wikitech is not working with MariaDB - I think they soon will. I do not really expect problems at the moment. Presumably MySQL and MariaDB will diverge more and more with time, so issues may arise. Rather than working on SQLite support, the focus should be put on MariaDB as the second database system supported by SMW. However, I am sure that there was already mail on the dev-mailinglist. --[[kgh]] (talk) 20:52, 11 May 2013 (UTC)
I just finished moving the memcached service off of the webserver this morning to further separate things. Please let me know if you see any issues. 🐝 thingles (talk) 12:38, 18 May 2013 (UTC)

Extensions with protocol-relative URLs

Mark Hershberger mentioned that there were a number of extensions showing an error with their URL. It turns out a few extensions report their URL using a protocol-relative URL. I'll avoid editorializing on the wisdom of that and instead just put a handler for this in User:Bumble Bee. Now any extension URL that starts with "//" will have http: prepended to it. Note that this only impacts the Template:Extension in use template that is used to aggregate data for extensions. Mainly this change will just remove warnings; it won't change any information on extension statistics, counts or versions. When I put this fix in there were 606 Property:Has extension URL values flagged by Property:Has improper value for. Over the next day as Bumble Bee updates extension information this should drop dramatically. The current value is 12. 🐝 thingles (talk) 15:30, 10 May 2013 (UTC)

Example of this fix propagating. 🐝 thingles (talk) 15:33, 10 May 2013 (UTC)
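
A minimal sketch of the kind of handler described in this thread. The function name is hypothetical and this is not Bumble Bee's actual code; it only shows the rule stated above, that a URL starting with "//" gets http: prepended and everything else is left alone.

```python
def normalize_extension_url(url):
    """Prepend 'http:' to protocol-relative URLs; leave other URLs unchanged."""
    if url.startswith("//"):
        return "http:" + url
    return url


# Example inputs are made up for illustration.
print(normalize_extension_url("//www.mediawiki.org/wiki/Extension:Example"))
print(normalize_extension_url("https://example.org/extension"))
```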

Site seems really slow

I've been trying to add the Club Penguin Wiki Network wikis to WikiApiary, since I'm staff there.

However, the site is being very slow for me. I get about 40ms ping, so it can't be my internet connection, but the server. time curl tells me it took 4.199s one time for the site to load and then afterwards sticks around 2-2.5s.

Any other clues as to why it's being slow? I have a host I can recommend if you run out of choices, I use them for my wikis and they are great. --Tux (talk) 05:32, 11 May 2013 (UTC)

Heiya Tux, yeah there is a lot happening in the WikiAPIary making it slow. I guess this problem has already been identified and will be fixed. Cheers --[[kgh]] (talk) 07:36, 11 May 2013 (UTC)
Thanks for the note and I'm definitely working on this. There are three main issues causing slowness right now: MySQL and PHP5-FPM are fighting for CPU on a single box, at night (and notably while your edits were happening) the backups cause additional load and lastly User:Bumble Bee needs some more intelligence about not rewriting already saved data. I've provisioned the host to move the database over already. I'm taking a little bit more time to move to a fresh 64-bit distro for the new DB host and also would like to use MariaDB. But, I might try to finish this this weekend. Sorry for the issues, this is a top priority for me (and sometimes why you see less of me in Special:RecentChanges and rather in SSH windows and text editors. :-P). 🐝 thingles (talk) 11:57, 11 May 2013 (UTC)
I'm curious by the way what host you were referencing above? 🐝 thingles (talk) 11:57, 11 May 2013 (UTC)
Yeah, the wiki does seem a lot faster now (I was editing at around midnight after a really shoddy school party). By the way, it's RamNode but they might not like the CPU-intensive stuff you're doing right now. I had to configure nginx and php-fpm correctly in order to not make the CPU scream, but once I did things are very speedy and the performance is very consistent. --Tux (talk) 13:03, 11 May 2013 (UTC)
By the way, if you want to know what I mean, check any of the Brickimedia (or CPWN, though there's not a whole lot of data) wikis out. The spikes are from when one of the dumb "system administrators" disabled APC and such and I had to fix it. But once it was done, it was speedy once again. --Tux (talk) 13:08, 11 May 2013 (UTC)
If you look at the API response time of WikiApiary one can see that performance degraded step by step. While it would be very cool to have more power, performance is still ok for me since I am currently not power editing. :) --[[kgh]] (talk) 15:39, 11 May 2013 (UTC)
More power is coming soon kgh. db server is up and just needs data moved to it. 🐝 thingles (talk) 19:55, 11 May 2013 (UTC)
No worries. :) --[[kgh]] (talk) 20:42, 11 May 2013 (UTC)

New database online

I took about an hour of downtime this morning and put the new database server online. I decided to do a copy of the database over which means I still need to run an optimize. I've brought all the bots back online and everything looks okay. I'm guessing performance is going to be a bit variable for a bit while I tune things for this setup. I've given InnoDB a 1G buffer on the new server which should help a lot. And now with mysql not running on the webhost I should be able to increase the php5-fpm pool too, but I'm going to be very slow with that. 🐝 thingles (talk) 12:56, 12 May 2013 (UTC)

Maria Maria :) --[[kgh]] (talk) 16:42, 12 May 2013 (UTC)
I'm still poking at performance items (and plan on doing so in a bunch of areas for a bit). Anyone can see the performance metrics from the servers behind WikiApiary on my Munin page. If you see anything problematic please let me know. 🐝 thingles (talk) 21:30, 12 May 2013 (UTC)

Handling Error Information

I've been digging into performance and looking at what I might need to alter to get things faster, and keep them that way. A few weeks ago I made a change to record error information directly into the wiki and I think that was a mistake. It turns out that when collecting data from thousands of wikis it is very, very typical for many of them to not be responding at any given time. On many of these that results in as many as 9 or 10 SMW Ask API queries as well as some edits, sometimes multiple ones. If you look at Special:RecentChanges and show bots you can see that the vast, vast majority of edits are simply recording error state for websites. This was a mistake.

My plan is to modify the ApiaryDB and integrate this detailed error information into it. It is sort of there now in the generic form of the Bot log but that isn't actually connected to the sites themselves, like the statistics information is for example. Instead, I'm going to start recording collection errors in a manner more like the statistics data itself. I'll also pull the current data into SMW properties, just like I do right now with the usage information. This means it will sometimes be out of date, but I think that is fine.

The added benefit of this is that it will allow charting of errors, for what it's worth.

If anyone else is interested in helping hack on this Python stuff let me know. 🐝 thingles (talk) 21:24, 12 May 2013 (UTC)

I cannot hack but comment. I guess it is a good approach to switch this to the stats behaviour. All these edits are also unnecessarily inflating the database. So far I have used these error edits as a "problem notificator" since I watched these pages but I think that Notify Bee can step in here somehow. --[[kgh]] (talk) 07:12, 13 May 2013 (UTC)

Semantic Query Optimization?

I'm wondering if others know the answer to this optimization question. I'm doing a SMW Ask query that looks like this (note, this is Python code but it's easy to see the query itself):

        # Ask query: active, non-defunct sites plus the properties the bot needs
        my_query = ''.join([
            '[[Is defunct::False]]',
            '[[Is active::True]]',
            '|?Has API URL',
            '|?Check every',
            '|?Creation date',
            '|?Has ID',
            '|?In error',
            '|?Collect general data',
            '|?Collect extension data',
            '|?Collect skin data',
            '|?Collect statistics',
            '|?Collect semantic statistics',
            '|?Collect semantic usage',
            '|sort=Creation date',
        ])

My question is regarding the six different properties for collection. I've been considering modifying the wiki so that there is just one property, something like Collect flags and it would have multiple values set for each item to collect. Then on this query, I would just pull that one flag property and handle it appropriately. Would this make the query easier/faster for SMW to handle? 🐝 thingles (talk) 21:28, 12 May 2013 (UTC)

What about setting up fixed properties for the different collection flags? This should probably speed up things. However, the documentation on them does not seem to be up to date. Also this is not update proof. --[[kgh]] (talk) 07:06, 13 May 2013 (UTC) PS I will poke Nischayn to update the documentation.
I had totally forgotten about fixed properties (and I was so excited for them when they were on the roadmap!). It's not terribly confidence inducing to see you have to edit a file inside of SMW that has the comment @todo Move these to somewhere else? above it. :-) I think I'll avoid for now, but keep an eye on it. 🐝 thingles (talk) 21:40, 13 May 2013 (UTC)
That's what I meant with not update proof. :) Somehow these should be called via LocalSettings.php or even set there directly. Probably a tracking bug should be created in case there is not one about this already. --[[kgh]] (talk) 21:47, 13 May 2013 (UTC)
It would be really cool to have fixed properties here on WikiAPIary since this is a really interesting feature which should roll into general use sooner rather than later. Thus I have added an enhancement bug (48841) to move this a bit more to the front burner - hopefully. Cheers --[[kgh]] (talk) 16:03, 26 May 2013 (UTC)
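
For reference, a rough sketch of what the single-property idea raised at the start of this thread could look like on the Python side. The property name Collect flags is the one floated above; the flag values and the helper name are assumptions made up for illustration. It only shows the shape of the query and the handling code, and says nothing about whether SMW would answer it any faster.

```python
# Hypothetical: "Collect flags" is a multi-valued property that does not exist yet.
FLAGS_QUERY = ''.join([
    '[[Is defunct::False]]',
    '[[Is active::True]]',
    '|?Has API URL',
    '|?Check every',
    '|?Collect flags',
    '|sort=Creation date',
])

# Assumed flag values, one per item currently covered by a separate property.
KNOWN_FLAGS = (
    "general", "extension", "skin",
    "statistics", "semantic statistics", "semantic usage",
)


def parse_collect_flags(values):
    """Map the values returned for 'Collect flags' to a dict of booleans."""
    present = {value.strip().lower() for value in values}
    return {flag: flag in present for flag in KNOWN_FLAGS}


print(parse_collect_flags(["General", "Skin", "Statistics"]))
```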

Form editing slow

Just wanted to share that there seems to be a bug in SMW 1.9a that is causing generation of a lot of link tags on form edits. See bug 48486 for details. This does cause a notable delay when you hit edit with form. Hopefully it gets resolved quickly. 🐝 thingles (talk) 01:42, 15 May 2013 (UTC)

I noticed this was pretty slow. Good to know that this is being worked on. --[[kgh]] (talk) 19:29, 15 May 2013 (UTC)

MediaWiki 1.21 plans

1.21 should drop tomorrow. You can expect that I'll be upgrading WikiApiary (and my entire farm) very quickly after the release, within a day or two. When I do the upgrade I'm also planning on enabling Extension:Scribunto and Extension:Echo. I'm very curious to see if the Lua modules allow for some speedups and additional capabilities in certain areas. Just FYI. 🐝 thingles (talk) 01:43, 15 May 2013 (UTC)

Bah, release got pushed to this weekend. 🐝 thingles (talk) 16:59, 15 May 2013 (UTC)
Back to Life --[[kgh]] (talk) 22:16, 15 May 2013 (UTC)

Extension performance?

I'm starting to think that it's maybe an extension that is causing WikiApiary to be sluggish recently. I've poked at a lot of things and I'm confused by the performance. As kgh noted, clearly there were 2 times when performance took a stairstep up, and that doesn't match with any of my code changes. Is there a better way to assess extension impact other than just selectively disabling one by one? 🐝 thingles (talk) 13:44, 18 May 2013 (UTC)

Yeah, sometimes extensions may cause performance issues, but rarely with such behaviour as observed here. In the meantime the reason became clear. I think this debug setting is quite a nice one too in case you want to find out more: $wgDebugToolbar = true;. A goody from the 1.19 branch. :) --[[kgh]] (talk) 09:34, 19 May 2013 (UTC)
Wow! I wasn't aware of $wgDebugToolbar! That is great! Thanks for sharing that. Now, if there were a Semantic MediaWiki tab in the debug toolbar! :-) That would be amazing. 🐝 thingles (talk) 12:44, 19 May 2013 (UTC)

Using compressOld.php?

I'm wondering if people have any reasons to avoid mw:Manual:compressOld.php? I made the mistake a long time ago of using the mw:Manual:$wgCompressRevisions which broke Extension:Replace Text, but the compressOld.php has the advantage of being able to leave the most recent revision uncompressed, which is really the only one I ever care about using things like Replace Text with. It seems that running this consistently would keep the database a bit cleaner and faster, right? I did a test run on a couple of my wikis and verified that Replace Text still worked fine. 🐝 thingles (talk) 18:44, 18 May 2013 (UTC)

This is quite a common mistake every admin makes. People find out too late about this issue (I have gone through this, too). For this reason I added a warning about this behaviour to Aaron's page. In my experience Replace Text deteriorated gradually rather than immediately; however, in the end it always turned out to be completely useless. Since you can run bots here it would not be a problem for you to enable this setting but all others would miss out. I advise postponing this decision, which should be ok in the light of recent discoveries. :) --[[kgh]] (talk) 09:40, 19 May 2013 (UTC)
Sorry, I wasn't very clear. I'm not suggesting to turn on $wgCompressRevisions. That would cause many issues and I don't think it gives a ton of performance boost with modern hardware anyway. What I am considering is running the compressOld.php maintenance script in the concat setting via cron. In this setting, the current revision of each page is left uncompressed and as is. Only previous versions of the page are compressed and concatenated. The main advantage is to reduce the size of the database. Since the current version is not modified, Replace Text continues to work fine. It strikes me that running compressOld.php followed by a regular optimize table would be wise to keep the database clean, small and fast. I don't see any downside, but want to make sure I'm not missing something. Particularly pages that have had hundreds of bot edits would get reduced by this. 🐝 thingles (talk) 12:41, 19 May 2013 (UTC) PS: The reason that Replace Text deteriorates slowly is that $wgCompressRevisions only activates on edit. So, pages would continue to work fine until they were edited. I had to write a bot after the fact to perform a null edit to every page to uncompress it after I deactivated $wgCompressRevisions.
I was too distracted by mw:Manual:$wgCompressRevisions, which prevented me from having a closer look at mw:Manual:compressOld.php. I do not have any experience with this script and its results, so I cannot definitely advise. However, judging from your info and what this option does, it seems to be fine to run it with -t concat. --[[kgh]] (talk) 08:44, 20 May 2013 (UTC) Thank you for sharing your infos!
I went ahead and ran this across all the wikis in my farm. It produces all sorts of dots and slashes in the output
1065	User_talk:Thingles ..................../..................../..................../.
each of those is a version that it has now compressed. I'll see what this does to the database size after it is done. 🐝 thingles (talk) 02:06, 21 May 2013 (UTC)
Sharing results of this. The script took a while to run, probably 30 minutes across all the wikis in my farm. It had a decent amount of work on WikiApiary since the bots have introduced a number of edits. Some pages had 100+ revisions that could be compressed. The results were good. Note however that you will not see a reduction in database size just from running this script; you must then tell the SQL server to optimize the tables. After doing both the compressOld.php and then the optimize tables the WikiApiary MediaWiki DB dropped from 1G to 810MB. Overall I took about 600MB of space off all my databases. See graph below. That's not a ton of space, but it does get the database more compact which is probably a good thing to keep things fast going forward. I already run optimize tables automatically once a week. I'm going to consider adding compressOld.php to that process. 🐝 thingles (talk) 16:06, 21 May 2013 (UTC)

Result of compress old.png

PS: Note the gradual DB size growth from 1a to 5a on Tuesday? That is when SMW_refreshData.php is running across the farm.
That's great and interesting. Thank you for sharing. :) --[[kgh]] (talk) 19:01, 21 May 2013 (UTC)
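
A sketch of how the two steps from this thread (compressOld.php with -t concat, then an optimize) might be chained from a weekly job. The install path, the database name and the helper name are placeholders, the script location under maintenance/storage/ is an assumption about a standard MediaWiki layout, and mysqlcheck --optimize is just one way to issue the optimize; none of this is the setup actually used on the farm.

```python
def maintenance_commands(wiki_path, database):
    """Return the compress and optimize commands, in order, as argument lists."""
    compress = [
        "php",
        wiki_path + "/maintenance/storage/compressOld.php",  # assumed location
        "-t", "concat",  # leave the current revision alone, concatenate old ones
    ]
    optimize = ["mysqlcheck", "--optimize", database]  # reclaim the freed space
    return [compress, optimize]


# Placeholder path and database name; pass each list to subprocess.run() to execute.
for command in maintenance_commands("/srv/mediawiki", "wikiapiary"):
    print(" ".join(command))
```

The order matters for the reason given above: the size reduction only shows up once the tables are optimized after the compression run.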

Banned IP check causing slowness and removed (aka Performance victory!)

I've been poking all over the place to try and figure out why there have been these fairly dramatic slowdowns on WikiApiary (and all my MediaWiki farm). I think I hit a big one tonight. A long time ago I had put in a cron job to download the bannedip list from Stop Forum Spam. At some point, this file has ballooned to 377,669 entries. This was in LocalSettings.php and as a result every HTTP request PHP received did nearly 380,000 comparisons before even serving the page!

I looked at some timings and by simply removing this include I dropped response time for the API from over 1.0 second (sometimes 3 or more!) to consistently around 0.2 seconds. Interestingly, this also matches very neatly with the response time graphs on WikiApiary for WikiApiary! My cron job runs once a week to update that file. There was a jump up on 4/14 and again on 5/1. 4/14 is a Sunday, which is when that cron job runs. 5/1 isn't however.

Initial results via WikiApiary (debugging WikiApiary with WikiApiary is so meta!) look way better.

WikiApiary API Response Time 20130518.jpg

Lesson learned there. I thought that check was a bit expensive to begin with, and it got very expensive. I can already see via htop on console that the php5-fpm processes are not nearly as hot. Hopefully now the separate DB server and moving memcached and giving both more resources will yield a net improvement! Thank you for your patience! 🐝 thingles (talk) 03:05, 19 May 2013 (UTC) PS: Note that the WikiApiary page itself is a pretty big beast when you load it. It has 9 dygraph charts and a ton of data. It's slow, but that's not a server issue. :-P I have a plan to change the layout of that page so there aren't 9 graphs on it soon.
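
To make the cost described above concrete, here is a small Python illustration. The real check lived in PHP in LocalSettings.php and its exact form is not shown on this page, so this is only an analogy: the address list is synthetic, and the set lookup is a swapped-in alternative for comparison, not what was done here (the fix here was simply removing the include).

```python
def linear_contains(entries, ip):
    """Scan a list entry by entry; return (found, number of comparisons made)."""
    comparisons = 0
    for entry in entries:
        comparisons += 1
        if entry == ip:
            return True, comparisons
    return False, comparisons


# Synthetic stand-in for the banned list: 377,669 made-up, distinct addresses.
banned = [
    "10.%d.%d.%d" % ((i >> 16) & 255, (i >> 8) & 255, i & 255)
    for i in range(377669)
]
banned_set = set(banned)

# A visitor who is not on the list forces a scan of every entry.
found, comparisons = linear_contains(banned, "192.0.2.1")
print(found, comparisons)

# A hash-based lookup answers the same question without scanning.
print("192.0.2.1" in banned_set)
```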

This also explains why my invite only wikis like Links thing did not have a similar slowdown. They are setup to not load any antispam code since they require an invite. Things are looking much, much better. It's like I've had the emergency brake on without knowing it for a couple of weeks! 🐝 thingles (talk) 03:19, 19 May 2013 (UTC)
If you're curious, the CPU graph for pub2 also shows the difference. 🐝 thingles (talk) 03:21, 19 May 2013 (UTC)
Wow, now WikiAPIary is really fast again. Using this wiki blew off the rug from my head right away. :) I think this is an interesting lesson learned not just for here but for other wikis as well. --[[kgh]] (talk) 09:27, 19 May 2013 (UTC)
I'm glad to hear that it feels faster to you as well. Actually, API response times are now faster than they have ever been before even before WikiApiary got to real work. I'm hopeful that that is the net result of having the new DB server in place. 🐝 thingles (talk) 12:34, 19 May 2013 (UTC)

These graphs please me very much!

WikiApiary API Response Time 20130519.jpg

Planet Kubb Wiki API Response Time 20130519.jpg

🐝 thingles (talk) 15:46, 19 May 2013 (UTC)

Yes, this happens to others too.[1] I'm glad you found the culprit. --Nemo 16:45, 19 May 2013 (UTC)
I've added a section to mw:Manual:Combating_spam#IP_address_blacklists. I refactored the whole page but I could use help updating that section, because I know nothing about those blacklists and I suspect that information is outdated and possibly completely wrong. Did spam increase? Did those blacklists ever help? Are some better than others? Etc. (Reply there if possible.) --Nemo 17:51, 19 May 2013 (UTC)
It seems that the DNS blacklists are probably fine. They will introduce some delay but largely your name resolver will cache that information so performance really shouldn't be that problematic. I was using the method of downloading the Stop Forum Spam database and then turning that into a call in LocalSettings.php and after looking into it I really wouldn't recommend that approach. Not only is there the performance implication that is always there, but I really don't like that that will change every week and if cron is updating it you're likely not going to know. I even got an email each week with the changes, but after a few weeks you ignore it. Then I forgot about it and didn't even consider its impact. Also, in addition to adding a full second or more to each request, it also ballooned my PHP APC cache size. You can see my APC cache size and when I removed this call I took about 20MB out of it. That's a lot of valuable memory for real code to be cached in! I think the stuff I did with randomly mutating Questy Captcha is a much better solution. 🐝 thingles (talk) 18:54, 19 May 2013 (UTC)

FYI, in my desperation to try and fix slowness I also made a number of other changes that I hope are helping things as well. Notably, I'm now doing nice -n 19 on all the bots, my SMW_refreshData and runJobs calls. I can see that at least half of the CPU load is now running lower priority than nginx and php-fpm. This should keep the server responses snappy even when multiple background tasks are running. 🐝 thingles (talk) 18:59, 19 May 2013 (UTC)

The graphs and all the info here are really nice in every respect. It really feels good to have this problem off the back. :) --[[kgh]] (talk) 08:37, 20 May 2013 (UTC)

Suggestions for dealing with database information in farms?

I've seen this for a long time and I'm curious if others have any ideas. It seems very common in the large wiki farms, notably Farm:Wikimedia and Farm:Wikia that they have different versions of databases in use at the same time. User:Bumble Bee captures this and it generates flapping in WikiApiary. Note the history for Wikibooks (te)/General as one example. I'm wondering if anyone has any great suggestions for dealing with this. One idea I've had is to store the database information as an array, which would be pretty cool and show a real picture of all the various versions in use. I think that is probably the best approach, however I have to figure out then how to get rid of old information. I could store an array of objects so I would have the database signature and the timestamp it was last seen. Then if one of those signatures isn't seen for some period of time it could be dropped. Thoughts? 🐝 thingles (talk) 14:35, 19 May 2013 (UTC)

Ouch. However it is very interesting to see which things happen outside in all the places. Your suggested approach seems good. Probably information older than three months may go. --[[kgh]] (talk) 08:55, 20 May 2013 (UTC)
After throwing out the idea of keeping database signature with a timestamp I realized it would be an even bigger problem since I would need to update the timestamps of when I saw those databases every single time User:Bumble Bee checked the remote wiki (unless I stored it in ApiaryDB, but this is not the kind of thing I want in there, that is just for high-frequency data). So, need to think more on this to figure out a way to do it without introducing more edits. 🐝 thingles (talk) 16:09, 21 May 2013 (UTC)
Alright, I just committed a change that should handle these multiproperty values. I decided to create a table in the ApiaryDB where I'm recording special properties that can have multiple values and made a function, ProcessMultiprops, that deals with these. This is hard to test thoroughly but I can see that it's working at recording the current values. After a day or so I will see if occurrences and timestamps are changing right and ensure that wikis with multiple databases start getting a comma-separated list. I'm actually also tracking the first and last time that value was seen, and the occurrences of it. I plan to then change that field to holding a template that will record all that data. But, first I just want to see if the values are coming through properly. Once I know this works on dbversion I will look at any other fields that would benefit from this. Notably I've seen PHP version flap. If you know of other fields that flap, comment here. 🐝 thingles (talk)
This diff tells me this is working. I still need to modify Template:General info to catch the multiple values. 🐝 thingles (talk) 20:58, 27 May 2013 (UTC)
And with this diff to Template:General siteinfo we now support multiple databases being given for a single wiki. I am capturing the last time seen, and frequency of these as well. For now that is unfortunately locked away in ApiaryDB. I'd like to get it into the website, but I need to figure out how you store templates in other template parameters. Something I haven't done yet. 🐝 thingles (talk) 16:10, 28 May 2013 (UTC)
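
A minimal in-memory sketch of the bookkeeping discussed in this thread: first seen, last seen and occurrences per value, with stale values dropped from the comma-separated list. This is not the ProcessMultiprops function itself, which records into a table in ApiaryDB; the helper names, the dict layout, the expiry rule and the version strings in the example are all assumptions for illustration.

```python
import time


def record_value(seen, value, now=None):
    """Update first/last seen and the occurrence count for one observed value."""
    now = time.time() if now is None else now
    entry = seen.setdefault(
        value, {"first_seen": now, "last_seen": now, "occurrences": 0}
    )
    entry["last_seen"] = now
    entry["occurrences"] += 1
    return seen


def current_values(seen, now, max_age):
    """Comma-separated list of the values last seen within max_age seconds."""
    fresh = sorted(
        value for value, entry in seen.items()
        if now - entry["last_seen"] <= max_age
    )
    return ", ".join(fresh)


# Made-up database signatures observed on three successive checks.
seen = {}
record_value(seen, "MySQL 5.5.30", now=0)
record_value(seen, "MariaDB 5.5.31", now=100)
record_value(seen, "MySQL 5.5.30", now=200)
print(current_values(seen, now=250, max_age=1000))
```

With a max_age of about three months, a signature that stops appearing would eventually fall out of the list, which is the cleanup suggested earlier in the thread.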

Upgraded to 1.20.6

Just FYI. WikiApiary is upgraded. Wanted to hold out for 1.21. But.. :-) 🐝 thingles (talk) 20:38, 21 May 2013 (UTC)

Upgraded nginx from 1.3.10 to 1.4.1

I recently upgraded nginx from 1.3.10 to 1.4.1. I don't expect there to be any impact of this, but just mentioning it in case you see something weird. 🐝 thingles (talk)

Editing is a bit slow again. However, I do not really know if it is connected to this update. --[[kgh]] (talk) 13:14, 25 May 2013 (UTC)

File:IndianerWiki Logo.png

This file was/is throwing the following error upon thumb generation:

Error creating thumbnail: convert: Too many IDAT's found `/.../images/wikiapiary/1/17/IndianerWiki_Logo.png' @ error/png.c/MagickPNGErrorHandler/1309.
convert: corrupt image `/.../images/wikiapiary/1/17/IndianerWiki_Logo.png' @ error/png.c/ReadPNGImage/3294.
convert: missing an image filename `/tmp/transform_70a87626351b-1.png' @ error/convert.c/ConvertImageCommand/3011.

I now replaced the file with a jpg-version to make things sane. There is even a bug for this. --[[kgh]] (talk) 12:36, 23 May 2013 (UTC)

Thanks kgh! I added a comment on that bug. 🐝 thingles (talk) 14:39, 23 May 2013 (UTC)

Change to siteinfo collection and general subpages

I put a change in to User:Bumble Bee that takes a different approach to building the general siteinfo subpages (the /General pages that each site has). Previously Bumble Bee would get siteinfo and then pull out a handful of values that were specifically identified, discarding the rest. This change moves to recording all values returned, except a list of keys that are specifically ignored. This gets a lot more data into the wiki that templates can then decide if we care about or not, rather than a bot deciding that. See an example of this change. I'm planning to do a similar change in approach to other data collection as well. 🐝 thingles (talk) 15:17, 23 May 2013 (UTC)

With this change we can start to use weird things returned in siteinfo. See Wikinews (sv) general diff. linktrail and linkprefix are not standard that I know of. Also, git-hash could be useful for something. 🐝 thingles (talk) 15:46, 23 May 2013 (UTC)
Now that we have these data we might get into trouble while trying to interpret them. ;) There is no wiki privacy I guess. ;) --[[kgh]] (talk) 13:15, 25 May 2013 (UTC)
We'll be getting more extension information very soon due to bug 48418. It's not in yet but this highlights why I'm changing collection to grab everything. As new fields roll out I want them to just automatically come in. 🐝 thingles (talk) 11:58, 26 May 2013 (UTC)
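
The change in approach described in this thread, from picking out known keys to keeping everything except an ignore list, can be sketched roughly as follows. The ignore list contents, the function name and the sample siteinfo values are made up for illustration; the actual list Bumble Bee uses is not given here.

```python
# Hypothetical ignore list; the real set of ignored keys is not shown on this page.
IGNORED_KEYS = frozenset({"time", "timeoffset"})


def build_general_values(siteinfo, ignored=IGNORED_KEYS):
    """Keep every key returned by siteinfo except those explicitly ignored."""
    return {
        key: value
        for key, value in sorted(siteinfo.items())
        if key not in ignored
    }


# Sample values are invented; unknown keys such as 'git-hash' pass straight through.
sample = {
    "sitename": "Example Wiki",
    "generator": "MediaWiki 1.21.0",
    "time": "2013-05-23T15:17:00Z",
    "git-hash": "0123abc",
}
print(build_general_values(sample))
```

The point of the design is that a new field needs no bot change: it shows up in the wiki as soon as a site returns it, and templates decide what to do with it.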

Upgraded to 1.21

WikiApiary and the rest of my farm have been upgraded to MediaWiki 1.21. I've also enabled Extension:Scribunto but have not tested it out yet. I attempted to enable Extension:Echo but got blocked by bug 48822. I'm going to look things over a bit and have removed $wgReadOnly and re-enabled User:Bumble Bee. Please post if you see weird things. I did upgrade all extensions, and I also moved a handful of extensions I had on REL1_20 branches back to master. 🐝 thingles (talk) 04:31, 26 May 2013 (UTC)

Extension:Scribunto works!

See User:Thingles/Scratch2 and Module:Hello world. Now I can start learning all this! 🐝 thingles (talk) 04:36, 26 May 2013 (UTC)

I guess I will have to do so too. :) I heard that this should be quite straightforward. --[[kgh]] (talk) 06:52, 26 May 2013 (UTC)

Semantic Forms autoedit issue in 1.21

I have found one critical bug as a result of the 1.21 upgrade. The Extension:Semantic Forms sfautoedit API currently throws a 400 error. Here is an example from debugging on Links thing:

{"errors":[{"level":0,"message":"Format json is not supported for content model
wikitext"}],"responseText":"Modifying <a
href=\"\/wiki\/From_Here_You_Can_See_Everything\" title=\"From Here You Can See
Everything\">From Here You Can See Everything<\/a>
failed.","status":400,"form":{"title":"Bookmark"},"target":"From Here You Can
See Everything"}

Since I've disabled error tracking back to the wiki in User:Bumble Bee there isn't an issue there, however, User:Audit Bee depends on this API to audit sites. As such, I'm just disabling Audit Bee for now until bug 48838 is resolved, which will hopefully be soon. 🐝 thingles (talk) 14:16, 26 May 2013 (UTC)

Just today f.trott pushed a fix for this issue! Awesome! User:Audit Bee is reactivated and doing his work again! Although now he needs to be updated to deal with sites that are using Special:Statistics collection. :-) 🐝 thingles (talk) 02:19, 13 June 2013 (UTC)

Extension and Skin collection changed

I just made a change to User:Bumble Bee that uses the same collect everything returned method for recording information. You will notice that the extension instances are written differently now. But, more importantly we'll start seeing some more advanced information coming in from 1.22 and above. Let me know if you see bad things. 🐝 thingles (talk) 16:21, 26 May 2013 (UTC)


It hit me that this would be pretty easy and fun to do. :-) Map. I've integrated the MaxMind GeoLite city database with User:Bumble Bee. Note the addition of Template:Network info in the /General subpages. :-) 🐝 thingles (talk) 20:12, 26 May 2013 (UTC) Disclaimer: All data here is from MaxMind and is based on the hostname of the wiki. I considered doing this based on IP address location of A records for the sites, but this was much easier and possibly more accurate. Also, I'm not sure it's super useful, but I wanted to see it myself and see what others thought.

Nobody looking at the Map yet? Am I the only map geek here? :-) 🐝 thingles (talk) 20:59, 27 May 2013 (UTC)
Last time I looked at it, only a few wikis were already mapped. Currently the wikiverse seems to be a bit NATO-centric. --[[kgh]] (talk) 10:11, 28 May 2013 (UTC)

Property:Founded date

I added a field to Form:Website to allow setting a date founded for a wiki and store it in Property:Founded date. I'm planning on looking at using a bot for this by looking at the timestamp for revision 1 on the wiki in question. I plan on using this date in a bunch of reports. 🐝 thingles (talk) 03:56, 27 May 2013 (UTC)

Great, after this information was collected we could do a "born this date" on main page. :) --[[kgh]] (talk) 12:23, 27 May 2013 (UTC)
I hadn't thought of that! That's a great idea! 🐝 thingles (talk) 12:52, 27 May 2013 (UTC)
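
A minimal sketch of the "timestamp for revision 1" idea, assuming the target wiki exposes the standard MediaWiki `api.php` query module. The function names and the sample responses are hypothetical and only for illustration; this is not the bot's actual code. Revision 1 can be missing (deleted or imported history), so the parser returns `None` in that case rather than guessing.

```python
import json
from urllib.parse import urlencode


def first_revision_query(api_url):
    """Build a MediaWiki API URL asking for the timestamp of revision ID 1."""
    params = {
        "action": "query",
        "prop": "revisions",
        "revids": "1",
        "rvprop": "timestamp",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def founded_date(response_text):
    """Return the YYYY-MM-DD date of revision 1, or None if it is not reported."""
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        for rev in page.get("revisions", []):
            timestamp = rev.get("timestamp")
            if timestamp:
                # Timestamps are ISO 8601, e.g. 2012-10-01T12:00:00Z
                return timestamp[:10]
    return None


# Illustrative (made-up) responses, not real data from any wiki.
sample_ok = (
    '{"query":{"pages":{"1":{"pageid":1,"title":"Main Page",'
    '"revisions":[{"timestamp":"2012-10-01T12:00:00Z"}]}}}}'
)
sample_missing = '{"query":{"badrevids":{"1":{"revid":1}}}}'

print(first_revision_query("https://example.org/w/api.php"))
print(founded_date(sample_ok))
print(founded_date(sample_missing))
```

A value found this way would only be a starting point for Property:Founded date; a date set by hand through Form:Website should probably win over it.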

Bot segment override

You may notice I added a field to Form:Website to allow a user to override the automatically calculated Property:Has bot segment. The main use of this field is for an administrator to be able to place a specific wiki in a desired bot segment, or even place it in multiple segments. Right now User:Bumble Bee cycles through 15 segments, one per minute, so the fastest query time is every 15 minutes. Using this field, I could put a wiki in segments 0, 5 and 10, thus allowing it to get data faster than Bumble Bee's natural cycle time. Note that this still will not allow a website to collect faster than the out-of-wiki throttle, which defaults to 240 minutes for all wikis unless they are manually changed. 🐝 thingles (talk) 04:00, 27 May 2013 (UTC)

A convenient option, though it should probably rarely be used. --[[kgh]] (talk) 12:25, 27 May 2013 (UTC)
Yes, very very rare. I only put a note here so people know what it was. I'm probably the only one that should be editing it. I plan on putting sites that do override into a category so they are easily tracked. 🐝 thingles (talk) 01:59, 28 May 2013 (UTC)
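
The segment arithmetic in this thread can be sketched roughly as follows. This is an illustration only, not Bumble Bee's code: the function names are hypothetical, and the throttle rule (collect only if at least the throttle interval has passed since the last collection) is an assumption about how the throttle behaves.

```python
SEGMENT_COUNT = 15          # segments 0..14, one serviced per minute
DEFAULT_THROTTLE_MIN = 240  # default per-wiki throttle from the comment above


def visit_minutes(segments, horizon):
    """Minutes in [0, horizon) at which a wiki in these segments is visited."""
    wanted = set(segments)
    return [m for m in range(horizon) if m % SEGMENT_COUNT in wanted]


def collection_minutes(segments, horizon, throttle=DEFAULT_THROTTLE_MIN):
    """Visits that actually collect data, given the (assumed) throttle rule."""
    collected = []
    last = None
    for minute in visit_minutes(segments, horizon):
        if last is None or minute - last >= throttle:
            collected.append(minute)
            last = minute
    return collected


# One segment: visited every 15 minutes.
print(visit_minutes([0], 45))
# Override into segments 0, 5 and 10: visited every 5 minutes.
print(visit_minutes([0, 5, 10], 20))
# With the default throttle the override changes nothing.
print(collection_minutes([0, 5, 10], 500))
# Only with a lowered throttle does the override speed up collection.
print(collection_minutes([0, 5, 10], 20, throttle=5))
```

Under these assumptions the override only matters for wikis whose throttle has also been lowered, which fits the point above that it should rarely be needed.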

Catching 1.22wmf5 default skins

Take a look at the Skins tab for Wikipedia Test Wiki 2. Note that the default skin is now indicated. Also, skipped skins are no longer indicated. For full discussion on this see Talk:Main Page#Highlight current skin. 🐝 thingles (talk) 17:41, 28 May 2013 (UTC)


Just sharing this Python module that has me very excited. Thus far all template editing I've been doing in User:Bumble Bee has been through the Semantic Forms API, and it has a number of challenges. I'm very excited to see this module and have installed it to start playing with it. Notably, this will allow me to deal with much more complicated templates on pages, including multiply occurring ones.

Also, one of the things that I could consider doing with this is collapsing all the subpages for websites. So, instead of WikiApiary, WikiApiary/General, WikiApiary/Skins and WikiApiary/Extensions I could put all those templates into the main WikiApiary page. This has pluses and minuses. A big one is that MediaWiki would be tracking many fewer pages and template transclusions. It would probably also make data synchronization for each wiki a lesser issue since one refresh would update everything. On the downside, there would be a single stream of edits for users and bots (right now bots mostly edit subpages). But, maybe that wouldn't be a downside. I would love to hear what people thought of collapsing all those subpages into the main page. 🐝 thingles (talk) 14:42, 29 May 2013 (UTC)

A plus would also be that one has to watch only the main page of the wiki to be informed about all changes. Currently I have to watch up to five pages for a wiki. However, on big farmed wikis (Wikia, Wikimedia) you would get a lot of notifications since there are heaps of changes related just to farming (database etc.). This could probably get a bit annoying. Having a direct data refresh is preferable, too. This change should not deteriorate editing performance so ... --[[kgh]] (talk) 08:12, 1 June 2013 (UTC)
That sounds like a {{Vote|Yes}}. :-) 🐝 thingles (talk) 12:19, 1 June 2013 (UTC) PS: This would require that I reconfigure the bots a decent bit so it wouldn't happen soon.

How to deal with uncared-for wikis that are spammed?

When looking at the statistics on the main page, I found that in the category "1K to 10K Pages" all of the wikis in the top 5 list for 'Page change' are only there because they are spammed by bots creating advertising pages in huge amounts. Some of them have been spammed for quite some time, so it seems that nobody cares. How are wikis like that dealt with? They seem to render the top x lists unusable. --Curlybracket (talk) 10:53, 30 May 2013 (UTC)

Marking it as defunct would be one option. The other one is to switch to monthly data collection. Thus we could revisit the site and see if the landlord is back. I guess I prefer the second option. Third options around? --[[kgh]] (talk) 17:04, 30 May 2013 (UTC)
I could see adding a custom flag like "Do not highlight" that could be used to exclude sites from lists of most actives. I could also see it filtering sites that are shared on Twitter (eg, do not share adult wikis). 🐝 thingles (talk) 03:47, 31 May 2013 (UTC)
We could even combine this. :) --[[kgh]] (talk) 07:55, 1 June 2013 (UTC)

IRC meetup?

I’m curious if there would be interest in coordinating a time for an IRC meetup about WikiApiary. We could draft up any discussion points and also just get to meet each other a bit. Thoughts? 🐝 thingles (talk) 03:49, 31 May 2013 (UTC)

I am not on IRC that frequently, but yes, for brainstorming and discussions it would be a nice thing to do. Why not agree on a regular weekly or fortnightly WikiApiary meetup where we try to be there? --[[kgh]] (talk) 07:54, 1 June 2013 (UTC)
That is a good idea. I went ahead and registered #wikiapiary on Freenode and set ChanServ to help manage it. I'll look at putting a regular timeslot for a meetup out there shortly. 🐝 thingles (talk) 12:17, 1 June 2013 (UTC)