WikiApiary talk:Operations/2013/May

First pull request!
A new milestone for WikiApiary: the very first pull request was made against some of the code on GitHub. It's a very small change, but it's great to see someone looking at the code and contributing! Thanks, Philip Becker! 🐝 thingles (talk) 17:01, 8 May 2013 (UTC)
 * Cool! --[[kgh]] (talk) 17:21, 8 May 2013 (UTC)

Upcoming server maintenance
Just a note to let people know that sometime soon I expect to move the database for WikiApiary to another server. Right now everything for WikiApiary and all of Farm:Thingelstad.com runs on a single box, and the load is getting to be a bit much. I expect the process to take about 2 hours, during which I'll be setting  and suspending all bots. 🐝 thingles (talk) 19:37, 8 May 2013 (UTC)


 * Status update on this. I have a second host fully provisioned and set up on Linode to move the database for WikiApiary (and my entire farm) to. Currently everything is on a single host. The new DB host uses a 64-bit OS, and I'm also using this as an opportunity to move from MySQL to MariaDB (see also Wikipedia adopts MariaDB). The host is set up and MariaDB is running; now I just need to migrate the databases and switch the configs to use it. Of course, that's the delicate part. :-) 🐝 thingles (talk) 12:35, 11 May 2013 (UTC)


 * The DB server is running and active, and MariaDB is set up and running. I'm going to do a full export/import rather than simply copying the data files over, so that I can also defragment the tables in the process. 🐝 thingles (talk) 19:56, 11 May 2013 (UTC)


 * I was happy to see that there are about five other SMW sites that are using MariaDB as well. I made Semantic statistics/Databases to check this out. Hopefully there aren't any surprises once I make the switch. 🐝 thingles (talk) 19:58, 11 May 2013 (UTC)


 * I am a bit surprised that Wikitech is not working with MariaDB; I think they soon will. I do not really expect problems at the moment, but MySQL and MariaDB will probably diverge more and more over time, so issues may arise. Rather than working on SQLite support, the focus should be on MariaDB as the second database system supported by SMW. However, I am sure there has already been mail about this on the dev mailing list. --[[kgh]] (talk) 20:52, 11 May 2013 (UTC)

Extensions with protocol-relative URLs
Mark Hershberger mentioned that a number of extensions were showing an error with their URL. It turns out a few extensions report their URL as a protocol-relative URL. I'll avoid editorializing on the wisdom of that and instead just put a handler for this in User:Bumble Bee. Now any extension URL that starts with "//" will have  prepended to it. Note that this only affects the Template:Extension in use template, which is used to aggregate data for extensions. Mainly this change will just remove warnings; it won't change any information on extension statistics, counts or versions. When I put this fix in, there were 606 pages with Property:Has improper value for set to Property:Has extension URL. Over the next day, as Bumble Bee updates extension information, this should drop dramatically. The current value is . 🐝 thingles (talk) 15:30, 10 May 2013 (UTC)


 * Example of this fix propagating. 🐝 thingles (talk) 15:33, 10 May 2013 (UTC)
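The protocol-relative URL handling described above can be sketched in Python as follows. This is a minimal illustration and not Bumble Bee's actual code. The function names are hypothetical. The prefix the bot prepends is not preserved in this archive, so the scheme is left as a parameter.

```python
from urllib.parse import urlsplit


def absolutize_extension_url(url: str, scheme: str) -> str:
    """Turn a protocol-relative URL ("//host/path") into an absolute one.

    `scheme` is chosen by the caller (e.g. "https"); this sketch does not
    assume which prefix the real bot uses.
    """
    url = url.strip()
    if url.startswith("//"):
        # "//host/path" -> "scheme://host/path"
        return f"{scheme}:{url}"
    return url  # already absolute (or not a URL at all): leave untouched


def is_absolute_http_url(url: str) -> bool:
    """Rough stand-in for the validation that flags protocol-relative values."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

For example, `absolutize_extension_url("//www.mediawiki.org/wiki/Extension:Foo", "https")` returns `"https://www.mediawiki.org/wiki/Extension:Foo"`. URLs that already carry a scheme pass through unchanged, so running the fix repeatedly is harmless.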

Site seems really slow
I've been trying to add the Club Penguin Wiki Network wikis to WikiApiary, since I'm staff there.

However, the site is very slow for me. I get about a 40 ms ping, so it can't be my internet connection; it must be the server. Running time curl http://wikiapiary.com reported 4.199 s for the site to load once, and afterwards it stayed around 2 to 2.5 s.

Any other clues as to why it's so slow? I have a host I can recommend if you run out of choices; I use them for my wikis and they are great. --Tux (talk) 05:32, 11 May 2013 (UTC)
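A rough Python equivalent of the time curl check is sketched below. It times several consecutive fetches so that a slow first request can be told apart from the steady state. The helpers `time_requests` and `summarize` are hypothetical and not part of any WikiApiary tooling.

```python
import statistics
import time
import urllib.request
from typing import Callable, Dict, List


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download the whole response body, roughly what `curl URL` does."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def time_requests(url: str, runs: int = 5,
                  fetcher: Callable[[str], bytes] = fetch) -> List[float]:
    """Wall-clock seconds for each of `runs` consecutive fetches of `url`."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fetcher(url)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> Dict[str, float]:
    """Separate the first (possibly cold) request from the steady state."""
    rest = timings[1:] or timings
    return {"first": timings[0], "median_after_first": statistics.median(rest)}
```

Usage: `summarize(time_requests("http://wikiapiary.com"))`. Like `time curl`, each sample lumps DNS lookup, connection setup, server processing and download together. The numbers show that a page is slow but not which layer is responsible.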


 * Heiya Tux, yes, there is a lot happening on WikiApiary that makes it slow. I guess this problem has already been identified and will be fixed. Cheers --[[kgh]] (talk) 07:36, 11 May 2013 (UTC)


 * Thanks for the note; I'm definitely working on this. There are three main issues causing slowness right now. First, MySQL and PHP5-FPM are fighting for CPU on a single box. Second, at night the backups add load, and notably that was while your edits were happening. Lastly, User:Bumble Bee needs some more intelligence about not rewriting data that is already saved. I've already provisioned the host to move the database to. I'm taking a little more time so that the new DB host runs a fresh 64-bit distro, and I'd also like to use MariaDB. But I might try to finish this this weekend. Sorry for the issues; this is a top priority for me (and sometimes why you see less of me in Special:RecentChanges and more of me in SSH windows and text editors :-P). 🐝 thingles (talk) 11:57, 11 May 2013 (UTC)


 * By the way, I'm curious: which host were you referring to above? 🐝 thingles (talk) 11:57, 11 May 2013 (UTC)


 * Yeah, the wiki does seem a lot faster now (I was editing at around midnight after a really shoddy school party). By the way, it's RamNode, but they might not like the CPU-intensive stuff you're doing right now. I had to configure nginx and php-fpm correctly to keep the CPU from screaming, but once I did, things were very speedy and the performance was very consistent. --Tux (talk) 13:03, 11 May 2013 (UTC)
 * By the way, if you want to see what I mean, check out any of the Brickimedia wikis (or CPWN, though there isn't a whole lot of data there). The spikes were from when one of the dumb "system administrators" disabled APC and such and I had to fix it. Once that was done, it was speedy again. --Tux (talk) 13:08, 11 May 2013 (UTC)


 * If you look at the API response time of WikiApiary, you can see that performance degraded step by step. While it would be very cool to have more power, performance is still OK for me since I am not power editing at the moment. :) --[[kgh]] (talk) 15:39, 11 May 2013 (UTC)


 * More power is coming soon, kgh. The DB server is up and just needs the data moved to it. 🐝 thingles (talk) 19:55, 11 May 2013 (UTC)


 * No worries. :) --[[kgh]] (talk) 20:42, 11 May 2013 (UTC)
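One common way to address the third issue above, User:Bumble Bee rewriting data that is already saved, is to remember what was last written and skip edits whose content would be identical. The sketch below assumes an in-memory cache of content digests and a hypothetical `save` hook. Neither is taken from Bumble Bee's code. It also assumes that trailing whitespace is not significant for these pages.

```python
import hashlib
from typing import Callable, Dict


def _digest(text: str) -> str:
    # Assumption: trailing whitespace is not significant for these pages.
    return hashlib.sha1(text.rstrip().encode("utf-8")).hexdigest()


def save_if_changed(title: str, new_text: str,
                    saved_digests: Dict[str, str],
                    save: Callable[[str, str], None]) -> bool:
    """Skip the edit when `new_text` matches what was last saved for `title`.

    `saved_digests` maps page titles to the digest of the last text the bot
    wrote (kept in memory here; a real bot might persist it). `save` is a
    hypothetical hook for whatever actually performs the edit.
    Returns True if an edit was made.
    """
    digest = _digest(new_text)
    if saved_digests.get(title) == digest:
        return False  # unchanged: no edit, so no SMW data update for the page
    save(title, new_text)
    saved_digests[title] = digest
    return True
```

Comparing against a stored digest avoids an extra API read of the current page before every save. The trade-off is that an edit made by someone else will not be noticed until the bot's own output changes.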

New database online
I took about an hour of downtime this morning and put the new database server online. I decided to copy the database over directly, which means I still need to run an optimize. I've brought all the bots back online and everything looks okay. I'm guessing performance will be a bit variable for a while as I tune things for this setup. I've given InnoDB a 1G buffer on the new server, which should help a lot. And now that MySQL is no longer running on the web host, I should be able to increase the php5-fpm pool too, but I'm going to go very slowly with that. 🐝 thingles (talk) 12:56, 12 May 2013 (UTC)


 * Maria Maria :) --[[kgh]] (talk) 16:42, 12 May 2013 (UTC)


 * I'm still poking at performance items (and plan on doing so in a bunch of areas for a bit). Anyone can see the performance metrics from the servers behind WikiApiary on my Munin page. If you see anything problematic please let me know. 🐝 thingles (talk) 21:30, 12 May 2013 (UTC)
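The optimize pass mentioned above can be scripted. The minimal sketch below builds one OPTIMIZE TABLE statement per table, for running through whatever MySQL/MariaDB client or driver is at hand. The database and table names in the usage line are hypothetical. For InnoDB tables, MariaDB and MySQL carry out OPTIMIZE TABLE as a table rebuild followed by an analyze. That rebuild is what defragments the data after copying the database over.

```python
from typing import Iterable, List


def quote_identifier(name: str) -> str:
    """Quote a MySQL/MariaDB identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def optimize_statements(database: str, tables: Iterable[str]) -> List[str]:
    """One OPTIMIZE TABLE statement per table, fully qualified and quoted."""
    db = quote_identifier(database)
    return [f"OPTIMIZE TABLE {db}.{quote_identifier(t)};" for t in tables]
```

For example, `optimize_statements("wikidb", ["page", "revision"])` returns ``["OPTIMIZE TABLE `wikidb`.`page`;", "OPTIMIZE TABLE `wikidb`.`revision`;"]``. Rebuilding one table at a time keeps each lock short, which matters if the bots are running while it happens.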

Handling Error Information
I've been digging into performance and looking at what I might need to change to make things faster, and keep them that way. A few weeks ago I made a change to record error information directly in the wiki, and I think that was a mistake. It turns out that when collecting data from thousands of wikis, it is very, very common for many of them to not be responding at any given time. For many of these, that results in as many as 9 or 10 SMW Ask API queries as well as one or more edits. If you look at Special:RecentChanges with bots shown, you can see that the vast, vast majority of edits are simply recording error state for websites. This was a mistake.

My plan is to modify the ApiaryDB and integrate this detailed error information into it. It is sort of there now, in the generic form of the Bot log, but that isn't actually connected to the sites themselves the way the statistics information is, for example. Instead, I'm going to start recording collection errors in a manner more like the statistics data itself. I'll also pull the current data into SMW properties, just like I do right now with the usage information. This means it will sometimes be out of date, but I think that is fine.

The added benefit of this is that it will allow charting of errors, for what it's worth.

If anyone else is interested in helping hack on this Python stuff, let me know. 🐝 thingles (talk) 21:24, 12 May 2013 (UTC)
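The plan above stores collection errors as rows keyed by site, like the statistics data, rather than as wiki edits. The minimal sketch below illustrates that idea. The table name, columns and error kinds are hypothetical. Python's built-in sqlite3 stands in for ApiaryDB so the example is self-contained. The daily counts are the kind of series that could be charted, and the latest error is what would be copied into SMW properties periodically.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# Hypothetical schema: one row per failed collection attempt, keyed by site ID.
SCHEMA = """
CREATE TABLE IF NOT EXISTS apiary_website_errors (
    website_id   INTEGER NOT NULL,
    occurred_at  TEXT    NOT NULL,  -- ISO-8601 UTC timestamp
    error_kind   TEXT    NOT NULL,  -- e.g. 'timeout', 'http_500'
    message      TEXT
)
"""


def record_error(conn: sqlite3.Connection, website_id: int, error_kind: str,
                 message: Optional[str] = None,
                 when: Optional[datetime] = None) -> None:
    """Insert one error row; no wiki edit and no Ask query involved."""
    when = when or datetime.now(timezone.utc)
    conn.execute("INSERT INTO apiary_website_errors VALUES (?, ?, ?, ?)",
                 (website_id, when.isoformat(), error_kind, message))


def daily_error_counts(conn: sqlite3.Connection,
                       website_id: int) -> List[Tuple[str, int]]:
    """Errors per UTC day for one site, oldest first (chartable series)."""
    rows = conn.execute(
        "SELECT substr(occurred_at, 1, 10) AS day, COUNT(*) "
        "FROM apiary_website_errors WHERE website_id = ? "
        "GROUP BY day ORDER BY day", (website_id,))
    return rows.fetchall()


def latest_error(conn: sqlite3.Connection,
                 website_id: int) -> Optional[Tuple[str, str]]:
    """Most recent (timestamp, kind) for a site, or None if it never failed."""
    return conn.execute(
        "SELECT occurred_at, error_kind FROM apiary_website_errors "
        "WHERE website_id = ? ORDER BY occurred_at DESC LIMIT 1",
        (website_id,)).fetchone()
```

Since every timestamp is written in the same UTC ISO-8601 form, sorting and grouping on the text column behaves like sorting by time. A MySQL/MariaDB version would more naturally use a DATETIME column.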

Semantic Query Optimization?
I'm wondering if others know the answer to this optimization question. I'm doing an SMW Ask query that looks like this (note: this is Python code, but the query itself is easy to see):

    my_query = ''.join([
        # (the first condition of this list is missing from the archived copy)
        '[[Is defunct::False]]',
        '[[Is active::True]]',
        '|?Has API URL',
        '|?Check every',
        '|?Creation date',
        '|?Has ID',
        '|?In error',
        '|?Collect general data',
        '|?Collect extension data',
        '|?Collect skin data',
        '|?Collect statistics',
        '|?Collect semantic statistics',
        '|?Collect semantic usage',
        '|sort=Creation date',
        '|order=rand',
        '|limit=1000'])

My question is about the six different collection properties. I've been considering modifying the wiki so that there is just one property, something like  and it would have multiple values set, one for each item to collect. In this query I would then pull just that one property and handle its values appropriately. Would this make the query easier/faster for SMW to handle? 🐝 thingles (talk) 21:28, 12 May 2013 (UTC)
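The sketch below only illustrates the client side of the single-property idea. Whether SMW answers such a query faster is the open question here, and this does not settle it. The code builds the query with one multi-valued printout and turns its values back into a set of collection flags. The property name `Collect` is hypothetical, because the name proposed above is not preserved. The parsing assumes the JSON layout of the SMW ask API. In that layout, each result carries a `printouts` object mapping property names to lists of values: plain strings for text values, and objects with a `fulltext` key for page values.

```python
from typing import Any, Dict, Iterable, List, Set

# Hypothetical name for the single multi-valued property.
COLLECT_PROPERTY = "Collect"

# The six kinds of collection currently expressed as separate properties.
KNOWN_KINDS = {"general data", "extension data", "skin data", "statistics",
               "semantic statistics", "semantic usage"}


def build_query(conditions: Iterable[str]) -> str:
    """Assemble the ask query with one printout for all collection flags."""
    printouts = ["Has API URL", "Check every", "Creation date", "Has ID",
                 "In error", COLLECT_PROPERTY]
    parts = list(conditions)
    parts += [f"|?{p}" for p in printouts]
    parts += ["|sort=Creation date", "|order=rand", "|limit=1000"]
    return "".join(parts)


def collection_kinds(printouts: Dict[str, List[Any]]) -> Set[str]:
    """Map one result's multi-valued property to a set of known kinds."""
    kinds = set()
    for value in printouts.get(COLLECT_PROPERTY, []):
        # Page-type values are objects; text-type values are plain strings.
        name = value["fulltext"] if isinstance(value, dict) else str(value)
        kinds.add(name.strip().lower())
    return kinds & KNOWN_KINDS  # ignore values the bot doesn't understand
```

The bot can then test membership, for example `"statistics" in collection_kinds(result["printouts"])`, instead of reading six separate boolean printouts. Unknown values are dropped rather than raising, so a typo on one wiki page does not stop collection for that site.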