WikiApiary talk:Operations/2013/June

Wiki of the Month
WikiAPIary is being showcased as Wiki of the Month on Semantic-MediaWiki.org for the month of June, 2013. --[[kgh]] (talk) 07:50, 1 June 2013 (UTC)




 * [[Image:thumbs-up-smiley.jpg|200px]] 🐝 thingles (talk) 14:14, 1 June 2013 (UTC)

Founded dates
I've seen a number of people adding founded dates to websites. Honestly, way more than I expected. I figured that would be a pain to find by hand. :-) Anyway, once bug #48838 is cleared in Semantic Forms I plan on adding a check in User:Audit Bee that will get the date of revision 1 for every wiki and populate it in founded date. I'll only do this if a date isn't there, so it won't overwrite any existing data. Once I make that change I'm going to flag all websites for a new audit so they get populated universally. Just an FYI that you needn't worry much about getting this data point. 🐝 thingles (talk) 18:56, 2 June 2013 (UTC)
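A minimal sketch of the founded-date check described above, assuming the standard MediaWiki `action=query` revisions call is used to look up revision 1. The function names and the sample response are illustrative only and are not taken from Audit Bee's actual code; on some wikis revision 1 may have been deleted, in which case nothing is returned.

```python
import json
from urllib.parse import urlencode


def first_revision_query(api_url):
    """Build an API URL asking for the timestamp of revision 1."""
    params = {
        "action": "query",
        "revids": 1,
        "prop": "revisions",
        "rvprop": "timestamp",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def founded_date(existing, api_response):
    """Return the founded date to store, never overwriting an existing one."""
    if existing:
        return existing
    pages = api_response.get("query", {}).get("pages", {})
    for page in pages.values():
        for rev in page.get("revisions", []):
            return rev["timestamp"]
    # Revision 1 missing or deleted: leave the field empty.
    return None


# Made-up sample response in the shape a revisions query returns.
sample = json.loads(
    '{"query": {"pages": {"1": {"pageid": 1, "title": "Main Page", '
    '"revisions": [{"timestamp": "2012-11-28T05:14:00Z"}]}}}}'
)
```

Because `founded_date` returns the existing value first, dates entered by hand are preserved and only empty fields are filled in.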

IRC #wikiapiary and WorkerBee
I've set up a channel on Freenode at #wikiapiary. Feel free to stop by and camp. I have also set up a bot called WorkerBee. The bot is powered by Willie and is running on my server. I've used Willie in the past to interface with the MediaWiki API and I'm thinking it would be fun to have some commands that WorkerBee can do in IRC to interface with WikiApiary. Additionally I plan on having him share some Bot log information into the IRC channel. If you have suggestions or ideas please share! 🐝 thingles (talk) 19:14, 2 June 2013 (UTC) ''PS: IRC bots are fun. :-)''

Introducing wmbot
You may see some edits from User:Wmbot. This is a bot that I'm writing to do two things:


 * 1) This bot is pulling the Wikimedia configuration files and creating new sites on WikiApiary for any sites that it finds. This will quickly fill out all of the language sites for all the Wikimedia projects.
 * 2) This is intended to be a standalone bot (not using the WikiApiary libraries that User:Bumble Bee and User:Audit Bee use) that can serve as an example of how to write a bot that creates sites automatically. Hopefully someone that may want to write a crawler, or an admin of another wiki farm, can use this code to make a bot that synchronizes their farm as well.

I'm being cautious so I just ran wmbot and had it create 10 Farm:Wikipedia profiles. I'm going to watch it and expand from there. 🐝 thingles (talk) 22:23, 3 June 2013 (UTC)


 * Alright, I cleaned up the code for wmbot.py a bit more. I'll add it to GitHub shortly; it still has a username and password in the source that I have to extract out. If anyone wouldn't mind checking out Special:Contributions/Wmbot and looking at the pages that it has made so far, that would be great. Assuming they all work out fine, I'll finish the run tomorrow to add all sites it knows how to parse. This will put all but a handful of unique Farm:Wikimedia sites into WikiApiary, and can easily be run weekly or so to catch any new language instances that get created. 🐝 thingles (talk) 03:42, 4 June 2013 (UTC)


 * wmbot has completed a first full run and added a ton of Wikimedia project sites. WikiApiary should now have every language for Farm:Wikipedia, Farm:Wikiquote, Farm:Wikisource, Farm:Wiktionary, Farm:Wikibooks, Farm:Wikinews, Farm:Wikiversity, Farm:Wikimedia and Farm:Wikivoyage. I haven't scheduled this to run regularly yet, but will probably have it run via cron monthly. 🐝 thingles (talk) 12:16, 4 June 2013 (UTC)


 * A more generic version -- one that can operate on any farm -- would be a FarmerBot. -- ☠ MarkAHershberger ☢ (talk) ☣ 16:18, 4 June 2013 (UTC)


 * I have had a look at some of WMBot's edits and found them allrighty. :) --[[kgh]] (talk) 17:00, 4 June 2013 (UTC) PS I would love to see Farm:ShoutWiki and Farm:Wikkii here, too.


 * Thx kgh!

More than 2M property values set
Almost exactly three months after the last big milestone WikiAPIary exceeded the 2,000,000 property values mark and is once again one of the 10 biggest known Semantic MediaWiki installations. --[[kgh]] (talk) 16:32, 4 June 2013 (UTC)

Host lists for other farms?
Now that I got User:Wmbot running and it seems to be fine, I could branch him to deal with some other farms. kgh mentioned Farm:ShoutWiki and Farm:Wikkii. Personally, I'm very curious to start pulling in Farm:Wikia although it will certainly cause pain as the site load increases. Here is where I could use some help though. For any of these I need some sort of list of sites. For Wmbot I'm using the Wikimedia configuration files, one of which contains a list of database names. Because the names follow good convention, I can infer the wikis from that list.

I've looked a lot for a list of Wikia sites and cannot find anything. I thought I had found something for Farm:Wikia when I hit this thread on s23.org but the URLs mentioned no longer work. I even tried looking for information in  as well as , but no go. I see a similar thread for a host list from ShoutWiki, but I don't see such a list.

If anyone can find host lists for any of these farms, it is pretty darn easy for me to make something to pull them all in. Please share any information you can find here. 🐝 thingles (talk) 17:17, 4 June 2013 (UTC)
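To illustrate how wikis can be inferred from a list of database names, here is a minimal sketch. The suffix-to-domain table and the rule that underscores become hyphens are assumptions about the naming convention, not a copy of what wmbot.py does; the handful of unique sites that do not follow the pattern (mentioned earlier) would need separate handling.

```python
# Assumed mapping from database-name suffix to project domain.
SUFFIXES = {
    "wiktionary": "wiktionary.org",
    "wikibooks": "wikibooks.org",
    "wikinews": "wikinews.org",
    "wikiquote": "wikiquote.org",
    "wikisource": "wikisource.org",
    "wikiversity": "wikiversity.org",
    "wikivoyage": "wikivoyage.org",
    "wiki": "wikipedia.org",
}


def infer_host(dbname):
    """Guess a hostname such as en.wikipedia.org from a name such as enwiki."""
    for suffix, domain in SUFFIXES.items():
        # Require a non-empty language prefix in front of the suffix.
        if dbname.endswith(suffix) and len(dbname) > len(suffix):
            lang = dbname[: -len(suffix)].replace("_", "-")
            return lang + "." + domain
    # Unknown pattern: leave it for manual review.
    return None
```

For example, `infer_host("enwiki")` yields `en.wikipedia.org`, while a name that matches no known suffix returns `None` so it can be skipped rather than guessed.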

Wikia
Currently I am not too keen to see Wikia here. That's a lot of wikis to munch on. What about this site? There you get at least a list of the domains. Basically that's the product of s23's work. --[[kgh]] (talk) 17:26, 4 June 2013 (UTC)


 * I've pored over the s23/wikistats stuff, but it's not clear to me how often that is updated. Somehow mutante got ahold of a host list for the farms there; I'd like to use the same source host lists so there isn't an added dependency. 🐝 thingles (talk) 17:39, 4 June 2013 (UTC)


 * You could try to pull a list of the top 5k wiki domains off of http://www.wikia.com/WAM . Edit: that is for wikis that are active, or you could try for a complete list: http://community.wikia.com/api.php?action=query&list=wkdomains&wkfrom=10000&wkto=15000 Simant (talk) 18:26, 4 June 2013 (UTC)


 * Holy crap! That's the magic I need! (Good too, since that WAM list was not workable.) That will work perfect to start looking at that. I am very, very mindful though of User:Kghbln's concerns. I think what I will likely do for Farm:Wikia is pull this file but only add say 50-100 websites on any given run, and do them manually. Gradually just adding small sets. This is a lower priority though. 🐝 thingles (talk) 04:34, 5 June 2013 (UTC)
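The approach discussed above (page through the `wkdomains` list, but only add a small set of sites per run) could be sketched as follows. Only the `action`, `list`, `wkfrom` and `wkto` parameters come from the URL quoted in this thread; the `format=json` parameter, the window size and the helper names are assumptions for illustration, and the response format is deliberately not parsed here because it is not shown in the discussion.

```python
from urllib.parse import urlencode

API = "http://community.wikia.com/api.php"


def wkdomains_url(start, size=5000):
    """Build the query URL for one window of wiki IDs."""
    params = {
        "action": "query",
        "list": "wkdomains",
        "wkfrom": start,
        "wkto": start + size,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def pick_batch(domains, already_added, limit=50):
    """Pick at most `limit` domains that are not yet tracked."""
    return [d for d in domains if d not in already_added][:limit]
```

Keeping `limit` small means each run adds only a modest amount of new collection load, which matches the cautious rollout described above.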

Wikkii
Wikkii has a list of active wikis and a list of wikis. Shoutwiki is currently down, but this should be a cached list of their wikis.--ete (talk) 20:54, 4 June 2013 (UTC)


 * Excellent! That looks very workable. It seems like if I filtered for links with  I would get the list I would want. Thanks! 🐝 thingles (talk) 04:34, 5 June 2013 (UTC)


 * Hmm, there seems to be a much bigger problem with Farm:Wikkii. Looking at the sites on that list when I attempt to access the API for any of them I get the error:

MediaWiki API is not enabled for this site. Add the following line to your LocalSettings.php $wgEnableAPI=true;


 * That pretty much takes them out entirely. We do currently have the Wikkii Community Wiki tracked, and it has an API point (at an odd URL). However that is the only Wikkii site that is in WikiApiary right now, perhaps their community wiki is not hosted on their main farm. It is worth noting that while they have the API seemingly disabled universally, basic statistics would still be possible by calling Special:Statistics. For example, HG World stats would be usable to get data. We've discussed using Special:Statistics collection before and decided against it, but that was in the context of collecting from old MediaWiki instances. It looks like there would be utility to collect data from sites that have the API disabled as well. Hmmm... By the way, Wikistats does collect data from Wikkii, and it does support collecting using Special:Statistics calls. 🐝 thingles (talk) 19:44, 5 June 2013 (UTC)


 * I decided to just dive in and start working down the road to collecting from Special:Statistics. I made a change to Form:Website to allow selection of API or Special:Statistics for collection. I'm using Semantic Forms fancy hide/show stuff so it looks nice. Nothing else supports this yet, but getting the data for the sites was the first step. This will allow Wikkii to come in with at least stats information, and, it will allow WikiApiary to support pre-API wikis, FWIW. :-\ 🐝 thingles (talk) 21:00, 5 June 2013 (UTC)


 * → Further updates on collecting via Special:Statistics.


 * Now that Special:Statistics collection is working I will look at picking up the Farm:Wikkii collector soon. It would add about 3,000 sites to WikiApiary.

Referata
Related, this does not seem to have all the Referata sites, and this is admin only, maybe Yaron could provide us with a list?--ete (talk) 17:26, 5 June 2013 (UTC)


 * I pinged Yaron on #semantic-mediawiki and he mentioned he purposefully hasn't provided a single list of all Farm:Referata sites. I mentioned that if there is some list he would like me to load I'd happily do that. 🐝 thingles (talk) 19:22, 5 June 2013 (UTC)

SMW Registry
While looking for a list for Referata, I found this, which may be handy? It's less easy to digest, but a load more semantic sites is always nice. --ete (talk) 17:26, 5 June 2013 (UTC)


 * Yes, I've poked at the registry for Semantic sites and plan to do a sync with WikiApiary for that. 🐝 thingles (talk) 19:22, 5 June 2013 (UTC)

Thinking about wikispam
I've spent some time tonight sketching out what WikiApiary could do to fight off spam. I'm pretty amazed by the potential honestly. Given the existing API data, along with a couple of new feeds and paired with an (optional) bot account on a remote wiki, WikiApiary could be incredibly effective. Even without a remote bot account, I was just thinking through building a reputation score for a given edit. Part of the magic that I don't see us taking advantage of when fighting wikispam is that we don't have to do it in realtime. Similar to email spam. Spam may go in your inbox for a few minutes, and then once some system figures out it is spam, it can pull it out before you even see it. Same with edits. Anyway, just sharing some late night contemplation. Don't mean to start a huge thread, but happy to hear thoughts. I am starting to do a little playing with accessing the logs and recent changes information via API. The total data size for this does start to get much larger. Entirely feasible, but perhaps more than a single VPS at Linode. :-) 🐝 thingles (talk) 05:33, 5 June 2013 (UTC)
 * Hm, you are considering making a Soldier Bee or something of the kind, which would go around reverting/blanking/deleting detected spam edits (and perhaps even blocking the user when fairly certain)? I guess registration on sites which are getting spammed anyway may be possible via bot..


 * Things that come to mind to consider: First task for the bot after registration could be to edit a section onto the main page talk page, its own userpage, and maybe the founder's talkpage explaining what the bot is, a couple of basic spam prevention things (setting up abuse filter to block unconfirmed users from adding links and putting up a non-terrible captcha seems to stop almost all spam for me), not to reply to the bot on their wiki (give a link to a page here that they can ask about it, unless their spam filter blocks new user links already), and requesting bot/sysop powers to be able to delete spam pages and work without cluttering RC. --ete (talk) 10:22, 5 June 2013 (UTC)


 * When I think of WikiApiary using an account on a remote wiki I see that as something that the admin would request. I envision a setting in WikiApiary that says "Request bot credentials" that would then send a notice via email to the person who requested it with a username and password (unique for every site) for an account to create. WikiApiary could then test whether it can log in with those credentials and whether the user has been given bot permission, and send an error if the account isn't set up right. There would need to be a number of security considerations taken into account in that flow, but that is what I'm thinking. 🐝 thingles (talk) 15:18, 5 June 2013 (UTC)


 * If it's on request only, not even creating an account before approval.. I feel like virtually all the spam will go untouched. It's really easy, in my experience, to shut down all but a few spambots (the ones which mangle their links) with a couple of edit filters (perhaps due to how little the spambot writers have to try against some wikis). A lot of MW site creators are just extremely lazy and reluctant to give out power to fix things. I'd happily sign up for this, and if it works well it would be kinda helpful, but spam is already a very minor problem for us (three spambots bypassed the filters in the last month (two confused Polish spambots, and one before we'd turned the filters back on after updating), and the filters only block unautoconfirmed users (no age limit, and only one edit needed) from adding links). On some wikis, I've known 100s of spam pages being deleted daily by hand, due to lack of preventative measures. It would obviously be a lot more work.. but if you could get funding to run a bot which wiped out a large portion of spam from these kinds of wikis, and plopped a clear and easy preventative solution right into the talkpages of the head admins, that would be pretty amazing. As compared to only working with signed up wikis, which would be.. a bit helpful to a much more limited number of wikis. And a lot harder to get right, since the spambots which bypass sane sysadmin measures tend to be less predictable.--ete (talk) 18:06, 5 June 2013 (UTC)


 * Would you be thinking of offering this as a freemium thing, or maybe try to get funding from somewhere else? I imagine owners who can't be bothered to set up basic anti spam (which is terrifyingly many of them) likely won't be interested in paying, and those who could be interested in paying have little enough spam compared to their number of active users that it's not worth their hassle. But if there was a big single source of funding it could work. Would Wikimedia or some other organization perhaps be interested in funding at least server costs? Maybe even Google once you've got proof of concept, they'd probably be very happy for there to be less wikispam polluting the links they get from wikis and the positive publicity, and have plenty of money/server time. --ete (talk) 10:22, 5 June 2013 (UTC)


 * This sounds like we are talking about US healthcare policy now! ;-) Kidding. I think your observation that "those that can, do, and those that can't, won't pay" is an astute one. I sort of look at the universe of all these unprotected wikis as a pre-built honeypot. But, what to do with it and who would find it valuable requires thought. I do think that modern platforms need to think about spam as it reflects negatively on the platform as a whole. A friend of mine recently set up a MediaWiki and didn't put anything on it. It instantly was obliterated and that made him negative about MediaWiki as a platform. Nothing actionable in that, just an observation. With all that said I am definitely looking at a variety of angles where WikiApiary can generate revenue to fund itself and additional services. 🐝 thingles (talk) 15:18, 5 June 2013 (UTC)


 * Agreeing about it reflecting negatively on MediaWiki as a platform, I've found this especially true for users who clear the spam by hand for badly run wikis. I had one user argue strongly against using MediaWiki on one project entirely due to bad experience with spamclearing on a previous wiki. Honestly, this is something which should be solved on MediaWiki's end (include pre-configured basic anti-spam as default, disableable if not needed), but WikiApiary could definitely help with automatic repairs of long term spammed wikis and putting clear information about how to prevent more spam in clear view of not just the owners, but the users who can badger the owners to fix it. Working on catching those spambots which slip past basic safeguards seems like a more involved project (because those bots are less stupid, and their writers are more likely to update if you figure out how to shut down the current generation) with less overall returns (because globally the stupid bots outnumber the smarter ones by a huge margin).--ete (talk) 18:06, 5 June 2013 (UTC)


 * I wonder if, as a bot editing large numbers of wikis, there was a chance of you ending up on some antispam IP blacklists. That would be amusingly ironic, but annoying.


 * Oh, and would it be practical for the bot to look through deletion logs of large numbers of wikis (all those that gave it access after it automatically registered) and edit logs, looking for the words "spam" and "spambot" to learn from existing spam detected by human editors?--ete (talk) 10:22, 5 June 2013 (UTC)


 * There are a number of ways that you can detect positive and negative votes from the existing log activity. This could then be used to build or decrease reputation both for edits and for the user behind it. 🐝 thingles (talk) 15:18, 5 June 2013 (UTC)


 * Right, seems like you've got some ideas :). Perhaps having some form of output so the owners could run the user removal script to eliminate the spambots would be nice as an extra?--ete (talk) 18:06, 5 June 2013 (UTC)
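As a rough illustration of the idea of deriving positive and negative votes from existing log activity, the sketch below scores users from log entries whose comments mention "spam" or "spambot". The entry fields (`action`, `comment`, `target_user`), the choice of actions and the one-point weights are all hypothetical; nothing here reflects an existing WikiApiary feature.

```python
import re
from collections import defaultdict

# Matches "spam", "spams", "spambot" and "spambots" as whole words.
SPAM_WORDS = re.compile(r"\bspam(bot)?s?\b", re.IGNORECASE)


def reputation_from_logs(entries):
    """Return a per-user score built from human moderation activity."""
    scores = defaultdict(int)
    for entry in entries:
        comment = entry.get("comment", "")
        if entry["action"] in ("delete", "block") and SPAM_WORDS.search(comment):
            # A human flagged this user's work as spam: negative vote.
            scores[entry["target_user"]] -= 1
        elif entry["action"] == "patrol":
            # A human reviewed and accepted an edit: positive vote.
            scores[entry["target_user"]] += 1
    return dict(scores)
```

Since the scoring does not need to happen in realtime, it could be run over the logs periodically and the scores revised as new human decisions appear.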

Limiting charts to 3 months
I just hacked in a small change to the PHP data accessors that are used in dygraph to only return data for the last 3 months. Some sites, notably WikiApiary itself were starting to return a mountain of data that made the browser almost unusable. The plan is to have daily and weekly aggregations for these graphs, but for now I just limited the raw graphs to 3 months. The data is all there, so don't worry. Just not accessible right now. 🐝 thingles (talk) 05:56, 5 June 2013 (UTC)


 * This has now been removed since the charting units got a complete rewrite! 🐝 thingles (talk) 21:25, 6 June 2013 (UTC)

Charting revamped
I got a ton of help today from my friend Paul DeCoursey who spent the morning with me in a coffee shop completely rewriting the JavaScript that powered the charting on WikiApiary. It is radically improved now, and I've made a bunch of changes.


 * 1) All website pages now just have 3 dygraph charts on them. There is a selector that allows you to select what data you want to view. This solves a big problem for websites like WikiApiary that had 9 charts on them and really bogged the browser down.
 * 2) Charts now default to 2 months, but there is now a selector that allows you to select how much data you want.
 * 3) Image count graphs are now accessible again.
 * 4) The exact same code powers the chart popup window.

Still remaining to do:


 * 1) The CSS could use some help, particularly in the popup. (Anyone with CSS skills feel free to jump in!)
 * 2) The frequency option currently doesn't do anything, it will when I get aggregation working in the database.
 * 3) You can currently request SMW charts for sites that don't collect it.
 * 4) Add settings so that websites can specify which 3 charts should be displayed by default. (Some may want to highlight different aspects.)
 * 5) Look into adding a spinner or some "Loading..." message when chart is switching. For some charts it can take a bit.

Huge thanks to Paul for all his help on this. There is no way I could have gotten the JavaScript side of this working right on my own.

🐝 thingles (talk) 21:31, 6 June 2013 (UTC)


 * WOW, awesome - actually the word for this is missing!!! [[Image:thumbs-up-smiley.jpg|200px]] --[[kgh]] (talk) 21:56, 6 June 2013 (UTC)


 * I see "thumbs up smiley guy" turning into an extra special form of a barnstar. :-) 🐝 thingles (talk) 22:19, 6 June 2013 (UTC)


 * Yeah, this would be a cool one. --[[kgh]] (talk) 22:48, 6 June 2013 (UTC)


 * This sounds very positive, but I'm having issues.. none of the charts are showing up at all for me (just a large blank area), either on pages or when I click open in new window. I'm currently on a borrowed mac, running firefox.--ete (talk) 22:03, 6 June 2013 (UTC)


 * Hmm, well, the good news is I see that too. I only tested in Chrome and Safari. Looking into this now. 🐝 thingles (talk) 22:06, 6 June 2013 (UTC)


 * And after one reload I can no longer reproduce it. :-\ 🐝 thingles (talk) 22:08, 6 June 2013 (UTC)


 * Okay, I can't reproduce this again after the very first time. I did see the error in the console and it indicated the Chart object wasn't loaded. I tried clearing cache and I still can't get the error. Can you reproduce reliably ete? 🐝 thingles (talk) 22:14, 6 June 2013 (UTC)


 * Must be a caching issue. After I cleared the cache, everything works smoothly again. I remember us having the same issue in February or March after the last main revamp of this section. Worked out allrighty. :) --[[kgh]] (talk) 22:48, 6 June 2013 (UTC)


 * I was also actively editing and breaking stuff for a while, so a bad version could have gotten downloaded at many points. 🐝 thingles (talk) 22:53, 6 June 2013 (UTC)


 * Clearing the cache also fixed it for me, looks great. The dropdown for duration seems not to drop down when I click it though.--ete (talk) 00:02, 7 June 2013 (UTC)


 * That dropdown not working is a CSS issue. I'll work on a fix shortly. 🐝 thingles (talk) 01:26, 7 June 2013 (UTC)

Supporting collection via Special:Statistics
A day or so ago I modified Form:Website to allow the addition of fields to collect statistics using Special:Statistics from a remote site. I realized there was an edge case to handle for websites between 1.8 and 1.11 that would support the API for general version information, but not statistics. So, some websites like Rodovid (en) will actually get general information from the API, and statistics from Special:Statistics. This means I had to change the form to checkboxes so both could be selected, etc. Also note three new properties:


 * Property:Collect statistics stats
 * Property:Has statistics URL
 * Property:Has collection method

This should be all I need to do in the wiki to support this. Next step is to add the capability to User:Bumble Bee.

🐝 thingles (talk) 04:09, 7 June 2013 (UTC) ''PS: Why can't people upgrade their software? Ugh.''


 * PS: You may notice with this change that I moved the field for the API URL to the collect tab. This was intentional. Now the first tab of the form to add a website is all non-technical. In fact, this makes it easy to use WikiApiary as a database of wikis (similar to WikiIndex), with some wikis just having an entry and not collecting any data. When you switch to collecting, then API URLs and such are there to fill in. I was somewhat hesitant, since people may not go to the 2nd tab and might not add API URLs (which I've set as no longer mandatory). I'll keep an eye on that and see if it is a problem. Please share any feedback on this. 🐝 thingles (talk) 04:35, 7 June 2013 (UTC)


 * Sounds great. Would it be practical to check whether currently defunct websites have accessible Special:Statistics data, and undefunct them if they do? I imagine a large portion of the defunct sites are still available, they just screwed up their url structure and broke the API. Otherwise, we're going to have to check them all by hand (or use mass find/replace to turn off defunct everywhere and Statistics on for those, then just redefunct error sites?).--ete (talk) 12:40, 7 June 2013 (UTC)


 * I'll look into adding some logic to Audit Bee to enable this. It should be okay, although I'm worried about trying to guess at the URL for Special:Statistics, it can be a number of different things. In the meantime, this search shows sites that should probably get this activated. 🐝 thingles (talk) 00:05, 10 June 2013 (UTC)


 * Alright, with this commit collection of statistics data via Special:Statistics will work for sites that have it configured. I'm positive this is going to have bugs and blow up. I'll deal with the exceptions as they come. I also strengthened User:Bumble Bee with this commit that put collection for each site in a nice try/catch. Previously if one site in a segment blew up it would stop the remaining ones. That won't happen anymore. 🐝 thingles (talk) 21:33, 9 June 2013 (UTC)
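The per-site error isolation described above can be sketched like this. It is a generic illustration of wrapping each site's collection in its own try/except, not the code from the commit; `collect_segment` and the `collect` callback are hypothetical names.

```python
import logging


def collect_segment(sites, collect):
    """Run `collect` for every site; one failure must not stop the rest."""
    failures = []
    for site in sites:
        try:
            collect(site)
        except Exception as exc:
            # Record the problem and carry on with the next site.
            logging.warning("Collection failed for %s: %s", site, exc)
            failures.append(site)
    return failures
```

Returning the list of failed sites makes it easy to log or re-audit them afterwards instead of losing the rest of the segment.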


 * For examples of sites working with this method see: Rodovid (en), Bastion Wiki, 2012 Wiki and BODDoctor.


 * Thank you very much, this is wonderful. Too bad we owe it in part to spoilt camping.
 * Now we could add more wikis which we didn't add because they lacked API, maybe? Though that would mean adding some critical metadata manually, I expect. --Nemo 06:03, 10 June 2013 (UTC)


 * I've already written User:Wikkiibot to pull in Farm:Wikkii. Will be running soon. 🐝 thingles (talk) 02:28, 11 June 2013 (UTC)
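For readers curious what collecting from Special:Statistics might involve, here is a minimal sketch. It assumes the wiki exposes the machine-readable form of the page (requested with `action=raw` on older MediaWiki versions), which returns semicolon-separated `key=value` pairs; this is an assumption about the target wikis and not a description of how User:Bumble Bee actually does it. The sample string is made up.

```python
def parse_raw_statistics(text):
    """Parse 'total=1;good=2;...' into a dict of integers."""
    stats = {}
    for pair in text.strip().split(";"):
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        try:
            stats[key.strip()] = int(value)
        except ValueError:
            # Skip anything that is not a plain number.
            continue
    return stats


# Made-up sample in the assumed raw format.
sample = "total=100;good=40;views=0;edits=500;users=12;activeusers=3;admins=2;images=7;jobs=0\n"
```

Skipping unparsable pairs rather than raising keeps one odd value from discarding the rest of a site's statistics.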

Historical data for high profile wikis?
I'm curious if anyone is aware of a source that I could get historical statistics information for high profile wikis like Wikipedia (en). What I would love to find is some database that had the historical data points for the statistics API call along with a timestamp for each point:

{ "query": { "statistics": { "activeusers": 127907, "admins": 1446, "articles": 4250529, "edits": 616931733, "images": 807894, "jobs": 15674, "pages": 30340510, "users": 19106087 } } }

I'm thinking that that might exist. I'm pretty sure that Wikistats has it, and even has a teaser that Historic data can be found here which is a 404 error. Maybe some folks here know some folks on Wikistats? Or know of other data sources? I think it would be cool for some of the very big wikis to be able to see the graphs going back many years. Format isn't all that important. CSV would work, heck a MySQL database dump would be fine. This is a one time task.

🐝 thingles (talk) 04:22, 7 June 2013 (UTC)
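To make the desired format concrete, the sketch below turns a statistics response like the one quoted above into a timestamped CSV row. The column order and helper names are illustrative choices, and the timestamp used in the usage example is a placeholder rather than the real collection time of that data.

```python
import csv
import io
import json

FIELDS = ["timestamp", "activeusers", "admins", "articles", "edits",
          "images", "jobs", "pages", "users"]


def to_row(timestamp, response_text):
    """Combine a timestamp with the statistics block of an API response."""
    stats = json.loads(response_text)["query"]["statistics"]
    row = {"timestamp": timestamp}
    row.update({name: stats.get(name) for name in FIELDS[1:]})
    return row


def rows_to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# The statistics response quoted in the post above.
sample = ('{ "query": { "statistics": { "activeusers": 127907, "admins": 1446, '
          '"articles": 4250529, "edits": 616931733, "images": 807894, '
          '"jobs": 15674, "pages": 30340510, "users": 19106087 } } }')
```

Calling `rows_to_csv([to_row("2013-06-07T00:00:00Z", sample)])` produces one header line and one data line; any historical source that can be reshaped into rows like this would be straightforward to import.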


 * Not ideal since the wikis would have to install, but how about doing this via an extension which creates some API options to query arbitrary dates statistics? Would it be possible to make that work in a non-hideously resource heavy way? Alternatively, and also not ideal for different reasons, how about using the web archive? For major sites they'll almost certainly have pretty regular copies of Special:Statistics, but you'd probably have to figure out how to read the non-raw version and work with archive.org's system for retrieval.--ete (talk) 12:45, 7 June 2013 (UTC)


 * The main statistical page is http://stats.wikimedia.org/ but I do not know if there is useful stuff around. --[[kgh]] (talk) 15:13, 7 June 2013 (UTC)

Ops FAQ
I think we are going to need some sort of Operations FAQ at some point. Feel free to contribute to it. I'm starting with a new flag I just added. 🐝 thingles (talk) 15:56, 7 June 2013 (UTC)

Gone camping
This is what I'll be doing the next couple of days.



Have a great weekend all! 🐝 thingles (talk) 19:54, 7 June 2013 (UTC)


 * Lots of fun to you! --[[kgh]] (talk) 07:51, 8 June 2013 (UTC)


 * Got rained out the 2nd night so ended up coming home early. The upside for WikiApiary is I got the Special:Statistics collection working. :-) 🐝 thingles (talk) 00:06, 10 June 2013 (UTC)


 * I was already wondering about your activity. It is a pity that you had to return early. --[[kgh]] (talk) 15:56, 10 June 2013 (UTC)

Demote doesn't seem to work
HAA Best Practices is listed at main page (and others) even though the attribute is set. --Curlybracket (talk) 13:12, 10 June 2013 (UTC)


 * Good catch, the query had to be expanded to respect this flag. Cheers --[[kgh]] (talk) 15:55, 10 June 2013 (UTC)


 * Thanks for grabbing that kgh! 🐝 thingles (talk) 19:25, 10 June 2013 (UTC)


 * No probs at all, though I was wondering if the property should probably be called "Is demoted" instead of "Is demote"? --[[kgh]] (talk) 20:36, 10 June 2013 (UTC)

Introducing Wikkiibot
I just got a first version of User:Wikkiibot up and running. I've done a couple of test runs and he looks pretty good. Check out Special:Contributions/Wikkiibot if you have a moment. I made sure that all of these sites do have API URL's, since some of them seem to have an API enabled. User:Audit Bee can check every few months and switch sites that have API's enabled over. Also, I'm purposefully not bothering with logos as I'm going to get User:Logo Bee going that will generically hunt for logos for sites that don't have them, and maybe even update logos that are out of date. Anyway, Wikkiibot will bring in 3,200+ websites so this will be a pretty heavy lift. I'll probably run it in batches of a few hundred. 🐝 thingles (talk) 02:21, 11 June 2013 (UTC)


 * Also, I got the code cleaned up so it has no passwords so you can check User:Wikkiibot's source if you wish on Github. 🐝 thingles (talk) 02:24, 11 June 2013 (UTC)


 * For now I've set User:Wikkiibot to load 100 new sites every day at 03:00 UTC. 🐝 thingles (talk) 02:55, 11 June 2013 (UTC)


 * Nice to see Politicalwiki (Wikkii), since it shows that the duplicate name check logic is working in User:Wikkiibot. This bot checks for the pagename already being used and if so adds " (Wikkii)" to the pagename. 🐝 thingles (talk) 01:22, 12 June 2013 (UTC)
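The duplicate name check described above amounts to something like the following sketch. The function name and the way existing page names are passed in are assumptions for illustration; the real bot presumably asks the wiki whether the page exists.

```python
def choose_pagename(name, existing, farm="Wikkii"):
    """Append ' (Farm)' when the plain page name is already taken."""
    if name in existing:
        return "%s (%s)" % (name, farm)
    return name
```

With `existing = {"Politicalwiki"}`, a new Wikkii site called Politicalwiki would be created as "Politicalwiki (Wikkii)", while an unused name is kept unchanged.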

Remove some data?
Is it possible to remove some statistics data for a Wiki? I want to remove everything before June 5th for the LimeSurvey wiki -> http://wikiapiary.com/wiki/LimeSurvey_Manual#tab=Overview. The reason is that we had a lot of wrong counts before that (Wiki was converted from a different software and initStats resulted in correcting the stats except for the 'Active users' count, which seems to be a bug in MediaWiki). --Curlybracket (talk) 08:42, 13 June 2013 (UTC)


 * Not something I want to do a lot of, but I did go in and clean this up. I was able to delete just everything before your page count got right, and then I fixed the gap errors in the active user counts. Cheers! 🐝 thingles (talk) 04:28, 14 June 2013 (UTC)


 * Hehe, I know it would be some hassle. Thank you very much - much appreciated. --Curlybracket (talk) 07:02, 14 June 2013 (UTC)

Wikia doesn't have 370,953,036 users
They really have 9,594,175. It turns out that the way Farm:Wikia reports users (not active users, total registered users) is to report that every person who signs up on Wikia is a member of every wiki on Wikia. This resulted in Template:Farm calculating Farm:Wikia users wrong. I added a special check for this so now Farm:Wikia has the max user count of any of its member wikis, which accurately reflects reality. 🐝 thingles (talk) 13:02, 13 June 2013 (UTC) PS: I wonder if the new universal registration stuff that Farm:Wikimedia is working on may have a similar outcome.
 * This will likely end up to be the case for farms that share a user table. For example, Farm:Club Penguin Wiki Network claims 6,767 users while the user count is really 5,197. Also, I've run initStats.php on the CPWN wikis so that number's going to jump. --Tux (talk) 18:28, 13 June 2013 (UTC)


 * Hmm, perhaps that should be a flag that can be easily set for the farm? A checkbox that indicates if the farm shares users and then the calculation can adjust? Would be easy, just replace the check for Wikia with checking for that flag. Make sense? Seems to to me. 🐝 thingles (talk) 20:32, 13 June 2013 (UTC)


 * And then Farm:WikiEducator comes in and does both, or at least I suspect so. It looks like WikiEducator (fr) and WikiEducator (en) share a user table, but WikiEducator (es) and WikiEducator (he) don't. :-\ I'm guessing there are three hosts running those four sites? Could still be done as described above, but would end up breaking this apart and putting fr and en in a farm of their own, that is then a child farm of Farm:WikiEducator so the total users calculation would be done right. 🐝 thingles (talk) 05:19, 14 June 2013 (UTC)


 * I added this as a generic checkbox for farms. I noticed that Farm:18DAO Reference Wiki needed it, and Tux added it for Farm:Club Penguin Wiki Network. The new usage tables in Template:Farm, while less pretty, make it easy to see sites that need this since their user counts are all very near each other. 🐝 thingles (talk) 23:13, 16 June 2013 (UTC)
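
A minimal sketch of the shared-users calculation discussed in this thread, assuming a boolean flag on the farm. The function name, the flag name, and the per-wiki numbers are made up for illustration; this is not the actual Template:Farm logic.

```python
def farm_total_users(member_user_counts, shares_user_table):
    """Return a farm-level registered-user total.

    member_user_counts: per-wiki registered user counts.
    shares_user_table: hypothetical name for the farm checkbox; True when
    all member wikis report the same shared account pool.
    """
    if not member_user_counts:
        return 0
    if shares_user_table:
        # Each wiki reports roughly the whole pool, so a sum would count
        # every account once per wiki. Use the largest report instead.
        return max(member_user_counts)
    return sum(member_user_counts)


# Invented per-wiki counts, for illustration only.
counts = [5197, 5190, 5201]
shared_total = farm_total_users(counts, shares_user_table=True)
summed_total = farm_total_users(counts, shares_user_table=False)
print(shared_total, summed_total)
```

Counts that sit very near each other, as in the invented list above, are the same hint the usage tables give for spotting farms that need the flag.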

Introducing User:TropicalBot
I just made User:TropicalBot to bring in Farm:TropicalWikis. This farm uses Extension:Farmer so the host list was very similar to Farm:Wikkii, except Farm:TropicalWikis is actually fast, has API enabled and isn't currently down for several hours. :-\ I'm running an import of the 190 wikis in Farm:TropicalWikis right now. See Special:Contributions/TropicalBot to check out his work. 🐝 thingles (talk) 02:36, 14 June 2013 (UTC)
 * You're welcome! I would've helped beforehand, though. --Tux (talk) 11:48, 14 June 2013 (UTC)
 * Tux, is TropicalWikis your farm? 🐝 thingles (talk) 14:43, 14 June 2013 (UTC)
 * Correct. --Tux (talk) 17:02, 14 June 2013 (UTC)
 * Awesome! I didn't realize that, but now I get why there were all those Club Penguin sites on that farm! :-) 🐝 thingles (talk) 18:59, 14 June 2013 (UTC)
 * Yeah... the CPWN farm and TropicalWikis share a lot of overlap. I digress, as I'm sysadmin for both. By the way, just added the Club Penguin Archives wiki. Lots of files there. --Tux (talk) 19:16, 14 June 2013 (UTC)

-1 Active Users
I finally put in a catch for times when MediaWiki reports -1 active users. I'm not sure why it does that, but on idle wikis with no active users it seems to do it fairly often. User:Bumble Bee will now catch anything < 0 and set it to 0. I also updated all the historical statistics to reset < 0 to 0. Cheers, 🐝 thingles (talk) 04:21, 14 June 2013 (UTC)
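
Roughly what such a catch could look like, as a sketch only. The function names are hypothetical and this is not the actual User:Bumble Bee code.

```python
def clean_active_users(reported):
    """Clamp a reported active-user count so it is never negative."""
    return max(0, int(reported))


def clean_history(samples):
    """Apply the same clamp to a list of historical samples."""
    return [clean_active_users(value) for value in samples]


# Invented sample values: -1 stands for the bogus report from an idle wiki.
cleaned = clean_history([3, -1, 0, 7])
print(cleaned)
```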


 * BTW, this issue is over 6 months old! Glad to get rid of it. 🐝 thingles (talk) 04:22, 14 June 2013 (UTC)


 * I could only reproduce it when I let the stats recount by using the initStats.php script. For some reason the active user count only updates when you visit the statistics page and reload it. That's the reason in general why the active user count is pretty jumpy. I think it is a bug in MediaWiki because I would expect the value to be updated as soon as statistics are requested _before_ display. Oh, and it is not updated at all when requesting the statistics via the API.--Curlybracket (talk) 20:57, 17 June 2013 (UTC)

Wikkii down, how to ignore it
For a while now today all of Wikkii has been down. This is causing all the checks on those websites to time out, filling up the Bot log and also causing bots to generally run much longer than they would otherwise, since they have to wait for the timeout on each of the calls to those wikis. This makes me think that I may want to have an ability to put groups of sites on administrative hold for situations like this. Mostly noting that here so I can ponder some ways to do that. It can't involve turning off all of the wikis individually. There could be some dynamic regex that is checked to see if an operator block is in place at this moment. If you have ideas, throw them out. 🐝 thingles (talk) 04:39, 14 June 2013 (UTC)
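
One way the dynamic regex idea might look, as a rough sketch. The hold list, the function name, and the hostnames are all hypothetical (`farm.example` is a placeholder, not a real farm); where the patterns would actually be stored is left open.

```python
import re

# Hypothetical operator holds: each entry is a regex matched against a
# site's API URL. An operator would add or remove patterns as needed.
OPERATOR_HOLDS = [
    r"^https?://([a-z0-9-]+\.)*farm\.example/",
]


def is_on_hold(api_url, holds=OPERATOR_HOLDS):
    """Return True when any hold pattern matches the site's API URL."""
    return any(re.search(pattern, api_url, re.IGNORECASE) for pattern in holds)


# A bot would skip held sites before making any network call.
sites = [
    "http://foo.farm.example/w/api.php",
    "http://other.example/w/api.php",
]
to_check = [url for url in sites if not is_on_hold(url)]
print(to_check)
```

A single pattern covers a whole host, so nothing has to be turned off wiki by wiki, and removing the pattern lifts the hold for all of them at once.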

Adaptive updating for check frequency
I'm worried about the defunct process as we get more and more wikis and as more are added automatically. Some of those wikis may come back, but more importantly I think it's too much of a load on operators to have to mark them defunct. Consider a host that has a thousand wikis that goes away. Granted that could be done in bulk, but marking 1,000 wikis defunct seems laborious.

At the same time, there are a lot of wikis in the farms (notably Farm:Wikkii) that are completely unused. Have no users. No edits, etc. They aren't defunct. But they obviously don't need to be looked after very often. Right now User:Bumble Bee will check them every four hours. I don't want to mark them as defunct, but I also don't want to waste resources checking and storing data every four hours for sites that won't change in 4 years!

I would like to look into using some form of adaptive algorithm to move the Property:Check every for each site up and down. A lot of errors would progressively move it up and up so it's being checked very infrequently. Having no edits would do something similar. If edits started showing up, the delay between checks should come back down. You get the drift.

Does anyone have suggestions on formulas that aren't overly complex to achieve something like this? 🐝 thingles (talk) 04:45, 14 June 2013 (UTC)
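
For the sake of discussion, here is about the simplest thing that fits the description: double the interval after an error or a check with no new edits, halve it when edits show up. This is only a sketch. The 240 minute base is the current four-hour check; the 30 day ceiling and the function name are assumptions.

```python
BASE_MINUTES = 240              # the current four-hour check interval
MAX_MINUTES = 30 * 24 * 60      # assumed ceiling: 30 days


def next_check_every(current, had_error, edits_changed):
    """Return the next check interval in minutes for one site."""
    if edits_changed and not had_error:
        # Activity seen: bring the delay back down, but not below the base.
        return max(current // 2, BASE_MINUTES)
    # Error or no new edits: back off, but never past the ceiling.
    return min(current * 2, MAX_MINUTES)


# Five idle checks in a row on a site that starts at the base interval.
interval = BASE_MINUTES
for _ in range(5):
    interval = next_check_every(interval, had_error=False, edits_changed=False)
print(interval)
```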


 * Hm, kinda a reply to the previous and this section, but some error messages should count towards an entire farm. For example, if for the last 200 attempts to contact wikis on a farm all have given errors, automatically setting the whole farm to be checked less regularly until it stops giving errors would be good (a setting on the Farm: page I imagine?).


 * That general plan of a flexible how often to check setting automatically updated seems good. Additionally, would it be practical to, every few days perhaps, run a sweep on the database and remove values (e.g. edit count, users) when the value is the same for the time before and after? This could massively reduce database size if it's being filled with huge numbers of useless data points from dead wikis, and leave the graphs displaying virtually as they are now.


 * As for specific formulae.. let's see. Options for things which increase time between checks: repeatedly getting the same value for certain values (edit count would probably be the most useful, since constant new users without any increase of editcount means the wiki is dead, but has spambots registering who are unable to edit), repeatedly getting errors (check for alternate collection methods first), being marked as spammed (by hand for now, maybe teach the bot to recognize it long term). hm, and I guess rather than directly editing the check every value (which would result in losing human settings, or this not working on wikis with human settings), introducing another value which would be a multiplier for the check every value would be nice.


 * My initial rough sketch (probably with too many variables, strip out annoying ones) would be something like.. check every multiplier = cumulative error count^1.5*(1+total errors/total non-errors)*if(spammed,10,1)*days since last increase in editcount (min one, and one if site has been recently added).
 * Probably needs some tweaking of values (especially the exponent of the first part and the days part of the last), but that should massively decrease checks to dead wikis fairly quickly. If it's too quickly and we end up not picking up on ones which come back fast enough, you could set a maximum value for the check every multiplier (once every two weeks or month perhaps)?--ete (talk) 18:03, 16 June 2013 (UTC)
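
The rough sketch above could be written out like this. Two things here are assumptions that are not in the formula as posted: an error count of zero is treated as 1 so the whole product does not collapse to zero, and a site with no successful checks divides by 1 instead of 0. The function and parameter names are made up.

```python
def check_every_multiplier(cumulative_errors, total_errors, total_ok,
                           spammed, days_since_edit_increase,
                           max_multiplier=None):
    """Multiplier applied to a site's check every value."""
    # Assumption: floor the error count at 1 so a healthy site gets 1, not 0.
    error_term = max(cumulative_errors, 1) ** 1.5
    # Assumption: avoid division by zero when there are no good checks yet.
    ratio_term = 1 + total_errors / max(total_ok, 1)
    spam_term = 10 if spammed else 1
    # "min one, and one if site has been recently added"
    idle_term = max(days_since_edit_increase, 1)
    multiplier = error_term * ratio_term * spam_term * idle_term
    if max_multiplier is not None:
        multiplier = min(multiplier, max_multiplier)
    return multiplier


healthy = check_every_multiplier(0, 0, 100, False, 0)
dead = check_every_multiplier(4, 10, 10, True, 3)
# With a four-hour base, a cap of 84 works out to one check every two weeks.
capped = check_every_multiplier(4, 10, 10, True, 3, max_multiplier=84)
print(healthy, dead, capped)
```

Keeping this as a separate multiplier leaves a hand-set check every value untouched, which is the point of the suggestion above.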

Re-audits fixed
Turns out I had a bad mistake in Concept:Websites expired audit: I was querying for the wrong property. As such, sites were never being re-audited. This is fixed now, so pretty much all sites will be getting a fresh audit. 🐝 thingles (talk) 05:25, 15 June 2013 (UTC)


 * Excited to see a lot of new skins coming into WikiApiary as websites that have been upgraded are now getting Collect skins turned on in the re-audits. 🐝 thingles (talk) 18:29, 16 June 2013 (UTC)

Audit Bee now sets founded dates
I added some code to Audit Bee and he will now automatically set the founded date for a wiki if possible. When performing an audit, if the site does not already have a founded date set, Audit Bee will ask the remote wiki for the timestamp associated with revision 1 of the wiki (the very first edit). Most wikis can answer this question, and if they do, that date is put into the founded date. If it is wrong, you can set it to something else and Audit Bee will leave that alone. Before starting this there were 85 wikis with a founded date. As of right now, there are 0 with a founded date set. I'm excited about this data point as it will allow us to show a record of when wikis were created, think of a calendar or timeline. We will also be able to highlight in the page how old the wiki is and derive some nice statistics from that (edit count divided by days active for example). We can also have a "Wikis with Birthdays Today" highlight on the front. Fun stuff! :-) 🐝 thingles (talk) 05:30, 15 June 2013 (UTC) PS: Yes, this is what I do for fun on Friday night. :-)
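
For anyone curious, the lookup can be sketched with the standard MediaWiki API query for a revision ID. This is not the actual Audit Bee code: the function names, the `wiki.example` URL, and the canned responses are made up for illustration, and the parsing assumes the default JSON layout where `pages` is keyed by page ID.

```python
import json
from urllib.parse import urlencode


def revision_one_url(api_url):
    """Build the API request for the timestamp of revision 1."""
    params = {
        "action": "query",
        "prop": "revisions",
        "revids": 1,
        "rvprop": "timestamp",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def founded_date_from_response(body):
    """Return the revision timestamp from a JSON response, or None."""
    data = json.loads(body)
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        for revision in page.get("revisions", []):
            if revision.get("timestamp"):
                return revision["timestamp"]
    return None  # e.g. revision 1 was deleted and comes back as a bad revid


def choose_founded_date(existing, remote):
    """Never overwrite a founded date that is already set."""
    return existing if existing else remote


url = revision_one_url("http://wiki.example/w/api.php")
# Canned, invented responses standing in for real API replies.
ok_body = ('{"query": {"pages": {"1": {"pageid": 1, "title": "Main Page", '
           '"revisions": [{"timestamp": "2005-03-01T12:00:00Z"}]}}}}')
missing_body = '{"query": {"badrevids": {"1": {"revid": 1}}}}'
print(url)
print(founded_date_from_response(ok_body))
print(founded_date_from_response(missing_body))
```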


 * Nice to see this working so well; example Eugene Neighbors. Look at the history for Eugene Neighbors. Audit Bee turned off things in March when they were on MediaWiki 1.10. On March 7 they upgraded to 1.20.3. And now Audit Bee is enabling skin collection, extension collection, setting founded date, etc. Awesome. 🐝 thingles (talk) 05:34, 15 June 2013 (UTC)


 * This is fun. See Youngest websites and Oldest websites. And Birthday today. 🐝 thingles (talk) 06:21, 15 June 2013 (UTC)


 * This is so much fun. Love this Websites by year started list. 🐝 thingles (talk) 18:26, 16 June 2013 (UTC)

52,801 data points on June 14, 2013
I thought it would be interesting to just capture this. On June 14th (yesterday) User:Bumble Bee sampled 52,801 data points from the various wikis in WikiApiary. 🐝 thingles (talk) 07:17, 15 June 2013 (UTC)

Timeout on Statistics
I'm getting a 504 Gateway Time-out error when trying to load the Statistics page. Maybe too many big queries on one page? Crafting an edit URL and setting each of the limits to 10 rather than 1000 results made it load on preview, but some of the graphs seem not very useful with that limit. Maybe split and link to more detailed pages for each?--ete (talk) 18:03, 16 June 2013 (UTC)


 * Thanks ete. That page has to be completely rethought. As you can see I added an entire section on "Reports" in MediaWiki:Sidebar. I'm really thinking that Statistics should just be a page with links out to dozens of other pages with isolated queries and graphs. I'll move things around soon, but if others have ideas feel free to start fleshing out some new pages as well. 🐝 thingles (talk) 18:28, 16 June 2013 (UTC)