WikiApiary talk:Operations/2013/October

New Property:Has IP address
I traded some emails with User:MarkAHershberger today about tracking wiki farms, and he was wondering if WikiApiary could get the IP address of wikis to match them up, even when farms aren't necessarily identified. I just added a little hack to User:Bumble Bee to do just that, and the addresses are now populating. Within 12 hours all will be picked up. Note that this uses the multiproperty capability in WikiApiary, so if a wiki uses multiple addresses over time it will pick them all up (this is currently used for DB versions). Screenshot of the data in the ApiaryDB:



Fun to watch this fill in. We'll see if it tells us much. Also, gee, I wish Semantic MediaWiki had an IP address data type! :-)

🐝 thingles (talk) 01:41, 2 October 2013 (UTC)

''PS: I found a bug that showed up in this diff and have fixed it. FreeWiki/General will get fixed on the next update.''
 * nice... now to make use of this. -- ☠ MarkAHershberger ☢ (talk) ☣ 02:10, 2 October 2013 (UTC)
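
For anyone curious what the lookup in this thread might look like, here is a minimal sketch. It is not Bumble Bee's actual code; the function names and the example URL and addresses are made up for illustration. It shows resolving a wiki's hostname to an IP, keeping every address ever seen (the multiproperty idea), and grouping wikis that share an address as candidate farms.

```python
import socket
from collections import defaultdict
from urllib.parse import urlparse


def hostname_from_url(url):
    """Return the host part of a wiki URL, e.g. its api.php endpoint."""
    return urlparse(url).hostname


def resolve_ip(hostname):
    """Resolve a hostname to one IPv4 address (needs network access)."""
    return socket.gethostbyname(hostname)


def merge_multiproperty(existing, observed):
    """Add a newly observed value, keeping earlier ones in first-seen order."""
    merged = list(existing)
    if observed not in merged:
        merged.append(observed)
    return merged


def group_by_ip(wiki_ips):
    """Map each IP to the sorted list of wikis seen on it (hypothetical helper)."""
    groups = defaultdict(list)
    for wiki, ips in wiki_ips.items():
        for ip in ips:
            groups[ip].append(wiki)
    return {ip: sorted(wikis) for ip, wikis in groups.items()}
```

Wikis that land in the same group share an address, which is only a hint of a farm, since unrelated sites on shared hosting will also collide.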

New Property:Has reverse lookup
Following up on Property:Has IP address, it hit me that it might be really useful to do a reverse hostname lookup on the IP. It turned out to be really easy to add, so that is now in as well. This is particularly useful since the reverse lookup tends to clearly identify the hosting provider. This is also being managed as a multiproperty.
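
A rough sketch of a reverse lookup with Python's standard library, offered as an illustration rather than the code Bumble Bee runs. `provider_hint` is a hypothetical helper that naively keeps the last two labels of the reverse hostname; it will be wrong for suffixes such as `co.uk`.

```python
import socket


def reverse_lookup(ip):
    """Return the reverse (PTR) hostname for an IP, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return None
    return hostname


def provider_hint(reverse_hostname):
    """Naively guess the provider domain from the last two hostname labels."""
    labels = reverse_hostname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])
```

For example, a made-up reverse hostname like `apache2-example.dreamhost.com` would give the hint `dreamhost.com`.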



🐝 thingles (talk) 02:23, 2 October 2013 (UTC)

PS: Mark, note Dreamhost listed in the graphic above.


 * Note this page User:Thingles/Hosting providers. 🐝 thingles (talk) 02:40, 2 October 2013 (UTC)
 * Probably noted elsewhere, but sorting by version number leaves 1.9 at the opposite end of the list from 1.10, with 1.20 in the middle. -- ☠ MarkAHershberger ☢ (talk) ☣ 13:41, 2 October 2013 (UTC)
 * posted to wikitech-l -- ☠ MarkAHershberger ☢ (talk) ☣ 15:02, 2 October 2013 (UTC)
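
On the version sorting point above: that ordering is what a plain string sort produces, because "1.10" compares character by character and sorts before "1.9". A minimal sketch of a numeric sort key follows, assuming versions are dotted numbers possibly with a suffix; `version_key` is a hypothetical name, not something that exists in WikiApiary.

```python
import re


def version_key(version):
    """Turn '1.22wmf1' into (1, 22, 1) so versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


versions = ["1.9", "1.20", "1.10"]
as_strings = sorted(versions)                   # ['1.10', '1.20', '1.9']
as_numbers = sorted(versions, key=version_key)  # ['1.9', '1.10', '1.20']
```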

Whois integration
Late last night/early this morning I added a whois query to Bumble Bee to augment the IP and reverse hostname lookups. It now queries whois and adds the organization that owns the network block being used. See Property:Has netblock organization and Property:Has netblock organization handle. This will almost certainly allow grouping wikis into logical units (all wikis at a university, all wikis hosted by Dreamhost, etc.). The biggest limitation is that the Python library I used only queries ARIN, so addresses that are registered with one of the four other regional Internet registries (RIRs) won't have detailed info yet. This is why you will see a lot of entries for "RIPE Network Coordination Centre". That is another RIR, which I should then query to get the right answer. I'm open to code that would query them. ARIN will give good data for the US and Canada. There is a really good Stack Overflow answer on Python libraries that do whois. Unfortunately they either only query ARIN or don't allow IP lookups. :-( 🐝 thingles (talk) 13:18, 2 October 2013 (UTC)
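
One possible way to follow ARIN's answer to the right registry, sketched with only the standard library. This is not the library mentioned above. The handle-to-server table assumes that ARIN reports the other registries under the organization handles RIPE, APNIC, LACNIC and AFRINIC, which should be checked against real responses. `whois_query` speaks plain WHOIS over TCP port 43 and returns the raw text, which would still need parsing per registry.

```python
import socket

# Assumption: ARIN org handles for the other four RIRs, mapped to their whois hosts.
RIR_WHOIS_SERVERS = {
    "RIPE": "whois.ripe.net",
    "APNIC": "whois.apnic.net",
    "LACNIC": "whois.lacnic.net",
    "AFRINIC": "whois.afrinic.net",
}


def referral_server(org_handle):
    """Return the whois host of another RIR, or None if ARIN's answer is final."""
    return RIR_WHOIS_SERVERS.get(org_handle.strip().upper())


def whois_query(server, query, timeout=10.0):
    """Send one WHOIS query over TCP port 43 and return the raw response text."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def refine_lookup(ip, arin_org_handle):
    """Re-query the owning RIR when ARIN only names another registry."""
    server = referral_server(arin_org_handle)
    if server is None:
        return None  # ARIN already holds the detailed record
    return whois_query(server, ip)
```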

New network fields
This diff does a good job of showing the new information coming into WikiApiary. 🐝 thingles (talk) 13:24, 2 October 2013 (UTC)

New property delay
Note that if you look for these new properties, they won't be populated on many wikis' entries until Bumble_Bee gets to them, so have a little patience. -- ☠ MarkAHershberger ☢ (talk) ☣ 15:20, 2 October 2013 (UTC)


 * Good highlight Mark. Yeah, Bumble Bee is collecting this when he updates the "/General" subpages for each site. This is updated once every 24 hours. 🐝 thingles (talk) 22:16, 2 October 2013 (UTC)

New subpages
Shortly I plan on making a modification to User:Bumble Bee to create two new subpages for each site. As everyone knows, there is a /General, /Extensions and /Skins subpage for each site. These store the actual data and are transcluded into each site's properties. I've grown to really like how you can add these pages to watchlists and get notifications.

A while back I added some geographic data for each site using the MaxMind Free GeoIP database. I was lazy at the time and just appended Template:Network info to the /General page. I recently started pulling data from whois queries, and I just added that to the existing Template:General siteinfo. I don't think either of these was a good idea, just the fastest at the time.

I'm going to move these two data items into new subpages, /Whois and /Maxmind. They will be bot-maintained pages just like the others. This will keep watchlists from triggering on things that really aren't /General. I also plan to add more fields from the whois queries in the future. Additionally, I plan to have different cache durations for each page. I want /Extensions to update every 24 hours, but something like /Whois or /Maxmind is probably fine at weekly or even monthly. I will also likely rename Template:Network info to Template:Maxmind.
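
A small sketch of how per-subpage cache durations could be expressed. The 24-hour values for /General and /Extensions come from this discussion; the weekly value for /Whois and /Maxmind is just one of the options floated above, and the names `REFRESH_INTERVALS` and `is_due` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# How long each bot-maintained subpage may go before it is refreshed.
REFRESH_INTERVALS = {
    "General": timedelta(hours=24),
    "Extensions": timedelta(hours=24),
    "Whois": timedelta(weeks=1),    # assumption: weekly rather than monthly
    "Maxmind": timedelta(weeks=1),  # assumption: weekly rather than monthly
}
DEFAULT_INTERVAL = timedelta(hours=24)


def is_due(subpage, last_updated, now=None):
    """Return True when the subpage's cache duration has elapsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    interval = REFRESH_INTERVALS.get(subpage, DEFAULT_INTERVAL)
    return now - last_updated >= interval
```

With this shape, changing /Whois from weekly to monthly is a one-line edit to the table.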

I'll post an update after I've made the change. I expect it to take a bit for things to propagate. Some properties may drop away until they get refilled on their new pages. In general the philosophy that seems to make sense to me is to have a subpage for each type of dataset that is stored for the site.

Note that this will add over 18,000 pages to WikiApiary.

🐝 thingles (talk) 19:48, 5 October 2013 (UTC)