WikiApiary:Pavlo import project

Nemo bis mentioned Pavlo's list of wikis on User_talk:Thingles. Rather than deal with the format of this in the mutante list, it's easier to work with this file directly.

Discussion
Given the size of this import, I think it makes sense to add a specific category tag to these pages. Suggesting Category:Pavlo import be placed in all these pages. This would especially help with any Replace Text needs or other followup mass actions. Thingles (talk) 03:18, 26 January 2013 (UTC)
 * I reran it with this category code added to the template. Seems like a good idea. Thingles (talk) 03:28, 26 January 2013 (UTC)
 * Yes, this should be done. Also it would help to track the source of the content. --[[kgh]] (talk) 10:05, 26 January 2013 (UTC)

This import is going to be messy. As you can see in the debug output, these sites would be imported with Validated false and Active false. It may make sense to just have active false. Since the name of the wiki is retrieved from the API on this import, the API URL is already validated. Thingles (talk) 03:20, 26 January 2013 (UTC) PS - Note Validated=No in the test runs below.


 * I would import Validated=no since this would help to distinguish whether a human eye has looked at the page or not. Active has to be ticked anyway. There will probably still be wikis that are not activated even though they are valid (e.g. Audit Bee or other reasons). --[[kgh]] (talk) 10:05, 26 January 2013 (UTC)

Audit Bee will run this import. Right now he is not checking for any preexisting sites. There could be a query to see if any existing sites have the same API URL being reviewed before importing, but it would slow it down dramatically and I suspect would not hit any matches. Either way, if a site is overwritten the revision history would be able to recover the original. Thingles (talk) 03:23, 26 January 2013 (UTC)


 * Looking at the file a bit more, this may be needed.

% grep wikinews wikilist.txt | wc -l
18


 * A check on a preexisting API URL would match for these and could be skipped. Thingles (talk) 03:30, 26 January 2013 (UTC)


 * A cheap and cheesy alternative solution would be to check the result of the edit action to the wiki. If it reports that a new page was created, there is no conflict. If it returns anything else, I could flag that page name to be reviewed. This seems workable. Thoughts? A new page will return:

{'edit': {'pageid': 2541, 'title': 'WKGE-Wiki/Extensions', 'newtimestamp': '2013-01-23T06:17:17Z', 'result': 'Success', 'new': '', 'oldrevid': 0, 'newrevid': 10141}}


 * Note the 'new' key. If the edit is not marked as new or the 'oldrevid' is not 0 then further investigation would be warranted. Thingles (talk) 03:48, 26 January 2013 (UTC)


 * Yeah, this would be the way. --[[kgh]] (talk) 10:05, 26 January 2013 (UTC)
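
The edit-result check discussed in this thread could be sketched roughly as follows. This is only an illustration, not the code Audit Bee runs: the function name `needs_review` is hypothetical, and it assumes the response has the same shape as the sample dictionary quoted above.

```python
def needs_review(response):
    """Return True if an edit response suggests an existing page was touched.

    Assumes the response dict looks like the sample in this thread: a
    brand-new page carries a 'new' key and an 'oldrevid' of 0.
    """
    edit = response.get('edit', {})
    if edit.get('result') != 'Success':
        return True  # failed or unexpected edits are worth a look too
    created = 'new' in edit and edit.get('oldrevid') == 0
    return not created


sample = {'edit': {'pageid': 2541, 'title': 'WKGE-Wiki/Extensions',
                   'newtimestamp': '2013-01-23T06:17:17Z', 'result': 'Success',
                   'new': '', 'oldrevid': 0, 'newrevid': 10141}}
print(needs_review(sample))  # False: the page was newly created
```

Page titles for which this returns True could be collected into a list and reviewed by hand after the run.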

It is really hard to know if a language code should be added to the target page names. Not sure how to deal with that, or if it's even worth trying. Thingles (talk) 03:30, 26 January 2013 (UTC)


 * This should only be done if there is some kind of language farm. However, in this case the page gets overwritten by Audit Bee over and over, telling us that a second import with language codes added to the page name should be done. --[[kgh]] (talk) 10:05, 26 January 2013 (UTC)

To Do

 * 1) Add 2nd category for groups of 100 sites in the import. Useful to limit Replace Text actions.
 * 2) Attempt to determine if Semantic MediaWiki is available and set flag appropriately.
 ** This would include determining if SMW is 1.6+. We actually need a further property that distinguishes between < 1.6 and >= 1.6. The latter should be done anyway. --[[kgh]] (talk) 11:35, 26 January 2013 (UTC)
 *** I'm going to take the easier route and just call the SMW stats API method; if it returns, I know it should be collected. Thingles (talk) 12:43, 26 January 2013 (UTC)
 * 3) Attempt to identify remote $wgLogo, upload file and set it.
 ** This would include renaming the file to the wiki's name. Heaps of files will just be called logo.png. Working on logos will, besides adding the description (I am afraid that nothing can be done about this automatically), take most of the time. --[[kgh]] (talk) 11:32, 26 January 2013 (UTC)
 *** I've got it working for a large number of them! :-) And they will be renamed with the wiki name and logo appended. Thingles (talk) 12:43, 26 January 2013 (UTC)
 * 4) Identify import overwrites for review.
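
For the first item, one way to assign the second category would be to derive a batch number from each site's position in the import. A minimal sketch; the category naming pattern and the function name are made up for illustration, and nothing here reflects what the bot actually does.

```python
def batch_category(position, batch_size=100):
    """Map a 1-based position in the import list to a batch category name.

    Positions 1-100 fall in batch 1, 101-200 in batch 2, and so on.
    The 'Pavlo import batch N' naming is a hypothetical placeholder.
    """
    if position < 1:
        raise ValueError('position is 1-based')
    batch = (position - 1) // batch_size + 1
    return 'Category:Pavlo import batch %d' % batch


print(batch_category(1))    # Category:Pavlo import batch 1
print(batch_category(100))  # Category:Pavlo import batch 1
print(batch_category(101))  # Category:Pavlo import batch 2
```

With a category per batch, a Replace Text run could be scoped to 100 pages at a time.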

Debug of first 30 lines
This is the output of running the bot on the first 30 lines of the Pavlo file. 20 success, 10 failures.

% python import-pavlo.py

(1) Processing: http://125.160.17.21/wiki/api.php
(1) API Call: http://125.160.17.21/wiki/api.php?action=query&meta=siteinfo&siprop=general&format=json
(1) Target: SpeedyWiki

(2) Processing: http://128.174.125.122/wiki/api.php
(2) API Call: http://128.174.125.122/wiki/api.php?action=query&meta=siteinfo&siprop=general&format=json

(3) Processing: http://128.175.126.139/gallusWiki/api.php
(3) API Call: http://128.175.126.139/gallusWiki/api.php?action=query&meta=siteinfo&siprop=general&format=json

(4) Processing: http://129.2.15.45/wiki/api.php
(4) API Call: http://129.2.15.45/wiki/api.php?action=query&meta=siteinfo&siprop=general&format=json
(4) Target: CARMA

(5) Processing: http://130.15.118.46/curr309/api.php
(5) API Call: http://130.15.118.46/curr309/api.php?action=query&meta=siteinfo&siprop=general&format=json
'query'

(6) Processing: http://130.230.88.154/api.php
(6) API Call: http://130.230.88.154/api.php?action=query&meta=siteinfo&siprop=general&format=json
(6) Target: Ticsp

(7) Processing: http://131.130.46.67/wiki/api.php
(7) API Call: http://131.130.46.67/wiki/api.php?action=query&meta=siteinfo&siprop=general&format=json
(7) Target: Philo Wiki

(8) Processing: http://137.99.79.133/wiki/api.php
(8) API Call: http://137.99.79.133/wiki/api.php?action=query&meta=siteinfo&siprop=general&format=json
(8) Target: UConn PAN

(9) Processing: http://139.57.180.167/wiki/api.php
(9) API Call: http://139.57.180.167/wiki/api.php?action=query&meta=siteinfo&siprop=general&format=json

(10) Processing: http://140.134.26.7/selabwiki/api.php
(10) API Call: http://140.134.26.7/selabwiki/api.php?action=query&meta=siteinfo&siprop=general&format=json
HTTP Error 404: Not Found

(11) Processing: http://140.254.84.203/wiki/api.php
(11) API Call: http://140.254.84.203/wiki/api.php?action=query&meta=siteinfo&siprop=general&format=json
(11) Target: PLANTFACTS.OSU.EDU

(12) Processing: http://141.50.94.20/wiki/api.php
(12) API Call: http://141.50.94.20/wiki/api.php?action=query&meta=siteinfo&siprop=general&format=json

(13) Processing: http://147.96.5.37/SklogWiki/api.php
(13) API Call: http://147.96.5.37/SklogWiki/api.php?action=query&meta=siteinfo&siprop=general&format=json
(13) Target: SklogWiki

(14) Processing: http://1503.annowiki.org/api.php
(14) API Call: http://1503.annowiki.org/api.php?action=query&meta=siteinfo&siprop=general&format=json
(14) Target: AnnoWiki 1503

(15) Processing: http://151.100.9.100/emclab/api.php
(15) API Call: http://151.100.9.100/emclab/api.php?action=query&meta=siteinfo&siprop=general&format=json
timed out

(16) Processing: http://158.108.32.49/wiki/api.php
(16) API Call: http://158.108.32.49/wiki/api.php?action=query&meta=siteinfo&siprop=general&format=json
(16) Target: Theory Wiki

(17) Processing: http://1602.annowiki.org/api.php
(17) API Call: http://1602.annowiki.org/api.php?action=query&meta=siteinfo&siprop=general&format=json
(17) Target: AnnoWiki 1602

(18) Processing: http://163.13.175.46/wiki/api.php
(18) API Call: http://163.13.175.46/wiki/api.php?action=query&meta=siteinfo&siprop=general&format=json
(18) Target: TwBsBall

(19) Processing: http://1701.annowiki.org/api.php
(19) API Call: http://1701.annowiki.org/api.php?action=query&meta=siteinfo&siprop=general&format=json
(19) Target: AnnoWiki 1701

(20) Processing: http://18dao.jamesqi.com/api.php
(20) API Call: http://18dao.jamesqi.com/api.php?action=query&meta=siteinfo&siprop=general&format=json
[Errno 104] Connection reset by peer

(21) Processing: http://193.132.104.136/api.php
(21) API Call: http://193.132.104.136/api.php?action=query&meta=siteinfo&siprop=general&format=json
(21) Target: Your Archives

(22) Processing: http://194.204.30.253/api.php
(22) API Call: http://194.204.30.253/api.php?action=query&meta=siteinfo&siprop=general&format=json
(22) Target: ZWiki

(23) Processing: http://195.130.120.154/wiki/api.php
(23) API Call: http://195.130.120.154/wiki/api.php?action=query&meta=siteinfo&siprop=general&format=json
(23) Target: EconWiki

(24) Processing: http://2007.newformsfestival.com/artcamp/api.php
(24) API Call: http://2007.newformsfestival.com/artcamp/api.php?action=query&meta=siteinfo&siprop=general&format=json
'query'

(25) Processing: http://2008.igem.org/wiki/api.php
(25) API Call: http://2008.igem.org/wiki/api.php?action=query&meta=siteinfo&siprop=general&format=json
(25) Target: 2008.igem.org

(26) Processing: http://2009.igem.org/wiki/api.php
(26) API Call: http://2009.igem.org/wiki/api.php?action=query&meta=siteinfo&siprop=general&format=json
(26) Target: 2009.igem.org

(27) Processing: http://208.100.59.10/wiki/api.php
(27) API Call: http://208.100.59.10/wiki/api.php?action=query&meta=siteinfo&siprop=general&format=json
(27) Target: ISFDB

(28) Processing: http://208.109.125.217/WOEkipedia/api.php
(28) API Call: http://208.109.125.217/WOEkipedia/api.php?action=query&meta=siteinfo&siprop=general&format=json

(29) Processing: http://209.197.90.23/api.php
(29) API Call: http://209.197.90.23/api.php?action=query&meta=siteinfo&siprop=general&format=json
(29) Target: FreeOrionWiki

(30) Processing: http://210.115.53.45/api.php
(30) API Call: http://210.115.53.45/api.php?action=query&meta=siteinfo&siprop=general&format=json
(30) Target: Geochemistry

Processed: 31 Success: 20  Fail: 10
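
The pattern visible in this output (build a siteinfo query from the API URL, then read the wiki name from the reply) might look something like the sketch below. It is not the actual import-pavlo.py: the function names are invented, and the guess that the bare 'query' lines come from a reply lacking a 'query' key is an assumption.

```python
import json

# Query string copied from the API Call lines in the debug output.
SITEINFO_PARAMS = 'action=query&meta=siteinfo&siprop=general&format=json'


def siteinfo_url(api_url):
    """Build the siteinfo call that appears in the debug output."""
    return api_url + '?' + SITEINFO_PARAMS


def sitename_from_reply(body):
    """Return the wiki's name from a JSON reply, or None if it is unusable."""
    try:
        data = json.loads(body)
        return data['query']['general']['sitename']
    except (ValueError, KeyError, TypeError):
        # Not JSON, or no 'query'/'general'/'sitename' in the reply.
        return None


print(siteinfo_url('http://129.2.15.45/wiki/api.php'))
reply = '{"query": {"general": {"sitename": "CARMA"}}}'
print(sitename_from_reply(reply))            # CARMA
print(sitename_from_reply('{"error": {}}'))  # None
```

Returning None instead of raising would let the caller count the site as a failure and move on to the next line of the file.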