Talk:Yellpedia

Site Statistics are completely wrong
To whomever,

I love the concept of this site and in general think it's a great thing, but sadly your statistics about my site, yellpedia.com, are completely wrong. The issue is the way the MediaWiki software does page counts. It's a fault of the design of the magic word, which only counts a page if it contains at least one wiki link or is assigned to at least one category. Normally, for wikis that grow organically, this is a perfectly justified method, but we at yellpedia.com uploaded, via a special script, the entire contents of the United States Yellow Pages. This created over 10 million pages, not the 1400 our current statistics show us as having. Part of the problem is that when we created all those pages we did not record them in the category tables in MySQL -- that normally happens when someone edits and saves a page for the first time. I'm not sure how this can be corrected on this site, or if it could be, since this data is collected via automatic bots, so we're leaving this note. Chris Tharp (talk) 03:18, 9 April 2013 (UTC)


 * Hmm, indeed Bumble Bee does collect articles, but it also collects pages. When I request statistics via the API I get . WikiApiary collects and graphs both   and  . I am aware of the article count stuff, but I don't see how that would alter the page count that is returned via the API. This is actually why the page line is highlighted stronger than the articles line. You can see some discussion about this on my talk page. With all that said, I'm not sure at all why the page count from your statistics endpoint is returning a lower number. Perhaps others might know? 🐝 thingles (talk) 20:21, 9 April 2013 (UTC)


 * Thingles, I wish I had a clue why the page count in the statistics is so far off, but so far I've been unable to find an answer. Since it's not my top concern, I decided in the end not to worry about it. I'm sure the problem lies with the fact that we "cheated" when we added all our data and didn't record everything to every table. These days if anyone questions my statements on the size of Yellpedia I just tell them to do a search by State -- entering State Ca, for example, returns 1,180,695 page results. (Not implying you're questioning my statements, but some have.) All the best with your project here. Chris Tharp (talk) 21:16, 9 April 2013 (UTC)


 * It would be interesting to see what Semantic usage would report (see the note below this). I'm curious: by "cheated", do you mean you inserted content directly into MediaWiki's database when you uploaded via your script? If so, I'm guessing it's exactly as you are thinking, that some internal counter is not getting updated to reflect the right page count. For what it's worth, I would suggest doing that via the MediaWiki edit API instead. That would ensure you are protected from internal database changes and that all internal references are handled properly. 🐝 thingles (talk) 21:54, 9 April 2013 (UTC)


 * Just thinking about this further, that is likely also why your edit count is too low. Each one of those new pages should be an edit. 🐝 thingles (talk) 21:56, 9 April 2013 (UTC)


 * Yes, we directly inserted content into the MediaWiki database, which may or may not have been the ideal way to do it, but it's done now. Adding data via the method you suggested is most likely a better way to go, and we will probably use it in the future. The thing I'm curious about is the speed of the API -- as I recall we were adding something like 120,000 pages an hour when we uploaded our data. If all goes well we are going to face this problem again, since we hope to expand beyond just the directory listings of the United States. Chris Tharp (talk) 00:07, 10 April 2013 (UTC)


 * You should be able to correct this by setting $wgArticleCountMethod = 'any'; and then running updateArticleCount.php. For a site with as many pages as yours, though, it may take some time to run. --Ete (talk) 20:59, 14 April 2013 (UTC)
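
As a rough illustration of the edit-API route suggested in this thread, the sketch below builds the POST parameters for a single `action=edit` request and estimates how long a bulk import would take at a steady rate. The helper names, the sample title, and the placeholder token are hypothetical; a real script would first fetch an edit token and then POST these parameters to the wiki's api.php, which is left out here. The 120,000 pages per hour figure is the direct-insert rate quoted above, not a measured API rate, so the estimate is probably optimistic for the API.

```python
def build_edit_params(title, text, token, summary="Bulk import"):
    """Return POST parameters for one MediaWiki action=edit request."""
    return {
        "action": "edit",
        "format": "json",
        "title": title,
        "text": text,
        "summary": summary,
        "createonly": "1",  # fail rather than overwrite an existing page
        "token": token,     # placeholder; a real edit token must come from the API
    }


def estimated_hours(total_pages, pages_per_hour):
    """Hours needed to create total_pages at a steady rate."""
    return total_pages / pages_per_hour


# Hypothetical page title and text, for illustration only.
params = build_edit_params("Example Business (hypothetical)", "Listing text", "TOKEN")

# 10 million pages at the rate quoted in the thread.
hours = estimated_hours(10_000_000, 120_000)
print(params["action"], round(hours, 1))
```

Going through the edit API means each page creation is recorded as an edit and the internal tables are updated by MediaWiki itself, which is the point made earlier in the thread.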

Semantic usage?
I see that Yellpedia uses Semantic MediaWiki. You may find it useful to enable collection of semantic usage information as well. I would also be curious to see whether that returns radically different statistics than siteinfo, per the issue above. 🐝 thingles (talk) 20:22, 9 April 2013 (UTC)
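
For anyone who wants to compare numbers by hand, here is a minimal sketch of how the siteinfo statistics can be requested and read. It assumes the standard MediaWiki `action=query&meta=siteinfo&siprop=statistics` call; the `api_base` address and the sample response values are made up for illustration and are not Yellpedia's actual data.

```python
import json
from urllib.parse import urlencode


def statistics_url(api_base):
    """Build the siteinfo statistics query URL for a wiki's api.php."""
    query = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return api_base + "?" + urlencode(query)


def page_and_article_counts(response_text):
    """Extract the (pages, articles) counts from a siteinfo JSON response."""
    stats = json.loads(response_text)["query"]["statistics"]
    return stats["pages"], stats["articles"]


# Hypothetical endpoint and invented sample response, for illustration only.
url = statistics_url("https://example.org/w/api.php")
sample = '{"query": {"statistics": {"pages": 2000, "articles": 1500, "edits": 5000}}}'
pages, articles = page_and_article_counts(sample)
print(url)
print(pages, articles)
```

The `pages` value covers every page in the wiki, while `articles` depends on the configured article count method, which is why the two can differ.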