Here is a detailed statistical analysis of Indian language Wikipedias for the month of 2010 January.
The PDF of this analysis is available at http://shijualexonline.googlepages.com/2010_01_january_en.pdf. Please use the PDF when referring to the tables, as the tables in this blog post are quite long.
The data for this report is taken from the statistical analysis of all the Wikimedia wikis prepared and maintained by Erik Zachte (website: http://infodisiac.com/). Special thanks to Erik for all the support he extended while I was compiling this report. The data is collected on the last day of every month. That is, the data for 2010 January was collected on 2010 January 31 at 23:59 GMT.
The statistical analysis of the following Indian language Wikipedias is included in this blog post.
Assamese (http://as.wikipedia.org)
Bengali (http://bn.wikipedia.org)
Bhojpuri (http://bh.wikipedia.org)
Bishnupriya Manipuri (http://bpy.wikipedia.org)
Gujarathi (http://gu.wikipedia.org)
Hindi (http://hi.wikipedia.org)
Kannada (http://kn.wikipedia.org)
Kashmiri (http://ks.wikipedia.org)
Malayalam (http://ml.wikipedia.org)
Marathi (http://mr.wikipedia.org)
Odia (Oriya) (http://or.wikipedia.org)
Pali (http://pi.wikipedia.org)
Punjabi (http://pa.wikipedia.org)
Sanskrit (http://sa.wikipedia.org)
Sindhi (http://sd.wikipedia.org)
Tamil (http://ta.wikipedia.org)
Telugu (http://te.wikipedia.org)
Urdu (http://ur.wikipedia.org)
I have also included the statistical analysis of some other language Wikipedias from the Indian subcontinent, even though these languages are not spoken in India. I am very much interested in the wiki activity of these languages.
Burmese (http://my.wikipedia.org)
Nepal Bhasha/Newari (http://new.wikipedia.org)
Nepali (http://ne.wikipedia.org)
Sinhala (http://si.wikipedia.org)
I know that there is little meaning in comparing an inactive wiki with few articles, like Assamese or Oriya, to highly active Wikipedias like Hindi or Telugu. But for this month, let me treat all of them together. From next month onwards I would like to treat them as two separate groups.
I hope this initiative will improve the interaction between different Indian Language Wikipedias/wikipedians. We (Malayalam Wikipedians - http://ml.wikipedia.org) have been maintaining a similar comparison study of the major Indian Language wikipedias for the past two years. This analysis has helped us to understand the status of Malayalam Wikipedia as compared to other Indian Language Wikipedias. I hope this analysis will also help other Indian language wikipedias.
Please feel free to add your suggestions/analysis as comments to this post. I have divided this report into two sections.
Statistical analysis of Wikipedias
Localization status of Mediawiki software
Following are the different topics covered under each section.
Wikipedia Statistics
Article statistics
Number of Articles
Number of Edits
Break up of edits (2009 February – 2010 January)
Edits per article
Number of new articles/day
Average size of an article (bytes)
Database size (in Mega Bytes)
Percentage of articles with size greater than 500 bytes
Percentage of articles with size greater than 2000 bytes (2 kilobytes)
User Statistics
MediaWiki Statistics
Localization statistics
Number of Articles
Wikipedia Language | 2009 November | 2009 December | 2010 January |
Assamese | 261 | 261 | 263 |
Bengali | 20,754 | 20,918 | 21,016 |
Bhojpuri | 2,480 | 2,481 | 2,481 |
Bishnupriya Manipuri | 23,424 | 24,733 | 24,738 |
Gujarathi | 11,255 | 11,904 | 12,579 |
Hindi | 52,144 | 52,645 | 53,216 |
Kannada | 7,596 | 7,741 | 7,846 |
Kashmiri | | | 375 |
Malayalam | 11,459 | 11,635 | 11,871 |
Marathi | 25,737 | 26,034 | 26,544 |
Odia (Oriya) | 553 | 553 | 553 |
Pali | | | 2,316 |
Punjabi | 1,490 | 1,492 | 1,505 |
Sanskrit | 3,883 | 3,887 | 3,914 |
Sindhi | | | 349 |
Tamil | 20,095 | 20,472 | 20,959 |
Telugu | 44,098 | 44,238 | 44,333 |
Urdu | | | 12,547 |
| | | |
Burmese | | | 2,938 |
Nepal Bhasha/Newari | | | 61,487 |
Nepali | | | 3,079 |
Sinhala | | | 2,153 |
Hindi Wikipedia continues to have the highest number of articles among all the Indian language Wikipedias. Telugu Wikipedia is in second place. The number of articles in Gujarathi Wikipedia has almost doubled during the past 6 months. Among the active Wikipedias, Kannada and Malayalam are growing very slowly (in terms of article count) compared to other active Indian language Wikipedias.
When you go through the different topics in this blog post, you will see that some of the big Wikipedias do not have a number of edits that corresponds to their number of articles. Also, the number of stub articles in some big wikis is very high.
Let us hope the stub articles will get more content as more active contributors arrive.
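The growth figures can be checked directly against the table above. The following is a minimal Python sketch, using the counts from the 2009 November and 2010 January columns; the function name `growth_percent` and the selection of wikis are only illustrative choices, not part of Erik Zachte's statistics.

```python
# Article counts copied from the table above: (2009 November, 2010 January).
article_counts = {
    "Gujarathi": (11255, 12579),
    "Hindi": (52144, 53216),
    "Kannada": (7596, 7846),
    "Malayalam": (11459, 11871),
    "Marathi": (25737, 26544),
    "Tamil": (20095, 20959),
    "Telugu": (44098, 44333),
}


def growth_percent(start: int, end: int) -> float:
    """Percentage change in article count from start to end."""
    return (end - start) / start * 100


# Two-month growth per wiki, rounded to one decimal place.
growth = {
    lang: round(growth_percent(start, end), 1)
    for lang, (start, end) in article_counts.items()
}

# List the wikis from fastest to slowest growth.
for lang, pct in sorted(growth.items(), key=lambda item: item[1], reverse=True):
    start, end = article_counts[lang]
    print(f"{lang}: +{end - start} articles ({pct}%)")
```

Printing both the absolute increase and the percentage is deliberate: a small wiki can add few articles and still grow quickly in relative terms.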
Number of Edits
Wikipedia Language | 2009 November | 2009 December | 2010 January |
Assamese | 8,926 | 9,134 | 9,290 |
Bengali | 5,51,486 | | 5,86,472 |
Bhojpuri | 52,553 | 53,203 | 54,099 |
Bishnupriya Manipuri | 4,18,566 | 4,29,198 | 4,36,153 |
Gujarathi | 63,578 | 67,769 | 72,492 |
Hindi | 5,51,162 | 5,67,029 | 5,81,447 |
Kannada | 1,22,964 | 1,26,504 | 1,29,848 |
Kashmiri | | | 13,075 |
Malayalam | 5,33,391 | 5,51,307 | 5,69,056 |
Marathi | 4,45,205 | 4,58,769 | 4,74,113 |
Odia (Oriya) | 19,805 | 20,052 | 20,321 |
Pali | | | 48,865 |
Punjabi | 16,980 | 17,426 | 18,176 |
Sanskrit | 67,151 | 68,557 | 70,132 |
Sindhi | | | |
Tamil | 4,59,441 | 4,71,678 | 4,83,481 |
Telugu | 4,69,481 | 4,76,825 | 4,83,390 |
Urdu | | | 2,70,868 |
Burmese | | | 30,503 |
Nepal Bhasha/Newari | | | 4,58,066 |
Nepali | | | 40,363 |
Sinhala | | | 74,493 |
More edits from multiple contributors enhance the quality of articles in a Wikipedia.
Hindi, Bengali, and Malayalam Wikipedias have the highest number of edits among the Indian language Wikipedias. But most of the big wikis do not have a number of edits that corresponds to their number of articles.
Break up of edits (2009 February – 2010 January)
Wikipedia Language | Bot edits (%) | User edits, registered and anonymous users (%) |
Assamese | 55 | 45 |
Bengali | 68 | 32 |
Bhojpuri | 92 | 8 |
Bishnupriya Manipuri | 96 | 4 |
Gujarathi | 37 | 63 |
Hindi | 49 | 51 |
Kannada | 53 | 47 |
Kashmiri | 83 | 17 |
Malayalam | 37 | 63 |
Marathi | 59 | 41 |
Odia (Oriya) | 92 | 8 |
Pali | 94 | 6 |
Punjabi | 55 | 45 |
Sanskrit | 82 | 18 |
Sindhi | 29 | 71 |
Tamil | 51 | 49 |
Telugu | 51 | 49 |
Urdu | 53 | 47 |
| | |
Burmese | 41 | 59 |
Nepal Bhasha/ Newari | 91 | 9 |
Nepali | 52 | 48 |
Sinhala | 12 | 88 |
You can use wiki bots to automate repetitive and boring tasks, for example adding interwiki links, fixing double redirects, and so on. But for some Wikipedias the majority of the contributions come from bots.
From the above table, you can conclude that for some of the big Wikipedias, the wiki activity is not due to human editors. Bot edits account for more than 80% of all edits in some cases. Some wikis are even using bots to create one-line articles just for the sake of increasing the article count. This approach needs to change.
We need to learn how to use bots effectively to increase the quality of wiki articles, rather than to create one-line articles. Using bots to create title-only articles will effectively block many future Wikipedians from wiki editing. Most people outside Wikipedia start editing by creating an article. By using bots to create thousands of one-line articles, we are losing many future Wikipedians.
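To see which wikis cross the 80% mark, the bot column of the table above can be filtered with a few lines of Python. This is only a sketch: it assumes the two columns are percentages (each row adds up to 100), and the name `THRESHOLD` is introduced here for illustration.

```python
# Bot edit share copied from the table above (2009 February to 2010 January).
# Assumption: the values are percentages of all edits in that period.
bot_edit_share = {
    "Assamese": 55, "Bengali": 68, "Bhojpuri": 92,
    "Bishnupriya Manipuri": 96, "Gujarathi": 37, "Hindi": 49,
    "Kannada": 53, "Kashmiri": 83, "Malayalam": 37, "Marathi": 59,
    "Odia (Oriya)": 92, "Pali": 94, "Punjabi": 55, "Sanskrit": 82,
    "Sindhi": 29, "Tamil": 51, "Telugu": 51, "Urdu": 53,
    "Burmese": 41, "Nepal Bhasha/Newari": 91, "Nepali": 52, "Sinhala": 12,
}

THRESHOLD = 80  # percent of edits made by bots

# Wikis where bots made more than THRESHOLD percent of the edits.
bot_dominated = sorted(
    lang for lang, share in bot_edit_share.items() if share > THRESHOLD
)

for lang in bot_dominated:
    print(f"{lang}: {bot_edit_share[lang]}% bot edits")
```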
Edits per article
Wikipedia language | 2009 November | 2009 December | 2010 January |
Assamese | 19.3 | 20.0 | 20.2 |
Bengali | 16.7 | 17.0 | 17.2 |
Bhojpuri | 5.5 | 11.7 | 18.7 |
Bishnupriya Manipuri | 9.3 | 9.1 | 9.3 |
Gujarathi | 4.3 | 4.4 | 4.5 |
Hindi | 7.0 | 7.2 | 7.3 |
Kannada | 12.8 | 12.9 | 13.1 |
Kashmiri | 28.3 | 28.8 | 29.3 |
Malayalam | 26.6 | 27.1 | 27.4 |
Marathi | 13.1 | 13.3 | 13.5 |
Odia (Oriya) | 21.8 | 22.1 | 22.5 |
Pali | 17.7 | 18.0 | 18.3 |
Punjabi | 8.0 | 8.3 | 8.6 |
Sanskrit | 14.7 | 15.0 | 15.2 |
Sindhi | 23.5 | 23.8 | 24.1 |
Tamil | 16.3 | 16.5 | 16.5 |
Telugu | 8.0 | 8.0 | 8.1 |
Urdu | 15.9 | 16.0 | 16.2 |
| | | |
Burmese | 6.3 | 6.4 | 6.6 |
Nepal Bhasha/Newari | 3.0 | 2.9 | 2.9 |
Nepali | 9.7 | 10.0 | 9.9 |
Sinhala | 23.2 | 21.4 | 21.8 |
The number of edits to an article is an indicator of article quality and wiki activism. It shows how much attention an article gets. More edits to an article by multiple contributors indicate that its quality and neutrality are better than those of an article with only a few edits by one or two editors. (You can always debate this, but it is true in most cases.) Among the active big Wikipedias, Malayalam Wikipedia tops the list with an average of about 27 edits per article (27.4 in 2010 January).
Number of new articles/day
Wikipedia Language | 2010 January |
Assamese | Not Available |
Bengali | 5 |
Bhojpuri | Not Available |
Bishnupriya Manipuri | Not Available |
Gujarathi | Not Available |
Hindi | 18 |
Kannada | Not Available |
Kashmiri | Not Available |
Malayalam | 8 |
Marathi | 17 |
Odia (Oriya) | Not Available |
Pali | Not Available |
Punjabi | Not Available |
Sanskrit | Not Available |
Sindhi | Not Available |
Tamil | 16 |
Telugu | 3 |
Urdu | 5 |
| |
Burmese | 3 |
Nepal Bhasha/ Newari | 45 |
Nepali | Not Available |
Sinhala | 3 |
The above table shows the average number of articles created in a wiki per day.
Increasing the number of articles in a Wikipedia is very important. Only then will the Wikipedia get a large user base (readers). Hindi, Tamil, and Marathi Wikipedias top this list among the Indian language Wikipedias. I hope all the Indian Wikipedia communities will make sure these “new articles” have sufficient content, so that a wiki user does not need to navigate away to English Wikipedia or Google to get information about a topic.
A large number of articles with good encyclopaedic content will attract more potential editors to the Wikipedia. So please ensure that you add at least the basic facts and figures to all newly created articles.
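As a rough cross-check, a per-day figure can be estimated from the article counts in the "Number of Articles" table. The Python sketch below divides the net change between 2009 December and 2010 January by the 31 days of January. This is an assumption about how such a figure could be derived, not a description of how Erik Zachte's statistics compute it, so the results need not match the table above exactly.

```python
DAYS_IN_JANUARY = 31

# Article counts copied from the "Number of Articles" table:
# (2009 December, 2010 January).
article_counts = {
    "Hindi": (52645, 53216),
    "Marathi": (26034, 26544),
    "Tamil": (20472, 20959),
    "Malayalam": (11635, 11871),
}


def net_new_per_day(start: int, end: int, days: int) -> float:
    """Net articles added per day; deleted articles reduce this figure."""
    return (end - start) / days


for lang, (december, january) in article_counts.items():
    rate = net_new_per_day(december, january, DAYS_IN_JANUARY)
    print(f"{lang}: about {rate:.1f} net new articles per day")
```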
Average size of an article (bytes)
Wikipedia language | 2009 November | 2009 December | 2010 January |
Assamese | 2506 | 2506 | 1492 |
Bengali | 1342 | 1383 | 1407 |
Bhojpuri | 118 | 119 | 119 |
Bishnupriya Manipuri | 1084 | 1086 | 1090 |
Gujarathi | 1056 | 1098 | 1099 |
Hindi | 1182 | 1235 | 1275 |
Kannada | 1923 | 2526 | 2806 |
Kashmiri | 424 | 422 | 420 |
Malayalam | 2690 | 2725 | 2740 |
Marathi | 768 | 777 | 800 |
Odia (Oriya) | 236 | 236 | 236 |
Pali | 141 | 141 | 141 |
Punjabi | 741 | 740 | 759 |
Sanskrit | 184 | 187 | 197 |
Sindhi | 4092 | 4080 | 4070 |
Tamil | 2118 | 2441 | 2574 |
Telugu | 832 | 883 | 915 |
Urdu | 1535 | 1554 | 1550 |
| | | |
Burmese | 2986 | 3037 | 3033 |
Nepal Bhasha/ Newari | 707 | 805 | 882 |
Nepali | 1256 | 1282 | 1259 |
Sinhala | 5892 | 5430 | 5452 |
A Wikipedia article benefits a user only when it has sufficient content. But many of the big Wikipedias have little content in most of their articles.
It is very encouraging to see the long articles in Sinhala Wikipedia. Sinhala Wikipedia has more than 2000 articles now. Many of the Indian language Wikipedias should learn from Sinhala Wikipedia the art of adding more content to existing articles.
Database size (in Mega Bytes)
Wikipedia language | 2009 November | 2009 December | 2010 January |
Assamese | 1.5 | 1.5 | 1.5 |
Bengali | 81 | 84 | 86 |
Bhojpuri | 4.8 | 4.8 | 4.8 |
Bishnupriya Manipuri | 65 | 65 | 65 |
Gujarathi | 32 | 35 | 37 |
Hindi | 165 | 174 | 181 |
Kannada | 42 | 53 | 59 |
Kashmiri | 0.77 | 0.77 | 0.78 |
Malayalam | 88 | 90 | 93 |
Marathi | 63 | 64 | 67 |
Odia (Oriya) | 1.2 | 1.2 | 1.2 |
Pali | 4.7 | 4.7 | 4.7 |
Punjabi | 4 | 4 | 4.1 |
Sanskrit | 6.6 | 6.6 | 6.8 |
Sindhi | 2.8 | 2.8 | 2.9 |
Tamil | 119 | 138 | 148 |
Telugu | 97 | 103 | 107 |
Urdu | 40 | 42 | 42 |
| | | |
Burmese | 24 | 26 | 26 |
Nepal Bhasha/Newari | 107 | 128 | 144 |
Nepali | 9.2 | 9.5 | 9.8 |
Sinhala | 34 | 39 | 40 |
This is the total size of a Wikipedia. No wonder Hindi Wikipedia tops the list, as it is also the biggest wiki, with the highest number of articles.
It is very encouraging to see that Tamil Wikipedia, with about half as many articles as Hindi Wikipedia, has a database size in almost the same range as Hindi Wikipedia. This shows that even though Tamil Wikipedia has fewer articles than Hindi or Telugu, it has more content in its articles.
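The comparison between Tamil, Hindi, and Telugu can be made concrete by dividing the database size by the number of articles. The Python sketch below uses the 2010 January figures from the two tables and assumes 1 MB = 1000 KB. The result is only a rough ratio: the database size is not necessarily the same thing as the readable article text, so it should not be confused with the "Average size of an article" table.

```python
# 2010 January figures copied from the tables above:
# (database size in MB, number of articles).
wiki_sizes = {
    "Hindi": (181, 53216),
    "Tamil": (148, 20959),
    "Telugu": (107, 44333),
}

KB_PER_MB = 1000  # assumption: decimal megabytes


def kb_per_article(size_mb: float, articles: int) -> float:
    """Database size divided by article count, in kilobytes."""
    return size_mb * KB_PER_MB / articles


database_kb_per_article = {
    lang: kb_per_article(size_mb, articles)
    for lang, (size_mb, articles) in wiki_sizes.items()
}

for lang, value in database_kb_per_article.items():
    print(f"{lang}: about {value:.1f} KB of database per article")
```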
Percentage of articles with size greater than 500 bytes
Wikipedia language | 2009 November | 2009 December | 2010 January |
Assamese | 41 | 41 | 41 |
Bengali | 56 | 57 | 57 |
Bhojpuri | 2 | 2 | 2 |
Bishnupriya Manipuri | 85 | 86 | 86 |
Gujarathi | 19 | 19 | 20 |
Hindi | 42 | 43 | 43 |
Kannada | 54 | 55 | 55 |
Kashmiri | 12 | 12 | 12 |
Malayalam | 84 | 84 | 84 |
Marathi | 26 | 26 | 27 |
Odia (Oriya) | 2 | 2 | 2 |
Pali | 1 | 1 | 1 |
Punjabi | 16 | 16 | 16 |
Sanskrit | 4 | 5 | 5 |
Sindhi | 61 | 60 | 60 |
Tamil | 81 | 82 | 82 |
Telugu | 22 | 22 | 22 |
Urdu | 55 | 55 | 55 |
| | | |
Burmese | 67 | 67 | 67 |
Nepal Bhasha/Newari | 60 | 62 | 63 |
Nepali | 55 | 56 | 54 |
Sinhala | 78 | 81 | 81 |
The above table shows the percentage of articles larger than 500 bytes.
More than 80 percent of the articles in Tamil, Bishnupriya Manipuri, and Malayalam Wikipedias are larger than 500 bytes. For some big Wikipedias, more than 50% of the articles are smaller than 500 bytes, which suggests that many of those articles are not much help to the reader. The reader still needs to depend on English Wikipedia or Google to get the required information. More effort is required from the respective wiki communities to make each Wikipedia useful to its language community.
Percentage of articles with size greater than 2000 bytes (2 kilobytes)
Wikipedia language | 2009 November | 2009 December | 2010 January |
Assamese | 22 | 22 | 22 |
Bengali | 14 | 14 | 15 |
Bhojpuri | 1 | 1 | 1 |
Bishnupriya Manipuri | 1 | 1 | 1 |
Gujarathi | 5 | 5 | 5 |
Hindi | 9 | 10 | 10 |
Kannada | 15 | 16 | 17 |
Kashmiri | 5 | 5 | 5 |
Malayalam | 34 | 35 | 35 |
Marathi | 6 | 7 | 7 |
Odia (Oriya) | 1 | 1 | 1 |
Pali | 0 | 0 | 0 |
Punjabi | 8 | 8 | 8 |
Sanskrit | 1 | 1 | 1 |
Sindhi | 33 | 33 | 33 |
Tamil | 24 | 25 | 25 |
Telugu | 8 | 8 | 8 |
Urdu | 17 | 17 | 17 |
| | | |
Burmese | 38 | 39 | 39 |
Nepal Bhasha/Newari | 2 | 7 | 11 |
Nepali | 10 | 10 | 10 |
Sinhala | 53 | 53 | 53 |
The above table shows the percentage of articles larger than 2000 bytes. Among the active Wikipedias, Malayalam and Tamil top the list. You can make your own analysis by comparing this table with the previous one.
Number of active wikipedians
Wikipedia language | 2009 November | 2009 December | 2010 January |
Assamese | 1 | 1 | 1 |
Bengali | 25 | 35 | 32 |
Bhojpuri | 1 | 1 | 1 |
Bishnupriya Manipuri | 6 | 6 | 4 |
Gujarathi | 7 | 9 | 8 |
Hindi | 50 | 62 | 51 |
Kannada | 24 | 22 | 22 |
Kashmiri | 0 | 0 | 1 |
Malayalam | 56 | 50 | 65 |
Marathi | 22 | 25 | 36 |
Odia (Oriya) | 0 | 0 | 0 |
Pali | 0 | 1 | 0 |
Punjabi | 1 | 1 | 4 |
Sanskrit | 4 | 5 | 6 |
Sindhi | 1 | 0 | 2 |
Tamil | 45 | 55 | 53 |
Telugu | 38 | 34 | 26 |
Urdu | 24 | 20 | 20 |
| | | |
Burmese | 1 | 1 | 4 |
Nepal Bhasha/Newari | 3 | 3 | 3 |
Nepali | 8 | 5 | 5 |
Sinhala | 45 | 37 | 7 |
The active Wikipedians are the strength of a Wikipedia. They define its quality. For that, a Wikipedia should have a number of active Wikipedians that corresponds to its total number of articles. Only then will edits happen across a larger number of articles. Hindi, Tamil, and Malayalam top the list with the most active Wikipedians.
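One way to relate active Wikipedians to the article count is to compute how many articles there are per active Wikipedian. The Python sketch below uses the 2010 January figures from the tables above. The ratio itself and the function name `articles_per_active_editor` are illustrative assumptions; the source statistics do not publish this number.

```python
from typing import Optional

# 2010 January figures copied from the tables above:
# (number of articles, number of active Wikipedians).
wikis = {
    "Bengali": (21016, 32),
    "Hindi": (53216, 51),
    "Malayalam": (11871, 65),
    "Odia (Oriya)": (553, 0),
    "Tamil": (20959, 53),
    "Telugu": (44333, 26),
}


def articles_per_active_editor(articles: int, editors: int) -> Optional[float]:
    """Articles per active Wikipedian, or None when nobody is active."""
    if editors == 0:
        return None
    return articles / editors


for lang, (articles, editors) in wikis.items():
    ratio = articles_per_active_editor(articles, editors)
    if ratio is None:
        print(f"{lang}: no active Wikipedians this month")
    else:
        print(f"{lang}: about {ratio:.0f} articles per active Wikipedian")
```

A lower number means each article has a better chance of getting attention from a human editor.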
Page views per month (All figures in Lakh page views/month)
Wikipedia language | 2009 November | 2009 December | 2010 January |
Assamese | 0.87 | 0.93 | 0.86 |
Bengali | 22 | 28 | 24 |
Bhojpuri | 0.09 | 0.09 | 0.11 |
Bishnupriya Manipuri | 13 | 15 | 14 |
Gujarathi | 4.6 | 5.4 | 4.9 |
Hindi | 41 | 49 | 41 |
Kannada | 8.08 | 9.16 | 7.72 |
Kashmiri | 0.52 | 0.57 | 0.51 |
Malayalam | 27 | 28 | 26 |
Marathi | 23 | 28 | 24 |
Odia (Oriya) | 0.41 | 0.42 | 0.40 |
Pali | 0.85 | 0.83 | 0.82 |
Punjabi | 1.24 | 1.28 | 1.33 |
Sanskrit | 2.00 | 2.11 | 2.17 |
Sindhi | 0.57 | 0.61 | 0.54 |
Tamil | 24 | 26 | 24 |
Telugu | 37 | 41 | 33 |
Urdu | 11 | 10 | 10 |
| | | |
Burmese | 1.58 | 1.60 | 1.77 |
Nepal Bhasha/Newari | 13 | 14 | 15 |
Nepali | 1.44 | 1.47 | 1.55 |
Sinhala | 2.82 | 3.02 | 3.50 |
The number of page views represents how many times readers and wiki editors have opened the wiki pages. This parameter is somewhat related to the number of articles in a wiki. If there are more articles, there will also be more visitors. Many of today's readers can be future wiki editors. This is one of the main reasons to increase the number of articles (with quality content, of course).
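For readers not used to Indian numbering: the page view table is in lakh (1 lakh = 100,000), and the edit counts earlier in this post use Indian digit grouping, such as 5,51,486. The Python sketch below converts both to plain integers; the helper names `lakh_to_count` and `parse_grouped` are introduced here only for illustration.

```python
LAKH = 100_000  # 1 lakh = 100,000


def lakh_to_count(lakh_value: float) -> int:
    """Convert a figure given in lakh to a plain count."""
    return round(lakh_value * LAKH)


def parse_grouped(number_text: str) -> int:
    """Parse a number written with digit-grouping commas, e.g. '5,51,486'."""
    return int(number_text.replace(",", ""))


# Page views for 2010 January, in lakh, copied from the table above.
page_views_lakh = {"Hindi": 41, "Telugu": 33, "Kannada": 7.72, "Bhojpuri": 0.11}

for lang, lakh_value in page_views_lakh.items():
    print(f"{lang}: {lakh_to_count(lakh_value):,} page views")

# Edit count for Hindi, 2009 November, as written in the edits table.
print(parse_grouped("5,51,162"))
```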
MediaWiki localization status (percentage)
Language | Most often used messages | MediaWiki messages | Extensions used by Wikimedia | All extensions |
Assamese | 98.08 | 43.83 | 1.86 | 1.61 |
Bengali | 100.00 | 82.36 | 46.09 | 22.25 |
Bhojpuri | 0.21 | 0.08 | 0.00 | 0.00 |
Bishnupriya Manipuri | 100.00 | 52.51 | 0.11 | 0.30 |
Gujarathi | 100.00 | 40.79 | 5.91 | 6.59 |
Hindi | 99.36 | 97.22 | 29.43 | 26.44 |
Kannada | 100.00 | 59.63 | 3.55 | 3.21 |
Kashmiri | | | | |
Malayalam | 100.00 | 97.90 | 98.00 | 51.77 |
Marathi | 98.72 | 75.88 | 26.19 | 37.13 |
Odia (Oriya) | 4.48 | 1.39 | 0.25 | 0.30 |
Pali | 0.21 | 0.08 | 0.00 | 0.00 |
Punjabi | 56.08 | 30.26 | 0.42 | 0.42 |
Sanskrit | 97.65 | 27.22 | 0.00 | 0.34 |
Sindhi | 73.13 | 24.91 | 0.11 | 0.07 |
Tamil | 92.32 | 74.71 | 1.02 | 1.77 |
Telugu | 100.00 | 100.00 | 65.41 | 52.57 |
Urdu | 71.64 | 38.77 | 1.75 | 1.12 |
| | | | |
Burmese | 29.00 | 10.45 | 0.07 | 0.02 |
Nepal Bhasha/Newari | 32.84 | 12.35 | 0.04 | 0.01 |
Nepali | 96.59 | 68.77 | 0.98 | 0.90 |
Sinhala | 100.00 | 100.00 | 28.59 | 20.06 |
GerardM, a translatewiki administrator, has been passing the message below to most of the Indian language Wikipedias. He has also sent mails about this to the wikimediaindia mailing list a couple of times.
We expect that with the implementation of Localisation update the usability of MediaWiki for your language will improve. We are now ready to look at other aspects of usability for your language as well. There are two questions we would like you to answer: Are there issues with the new functionality of the Usability Initiative? Does MediaWiki support your language properly? The best way to answer the first question is to visit the translatewiki.net...
Localization of the wiki software is very important when we are trying to reach prospective Wikipedians in any language. Within a language community, a website with its interface and all system messages in the local language has an edge over a website with English-only content. This is where localization of the MediaWiki software comes in. We use translatewiki for coordinating the localization efforts of all the languages. Two administrators of translatewiki, Siebrand and GerardM, are available on this list as well.
Some of the Malayalam Wikipedians (including me) understood the importance of localizing the MediaWiki messages into Malayalam long ago (more than 2 years ago). From the above table you can see that Malayalam is at the forefront in localizing MediaWiki and other related system messages. Nowadays Malayalam Wikipedian Praveen Prakash is coordinating the localization of MediaWiki messages for Malayalam.
I request the respective communities to give top priority to localizing the MediaWiki messages into your language. When you do so, you are localizing the interface and system messages into your language. Apart from helping the wiki projects of your language, you are also helping native users to use the MediaWiki software with native language support.
Nepali is spoken in India, see the Wikipedia article of Nepali.