Monday, August 8, 2011

Wikimania 2011 : Are Internet Sources Reliable?



Image source: Wikimedia


On August 4-6 I participated in Wikimania 2011 in Haifa! The participants were Wikipedians from many countries, including Wikipedia's founder Jimmy Wales and all of the Wikimedia Foundation's directors except one.
The organizing team, headed by Tomer Ashur, Wikimedia Israel's chairman, and Deror Lin, Wikimania's General Manager, performed exceptionally well in both the administrative and the professional aspects. 


Wikipedia is evolving and changing. It is not exactly the same community it was when I described it in one of my Web 2.0 for Dummies posts in 2008. However, the challenges described a few years ago in my post titled Wikipedia: The Good, the Bad and the Ugly did not vanish. 


Wikipedia case
The evolution and challenges of Wikipedia and its community are very similar to the challenges facing other Web 2.0 projects and communities, including Social Networking Services. However, there are also unique aspects.  


Not All Internet Sources Were Created Equal
There is a growing tendency in Wikipedia to ask for sources supporting facts, or so-called facts, mentioned in Wikipedia articles. 


If no supporting sources are cited, a big box stating that sources are missing is inserted under the article's title. The rationale is to avoid wrong information being inserted, whether deliberately or by mistake.


There are cultural differences between Wikipedia's language editions. The English Wikipedia is the most demanding in terms of supporting evidence (sources). 


I doubt whether relying on Web sources is good enough. 
My doubts are based on a well-known fact about Web sources: the reliability of the immense amount of information on the Web varies greatly. Some sources are trustworthy; others are just nonsense.


So what is the Value Proposition of five or ten or fifteen sources if none of them is reliable?


What is the Added Value of a reliable but very superficial source, such as some of the articles published in Electronic Newspapers?  


The real problem is that in any Web-related task the user has to sort the sources and assign a level of reliability to each of them. It is not an easy task. 
Many people are not able to perform this task properly if the subject matter is outside their expertise.  


This observation is valid for Wikipedians as well as for non-Wikipedians. 


To illustrate this issue, I wrote a new article in the Hebrew Wikipedia on an important Information Technology subject.


I was able to write a good article without using any source.


How was it possible?
My professional experience includes actual selections and implementations of technical products addressing this subject.


In order to do my job I read technical material and participated in Vendors' Presentations. I also read Analysts' Research Notes.
However, that was a few years ago, and I no longer had access to the White Papers and Research Notes I had read.


Finally, I decided to add only one Web-based source. It was an article written in English by a University Professor considered a leading expert, or Guru, in this area. 
A few years ago I read an impressive book he wrote on the same subject.


I also read the article on the same subject in the English Wikipedia. It includes about ten sources; however, none of them was a valid source (as Wikipedia articles are collaborative work, it may by now be a higher-quality article including valid sources).


I read all the sources cited in the English Wikipedia article and decided that none of them was good enough for citation.    


The Bottom Line: the English Wikipedia has a mediocre article based on multiple non-trusted sources, edited by someone with no expertise in the subject and no ability to assess the sources' reliability.
The Hebrew Wikipedia has a much better article based on expertise, experience, and a single trusted source.


Is there a better way?
I am sure that it is possible to partially formalize an algorithm for assigning a Reliability score to Web sources.


In my opinion, it is also possible to automate the algorithm with dedicated software.
The automation will probably be partial, and the heuristic algorithm will surely not be 100% precise.
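To make the idea concrete, here is a minimal sketch of such a heuristic, assuming a few illustrative signals (domain type, named authorship, citation count, age). The domain lists, weights and signal names are my own assumptions for illustration, not a tested model.

```python
# Hypothetical heuristic reliability score for a Web source.
# All signals, weights and domain lists below are illustrative assumptions.
from urllib.parse import urlparse

ACADEMIC_SUFFIXES = (".edu", ".ac.il", ".ac.uk")          # assumed examples
KNOWN_PUBLISHERS = {"ieee.org", "acm.org", "nature.com"}  # assumed examples

def reliability_score(url: str, has_named_author: bool,
                      cited_by_count: int, age_years: float) -> float:
    """Return a rough score in [0, 1]; higher means more likely reliable."""
    host = urlparse(url).netloc.lower()
    score = 0.3  # neutral prior for an unknown source

    if host.endswith(ACADEMIC_SUFFIXES) or host in KNOWN_PUBLISHERS:
        score += 0.4                                  # established publisher
    if has_named_author:
        score += 0.1                                  # accountability signal
    score += min(cited_by_count, 20) / 20 * 0.2       # cited by other sources
    score -= min(age_years, 10) / 10 * 0.1            # stale material loses weight

    return max(0.0, min(1.0, score))

# Example: an academic white paper with a named author and a few citations.
print(reliability_score("https://example.edu/storage-whitepaper",
                        has_named_author=True, cited_by_count=5, age_years=3))
```

Such a score would only be a first filter; a human editor would still review borderline sources.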


Rating Academic sources is possible, and rating Search Results with the PageRank algorithm and other algorithms is also possible. So why not automate a similar task, such as rating sources for Wikipedia articles (and maybe generalize it to other contexts that require relying on Web sources)? 
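As a toy illustration of the PageRank analogy, one could run a PageRank-style iteration over a small citation graph of candidate sources, so that sources cited by other sources gain weight. The graph, damping factor and source names below are assumptions for illustration only.

```python
# Toy PageRank-style rating over a citation graph of candidate sources.
def rank_sources(links: dict, damping: float = 0.85,
                 iterations: int = 50) -> dict:
    """links maps each source to the list of sources it cites."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / n for node in nodes}
        for source in nodes:
            targets = links.get(source, [])
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A source that cites nothing spreads its weight evenly.
                for node in nodes:
                    new_rank[node] += damping * rank[source] / n
        rank = new_rank
    return rank

# Hypothetical example: three candidate sources citing one another.
print(rank_sources({"blogA": ["paperB"], "paperB": ["vendorC"], "vendorC": []}))
```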


I guess that, in addition, a Bayesian Probability algorithm for improving the rating of sources over time could be deployed as well.      
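One simple way to realize this, assuming a Beta-Bernoulli model (my assumption, not a specified design), is to update a source's estimated reliability each time its claims are later confirmed or refuted by editors:

```python
# Minimal Bayesian sketch (assumed Beta-Bernoulli model): the estimated
# reliability of a source is updated from editor confirmations/refutations.
def update_reliability(prior_alpha: float, prior_beta: float,
                       confirmations: int, refutations: int) -> float:
    """Posterior mean probability that the source is reliable, under a Beta prior."""
    alpha = prior_alpha + confirmations
    beta = prior_beta + refutations
    return alpha / (alpha + beta)

# Example: a weak prior (1, 1) updated after 8 confirmed and 2 refuted claims.
print(update_reliability(1.0, 1.0, confirmations=8, refutations=2))  # 0.75
```

The posterior could then feed back into the heuristic score sketched above, so that sources with a track record of confirmed claims are rated higher.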






