Google

Libraries and Google

Code. Photo: David de la Calle Cerezo

Not my usual fare for this blog, but there is an excellent issue of the journal "Library Philosophy and Practice" available. Nearly the whole issue is devoted to how Google can be combined with traditional library services in a way that benefits both (i.e., without the hit-and-miss nature of Google, whilst enhancing the work of librarians).

There is also a very enthusiastic article about the benefits of Open Source software and Open Access journals for libraries. Well worth a look.

The BBC writes:

One in 10 web pages scrutinised by search giant Google contained malicious code that could infect a user’s PC.

That may seem a worryingly high proportion! Fortunately it is also nonsense.

Looking at the actual paper, it seems that Google in fact analysed several billion pages and sifted them with a preliminary analysis run on MapReduce, Google's framework for large-scale parallel data processing.

“MapReduce processed all the crawled web pages for properties indicative of exploits”. As it says in the paper, “MapReduce allows us to prune several billion URLs into a few million”. This process left around 4.5 million pages that were likely candidates. Out of those, they found 450,000 pages that they were confident were correctly identified as malicious.
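To make the pruning step concrete, here is a minimal, single-machine sketch of the MapReduce pattern applied to this kind of filtering: a map step tags each page, a shuffle groups pages by tag, and a reduce step collects each group. The marker strings in `SUSPICIOUS_MARKERS` and the function names are hypothetical illustrations, not the heuristics or code Google actually used.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical stand-ins for "properties indicative of exploits".
# The paper's real heuristics are not reproduced here.
SUSPICIOUS_MARKERS = ("<iframe width=0", "eval(unescape(")


def map_page(url: str, html: str) -> Iterator[tuple[str, str]]:
    """Map step: emit a (label, url) pair for one crawled page."""
    lowered = html.lower()
    label = "candidate" if any(m in lowered for m in SUSPICIOUS_MARKERS) else "clean"
    yield label, url


def reduce_label(label: str, urls: list[str]) -> tuple[str, list[str]]:
    """Reduce step: collect every URL that shares a label."""
    return label, sorted(urls)


def run_mapreduce(pages: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Run map, shuffle (group by key) and reduce in one process."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, html in pages:
        for key, value in map_page(url, html):
            groups[key].append(value)  # shuffle: group values by key
    return dict(reduce_label(k, v) for k, v in groups.items())
```

Only the "candidate" bucket would go on to the expensive second stage, which is why billions of URLs shrink to a few million before any detailed checking happens.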

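So it is not 1 in 10 pages. The 1 in 10 figure is the proportion of the pre-filtered candidates (450,000 out of 4.5 million), not of all the pages Google crawled. Measured against everything crawled, it is perhaps 450,000 pages in several billion, which is more like 1 in 10,000.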
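The arithmetic is easy to check. The sketch below assumes a few illustrative crawl sizes, since the paper only says "several billion"; the 2, 4.5 and 10 billion values are my assumptions, not figures from the paper.

```python
MALICIOUS = 450_000      # pages confirmed malicious (from the paper)
CANDIDATES = 4_500_000   # pages left after the MapReduce pruning step


def one_in_n(malicious: int, total: int) -> float:
    """Return N such that malicious pages are '1 in N' of total."""
    return total / malicious


# The BBC's figure: share of the pre-filtered candidates only.
print(f"candidates: 1 in {one_in_n(MALICIOUS, CANDIDATES):,.0f}")  # 1 in 10

# Assumed crawl sizes standing in for "several billion".
for crawled in (2_000_000_000, 4_500_000_000, 10_000_000_000):
    print(f"{crawled:,} crawled: 1 in {one_in_n(MALICIOUS, crawled):,.0f}")
# 2,000,000,000 crawled: 1 in 4,444
# 4,500,000,000 crawled: 1 in 10,000
# 10,000,000,000 crawled: 1 in 22,222
```

Whatever "several billion" means exactly, the result stays in the thousands or tens of thousands, nowhere near 1 in 10.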

The BBC failed to spot this, although perhaps that is not surprising given some earlier examples of sloppy science reporting by the corporation (and by most other media organisations, of course).