
Have you ever heard of the Happy Endings Foundation? Well, neither had I until Sunday morning, when I saw a BBC news item about their campaign to ban children’s books with sad endings.

The BBC paid for two experts - a child psychologist and someone else - to come into the studio and pontificate on how this campaign was misguided. What a pity, though, that these experts and the BBC news researchers (and the Daily Mail, who were also taken in) were not so expert in the realm of critical thinking.

The first clues that all was not well could be found in the web site itself. Rewrite Lemony Snicket? Are you allowed to do that? Why would you want to? Also the BBC admitted they could not actually contact anyone from this web site to come on to their programme.

Also, what an odd list of books they were saying had happy endings.

But one skill that should be taught to school children up and down the country when we teach them basic IT skills is how to find out who published a web page. This is not actually very hard. Point your web browser at any of the whois services, but I particularly like this one:

http://www.whois.sc/

Look down the page for the registrant details. In this case we have a registrant as follows:

Registrant Name: Peter Rope
Registrant Organization: ArtScience

Usually this is enough (and it is here, if you know who ArtScience are - a promotional company trying to promote Lemony Snicket!) but this whois search has a great feature. It will do a reverse domain lookup to find out what other web sites are hosted on the same web server as this one. In this case we find:

Artscience.net
Artscienceclick.com
Charlie-bone.com

And a quick click on any of these links will show you that these people are in the business of marketing children’s books.
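If you would rather script the lookup than use a web form, WHOIS itself is a very simple protocol (RFC 3912): open a TCP connection to port 43, send the domain name followed by a line break, and read back the plain-text reply. What follows is only a minimal sketch. The function names are mine, and the server you pass in is an assumption you have to supply yourself, because which WHOIS server holds a record depends on the top-level domain and the registrar, and not every server returns registrant details.

```python
import socket


def whois_query(domain, server, port=43, timeout=10):
    """Send one WHOIS query (RFC 3912) and return the raw text reply.

    `server` is whichever WHOIS host is responsible for the domain;
    choosing it is left to the caller.
    """
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # The whole request is just the domain name followed by CRLF.
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def registrant_fields(whois_text):
    """Return only the 'Registrant ...' lines of a reply as a dict."""
    fields = {}
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower().startswith("registrant"):
            fields[key.strip()] = value.strip()
    return fields
```

Run against a reply like the one quoted above, registrant_fields() keeps just the registrant name and organization lines, which is usually all you need to see who is behind a site.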

The BBC was duped by marketeers. I hope an apology will follow for using licence payers’ money to advertise someone’s books.

The Daily Mail was also duped - but that is par for the course.

But the BBC really should know better.

The BBC writes:

One in 10 web pages scrutinised by search giant Google contained malicious code that could infect a user’s PC.

That may seem a worryingly high proportion! Fortunately it is also nonsense.

Looking at the actual paper, it seems that Google in fact analyzed several billion pages, and sifted these with a preliminary analysis tool called MapReduce.

“MapReduce processed all the crawled web pages for properties indicative of exploits”. As it says in the paper, “MapReduce allows us to prune several billion URLs into a few million”. This process left around 4.5 million pages that were likely candidates. Out of those, they found 450,000 pages that they were confident were correctly identified as malicious.

So it is not 1 in 10 pages. It is perhaps 450,000 pages in several billion. It’s more like 1 in 10,000.
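The arithmetic is easy to check. Here is a minimal sketch, assuming that "several billion" means 4.5 billion pages. That total is my own illustrative assumption; the 4.5 million candidates and the 450,000 confirmed pages are the figures quoted above.

```python
def one_in(part, whole):
    """Express part/whole as '1 in N', rounded to the nearest whole N."""
    return round(whole / part)


malicious_pages = 450_000        # pages confirmed as malicious
candidate_pages = 4_500_000      # pages left after the first pruning pass
total_pages = 4_500_000_000      # ASSUMPTION: stand-in for "several billion"

# The headline figure only holds for the pre-filtered candidates.
print(f"Among candidates: 1 in {one_in(malicious_pages, candidate_pages):,}")
# Across everything that was crawled, the proportion is far smaller.
print(f"Among all pages:  1 in {one_in(malicious_pages, total_pages):,}")
```

With those inputs the first line prints 1 in 10 and the second prints 1 in 10,000. The "1 in 10" figure describes the suspicious shortlist, not the web as a whole.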

The BBC failed to spot this - although perhaps that is not surprising, considering some examples of previous sloppy scientific reporting by the corporation (and most other media corporations of course).

ChaCha Search Guides

There is a new search engine around that offers a human guide to help you with your search. The company is betting that revenue from the advertisements on the site will pay for these guides… I’m not convinced, but it must be worth a look.

Take a look at the ChaCha site. If you have a difficult search to make (and you are frustrated with all those Wikipedia links that keep obscuring the good stuff), then ChaCha may be the search engine for you.

For other stuff about search engines, look at my Really Useful Search Engines post.

I have just noticed that since I wrote my XML code samples, with some rather long lines, Internet Explorer has started floating the right hand sidebar of this site down to the foot of the page.

Not strictly a bug. It must be a font size thing. But for best results, use any browser except Internet Explorer :)

Internet Explorer Only Sites

It must be something about vehicle manufacturers, because they seem to be some of the worst offenders for this kind of nonsense:

Upgrade your browser This website does not support your current Browser version. For the best experience of this website we recommend you to use Microsoft Internet Explorer 5.0 and above.

Last time I had one of these, it was when looking up information about Renault cars before buying a new vehicle. We did not buy Renault. We did not even bother to go to a Renault showroom.

This time it is Scania (don’t ask!).

Now I browse the Internet with either Safari or Firefox. Both are far better browsers than Internet Explorer. Occasionally I use text browsers or browsers on handheld devices, and these stupid sites break with these browsers too.

Now if everybody wrote sites with an eye on web standards, then we would all be much better off, and we would not have brain dead retailers trying to persuade us to infect our machines with Microsoft’s dangerous products, telling us we must “upgrade” to this rubbish before they are willing to sell us their products.

The Association for Computing Machinery (ACM) tells us it is the world’s first educational and scientific computing society. This is a society of computing professionals, who have formed - among other things - the ACM Special Interest Group on Computer Science Education (SIGCSE).

It is therefore somewhat disturbing that their web site can be hacked. Today the SIGCSE site is displaying one of those “I own your site” messages beloved of website vandals.

But just as disturbing are the images the site is portraying - of Israeli girls signing bombs shortly before they are fired at Lebanese civilian targets.

Of course, what this hacker does not mention is that this idiocy flows both ways. As long as people teach their children to hate their neighbours there can never be peace in the Middle East or anywhere else.

So the third disturbing thing about this website hacking incident is that I can see no difference between the heart attitude of the attacker and that of the girl writing on the rocket (or the soldiers who let her do so). All that is different is that one side has weapons supplied by a rogue state, and the other side has arms supplied by Syria.

According to BBC NEWS, MySpace, the world’s most popular social networking website, has been shut down after a power outage.

My own web service has been patchy recently, with a number of short breaks in service. Techost have agreed to migrate me to a new server.

I wonder if MySpace users will do the same. There are other sites which produce far more useable web pages.

For such a popular service it seems odd that there was not more redundancy built in.

Really Useful Search Engines

One would be forgiven for believing that there are no search engines left on the Internet other than Google. No longer do we search for information, we “google it”. Google, through a clean interface, stunningly good technology and a novel search strategy, has rightly become the search engine of choice for… well, just about everyone.

That is not necessarily a bad thing, but we should understand that there are other ways to index and access information that are better in some circumstances.

Google’s search strategy is a popularity contest. Well linked sites score well. Poorly linked sites score poorly. New sites, however relevant, need to encourage many people to link to them before they gain visibility in Google. This has also led to a whole new type of spam, which has forced bloggers to enable comment moderation or clever spam filters. For instance, on this site we see bunches of comments that look something like the following (although usually more sexually explicit):

Interesting site. For the best information on Wales, look here: Wales For the best information on a Tour of Wales, look here: Tour Wales

Usually the links number 10 or 15. This is just an attempt to spam the Google index - and unfortunately it works. Unfortunately, because it encourages more of this antisocial behaviour.
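To see why link spam pays off, here is a toy version of a link popularity score. It is a simplified sketch in the spirit of the published PageRank idea, not Google’s actual ranking. The function name, the damping value of 0.85 and the page names are all assumptions made for illustration.

```python
def link_popularity(links, damping=0.85, iterations=50):
    """Score pages by inbound links.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score...
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # ...plus an equal share of the score of each page linking to it.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outward links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mini web: three pages link to "popular", nobody links to "new".
web = {"a": ["popular"], "b": ["popular"], "c": ["popular"],
       "popular": ["a"], "new": ["popular"]}
scores = link_popularity(web)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy web, "popular" comes out on top and "new" sits at the bottom however relevant it might be. Add ten junk pages that each link to "new" and its score climbs past the honest pages that nobody links to, which is exactly the incentive the comment spammers are exploiting.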

So what can we do? Well there are other search engines, but we need to know how the search engines work before we understand which tool is best for which job. Let me highlight just two:

  1. Ask.com uses Teoma. The indexing algorithm on Teoma reverses that of Google. Rather than counting the number of people linking into a site, Teoma indexes based on outward links, and what the outward links link to. Thus, for example, an academic site about snake venom might link to other sites about snake venom. A search on snake venom may bring up the academic sites first, regardless of attempted Google spamming.

Of course Teoma is not immune to spamming. One could build a site with many external links and then populate it with rubbish. But this is an alternative strategy at least.

  2. Technorati is a search engine that searches the blogosphere. It specifically searches weblogs such as this one and other less pretentious and perhaps more up to date ones, and when there is new news about, it is the quickest way to track down current information. Other search engines can take days to index information, but Technorati will index blogs much more quickly. Of course, if you are searching blog space then you are looking at a bunch of biased sites by definition - but it’s worth knowing about.

I could mention more - maybe I will in another post - but in the meantime let me ask you: What is your favourite (non Google) search engine? Why? And do you know how it creates its index? Please add your comments.

Google Maps

I’ve been playing with the Google Maps API.

The Google Maps API allows the inclusion of Google Maps on your own pages, and you can use Asynchronous JavaScript and XML (AJAX) to interact with the maps. It is all very interesting, and I want to do more with this - not least because of the seamless fusion between GIS data (of which I have a fair amount) and the web.

I have previously used MAPServer, but whilst Google Maps does not have all the features of MAPServer, it does come with a complete set of maps included!

So here is the problem. At www.root-servers.org you will find a complete list of the 13 DNS root name servers that make all other name service lookups work. But where are these root name servers?

Well, this is not accurate to the street level, but I used the information on the site to geocode locations for the root name servers, and you can view them at my Root Nameservers page.

In the last few years there has been much experimentation and roll out of IPv4 Anycast Services to clone the functionality of these thirteen key root name servers. This reduces the clustering of all the vital name servers around the Washington DC area, and provides faster lookup to localities that were historically far removed from the nameservers. I again geocoded data from root-servers.org to come up with this page of the current location of all Worldwide Root Nameservers.
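For anyone wanting to try something similar, the data preparation step might look like the sketch below. It is not the code behind my pages. It builds the thirteen root server host names (a to m under root-servers.net) and turns a list of geocoded points into GeoJSON, a plain JSON format that many web mapping tools can read. The function name, the choice of GeoJSON and the sample coordinates are all assumptions for illustration; the coordinates are placeholders, not real server locations.

```python
import json
import string

# The thirteen root name servers are named a to m under root-servers.net.
ROOT_SERVER_NAMES = [
    f"{letter}.root-servers.net" for letter in string.ascii_lowercase[:13]
]


def to_geojson(points):
    """Turn (name, latitude, longitude) tuples into a GeoJSON FeatureCollection."""
    features = []
    for name, latitude, longitude in points:
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
            "properties": {"name": name},
        })
    return {"type": "FeatureCollection", "features": features}


# PLACEHOLDER data: made-up points, only there to show the shape of the input.
sample_points = [
    ("placeholder-site-1", 38.9, -77.0),
    ("placeholder-site-2", 51.5, -0.1),
]

print(json.dumps(to_geojson(sample_points), indent=2))
```

The only real trap is coordinate order: geocoders usually hand back latitude first, while GeoJSON wants longitude first.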

Let me know what you think.

Spammers

It is a basic Internet truism that all spammers are scumbags. Spammers make the Internet less pleasant. Spammers make it hard to allow children to use email without fear that they will be deluged with material inappropriate for their age. Spammers lie. Spammers charge you for their advertising.

None of this is rocket science. If every spammer suffered the kind of violent death meted out to Guy Fawkes and his co-conspirators, the world would probably be a happier and more just place :)

But what gets me is the morons who, having been spammed by someone, pass the information on to me, requesting that I consider purchasing some web advertising service or search engine optimisation.

I mean, we can accept that there will always be some low life scum who suffer from the delusion that the pursuit of money by any means is an end in itself. But why are there people who think a service offered immorally is worth considering if their service might be useful? It is people like these who give the spammers the oxygen that allows them to exist.

So if you know someone who is considering buying a product or service from a spammer - tell them - “just say no”. That way lies destruction.
