Do tags work? Getting personal
Tom Foremski of Silicon Valley Watcher has an interview with Mike Lynch of Autonomy. They are a British firm that produces search software for use by big companies.
Lynch believes that tags are not working as a way of classifying information on the Internet, and is promoting a software product from Autonomy that he believes does a better job of choosing them than users.
Tags have become the default way of organising information on sites like del.icio.us (a bookmarking service acquired by Yahoo!), last.fm (a popular music discovery website) and Google's GMail. They help solve a problem that is very pertinent to big media companies who are offering their archive online: if everything is available, how does someone find something they would like?
I'm not convinced that software from the likes of Autonomy can be the answer, because:
- The big advantage of tags is that they make sense and feel natural to us, the users who tagged them in the first place. If a computer does it automatically we'll lose that.
- If Autonomy is selling a product, it must be complex stuff, assuming lots of things for us. It is bound to make some of the same mistakes as the Amazon recommendation engine. I only purchased 10 self-help books to review them for work (honestly…). I do not need any more time management recommendations.
There must be some simple things that can be done to fix the problem.
I believe these start around making tags first and foremost about the user that tagged them, for whom the tags make good sense. del.icio.us does a pretty good job of this. Tags are shared across users, but it is my own tags for an item that I use to navigate my bookmarks.
Of course, the difficult problem is making tags useful on stuff we have not personally tagged with our own language. I think often this can be made easy too. It just requires the system to interpret between the names you personally use, and the names that others generally use. This is a common task, and we all do it every day in conversation. We might explain a comment from a workmate to a friend who does not understand the work jargon, or explain something about how the world works to a child in simple terms.
Web sites can often do this process in a very simple and understandable way. For example, imagine I have not tagged the TV series Desperate Housewives. I have tagged other TV shows like 24 and Lost as 'Entertainment'. Meanwhile, other users have generally tagged them 'Drama', a tag that I do not really use.
The website should maintain a simple dictionary of my language. The dictionary should record that 'Drama' in the language of the world is equivalent to 'Entertainment' in my own language. When I look at Desperate Housewives the website knows that in the language of the world this is 'Drama', but for me it can mark it as an Entertainment show.
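To make the idea concrete, here is a minimal sketch in Python of how such a dictionary might be built and used. It is only an illustration under my own assumptions: the function names (build_personal_dictionary, personal_tags_for), the min_support threshold and the shape of the data are all invented for this example, and are not how del.icio.us or any other real site works.

```python
from collections import Counter, defaultdict


def build_personal_dictionary(my_tags, world_tags, min_support=2):
    """Map each 'world' tag to the personal tag it most often co-occurs with.

    my_tags:    {item: set of tags I applied myself}
    world_tags: {item: set of tags other users generally applied}
    min_support is a hypothetical threshold: a mapping is only trusted
    once it has been seen on at least this many items.
    """
    votes = defaultdict(Counter)
    for item, mine in my_tags.items():
        for world_tag in world_tags.get(item, ()):
            for my_tag in mine:
                votes[world_tag][my_tag] += 1

    dictionary = {}
    for world_tag, counter in votes.items():
        my_tag, support = counter.most_common(1)[0]
        if support >= min_support:
            dictionary[world_tag] = my_tag
    return dictionary


def personal_tags_for(item, world_tags, dictionary):
    """Translate the world's tags for an item into my own language.

    World tags with no known translation are passed through unchanged.
    """
    return {dictionary.get(tag, tag) for tag in world_tags.get(item, ())}


# The example from the post: I tagged 24 and Lost, but not Desperate Housewives.
my_tags = {"24": {"Entertainment"}, "Lost": {"Entertainment"}}
world_tags = {
    "24": {"Drama"},
    "Lost": {"Drama"},
    "Desperate Housewives": {"Drama"},
}

dictionary = build_personal_dictionary(my_tags, world_tags)
print(dictionary)  # {'Drama': 'Entertainment'}
print(personal_tags_for("Desperate Housewives", world_tags, dictionary))  # {'Entertainment'}
```

The threshold is just one guess at how to stop the dictionary learning from a single coincidence; a real site would need to tune it, or use something smarter.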
The disadvantage of this approach is that it requires lots of data to be successful. There's a risk that without enough dedicated users the dictionary would never become useful. But big media companies are in a position to build this kind of scale, and could benefit from a personal tagging approach along these lines.
Posted at 20:45 BST, 6th April 2007.
Last changed at 21:13 BST, 6th April 2007.

8 Comments
It is possible to have communities give their tags meaning as well.
My annoyance with the flat-out folksonomy stuff is that it really is just chimp-authored text. So there's no way to know whether we both mean the same thing when we use the same term.
Shared vocabularies, picked by users, backed by global identifiers (URIs) and thus potentially understandable give us much more of a leg-up. How we mint those URIs so we work in the same space – is quite hard, but worth doing.
There's some nice chatter about this in Semantic Web Revisited by Nigel Shadbolt, Tim Berners-Lee and Wendy Hall, as well as in the first issue of their new Web Science journal.
Folksonomies can adjust, get some semantic meaning out of actually having names (URIs) for things (concepts), and get a long way towards a meaningful, join-able Web. People who avoid doing that because they find any talk of ontologies too impractical have missed the point.
We've little chance of doing this at Web scale until the individual sites it will be built from put their own data in order, and give tags like http://del.icio.us/tag/sweets some real meaning. At the moment del.icio.us' intro to tags says:
and
That doesn't look like a good route to meaningful data to me.
Also, the portion about automatically detecting copyright infringement is absolutely crack-fuelled. I'll buy that they can somewhat reliably determine that a piece of content contains a piece of another work.
I certainly don't believe they can figure out with a bot whether that use is infringing, though.
See the entertaining saga of Wendy Seltzer for a good example of why automatic bot-driven legal notices are a lousy idea.
I think there's a role for both the folksonomy (a vocabulary of tags that is made up as we go along) and formal ontologies (where each term has a clear single meaning and can be uniquely identified). The latter is important in the Semantic Web, as described and promoted by Tim Berners-Lee. This uses a URI (the correct name for what we normally call a URL) for each term as a way to distinguish it from other terms.
There's a good analogy with a library. Most users of a library are baffled by the Dewey Decimal system for organising the books on shelves. They rely on helpful signs put up by the librarian. "Cooking books this way" etc. That's the folksonomy, whereas Dewey Decimal is the Ontology.
Despite the popularity of the signs, we really need the formality of Dewey Decimal underneath the nice signs if the library is to run well. The librarian and some regular library users depend on it.
In the post I suggest that the folksonomy is a good idea for one person's tagging of items. It's a nice, user-friendly way for them to find stuff, and is a lot easier for many users than understanding the common meanings of terms in a formal ontology.
The difficult bit is linking the meaning of each individual's terms up together, so that we can create 'dictionaries' of what each user really believes. This is where the ontology would be very useful - just as the Dewey Decimal system is very useful for the librarian in knowing where to put the cookery sign.
I don't know what the best way is to get a large number of people using the ontology. You're right that web innovators should not give up, they have an interesting challenge. I've not seen sites that make ontologies nice to use - even for the kind of person who knows their favorite Dewey Decimal codes.
For http://www.gmep.org.uk/ we wrote some software to manage the taxonomy mapping between the partner websites. On the whole it worked out, given the structured nature of the content and the imposition of standard taxonomies on each of the partner sites. The number of differences was small.
I don't think that inferring mappings would have worked, even in this structured environment. It seemed that there would be enough information to make the inference, even more than you are suggesting, but at the end of the day it's just easier to have a trained chimp make the mapping by hand with a very simple software tool.
I think it's a good point that the mapping can (and in a folksonomy-style world, should) be done by monkeys-at-keyboards. However, once you've actually put some common meaning on your tags, and really published that with some machine-readable structure, then you've opened the door to others running tools over your data. They can use it in ways you hadn't imagined, or planned for – serendipitous reuse, if you will.
Part of the trick for a firm that has a lot of this data internally somewhere is figuring out how to make the leap to a link-friendly world, where you see benefit when others reuse your data.
For television content-owners, I expect that means trying to become the best, globally useful Web resource for each of their own episodes. Without that I can't see how they'll compete with usable, shiny borrow-recordings-from-your-friends systems.
Thanks for the comments.
David, I agree that getting this sort of thing working will be a challenge, but I think it is worth a try. Of course the mapping here does not need to be anywhere near 100% accurate to be of some use, whereas I imagine the classification of content on GMeP needed to be correct almost always. I'd not thought much about the involvement of humans in the matching process, but I can see it could be helpful.
Given that the number of sources of the tags is going to be much greater than on the GMeP site I expect that this person would probably be a developer who could write rules for the matching software, rather than someone manually fixing the linkages. I do believe this should be possible without lots of complex software, otherwise we're back in the Autonomy world that I think we should be steering away from.
Ashok, I agree with you that getting machine-readable structure in place would be very valuable, and could lead to all sorts of benefits that we cannot see now - it's about doing things the way the web works best, rather than ignoring the strengths of the technology. I think we also agree that mapping folksonomies to common meaning is likely to be an important part of getting the common meaning adopted. I am guessing that making workable folksonomies is the first step, but with an eye to machine-readable common meaning later.
Yes, I believe it's possible to start in a somewhat folksonomy-like way, and then elevate the tags with meaning. But you will need to be planning for it, which to me feels more like the structured approach, but with the power of the crowd doing the grunt-work.
It means holding back a little on things like "you can rename or delete them later". You also need to be quite careful that when two people use the same tag they either know that (and it's a positive thing) or you're capable of teasing them apart again later.