Information retrieval has three definitions depending on your viewpoint as either a user, a system or a source. A user typically has inadequate knowledge on the subject they are searching for, and hence seek to retrieve information through a search request to enlighten them. A system stores information, processes it and makes it available for retrieval through software and hardware. it is the technology that allows the user to search how they want to. A source is the document that contains the information we wish to retrieve, it has an intended purpose and audience. Information is a valuable commodity which is ripe for exploitation: it can be bought and sold as a service.
Information retrieval on the internet occurs whenever we make a web search (we want to find some information online). Broder (2000) conceived a taxonomy for web searching by looking at the different types of query we make:
- Navigational queries (e.g. finding the home page for a company when you don't know the precise URL)
- Transactional queries (e.g. a mediated activity, such as purchasing a box of fudge)
- Informational queries (e.g. finding information on a particular subject, such as what is available or how to do something)
- Known-item retrieval i.e. the user knows the exact item necessary to satisfy their informational need (e.g. a particular movie or video hosted online)
- Fact retrieval i.e. the user knows what they want but do not have the precise information in order to fulfill their need. (e.g. which actor played a certain part in a particular movie)
- Subject retrieval i.e. the user is looking to a subject, which is not precisely defined (e.g. the most memorable deaths in horror films)
- Exploratory retrieval i.e. checking out what data is available for a provided selection (e.g. searching for classical music on iTunes)
0. identify the fields you wish to make searchable in the index (e.g. the most memorable parts of the
document which are typically searched for, such as title, author, year etc. This allows for highly
accurately, focused searching to be carried out)
- identify words that will act as keywords for a search procedure, which will be those terms or phrases that are likely to be searched by the user. A consideration of whether digits, non A-Z characters will be included or excluded needs to be undertaken. Keeping the keywords in lowercase will yield more accurate search results.
- remove stop words such as and, the.
- stem words, by cutting off the suffix to allow for wider searching of a concept or term e.g. act! would bring up results for acting, actors, actions etc.
- define synonyms, i.e. different words that have the same meaning.
Index structure in place ... we can now search! Search models for information retrieval include boolean connectors (AND, OR, NOT), proximity searching (within same sentence, paragraph or phrase; word adjacency), best match results generated through ranking systems built into search engines such as Google, and simply browsing the internet (bypasses any indexes in place).
Should the preliminary search fail, we can then try a manual query modification (by adding or removing terms from the initial search query) or try an automatic query modification such as a 'show me more on this topic' which is provided for you by the search engine.
Once you have conducted a search, how do you determine how relevant the results are? You need to evaluate it.
It can be done qualitatively, from a user viewpoint (was that user was satisfied with the search results) or a sources viewpoint (how much should the user be charged for search services providing relevant results)
It can be done quantitatively from a systems viewpoint, by which we can evaluate the retrieval effectiveness and efficiency by calculating precision and recall respectively:
Precision = the proportion of retrieved documents that are relevant
= relevant documents retrieved
total documents returned
Recall = the proportion of relevant docs documents
= relevant documents retrieved
total number of relevant documents in the database
The practical lab session allowed us to explore the exercise of information retrieval by using two internet search engines, Google and Bing, to search for a variety of information by making search queries, then calculating the precision and recall of each engine. Because we are already well versed in searching the internet, and because I already used advanced search models such as boolean connectors for online searching, I was able to find relevant searches efficiently. The session as a whole however reinforced the need for well structured index structures and precise searching models to be in place if we are to retrieve information that is relevant to our needs at the time we need to access it.
No comments:
Post a Comment