
Wednesday, 4 January 2012

DITA coursework blog 2 - Web 2.0/3.0 and beyond

The URL of this coursework blog is http://library-liam.blogspot.com/2012/01/dita-coursework-blog-2-web-2030-and.html

Title: Library 2.0 colliding with the semantic web: an iceberg effect?

1.  Introduction

The integration of web 2.0 technologies into online library portal interfaces is changing how library users access and interact with information sources. But there is a real danger that user input will create ‘information icebergs’: the volume of information expands rapidly as a build-up of a small amount of ‘searchable’, user-centric ontological metadata ‘freezes’ above the useful primary data, pushing it below the surface where it lurks unseen, inaccessible without deep and effective search retrieval. The library’s raison d’être of making information transparent and discoverable to all users is seemingly lost in the process. The iceberg effect the writer speaks of here is akin to the concept of web 3.0 (the semantic web), which threatens to sink the traditional notion of librarianship.

A comparative analysis of how Web 2.0 technologies have been incorporated into the user interfaces of academic, private sector and public library services, and of how semantic web strategies could apply to electronic library services in the future, will allow the writer to consider whether such technologies are truly compatible, or ever will be.

2.  Library 2.0?

Library 2.0[1] appears to be the next logical evolution of library services in our increasingly digitised online society, and seemingly follows the same trajectory as the platform (the World Wide Web) that facilitated and now realises the concept. If Web 1.0 first allowed machine-readable data to reach the first online generation as digitally presented information, and Web 2.0 permitted the same, albeit maturing, users to read and write their own data to add to, edit or remove online information, then it is predicted that the semantic web (Web 3.0) will allow future generations of internet users to read, write and execute information through the provision of user-created ontological metadata, bolstering the artificial intelligence of computer servers in the performance of our fine-tuned information needs.

Library 2.0 “simply means making your library’s space (virtual and physical) more interactive, collaborative, and driven by community needs.”[2] Libraries allow access to information collected in their materials, traditionally through read-only catalogues, and now provide online portals that add value to the same information by encouraging user participation in, and feedback on, those resources through integrated web 2.0 technologies. Brophy (2007) argues that the ‘long tail’[3] effect, whereby more and more users joining the same networked service adds value for every user, is how the idea of web 2.0 marries with digital libraries in the creation of ‘Library 2.0’.

3.  Information needs

One function of web 2.0 technologies is to allow for the personalisation or customisation of information, and this is dictated by the information needs of the user - its appropriateness will depend on the nature of the library itself.[4]
 
Catalogues and databases are the entry portals for library users with specific information needs. Web 2.0 can help satisfy those needs by directing users to search catalogues in different ways. Computers can analyse the frequency of search terms and relate them to items accessed on the catalogue. Every item is catalogued under a specific subject, and relational links are then made by the software. Tag clouds, as used on the City University library catalogue[5], show the popularity of interrelated search terms or subjects after an initial search is made: the larger words represent the most popular, and therefore seemingly most relevant, related searches. This collaborative employment of web 2.0 technologies and user input creates a knowledge base that is realised through social web interaction, writing those needs into the software as metadata, and seemingly is the way forward. Truly, “knowledge – its content and organization – is becoming a social act”.[6]
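
As a rough sketch of how such a tag cloud might weight its terms (the search counts and the linear scaling rule here are invented for illustration; they are not City University's actual algorithm), consider this Python fragment:

# Scale tag font sizes by relative search frequency (illustrative data only).
search_counts = {"copyright": 120, "cataloguing": 45, "semantic web": 80}

min_size, max_size = 12, 36   # font sizes in points
low, high = min(search_counts.values()), max(search_counts.values())
span = max(high - low, 1)     # avoid division by zero if all counts are equal

for tag, count in sorted(search_counts.items()):
    size = min_size + (count - low) * (max_size - min_size) / span
    print(f"{tag}: {size:.0f}pt")

The most-searched tag ("copyright" here) renders largest, so each user's searching silently re-weights what the next user sees.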

4.  Digital representation and organisation

Libraries have always traditionally maintained a role in selecting, indexing and making available information resources to their users: the form in which it is now represented and organised (digital data) has evolved with technological innovations, but is limited by new constraints.

Modern libraries, hosting OPACs (online public access catalogues), now adhere to MARC formats when cataloguing, which are “standards for the representation and communication of bibliographic and related information in machine-readable form.”[7] MARC brought library catalogue records into machine-readable binary form, which allows metadata to be composed through XML schemas particular to that bibliographic information. This in turn allows the data to be implemented and integrated through web-hosted services such as application programming interfaces (APIs) and mash-ups[8], and consequently leads to user-driven activities such as personalisation and customisation.
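
To illustrate what machine-readable bibliographic metadata looks like as XML, here is a sketch that parses a minimal MARCXML-style record using Python's standard library (the record itself is invented; tags 100 and 245 are the standard MARC fields for author and title):

import xml.etree.ElementTree as ET

# A minimal, invented MARCXML record (MARC tag 100 = author, tag 245 = title).
record = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100"><subfield code="a">Brophy, Peter</subfield></datafield>
  <datafield tag="245"><subfield code="a">The library in the twenty-first century</subfield></datafield>
</record>
"""

ns = {"marc": "http://www.loc.gov/MARC21/slim"}
root = ET.fromstring(record)
for field in root.findall("marc:datafield", ns):
    value = field.find("marc:subfield", ns).text
    print(field.get("tag"), "->", value)

Because the record is plain, structured text, the same fields could equally be exposed through an API or recombined in a mash-up.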

An interface that distracts or deviates from its purpose (accessing information), either because it makes navigation cumbersome or impracticable, or because it does not limit the amount of information[9] on the screen by editing the content (remedied by using markers such as hyperlinks or concertinaed drop-down menus), is an indication that the designer has not understood the relevant user information needs.

The mindset of short-of-time library ‘browsers’ also manifests itself in the online library environment: functional design of library service interfaces enables the optimal representation of information in digital form, although it must be clearly signposted and transparent in order to satisfy user needs efficiently.

5.  Real life ‘Library 2.0’ manifestations

The widespread adoption of ‘Library 2.0’ can be seen across all library sectors, presenting users with radical options for accessing information and services virtually.

In public libraries, there has been a slowly expanding realisation of web 2.0 technologies outside of the conventional online library OPACs. Pinfield et al (1998) identified the current model as that of the “hybrid library”: a digital library combined within the operation of a traditional physical library, illustrated by community library services[10] now providing an e-library from which selected digital material can be loaned over the internet.[11]
 
In private sector libraries, such as law libraries, Web 2.0 has been used as a tool for marketing and promoting library services to users.[12] Law is a subject which is constantly evolving, and information sources very quickly become out of date. The Inner Temple Library hosts a Current Awareness blog[13] which provides “up-to-date information on new case law, changes in legislation and legal news” in the form of hyperlinks to online information resources. This information is also pushed to users signed up to the Twitter feed[14] or Facebook page[15], and the blog is further supplemented by applications such as buttons to subscribe to RSS feeds, mailing lists and hyperlinks to the organisation’s presence on other websites. Podcasts[16], instant text messaging and online chat enquiries[17] are other examples of how web 2.0 is being integrated into libraries; however, Brophy (2007) notes that there is still a long way to go, and in particular that “most library applications do not yet support conversational use and data sharing, including the support of individual and group activity” (at p.143).

6.  Semantic library portals

The implementation of web 3.0 as semantic technologies for use in libraries has not yet been fully realised, but arguably this is not far off.

Chowdhury and Chowdhury (2003) support the use of semantic web technologies in the development of digital libraries, and argue that the seamless access to digital information resources “calls for a mechanism that permits analysis of the meaning of resources, which will facilitate computer processing of information for improved access” (at p.220).

Such a mechanism is arguably akin to a semantic information portal, as advocated by Reynolds, Shabajee and Cayzer (2004) which serves as “... a community information portal that exploits the semantic web standards to improve structure, extensibility, customization and sustainability” (at p.290). 


The Inner Temple Library website, in providing links to a number of databases, blogs and other websites on one page, appears to be a base upon which to build this approach. Elements of web 2.0 interactivity, in which comments can be added and links shared through social media accounts, are currently missing, however, and their usefulness and purpose in that context remain questionable, as mixing work with pleasure on social networks is still a contentious area for most.

7.  Below the surface – hidden depths and dangers

Digital information representation and provision need to be transparent, yet searching for, retrieving and accessing information over a dynamic platform such as the World Wide Web[18] is a murky prospect to accomplish successfully.

Baeza-Yates and Ribeiro-Neto (2011) identify the main challenges the internet poses for information searching: an unbalanced distribution of data across limited bandwidth and unreliable network connections, harbouring inconsistent access due to changing file destinations and broken links; a semantic redundancy of semi-structured data, where the same HTML pages are constantly replicated through searching; and the problems inherent in self-published information of questionable integrity appearing in different formats and languages.

All the above factors arguably affect both the recall (as the amount of searchable information increases along with the number of data formats) and the precision (where search results present data of low integrity) of information retrieval results.

Baker (2008) identifies the hurdles to clear as the “need to develop shared and common standards, interoperability, and open access using the same concepts and terminologies and on an international scale” (at p.8).

Libraries in particular, as trusted information providers in their communities, should seek the correct balance: maintaining editorial control[19] to preserve data integrity, whilst listening to their user base in developing search strategies and using metadata to enrich catalogue records, satisfying information needs by reducing search time and producing highly relevant results.

8.  Conclusion

To return to the ‘information iceberg’ analogy advocated in the introduction, the problems in providing discoverable information appear firmly connected to the widening of the ‘semantic gap’, which grows as more and more information is uploaded online without first being catalogued or indexed. An insurmountable iceberg of older, unclassified information sinks below the surface as more and more user-generated data (and metadata) populates the web, while the authenticity and reliability of that user-generated data immediately comes into doubt, as it lacks authority.

In addition, search engines will need to deal with all the content afforded by web 2.0 technologies through HTML pages which are dynamically generated and therefore “inherently complex”.[20] New technological advances therefore create new problems to overcome.

Van Dijk (1999) perhaps gives a stark warning as to the future of digital information caught up in Web 3.0, foreseeing the risk and consequences of over-reliance on information agents as the semantic solution:

“Systems get smarter, but their users might become stupider” (at p.185).

Whilst computers can adapt to user preferences, they cannot react to changing human values and emotions, and they cannot be completely pre-programmed. Over-reliance on information devices can isolate users from the real world, causing them to miss out on the interactions and opportunities that only human contact can provide. Traditional libraries and their human agents, providing and supporting the resources needed for user information searches face-to-face within defined real-world environments, avoid these problems instantly.

The information iceberg cannot be left to expand without human observation or we risk losing control, order and sight of the value of digital information.


References


[1] The term ‘Library 2.0’ was first introduced by Michael Casey in his blog entry titled ‘Librarians without Borders’ dated 26th September 2005. See: http://www.librarycrunch.com/2005/09/
[2] As defined by Sarah Houghton-John in her blog entry titled ‘Library 2.0: Michael Squared’ dated December 2005. See: http://librarianinblack.net/librarianinblack/2005/12/library_20_disc.html
[3] Based on ideas propounded by Chris Anderson in his blog entry titled ‘The Long Tail’ dated October 2004. See: http://web.archive.org/web/20041127085645/www.wired.com/wired/archive/12.10/tail.html
[4] Coles (1998) identified that the information needs of users between public and private libraries are not the same.
[5] See ‘refine by tag’ window appearing on the right hand side of the screen after a user search is made.
[6] Weinberger (2007) at p. 133
[7] MARC Standards: Library of Congress – Network Development and MARC Standards Office website. See: http://www.loc.gov/marc/
[8] A good example of a mobile mash-up application is Library Anywhere, used by City University Library, which provides access to library catalogues and user accounts via a smart phone screen: see http://libguides.city.ac.uk/content.php?pid=234596&sid=2157583
[9] Chu (2010) states the essential problem in information representation and retrieval to be “how to obtain the right information at the right time despite the existence of other variables in the [...] environment” (at p.18).
[10] The writer’s local example: Hertfordshire County Council – online library services: http://www.hertsdirect.org/services/libraries/online/
[11] See the Herts e-library service: http://herts.lib.overdrive.com/8F191FBA-0D95-4AA6-915E-691A653360D5/10/491/en/Default.htm
[12] Harvey (2003) at p. 37 notes that current awareness blogs can be used to remind users of [information] services they may have been previously unaware of, and also allows for innovation on the part of the blog writer in developing and improving those services.
[13] http://www.innertemplelibrary.com/
[14] https://twitter.com/inner_temple - Twitter username: @Inner_Temple
[15] http://www.facebook.com/innertemplelibrary
[16] Such as that provided by the British Library, see: http://www.bl.uk/podcast
[17] For example, 24/7 live chat communication with a librarian is provided through the government supported People’s Network website: http://www.peoplesnetwork.gov.uk/
[18] Baeza-Yates and Ribeiro-Neto (2011) notes this to be “chaotic and unstructured, providing information that may be of questionable accuracy, reliability, completeness or currency” (at p.685).
[19] See Weinberger (2007)
[20] Baeza-Yates and Ribeiro-Neto (2011) at p. 450.


Bibliography

Anderson, C. (2004) The Long Tail, Wired (blog), [online] available at: http://web.archive.org/web/20041127085645/www.wired.com/wired/archive/12.10/tail.html [accessed 20th December 2011]

Baeza-Yates, R., and Ribeiro-Neto, B. (2011) Modern information retrieval: the concepts and technology behind search. 2nd ed. London: Pearson Education.

Baker, D. (2008) From needles and haystacks to elephants and fleas: strategic information management in the information age, New Review of Academic Librarianship, 14: 1–16 [online] via LISTA, accessed 31st October 2011.

British Library website, Podcasts, [online] available at: http://www.bl.uk/podcast [accessed 20th December 2011]

Casey, M. (2005) Librarians Without Borders, [online] available at: http://www.librarycrunch.com/2005/09/ [accessed 20th December 2011]

Casey, M. and Savastinuk, L.C. (2006) Library 2.0: service for the next generation library, Library Journal, [online] available at: http://www.libraryjournal.com/article/CA6365200.html [accessed 20th December 2011]

Chowdhury, G.G. and Chowdhury, S. (2003) Organizing information - from the shelf to the web. London: Facet Publishing.

Chu, H. (2010) Information representation and retrieval in the digital age. 2nd ed. Medford, New Jersey: Information Today, Inc.

City University Library website, [online] available at: http://www.city.ac.uk/library/ [accessed 16th December 2011]

City University Library – LibGuides – Mobile Devices webpage, [online] available at: http://libguides.city.ac.uk/content.php?pid=234596&sid=2157583 [accessed 31st December 2011]

Coles, C. (1998) Information seeking behaviour of public library users: use and non-use of electronic media. In: Wilson, T.D. and Allen, D.A., ed. 1999. Exploring the contexts of information behaviour. London: Taylor Graham Publishing, 321-329

Current Awareness from the Inner Temple Library website, [online] available at http://www.innertemplelibrary.com/ [accessed: 7th December 2011].

Harvey, T. (2003) The role of the legal information officer. Oxford: Chandos Publishing.

Hertfordshire County Council – Libraries website, [online], available at: http://www.hertsdirect.org/services/libraries/online/ [accessed 31st December 2011]

Herts e-library service website, [online], available at: http://herts.lib.overdrive.com/8F191FBA-0D95-4AA6-915E-691A653360D5/10/491/en/Default.htm [accessed 31st December 2011]

Houghton-John, S. (2005) Library 2.0 Discussion: Michael Squared, Librarianinblack (blog), [online] available at: http://librarianinblack.net/librarianinblack/2005/12/library_20_disc.html [accessed 31st December 2011]

Inner Temple Library Facebook page, [online] available at: http://www.facebook.com/innertemplelibrary [accessed 31st December 2011]

Inner Temple Library Twitter page, [online] available at: https://twitter.com/inner_temple [accessed 31st December 2011]

MARC STANDARDS: Library of Congress – Network Development and MARC Standards Office website, [online] available at: http://www.loc.gov/marc/ [accessed 31st December 2011]

People’s Network – online services from public libraries website, [online] available at: http://www.peoplesnetwork.gov.uk/ [accessed 31st December 2011]

Pinfield, S., Eaton, J., Edwards, C., Russell, R., Wissenberg, A., and Wynne, P. (1998) Realizing the hybrid library, D-Lib Information Magazine, October 1998, [online] available at: http://www.dlib.org/dlib/october98/10pinfield.html [accessed 19th December 2011].

Reynolds, D., Shabajee, P. and Cayzer, S. (2004) Semantic information portals, [online] available at: http://www2004.org/proceedings/docs/2p290.pdf [accessed 7th December 2011].

Van Dijk, J. (1999) The network society: social aspects of media [translated by Leontine Spoorenberg]. London: Sage Publications.

Weinberger, D. (2007) Everything is miscellaneous: the power of the new digital disorder. New York: Times Books/Henry Holt and Company.

Thursday, 27 October 2011

DITA coursework blog - Web 1.0 (the internet and WWW, databases and information retrieval)


Title: Language and access in Digital Information Technologies and Architecture, with a focus on law libraries

1. Introduction

An underlying principle of digital information is that it is data which must be written in a specific language so that it can be stored in sources, communicated by systems and retrieved by users. Once this is achieved, access to data must be managed using appropriate technologies. I will consider this statement in the context of modern law libraries to assess the present and future impact on the provision of digital resources to their users.

2. Evaluating

Digital technologies must take into account the information needs of library users, who, in today’s digital age, most commonly seek information from online subscription databases and web resources. Sources of information in law libraries are typically law reports, journal articles or legislation, predominantly accessed as either printed or digital text-based information. The latter must be in a specified format in order to be read: it is data given a form capable of precise meaning through logical coding and sequencing – in essence, a ‘language’.

Computers are system linguists, communicating data over connected networks (the internet) via a service (the World Wide Web). Computers read and interpret data in binary form: patterns of bits are assigned to characters, which form words as ASCII text; collected together, they create the files that make up documents, such as database records or web pages. Human users can only subjectively evaluate text for meaning and relevance in a form they understand. Computers do not understand “human” language, and so evaluate the language within the data: metadata. Hypertext is a language used to inter-link data within one document, or to link data between documents. Web pages are written in Hypertext Mark-up Language (HTML) so the data can be read by internet browsers, which interpret markup tags (ordered ASCII text relaying strict instructions on layout and structure) as distinct from the standard ASCII text they surround.
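
A quick sketch of that bottom layer in Python: each character maps to a numeric ASCII code, which is ultimately just a pattern of bits:

# Each character is stored as an ASCII code, i.e. a pattern of bits.
for ch in "Law":
    code = ord(ch)                        # numeric code point
    print(ch, code, format(code, "08b"))  # e.g. L 76 01001100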

The advent of e-books has seen a shift towards digital readership, where books translated into ASCII text can enjoy wider distribution to library users over the internet. This indicates the future of how libraries will provide materials to their users; but issues of cost, reliability and user misgivings about rapid technological advancement still impact on access.

3. Managing

Managing data is, at its core, concerned with providing users with access points. There are two sources of digital information available to library users: internal (databases) and external (the internet).

Databases organise and order available data in accordance with the user’s information needs, a primary example being an OPAC catalogue of a library’s holdings. Language is the control. Structured Query Language (SQL) commands relational databases to perform queries to retrieve selective data from a number of interrelated data tables. 
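
As a minimal sketch of the kind of query SQL makes possible (the table, columns and sample data below are invented, standing in for a tiny catalogue):

import sqlite3

# An in-memory relational database standing in for a tiny catalogue (invented schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO items (title, subject) VALUES (?, ?)",
    [("Modern Information Retrieval", "information retrieval"),
     ("Organizing Information", "cataloguing"),
     ("The Network Society", "media studies")],
)

# SQL retrieves only the rows that satisfy the user's query.
for (title,) in conn.execute(
        "SELECT title FROM items WHERE subject = ?", ("cataloguing",)):
    print(title)
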
Databases permit searches by two methods: natural language and controlled vocabularies. If the natural language search terms are not clear, or irrelevant search results are returned, the user may deploy query modification to adjust the language used and yield better results. Controlled vocabularies, such as indexes and thesauri, may signpost users in context to data that may or may not be relevant. We should expect more relevant results from a database search than from, say, an internet search engine, provided that the data is there to be retrieved.

Libraries can combine access to both databases and the web concurrently, permitting a wider scope for information retrieval. Brophy (2007, p.113-4) sees an importance of use behind the access and retrieval process, thus directly linking users to resources. He also implies that use involves the creation of “information objects of various kinds”. A library portal, such as that created by the Inner Temple Library[1], is a good example of this: an online access point to a number of databases, together with hyperlinks to web resources including a subject index and current awareness blog. Maloney and Bracke (2005, p.87) emphasise that this “is not a single technology. Rather it is a combination of several systems, standards and protocols that inter-operate to create a unified experience for the user”. This means of federated searching[2] is emerging as a possible solution to the complexities of cross-searching multiple databases.

Information retrieval over the web is a double-edged sword: on the one hand there is a wealth of dedicated resources available online; on the other, an inexpert user will only ever retrieve a small percentage of relevant data because of the “invisible web”[3]: a detrimental consequence of a global resource that is dynamically evolving, but whose authenticity and permanence are compromised as more and more information goes online. Limb (2004, p.60) believes this could be combated by building federated repositories to harvest a wealth of relevant cyber resources, but the task may prove onerous and unmanageable.

4. Conclusion

The communication chain between users, systems and sources depends on the efficient and concise use of language to access and retrieve data. A break in the chain, such as incomplete HTML code or a broken hyperlink, can shut down access to information, leaving the information seeker locked out. The architects of computer systems dictate the choice and methods by which data is represented, but as non-subject specialists they may not appreciate that the information to which they give access may not fulfil the user’s needs. A compromise perhaps should be reached.[4]

Recent developments such as cloud sourcing[5] look set to change how society stores and accesses digital information, in that information users can retrieve documents via the internet without prior knowledge of where the source document is physically rooted. It appears cloud sourcing makes the service, the source.[6]

I cannot see how law libraries could happily subscribe to these developments: information retrieval there is too deeply rooted in specialist knowledge and language, coupled with the need for reasonable proximity between users and their sources. As technologies make information cheaper to produce and maintain, it is ever more eagerly consumed by non-experts who lack the skill and knowledge to access and evaluate relevant information.

The legal information professional, acting as the bridge between users, systems and sources, therefore remains crucial to the information access and retrieval processes.

Bibliography

Brophy, P. (2007). The library in the twenty-first century. 2nd ed. London: Facet Publishing.

The Inner Temple Library Catalogue: http://www.innertemplelibrary.org/external.html (accessed: 25th October 2011).

Maloney, K. & Bracke, P.J. (2005). Library portal technologies. In: Michalak, S.C., ed. 2005. Portals and libraries. New York: The Haworth Information Press. Ch.6.

Limb, P. (2004). Digital Dilemmas and Solutions. Oxford: Chandos Publishing.

Pedley, P. (2001). The invisible web: searching the hidden parts of the internet. London: Aslib-IMI.

Harvey, T. (2003). The role of the legal information officer. Oxford: Chandos Publishing.

Géczy, P., Izumi, N. and Hasida, K. (2012). Cloudsourcing: managing cloud adoption. Global Journal of Business Research, 6(2), 57-71. (accessed: EBSCOhost - 25th October 2011.)

References


[1] The Inner Temple Library Catalogue: http://www.innertemplelibrary.org/external.html (accessed: 25th October 2011)
[2] See Limb (2004, p.59).
[3] For further discussion, see: Pedley (2001) The Invisible Web: Searching the hidden parts of the internet. London: Aslib-IMI.
[4] See Harvey (2003, p.143-6) for a persuasive discussion on the ‘librarian vs lawyer’ in terms of information retrieval within the legal profession.
[5] For detailed discussion of the concerns and benefits of cloud sourcing, see Géczy, Izumi and Hasida (2012) in Global Journal of Business Research, 6(2), 57-71.
[6] i.e. the internet becomes the storage and service provider of digital documents, which are no longer anchored to a physical location.

Tuesday, 18 October 2011

DITA - Understanding blog No. 4: Information Retrieval

After last week's session on retrieving structured data from a database management system, this week's task of retrieving unstructured data from the wide expanse of the Internet seems a staggering, insurmountable task on paper. But is it really? I argue not. We do this kind of thing on a daily basis without giving it much thought. The next time you use Google to search for tickets for an upcoming gig or theatre show, think carefully about what you are actually doing ... retrieving specific information from a whole mass of information deposited on the net. It has some order (websites, webpages), but we don't know exactly where we are going to find what we want, or even whether we will find anything relevant at all.

Information retrieval has three definitions, depending on your viewpoint: that of a user, a system or a source. A user typically has inadequate knowledge of the subject they are searching for, and hence seeks to retrieve information through a search request to enlighten them. A system stores information, processes it and makes it available for retrieval through software and hardware: it is the technology that allows the user to search how they want to. A source is the document that contains the information we wish to retrieve; it has an intended purpose and audience. Information is a valuable commodity ripe for exploitation: it can be bought and sold as a service.

Information retrieval on the internet occurs whenever we make a web search (we want to find some information online). Broder (2000) conceived a taxonomy for web searching by looking at the different types of query we make:
  • Navigational queries (e.g. finding the home page for a company when you don't know the precise URL)
  • Transactional queries (e.g. a mediated activity, such as purchasing a box of fudge)
  • Informational queries (e.g. finding information on a particular subject, such as what is available or how to do something)
All the above queries are text-based (i.e. we are seeking a written record of the information). The web is home to a selection of non-textual media, such as images and videos, so the scope of our searching can be expanded to the following categories:
  • Known-item retrieval i.e. the user knows the exact item necessary to satisfy their information need (e.g. a particular movie or video hosted online)
  • Fact retrieval i.e. the user knows what they want but does not have the precise information needed to fulfil that need (e.g. which actor played a certain part in a particular movie)
  • Subject retrieval i.e. the user is looking for a subject which is not precisely defined (e.g. the most memorable deaths in horror films)
  • Exploratory retrieval i.e. checking what data is available for a given selection (e.g. searching for classical music on iTunes)
Before information can be searched, it needs to be in a specific format in order to be retrieved (e.g. HTML, XML, MPEG). Media needs to be processed in the correct way before it can be indexed correctly. To assist the indexing process, a number of steps should be followed with the text descriptors for the media to be retrieved (these steps are sketched in code further below):

  1. identify the fields you wish to make searchable in the index (e.g. the most memorable parts of the document which are typically searched for, such as title, author, year etc. This allows highly accurate, focused searching to be carried out)
  2. identify words that will act as keywords for a search procedure, which will be those terms or phrases likely to be searched by the user. Consider whether digits and non A-Z characters will be included or excluded. Keeping the keywords in lowercase will yield more accurate search results.
  3. remove stop words such as 'and' and 'the'.
  4. stem words, by cutting off the suffix to allow for wider searching of a concept or term e.g. act! would bring up results for acting, actors, actions etc.
  5. define synonyms, i.e. different words that have the same meaning.
Once the information has been prepared for indexing, it needs to be formatted into a structure. This can take the form of a surrogate record (i.e. a record within the database which acts as a 'list of records' for all the information contained in the database that you are interested in) or an inverted file (i.e. we look at words to find documents, rather than the other way around ... looking from the inside out!).
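
Here is a minimal Python sketch of the indexing steps above (the stop word list, the crude suffix-stripping rule and the sample documents are all invented for illustration):

# Build a tiny inverted index: lowercase, drop stop words, crude suffix stemming.
STOP_WORDS = {"and", "the", "a", "of", "in"}

def index_terms(text):
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in STOP_WORDS:
            continue
        for suffix in ("ing", "ors", "ions", "s"):   # toy stemmer, illustration only
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[:-len(suffix)]
                break
        yield word

documents = {1: "Actors acting in horror films", 2: "The actions of the director"}

inverted_index = {}
for doc_id, text in documents.items():
    for term in index_terms(text):
        inverted_index.setdefault(term, set()).add(doc_id)

print(inverted_index)   # {'act': {1, 2}, 'horror': {1}, 'film': {1}, 'director': {2}}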

Index structure in place ... we can now search! Search models for information retrieval include boolean connectors (AND, OR, NOT), proximity searching (within the same sentence, paragraph or phrase; word adjacency), best-match results generated through the ranking systems built into search engines such as Google, and simply browsing the internet (which bypasses any indexes in place).
Should the preliminary search fail, we can then try manual query modification (adding or removing terms from the initial search query) or automatic query modification, such as a 'show me more on this topic' option provided by the search engine.
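
Boolean connectors then map naturally onto set operations over an inverted index like the one sketched above (again with invented data):

# Boolean retrieval as set algebra over a toy inverted index.
inverted_index = {"act": {1, 2}, "horror": {1}, "director": {2}}
all_docs = {1, 2}

print(inverted_index["act"] & inverted_index["horror"])   # act AND horror -> {1}
print(inverted_index["act"] | inverted_index["horror"])   # act OR horror  -> {1, 2}
print(all_docs - inverted_index["horror"])                # NOT horror     -> {2}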

Once you have conducted a search, how do you determine how relevant the results are? You need to evaluate it.

It can be done qualitatively, from a user viewpoint (was the user satisfied with the search results?) or a sources viewpoint (how much should the user be charged for search services providing relevant results?).

It can be done quantitatively from a systems viewpoint, by which we can evaluate retrieval effectiveness by calculating precision and recall:

Precision = the proportion of retrieved documents that are relevant
          = relevant documents retrieved ÷ total documents returned

Recall = the proportion of relevant documents that are retrieved
       = relevant documents retrieved ÷ total relevant documents in the database
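
As a quick worked example (the numbers are made up): suppose a search returns 8 documents, of which 6 are relevant, while the database holds 20 relevant documents in total:

# Precision and recall for a single (invented) search.
relevant_retrieved = 6    # relevant documents retrieved
total_retrieved = 8       # total documents returned
total_relevant = 20       # relevant documents in the whole database

precision = relevant_retrieved / total_retrieved   # 6/8 = 0.75
recall = relevant_retrieved / total_relevant       # 6/20 = 0.30
print(f"precision = {precision:.2f}, recall = {recall:.2f}")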

The practical lab session allowed us to explore information retrieval by using two internet search engines, Google and Bing, to search for a variety of information through search queries, then calculating the precision and recall of each engine. Because we are already well versed in searching the internet, and because I already use advanced search models such as boolean connectors, I was able to find relevant results efficiently. The session as a whole, however, reinforced the need for well-structured indexes and precise search models if we are to retrieve information that is relevant to our needs at the time we need to access it.

Saturday, 15 October 2011

DITA - Understanding Blog No. 2B: HTML and the Internet (Practical)

The practical lab exercise essentially asked us to explore HTML (hypertext mark-up language) and create some documents that we would be able to publish on the web through the University's webspace (too kind City, too kind!).

HTML, like any language, needs to be a clearly defined set of instructions in terms which must be followed and understood by the end user. The document is the mouthpiece of the creator (here, for example, our instructions are set out as ASCII text in a simple WordPad document), and the listener is the World Wide Web (it reads the HTML code from the WordPad document, then translates and reproduces the "ideas" in a visual form which it publishes on the designated medium, i.e. as a webpage on the internet). It must be universal in application, otherwise there would be inconsistencies and misunderstandings in the content, structure and meaning of the information which we wish to communicate. It is therefore crucial that we understand how to communicate fluently in HTML, otherwise the information we wish to share will become "lost in translation".

The 'instructions' of HTML are known as tags. Examples are <p> for paragraph, which specifies that a new paragraph is to be inserted; <hr> for horizontal rule, which specifies that a horizontal line is inserted at that place in the document (presumably to act as a divider); and <ol type=""><li></li></ol> for an ordered list, which specifies that you are making a list of items which are to run in a specific order (i.e. numbered or lettered).
If you've ever posted on an internet forum, you might already have a flavour of what the basic tags are and how to use them (I am an absolute stickler for making things <b>bold</b>, <u>underlined</u> and using lots of pretty colours to grab your attention when reading this). The essence of tags is that they must consist of clear instructions, which fundamentally tell the browser where the requested formatting of the ASCII text is to start and where it is to stop on the webpage. A start tag is the instruction in angle brackets, <p>; the end tag is the same instruction preceded by a forward slash, again in angle brackets, </p>. Tags work in pairs; if you only have one, the solo tag will be read as ASCII text only.

Soooo, with the basics in place, we can now confidently write a basic webpage in HTML. The example used in the lecture being:

A Simple HTML Page With Hyperlink
<HTML>
  <HEAD>
    <TITLE>A Simple HTML Page</TITLE>
  </HEAD>
  <BODY>
    A web page using HTML to produce
    a hyperlink to
    <a href="http://www.city.ac.uk/">
    City University</a>.
  </BODY>
</HTML>

The HTML page opens with a <HTML> start tag and closes with a </HTML> stop tag. This tells the receiver that we are writing HTML code describing what we want to appear on our page. Every webpage has a HEAD, and a TITLE is contained within that. The BODY is the content that appears in the main browser window, which can include ASCII text, images and hyperlinks.

By creating more HTML webpages, you can effectively create a website by linking them together.

Here is my self-made webpage, as published on the City webspace! Liam's webpage
(note how basic it is ... I have included a few links to other webpages, an ordered and unordered list. I did create subsequent pages and an index page to link them all, but clearly I forgot to publish them. D'oh!)

Cascading style sheets (CSS) can additionally be applied to the HTML document, instructing the browser that renders it as a webpage to apply different stylistic qualities: layout, font sizes, background colours etc.

So if we master the language, create some content and apply a little creativity (and remember to publish it!!!) ... we can all make our thoughts accessible through HTML and the internet!