Wednesday 4 January 2012

DITA coursework blog 2 - Web 2.0/3.0 and beyond

The URL of this coursework blog is http://library-liam.blogspot.com/2012/01/dita-coursework-blog-2-web-2030-and.html

Title: Library 2.0 colliding with the semantic web: an iceberg effect?

1.  Introduction

The integration of web 2.0 technologies into online library portal interfaces is changing how library users access and interact with information sources. There is, however, a real danger that user input will create ‘information icebergs’: the volume of information expands rapidly, and a thin build-up of ‘searchable’, user-centric ontological metadata ‘freezes’ above the useful primary data, pushing it below the surface where it lurks unseen and inaccessible without deep and effective search retrieval. The library’s raison d’être of making information transparent and discoverable to all users is seemingly lost in the process. The iceberg effect the writer describes here is closely tied to the concept of web 3.0 (the semantic web), which threatens to sink the traditional notion of librarianship.

A comparative analysis of how Web 2.0 technologies have been incorporated into the user interfaces of academic, private sector and public library services, and of how semantic web strategies could apply to electronic library services in the future, will allow the writer to consider whether such technologies are truly compatible, or whether they ever will be.

2.  Library 2.0?

Library 2.0[1] appears to be the next logical evolution of library services in our increasingly digitised online society, and seemingly follows the same trajectory as the platform (the World Wide Web) that facilitated and now realises the concept. If Web 1.0 first allowed machine-readable data to reach the first online generation as digitally presented information, and Web 2.0 permitted the same, albeit maturing, users to read and write their own data in order to add to, edit or remove online information, then it is predicted that the semantic web (Web 3.0) will allow future generations of internet users to read, write and execute information through the provision of user-created ontological metadata, bolstering the artificial intelligence of computer servers in serving our fine-tuned information needs.

Library 2.0 “simply means making your library’s space (virtual and physical) more interactive, collaborative, and driven by community needs.”[2] Libraries allow access to information collected in their materials, traditionally through read-only catalogues, and now provide online portals that add value to the same information by encouraging user participation in and feedback on those resources through integrated web 2.0 technologies. Brophy (2007) argues that the ‘long tail’[3] effect, in which more and more users join a service provided over the same network and thereby add value for each user, is how the idea of web 2.0 marries with digital libraries in the creation of ‘Library 2.0’.

3.  Information needs

One function of web 2.0 technologies is to allow for the personalisation or customisation of information. This is dictated by the information needs of the user, and its appropriateness will depend on the nature of the library itself.[4]
 
Catalogues and databases are the entry portals for library users who have specific information needs. Web 2.0 can help satisfy those needs by directing users to search catalogues in different ways. Computers can analyse the frequency of search terms and relate them to items accessed on the catalogue. Every item is catalogued under a specific subject, and relational links are then made by the software. Tag clouds, as used on the City University library catalogue[5], show the popularity of interrelated search terms or subjects after an initial search is made: the larger words represent the most popular, and therefore seemingly most relevant, related searches. This collaborative employment of web 2.0 technologies and user input creates a knowledge base that is realised and actioned through social web interaction, with those needs written into the software as metadata, and seemingly is the way forward. Truly, “knowledge – its content and organization – is becoming a social act”.[6]
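To illustrate the mechanics at their simplest, the sketch below (the writer's own illustration, not City University's actual software) counts how often related search terms appear in a hypothetical search log and scales a display size for each, which is essentially all a tag cloud does:

    # Minimal tag cloud sketch: term popularity drives font size.
    from collections import Counter

    # Hypothetical log of search terms related to one subject heading.
    search_log = ["copyright", "copyright", "patents", "copyright", "trade marks"]

    term_counts = Counter(search_log)
    max_count = max(term_counts.values())

    for term, count in term_counts.most_common():
        # More popular terms are rendered larger in the cloud.
        font_size = 10 + int(14 * count / max_count)
        print(f"{term}: {count} searches -> font size {font_size}px")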

4.  Digital representation and organisation

Libraries have traditionally maintained a role in selecting, indexing and making information resources available to their users: the form in which those resources are now represented and organised (digital data) has evolved with technological innovations, but is limited by new constraints.

Modern libraries, hosting OPACs (online public access catalogues), now adhere to MARC formats when cataloguing, which are “standards for the representation and communication of bibliographic and related information in machine-readable form.”[7] MARC signified the introduction of library catalogue records as binary code, which allows for the composition of metadata through XML schemas particular to that bibliographic information. This in turn allows the data to be implemented and integrated through web-hosted services such as application programming interfaces (APIs) and mash-ups[8], and consequently leads into user-manipulation activities such as personalisation and customisation.
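As a rough sketch of what such machine-readable bibliographic metadata looks like, the fragment below builds a simplified, MARC-inspired record in XML using Python's standard library. Field tags 100 (author) and 245 (title) follow MARC 21 conventions, but this is the writer's illustration rather than a complete MARCXML document:

    # Simplified, MARC-inspired bibliographic record (illustrative only).
    import xml.etree.ElementTree as ET

    record = ET.Element("record")
    author = ET.SubElement(record, "datafield", tag="100")
    ET.SubElement(author, "subfield", code="a").text = "Brophy, Peter"
    title = ET.SubElement(record, "datafield", tag="245")
    ET.SubElement(title, "subfield", code="a").text = "The library in the twenty-first century"

    # The serialised XML is what web services, APIs and mash-ups can pass around.
    print(ET.tostring(record, encoding="unicode"))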

An interface that distracts or deviates from its purpose (accessing information), because it makes navigation cumbersome or impracticable, or because it does not limit the amount of information[9] on the screen by editing the content (remedied by using markers such as hyperlinks or concertinaed drop-down menus), is an indication that the designer has not understood the relevant user information needs.

The mindset of time-pressed library ‘browsers’ also manifests itself in the online library environment: functional design of library service interfaces enables the optimal representation of information in digital form, although it must be clearly signposted and transparent in order to satisfy user needs efficiently.

5.  Real life ‘Library 2.0’ manifestations

The widespread adoption of ‘Library 2.0’ can be seen across all library sectors, presenting users with radical options for accessing information and services virtually.

In public libraries, there has been a slowly expanding realisation of web 2.0 technologies outside of the conventional online library OPACs. Pinfield et al (1998) identified the current model to be that of the “hybrid library”: a combination of a digital library operating within a traditional physical library, illustrated by community library services[10] now providing an e-library in which selected digital material can be loaned over the internet.[11]
 
In private sector libraries, such as law, Web 2.0 has been used as a tool for marketing and promoting library services to users.[12] Law is a subject which is constantly evolving, and information sources very quickly become out of date. The Inner Temple Library hosts a Current Awareness blog[13] which provides “up-to-date information on new case law, changes in legislation and legal news” in the form of hyperlinks to online information resources. This information is also pushed through to users signed up to the Twitter feed[14] or Facebook page[15], and the blog is further supplemented by applications such as buttons to subscribe to RSS feeds, mailing lists and hyperlinks to the organisation’s online presence on other websites. Podcasts[16], instant text messaging and online chat enquiries[17] are other examples of how web 2.0 is being integrated into libraries; however, Brophy (2007) notes that there is still a long way to go, and in particular that “most library applications do not yet support conversational use and data sharing, including the support of individual and group activity” (at p.143).

6.  Semantic library portals

The implementation of web 3.0 as semantic technologies for use in libraries has not yet been fully realised, but arguably this is not far off.

Chowdhury and Chowdhury (2003) support the use of semantic web technologies in the development of digital libraries, and argue that the seamless access to digital information resources “calls for a mechanism that permits analysis of the meaning of resources, which will facilitate computer processing of information for improved access” (at p.220).

Such a mechanism is arguably akin to a semantic information portal, as advocated by Reynolds, Shabajee and Cayzer (2004), which serves as “... a community information portal that exploits the semantic web standards to improve structure, extensibility, customization and sustainability” (at p.290).
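A hedged sketch of what underpins such a portal is given below: catalogue data expressed as RDF triples using the third-party rdflib package. The namespace and property names are the writer's own illustrative assumptions, not a published ontology:

    # Illustrative RDF triples for a semantic library portal (assumed vocabulary).
    from rdflib import Graph, Literal, Namespace, URIRef

    LIB = Namespace("http://example.org/library/")
    g = Graph()

    book = URIRef(LIB["item/12345"])
    g.add((book, LIB.title, Literal("Modern Information Retrieval")))
    g.add((book, LIB.subject, Literal("Information retrieval")))
    g.add((book, LIB.heldBy, URIRef(LIB["branch/main"])))

    # Serialising as Turtle gives machine-readable statements other portals can reuse.
    print(g.serialize(format="turtle"))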


The Inner Temple Library website, in providing links to a number of databases, blogs and other websites on one page, appears to be the base upon which to build this approach. Elements of web 2.0 interactivity, in which comments can be added and attached links shared through social media accounts, are currently missing, and their usefulness and purpose in that context remains questionable, as mixing work with pleasure on social networks is still a contentious area for most.

7.  Below the surface – hidden depths and dangers

Digital information representation and provision needs to be transparent, yet information searched for, retrieved and accessed over a dynamic platform such as the World Wide Web[18] becomes a murky prospect to accomplish successfully.

Baeza-Yates and Ribeiro-Neto (2011) identify the main challenges posed by the internet in terms of searching for information. These include an unbalanced distribution of data across limited bandwidth and unreliable network connections, harbouring inconsistent access due to changing file destinations and broken links; and a semantic redundancy of semi-structured data, where the same HTML pages are constantly replicated through searching. Added to this are the problems inherent in self-published information of questionable integrity appearing in different formats and languages.

All the above factors arguably affect the recall (as the amount of searchable information increases along with the number of data formats) and the precision (where the search results present data which is not of high integrity) of information retrieval results.

Baker (2008) identifies the hurdles to clear as the “need to develop shared and common standards, interoperability, and open access using the same concepts and terminologies and on an international scale” (at p.8).

Libraries in particular, as trusted information providers in their community, should seek the correct balance: maintaining their editorial control[19] to preserve data integrity, whilst listening to their user base in developing search strategies and using metadata to enrich their catalogue records, satisfying information needs by reducing search time and producing highly relevant results.

8.  Conclusion

To return to the ‘information iceberg’ analogy advocated in the introduction, the problems in providing discoverable information appear firmly connected to the widening of the ‘semantic gap’, which grows as more and more information is uploaded online without first being catalogued or indexed. An insurmountable iceberg of older, unclassified information sinks down under the surface as more and more user-generated data (and metadata) is populated online, while the authenticity and reliability of that data immediately comes into doubt because it lacks authority.

In addition, search engines will need to deal with all the content afforded by web 2.0 technologies through HTML pages which are dynamically generated and therefore inherently complex.[20] New technological advances therefore create new problems to overcome.

Van Dijk (1999) perhaps gives a stark warning as to the future of digital information caught up in Web 3.0, foreseeing the risk and consequences of over-reliance on information agents as the semantic solution:

“Systems get smarter, but their users might become stupider” (at p.185).

Whilst computers can adapt to user preferences, they cannot react to changing human values and emotions and cannot be completely pre-programmed. Over-reliance on information devices can isolate users from the real world, causing them to miss out on the interactions and opportunities one can only obtain from human contact. Traditional libraries and their human agents, in providing and supporting the resources necessary for user information searches face-to-face within defined real-world environments, avoid these problems instantly.

The information iceberg cannot be left to expand without human observation or we risk losing control, order and sight of the value of digital information.


References


[1] The term ‘Library 2.0’ was first introduced by Michael Casey in his blog entry titled ‘Librarians without Borders’ dated 26th September 2005. See: http://www.librarycrunch.com/2005/09/
[2] As defined by Sarah Houghton-John in her blog entry titled ‘Library 2.0: Michael Squared’ dated December 2005. See: http://librarianinblack.net/librarianinblack/2005/12/library_20_disc.html
[3] Based on ideas propounded by Chris Anderson in his blog entry titled ‘The Long Tail’ dated October 2004. See: http://web.archive.org/web/20041127085645/www.wired.com/wired/archive/12.10/tail.html
[4] Coles (1998) identified that the information needs of users between public and private libraries are not the same.
[5] See ‘refine by tag’ window appearing on the right hand side of the screen after a user search is made.
[6] Weinberger (2007) at p. 133
[8] A good example of a mobile mash-up application is Library Anywhere, used by City University Library, which provides access to library catalogues and user accounts via a smart phone screen: see http://libguides.city.ac.uk/content.php?pid=234596&sid=2157583
[9] Chu (2010) states the essential problem in information representation and retrieval to be “how to obtain the right information at the right time despite the existence of other variables in the [...] environment” (at p.18).
[10] The writer’s local example: Hertfordshire County Council – online library services: http://www.hertsdirect.org/services/libraries/online/
[12] Harvey (2003) at p. 37 notes that current awareness blogs can be used to remind users of [information] services they may have been previously unaware of, and also allows for innovation on the part of the blog writer in developing and improving those services.
[14] https://twitter.com/inner_temple - Twitter username: @Inner_Temple
[16] Such as that provided by the British Library, see: http://www.bl.uk/podcast
[17] For example, 24/7 live chat communication with a librarian is provided through the government supported People’s Network website: http://www.peoplesnetwork.gov.uk/
[18] Baeza-Yates and Ribeiro-Neto (2011) note this to be “chaotic and unstructured, providing information that may be of questionable accuracy, reliability, completeness or currency” (at p.685).
[19] See Weinberger (2007)
[20] Baeza-Yates and Ribeiro-Neto (2011) at p. 450.


Bibliography

Anderson, C. (2004) The Long Tail, Wired (blog), [online] available at: http://web.archive.org/web/20041127085645/www.wired.com/wired/archive/12.10/tail.html [accessed 20th December 2011]

Baeza-Yates, R., and Ribeiro-Neto, B. (2011) Modern information retrieval : the concepts and technology behind search. 2nd ed. London: Pearson Education.

Baker, D. (2008) From needles and haystacks to elephants and fleas: strategic information management in the information age, New Review of Academic Librarianship, 14: 1–16 [online] via LISTA, accessed 31st October 2011.

British Library website, Podcasts, [online] available at: http://www.bl.uk/podcast [accessed 20th December 2011]

Casey, M. (2005) Librarians Without Borders, [online] available at: http://www.librarycrunch.com/2005/09/ [accessed 20th December 2011]

Casey, M. and Savastinuk, L.C. (2006) Library 2.0: service for the next generation library, Library Journal, [online] available at: http://www.libraryjournal.com/article/CA6365200.html [accessed 20th December 2011]

Chowdhury, G.G and Chowdhury, S. (2003) Organizing information - from the shelf to the web. London: Facet Publishing.

Chu, H. (2010) Information representation and retrieval in the digital age. 2nd ed. Medford, New Jersey: Information Today, Inc.

City University Library website, [online] available at: http://www.city.ac.uk/library/ [accessed 16th December 2011]

City University Library – LibGuides – Mobile Devices webpage, [online] available at: http://libguides.city.ac.uk/content.php?pid=234596&sid=2157583 [accessed 31st December 2011]

Coles, C. (1998) Information seeking behaviour of public library users: use and non-use of electronic media. In: Wilson, T.D. and Allen, D.A., ed. 1999. Exploring the contexts of information behaviour. London: Taylor Graham Publishing, 321-329

Current Awareness from the Inner Temple Library website, [online] available at http://www.innertemplelibrary.com/ [accessed: 7th December 2011].

Harvey, T. (2003) The role of the legal information officer. Oxford: Chandos Publishing.

Hertfordshire County Council – Libraries website, [online], available at: http://www.hertsdirect.org/services/libraries/online/ [accessed 31st December 2011]

Herts e-library service website, [online], available at: http://herts.lib.overdrive.com/8F191FBA-0D95-4AA6-915E-691A653360D5/10/491/en/Default.htm [accessed 31st December 2011]

Houghton-John, S. (2005) Library 2.0 Discussion: Michael Squared, Librarianinblack (blog), [online] available at: http://librarianinblack.net/librarianinblack/2005/12/library_20_disc.html [accessed 31st December 2011]

Inner Temple Library Facebook page, [online] available at: http://www.facebook.com/innertemplelibrary [accessed 31st December 2011]

Inner Temple Library Twitter page, [online] available at: https://twitter.com/inner_temple [accessed 31st December 2011]

MARC STANDARDS: Library of Congress – Network Development and MARC Standards Office website, [online] available at: http://www.loc.gov/marc/ [accessed 31st December 2011]

People’s Network – online services from public libraries website, [online] available at: http://www.peoplesnetwork.gov.uk/ [accessed 31st December 2011]

Pinfield, S., Eaton, J., Edwards, C., Russell, R., Wissenberg, A., and Wynne, P. (1998) Realizing the hybrid library, D-Lib Information Magazine, October 1998, [online] available at: http://www.dlib.org/dlib/october98/10pinfield.html [accessed 19th December 2011].

Reynolds, D., Shabajee, P and Cayzer, S. (2004) Semantic information portals, [online] available at: http://www2004.org/proceedings/docs/2p290.pdf [accessed 7th December 2011].

Van Dijk, J. (1999) The network society : social aspects of media [translated by Leontine Spoorenberg]. London: Sage Publications.

Weinberger, D. (2007) Everything is miscellaneous: the power of the new digital disorder. New York: Times Books/Henry Holt and Company.

Tuesday 15 November 2011

DITA - Understanding Blog 7: Mobile Information

This topic is of huge interest to me, because sadly, I will confess now that ... I ... am ... a ... mobile information addict.

There, I said it! I'm not ashamed to make such an admission, but part of me wishes I could turn back time to a period of my earlier life where I was able to blissfully wander this earth without having access to all kinds of useful and useless information at my fingertips, wherever I was. Life was so much simpler back then!

Ask just about anyone these days to produce their phone on the spot, and you'll gather that most people will have a smart phone, which is essentially a small computer with internet connectivity, with, oh yeah! ... a phone built in. You can surf the WWW, check your emails, download music, movies and ringtones, and install all kinds of useful software applications which in some way aim to improve your life, or find ways to distract you from it!
Speaking from experience, I use this device more for the smart (capabilities) than the phone (function), although I carry it with me and use that latter function more as a security measure, which I value above my superficial need to check what all my friends are up to on Facebook.

The main issue faced by the (hardware) architects of these devices, and subsequently by the software programmers, is the hurdle of providing access on mobile devices to services and programmes that were initially designed as desktop applications for larger computers and laptops, which have larger memory, more processing power and bigger displays than their new, smaller relatives.

Mobile devices are also context sensitive (i.e. they know where they are), and the integration of GPS technology in them is now standard. This allows for interactivity through applications that utilise measurements of longitude and latitude to provide localised information relevant to the user on the move, the orientation of the device itself to translate the user's kinetic movement, and image recognition through the use of a built-in camera. They also utilise Bluetooth technology to communicate with other mobile devices by sharing small amounts of data.

An excellent example of a mobile application that I use which demonstrates all these capabilities is 'Nearest Tube' on the iPhone. It is an 'augmented reality' app which uses the camera to give you a view of your surroundings through the device's screen. Transposed on top of that real-time image is layered data indicating your position relative to that of the nearest tube station. By holding the phone and rotating the angle of the lens, the position of the fixed location markers changes to indicate whether you are moving closer to or further away from them. This is functional to the user as it acts as an interactive visual compass that they can follow to reach a destination, providing only the information they need to know in order to achieve that objective (i.e. their position, the position of the tube station, and the distance and direction between the two places).
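For a rough sense of the arithmetic behind such an app (my own sketch, with approximate, purely illustrative coordinates rather than anything taken from 'Nearest Tube' itself), finding the closest station from the phone's GPS position boils down to a great-circle distance calculation:

    # Nearest-station sketch using the haversine formula.
    from math import radians, sin, cos, asin, sqrt

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between two lat/long points in kilometres."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371 * asin(sqrt(a))

    # Approximate coordinates, for illustration only.
    stations = {
        "Angel": (51.5322, -0.1058),
        "Barbican": (51.5204, -0.0979),
        "Farringdon": (51.5203, -0.1053),
    }

    phone_position = (51.5279, -0.1025)  # assumed GPS fix for the user
    nearest = min(stations, key=lambda name: haversine_km(*phone_position, *stations[name]))
    print("Nearest station:", nearest)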


In order to facilitate the mobility of information from static workstations (desktop PCs) to 'on-the-go' devices (smart phones), a preliminary evaluation of user information needs has to take place. We need to assess what information is available to the users, determine what is the most valuable or desired information the user will want to know or access (the core data) and keep it, putting to one side the less valued information or disposing of it altogether. Finally, we take the core data and build functionality around it in order to display it to the user by the most effective means. All subsidiary information is concertinaed to make it less visible and prominent but still accessible should the user wish to use it.

This is where we can revisit the idea of APIs and mash-ups as a workable solution to this information-reducing conundrum. The idea is to extract only the core data and build an interface around it on the platform which optimises the visibility and prominence of that information, all the while being aware of the limits or constraints of that platform. For example, browser plug-ins and scripting such as Flash, Javascript or Shockwave, which embed interactive moving animations into web pages, consume computing resources and are therefore not suitable for, or compatible with, smaller processors. Access such a web page from your smart phone, for example, and an 'error/incompatibility' message appears in place of the plug-in. Access denied. Website owners, be it the press, business or even individuals without commercial intent, are wise to this, and rather than risk alienating a large market of mobile internet users who will visit their sites to read information, have developed mobile versions of their sites. The objective of such sites is to go back to basics in providing key information with no frills - a diluted version of the full site which places accessibility over content in terms of assessing the user's needs. Functionality is used as the tool to bridge the gap between the two, as in essence you can have the best of both worlds.

A good mobile site, for example, provides clear navigation to the heavily used functions of that site, such as news stories, images or maps, timetables or calculators which all provide some practical immediate use to the user. The data involved here should be anything considered so important that the user should not have to spend any time searching for its location. It should be prominent and grab the reader's attention within a matter of seconds. Anything that is exploratory (i.e. information which is supplementary in nature, requires more time to digest or is too large to condense) can be hidden away under a concertinaed menu, or in a smaller font, off to the side for example. This allows for access, but the user will have to specifically search for it if they want to access it. The underlying focus here is on conveying content using a minimalist design.

A good mobile application takes the data from a website and imports it for use on the mobile device platform, accessed through an interface designed to harness the power and limitations of that platform, in order to present the data in the most relevant and consumable form. The approach is to remove any white space or any unessential features of the website/information, and make the available (limited) screen space as functional as possible by filling it with large, clear buttons and fonts that present the data as information spoon-fed to the user. The use of virtual, context-sensitive keyboards which understand what type of data input is required in order to access or manipulate the information is an intuitive step forward (such as months and years to select a date, figures for inputting a telephone number, or special symbols such as @ or [.com] when typing an email). Touch screen 'gestures', such as swiping to move between documents or pinching to zoom in and out, also aid navigation on a small screen by reducing the need to scroll along or down a page in order to access the pre-existing navigation functions of the site.

The practical exercise for this topic asked us to design a mobile application to support our learning in this subject. This is already available as a web resource (Moodle) which, admittedly, has been very well designed as a learning portal. The task therefore seems to ask how we could effectively make Moodle mobile ... a Moobile application :-)
The user is a student, and their information need is that they want to find out the basic amount of information to see them through a day at University. They want to know what lectures they have to attend, what subject or topic will be covered in the lecture, what reading to do in conjunction with the lecture, receive messages about their course that are relevant, such as changes to the timetable, coursework submission deadlines etc.
A mobile application would take into account all of these needs by presenting a clear, simple interface that provides the user with localised information on that particular day: an upcoming events box with three events, be they lectures or social club activities that they have booked onto, that updates over time and is therefore fresh and dynamic. A window onto the Moodle subject area should also be prominent, which is context and time sensitive, presenting a link to read the next lecture notes before and during its allotted time. There would also be a portal displaying the last 5 emails received on their University account, and a separate portal linked into their library account showing the number of books they have taken out, when the next loan is due to be returned and when loan requests are available to collect, together with any fines that have been incurred. All other user account information is concertinaed under drop-down menus at the bottom of the screen, linking to the full website version of the relevant web pages.
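Sketching that day-view as data makes the 'core data' idea concrete. The structure below is entirely invented for illustration (the field names and values are not part of any real Moodle or library API); the app would fetch something like it and render each block as a panel:

    # Hypothetical condensed 'core data' for a student's day view.
    import json

    dashboard = {
        "upcoming_events": [
            {"time": "10:00", "title": "DITA lecture: Web services and APIs"},
            {"time": "14:00", "title": "Library skills workshop"},
            {"time": "18:00", "title": "Student society social"},
        ],
        "next_lecture_notes": "/mod/resource/view.php?id=4521",  # placeholder link
        "unread_emails": 5,
        "library_account": {"items_on_loan": 3, "next_due": "2011-11-22", "fines": 0.0},
    }

    # The app would receive this as JSON and lay each block out as a panel.
    print(json.dumps(dashboard, indent=2))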

These are just a few ideas, but ones that this user will happily consume on the move. I just need to remember the necessity to switch-off the desire to access information in my pocket when there is not a social or immediate need for it!

Thursday 10 November 2011

The best mash-up I've ever heard.

Following on from my lengthy DITA write-up earlier in the week, I forgot to mention my favourite mash-up song!

Here it is. A collision of 39 songs!

Enjoy:

Tuesday 8 November 2011

DITA - Understanding Blog No. 5 & 6 - Web 2.0, Web Services and APIs

I have decided to consolidate my learning on the first two topics of the second half of the DITA module, as they seem to interlink quite nicely. Explaining how the Web 2.0 technologies work, then describing the methods by which they provide information as a service to users via the internet, and finally looking at the interfaces (APIs) created to mask the internal complexities of the client systems underneath to make the information personalized to the user will be my main focus.

It's only human nature to be nosy. We spend our waking hours actively locating and investigating information about the world, other people and sometimes about ourselves! The most accessible portal for doing this is undeniably now the internet, a powerhouse of interconnected data networks, sprinkled with services and applications that essentially hand us the information we seek, if we know how or where to find it.

Web 2.0 was the term coined in 2005 to describe the emergence of ICTs being used to provide online services that allow the user to collaborate with other users across the same network. Traditionally the internet has been characterised by the client-server model, by which requests for data are made, received and sent back with a definite start (the client request made to the server) and a definite end (the server response received by the client). Web 2.0 effectively turns this on its head: the clients, no longer content to sit and wait to receive another's generated answers, are now empowered by technology to pro-actively create and send their own data, rapidly and at will. The clients become the servers; the internet becomes the system platform; the server computers become the access point rather than the facilitator.

The online world truly becomes a global social network.

Web 2.0 applications have consumed our daily lives: they are addictive, often gluttonous in terms of data access, rapidly evolving and rapidly updating according to our needs and whims. We now have more social places to 'hang out' online, either by killing our own time (YouTube, Wikipedia) or by using our time to informally engage with others (Twitter, Facebook). Proximity and time are never an issue when we can access these places on the go using mobile devices.

All these Web 2.0 applications feel inclusive as they give us the choice as to whether we engage or spectate - create our own new data to cast off, or swim in the same sea of other people's data. The choice is ours because it is now in our hands - technologies have become cheaper and quicker to produce and maintain, which enables us to post updates, share photos and write our own website without any technical knowledge or skill required on our part. This creates a rich user experience without any of the stresses involved in understanding how to make it work. It is open and available to all, although again the choice as to whether we involve ourselves is subjective, determined by our own ethical, moral and political sensibilities.

The Web 2.0 applications (such as the very blog I am typing) are all examples of web services. They are in essence computer software applications that have been installed 'on the internet', as opposed to the local hard drive contained in your laptop/PC. In a similar vein, the data created through a new blog entry, a tweet or a Facebook status update isn't saved or stored on your PC; it floats in limbo somewhere on the internet until the point at which we ask to access it. We can access this data from any location that has internet connectivity. Cloud computing appears to be the next big thing, with Google (Google Docs) and Apple (iCloud) offering cloud services to their users.

In his lecture notes, Richard Butterworth sets out a concise definition for web services, by distinguishing it from a web page:


A web page is a way of transferring information over the internet which is primarily aimed to be read by humans
A web service, in contrast, is a way of transferring information over the internet which is primarily aimed to be read by machines.

So in essence, a web service uses web technology to pass information around the internet that is readable by machines. It is in the form of a 'language' that computers read and process in accordance with the metatags that are assigned to the data therein. The information pushed around is content only: there are no structure or presentation instructions included. Computers do not know or understand the meaning behind text: they can't distinguish between different parts of data as text unless there is some explicit instruction in the programming code they receive that 'labels' the text accordingly as having some different meaning. Computers don't know the difference between the titles and authors of some work: we as humans do though!

Web services are not intended or expected to be the user end point. They are the means by which we send machine-readable data to client PCs, which then reprocess it and make it more appropriate and accessible to the user.

The programming code for a web service is XML (eXtensible Markup Language). It provides, as a set of machine-readable instructions, the core data marked up with metadata (via metatags) to clearly give it a value or meaning ("name", "price", "location" etc.) that can be interpreted by a number of other machine systems, which then display the data in the correct context, albeit within different parameters.
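As a minimal sketch of that idea (the element names below are my own assumptions, not any real service's schema), a service can send content-only XML and leave each client to decide how to present it:

    # Content-only XML from a hypothetical service, rendered by the client.
    import xml.etree.ElementTree as ET

    payload = """
    <listing>
      <item>
        <name>Second-hand law reports</name>
        <price currency="GBP">45.00</price>
        <location>London</location>
      </item>
    </listing>
    """

    root = ET.fromstring(payload)
    for item in root.iter("item"):
        name = item.findtext("name")
        price = item.findtext("price")
        # Presentation is decided here, on the client, not in the data itself.
        print(f"{name} - £{price}")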

A good example of this would be Facebook. The positioning and level of information that are visible to the user when logged in through a computer terminal will be different (fuller, due to the optimisation of space and function provided by internet browsers and plugins) than for the same page accessed through a different machine (a tablet, or smart phone for example).

XML allows us to manipulate data and to describe it in the form of our choice. Facebook understand they can't replicate the exact same layout on a web browser and on, say, an iPhone, so they create a new interface (an app) for the platform they wish to deliver their service to, enabling the same data in the XML code to be reproduced in the most efficient way on that platform.

This is an example of an API (Application Programming Interface). Think of the analogy of a car: you don't need to know what's under the bonnet of your car in order to drive it!

An API allows programmers to build an external shell (such as a mobile phone application), compatible with the XML code, without being concerned with how the complicated internal workings of the system underneath actually work. Programmers build upon the functionality of existing web services by creating add-ons that slot into the DNA of the service and allow users to interact in innovative or progressive ways. Examples of APIs are widgets that you can write into HTML code, effectively placing a portal to another part or service of the internet: a Twitter feed box that updates with your tweets as you send them, a button under a news story allowing you to 'like' that story and publish it on your Facebook profile, or a Google map box which reproduces a section of map and marks your business/office location to enable a website visitor to find you. I have just described examples of some of the combinations of web services with APIs which allow for interesting mash-ups to be created in the online community. Advanced programming languages such as Javascript in your web browser allow for this level of web service manipulation. As part of the practical lab exercise, I set up a page and included some APIs in the HTML code. Click here to see some of the examples explained above!
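The pattern behind widgets like those is roughly sketched below: the page asks a web service for recent items and drops them into its own markup. The URL and the 'text' field are placeholders of my own, not a real endpoint or schema:

    # Sketch of a feed widget: fetch machine-readable data, emit a small HTML fragment.
    import json
    import urllib.request

    def fetch_widget_html(feed_url, limit=3):
        """Fetch a JSON feed of recent posts and return an HTML list fragment."""
        with urllib.request.urlopen(feed_url) as response:
            posts = json.load(response)
        # The host page embeds this fragment wherever it wants the widget to appear.
        return "<ul>" + "".join(f"<li>{post['text']}</li>" for post in posts[:limit]) + "</ul>"

    # Example (placeholder endpoint):
    # print(fetch_widget_html("https://example.org/api/recent-posts.json"))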

The same old dangers seemingly lurk under the surface, however: the amount of information going online needs moderation and control, and the permanence and integrity of data are compromised. How data is stored, accessed and retrieved, and the reasons behind these activities, are highly contentious, controversial and potentially damaging. How we classify, order and regulate the information we create, by creating metadata such as tag clouds and folksonomies, is loose and imprecise if there are no existing guidelines to follow, and leads to misinterpretations and cyber squabbles over use in context if we don't agree on it. Web 2.0 threatens to engulf our lives and identities if we allow such technologies to define us as a society.

Final thought: the real danger appears to be that we don't know the extent of how much of our personal data is held on the internet. We may never get to see it all ... we only see whatever they want us to see!

Thursday 27 October 2011

DITA coursework blog - Web 1.0 (the internet and WWW, databases and information retrieval)


Title: Language and access in Digital Information Technologies and Architecture, with a focus on law libraries

1. Introduction

An underlying principle of digital information is that it is data which must be written in a specific language so that it can be stored in sources, communicated by systems and retrieved by users. Once this is achieved, access to data must be managed using appropriate technologies. I will consider this statement in the context of modern law libraries to assess the present and future impact on the provision of digital resources to their users.

2. Evaluating

Digital technologies must take into account the information needs of library users, who in today’s digital age, most commonly seek information from online subscription databases and web resources. Sources of information in law libraries are typically law reports, journal articles or legislation: predominantly accessed as either printed or digital text based information. The latter must be in a specified format in order to be read: it is data attributed a form capable of precise meaning through logical coding and sequencing – in essence a ‘language’. 

Computers are system linguists which communicate data over connected networks (the internet) via a service (the World Wide Web). Computers read and interpret data in binary form: groups of bits are assigned characters and form words as ASCII text; collected together, they create files which make up documents, such as database records or web pages. Human users are only able to subjectively evaluate text for meaning and relevance in a form they understand. Computers do not understand “human” language, and so evaluate the language within the data: metadata. Hypertext is a language used to inter-link data in one document, or link data between documents. Web pages are written in Hypertext Mark-up Language (HTML) so the data can be read by internet browsers, which interpret metatags (ordered ASCII text relaying strict instructions on layout and structure) as distinct from standard ASCII text.
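A tiny, illustrative sketch of that layering (the byte values are my own example): the same sequence of bits only becomes a 'title' once software interprets it as ASCII text, and a browser goes one step further by treating certain character sequences as markup rather than content:

    # The same bytes, read first as numbers, then as ASCII text, then as HTML markup.
    raw_bytes = bytes([60, 116, 105, 116, 108, 101, 62, 76, 97, 119, 60, 47, 116, 105, 116, 108, 101, 62])
    text = raw_bytes.decode("ascii")
    print(text)  # <title>Law</title> - a browser reads the tags as structure, a human reads "Law"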

The advent of e-books has seen a shift towards digital readership, where books translated into ASCII text can enjoy wider distribution to library users over the internet. This indicates the future of how libraries will provide materials to their users; but issues of cost, reliability and user misgivings on rapid technological advancement still impact on access.

3. Managing

Managing data at core is concerned with providing users with access points. There are two sources of digital information available to library users: internal (databases) and external (the internet). 

Databases organise and order available data in accordance with the user’s information needs, a primary example being an OPAC catalogue of a library’s holdings. Language is the control. Structured Query Language (SQL) commands relational databases to perform queries to retrieve selective data from a number of interrelated data tables. 
Databases permit searches by two methods: natural language and controlled vocabularies. If the natural language search terms are not clear, or irrelevant search results are returned, the user may deploy query modification to adjust the language used and yield better results. Controlled vocabularies such as indexes and thesauri may signpost users in context to data that may or may not be relevant. We should expect more relevant search results from a database than from, say, an internet search engine, provided that the data is there to be retrieved.
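A minimal sketch of the kind of query an OPAC might run against a relational catalogue is given below; the table and column names are my own assumptions for illustration:

    # Controlled-vocabulary search over an in-memory relational catalogue.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, subject TEXT)")
    conn.execute("INSERT INTO items (title, subject) VALUES ('Chitty on Contracts', 'Contract law')")
    conn.execute("INSERT INTO items (title, subject) VALUES ('Archbold', 'Criminal law')")

    # Retrieve only items indexed under a chosen subject heading.
    for row in conn.execute("SELECT title FROM items WHERE subject = ?", ("Contract law",)):
        print(row[0])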

Libraries can combine access to both databases and the web concurrently to permit wider scope for information retrieval. Brophy (2007, p.113-4) sees an importance of use behind the access and retrieval process, thus directly linking users to resources. He also implies that use involves the creation of “information objects of various kinds”. A library portal, such as that created by the Inner Temple Library[1], is a good example of this – it is an online access point to a number of databases, together with hyperlinks to web resources including a subject index and current awareness blog. Maloney and Bracke (2005, p.87) emphasise that this “is not a single technology. Rather it is a combination of several systems, standards and protocols that inter-operate to create a unified experience for the user”. This means of federated searching[2] is emerging as a possible solution to remove the complexities of cross-searching multiple databases.

Information retrieval over the web is a double-edged sword: on one hand there is a wealth of dedicated resources available online; on the other, an inexpert user will only ever retrieve a small percentage of relevant data due to the “invisible web”[3]: a detrimental consequence of a global resource that is dynamically evolving, but where authenticity and permanence are compromised as more and more information goes online. Limb (2004, p.60) believes this could be combated by building federated repositories to harvest a wealth of relevant cyber resources, but the task may appear onerous and unmanageable.

4. Conclusion

The communication chain between users, systems and sources is dependent on the efficient and concise use of language in order to access and retrieve data. A break in the chain, such as incomplete HTML code or a broken hyperlink, can shut down access to information, leaving the information seeker locked out. The architects of computer systems dictate the choice and methods by which data is represented, but as non-subject specialists they may not realise that the information to which they give access does not fulfil the user’s needs. A compromise perhaps should be reached.[4]

Recent developments such as cloud sourcing[5] look set to change how society stores and accesses digital information, in that information users can retrieve documents via the internet without prior knowledge of where the source document is physically rooted. It appears cloud sourcing makes the service the source.[6]

I cannot see how law libraries could happily subscribe to these developments: information retrieval is too deeply rooted in specialist knowledge and language, coupled with the need for reasonable proximity between the user and their sources. As technologies enable information to become cheaper to produce and maintain, it is more eagerly consumed by non-experts who lack the skill and knowledge to access and evaluate relevant information.

The legal information professional, acting as the bridge between users, systems and sources, therefore remains crucial to the information access and retrieval processes.

Bibliography

Brophy, P. (2007). The library in the twenty-first century. 2nd ed. London: Facet Publishing.

The Inner Temple Library Catalogue: http://www.innertemplelibrary.org/external.html (accessed: 25th October 2011).

Maloney, K. & Bracke, P.J. (2005). Library portal technologies. In: Michalak, S.C., ed. 2005. Portals and libraries. New York: The Haworth Information Press. Ch.6.

Limb, P. (2004). Digital Dilemmas and Solutions. Oxford: Chandos Publishing.

Pedley, P. (2001). The invisible web: searching the hidden parts of the internet. London: Aslib-IMI.

Harvey, T. (2003). The role of the legal information officer. Oxford: Chandos Publishing.

Géczy, P., Izumi, N. and Hasida, K. (2012). Cloudsourcing: managing cloud adoption. Global Journal of Business Research, 6(2), 57-71. (accessed: EBSCOhost - 25th October 2011.)

References


[1] The Inner Temple Library Catalogue: http://www.innertemplelibrary.org/external.html (accessed: 25th October 2011)
[2] See Limb (2004, p.59).
[3] For further discussion, see: Pedley (2001) The Invisible Web: Searching the hidden parts of the internet. London: Aslib-IMI.
[4] See Harvey (2003, p.143-6) for a persuasive discussion on the ‘librarian vs lawyer’ in terms of information retrieval within the legal profession.
[5] For detailed discussion of the concerns and benefits of cloud sourcing, see Géczy, Izumi and Hasida (2012) in Global Journal of Business Research, 6(2), 57-71.
[6] i.e. the internet becomes the storage and service provider of digital documents, which are no longer anchored to a physical location.

Tuesday 18 October 2011

DITA - Understanding blog No. 4: Information Retrieval

After last week's session on retrieving structured data from a database management system, this week's task of retrieving unstructured data from the wide expanse of the internet seems a staggeringly insurmountable one on paper. But is it really? I argue not. We do this kind of thing on a daily basis and we don't really give it much thought. The next time you want to use Google to search for tickets for an upcoming gig or theatre show, think carefully about what you are actually doing ... retrieving specific information from a whole mass of information deposited on the net. It has some order (websites, web pages), but we don't know exactly where we are going to find what we want, or even whether we will find anything relevant at all.

Information retrieval has three definitions depending on your viewpoint as either a user, a system or a source. A user typically has inadequate knowledge of the subject they are searching for, and hence seeks to retrieve information through a search request to enlighten them. A system stores information, processes it and makes it available for retrieval through software and hardware: it is the technology that allows the user to search how they want to. A source is the document that contains the information we wish to retrieve; it has an intended purpose and audience. Information is a valuable commodity which is ripe for exploitation: it can be bought and sold as a service.

Information retrieval on the internet occurs whenever we make a web search (we want to find some information online). Broder (2000) conceived a taxonomy for web searching by looking at the different types of query we make:
  • Navigational queries (e.g. finding the home page for a company when you don't know the precise URL)
  • Transactional queries (e.g. a mediated activity, such as purchasing a box of fudge)
  • Informational queries (e.g. finding information on a particular subject, such as what is available or how to do something)
All the above queries are text-based (i.e. we are seeking a written record of the information). The web is home to a selection of different non-textual media, such as images and videos, and therefore the scope of our searching can be expanded to the following categories:
  • Known-item retrieval i.e. the user knows the exact item necessary to satisfy their informational need (e.g. a particular movie or video hosted online)
  • Fact retrieval i.e. the user knows what they want but does not have the precise information in order to fulfil their need (e.g. which actor played a certain part in a particular movie)
  • Subject retrieval i.e. the user is looking for a subject, which is not precisely defined (e.g. the most memorable deaths in horror films)
  • Exploratory retrieval i.e. checking out what data is available for a provided selection (e.g. searching for classical music on iTunes)
Before information can be searched, it needs to be in a specific format in order to be retrieved (e.g. HTML, XML, MPEG). Media needs to be processed in the correct way before it can be indexed correctly. To assist the indexing process, a number of steps should be followed for the text descriptors of the media to be retrieved:

  1. identify the fields you wish to make searchable in the index (e.g. the most memorable parts of the document which are typically searched for, such as title, author, year etc. This allows for highly accurate, focused searching to be carried out)
  2. identify words that will act as keywords for a search procedure, which will be those terms or phrases that are likely to be searched for by the user. A consideration of whether digits and non A-Z characters will be included or excluded needs to be undertaken. Keeping the keywords in lowercase will yield more accurate search results.
  3. remove stop words such as and, the.
  4. stem words, by cutting off the suffix to allow for wider searching of a concept or term e.g. act! would bring up results for acting, actors, actions etc.
  5. define synonyms, i.e. different words that have the same meaning.
Once the information has been prepared for indexing, it needs to be formatted into a structure. This can be in the form of a surrogate record (i.e. a record within the database which acts as a 'list of records' for all the information contained in the database that you are interested in) or an inverted file (i.e. we look at words to find documents, rather than the other way around ... looking from the inside out!)
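A toy sketch of those indexing steps and the resulting inverted file is given below (my own illustration; the stop-word list and suffix-stripping 'stemmer' are deliberately crude):

    # Build a tiny inverted index: lowercase, drop stop words, crudely stem, map term -> documents.
    STOP_WORDS = {"and", "the", "of", "a", "in"}
    SUFFIXES = ("ing", "ors", "ions", "s")

    def stem(word):
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    documents = {
        1: "Acting and actors in the theatre",
        2: "Actions of the courts",
    }

    inverted_index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            term = stem(word)
            if term in STOP_WORDS:
                continue
            inverted_index.setdefault(term, set()).add(doc_id)

    print(inverted_index)  # e.g. 'act' points to both documents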

Index structure in place ...  we can now search! Search models for information retrieval include boolean connectors (AND, OR, NOT), proximity searching (within same sentence, paragraph or phrase; word adjacency), best match results generated through ranking systems built into search engines such as Google, and simply browsing the internet (bypasses any indexes in place).
Should the preliminary search fail, we can then try a manual query modification (by adding or removing terms from the initial search query) or try an automatic query modification such as a 'show me more on this topic' which is provided for you by the search engine.

Once you have conducted a search, how do you determine how relevant the results are? You need to evaluate it.

It can be done qualitatively, from a user viewpoint (was the user satisfied with the search results?) or a source's viewpoint (how much should the user be charged for search services providing relevant results?).

It can be done quantitatively from a systems viewpoint, by which we can evaluate the retrieval effectiveness and efficiency by calculating precision and recall respectively:

Precision = the proportion of retrieved documents that are relevant
   = relevant documents retrieved / total documents retrieved

Recall = the proportion of relevant documents that are retrieved
   = relevant documents retrieved / total number of relevant documents in the database
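A quick sketch of the two measures applied to a single, made-up query (the document identifiers are invented):

    # Precision and recall for one query.
    retrieved = {"doc1", "doc2", "doc3", "doc4"}   # what the search engine returned
    relevant = {"doc2", "doc4", "doc7"}            # what actually satisfies the information need

    relevant_retrieved = retrieved & relevant
    precision = len(relevant_retrieved) / len(retrieved)   # 2/4 = 0.50
    recall = len(relevant_retrieved) / len(relevant)        # 2/3 ≈ 0.67
    print(f"precision={precision:.2f}, recall={recall:.2f}")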

The practical lab session allowed us to explore information retrieval by using two internet search engines, Google and Bing, to search for a variety of information by making search queries, then calculating the precision and recall of each engine. Because we are already well versed in searching the internet, and because I already use advanced search models such as boolean connectors for online searching, I was able to find relevant results efficiently. The session as a whole, however, reinforced the need for well-structured indexes and precise searching models to be in place if we are to retrieve information that is relevant to our needs at the time we need to access it.