Tuesday 15 November 2011

DITA - Understanding Blog No. 7: Mobile Information

This topic is of huge interest to me, because sadly, I will confess now that ... I ... am ... a ... mobile information addict.

There, I said it! I'm not ashamed to make such an admission, but part of me wishes I could turn back time to a period of my earlier life when I was able to blissfully wander this earth without having access to all kinds of useful and useless information at my fingertips, wherever I was. Life was so much simpler back then!

Ask just about anyone these days to produce their phone on the spot, and you'll find that most people have a smartphone, which is essentially a small computer with internet connectivity, with, oh yeah! ... a phone built in. You can surf the WWW, check your emails, download music, movies and ringtones, and install all kinds of useful software applications which in some way aim to improve your life, or find ways to distract you from it!
Speaking from experience, I use this device more for the smart (capabilities) than the phone (function), although I carry it with me and value that latter function more as a security measure, one which I rank above my superficial need to check what all my friends are up to on Facebook.

The main issue faced by the (hardware) architects of these devices, and subsequently by the software programmers, is the hurdle of providing access to services and programs on mobile devices that were initially designed as desktop applications for larger computers and laptops, which have larger memory, more processing power and bigger displays than their new, smaller relatives.

Mobile devices are also context sensitive (i.e. they know where they are), and the integration of GPS technology in them is now standard. This allows for interactivity through applications that use measurements of longitude and latitude to provide localised information relevant to the user on the move, sensors that translate the user's physical movement of the device into input, and image recognition through the built-in camera. They also use Bluetooth technology to communicate with other mobile devices by sharing small amounts of data.

An excellent example of a mobile application that I use which demonstrates all these capabilities is 'Nearest Tube' on the iPhone. It is an 'augmented reality' app which uses the camera to give you a view of your surroundings through the device's screen. Transposed on top of that real-time image is layered data indicating your position relative to that of the nearest tube station. By holding the phone and rotating the angle of the lens, the position of the fixed location markers changes to indicate whether you are moving closer to or further away from them. This is functional to the user as it acts as an interactive visual compass that they can follow to reach a destination, providing only the information they need to know in order to achieve that objective (i.e. their position, the position of the tube station, and the distance and direction between the two places).


In order to facilitate the mobility of information from static workstations (desktop PCs) to 'on-the-go' devices (smartphones), a preliminary evaluation of user information needs has to take place. We need to assess what information is available to the users, determine the most valuable or desired information the user will want to know or access (the core data) and keep it, putting the less valued information to one side or disposing of it altogether. Finally we take the core data and build functionality around it in order to display it to the user by the most effective means. All subsidiary information is concertinaed to make it less visible and prominent, but still accessible should the user wish to use it.

This is where we can revisit the idea of APIs and mash-ups as a workable solution to this information-reducing conundrum. The idea is to extract only the core data and build an interface around it on the platform which optimises the visibility and prominence of that information, all the while being aware of the limits or constraints of that platform. For example, plug-ins such as Flash or Shockwave, and heavy JavaScript, which embed interactive moving animations into web pages, consume computing resources and are therefore not suitable for, or compatible with, smaller processors. Access such a web page from your smartphone, for example, and an 'error/incompatibility' message appears in place of the plug-in. Access denied. Website owners, be it the press, businesses or even individuals without commercial intent, are wise to this, and rather than risk alienating a large market of mobile internet users who will visit their sites to read information, have developed mobile versions of their sites. The objective of such sites is to go back to basics in providing key information with no frills - a diluted version of the full site which places accessibility over content in terms of assessing the user's needs. Functionality is used as the tool to bridge the gap between the two, as in essence you can have the best of both worlds.

A good mobile site, for example, provides clear navigation to the heavily used functions of that site, such as news stories, images or maps, timetables or calculators, which all provide some practical, immediate use to the user. The data involved here should be anything considered so important that the user should not have to spend any time searching for its location. It should be prominent and grab the reader's attention within a matter of seconds. Anything that is exploratory (i.e. information which is supplementary in nature, requires more time to digest or is too large to condense) can be hidden away under a concertinaed menu, or in a smaller font off to the side, for example. This still allows access, but the user will have to specifically search for it if they want it. The underlying focus here is on conveying content using a minimalist design.
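As a rough illustration of the 'concertina' idea, here is one way a page can tuck supplementary information away in plain HTML (the <details> tag is a newer addition to HTML that we haven't covered in the labs, and the content is made up, so treat this as a sketch rather than gospel):

<h2>Next departures</h2>
<p>18:42 to London King's Cross (on time)</p>
<details>
  <summary>Later trains and engineering works</summary>
  <p>19:12, 19:42, 20:12 ... (full timetable on the main site)</p>
</details>

The key detail stays visible; everything under the <summary> heading stays folded away until the user taps on it.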

A good mobile application takes the data from a website and imports it for use on the mobile device platform, accessed through an interface designed to harness the power and limitations of that platform, in order to present the data in the most relevant and consumable form. The approach is to remove any white space or unessential features of the website/information, and to make the available (limited) screen space as functional as possible by filling it with large, clear buttons and fonts that spoon-feed the data to the user as information. The use of virtual, context-sensitive keyboards which understand what type of data input is required in order to access or manipulate the information is an intuitive step forward (such as months and years to select a date, digits for inputting a telephone number, or special symbols such as @ and .com when typing an email address). Touch-screen 'gestures' such as swiping to move between documents, or pinching to zoom in and out, also aid navigation on a small screen by reducing the need to scroll along or down a page in order to reach the pre-existing navigation functions of the site.
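To give a flavour of how a page can hint at the right keyboard, here is a small HTML sketch (the input 'type' values are standard HTML form attributes; the field names are made up for illustration):

<form>
  <input type="email" name="contact_email">   <!-- keyboard with @ and .com to hand -->
  <input type="tel" name="phone_number">      <!-- numeric keypad -->
  <input type="date" name="booking_date">     <!-- date picker with months and years -->
</form>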

The practical exercise for this topic asked us to design a mobile application to support our learning in this subject. This is already available as a web resource (Moodle) which, admittedly, has been very well designed as a learning portal. The task therefore seems to ask how we could effectively make Moodle mobile ... a Moobile application :-)
The user is a student, and their information need is to find out the basic amount of information to see them through a day at University. They want to know what lectures they have to attend, what subject or topic will be covered in each lecture, what reading to do in conjunction with the lecture, and to receive relevant messages about their course, such as changes to the timetable, coursework submission deadlines etc.
A mobile application would take into account all of these needs by presenting a clear, simple interface that provides the user with localised information for that particular day: an upcoming events box with three events, be they lectures or social club activities that they have booked into, which updates over time and is therefore fresh and dynamic. A window onto the Moodle subject area should also be prominent, context and time sensitive, presenting a link to the next lecture's notes before and during its allotted time. Then a portal displaying the last five emails received on their University account, and a separate portal linked into their library account showing the number of books they have taken out, when the next loan is due to be returned, which loan requests are available to collect, and any fines that have been incurred. All other user account information is concertinaed under drop-down menus at the bottom of the screen, linking to the full website versions of the relevant web pages.

These are just a few ideas, but ones that this user will happily consume on the move. I just need to remember to switch off the desire to access the information in my pocket when there is no social or immediate need for it!

Thursday 10 November 2011

The best mash-up I've ever heard.

Following on from my lengthy DITA write-up earlier in the week, I forgot to mention my favourite mash-up song!

Here it is. A collision of 39 songs!

Enjoy:

Tuesday 8 November 2011

DITA - Understanding Blog No. 5 & 6 - Web 2.0, Web Services and APIs

I have decided to consolidate my learning on the first two topics of the second half of the DITA module, as they seem to interlink quite nicely. Explaining how Web 2.0 technologies work, then describing the methods by which they provide information as a service to users via the internet, and finally looking at the interfaces (APIs) created to mask the internal complexities of the systems underneath and make the information personalised to the user, will be my main focus.

It's only human nature to be nosy. We spend our waking hours actively locating and investigating information about the world, other people and sometimes ourselves! The most accessible portal for doing this is now, undeniably, the internet: a powerhouse of interconnected data networks, sprinkled with services and applications that essentially hand us the information we seek, if we know how or where to find it.

Web 2.0 was the term coined in 2005 to describe the emergence of ICTs being used to provide online services that allow the user to collaborate with other users across the same network. Traditionally the internet has been characterised by the client-server model, by which requests for data are made, received and sent back with a definite start (the client request made to the server) and a definite end (the server response received by the client). Web 2.0 effectively turns this on its head: the clients, no longer content to sit and wait to receive another's generated answers, are now empowered by technology to pro-actively create and send their own data, rapidly and at will. The clients become the servers; the internet becomes the system platform; the server computers become the access point rather than the facilitator.

The online world truly becomes a global social network.

Web 2.0 applications have consumed our daily lives: they are addictive, often gluttonous in terms of data access, rapidly evolving and rapidly updating according to our needs and whims. We now have more social places to 'hang out' online, either by killing our own time (YouTube, Wikipedia) or by using our time to informally engage with others (Twitter, Facebook). Proximity and time are never an issue when we can access these places on the go using mobile devices.

All these Web 2.0 applications feel inclusive as they give us the choice as to whether we engage or spectate - create our own new data to cast off, or swim in the sea of other people's data. The choice is ours because it is now in our hands - technologies have become cheaper and quicker to produce and maintain, which enables us to post updates, share photos and write our own website without any technical knowledge or skill required on our part. It creates a rich user experience without any of the stresses involved in understanding how to make it work. It is open and available to all, although again the choice as to whether we involve ourselves is subjective, determined by our own ethical, moral and political sensibilities.

Web 2.0 applications (such as the very blog I am typing) are all examples of web services. They are in essence computer software applications that have been installed 'on the internet', as opposed to on the local hard drive contained in your laptop/PC. In a similar vein, the data created through a new blog entry, a tweet or a Facebook status update isn't saved or stored on your PC; it floats in limbo somewhere on the internet until the point when we ask to access it. We can access this data from any location that has internet connectivity. Cloud computing appears to be the next big thing, with Google (Google Docs) and Apple (iCloud) offering cloud services to their users.

In his lecture notes, Richard Butterworth sets out a concise definition of a web service by distinguishing it from a web page:


A web page is a way of transferring information over the internet which is primarily aimed to be read by humans
A web service, in contrast, is a way of transferring information over the internet which is primarily aimed to be read by machines.

So in essence, a web service uses web technology to pass information around the internet in a form that is readable by machines. It is a 'language' that computers read and process in accordance with the metatags assigned to the data therein. The information pushed around is content only: there are no structure or presentation instructions included. Computers do not know or understand the meaning behind text: they can't distinguish between different parts of the data unless there is some explicit instruction in the code they receive that labels the text as having some different meaning. Computers don't know the difference between the titles and authors of a work: we as humans do, though!

Web services are not intended or expected to be the user end point. They are the means by which we send machine-readable data to client PCs, which then reprocess it and make it more appropriate and accessible to the user.

The language of a web service is XML (eXtensible Mark-up Language). It provides, as a set of machine-readable instructions, the core data marked up with metadata (via metatags) that clearly gives each piece a value or meaning ("name", "price", "location" etc.), which can be interpreted by a number of other machine systems, which then display the data in the correct context, albeit within different parameters.
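To illustrate (this is my own made-up record, not taken from a real web service), a scrap of XML marking up a product listing might look like this:

<product>
  <name>Running shoes</name>
  <price currency="GBP">64.99</price>
  <location>Stevenage store</location>
</product>

The machine reading this has no idea what a 'price' actually is, but the metatags tell it which bit of text to treat as the price, so it can display or process that value in the right context.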

A good example of this would be Facebook. The positioning and level of information visible to the user when logged in through a computer terminal will be different (fuller, due to the optimisation of space and function provided by internet browsers and plug-ins) from the same page accessed through a different machine (a tablet or smartphone, for example).

XML allows us to manipulate data and describe it in the form of our choice. Facebook understand they can't replicate exactly the same layout in a web browser and on, say, an iPhone, so they create a new interface (an app) for the platform they wish to deliver their service to, enabling the same data in the XML code to be reproduced in the most efficient way on that platform.

This is an example of an API (Application Programming Interface). Think of the analogy of a car: you don't need to know what's under the bonnet in order to drive it!

An API allows programmers to build an external shell (such as a mobile phone application), compatible with the XML code, without being concerned with how the complicated internal workings of the system underneath actually work. Programmers build upon the functionality of existing web services by creating add-ons that slot into the DNA of the service and allow users to interact in innovative or progressive ways. Examples are widgets that you can write into HTML code, which effectively place a portal to another part of, or service on, the internet: a Twitter feed box that updates with your tweets as you send them, a button under a news story allowing you to 'like' that story and publish it to your Facebook profile, or a Google map box which reproduces a section of map and marks your business/office location so that a website visitor can find you. I have just described examples of combinations of web services with APIs, which allow for interesting mash-ups to be created in the online community. Programming languages such as JavaScript, running in your web browser, allow for this level of web service manipulation. As part of the practical lab exercise, I set up a page and included some APIs in the HTML code. Click here to see some of the examples explained above!
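As a sketch of that 'portal in a page' idea (the URL and attribute names here are invented; each real widget comes with its own embed code), a widget is often just a small snippet that you paste into your HTML, for example:

<!-- a made-up embed: pulls a feed from another service into this page -->
<script src="https://widgets.example.com/feed.js"></script>
<div class="example-feed" data-user="myusername"></div>

The script fetches the data from the other service behind the scenes and writes it into the page, so the visitor sees the feed without ever leaving your site.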

The same old dangers seemingly lurk under the surface, however: the amount of information going online needs moderation and control, and the permanence and integrity of data are compromised. How data is stored, accessed and retrieved, and the reasons behind these activities, are highly contentious, controversial and potentially damaging. How we classify, order and regulate the information we create, through metadata such as tag clouds and folksonomies, is loose and imprecise if there are no existing guidelines to follow, and leads to misinterpretations and cyber squabbles over use in context if we can't agree on it. Web 2.0 threatens to engulf our lives and identities if we allow such technologies to define us as a society.

Final thought: the real danger appears to be that we don't know the extent of how much of our personal data is held on the internet. We may never get to see it all ... we only see whatever they want us to see!

Thursday 27 October 2011

DITA coursework blog - Web 1.0 (the internet and WWW, databases and information retrieval)


Title: Language and access in Digital Information Technologies and Architecture, with a focus on law libraries

1. Introduction

An underlying principle of digital information is that it is data which must be written in a specific language so that it can be stored in sources, communicated by systems and retrieved by users. Once this is achieved, access to data must be managed using appropriate technologies. I will consider this statement in the context of modern law libraries to assess the present and future impact on the provision of digital resources to their users.

2. Evaluating

Digital technologies must take into account the information needs of library users, who in today’s digital age, most commonly seek information from online subscription databases and web resources. Sources of information in law libraries are typically law reports, journal articles or legislation: predominantly accessed as either printed or digital text based information. The latter must be in a specified format in order to be read: it is data attributed a form capable of precise meaning through logical coding and sequencing – in essence a ‘language’. 

Computers are system linguists which communicate data over connected networks (the internet) via a service (the World Wide Web). Computers read and interpret data in binary form: bits are assigned characters and form words as ASCII text; and collected together, they create files which make up documents, such as database records or web pages. Human users are only able to subjectively evaluate text for meaning and relevance in a form they understand. Computers do not understand “human” language, and so evaluate the language within the data: metadata. Hypertext is a language used to inter-link data in one document, or link data between documents. Web pages are written in Hypertext Mark-up Language (HTML) so the data can be read by internet browsers, which interpret metatags (ordered ASCII text relaying strict instructions on layout and structure) as distinct from standard ASCII text. 

The advent of e-books has seen a shift towards digital readership, where books translated into ASCII text can enjoy wider distribution to library users over the internet. This indicates the future of how libraries will provide materials to their users; but issues of cost, reliability and user misgivings on rapid technological advancement still impact on access.

3. Managing

Managing data at core is concerned with providing users with access points. There are two sources of digital information available to library users: internal (databases) and external (the internet). 

Databases organise and order available data in accordance with the user’s information needs, a primary example being an OPAC catalogue of a library’s holdings. Language is the control. Structured Query Language (SQL) commands relational databases to perform queries to retrieve selective data from a number of interrelated data tables. 
Databases permit searches by two methods: natural language and controlled vocabularies. If the natural language search terms are not clear, or irrelevant search results are returned, the user may deploy query modification to adjust the language used and yield better results. Controlled vocabularies such as indexes and thesauri may signpost users in context to data that may or may not be relevant. We should expect more relevant search results from a database than from, say, an internet search engine, provided that the data is there to be retrieved.

Libraries can combine access to both databases and the web concurrently to permit wider scope for information retrieval. Brophy (2007, p.113-4) sees an importance of use behind the access and retrieval process, thus directly linking users to resources. He also implies that use involves the creation of “information objects of various kinds”. A library portal, such as that created by the Inner Temple Library[1], is a good example of this – it is an online access point to a number of databases, together with hyperlinks to web resources including a subject index and current awareness blog. Maloney and Bracke (2005, p.87) emphasise that this “is not a single technology. Rather it is a combination of several systems, standards and protocols that inter-operate to create a unified experience for the user”. This means of federated searching[2] is emerging as a possible solution to remove the complexities of cross-searching multiple databases.

Information retrieval over the web is a double-edged sword: on one hand there is a wealth of dedicated resources available online; however an inexpert user will only ever retrieve a small percentage of relevant data from it because of the “invisible web”[3]: a detrimental consequence of a global resource that is dynamically evolving, but where authenticity and permanence are compromised as more and more information goes online. Limb (2004, p.60) believes this could be combated by building federated repositories to harvest a wealth of relevant cyber resources, but the task may appear onerous and unmanageable.

4. Conclusion

The communication chain between users, systems and sources is dependent on the efficient and concise use of language in order to access and retrieve data. A break in the chain, such as incomplete HTML code or a broken hyperlink, can shut down access to information, leaving the information seeker locked out. The architects of the computer systems dictate the choice and methods by which data is represented, but as non-subject specialists, they may not appreciate that the information they give access to may not fulfil the user’s needs. A compromise perhaps should be reached.[4]

Recent developments such as cloud sourcing[5] look set to change how society stores and accesses digital information, in that information users can retrieve documents via the internet without prior knowledge of where the source document is physically rooted. It appears cloud sourcing makes the service the source.[6]

I cannot see how law libraries could happily subscribe to these developments: information retrieval is too deeply rooted in specialist knowledge and language, coupled with the need for reasonable proximity between the user and their sources. As technology makes information cheaper to produce and maintain, that information is more eagerly consumed by non-experts who have limited skill and knowledge in accessing and evaluating relevant information.

The legal information professional, acting as the bridge between users, systems and sources, therefore remains crucial to the information access and retrieval processes.

Bibliography

Brophy, P. (2007). The library in the twenty-first century. 2nd ed. London: Facet Publishing.

The Inner Temple Library Catalogue: http://www.innertemplelibrary.org/external.html (accessed: 25th October 2011).

Maloney, K. & Bracke, P.J. (2005). Library portal technologies. In: Michalak, S.C., ed. 2005. Portals and libraries. New York: The Haworth Information Press. Ch.6.

Limb, P. (2004). Digital Dilemmas and Solutions. Oxford: Chandos Publishing.

Pedley, P. (2001). The invisible web: searching the hidden parts of the internet. London: Aslib-IMI.

Harvey, T. (2003). The role of the legal information officer. Oxford: Chandos Publishing.

Géczy, P., Izumi, N. and Hasida, K. (2012). Cloudsourcing: managing cloud adoption. Global Journal of Business Research, 6(2), 57-71. (accessed: EBSCOhost - 25th October 2011.)

References


[1] The Inner Temple Library Catalogue: http://www.innertemplelibrary.org/external.html (accessed: 25th October 2011)
[2] See Limb (2004, p.59).
[3] For further discussion, see: Pedley (2001) The Invisible Web: Searching the hidden parts of the internet. London: Aslib-IMI.
[4] See Harvey (2003, p.143-6) for a persuasive discussion on the ‘librarian vs lawyer’ in terms of information retrieval within the legal profession.
[5] For detailed discussion of the concerns and benefits of cloud sourcing, see Géczy, Izumi and Hasida (2012) in Global Journal of Business Research, 6(2), 57-71.
[6] i.e. the internet becomes the storage and service provider of digital documents, which are no longer anchored to a physical location.

Tuesday 18 October 2011

DITA - Understanding blog No. 4: Information Retrieval

After last week's session on retrieving structured data from a database management system, this week's task of retrieving unstructured data from the wide expanse of the Internet seems a staggeringly insurmountable task on paper. But is it really? I argue not. We do this kind of thing on a daily basis and we don't really give it much thought. The next time you use Google to search for tickets for an upcoming gig or theatre show, think carefully about what you are actually doing ... retrieving specific information from a whole mass of information deposited on the net. It has some order (websites, webpages) but we don't know exactly where we are going to find what we want, or even whether we will actually find anything relevant.

Information retrieval has three definitions depending on your viewpoint as a user, a system or a source. A user typically has inadequate knowledge of the subject they are searching for, and hence seeks to retrieve information through a search request to enlighten them. A system stores information, processes it and makes it available for retrieval through software and hardware: it is the technology that allows the user to search how they want to. A source is the document that contains the information we wish to retrieve; it has an intended purpose and audience. Information is a valuable commodity which is ripe for exploitation: it can be bought and sold as a service.

Information retrieval on the internet occurs whenever we make a web search (we want to find some information online). Broder (2000) conceived a taxonomy for web searching by looking at the different types of query we make:
  • Navigational queries (e.g. finding the home page for a company when you don't know the precise URL)
  • Transactional queries (e.g. a mediated activity, such as purchasing a box of fudge)
  • Informational queries (e.g. finding information on a particular subject, such as what is available or how to do something)
All the above queries are textual based (i.e. we are seeking a written record of the information). The web is home to a selection of different non-textual media, such as images and videos, and therefore the scope of our searching can be expanded to the following categories:
  • Known-item retrieval i.e. the user knows the exact item necessary to satisfy their informational need (e.g. a particular movie or video hosted online)
  • Fact retrieval i.e. the user knows what they want but do not have the precise information in order to fulfill their need. (e.g. which actor played a certain part in a particular movie)
  • Subject retrieval i.e. the user is looking into a subject, which is not precisely defined (e.g. the most memorable deaths in horror films)
  • Exploratory retrieval i.e. checking out what data is available for a provided selection (e.g. searching for classical music on iTunes)
Before information can be searched, it needs to be in a specific format in order to be retrieved (e.g. HTML, XML, MPEG). Media needs to be processed in the correct way before it can be indexed correctly. In order to assist the indexing process, a number of steps should be followed with the text descriptors for the media to be retrieved:

  1. identify the fields you wish to make searchable in the index (e.g. the parts of the document which are typically searched for, such as title, author, year etc. - this allows for highly accurate, focused searching to be carried out)
  2. identify the words that will act as keywords for a search procedure, i.e. the terms or phrases that are likely to be searched for by the user. A decision on whether digits and non A-Z characters will be included or excluded also needs to be made. Keeping the keywords in lowercase will yield more accurate search results.
  3. remove stop words such as 'and' and 'the'.
  4. stem words, by cutting off the suffix to allow for wider searching of a concept or term, e.g. 'act' would bring up results for acting, actors, actions etc.
  5. define synonyms, i.e. different words that have the same meaning.
Once the information has been prepared for indexing, it needs to be formatted into a structure. This can be in the form of a surrogate record (i.e. a record within the database which acts as a 'list of records' for all the information contained in the database that you are interested in) or an inverted file (i.e. we look at words to find documents, rather than the other way around ... looking from the inside out!) - a small example of which follows below.
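To make the inverted file idea concrete, here is a tiny made-up example with two documents:

Doc 1: "Actors acting on stage"
Doc 2: "Action in the theatre"

After lowercasing, removing the stop words and stemming the suffixes, the inverted file might look like:

act     ->  Doc 1 (and possibly Doc 2, depending on how aggressively the stemmer trims 'action')
stage   ->  Doc 1
theatre ->  Doc 2

To find documents about acting, the system looks up 'act' in the file and goes straight to Doc 1, rather than scanning every document in turn.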

Index structure in place ... we can now search! Search models for information retrieval include Boolean connectors (AND, OR, NOT), proximity searching (within the same sentence, paragraph or phrase; word adjacency), best-match results generated through the ranking systems built into search engines such as Google, and simply browsing the internet (which bypasses any indexes in place).
Should the preliminary search fail, we can then try manual query modification (adding or removing terms from the initial search query) or automatic query modification, such as a 'show me more on this topic' option provided by the search engine.

Once you have conducted a search, how do you determine how relevant the results are? You need to evaluate it.

It can be done qualitatively, from a user viewpoint (was the user satisfied with the search results?) or a source's viewpoint (how much should the user be charged for search services providing relevant results?).

It can be done quantitatively from a systems viewpoint, evaluating retrieval effectiveness and efficiency by calculating precision and recall respectively:

Precision = the proportion of retrieved documents that are relevant

   Precision = relevant documents retrieved / total documents returned

Recall = the proportion of relevant documents that are retrieved

   Recall = relevant documents retrieved / total number of relevant documents in the database
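A quick made-up example: suppose a search returns 10 documents, 4 of which are actually relevant, and the database holds 8 relevant documents in total. Then:

   Precision = 4 / 10 = 0.4 (only 40% of what came back was useful)
   Recall    = 4 / 8  = 0.5 (we found half of everything that was out there to be found)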

The practical lab session allowed us to explore information retrieval by using two internet search engines, Google and Bing, to search for a variety of information through a series of search queries, then calculating the precision and recall of each engine. Because we are already well versed in searching the internet, and because I have already used advanced search models such as Boolean connectors for online searching, I was able to find relevant results efficiently. The session as a whole, however, reinforced the need for well-structured indexes and precise search models to be in place if we are to retrieve information that is relevant to our needs at the time we need to access it.

Marathon update. It happened. I rocked.

In-between a heavy dose of DITA today (almost caught up!), I totally forgot to follow up on something important.

I ran the Abingdon Marathon on Sunday and it didn't kill me!!!

Debut marathon time of 3 hours 18 minutes and 31 seconds. Finished 206th out of 777 runners. So, so proud of this achievement. I've spent the last quarter of a year training for that one race and I'm relieved that I gave it a bloody good go!

Loved the experience, but won't be repeating it in a hurry! Having a week off from running but I'll be back out there before you know it ;-)

DITA - Understanding Blog No. 3 - Relational Databases

To start off, I will confess now that databases fascinate me. In every job I have worked, from being on the customer service desk in a busy supermarket to managing the IP portfolio for a multinational drinks producer, I have used a Database Management System (DBMS) to assist me in my vocation. The three main components of a database are the data-set, the users requiring access to that data, and the systems, applications and processes which permit the user to access that data. A balance of the three is required, although I would suggest that a database without users loses its sense of identity and purpose and reverts back to being simply information. If users are the valuers of the data, the database is the facilitator of data access.

A DBMS manages 'structured' data, that is, data which we have carefully selected and stored in a specific form, to be accessed for a particular purpose, on any number of occasions, potentially by a number of different users. It acts as the user interface for accessing this information, and imposes security controls to restrict access according to the data/user type. The efficient management of large amounts of data is crucial, because we are only likely to need to access small amounts of data subjectively relevant to our needs at any particular time, such as when we make a query on an individual component of a data-set (i.e. we may only want to find out a few pieces of data, such as the name, location and salary of a specified employee - not of every employee in the same building or department). Every database user is likely to have a different informational need that such a query seeks to satisfy, and the ability of a DBMS to sift through and filter data in accordance with our individual requirements is fundamental in achieving this.

A DBMS facilitates and permits access to a core set of data. This eliminates the need for duplicate entries, and thus promotes user efficiency and improves data integrity. In order to provide better access to the data, relationships need to be established between pieces of data which draw upon the logical process of user enquiry. In its simplest form, a database can be represented as a table consisting of rows and columns: each row stands as a single entry (i.e. a person, company or item) with its values arranged under the data-fields (columns), and each row has a unique identifier, a 'primary key', which distinguishes the data in that row from all other rows. Where data in the table is duplicated or diluted (i.e. it is not focused on the user), the repeated data-field is removed from the first table and replaced by a 'foreign key', which links to a second, separate table containing the removed data-field. Each row in the second table relates via the specified foreign key to data contained in the first table, thus creating links between the two tables. This linking is the essence of database construction. With the links in place, we can now search for data across the available tables - we have created a 'database'.

Creating a simple database requires the use of a precise, uniform language to retrieve data from a number of tables. This is Structured Query Language, commonly known as SQL. I'll refer to the basic examples given in the lecture notes, as follows:

"To create a table, we can insert the following SQL commands:
create table tablename ( column1, column2, ... columnN );
...where column is the column name, followed by the column data type,
possibly followed by modifiers like 'primary key'.

To populate a table you use the insert into command as follows...
insert into Department values ( 1, 'Sales', 'London' );"

Once we have created and populated the table, we can now query it.

To query a database, we need to SELECT a data-field (i.e. name, location, salary) FROM one or more specified tables (i.e. user table, location table) WHERE certain conditions are met (i.e. = equals a "precise item", > is greater than etc.), AND where a second condition is required.

An example of an advanced SQL command is:

SELECT Fname, Lname, Dept_Name
FROM Employee, Department
WHERE Dept = 2
AND Dept = Dept_No
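Putting the pieces together, here is a rough sketch of how those two example tables might be created, populated and linked (the column types are my own guesses, not the exact schema from the lab):

create table Department ( Dept_No integer primary key, Dept_Name varchar(30), Location varchar(30) );
create table Employee ( Emp_No integer primary key, Fname varchar(30), Lname varchar(30), Dept integer );

insert into Department values ( 2, 'Accounts', 'Leeds' );
insert into Employee values ( 101, 'Jane', 'Smith', 2 );

-- Dept in Employee is the 'foreign key' pointing at Dept_No in Department,
-- which is what allows the SELECT above to join the two tables together.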


Our practical lab session involved interrogating a database containing bibliographic data for a number of publications, using a variety of increasingly complex SQL commands to retrieve specific information from the database. Getting the SQL commands correct and retrieving something resembling 'useful' information was an uphill struggle to begin with, but it improved as I became more fluent in the language. As with HTML, it is essential that the instructions you give are fully realised and executed with precision, as you are given no leeway if a single character is wrong or out of place!
The clear and correct use of commands and connectors is imperative for effective querying.
 
I had never thought about the "science" behind databases, and this session gave me a great insight into, and appreciation of, the DNA of a simple relational database. One day (technical ability permitting), I would love to be able to write a database of my own, but until I get the hang of SQL and querying other users' databases, that day may be a long time coming!!!

Saturday 15 October 2011

Getting up to speed ... and a loooooong run

Finally getting back up to speed with my blog entries. Obviously a little behind (I need to get a shift on and write about relational databases!), but other 'life' things are happening and are causing a distraction.

Major distraction with this weekend is my first ever marathon. Arrrrrrgh!

I'm running the Abingdon Marathon in Oxfordshire. The race starts at 8:45am (majorly early!) and I hope it'll only take me a few hours or so to run it!

It feels like I'm about to take my final exam, or graduate or something: 17 long weeks of intensive training, and it all comes down to one day!

Once that's out of my system, I'm head-down back into study! Wish me luck and I'll update you next week as to how it all went!

DITA - Understanding Blog No. 2B: HTML and the Internet (Practical)

The practical lab exercise essentially asked us to explore HTML (HyperText Mark-up Language) and create some documents that we would be able to publish on the web through the University's webspace (too kind City, too kind!).

HTML, like any language, needs to be a clearly defined set of instructions which must be followed and understood by both sides. The document is the mouthpiece of the creator (here, for example, our instructions are set out as ASCII text in a simple WordPad document), and the listener is the web browser: it reads the HTML code from that document, translates it and reproduces the "ideas" in a visual form, published on the designated medium, i.e. as a webpage on the internet. It must be universal in application, otherwise there would be inconsistencies and misunderstandings in the content, structure and meaning of the information which we wish to communicate. It is therefore crucial that we understand how to communicate fluently in HTML, otherwise the information we wish to share will become "lost in translation".

The 'instructions' of HTML are known as tags. Examples are <p> for paragraph, which specifies that a new paragraph is to be inserted; <hr> for horizontal rule, which specifies that a horizontal line is inserted at that place in the document (presumably to act as a divider); and <ol type=""><li></li></ol> for an ordered list, which specifies that you are making a list of items which are to run in a specific order (i.e. numbered or lettered).
If you've ever posted on an internet forum, you might already have a flavour of what the basic tags are and how to use them (I am an absolute stickler for making things <b>bold</b>, <u>underlined</u> and using lots of pretty colours to grab your attention when reading this). The essence of tags is that they must consist of clear instructions, which fundamentally tell the browser where the requested formatting of the ASCII text is to start and where it is to stop on the webpage. A start tag is the instruction in angle brackets <p>; the end tag is the same instruction preceded by a forward slash </p>. Tags work in pairs; if you only have one, the solo tag will not format the text as intended.
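For example, an ordered list of my favourite race distances could be written as (my own made-up content, following the tags above):

<ol type="1">
  <li>10K</li>
  <li>Half marathon</li>
  <li>Marathon</li>
</ol>

which the browser displays as a numbered list: 1. 10K, 2. Half marathon, 3. Marathon.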

Soooo, with the basics in place, we can now confidently write a basic webpage in HTML. The example used in the lecture being:

A Simple HTML Page With Hyperlink
<HTML>
  <HEAD>
    <TITLE>A Simple HTML Page</TITLE>
  </HEAD>
  <BODY>
    A web page using HTML to produce
    a hyperlink to
    <a href="http://www.city.ac.uk/">
    City University</a>.
  </BODY>
</HTML>

The HTML page opens with an <HTML> start tag and closes with an </HTML> stop tag. This tells the receiver that we will be writing HTML code to say what we want to appear on our page. Every webpage has a HEAD, and a TITLE is contained within that. The BODY is the content that appears in the main browser window, which can include ASCII text, images and hyperlinks.

By creating more HTML webpages, you can effectively create a website by linking them together.

Here is my self-made webpage, as published on the City webspace! Liam's webpage
(Note how basic it is ... I have included a few links to other webpages, plus an ordered and an unordered list. I did create subsequent pages and an index page to link them all, but clearly I forgot to publish them. D'oh!)

Cascading Style Sheets (CSS) can additionally be applied to the HTML, telling the browser that displays it as a webpage how to style it: different formats, font sizes, background colours etc.
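Just to give a taste of what a CSS rule looks like (a rough sketch only - we haven't covered CSS in the labs):

body { font-family: Arial, sans-serif; background-color: #f5f5ff; }
h1   { color: navy; font-size: 24px; }

Each rule names the tag it applies to, then lists the stylistic properties to apply wherever that tag appears on the page.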

So if we master the language, create some content and apply a little creativity (and remember to publish it!!!) ... we can all make our thoughts accessible through HTML and the internet!



Thursday 6 October 2011

DITA - Understanding Blog No. 2A: All things Internet and World Wide Web

Our lecture opened with an analogy: if the Internet is the road infrastructure, then the WWW is the car driving down it. I like analogies :-)

The Internet is a large infrastructure connecting computers across networks. This allows us to share and access information remotely. It forms the building blocks of all online communications.

The World Wide Web (WWW) is the service or the vehicle designed to enable us to use and manage information across the global network we refer to as the Internet.

The Internet facilitates the operation of the WWW: the latter is dependent on the former. In essence, client computers (such as the everyday PCs or laptops we use to surf the web, check emails etc.) send requests for information to all-powerful server computers (which store masses of archived data) whenever we attempt to access an online resource such as a webpage. The server computer listens out for the requests and, by way of acknowledging them, sends back the requested information to the client computer. The lines along which the electronic communications travel are the networks, and this global network of networks is the Internet.

Everything you see and touch in the online world is anchored: the resource file containing that information will be saved on a hard disk somewhere, i.e. it has a physical location. In order to access that file, we need to ask for it. If we know the precise location, it becomes easy to find. We can do this using a Uniform Resource Locator (URL). A typical URL contains the name of the server, the domain, and the folder and/or sub-folders containing the file on the server computer.

In the lecture notes, a URL is represented using the following formula:

<protocol>://<server dns name>/<local file path in relation to server folder>

http://www.fvspartans.org.uk/clubchamps.shtml
can be broken up into
http://    www.    fvspartans   .org.uk/     clubchamps.shtml

The first two bits of information tell us that we are seeking a World Wide Web document and that it is to be transferred to us through the hypertext transfer protocol (HTTP). The file we seek is therefore a hypertext mark-up language (HTML) document called 'clubchamps', stored on the server machine named 'www' at 'fvspartans', which is part of the domain 'org' in the United Kingdom, or 'uk'. HTML uses a special type of language which only exists in the digital world, linking sections within a document or linking documents to other documents. Text marked up with links is referred to as hypertext.
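Behind the scenes, the request the browser sends off for that URL looks something like this (a simplified sketch of an HTTP request, leaving out most of the headers):

GET /clubchamps.shtml HTTP/1.1
Host: www.fvspartans.org.uk

The server at that address then replies with the HTML file, which the browser renders as the webpage we see.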

The practical side of this topic, explored fully in the lab tutorial, looked at the composition of HTML, which is largely a series of content (such as text and images) surrounded by mark-up codes (tags which define structure and format).

We have been asked to generate a simple HTML document and publish it on the City University web server. Due to time constraints, I am only 60% of the way there and hence will be revisiting this topic in the concluding part of this DITA understanding blog 2B.

Sunday 2 October 2011

A Standalone statistic ...

Okay, aside from my desire and urges to learn *everything* informatics and science (looking forward to another dose of DITA and a load of LISF tomorrow!), I took part in a race this morning. As in a running race ... not the race to get in the shower, get my clothes on and then slouch in the garden/on the sofa etc etc.

Standalone 10K was that race. Set in Letchworth, Hertfordshire (a 10-minute drive away from me), it's a popular local race where my running club always has a formidable presence. Seriously, we looked like a mob of troublemakers standing on the street corner all in our identical stripey blue club vests.

My stat is this:

I finished 52nd out of 1062 finishers. That's in the top 5% of all people who turned up and decided that running up and down the unkindest hills in the county, on the hottest Sunday in October I've ever had the pleasure of getting sweaty on, would actually be a good, *fun* idea. Categorically we're labelled "runners" which means we have a high pain threshold and do not garner any sympathy from other normal folk, who believe we are actually bonkers. Those normal folk are sadly correct in their beliefs.

Oh, and it was a personal best time for me too - 39 minutes and 39 seconds! I've wanted to go under 40 minutes over 10K for a long while now so it felt really, really great ... once I had got my breath back and recovered of course!

If I can source a picture of me being a loon on Sunday morning, I'll post it here.

Now ... back to the fun part of my day. Study!

Wednesday 28 September 2011

Thought for the day

Working from home is a double-edged sword. I love the unlimited access to tea, hygienic toilet facilities and the fridge, coupled with the most relaxed dress-code I've experienced in about 5 years (dressing gown and slippers look and feel great at 2pm!).

On the other hand, my home is crammed full of DISTRACTIONS which are making me feel completely counterproductive, and are therefore categorically EVIL. Unlimited access to the internet and Facebook is not good. Family don't understand that you're not on holiday and should be studying, and so engage with you as if it's a lazy Sunday afternoon.

In short: I need to stop using this blog as a distraction and get back to the reading list!!!!

Tuesday 27 September 2011

DITA - Understanding Blog No. 1: Introduction to Computing

This is my first "understanding" blog, which I've been advised to post after each DITA session to consolidate my learning and knowledge on each particular topic.

DITA is the acronym for my first module, Digital Information Technologies and Architectures.

Today we were given an overview of the nature and potential of digital information, by looking at the different levels at which data can be represented. Part of my job as a Library Scientist (yes! I have a cool unofficial student occupation :-) ) is concerned with how we use, manage and manipulate this data in order to make it accessible to a wide and variable audience, be it work colleagues, friends or family, or the online world etc. Another important "user" I need to consider is the computer software program that will very kindly interpret the binary information (the building blocks of all digital information) and display it in a format that can be easily interpreted by the end user (i.e. the person reading this blog ... YOU!). If we were presented with a long screen of binary code, we'd probably consider it nonsensical and alien. That would not be the case for a computer program such as, say, WordPad or Paint, which takes the code and manifests it as words or colours instantly. Every code is an instruction to be followed. Getting the code right, and then ensuring that code can be interpreted, is paramount if we are to represent and interpret the data in the intended manner.

The basic level at which data is represented is the bit, which has a base of 2 (represented as a 1 or a 0 in its purest form). Different combinations of bits can be assigned characters, and hence we can represent words using binary code - most popularly via the ASCII character set. Yay!

Everything else builds upwards. Bytes are sequences of 8 bits. Kilobytes consist of 1024 bytes. Megabytes consist of 1024 kilobytes and so on.
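A quick worked example: the word 'Hi' is stored as two bytes, one per character, using the ASCII values 72 ('H') and 105 ('i'):

H  =  72  =  01001000
i  = 105  =  01101001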

My visual aide-memoire is to look at data as the building blocks of everything you see and read in digital form, using a tool (or toy) we've all encountered in our childhoods. If you were to visit Legoland, you might be dazzled by the monumental replicas of famous global landmarks, but when you look closely, every object there is made from the smallest blocks of Lego, all starting from a single block at the bottom ... with another equal-sized block placed on top of or around it. The accumulation and combination of these Lego blocks forms one entity, and we can arrange or format them to create and convey a specific, containable idea, message or expression (a file). The means of presenting this data to the end user, i.e. the point at which we view and analyse the representation of the data itself, is the document. So 1000 individual blocks together can represent something bigger and intrinsically different from the sum of their parts, simply by careful arrangement of the variable building blocks in a manner that we will interpret to mean or say something to us when we view them in a certain order.

One thing I have learned from this session is that any form of digital information is stored in one centralised location, from where it can potentially be accessed by any number of programs or persons. We don't simply reproduce the same piece of data over and over whenever we want to access it. Instead we can link to the primary source, and as an offshoot, this can be embedded or formatted into a different document-view to add value and depth to the information as it is subsequently accessed by the user (e.g. a different document-view for a text document would be where that text and a YouTube video are both made available on an HTML page accessed through the internet). Data takes up a lot of space. We live in a data-intense society in which our lives depend on instant access to information at every waking moment of the day in order to operate. We need to be economical with storage and act responsibly in the methods by which data is provided and managed.

Monday 26 September 2011

Sitting on a blog ...

Hello blogging-sphere!

I've just set up my new blog to aid me with my studies into Library Science at City University over the coming year, so I'm likely to get all techie and library-ee on your a**!

As an awesome and valued member of the Fairlands Valley Spartans running club in my hometown of Stevenage, I might be tempted to talk about life as their current PB Machine!!!

Keep it here kids :-)