From Our Guest Bloggers | Tags: alumtag, archives, brooklyn museum, folksonomies, metadata, tagging
[Tag cloud image courtesy of BlogTipz.com]
A hot topic often discussed in library school circles is digitization and the immense possibilities for increased access that it presents. Once online, even the most obscure cultural artifacts have the potential to be shared, cited, recommended, remixed, and mashed-up in previously inconceivable ways. In this age of hyperconnectivity, there is perhaps no better example of this than the growing use of social tagging as a means to classify online collections.
Allowing users to contribute metadata (i.e., tags) is far less labor-intensive than professional cataloging, and the resulting terms are directly tied to users’ own vocabulary, which can be both a blessing and a curse. Because tags are in the language of the users, problems with synonyms, plurals (e.g., cat and cats), and inconsistent specificity are quite common. Additional concerns include over-reliance on user contributions and the accuracy of individual tags. Many information professionals and institutions have been experimenting with ways to combat these problems, and one solution that has been gaining popularity over the last few years is the use of games.
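To make the synonym and plural problems concrete, here is a minimal sketch (in Python) of the kind of normalization a tagging system might apply before storing user-contributed tags. The plural rule and the synonym table are invented for illustration, not any particular institution’s practice.

```python
# A minimal sketch of tag normalization: lowercase, trim, naively
# singularize, and collapse known synonyms. The synonym table and the
# plural rule are illustrative assumptions, not a real system's rules.

SYNONYMS = {"kitty": "cat", "feline": "cat"}  # hypothetical mappings

def normalize_tag(raw: str) -> str:
    tag = raw.strip().lower()
    # Naive plural handling: "cats" -> "cat". A real system would use a
    # proper stemmer or lemmatizer rather than this rule of thumb.
    if tag.endswith("s") and not tag.endswith("ss"):
        tag = tag[:-1]
    return SYNONYMS.get(tag, tag)

print(normalize_tag("Cats"))    # -> "cat"
print(normalize_tag("feline"))  # -> "cat"
```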
One of the institutions leading the way in the development of these “metadata games” is New York’s own Brooklyn Museum, which has created two games. The first, Tag! You’re it!, displays images from one of the Museum’s many digitized collections along with a brief description of the item. Users are then prompted to enter as many, or as few, tags as they see fit for each image, earning points for each tag entered. The Museum’s other game, Freeze Tag!, focuses on “cleaning up” existing inaccurate tags on images in the online collections. Once again, users are presented with an image and a brief description; however, instead of creating new tags, they are asked to evaluate the existing tags for each image, earning points for every tag rated.
Currently, the metadata games attracting the most attention are coming out of the Tiltfactor Laboratory directed by Mary Flanagan, professor of digital humanities at Dartmouth College. Thanks to an NEH start-up grant and a fellowship from the American Council of Learned Societies, Tiltfactor has teamed up with Dartmouth’s Rauner Library to create AlumTag. Similar to Tag! You’re it!, AlumTag displays photographs donated by Dartmouth alumni and prompts users to enter as many, or as few, words associated with each image as they see fit. After four turns, users receive a score based on the number of tags generated, plus bonus points for tags that match what other users have contributed. Tiltfactor is also working on other metadata games, such as Zen Tag, which is similar to AlumTag, and Guess What?, a two-player game in which one user must choose from an array of images based on clues sent by an anonymous networked partner.
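As a rough illustration of the scoring mechanic just described (points per tag, with a bonus for tags that match other players’ contributions), here is a hedged sketch. The point values are invented, since neither game publishes its exact formula.

```python
# A sketch of AlumTag-style scoring: a flat award per tag, plus a bonus
# when a tag matches what other players entered. Point values are
# invented for illustration; the real games don't publish their formulas.

BASE_POINTS = 10
MATCH_BONUS = 25

def score_turn(player_tags: set[str], other_players_tags: set[str]) -> int:
    score = len(player_tags) * BASE_POINTS
    # Bonus for each tag that agrees with other players' contributions.
    score += len(player_tags & other_players_tags) * MATCH_BONUS
    return score

# Example: three tags entered, two already used by other players.
print(score_turn({"canoe", "river", "1950s"}, {"canoe", "river", "boat"}))
# -> 3*10 + 2*25 = 80
```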
Although most metadata games are still in their experimental phases, the early results are promising. According to Flanagan, during the pilot phase of AlumTag, players generated about 32 or 33 tags per image, over 90 percent of which were considered useful. While metadata games can never fully replace the role of information professionals in cataloging online collections, they have enormous potential for use alongside existing classification systems, allowing for increased input and access like never before.
UPDATE: I just found out that Mary Flanagan is going to be delivering the opening keynote speech THIS THURSDAY at the CUNY Graduate Center’s “Minding the Body” Conference, which is part of their Digital Initiatives Program. Check it out!
From Our Guest Bloggers | Tags: cataloging, ebooks, Librarians, Libraries, linked data, metadata, scholarly communication, technology
I was surprised (and pleased) to get so much feedback on my post about the future of libraries and the skills and mindset new librarians should be cultivating. I wrote that post as a way to prepare soon-to-be-degreed librarians for the profession, but a lot of commenters pointed out that current librarians might need to hear that message even more. And they’re right. I think my plea for engaged, creative librarians is motivated largely by my fears about how slow-moving the library world has been in the last decade or two.
When I meet library school students, and current librarians, who seem uninterested in learning about library technologies, or who are skeptical about the value of social networking, the semantic web, smartphones, and e-books, I fear for the future of our profession. We are already playing catch-up in so many areas, and we just can’t afford to keep waffling in the face of technological change.
I wanted to follow up with some more specifics about the kinds of technologies new librarians should be familiar with, or at least know a little something about. These are the things that I think could have a massive positive impact on libraries, if (and when?) we figure out how to implement them.
Maybe I’m just influenced by my current research, but I think linked data could have a huge impact on how libraries manage bibliographic records and catalogs. Right now we’re all doing this ridiculous thing wherein we each buy a copy of some very expensive software and copy records into our own personal database, so that bibliographic metadata is duplicated over and over and over again in thousands of different places. I think this is silly, and frankly, it leads to poorly managed metadata and way too much overhead in terms of librarian labor. There is big potential for significant change in the way we manage our metadata, but we need people who understand the benefits and the costs, and who are willing to take a chance on something new. Want to know more? Check out the W3C Library Linked Data Incubator Group and the LODLAM blog. There are some really terrific articles on library linked data, if you have access to a database like LISTA. I highly recommend the article “The Cataloger’s Revenge: Unleashing the Semantic Web,” by Virginia Schilling, for a good overview.
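For a taste of what this looks like in practice, here is a small sketch using the Python rdflib library and the Dublin Core element set. The work URI is invented for illustration; the point is that a single shared description can be linked to, rather than copied, by every catalog.

```python
# A small sketch of linked bibliographic data using rdflib and Dublin
# Core. Instead of copying a record into a local silo, each library
# would link to a shared URI for the work. The URI is invented here.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

g = Graph()
work = URIRef("http://example.org/works/moby-dick")  # hypothetical URI

g.add((work, DC.title, Literal("Moby-Dick; or, The Whale")))
g.add((work, DC.creator, Literal("Melville, Herman, 1819-1891")))
g.add((work, DC.date, Literal("1851")))

# Any catalog can now reference `work` by URI rather than re-keying the
# record; serializing to Turtle shows the shared, reusable statements.
print(g.serialize(format="turtle"))
```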
Another significant thing to start paying attention to is the changing landscape of scholarly publishing models, especially if you’re interested in academic librarianship. Due to recent changes to requirements for NSF grants, faculty have to start paying a lot more attention to data management, access, and preservation, and libraries are starting to play a huge role here. Researchers in all fields, even the humanities, are going to generate more and more data, and we can help them manage it. A lot of people are interested in changing scholarly communication models, and libraries can be significant players, but we have to get involved in the conversation. And we have to be knowledgeable about research practices, digital archiving practices, and the technology that can provide access to research produced by our universities.
It’s become pretty clear that ebooks are here to stay, and that reading is going to shift more and more into the digital sphere. We have to be ready for that, and we should be working tirelessly to ensure that we aren’t excluded from the publishing and reading spheres. Learn about digitization initiatives like HathiTrust and the Google Books Project, stay up to date on current lending practices for ebooks, and be aware of the challenges, both technological and legal, and their potential solutions. You might love the smell of books and hope that your print collections will continue to draw patrons, but you can’t pretend ebooks don’t exist. If you don’t already have some kind of ebook reader, you should. Kindle apps are free! At the very least, you should have some real experience with digital reading practices, because more and more of your patrons will.
There are some very exciting changes on the horizon for libraries, but we have long had a tendency to bury our heads in the sand and keep doing things the same way, because it’s what we know, because we’re intimidated by the scope of change needed, and because we don’t think we have the money or time to do what has to be done. But the longer we wait, the harder those changes are going to be.
I’m going to step off my soapbox now. I’m heartened to hear from so many young librarians (and I’m not talking about age here) who are enthusiastic about the challenges ahead. Good luck to all of you in your job searches and in your sure-to-be-exciting careers in library land. Hopefully I’ll meet some of you at future conferences and library events; library land is a small place, after all.
From Our Guest Bloggers | Tags: archiving, CDL, digital libraries, librarian, metadata, WEST
Working at the California Digital Library is very different from working in a traditional academic library setting. We are part of the University of California system, and provide services to the libraries on all 10 UC campuses, but we aren’t directly part of the library system. The UC libraries are not a streamlined, unified entity, though many initiatives are being undertaken to bring library practices across the UC into closer alignment. The CDL operates under the administrative branch of the UC, the UC Office of the President (UCOP). Working for a huge academic institution like the UC is a lot different from working for a small, private liberal arts college like Whitman, which was my last place of employment. There are a lot of moving pieces, and it’s not always clear how they fit together. And of course, California’s budget crisis is hitting the UC hard, though I think it’s felt a little bit less by CDL than it is in the other UC libraries. We are, after all, providing services that help the other libraries save money and work more efficiently, and in today’s library environment, that’s a pretty important role to play.
Our team of about 70-80 people works on a lot of varied projects. I’ve been here for about five months now, and I still don’t completely understand, or even know, everything that we do. On a very basic level, we provide access to electronic content to all the UC libraries at a reduced cost, because we can purchase it “in bulk,” so to speak. We’re kind of like the managing agent for all of the electronic content subscriptions that the UC libraries purchase collectively, and thus, we also run all the technology that provides access to that content, like link resolvers and authentication services.
Our other primary service is Melvyl, a UC-wide union catalog that allows patrons at any UC campus to search for and borrow materials from any other UC campus. This catalog was traditionally run via an ILS installed here at the CDL, in which each institution’s records were duplicated from their own local catalog. However, we’ve recently made the transition to OCLC’s WorldCat Local, a process that has not been without its headaches and growing pains. Overall, though, it’s been a good move. We at the CDL also manage the inter-UC campus loan service, and the courier service.
Besides these foundational services, the CDL runs the UC-wide institutional repository, eScholarship; UC shared print archiving initiatives; the mass digitization projects undertaken with Google and the Internet Archive; UC publishing services; identifier and preservation services, including EZID and DataCite; Web archiving services; and digital special collections like Calisphere. Honestly, the list of things we manage and create is pretty huge, and it’s kind of fun to dig around the CDL website to check some of these projects out.
I wanted to come work at the CDL because it’s a place that is often on the cutting edge of library services. I think of the CDL as the research arm of the UC libraries. We have the time and resources to experiment with new ideas, make mistakes, learn, and figure out what will work best to make the UC libraries the most efficient and effective library system they can be. I get to work with really smart, motivated, and passionate people every day, and that is really fun.
I was brought into the CDL fold to work, at least initially, on a project associated with the Western Regional Storage Trust (WEST). WEST is a distributed print journal archiving program in which a number of institutions across the US are coming together to make collective decisions about retrospective archiving of print journals. Our project, the Print Archives Preservation Registry (PAPR), is affiliated with the Center for Research Libraries and is being designed to support the kinds of collaborative archiving decisions that bodies like WEST need to make. At its most basic, PAPR is designed to ingest library records from a group of libraries and analyze the holdings collectively to determine which library would be the best candidate to hold an archive of a particular title (or list of titles, more accurately). PAPR will also provide a searchable registry of archived titles and archiving programs, so that individual libraries can make informed de-selection and archiving decisions.
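PAPR’s actual matching logic isn’t something I can reproduce here, but the basic idea, comparing holdings across libraries to nominate the most complete run, can be sketched in a few lines. The data model below is an assumption for illustration only.

```python
# A toy sketch of the collective holdings analysis described above:
# given each library's held volumes for a journal title, nominate the
# library with the most complete run as the archiving candidate. The
# data model is an assumption; PAPR's real analysis is far richer.

def best_archive_candidate(holdings: dict[str, set[int]]) -> str:
    """holdings maps library name -> set of volume numbers held."""
    return max(holdings, key=lambda lib: len(holdings[lib]))

journal_holdings = {
    "Library A": {1, 2, 3, 5, 6},
    "Library B": set(range(1, 11)),  # volumes 1-10, a complete run
    "Library C": {2, 4, 8},
}
print(best_archive_candidate(journal_holdings))  # -> "Library B"
```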
So what does a metadata analyst do on a project like this? My job has been shifting with the project as development and deployment progress. When I first came on to this project, I spent a fair amount of time getting intimately familiar with the requirements of the project, the purpose of our tool, and the needs of our “clients,” both WEST and the CRL. I took an active role in working with the project manager and all the various players in finalizing project requirements and deliverables, and helped the team to really understand the data (largely MARC records) that we are going to be working with. I brought to the team a working knowledge of library practices and library metadata. I worked with our developers to ensure that the database and ingest programs being developed would reflect the kind of data we are going to be receiving.
Now that we’re into the deployment phase, I’m working through each set of library metadata (i.e., MARC cataloging records) we receive and writing specifications for our programmer, who creates the programs that ingest the data in each record into a relational database. As much as MARC is a standard, there are a lot of non-standard elements in each set of records we receive. I have to identify where the rogue data is in each set so that our ingest programs can accurately find the data we need in the records.
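For a flavor of the kind of pre-ingest audit I’m describing, here is a sketch using the pymarc library. The specific checks (a missing ISSN in the 022 field, data stashed in local 9XX fields) are illustrative examples of “rogue” data; our real specifications are far more detailed, and the file name is hypothetical.

```python
# A sketch of a pre-ingest audit of MARC records using pymarc. It flags
# records missing an ISSN (022 $a) and records that stash data in local
# 9XX fields, two common sources of non-standard data. Illustrative only.
from pymarc import MARCReader

with open("library_records.mrc", "rb") as fh:  # hypothetical file name
    for i, record in enumerate(MARCReader(fh)):
        issn_fields = record.get_fields("022")
        if not issn_fields or not issn_fields[0].get_subfields("a"):
            print(f"Record {i}: no ISSN in 022 $a")
        local_tags = [f.tag for f in record.get_fields()
                      if f.tag.startswith("9")]
        if local_tags:
            print(f"Record {i}: local 9XX fields present: {local_tags}")
```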
Whenever I hear catalogers and people who are familiar with MARC records and bibliographic data talk about how they don’t know much about metadata, I feel deeply sad. Because library records ARE metadata. I’m not sure how we managed to convince ourselves that metadata is this new and foreign thing; we’ve been the masters of metadata for centuries. Say it with me now: cataloging librarians are metadata librarians.
The most exciting thing about being a metadata librarian right now is how much things are changing, and how much there is to learn. Our traditional standards and practices are about to change in major ways, but the way I see it, we all managed to learn card catalogs, and we all managed to learn MARC. There’s no reason we can’t all learn something new.
There are a few more projects here at the CDL that I’m going to start working on, and I’m excited to expand my horizons and learn about even more of the awesome things we’re doing here at the CDL. If you’re interested in seeing the future of library services, take a look at the CDL website and explore some of our projects. Maybe you’ll find some new things to introduce into your own library practices.
From Our Guest Bloggers | Tags: archives, cataloging, Gregory Bateson, Libraries, metadata, Michael Buckland, museums, Suzanne Briet
Museums, archives, and libraries all contain collections of assets judged valuable or useful enough, by each institution’s own standards, to warrant retention, circulation, or preservation. Their holdings reflect years of acquisitions made for the benefit of their unique user groups. In return, the intellectual results from these collections have inspired an ever-expanding body of knowledge produced by their patrons. The cycle is relatively simple: collection > access > creation. But what fuels this cycle? What keeps the perpetual expansion of knowledge in motion? This is not at all easy to answer; however, a single element could be at the nucleus: information.
In his paper “Information-as-Thing,” Michael Buckland drew upon the work of early 20th-century European Documentalists, describing their view that “objects are not ordinarily documents but become so if they are processed for informational purposes” (Buckland, 1991, p. 355). From the Documentalists’ point of view, document serves “as a generic term to denote informative things,” and documents “include natural objects, artifacts, objects bearing traces of human activities, objects such as models designed to represent ideas, and works of art, as well as texts” (Buckland, 1991, p. 355). He offers the example of an antelope: one running wild “would not be a document, but a captured specimen of a newly discovered species that was being studied, described, and exhibited in a zoo” would have become one (Buckland, 1991, p. 355). According to Suzanne Briet, a cataloged antelope is the primary document, and all derivative documents are secondary. The example of the antelope brings to light a fundamental principle about information: that any thing (whether a text, a mineral, or a living entity) is not information until it is intentionally made useful for “informational purposes” and has made an informative difference.
Taking his cues from cybernetics and Enlightenment philosophy, Gregory Bateson wrote, “what we mean by information—the elementary unit of information—is a difference which makes a difference” (Bateson, 1972, p. 459). With this simple statement, he removes the concept from the thing (or antelope) and perfectly isolates a core principle: that information is a difference.
Information as “a difference which makes a difference” simply means that some characteristic of a thing exposes its respondent to elements that affect the respondent in some way. Through this detection, inevitable relationships form between differences. These relationships are the foundations for systems. As relationships form and are defined within a system, a structure emerges. “Every effective difference denotes a demarcation, a line of classification, and all classification is hierarchic” (Bateson, 1972, p. 463). The demarcation of an “effective difference” denotes its relationship: a line drawn within the classification. Relationships are the binding connections between differences.
Because of a myriad of disparate cataloging standards, the digital data held within libraries, archives, and museums is unfortunately rendered non-interoperable, resulting in isolated collections stored within institutional networks. This problem is well documented. What these institutions all have in common, however, is the foundation of collection systems built through basic descriptive differences and their relationships. A semantic, ontological solution could bridge the interoperability divide that locks cultural heritage collections in their digital silos.
What I have suggested in my posts over the past few weeks is that catalogers are the key to access. After all, without our records, there wouldn’t be a catalog. We understand the delicate differences and relational structures that bind and define our collections. As technology advances, we cannot forget the importance of our role. We must continually develop new methods for achieving better organization and access for the ease of our users…because without users, what use would collections have?
From Our Guest Bloggers | Tags: darwincore, dublincore, metadata, opensource, pbcore, standards, vracore
I’m not your typical cataloger working deep in the basement of a library, pounding away at MARC records in OCLC Connexion. Like many recent MLIS graduates, I’ve found work outside the library: I work for a software firm. It’s not entirely out of scope; the firm designs the open source collection management software CollectiveAccess. I won’t turn this into an advertisement for our system, but I will say that the most exciting feature is that you are never locked into a rigid metadata schema. When you download the software you can choose from a myriad of cataloging interfaces, whether collection-specific or standards-based, and from there you can customize further to meet your collection- or institution-specific needs.
For the past 6 months, I have developed all the standards-based interfaces and have become intimately familiar with these schemas. Now I have a bone to pick with what I would like to call the “Core.”
When you think of the core of something you visualize the essence, its center, the most basic elements of a thing. “Core” metadata schemas were developed to supposedly capture the most necessary information within a given subject area. Really? Let’s take a closer look:
Dublin Core (DC) is the first and probably the most widely recognized “Core” metadata schema. With only 15 simple elements and easily understandable qualifiers, DC makes for one flexible structure standard. What I like most about DC is its rules: no elements are required, and elements repeat as necessary.
Some opponents say that it is too simple, to which I say it’s not the schema for them. Dublin Core doesn’t have to solve the world’s metadata problems, but it is a great starting point for any form of description. Of all the “Cores,” DC really is a core: simple and to the point. Descriptive access at its most basic level.
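To show just how minimal DC can be, here is a sketch that builds a simple Dublin Core description with Python’s standard library, using the real DC element namespace; the record content is invented. Note how the schema’s two rules show up directly: nothing is required, and elements simply repeat.

```python
# A sketch of a minimal Dublin Core description built with Python's
# standard library. The DC namespace URI is the real one; the record
# content is invented for illustration.
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

record = ET.Element("record")
for element, value in [
    ("title", "Photographs of the 1939 World's Fair"),
    ("creator", "Unknown"),
    ("date", "1939"),
    ("subject", "World's Fair (New York, N.Y.)"),  # elements repeat freely
    ("subject", "Photographs"),
]:
    ET.SubElement(record, f"{{{DC_NS}}}{element}").text = value

print(ET.tostring(record, encoding="unicode"))
```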
VRA Core is the structure standard for the cultural heritage (primarily visual culture) community. With 18 elements at its simplest and 53 at its most complex, VRA Core 4.0 can be either basic or complex. It follows DC’s 1:1 principle while allowing for the historically interconnected structures of cultural heritage collections. However, it confusingly prescribes the Collection / Work / Image relationship. How can these complex relationships be expressed through a 1:1 ratio?
PBCore was developed for the public broadcasting community to describe motion picture works. (It’s currently in Version 1.2; however, the only way to view the update is through a graphic mapping available for download from the website. Boo.) Version 1.1 has 53 elements arranged in 15 containers and 3 sub-containers, all organized under 4 content classes. Regardless of the differences between 1.1 and 1.2, it contains over 50 elements! That hardly seems “core.” PBCore’s controlled vocabularies, called “pick-lists,” are ridiculously long and confusing. This schema could be really useful outside the public broadcasting community, but it needs to be edited and trimmed down to the most “core” information required to access these kinds of assets.
And finally we get to Darwin Core, the biodiversity “core” metadata schema. I don’t know what to say about this schema. It has so many elements in what seems to be a relational structure; however, that structure isn’t clearly defined. How many identifiers can be packed into a schema? According to Darwin Core, 22. While I completely agree that biodiversity data requires a metadata schema, I think Darwin Core requires more testing. Its complexity is debilitating.
I could write pages about these schemas, and certainly many people have. But for now I just wanted to rant and rave for a few paragraphs and bring these “cores” to the center of our conversation for this week. Just remember, as Prof. Block says, “Standards are like toothbrushes, everyone agrees they’re a good thing but nobody wants to use anyone else’s.”
Until next time…
From Our Guest Bloggers | Tags: a. billey, cataloging, emerging standards, metadata, Rick Block, Tim Bray
Hello catalogers, content strategists, information architects, knowledge organizers, metadata librarians, metadata specialists and all those who love and appreciate our kind of librarianship. December was a busy month and I didn’t post nearly as much as I should have, so the kind folks at Desk Set have invited me back for some March Metadata Madness! Over the coming weeks I will be discussing emerging standards, professional development, and perhaps a special interest or two. I invite you to send questions concerning cataloging, metadata, and all things technical services. But for now, let’s get back to basics.
Whether you call it cataloging or metadata, in principle it’s the same thing. We are generating and recording (whether automatically or manually) some kind of information about an asset, information package, item, whatever you want to call it…some thing in a collection. How the information is captured is all that separates metadata from traditional cataloging, and even that is a thin line. Both rely on structure standards, content standards, and value standards to create their syndetic structures; they just use different standards…and that’s OK.
Rick Block once quipped, “standards are like toothbrushes, everyone agrees they’re a good thing but nobody wants to use anyone else’s.” Is that such a bad thing? I used to think it was. I once thought that to provide access to all the collections in the world, we would have to agree on a single standard and a single method for interoperability. Well, that just isn’t practical. Experience has shown us that no one standard can capture the unique information required for all kinds of collections.
So what matters most is the continual creation of quality records based on the accepted standards of the time and the needs of your collection and its users. To fuel this development we need continual experimentation with new technologies that will enable us to work toward descriptive independence and system interoperability. At the latest ASIS&T conference this past November in Vancouver, the keynote speaker, Tim Bray, encouraged information professionals to experiment with emerging (open source) technologies to create innovative information systems for their users. He told us to “just do it,” that “…things have changed…you don’t need to know IT to create something useful anymore, you need to know your subject and users.” This is a very reassuring idea for subject specialists, I’m sure.
Bray also said, “The culture of online is epistolary…we are in a golden age of writing…a golden age of archiving and libraries.” If this is true, and I believe it is, what an exciting time to be a librarian! As digital data proliferates, it is our job to provide access to it – through any means necessary. No longer can we be boxed into 15 elements, MARC tags, or meta tags. What I’m describing here hasn’t been developed yet, and what excites me is that it will be our job as catalogers to develop these technologies of organization and access.