Code4Lib 2008: Portland, OR

Proposal Election Results


I will discuss MARCThing, a self-contained web service that aims to do for MARC and Z39.50 what Solr did for searching. MARCThing can run off a thumb drive, yet is powerful enough to handle the needs of a large site. It was designed to free developers from the complexity of MARC and Z39.50 and their idiosyncrasies in the real world.


The Open Library project is a collaboration between publishers, libraries, booklovers, and technologists to create a wiki with a page for every book. So far, we've been parsing Library of Congress records, ONIX feeds, Amazon pages, and more, and all the code and data used along the way is completely open. We'll discuss the project's goals, what we've built so far, and how you can help.

At code4lib 2007, Casey Durfee demonstrated a Django/Solr interface to library resources. It was then adapted by Dan Scott for use as a FACeted BACKup OPAC (FBO), and is now the OPAC for Paul Smith's College, courtesy of Mike Beccaria. Plans are underway to extend FBO/Helios as a discovery layer and a testbed for OPAC design. I'd like to talk about the state of the project, the choices made, and possible future directions.


CouchDB has gained buzz in the last year as an ad-hoc, schema-free, web-friendly data store. Slapstick hilarity results when a self-confessed relational database bigot experiments with CouchDB and reports on the good, the bad, and the meh.

Translation into serious-ese: I will introduce CouchDB, show how one or more applications interact with it, and share any "wows" or "gotchas" that I ran into.
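The core of CouchDB's model is that it stores schema-free JSON documents and queries them by running a map function over every document to build a view. A rough sketch of that idea in Python (document fields and the view logic are invented for illustration, not CouchDB's API):

```python
import json

# Schema-free "documents": each record may carry different fields.
docs = [
    {"_id": "b1", "type": "book", "title": "Moby-Dick", "year": 1851},
    {"_id": "b2", "type": "book", "title": "Walden", "year": 1854},
    {"_id": "n1", "type": "note", "text": "reread chapter 3"},
]

def map_by_year(doc):
    """Analogue of a CouchDB map function: emit (key, value) pairs."""
    if doc.get("type") == "book":
        yield doc["year"], doc["title"]

# A "view" is essentially the sorted output of the map function
# applied to every document in the database.
view = sorted(kv for doc in docs for kv in map_by_year(doc))
print(json.dumps(view))
```

In real CouchDB the map function is JavaScript and the view is maintained incrementally; the point here is only that queries are defined over documents, not over a fixed schema.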

Using a CSS framework can speed up your development time, normalize your code base, and help you avoid some common browser bugs. In this talk I will discuss when it is appropriate to use a framework, the potential pitfalls of common frameworks, and how the Yahoo User Interface (YUI) Grids system has been implemented in the default installation of the VuFind software, and then demonstrate the creation of a three-column CSS layout from scratch in under five minutes.

Open source Wayback Machine 1.4 (Feb 2008) will conclude a year of substantial evolution of the tool. We will describe the new application framework designed to facilitate flexible configuration, customization, and integration with other applications and workflows; to extend file and repository support; and to eliminate dependencies on client-side JavaScript. We will also review added and enhanced replay modes, pluggable components, the multi-tier exclusion management system, and the improved performance, scalability, and extensibility of the application.


Last year I spoke about my research and initial investigations into building a "Next Generation Catalog" using XML technologies, coined the MyResearch Portal. The software has since progressed into an open source project known as VuFind. In this presentation I will talk about the architecture and design decisions that were made to turn VuFind into a viable open source project, what future plans are in store, and how making the project open source has aided it (and put me into project-leader overtime).

David Walker, from California State University, will show a prototype that uses the new OCLC Grid Services WorldCat API. The presentation will detail the newly released WorldCat API, examining its strengths and weaknesses. The prototype will include some examples of how libraries might integrate the API with their local systems in order to build a custom WorldCat interface designed specifically for their users.


I'll give a brief introduction to Erlang and the features that distinguish it from more commonly used languages. Time permitting, I'll demonstrate concurrency and hot code updates, and a concurrent MARC reader and writer. The basic theme will be "why you should consider adding this language to your toolbox."


The Ümlaut is an open source OpenURL middleware layer intended to improve the link-resolving chain by analyzing incoming citations and intelligently querying resources to better enable access to them. The Ümlaut takes multiple approaches to locating items such as conference proceedings, preprints, postprints, and gray literature, utilizing search engines, Amazon, social citation managers, and more. By utilizing the WorldCat registry, it is also personalizable and geospatially aware of available collections, allowing access to all resources available to the user beyond the subscriptions of the Ümlaut's host institution.

This talk will cover the history, architecture and community of this project.
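An OpenURL is essentially a citation serialized into a query string, which is what lets middleware like the Ümlaut analyze it before resolving. A minimal sketch of pulling citation fields back out; the sample URL and resolver hostname are invented, though the `rft.` keys follow the OpenURL key/value convention:

```python
from urllib.parse import urlparse, parse_qs

# A hypothetical OpenURL 1.0 key/value (KEV) context object.
openurl = ("http://resolver.example.edu/?url_ver=Z39.88-2004"
           "&rft.genre=article&rft.atitle=On+Umlauts"
           "&rft.jtitle=Journal+of+Examples&rft.date=2007")

params = parse_qs(urlparse(openurl).query)
# Collapse single-valued lists and keep only the citation ("rft.") keys.
citation = {k.split(".", 1)[1]: v[0]
            for k, v in params.items() if k.startswith("rft.")}
print(citation)
```

Once the citation is a plain dictionary like this, a resolver can enrich it from external services before deciding where to send the user.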

We've recently hacked an API for the NYPL Digital Gallery to share images with the video collaboration platform Kaltura. This could be your library's dream, or nightmare, depending on where you sit.

Is there a sweet spot between offering lightweight APIs - with possibly limited reliability - and trying to develop a bullet-proof API? Is the solution to seed the API to interested parties through feeds, with the implied expectation that it's a work in progress?


As metadata creation is increasingly distributed to authors, catalogers, and other system users who may or may not understand XML, it has become imperative to develop reliable methods for creating and editing XML documents.

This presentation will focus on the two MODS editors developed by UVM and Brown Universities and introduce the XForms technology as a means for digital libraries to create and manage complex metadata.

This presentation introduces a new open source, web-based cataloging application, started for the 2007 Google Summer of Code and currently developed at LibLime. It provides a full-featured, customizable, fast application for original and copy cataloging. It uses the ExtJS user interface toolkit, Google Gears for local storage of bibliographic records, and PazPar2 for searching multiple Z39.50 servers, and it will feature an integrated Jabber client for exchanging records.

Git is a distributed revision control system created in 2005 and is most notably used by the Linux kernel project. In mid-2007, Git was adopted by the Koha open source ILS project, replacing CVS.

I will discuss Git's distributed repository model and the Koha developers' experience adjusting to it, then end with some speculation about how decentralized information exchange applies to library metadata by playing with the metaphor of LC as a central CVS repository.
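Git's distributed model rests on content addressing: every object is stored under the SHA-1 hash of its content, so any two repositories can exchange objects and agree on their names without a central authority, which is what makes the LC-as-central-repository metaphor interesting to pick apart. A toy sketch of the idea (the header mirrors Git's blob encoding, but this is illustrative, not a Git implementation):

```python
import hashlib

store = {}  # object id -> content: the heart of a toy object database

def hash_object(data: bytes) -> str:
    """Store bytes under the SHA-1 of a Git-style 'blob' header + content."""
    blob = b"blob %d\0" % len(data) + data
    oid = hashlib.sha1(blob).hexdigest()
    store[oid] = data
    return oid

oid = hash_object(b"245 10 $a Moby-Dick\n")
# The same content yields the same id in every repository, with no
# central server needed to assign or reconcile identifiers.
assert hash_object(b"245 10 $a Moby-Dick\n") == oid
print(oid)
```

The contrast with CVS is that names are derived from content rather than handed out by one server, so repositories can diverge and merge freely.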

We will present an overview of the Digital Asset Management System, a locally developed digital repository designed to store and manage any digital asset (images, documents, video, etc.). DAMS is an expression of our XDRE (eXtensible Digital Resource Environment) framework, an RDF-, Solr-, JSON-, SRB-, ARK-, and Java-based development platform. DAMS consumes and produces XML as a web service and uses XSLT and CSS to produce any kind of output (HTML, OAI, RSS, CSV).


I've been doing a lot of work around extracting as much meaning as possible from MARC records, using RDF as the data structure. MARC is a record-centric format; the result is anything but, with RDF allowing rich relationships to form in the data that can then be used to drive different navigation techniques. This is about making the most of what is in MARC data to do something new.
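The shift from record-centric data to a graph can be sketched in a few lines: each field of a flat record becomes a (subject, predicate, object) triple, and shared objects, such as an author heading, become join points between records. The field names and record ids below are invented for illustration:

```python
# Two flat, record-centric "MARC-like" records.
records = [
    {"id": "rec1", "title": "Moby-Dick", "author": "Melville, Herman"},
    {"id": "rec2", "title": "Billy Budd", "author": "Melville, Herman"},
]

# Flatten every record into subject/predicate/object triples.
triples = [(r["id"], pred, obj)
           for r in records
           for pred, obj in r.items() if pred != "id"]

# Once in graph form, relationships emerge: both records point at the
# same author node, which a navigation layer can use to link them.
by_author = [s for s, p, o in triples
             if p == "author" and o == "Melville, Herman"]
print(by_author)
```

Real MARC-to-RDF work involves far more modeling (headings, relators, URIs for entities), but the navigational payoff comes from exactly this kind of shared-node structure.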

Using OAI-PMH metadata harvesting, the Solr indexer and an SRU external interface, the UNT Libraries developed a system to provide search access to over a million records from cultural heritage institutions in Texas. An IMLS National Leadership Grant funded the creation of a federated search portal located at the Texas State Library which uses this service. This presentation discusses the decisions made in the development of the OAI-PMH harvesting system and addresses challenges faced in the development of this service.


Digital Archive Services (DASe) is an open source PHP5/(MySQL|PostgreSQL|SQLite|XML) asset repository and web services engine. DASe is built using REST/ROA principles and is lightweight, embeddable, federation-capable, and highly scalable. DASe leverages the Atom/AtomPub protocols and (proposed) OpenSearch 1.1 and RDFa standards for maximum interoperability and extensibility.

The current installation at UT Austin includes over 4 million pieces of metadata describing 300K+ audio, video, image, and document files. Public beta release is scheduled for December 2007.

Folks love our new single interface for requesting books via WorldCat from our catalog (III) and consortial systems: BLC Virtual Catalog (URSA), Borrow Direct (URSA), InRhode (INNReach), and Interlibrary Loan (ILLiad). We used PHP, Java, and Django to build a dozen web APIs coordinated via MySQL and a Python script. I'll show our SOA architecture, which allows consortial members to use components of our (recently open-sourced) code, and describe lessons learned.

As libraries continue their move to offering resources online, the issue of categorizing and organizing information continues to be important. In particular, the "findability" of resources depends on accurate and standardized machine-readable markings. Creative Commons has been a leader in encouraging users to mark their work with metadata indicating its license status. This talk will discuss the CC Rights Expression Language (ccREL) as a concrete example of co-locating machine- and human-readable markup. Co-located markup lowers the threshold of effort required for content creators to add machine-readable information. Buzz words: microformats, semantic web, RDF, RDFa, Creative Commons :).
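The point of co-located markup is that the machine-readable statement rides on the very element a human reads. A sketch of a consumer pulling a license assertion out of RDFa-style attributes; the snippet is invented, and a real consumer would use a full RDFa parser rather than this simplification:

```python
import xml.etree.ElementTree as ET

# A human-visible link that also carries a machine-readable
# rel="license" assertion on the same element.
snippet = """<div>
  <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">
    Creative Commons Attribution 3.0
  </a>
</div>"""

root = ET.fromstring(snippet)
licenses = [a.get("href") for a in root.iter("a")
            if a.get("rel") == "license"]
print(licenses)
```

Because the license URI hangs off the visible link, there is no separate metadata record to keep in sync, which is the threshold-lowering effect described above.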

Code4lib is a successful brand in the library world, but with its continued growth it is time to consider its future. This presentation will discuss the options code4lib has for the future, including becoming a non-profit 501(c)(3) corporation, joining with another organization, or keeping everything as it is. The implications of forming an organizational structure will be discussed, and ample time will be given to solicit input from conference attendees.


Synapse is a new Django-powered web application designed to provide research institutions with the ability to collect and manage bibliographic reference data of publications written by the institution's researchers. Uses include creating lists of significant work to bolster grant applications, generating CVs, and keeping up with the latest work of one's fellow researchers.

I'll cover the challenges we faced in importing bulk data in mixed character sets, parsing author names, establishing relationships between canonical lists of employees and mixed representations of author credits in various publications, and how Django enabled us to meet an amazingly short deadline for a working, well-built web application.

The Synapse web application will launch in early January 2008.

Representatives from the Center for History and New Media will introduce Zotero, a free and open source extension for Firefox that allows you to collect, organize and archive your research materials. After a brief demo and explanation, we will discuss best practices for making your projects "Zotero ready" and other opportunities to integrate with your digital projects through the Zotero API.

Libraries need a simple solution for sharing and publishing collections on the web. Omeka can help. Open source, robust, and easy to install, Omeka gives cultural and academic institutions the means to publish archived content into beautiful, customizable web sites and exhibits.

We'll show you how Omeka works, and how to extend it with plugins and custom themes. Finally, we'll explore the possibilities for migrating and publishing existing collections from other management systems using Omeka.

Few tools or systems outside of libraries understand MARC. So why is MARC the lingua franca for electronic ordering and batch loading of purchase orders from vendors? Is this a function that libraries will continue to expect from other bibliographic formats, say ONIX? This session will discuss some ideas about how we can move forward and build bridges between our acquisitions systems and other financial systems.

This talk proposes to introduce the DCMI/RDA task group, formed to analyze the relationship of RDA to other metadata communities and to examine the modeling of library metadata. Recent DCMI developments will be discussed, including the DC Abstract Model, Application Profiles, RDF declarations of metadata element sets and value vocabularies, the Singapore Framework, and the emerging concept of description set profiling. It is hoped that this will help to foster collaboration between code4lib and DCMI.


The DLF ILS Discovery Interface Task Force was charged with creating a technical proposal that would provide standardized integration between integrated library systems and external applications, better enabling libraries to replace their OPAC with an external discovery system. This talk would provide background for the project and an overview of the recommendation (hopefully published by code4lib 2008), as well as address how the library developer community can contribute to API implementation.

More information and current work on the recommendation are available.

A breakout session could facilitate further feedback and discussion of next steps.


I would discuss the challenges in developing WKAR's on-demand media repository (to launch in January), developed in PHP with a MySQL backend, which employs PBCore metadata, exposes RDF metadata, and offers an OAI-PMH service. I would also address the opportunities for collaboration between public broadcasters, libraries and technologists, including opening up locally-produced media in a structured, searchable fashion, and making it accessible for both researchers and the public.

Presentation on the development of a LOCKSS plugin for harvesting CONTENTdm collections into a Private LOCKSS Network. Discussion will center on the plugin development, as well as issues related to preservation of the archival assets vs. the presentation format, and the need to effectively save not only the digital objects but also the associated metadata in a useful format.

During a 48-hour programming contest we leveraged Ruby on Rails to build an avatar approach to searching the internet for specific content. You drop your fish into the internet "ocean", then periodically check on what sites it has visited. You can tune your fish's DNA by supplying custom Ruby code to help its site-selection process. I'll demonstrate a few fish, including one that looks for libraries at .edu sites and one you can train using Bayesian logic. The possible applications are endless.

Google- and Microsoft-funded mass scanning projects provide access to vast, heterogeneous collections of text, but fail to meet the needs of specific research communities. The Biodiversity Heritage Library (BHL) uses semantic tools developed with collaborators in the bioinformatics community to provide interfaces to legacy print literature specifically tailored to its intended audience of scientists and taxonomists. These "taxonomically intelligent" algorithms and services create access points into a domain-specific digital library unavailable by traditional cataloging means.

Example: a "discovered bibliography" showing where the scientific name Festuca arundinacea (tall fescue, a common grass used in hay production) occurs throughout digitized content in BHL. The session will step through the process and tools required to go from a scanned page to extracted scientific names using Natural Language Processing-based web services, plus the services BHL supports to further distribute this information.
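As a crude stand-in for the name-finding services described above, a binomial name ("Genus species") can often be spotted with a pattern match plus a check against known genera. Real taxonomically intelligent tools do far more (abbreviated genera, synonyms, OCR noise), so this is only a sketch, and the sentence and genus list are invented:

```python
import re

text = ("The meadows were sown with Festuca arundinacea and, along the "
        "margins, Poa pratensis thrived despite the drought.")

# Step 1: candidate binomials -- a capitalized word followed by a
# lowercase specific epithet.
candidates = re.findall(r"\b([A-Z][a-z]+) ([a-z]{3,})\b", text)

# Step 2: keep only candidates whose first word is a known genus
# (a tiny stand-in for the taxonomic name services BHL queries);
# this filters out ordinary sentence-initial matches like "The meadows".
known_genera = {"Festuca", "Poa"}
names = [f"{g} {s}" for g, s in candidates if g in known_genera]
print(names)
```

The two-step shape (cheap candidate generation, then validation against an authority) mirrors how name-recognition pipelines over scanned text tend to be organized.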

WMS is an integrated digital object workflow utility aimed at providing a front-end platform for open source repository packages, such as Fedora, as well as creating a vehicle for flexible and extensible metadata architectures. WMS is designed as a functional module container, the core modules of which are a digital resource handling utility with configurable digital file handling capabilities and a metadata cataloging utility with a flexible schema builder. Other modules include metadata schema mapping, batch import/export, authentication/authorization, and collection/project/user management.

Using Ruby on Rails, we built a web page publishing system. The system allows librarians with minimal technical expertise to create dynamic pages that integrate Web 2.0 features with traditional library content. Students use the pages to connect quickly to selected library resources. We'll discuss why we chose a custom solution, give an overview of our agile development processes, and then look under the hood.

Over the last 8 months, several members of the code4lib community have been hard at work creating a new journal from the ground up. The first issue will be released in mid-December, and work on upcoming issues is well underway.

This presentation will cover the customizations we made to WordPress to use it as a CMS for The Code4Lib Journal, as well as the various tools used for coordinating and organizing the editorial process.


Tod Olson and I can show folks around the technical underpinnings of the University of Chicago implementation of AquaBrowser, soon to be made public. We'll be hacking AquaBrowser as a platform, but we'll keep it generic, with a focus on handling integration of LCC refine facets, overlaying other data on MARC records, and relevancy ranking in the context of mixed content. We'll probably also go into FRBRization techniques and display considerations using built-in automated algorithms, and we'll have some good statistics to share about usage of next-gen interface techniques and about patron participation in the catalog using social tools in an academic library context.

We will describe three Ruby on Rails plugins developed as part of the DLF Aquifer project:

acts_as_xml extends ActiveRecord functionality to a table with a column containing XML, e.g. MODS or DC. The plugin binds Ruby methods and classes to XML elements and attributes similar to Java's XMLBeans.

acts_as_solr_xml, like acts_as_solr, maps the XML column above to a Solr index.

acts_as_sru adds SRU/SRW server-side capabilities to a RoR application.
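The binding idea behind the first plugin, transplanted to Python for illustration: map elements of an XML metadata record onto plain object attributes so application code never touches raw XML. The element names below are simplified placeholders, not real MODS:

```python
import xml.etree.ElementTree as ET

class XmlRecord:
    """Expose child elements of an XML record as plain attributes."""
    def __init__(self, xml: str, fields: tuple):
        root = ET.fromstring(xml)
        for field in fields:
            el = root.find(field)
            setattr(self, field, el.text if el is not None else None)

mods = "<mods><title>Walden</title><date>1854</date></mods>"
rec = XmlRecord(mods, ("title", "date"))
print(rec.title, rec.date)
```

In the Rails plugins the bindings are declared once per model and backed by a database column; the payoff is the same: `record.title` instead of XPath scattered through the application.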

Faceted navigation, an increasingly common feature of library OPACs, was initially developed to browse hierarchical data. MARC data, however, has relatively little hierarchy, and user-generated tags have even less. The flatness of this data makes the navigation of search result sets cumbersome and often ineffective. BiblioCommons has been tracking academic research and industry best practices in this realm, and experimenting with different methods of adding structure to these datasets. This session will share learning to date.
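The mechanics of faceting over flat records are simple enough to sketch: count the distinct values per field across a result set and offer the counts as refinements. The hard problem described above is supplying structure that makes those values meaningful, which no amount of counting provides. Record fields here are invented:

```python
from collections import Counter

# A flat result set: every record is just field -> value.
results = [
    {"format": "book", "subject": "whaling"},
    {"format": "book", "subject": "economics"},
    {"format": "dvd",  "subject": "whaling"},
]

# One counter per facet field: value -> number of matching records.
facets = {field: Counter(r[field] for r in results)
          for field in ("format", "subject")}
print(facets["format"].most_common())
```

With hierarchical vocabularies a facet value can roll up into broader terms; with flat MARC fields or free tags, every value sits at the same level, which is why the resulting facet lists grow long and undifferentiated.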


I would like to speak on Agile project management and how it can be used effectively within library development teams to produce better code in less time. Concrete examples will be given from some of our recent projects here at Emory.

An introduction to the Metadata Registry, an open source vocabulary, metadata schema, and DC application profile manager and registry. The Registry provides a bridge between the XML and RDF worlds, providing its output in XML Schema and SKOS/OWL, as well as managed namespace services, URI design, permanent URLs with content negotiation, support for multi-user ontology design, and change history and version management tools.

Whether tags are used for discovery, retrieval, or personal notes, user tagging is a popular component of Web 2.0 content. If you're like the University of Michigan Library, you don't have this kind of tag associated with most of your data, but you would like to. I propose to discuss the theory, tools, and resources (Perl, Wikipedia, Lucene, Infomap, Technorati) we are investigating to seed the tagging system we will deploy in early 2008.


I propose to describe how MySQL tables interact with images, census records, digitized newspaper articles, and XML records to record the history of the Pullman town and company, 1881-1950. Perl code is used to create dynamically built pages as well as editing (CRUD) screens for data manipulation.

Or in haiku, even shorter:

"Pullman is my obsession; See how I did it with LAMP, Perl, and XML."

Technology conferences tend to lack diversity - diversity in gender, and diversity in relation to people of color and underrepresented cultures. While increasing diversity within the larger technology sector is an issue bigger than code4lib, we have a unique opportunity to lead other technology and library/technology conferences in supporting, promoting, and encouraging a community of inclusion. As code4lib is a conference, a website, a chatroom, a listserv, etc., our ability to pull together around the topics of coding, libraries, and technology can also be leveraged to expand our community in ways that are inclusive to all peoples.

This session will focus on how we can hack the community in ways that further code4lib's ability to promote diversity - it will be part presentation, part proposal, part brainstorming, and all-inclusive. Follow-up from the outcomes of this session will take place via the other code4lib communication tools.

Ajax and other dynamic scripting techniques have created opportunities to engage the library's users as active contributors and annotators, while keeping those users in the flow of their existing activity streams (browsing the catalog and e-resources, renewing items already checked out, etc.).

But with this potential often comes the temptation to create bloated, slow interfaces that drive many users crazy – and are just plain unusable to others (whether on dial-up, or disabled).

New approaches and environments are emerging. This session will share lessons learned from documented best practices of some of the largest interactive sites – as well as our own trial and error.


The shelf directional map is an interactive map display in an online public access catalog. After a patron has initiated a search and selected a particular book or other library resource, the patron is given the option to view a dynamic map for the chosen resource based on the combination of its call number, location code, and format. The session will discuss the strategy and development of this map system, and the code will be opened to attendees.

LISInfo, presented by Brett Bonfield, will be a comprehensive catalog of things related to library and information science (e.g., associations, blogs, conferences, jobs, mailing lists).

It's not a wiki; it will have a controlled vocabulary and its records will be created by editors familiar with the field of librarianship. This open source, open data, Django-Solr hybrid with a faceted public interface is being developed by Brett Bonfield and Gabriel Sean Farrell.

The California Digital Library is currently developing the Web Archiving Service (WAS) to enable librarians to capture, curate, and preserve web-based information. The Web Archiving Service is a Java application with a Ruby on Rails front-end and employs a number of open source web archiving tools.

CDL proposes to demonstrate the Web Archiving Service in progress. Particular emphasis will be placed on:

• Particular strengths of the Web Archiving Service, such as site change analysis.

• Design decisions driven by feedback from the library community.

• Challenges and opportunities offered by the technologies used.

Further information about the Web Archiving Service and the NDIIPP Web-at-Risk grant is available online.


It's true that RSS is the information lifeline of the truly geeky library technologist, but we get perturbed when our favorite sites fail to meet our RSS addiction. Enter Feed43. I'll detail how to manipulate the patterns and parameters in a variety of different sites with this service to make the Web your RSS oyster.

Why does your catalog is FULL of fixed resources, really? One of your new gig a rails request comes in with openurl resolver it will deter any federal agents. Do people use Plugins? When java started using assert as a general discussion of ruby/sru and rubyforge around these parts? This would solve the problem then take her to do a lot of what these are truly awesome band names page dates back to my place in Second Life. A markov chain that can introduce bugs.


Various recent reports (Pew, Forrester, and Nielsen) have documented the tremendous participation gap that is evolving with the emergence of the social web. Participation in the social web is far from universal. Moreover, the demographic profile of those who do participate is by no means representative of the internet user population at large. Yet this minority of users who are active contributors have had a significant, and growing, impact on how our collective attention is distributed, and ultimately in determining what gets seen, read, and heard. This session will explore why this divide matters (so much), and why libraries, collectively, are uniquely positioned to engage so many new voices.

The recently launched Community and Resource Portal for Asian Studies serves 20+ universities, think-tanks, and research institutes in the Nordic region. Building on the semantic capability of Talis Platform storage, we are bridging the relationships between our community members and the knowledge they are using and producing. We are using and creating open source tools, a graphical relationship browser, and APIs to link institutions, research events, people, books, etc., integrating with Facebook, Yahoo, and Amazon web services.