Code4Lib 2016: Philadelphia, PA

Presentations results

score candidates
356
  • Jordan Fields & Mark Noble

With modern discovery layers, libraries are finally able to integrate all of their content into a single search experience, but fluid discovery across content sources using a single interface is still lacking. Users are typically given one of two options: 1) bento box results, where once a user chooses a path they must start their search over to switch content sources, or 2) federated searches that mash everything together in a muddled mess, hiding the most relevant resources.

Using linked data and smart algorithms, we will demonstrate how Marmot is enhancing the open-source Pika discovery layer to create an experience that combines the best of these scenarios. Users looking for books will have the opportunity to discover related digital content at every step as their search evolves. Users researching digitized historic photos will be offered books on the history of the places they're looking at. This type of fully integrated discovery layer allows users to focus on their primary research objective while revealing previously hidden gems.

299
  • Shira Peltzman & Alice Prael

By now there is wide consensus within the field regarding the technical, practical, and policy requirements of maintaining digital information over time. But the reality is that digital preservation done right is hugely expensive, and few institutions have the resources necessary to meet the stringent requirements that the standards demand. We'll cover several tools, strategies, and skills implemented both during and beyond our National Digital Stewardship Residencies to help you safeguard your digital content regardless of your institution's size or budget. In short, perfect is never going to happen, so there's no reason to wait. This talk will equip you to step up your digital preservation game no matter how far along your program is (even if it's non-existent!).

284
  • Sebastian Hammer

Imagine a world where libraries are free to choose their own ideal combination of vendor-supplied services and open source or home-built custom code. Where traditional core functions like cataloging and circulation can be chosen independently, and can interoperate freely with learning management, scholarly communication, and all the things we haven't even thought of yet. It's a world where libraries and vendors challenge each other to innovate, to explore new opportunities for libraries to bring value, and to reduce costs.

To realize such a vision, we need to think differently about how library software and applications are built and communicate. It requires standard interfaces: Language-agnostic APIs. Simple yet powerful interfaces that support independent processes and tie loosely-coupled applications together into robust wholes.

There are many challenges. We need a lean, implementation-driven approach to defining our interfaces and data models, rather than sluggish, do-it-all, designed-by-committee specifications that are outdated before the first draft is complete. We need to challenge vendors and libraries alike to a dialog about best practices, and we need to build toolsets and platforms to lower the barrier to entry, to make it easy for everyone to benefit and contribute.

We would like to discuss these questions and our work on this framework, as well as invite ideas and collaboration.

283
  • Nick Ruest, Ian Milligan, Jimmy Lin

The growth of digital sources since the advent of the World Wide Web in 1991, and the commencement of widespread web archiving in 1996, presents profound new opportunities for social and cultural research. In simple terms, we cannot study the 1990s without web archives: they are both primary sources that reflect how people consume and understand media, as well as repositories that document the thoughts, opinions, and activities of millions of everyday people. These are a dream for social historians. For example, consider GeoCities, which grew to some thirty-eight million pages created by as many as seven million users during the fifteen years between 1994 and 2009. There are untold opportunities to understand the recent past, based on the voices of people who never before would have been included in a traditional historical record.

But wait, with all this opportunity come challenges: large data, the need for interdisciplinary collaboration between historians who might have the questions but not the technical resources or knowledge to work with these sources, and basic questions around what web archives are and how to access them.

Libraries and archives are perfectly positioned to work in this new emerging field that brings together historians, computer scientists, and information specialists. In our talk, we discuss the fruits of one collaboration that has emerged at York University and the University of Waterloo. Bringing together a librarian, a historian, a computer scientist, and an interdisciplinary team of undergraduate and graduate students, York has become a collaborative hub: using a combination of centralized and de-centralized infrastructure to run data analytics, store web archives, provide a publicly-facing portal (http://webarchives.ca/), and to collaborate using Slack, a research team has taken shape. We'll discuss the challenges of working in an interdisciplinary environment, and give insights into how the team has been working through detailed case studies of our work with http://webarchives.ca and the warcbase web analytics platform. The combination of computer scientists and humanists is not always a simple one, and York University Libraries provided the infrastructure, help, and leadership to make the team a success.

280
  • Mark A. Matienzo

As technologists in the cultural heritage domain, we are in a constant struggle between chaos and order with data for which we have some responsibility. In particular, our professional lives are exposed to that struggle through the form of spreadsheets. Some of us malign this as a common, unsatisfactory representation or an insufficient replacement for a database, citing it as the source of poorly controlled values. Others still may see spreadsheets as a manifestation of where our supposedly turnkey, integrated systems begin to fail us. Through a process of reflection, reeducation, and reconciliation, there is hope for redemption from this problematic view. This talk will focus on presenting a studied defense of the value of spreadsheets from the perspective of the user, by considering them in their historical and contemporary contexts as both a program and data. Specifically, this combination gives end users -- who do not self-identify as developers -- a large amount of freedom to work with their data as domain experts. We will also investigate how, despite the increasing separation of tabular data from the program, this idea still holds, using the emergent work of the W3C's CSV on the Web Working Group.

278
  • Andreas Orphanides

As the designers and implementers of complex systems (such as websites, discovery tools, and knowledge repositories), we have great -- if sometimes unrealized -- power. And, as Stan Lee says, with great power comes great responsibility. In this presentation we will explore three key lessons in the ethics of systems design, and examine the practical implications of these lessons in the design of your own systems. We'll investigate how design choices can (intentionally or unintentionally) influence user behavior, reveal organizational priorities, and exacerbate or ease conflicts between your interests and those of your users. And we'll discuss how to be mindful of these considerations during the design process in order to ensure that your systems more effectively reflect your values, address your users' needs, and allow both your users and your organization to find success.

270
  • Camille Salas, John Nelson, Sarah Knight and Will Boyd

National Public Radio's Research, Archives, and Data Strategy (RAD) team (f/k/a the Library) has been working beyond the limits of traditional database structures -- as well as the traditional library label -- to upgrade our current archival workflow tool, reposition our department and foster new interdepartmental connections. As NPR's structure and content production evolve, the needs of our colleagues across many departments continue to grow. RAD embarked on a collaborative effort with our Digital Media colleagues to refactor NPR's archival database to address these changes and create a more flexible system for future requirements. The out-of-the-box database in use had been selected several years earlier as a means to archive and catalog NPR's broadcast content to fulfill FCC and grant reporting requirements, as well as to retrieve archival content. Users requested additional database services such as metadata reporting capabilities from internal production systems, and more metadata tagging for born-digital content, such as podcasts and blogs. We needed to explore new ways to utilize and deploy the metadata describing these stories. We chose an API-First approach in order to focus first on the data, and then to create a user interface aligned with NPR's other digital platforms. During our work on the refactor, the Library rebranded itself and became the RAD team. We hired our first dedicated developer to collaborate on our evolving archive and workflow needs, and to assist with the implementation of a new taxonomy management and tagging system. Our presentation will focus on the refactor project within the context of the rebranding and share lessons learned thus far about an API-First approach. For reference, our development stack includes: Amazon Web Services, Hypermedia API, ElasticSearch, Node.js and NoSQL.

263
  • Dinah Handel and Ashley Blewer

Much of our work as librarians and archivists is devoted to researching, planning, documenting, and implementing workflows based on our knowledge of best practices and locally defined needs. However, this documentation of workflows rarely leaves the institution it is created for. Instead, we share our processes at THATCamps, unconferences, and local and regional association conferences, and occasionally post them on personal or institutional websites. These exchanges are integral to the development of our field, and anecdotally, we hear from our colleagues that these encounters are some of the most useful and rewarding events to attend professionally.

Our presentation will consider how this sharing of workflows could be supported on a regular basis through the open-sourcing of documented workflows, software, and hardware. We will explore how open-sourcing AV digitization and digital preservation workflows and software, and advocating for open file formats and standards for audiovisual archives, has the potential to empower and build community among AV archivists. Further, we will look at how this might provide greater transparency and insight into how exactly materials are processed, and encourage collaboration among institutions, organizations, and archivists. We intend to ground our presentation in concrete examples from the field, such as the implementation of open source micro-service scripts for archival processing at a broadcasting station and work on the open source digital video file conformance checker software, MediaConch (a PREFORMA project).

256
  • Ekatarina (Eka) Grguric

Usability testing is often more of a black box process than it needs to be. The outcomes of testing can directly impact development, create new project directions, smooth out contentious issues, and act as a communication bridge between different stakeholder groups. Testing at all stages of a project can also help to prevent costly end-stage failures. Despite these benefits, usability testing is still often viewed as overly time consuming and resource intensive.

This talk will outline an approach to low-cost, fast-paced usability testing and provide strategies for communicating progress and value to stakeholders at all levels of the process. It will demonstrate the importance of guerrilla testing in a common project management workflow and provide examples of how to communicate process and results.

254
  • Demian Katz and Matthew Short

Dime novels were the predominant form of popular fiction in the United States from 1865 to 1915, yet for many years, they have been all but forgotten. Recent digitization efforts have begun to shed new light on this treasure trove of popular culture, now entirely within the public domain. However, fully describing these works offers some unique challenges due to their complex publication histories and tangled webs of authorship. Matthew Short of Northern Illinois University and Demian Katz of Villanova University have undertaken a project to expose and share data about this fascinating literature by combining the best of library cataloging practices, data collected by domain experts, and linked data techniques. This project has demonstrated the feasibility of small-scale linked data projects within existing applications as well as the power and benefits of a productive collaboration.

This talk will discuss lessons learned from the ongoing project and will feature:

  • A realistic assessment of linked data strengths and weaknesses
  • Strategies for taking advantage of linked data in legacy systems
  • A discussion of significant limitations in current bibliographic models (and possible solutions)
  • Awesome dime novel artwork and must-read titles (because every 21st century programmer needs some 19th century entertainment)

252
  • Ted Lawless

Libraries, archives, and museums have begun publishing Linked Open Data (LOD). Yet for many technical teams working in these organizations, the path towards implementing tools or services that both benefit users and utilize LOD remains elusive or out of reach.

This talk will walk the audience through identifying and creating a useful, lightweight "identity hub" of academic journals. It will cover interlinking multiple sources of data and publishing as LOD using a Linked Data Fragments (LDF) server, and embedding useful contextual information in existing web pages.
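
As a rough illustration of the kind of record such an identity hub might publish (not the project's actual data or vocabulary choices), the sketch below builds a single journal description that interlinks an ISSN with an external identifier and serializes it as JSON-LD, ready to embed in an existing web page.

```python
import json

# A minimal, hypothetical "identity hub" record for one journal, linking
# identifiers from several sources (ISSN, Wikidata, an internal URI).
journal = {
    "@context": {"schema": "http://schema.org/"},
    "@id": "https://example.org/journals/0001",      # assumed local URI pattern
    "@type": "schema:Periodical",
    "schema:name": "Journal of Example Studies",      # illustrative title
    "schema:issn": "1234-5678",                       # illustrative ISSN
    "schema:sameAs": [
        "http://www.wikidata.org/entity/Q000000"      # placeholder external link
    ],
}

# Serialize as JSON-LD, suitable for a <script type="application/ld+json">
# block on an existing web page.
print(json.dumps(journal, indent=2))
```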

All data and code used to produce the "identity hub" will be shared. The principles of interlinking and publishing will be applicable to other types of data.

247
  • Marya Sawaf

Does the notion of keyword search and relevance that powers search engines really meet the needs of all searchers? What alternative advanced search features can be implemented to meet fuzzier search needs?

This talk discusses these questions by demonstrating the potential of a new search tool called "Brainforks" in everyday search, scholarly search and book search.

By building an interactive graph of query expansions, the user can take part in a search that is both "exploratory" and "creative".

"Exploratory" search removes the assumption that the searcher has a clearly defined search need and recreates the library browsing experience in your browser.

"Creative" or "serendipitous" search allows the searcher to find solutions from other fields that are functionally analogous, so ideas can be shared across disciplines.

Coded in Python, Brainforks makes use of many semantic tools and natural language processing libraries such as NLTK, and data from DBpedia, WordNet, Faroo, and Google Ngrams.
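
A minimal sketch of the query-expansion idea, assuming WordNet via NLTK as the only data source (this is not Brainforks' actual algorithm): each seed term is linked to the lemmas of its synsets, producing a small graph that could drive an interactive display.

```python
import networkx as nx
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

def expand(term, depth=1):
    """Build a small graph of WordNet-derived expansions around a seed term."""
    graph = nx.Graph()
    frontier = {term}
    for _ in range(depth):
        next_frontier = set()
        for word in frontier:
            for synset in wn.synsets(word):
                for lemma in synset.lemma_names():
                    neighbor = lemma.replace("_", " ")
                    if neighbor != word:
                        graph.add_edge(word, neighbor)
                        next_frontier.add(neighbor)
        frontier = next_frontier
    return graph

# Expansions one hop out from a seed term.
g = expand("serendipity")
print(sorted(g.neighbors("serendipity")))
```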

  • Becky Yoose

You feel like Sisyphus - no matter what you do today, the boulder rolls down the hill the next day. You're tired all the time. Rest becomes a thing you faintly remember while you continue to push the boulder up the hill. Soon enough, the boulder runs over you. You've reached the point that many have reached before - burnout.

The technology field is rife with burnout stories and #libtech is no different. Burnout in #libtech, however, is compounded by the integration of key cultural aspects and expectations of the wider technology community and the wider library field. Meritocracies, community values and expectations, invisible labor, and "doing more with less" are only a few cultural aspects that play into #libtech burnout. While #libtech folks are publicly talking about burnout, how can we turn talk into action in our workplaces and community?

This talk will address burnout in the #libtech community: what it is, the causes, and what can be done to help others recover from and prevent burnout in our community. Coworkers, management, and #libtech community members all have actions they can take to address #libtech burnout. The sooner we can get our #libtech colleagues, and ourselves, out of the path of the boulder, the better for us all.

246
  • Christina Harlow

This presentation will discuss 'reconciliation' work, or the work of aligning your metadata with external datasets. Going through a number of possible reconciliation workflows and tools - from GUIs to scripting, using traditional authorities and linked open datasets, either capturing URIs, preferred form terms, or other information - I will show how metadata reconciliation is increasing in importance for metadata work and for creating data ecosystem coherence. I also will mention some of the issues, particularly of scalability, the relatively high accuracy rate needed for library metadata work, issues with false positives, and propagation of data changes over time. This work begins to frame some issues too around the concept of library 'authorities' and how it should change. Finally, I will present this work to show one way a wider range of technology and data skill sets can work together on a common workflow, thus increasing mutual understanding across functional silos.
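
As one hedged example of a scripted reconciliation step, the sketch below sends a local name string to the id.loc.gov suggest service and returns candidate label/URI pairs; the endpoint and OpenSearch-style response shape are recalled from memory and should be verified before relying on them.

```python
import requests

def reconcile_name(name):
    """Query the id.loc.gov names suggest service (endpoint and response
    shape assumed) and return (label, URI) candidates for a local string."""
    resp = requests.get(
        "https://id.loc.gov/authorities/names/suggest/",
        params={"q": name},
        timeout=10,
    )
    resp.raise_for_status()
    # OpenSearch Suggestions format: [query, [labels], [descriptions], [uris]]
    _query, labels, _descriptions, uris = resp.json()
    return list(zip(labels, uris))

for label, uri in reconcile_name("Austen, Jane"):
    print(label, uri)
```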

  • Steven Anderson, Eben English (Boston Public Library)

Perhaps you've got lots of metadata at your institution -- some of it in relational databases, or collections of XML docs, or possibly even an assortment of spreadsheets. Maybe you've been hearing a lot about RDF, SPARQL, and the limitless utopian possibilities of Linked Data, and want to get in on the action. So how do you get your data out those stale old formats and serialized in RDF where it can run amok in the sunny green fields of the Semantic Web? This talk will break down the decisions you'll need to make and the pain points you'll encounter as you attempt to migrate your metadata from tables to triples, elements to predicates, and/or strings to URIs. We'll draw from our experiences as part of the Hydra Project's MODS and RDF Descriptive Metadata Subgroup, but this talk will be applicable to any schema or platform. Topics will include: choosing the right vocabulary and namespace for a particular data point; modeling complex XML hierarchies as a graph; how to understand what you can and can't do with a particular predicate's range; an overview of the status of various efforts to represent metadata schemas in RDF; dealing with the un-ordered-ness of RDF; and knowing when to compromise on data specificity so you don't drive yourself bonkers. Many of the above are rarely covered and documentation on them is fairly limited -- this talk is all about addressing topics that seem to have fallen through the cracks.
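
A minimal sketch of one such decision point, using rdflib and an invented XML record and URI pattern: elements that carry an authority URI become URI references, while plain text stays a literal.

```python
from lxml import etree
from rdflib import Graph, Literal, Namespace, URIRef

DCTERMS = Namespace("http://purl.org/dc/terms/")

# Illustrative source record; real input might be MODS, DC XML, or a spreadsheet row.
xml = """<record id="123">
           <title>Boston street scenes</title>
           <subject uri="http://id.loc.gov/authorities/subjects/sh85123456">Streets</subject>
         </record>"""

root = etree.fromstring(xml)
graph = Graph()
item = URIRef("https://example.org/items/" + root.get("id"))  # assumed URI pattern

# Strings stay literals; elements with an authority URI become URIRefs.
graph.add((item, DCTERMS.title, Literal(root.findtext("title"))))
subject = root.find("subject")
obj = URIRef(subject.get("uri")) if subject.get("uri") else Literal(subject.text)
graph.add((item, DCTERMS.subject, obj))

print(graph.serialize(format="turtle"))
```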

  • Katherine Lynch

As libraries continue to grow as hubs of knowledge and information sharing on campuses and in communities, the importance of web accessibility measures ensuring fair, equivalent access has risen in visibility.

Advanced best practices for making a variety of interfaces usable and understandable for a wide selection of assistive technologies will be detailed, including code samples and use cases. This talk will also outline how to conduct accessibility testing during agile development, including footage from an actual web accessibility test of an interface in development.

Operating with the knowledge that we as a community understand why web accessibility is important and are in agreement on the need for baseline accessibility measures, this talk will put advanced tools in the hands of all library professionals to continuously improve their own applications, and also work toward and advocate for better accessibility measures in all areas of Libraries, from their own Library's website to vendor solutions and beyond.

243
  • Monica Maceli

The jobs.code4lib.org site contains years of usefully-structured data that gives us a unique view into who we are as a community. A holistic look at this data yields numerous insights into what to teach, what to learn, where to focus our technical efforts, and the emergence (and decline) of technology trends. This talk will describe the data collected by the code4lib jobs site and the text analysis performed for research purposes, which has yielded multiple publications. The talk will conclude with an open call for possible future collaborations or applications of this data within the code4lib community.

240
  • John Mark Ockerbloom

The subject descriptions of well-cataloged library resources have rich semantics, but most online catalogs and discovery systems do not take full advantage of them, and the headings assigned by librarians do not always match the descriptions users expect. This session features ideas, demonstrations, and discussion on how we can improve the design and the data in our catalogs and discovery systems to improve discovery of relevant materials. It will focus on how to better take advantage of the kinds of data that catalogers already create.

Topics to be discussed include:

-- How to take advantage of subject heading order in summaries and in relevance ranking of user searches.
-- How to use relationships between subject headings, both those explicitly stated in authority data, and those that can be automatically inferred from data, to improve browsing and search.
-- How to exploit the respective strengths (and work around the respective weaknesses) of LCSH, FAST, and Wikipedia in subject searches.
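
As a small sketch of the first topic (an assumption about approach, not the presenter's implementation): splitting an LCSH-style heading on its "--" subdivisions yields a chain of progressively broader headings that could feed browse displays or relevance boosts.

```python
def browse_entries(heading):
    """Split an LCSH-style heading on '--' subdivisions and return the chain
    of progressively broader headings, most specific first."""
    parts = [p.strip() for p in heading.split("--")]
    return [" -- ".join(parts[:i]) for i in range(len(parts), 0, -1)]

# The full heading, then each broader ancestor, for browse or ranking use.
for entry in browse_entries("United States -- History -- Civil War, 1861-1865 -- Campaigns"):
    print(entry)
```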

234
  • Jason Ronallo

Everything that was once a desktop application has gone to the cloud and no one builds desktop applications anymore, right? Wrong. Companies like GitHub, Slack, and a number of startups are building desktop applications today. Why? I'll show you why desktop applications are still preferred sometimes, how you can use the web technologies you're already familiar with (HTML, CSS, & JavaScript) to build them, and some reasons why you will want to consider creating a desktop application even if you've always written them off in the past. We'll see how easy it is to get started and what's different about this model of programming, like how processes communicate. I'll tie this all back to examples of library and archives use cases for desktop applications.

  • Allison Jai O'Dell and Steven Duckworth

Do your finding aids look a bit dated? Perhaps you wrote that EAD-to-HTML XSLT a decade ago? Take advantage of contemporary development and design tools -- engage users and promote collections! This talk will discuss a finding aid makeover, using existing JavaScript libraries and CSS templates to create finding aids with interactive and responsive design, mobile-friendly image galleries, linked data, and patron request features. We will discuss development of, and user feedback on, The Fancy Finding Aid.

233
  • Mike Shallcross

In April 2014, the Bentley Historical Library received a $355,000 grant from the Andrew W. Mellon Foundation to partner with the University of Michigan Library and Artefactual Systems on the integration of ArchivesSpace, Archivematica, and DSpace in an end-to-end digital archives workflow. The project seeks to expedite the ingest, description, and overall curation of digital archives by facilitating (a) the review and characterization of newly acquired content, (b) the creation and reuse of descriptive and administrative metadata among emerging platforms and (c) the deposit of fully processed content into a digital preservation repository. This presentation will identify key project goals and outcomes and demonstrate features and functionality of Archivematica's new 'Appraisal and Arrangement' tab developed by Artefactual Systems.

231
  • Jim Hahn & Ben Ryckman

With beacon technology, real-time turn-by-turn directions and real-time recommendations in the print collection can be provided to a user's mobile device. With the infrastructure and research trajectory developed for an augmented reality experiment ( http://journal.code4lib.org/articles/10881 ), researchers undertook an experimental project to incorporate Estimote beacons ( http://estimote.com/ ) into an Undergraduate Library collection so that students new to the environment can see the location of their mobile device within the library building, supporting wayfinding to items, and discovery of like items with location-based recommendations.

Presenters will demonstrate the distributed computing processes and workflows necessary to integrate beacons into collections-based wayfinding and walk through key components for the recommendation algorithm used for topic spaces in collections.

The experimental location-based recommendation service is grounded in the advantages of collocation that support information discovery and is supplemented with existing ILS data -- e.g. sum total circulation of a particular item.
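
A hedged sketch of how such a recommendation score might combine collocation with ILS circulation data; the weighting, data shapes, and call numbers below are illustrative assumptions, not the presenters' algorithm.

```python
def recommend(current_call_number, shelf_items, top_n=5):
    """Rank nearby items by a blend of shelf proximity and total circulation.

    shelf_items: list of (call_number, shelf_position, total_checkouts).
    The 0.7 / 0.3 weighting is purely illustrative."""
    here = next(pos for cn, pos, _ in shelf_items if cn == current_call_number)
    max_checkouts = max(c for _, _, c in shelf_items) or 1

    def score(item):
        _, position, checkouts = item
        proximity = 1.0 / (1 + abs(position - here))   # collocation on the shelf
        popularity = checkouts / max_checkouts         # existing ILS circulation data
        return 0.7 * proximity + 0.3 * popularity

    candidates = [i for i in shelf_items if i[0] != current_call_number]
    return sorted(candidates, key=score, reverse=True)[:top_n]

items = [("QA76.9 .D3", 10, 42), ("QA76.9 .D26", 11, 7), ("QA76.73 .P98", 40, 90)]
print(recommend("QA76.9 .D3", items))
```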

Presenters will demonstrate techniques and approaches utilized in developing improvements for beacon precision, enabling increased location granularity in library environments, along with security considerations for location based services.

228
  • Julie Swierczek

This presentation will cover the Open Archival Information System (OAIS) model for long-term digital preservation. It will also cover preferred file formats for long-term storage; a small detour through digital forensics, FRED machines, and retrocomputing; and why archivists cry themselves to sleep at night when the general public conflates archives with backup copies of data.

  • Paul Beaudoin

NYPL Labs and Zooniverse built Scribe, a highly configurable, open source framework for setting up community transcription projects around handwritten or OCR-resistant texts. Scribe suits digital humanities, library, and citizen science projects seeking to extract highly structured, normalizable data from a set of digitized materials (e.g. historical manuscripts, account ledgers, catalog cards, or maritime logbooks).

Scribe prototypes a certain way of thinking about community transcription. The app attempts to break complex identification & annotation flows into small, manageable tasks. By reducing the unit of work, we hope to reduce the cognitive barrier to entry as well as maximize the distribution of effort across multiple people.

The talk will identify the community transcription design patterns that informed the unique architecture of Scribe. March will be a great time to also discuss the successes and challenges of Emigrant City ( http://emigrantcity.nypl.org ), a project built with Scribe that launched in Nov 2015.

http://scribeproject.github.io http://emigrantcity.nypl.org

226
  • Edward M. Corrado, Carl Wilson, and Brett Currier

There are a number of issues to consider and steps to take before pushing out an Open Source Software project, whether it is a new software project or an existing home-grown solution a library, archive, or museum is contemplating releasing as Open Source Software. These issues include, but are not limited to: 1) Reviewing the landscape to determine if there already is an existing project that meets these needs and, if so, how your project differs; 2) Determining who the prospective community of users and developers will be; 3) Creating a plan that includes a roadmap and how you are going to judge success; 4) Reviewing organizational policy and issues (such as technology transfer rules) as well as legal and licensing issues; and 5) Exploring prospective funding models for long-term success. Considering these and other factors beforehand can help create a sustainable Open Source Software project.

This presentation comes, in part, out of a full-day workshop held at the 2015 International Conference on Digital Preservation (iPres) about the Roles & Responsibilities for Sustaining Open Source Platforms & Tools. During the workshop a discussion group was formed and explored what an organization needs to consider when releasing new or existing home-grown software as Open Source Software. A subset of this group, Edward M. Corrado (Associate Dean for Library Technology Planning and Policy at The University of Alabama and member of the JHOVE Product Board), Carl Wilson (Technical Lead for the Open Preservation Foundation, current lead developer of JHOVE, and a member of the veraPDF leadership team), and Brett Currier, J.D. (Director of Scholarly Communication at The University of Texas at Arlington), will prepare this presentation based on this workshop and their experience and further investigation.

223
  • David Naughton

Library websites often attempt to present a single interface to multiple, disparate search engines: vendor databases, institutional repositories, finding aids, discovery layers, search appliances, etc. How do we impose order on this chaos? How do we know whether our attempts to do so meet users' needs? Do we build our own bespoke portal UIs, using vendor APIs for each and every search engine, refusing to use any vendor UIs? Or do we just provide links, or maybe search forms, that redirect to the vendor UIs?

Janus presents a middle path, allowing libraries to present a single interface to multiple search engines, and to capture and analyze much of the significant use of those search engines, without completely replacing vendor UIs. Janus provides a simple URL API that encapsulates and abstracts away the complexities of vendor APIs. It supports Shibboleth and many other, more common, authentication tools, as well as common logging tools. Janus is implemented in Node.js for high performance and ease of testing of interaction with vendor web UIs, which often rely heavily on JavaScript. Janus has been in production for several months at the University of Minnesota Libraries, where we are preparing to release the code as open source before code4lib 2016.

219
  • Matt Zumwalt

The 21st century needs infrastructure that allows networks of trust to emerge organically in the exchange of data. Everyone who makes decisions based on data needs to be able to ascertain the trustworthiness of the data they are consuming. Likewise, everyone who wants to influence decisions through data needs to be able to express their trustworthiness to those consumers. These are real, tangible needs that impact the bottom line of every organization in both the public and private sectors. They cut across almost all industries and fields. We who have chosen to create software for Libraries, Archives and Museums are uniquely equipped to create that infrastructure and maintain it because librarians, archivists and curators are experts at reading networks of trust. Where a marketing expert generates information in order to grab attention and manufacture affinity, a librarian is an expert at seeing information in context and offering ways to navigate through it or select from it based on the characteristics you see as valid or trustworthy.

The world needs this and we know how to build it. We have all the necessary tools and techniques. It's time to remind the world why they need librarians by showing them what we know about trustworthiness.

215
  • Susan Ley, Alyx Rossetti, Adam Cahan

As part of the continuous evolution of library search tools, the time for a revamp of the Getty Research Portal, an aggregated search index of digitized art history books across major art institutions, had arrived.

Drawing from the original application, written primarily with Java and Solr, we have reconstructed the Getty Research Portal 2.0 as an Angular application with an Elasticsearch server. By harnessing the power of Angular for a seamlessly responsive front end and Elasticsearch to provide a schema-less search engine, we have given the Portal refined search capabilities and drastically improved maintainability. We will walk through the benefits and challenges of building an application with these technologies and give the Code4Lib community a high-level understanding of metadata aggregation for libraries the Angular and Elasticsearch way.
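
For a sense of the kind of query a schema-less Elasticsearch index can serve, here is a minimal sketch using the Python client rather than Angular; the index name, field names, and local server URL are assumptions, and client-version details may vary.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # assumed local development node

# Hypothetical index and field names for aggregated art-history records.
query = {
    "query": {
        "multi_match": {
            "query": "dürer woodcuts",
            "fields": ["title^2", "creator", "contributor", "subject"],
        }
    },
    "size": 10,
}

results = es.search(index="portal-records", body=query)
for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```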

213
  • Frances Webb and Jennifer Colt

When Cornell switched from Voyager to Blacklight as the main catalog interface, one of the requirements given to the development team was to create an authorities-augmented headings browse that would replace the Voyager headings browse. In this talk we'll discuss the work that went into both Solr and Blacklight development in order to create a good user experience for the browse functionality. (seen here: https://newcatalog.library.cornell.edu/browse )

212
  • Alejandro Paz, Kim Pham, Kirsta Stapelfeldt

To understand how Israeli digital news is being disseminated and used across a global digital newsscape, an anthropologist, the Department of Computer Science and the library's Digital Scholarship Unit at the University of Toronto teamed up to build MediaCAT, a web crawler and archive application suite. MediaCAT is an open-source Django application that uses the Newspaper API to, firstly, perform a crawl given a target list of referring sites against a set of source sites and/or keywords. Secondly, it monitors a set of Twitter handles for the same set of sources and keywords. The result is a list of individual URLs or tweets with references, either mentions or hyperlinks, to one of the sources. This application allows for more efficient crawling that indexes and archives only matching URLs and tweets. These URLs and tweets are then captured using PhantomJS to store WARCs for in-depth analysis. The continuously updating data will be used to investigate the process of producing news for a global public sphere. As product manager, the Digital Scholarship Unit plays a crucial role in the development process to ensure the application is designed in a responsible, sustainable manner to support future web archiving services provided by the library.
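
A hedged sketch of the core matching step, using the newspaper (newspaper3k) library: crawl one referring site and keep article URLs that mention a source domain or keyword. The site list, keywords, and crawl cap are placeholders, and this is not MediaCAT's actual code.

```python
import newspaper   # newspaper3k

REFERRING_SITE = "http://example-news-site.com"     # placeholder referring site
SOURCES = ["haaretz.com", "timesofisrael.com"]      # illustrative source domains
KEYWORDS = ["knesset"]                              # illustrative keyword list

paper = newspaper.build(REFERRING_SITE, memoize_articles=False)
matches = []
for article in paper.articles[:50]:                 # cap the crawl for this sketch
    try:
        article.download()
        article.parse()
    except Exception:
        continue                                    # skip pages that fail to fetch/parse
    text = article.text or ""
    html = article.html or ""
    if any(s in text or s in html for s in SOURCES) or any(
        k.lower() in text.lower() for k in KEYWORDS
    ):
        matches.append(article.url)                 # candidate for WARC capture

print(len(matches), "matching URLs")
```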

This talk will feature how we've come together to work on the project, touching upon the different (and at times conflicting) needs and the resulting decisions that informed the design of the application. We will also discuss the unique problems we've encountered crawling due to the varying web structures of site domains, and we will touch on our workarounds.

This talk will feature a demo of the application, including: scoping the crawl, initiation and termination of site crawl processes, collection analysis (keyword and source site distribution, crawl statistics).

The code and documentation are being actively developed on Github (https://github.com/UTMediaCAT).

208
  • Matt Carruthers

The University of Michigan Library is exploring a framework for fostering new and original research in Special Collections which moves beyond traditional scholarship methods. We are currently utilizing existing Special Collections data from finding aids, along with openly available digital scholarship tools, to identify and visualize hidden connections and social networks among creators of our archival collections. This new information can serve as a novel and innovative resource to help guide further research with Special Collections materials.

In the future, we hope to build upon this framework in an innovative "library lab" model to support digital scholarship on campus, facilitate connections between researchers, and incubate projects, positioning Special Collections as an active collaborator in research using its materials.

In this presentation, we will discuss the methodology and tools we are using to surface hidden connections and social networks among individuals represented in our archival collections, as well as our goals for using this as a basis to develop a digital scholarship service.
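
As a hedged illustration of the approach (the input data and tooling here are assumptions, not the project's own), names co-occurring in the same finding aid can be linked into a graph whose well-connected nodes suggest hidden social networks worth investigating.

```python
import itertools
import networkx as nx

# Assumed input: finding aid identifier -> names extracted from its EAD
# (e.g. <persname> elements); the data below is illustrative.
finding_aids = {
    "collection-001": ["Jane Addams", "W. E. B. Du Bois", "Ida B. Wells"],
    "collection-002": ["Jane Addams", "John Dewey"],
    "collection-003": ["John Dewey", "W. E. B. Du Bois"],
}

graph = nx.Graph()
for collection, names in finding_aids.items():
    for a, b in itertools.combinations(sorted(set(names)), 2):
        # Weight an edge by how many collections connect the pair.
        if graph.has_edge(a, b):
            graph[a][b]["weight"] += 1
        else:
            graph.add_edge(a, b, weight=1, example=collection)

# People who bridge otherwise separate collections are candidates for new research leads.
print(sorted(graph.degree, key=lambda pair: pair[1], reverse=True))
```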

  • Michael Gibney (University of Pennsylvania)

XML is widely used for data serialization, transfer, and processing. Unfortunately, because of its document-centric nature, XML is inconvenient to work with in contexts where content consists of multiple independent records, and where streaming and/or fault-tolerant processing are desired. In practice, records are usually grouped together arbitrarily into documents for transfer and processing; this approach often necessitates workarounds to approximate streaming and fault-tolerance, unnecessarily increasing the complexity and inefficiency of workflows.

xmlaminar defines and implements a command-line interface and SAX-based Java API, providing a flexible level of abstraction for managing the document-based nature of XML representations of data. xmlaminar is designed to make working with large amounts of XML more efficient, intuitive, scriptable, and fault-tolerant. Functionality includes:
1. Join multiple XML streams together
2. Split XML streams into smaller chunks
3. Combine 1-2 to resize XML documents by record count
4. Fault-tolerant, order-preserving parallel processing of arbitrarily large XML streams (taking advantage of multiple cores to speed processing; isolating and logging records that fail transformation)
5. Streaming XML representation of information from databases (including MARC->MARCXML)
6. Parameterized SQL queries to allow streaming retrieval (as XML) of records specified by arbitrarily large lists of record ids
7. Integration of multiple corresponding flat XML streams into a single hierarchical XML stream
8. Configurable modules for handling XML output (write to file, stdout, POST to URL, etc.)
9. Combine 1-8 in flexible, composable configurations to design arbitrarily complex (but transparent and efficient) streaming pipelines and workflows

We hope to present an overview of xmlaminar functionality, design challenges and considerations, and current and potential future development and use cases.
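
Not xmlaminar itself, but a minimal Python/lxml sketch of the idea behind items 2-3 above: stream-parse a large file of records and re-emit it as smaller documents of a fixed record count.

```python
from lxml import etree

def split_records(source_path, record_tag, chunk_size, root_tag="records"):
    """Stream-parse a large XML file and yield serialized documents of
    chunk_size <record_tag> elements each (a sketch of features 2-3 above)."""
    chunk = []
    for _event, element in etree.iterparse(source_path, tag=record_tag):
        chunk.append(etree.tostring(element))
        element.clear()                      # keep memory flat while streaming
        if len(chunk) == chunk_size:
            yield b"<%s>%s</%s>" % (root_tag.encode(), b"".join(chunk), root_tag.encode())
            chunk = []
    if chunk:
        yield b"<%s>%s</%s>" % (root_tag.encode(), b"".join(chunk), root_tag.encode())

# Example: write 1000-record chunks of a large MARCXML-like file to disk.
# for i, doc in enumerate(split_records("big.xml", "record", 1000)):
#     with open("chunk-%04d.xml" % i, "wb") as out:
#         out.write(doc)
```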

207
  • Matt Miller

The New York Public Library has used a number of classification systems to catalog materials over its 120-year history. These include a local system developed in 1899 by the library's first president, a number of fixed order schemas and dozens of various classmarks used in our research divisions. This talk will explore how we are using linked open data to build concordances between our local classification and external systems such as LCC. It will also look at how we are pushing the boundary of our classification by mapping to nontraditional classification systems such as Wikidata. (An early prototype can be seen here: http://billi.nypl.org)

205
  • Eric Hellman

The Library Freedom Project is inviting the library community -- libraries, vendors that serve libraries, and membership organizations -- to sign the Library Digital Privacy Pledge of 2015. For this first pledge, we're focusing on the use of HTTPS to deliver library services and the information resources offered by libraries. It's just a first step: HTTPS is a privacy prerequisite, not a privacy solution. Building a culture of library digital privacy will not end with this 2015 pledge, but committing to this first modest step together will begin a process that won't turn back. We aim to gather momentum and raise awareness with this pledge; and will develop similar pledges in the future as appropriate to advance digital privacy practices for library patrons.

This talk will discuss HTTPS in theory and in practice, practical difficulties in implementing the pledge, and if time allows, will amusingly demonstrate exploits that HTTPS prevents.

204
  • Melissa Wallace and Jennifer Colt

In an effort to provide a single access point for our digital collections, we have created a new digital collections interface, which is built in Blacklight and pulls metadata from both a vendor API and our Hydra repository. We'll cover the close collaboration between UX designers and metadata librarians that made the site possible, and the process the two groups used to select core facets and fields that allow for searching across collections. We'll also talk about improvements to the user experience for our image collections, plus the reduced need for separate websites, which improves the discoverability of individual items, while maintaining the context and meaning of the collections.

202
  • Whitni Watkins

A Systems Librarian and a System Admin got together to teach programming for the non-programmer in the library. Learn about the challenges we faced: how we prepared, what worked, what didn't, and what we would change.

We will discuss the guidelines we used on how to plan and deliver coding workshops in a library setting and on a budget of $0. We developed a series of workshops to be taught over the academic year to faculty, students and staff who were interested in coding but needed an open and neutral environment and some guidance to start and learn. We will also discuss more specific logistics like: how we determined who we needed on our team, delegating tasks and involving outside parties in planning and teaching, advertising strategies, and workshop topic selections.

197
  • Deborah Cane, Carrick Rogers, Paul Clough

The Avalon Media System is an open source system for managing and providing access to large collections of digital audio and video. The freely available system enables libraries and archives to easily curate, distribute and provide online access to their collections for purposes of teaching, learning and research.

For Avalon Media System 4.0, our team worked with Indiana University's UITS Assistive Technology and Accessibility Centers to identify improvements to Avalon's accessibility. We know that many development projects are starting to look closely at accessibility features, and we will discuss the priorities, scoping and design that went into our latest features -- a focus on keyboard functionality -- as well as discuss our next steps for greater accessibility to Avalon for everyone.

  • Hui Zhang

This talk outlines a recent project at Oregon State University Library that uses an RDF-based approach to metadata crosswalks for semantic interoperability. The goal is migrating and transforming the metadata records of 60,000 digital works stored in DSpace from the qualified Dublin Core (DC) standard into triple statements for improvements in data sharing. RDF is used as a model to represent the contextual relationships implicitly embedded between described objects, such as an article with its associated datasets, a degree and its offering department, and an advisor and their graduate student. A Ruby gem was developed to transform the DC records exported from DSpace into a series of triple statements based on the defined RDF model and the linked data paradigm. This talk will describe this transformation process with three primary use cases, which can be helpful for other institutions that are in the process of metadata enhancement or migration.

195
  • Lucas Mak

Michigan State University Libraries recently received a gift of over 680,000 music CD titles. Its sheer quantity posed an unprecedented challenge in providing access to this trove of commercially available music titles. After programmatic checking on existing holdings in the local catalog and OCLC WorldCat for copy records, almost 60% of the items remained uncataloged and required original cataloging. Though the donor provided some Dublin Core-like metadata for each title, the brief MARC records derived from these data are far from ideal. In order to improve the quality of these brief records, crowdsourced music websites like Discogs.com and MusicBrainz are tapped. This presentation will talk about how to enrich these records by harvesting metadata from these two sites through their APIs using XSLT, and capturing the authorized form of artist names by following external links recorded in artists' profile pages. The speaker will also discuss limitations of this process and difficulties in reconciling data from these two sources.
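
As a hedged sketch of one enrichment lookup (query syntax and response fields per the MusicBrainz /ws/2 documentation as best recalled; rate limiting, match scoring, and the Discogs side are omitted):

```python
import requests

def musicbrainz_releases(title, artist):
    """Search the MusicBrainz web service for releases matching a brief record.
    Endpoint and query syntax assumed from the /ws/2 documentation."""
    resp = requests.get(
        "https://musicbrainz.org/ws/2/release/",
        params={"query": f'release:"{title}" AND artist:"{artist}"', "fmt": "json"},
        headers={"User-Agent": "cd-gift-enrichment/0.1 (example@example.org)"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("releases", [])

for release in musicbrainz_releases("Kind of Blue", "Miles Davis")[:3]:
    print(release.get("id"), release.get("title"), release.get("date"))
```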

192
  • Sean Aery & Cory Lown

Since 2011 Duke University Libraries has published digital collections in a home-grown Django application that provides a customized interface to collections, including featured images, and ways to search and browse the metadata for each collection (see http://library.duke.edu/digitalcollections/hleewaters/). In 2015 we began ingesting and publishing digital collections using the library's Hydra + Blacklight based digital repository. With digital collections being published alongside items as diverse as datasets, GIS data, and meeting notes, we were challenged to find a way to customize the repository to provide meaningful and useful discovery and access to digital collections.

Among the unique customizations Duke has implemented in Blacklight are 1) attractive, configurable portal pages for collections, including highlighted items and facets fine-tuned for an optimal user experience; 2) IIIF-based image handling; 3) semantic URLs; and 4) a local Bootstrap theme for a branded UI. (See our first digital collection published in our new Hydra + Blacklight based system: https://repository.lib.duke.edu/dc/wdukesons). Developers will share relevant user data and their experiences extending Blacklight to accommodate the uniqueness of different digital collections within a shared platform.
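
As a small sketch of the IIIF Image API convention behind customization 2 (the base URL and identifier below are placeholders): image requests are just URLs built from region, size, rotation, and quality/format segments.

```python
def iiif_image_url(base, identifier, region="full", size="!400,400",
                   rotation=0, quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}"""
    return "/".join([
        base.rstrip("/"),
        identifier,
        region,
        size,
        str(rotation),
        f"{quality}.{fmt}",
    ])

# A thumbnail constrained to fit in a 400x400 box (base URL is a placeholder).
print(iiif_image_url("https://repository.example.edu/iiif", "wdukesons-0001"))
```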

191
  • Matt Zumwalt, Jon Stroop, Mike Giarlo

The Hydra Project has seen a flurry of activity around data models in the past year, resulting in a new set of Ruby gems that embody years of collaboration and iteration across multiple institutions. These new gems, built around the Portland Common Data Model (PCDM), allow Hydra adopters to represent their content's structure consistently while retaining full freedom to choose metadata and workflows that suit their needs.

The new gems are hydra-pcdm, hydra-works, curation_concerns and Sufia 7. Each of the gems in this list builds on the preceding ones, providing a higher level of abstraction to suit a different layer of needs. In this way, they function like a stack of optional components that also relies on existing Hydra components like active-fedora, hydra-head, and hydra-access-controls.

This presentation will explain how the new gems fit together and how they leverage PCDM and established conventions to achieve a potent balance between consistency and flexibility.

189
  • Michael Berkowski

Like many institutions, the University of Minnesota Libraries have sought a comprehensive view of electronic resource usage comprising database searches, electronic journal and electronic book access across internal applications and tens or hundreds of vendor platforms. More than raw access counts, we desire a view of those data which can be utilized by subject librarians to understand electronic resource usage patterns across academic units of interest to them.

We will discuss some of our solutions to distilling resource usage out of proxy logfiles and resolving identifiers (for example ISSN, ISBN, or DOI) in conjunction with Shibboleth user attributes identifying academic affiliations to build a relational database agnostic of any user interface or business intelligence tool we wish to attach to it. Perhaps more importantly, we will share what we've learned through this undertaking, the tiers of accuracy we have been able to achieve, and the challenges of balancing patron privacy with the data specificity desired by our subject librarians.
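
A hedged sketch of the identifier-distilling step: scan proxy log lines for DOI/ISSN/ISBN-shaped strings and tally them. The log format and regular expressions are assumptions and would need tuning against real EZproxy output.

```python
import re
from collections import Counter

# Patterns for identifiers that commonly appear in proxied URLs (heuristic, not exhaustive).
PATTERNS = {
    "doi": re.compile(r"10\.\d{4,9}/[^\s\"&?]+"),
    "issn": re.compile(r"\b\d{4}-\d{3}[\dXx]\b"),
    "isbn": re.compile(r"\b97[89]\d{10}\b"),
}

def tally_identifiers(log_path):
    """Count DOI/ISSN/ISBN occurrences in a proxy log (assumed line-oriented format)."""
    counts = {name: Counter() for name in PATTERNS}
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            for name, pattern in PATTERNS.items():
                for match in pattern.findall(line):
                    counts[name][match] += 1
    return counts

# counts = tally_identifiers("ezproxy.log")
# print(counts["issn"].most_common(10))
```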

  • Nikitas Tampakis

Blacklight's Solr-powered search expands the traditional catalog from an inventory to a full-fledged discovery system. While the faceting and sorting features present the user with a powerful toolbox for finding content, they also expose improperly formatted and invalid data. This talk will go through some of the strategies utilized by the Princeton University Library to identify and isolate records with unusual publication dates, locations, and names. By using Blacklight to uncover dirty data, we can improve the quality of our catalog records and improve the search experience for our users.
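
One hedged way to script this kind of check is to query the Solr index behind Blacklight for publication dates outside a plausible range; the core URL and field names below are assumptions about a typical Blacklight index, not Princeton's actual schema.

```python
import requests

SOLR = "http://localhost:8983/solr/catalog/select"   # assumed local Solr core
DATE_FIELD = "pub_date_si"                            # hypothetical sortable date field

# Ask Solr for records dated before 1400 or after next year: likely typos or bad data.
params = {
    "q": f"{DATE_FIELD}:[* TO 1399] OR {DATE_FIELD}:[2017 TO *]",
    "fl": f"id,title_display,{DATE_FIELD}",
    "rows": 50,
    "wt": "json",
}
response = requests.get(SOLR, params=params, timeout=30)
for doc in response.json()["response"]["docs"]:
    print(doc.get("id"), doc.get(DATE_FIELD), doc.get("title_display"))
```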

186
  • Stephen Zweibel and Patrick Smyth

DH Box is a cloud-based platform designed to give teachers, students, and scholars ready access to a suite of tools that facilitate research and collaboration.

DH Box allows teachers to instantly deploy a computer lab in the cloud, obviating the need for repetitive installation and configuration tasks. By simplifying this process, DH Box allows users at a variety of skill levels to focus on learning and exploring with DH tools, rather than getting bogged down in setup. And because DH Box is compatible with many different devices and operating systems, students and teachers can access DH tools from their own favored devices, rather than from an assigned lab machine, reducing dependence on institutional resources that are often scarce or inaccessible.

We will discuss the utility of the DH Box platform in the library and academic context, focusing on the evolving role librarians can play in enabling the use of tools for data and text analysis.

184
  • Emily Lynema

At the James B. Hunt Jr. Library at NC State University, we have 5 large video walls that provide a canvas for content development and exploration. Although we have developed a significant content portfolio for these spaces over the past 3 years, much of this content is static in nature. One of our ongoing goals is to increase opportunities for library patrons to interact with the content on these displays. We have been experimenting with techniques ranging from simple QR codes to home-built motion sensors to Microsoft Kinects to achieve these goals. This talk will explore a range of technologies and techniques that could be used to transform a static display space into an interactive information environment.

181
  • Katherine Deibel

Most of us do it. We post code to public repositories in the hopes that others may benefit from our efforts. To paraphrase a popular Kevin Costner movie, "If we share it, they will come [and use it]." The reality, however, is that sharing code does not guarantee use by others. As library technologists, we lack a reassuring mysterious voice. A large project failing to attract an audience is disappointing, but perhaps more egregious is when the shared code is meant to fix existing technology issues and needs. Such code might include some HTML/CSS to improve the UI of an online catalog; a JavaScript snippet to address an accessibility issue in an ILL service; a small code package to provide a needed feature that the vendor says is coming soon. These aren't large projects but their contributions can be significant, especially to libraries that lack the resources to develop their own code. Ensuring that these snippet solutions can be readily implemented involves addressing a myriad of issues: composability, customization, different skill levels, and organization. Drawing on research of technology adoption and direct experiences in contributing code for a shared ILS in a library alliance, I will discuss these issues and recommend some best practices to ensure that what we build matters.

178
  • Piotr Hebal, Violeta Ilik, and Jason Stirnaman

When faced with a task of digitally displaying a large collection of works by one of dentistry's all-time greats, Greene Vardiman Black, we wanted to use the popular Sufia Rails engine from Project Hydra, but we also wanted features that Sufia lacked. By using Sufia 6 as a base, we opted for implementation of hierarchical, multi-page, IIIF-backed Collections that are viewed through the OpenSeadragon viewer. We achieved our goal while keeping the code-base easily maintainable and upgradeable. The first step was to allow Collections to contain other collections and to allow hierarchical structure among them. The second step involved the use of an existing EAD XML finding aid from which appropriate metadata was added to the corresponding Collections and GenericFiles models. This enabled us to load the GV Black collection while preserving its complete hierarchy. The resulting separated pages had an unwanted effect as they cluttered our catalog, the facets, and the word cloud. To mitigate the clutter, we created a new class called Page that inherits from GenericFile. We anticipate that some of our users will prefer to download whole documents and not just individual pages. For this reason, we created a two-way relationship between Collections and Pages containing combined PDFs. Riiif, being a Rails engine, was a natural choice for our IIIF service. We were able to make Riiif respect existing authorization rules and with Riiif fetching everything it needs from Blacklight, we avoid the additional slowness of interacting with Fedora. We then implemented our own IIIF presentation API for use with the viewer. Finally, since Blacklight includes OpenSeadragon it was an obvious choice as our IIIF viewer. The overall result is a discoverable collection with a slick presentation front-end where one can interact with the collections in two different ways: viewing high-resolution previewable TIFs or downloading multi-page downsampled PDFs.

174
  • Vicky Steeves, Nick Wolf

The Data Services model at many institutions follows a traditional software support, data reference/finding, and data management framework, usually using the reference interview as a platform to engage users about their data problems.

At NYU, we've expanded on this model. So much cool data comes through our doors from students and researchers alike, and we are sick of seeing it leave!! So, we've begun collecting and showcasing it using an existing CMS platform and integrating with other key areas of collection, such as our Spatial Data Repository.

We'll talk about the process of getting this up and running, our hosting platform and choices, policy/IP considerations, and integration with institutional repositories (use ALL the APIs!).

172
  • Gregory Wiedeman

Archivists have developed a consensus that forensic disk imaging is the most effective way to manage born-digital records and preserve contextual metadata. Yet, disk imaging also has the potential to preserve deleted records and other unexpected information, posing a significant problem for the management of institutional records that are assigned to records retention schedules. This issue can be particularly hazardous for archives that are governed by public records laws.

I will discuss these issues and the development of a proof-of-concept Python desktop application. This tool runs basic forensics tools to extract whatever filesystem metadata we can get depending on administrator rights, add creator description, package files with checksums, and transfer them over network shares or FTP. While the metadata we can gather at accession now is limited, the evolution of forensics tools may provide better results in the future, and the knowledge of these tools can help us to better evaluate the benefits and drawbacks of disk imaging for institutional records.
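
A minimal sketch of the packaging step described above (the real tool also adds creator description and handles transfer over network shares or FTP): walk a transfer directory and record path, size, timestamps, and checksums in a manifest.

```python
import hashlib
import json
import os
from datetime import datetime, timezone

def accession_manifest(transfer_dir):
    """Collect path, size, modified time, and SHA-256 for every file in a transfer."""
    entries = []
    for root, _dirs, files in os.walk(transfer_dir):
        for name in files:
            path = os.path.join(root, name)
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                for block in iter(lambda: handle.read(65536), b""):
                    digest.update(block)
            stat = os.stat(path)
            entries.append({
                "path": os.path.relpath(path, transfer_dir),
                "bytes": stat.st_size,
                "modified": datetime.fromtimestamp(stat.st_mtime, timezone.utc).isoformat(),
                "sha256": digest.hexdigest(),
            })
    return entries

# Example usage, writing the manifest alongside the transfer:
# with open("manifest.json", "w") as out:
#     json.dump(accession_manifest("/path/to/transfer"), out, indent=2)
```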

171
  • Mark Noble & James Staub

Marmot Library Network and the Nashville Public Library both support discovery for libraries of various types (public, academic, and school). They also share the issues that arise from supporting such different audiences and needs under the same interface.

With this setup come a number of challenges, including:

- MARC records from a variety of sources (SkyRiver, OCLC, self cataloged)
- Accounts and Records in different ILSs
- eContent from a large number of sources (OverDrive, Hoopla, EBSCO, ebrary, SpringerLink, etc)
- Patrons with multiple accounts for different libraries
- Libraries that want their own look and feel to the catalog
- Different relevancy needs

In all cases, end users expect a single discovery layer that hides all of this complexity from them.

Marmot, Nashville, and 5 other libraries have worked together on our open source Pika discovery layer to implement creative solutions for these issues. This session will talk about some of the unique features in Pika, some technical challenges we have faced, as well as how we work collaboratively to plan new functionality.

166
  • Luke Aeschleman

When you want to welcome fresh perspectives and diverse voices, first impressions make a difference. While the code4lib website and wiki have traditionally served as the focal point for code4libcon information, they can be confusing, even to longtime community members. This year, organizers decided to overhaul how we present conference information. Our primary goal: make the experience more welcoming to first timers and longtime community members alike. Similarly, if we wanted to convince sponsors that code4lib represents the latest and greatest in library technology, our site needed to look the part.

We wanted to relieve the kind OSU folks of the albatross we've become, but we also couldn't pay anyone to host the site. We decided that GitHub Pages and Jekyll, a static site generator, were the best solution, as they would allow for community participation, would be performant, and would work with structured data files. We will demonstrate how anyone with a GitHub account can make changes, report bugs, and suggest enhancements to the conference site.

While Jekyll does democratize the process of site development, there are still technical barriers that could get in the way of proposal submissions and other user contributed content. How do you allow anyone to contribute content to a statically generated site in a performant way? We will also demonstrate our innovative approach to this challenge using Tabletop.js, and get you all up to speed on how to run a data-driven site on the cheap.

161
  • Vicky Steeves

An age-old (read: ten-year) battle in digital preservation has been the struggle between migration and emulation. Which is better? How can we better both archive reproducible material and reproduce archival material for the future?

ReproZip is a potential solution! ReproZip is an open-source tool developed at NYU that packs a digital object, experiment, etc. along with all of the data files, libraries, and environment variables necessary to reproduce it into a compressed .rpz file. That .rpz file can then be used with ReproUnzip, the flip-side program anybody can use to reproduce the experiment without tracking down and installing dependencies, or even running the same operating system!
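
A minimal sketch of that pack-and-replay cycle, driving the command-line tools from Python (the command names follow the ReproZip documentation; the experiment script and file names are placeholders):

    import subprocess

    # Trace the experiment so ReproZip can record the files, libraries,
    # and environment it touches (script name is hypothetical).
    subprocess.run(["reprozip", "trace", "python", "experiment.py"], check=True)

    # Pack everything the trace recorded into a single .rpz bundle.
    subprocess.run(["reprozip", "pack", "experiment.rpz"], check=True)

    # Later -- possibly on another machine -- unpack and re-run it with
    # reprounzip's "directory" unpacker (other unpackers target Docker,
    # Vagrant, etc.).
    subprocess.run(["reprounzip", "directory", "setup", "experiment.rpz", "unpacked"], check=True)
    subprocess.run(["reprounzip", "directory", "run", "unpacked"], check=True)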

While the majority of the use cases for ReproZip have come out of quantitative research, there has been a push into library science when it comes to making archival snapshots of databases and LIS systems, as well as using ReproZip to reproduce/recreate library research and digital materials.

In this talk, I'll explore the potential of ReproZip in a library setting, particularly a digital preservation setting. I'll discuss how the packing and unpacking of environments, software, and experiments can contribute to a greater potential of reproducibility through digital libraries.

Website: https://vida-nyu.github.io/reprozip/
GitHub: https://github.com/ViDA-NYU/reprozip

159
  • Michael Tedeschi

College Women (http://www.collegewomen.org) provides access to digital versions of letters, diaries, scrapbooks, and photographs documenting the first generations of women students attending the northeastern colleges once known as the Seven Sisters, the women's college counterparts to the all-male Ivy League schools. These seven institutions educated many of the most privileged, ambitious, socially conscious, and intellectually committed women in the country during the nineteenth and early twentieth centuries, and sent their graduates into path-breaking careers in philanthropy, public service, education, and the arts.

Through this project, our project team explored the complexities of building a rich collection of materials and objects from these institutions. Our panel will discuss the process of developing a successful grant; the technical considerations of building this type of project; our concerns about and solutions for metadata import; and the process of successfully managing a project across a range of geographically diverse groups. We will demonstrate the newly launched project to the group during this session.

157
  • Erin Holmes

Teaching non-technical library staff how to manage EZproxy configurations can be difficult. This presentation will discuss a free, open source tool that provides a web interface for library staff to edit resource configurations and easily export them into one or several EZproxy configuration files. This presentation will demonstrate how the interface works from the user perspective and also discuss how the tool is implemented on the proxy server and in the MySQL database.
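
The talk will cover the real implementation (a web interface backed by MySQL); purely as a hedged illustration of the export step, here is a minimal Python sketch that reads resource rows from a database (SQLite stands in for MySQL, and the table layout is invented) and emits EZproxy Title/URL/Domain stanzas:

    import sqlite3  # stand-in for the MySQL backend described in the talk

    # Hypothetical table: resources(title, url, domains), where domains is a
    # space-separated list of hostnames to proxy.
    def export_ezproxy_config(db_path, config_path):
        conn = sqlite3.connect(db_path)
        rows = conn.execute("SELECT title, url, domains FROM resources ORDER BY title")
        with open(config_path, "w") as out:
            for title, url, domains in rows:
                out.write(f"Title {title}\n")
                out.write(f"URL {url}\n")
                for domain in domains.split():
                    out.write(f"Domain {domain}\n")
                out.write("\n")
        conn.close()

    # export_ezproxy_config("resources.db", "config.txt")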

156
  • Gary Thompson

If every researcher acquired an ORCID and used it with every publication, the name disambiguation problem would resolve itself over time. The UCLA Library built and deployed a Drupal module that allows UCLA faculty members to acquire an ORCID and link it to their university ID. The link between those two identifiers will simplify registering publications in the CDL Open Access repository, and will provide a list of publications for career advancement in a new campus faculty information system called Opus.

After a brief contextual overview, this talk will focus on the ORCID.ORG API calls and the OAuth implementation in Drupal.
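
The implementation discussed is a Drupal module; purely as an illustration of the underlying OAuth exchange, here is a minimal Python sketch that trades the authorization code returned to the redirect URI for an access token and the researcher's ORCID iD (client credentials and redirect URI are placeholders):

    import requests

    # Placeholders -- supplied by ORCID when you register a client application.
    CLIENT_ID = "APP-XXXXXXXXXXXXXXXX"
    CLIENT_SECRET = "your-client-secret"
    REDIRECT_URI = "https://example.edu/orcid/callback"

    def exchange_code_for_orcid(auth_code):
        """Exchange the authorization code for an access token; the token
        response also includes the researcher's ORCID iD."""
        resp = requests.post(
            "https://orcid.org/oauth/token",
            headers={"Accept": "application/json"},
            data={
                "client_id": CLIENT_ID,
                "client_secret": CLIENT_SECRET,
                "grant_type": "authorization_code",
                "code": auth_code,
                "redirect_uri": REDIRECT_URI,
            },
        )
        resp.raise_for_status()
        token = resp.json()
        return token["orcid"], token["access_token"]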

155
  • Ian Lamb

What happens to the data that's produced by a scientific study after all the articles are published? All too often, that data languishes under researchers' desks, never again to see the light of day -- on personal thumb drives, external hard drives, or even Zip drives. But recently, the research community has been seeing how beneficial the sharing and reuse of their data can be.

This presentation will describe how an academic medical library created a web-based search interface specifically geared towards research datasets. Using the Symfony PHP framework, the Apache Solr search engine, and a lot of metadata expertise from our librarians, we created a fast and functional search engine that can help researchers reduce costs and prevent duplicated work by providing information about existing datasets, and easy ways to contact the data's owners. The NYUHSL Data Catalog (https://datacatalog.med.nyu.edu) is a faceted, fully-featured search tool, and also includes an administrative interface so librarians can add, edit or delete items in the index themselves without requesting help from the development team.
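
The Data Catalog itself is built with Symfony and Solr; the Solr query pattern behind a faceted search like this is easy to sketch from any language. A minimal Python illustration (the core name and field names are hypothetical):

    import requests

    SOLR_URL = "http://localhost:8983/solr/datasets/select"  # hypothetical core name

    def search_datasets(query, subject=None):
        """Run a keyword search with facet counts on a hypothetical 'subject' field."""
        params = {
            "q": query,
            "wt": "json",
            "facet": "true",
            "facet.field": "subject",
            "rows": 20,
        }
        if subject:
            params["fq"] = f'subject:"{subject}"'
        resp = requests.get(SOLR_URL, params=params)
        resp.raise_for_status()
        data = resp.json()
        return data["response"]["docs"], data["facet_counts"]["facet_fields"]["subject"]

    # docs, subject_facets = search_datasets("genomics")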

With RDF output in the form of JSON-LD and an open source version coming to GitHub, it is hoped that this tool (or at least the metadata in it) will be used and reused by researchers and institutions the world over.

154
  • Harish Nayak

An awardee of an IMLS Sparks! Ignition Grant, the Library Application for Study Space Engagement will deliver timely, usable data on the occupancy of students' favorite study spaces in the University of Rochester Libraries. It will do so via a native mobile application that leverages only our existing wi-fi infrastructure and web-friendly data visualization techniques. This talk will review decisions on project milestones, interpreting wi-fi data, architecting a native mobile application on a lean project cycle, and actualizing the assessment implications of this newly unearthed spatial data.

153
  • Andy Weidner & Sean Watkins

Since 2009, the University of Houston (UH) Libraries have digitized thousands of rare and unique items and made them available for research through the UH Digital Library (UHDL), based on CONTENTdm. Six years later, the need has emerged for a digital asset management system (DAMS) that can facilitate large-scale digitization, provide innovative features for users, and offer more efficient workflows for librarians and staff. To address these needs, the UH Libraries formed the DAMS Task Force in the summer of 2014. The group's goal was to identify a system that can support the growing expectations of the UHDL.

This presentation will focus on the DAMS evaluation that the task force completed. The evaluation process consisted of an environmental scan of possible DAMS to test, the creation of criteria to narrow the list down for in-depth testing, and comprehensive testing of the DSpace and Fedora systems. The presentation will conclude with a discussion of the task force's results as well as the lessons learned from the research and evaluation process. It will also reflect on the important role that collaboration, project management, and strategic planning played in this team-based approach to DAMS selection.

152
  • Adrian Turner

Since 2014, the California Digital Library (CDL) has been piloting the use of a metadata harvesting infrastructure to aggregate unique collections from across the 10-campus University of California library system -- and from libraries, archives, and museums throughout the state. These collections are now available through the newly redesigned Calisphere (http://calisphere.cdlib.org/) website in addition to the Digital Public Library of America (DPLA). Our relatively new metadata harvesting infrastructure, which adapts DPLA's early code base, has made it easier for contributors to share collections. We're now able to aggregate a much larger range of resources than before: 400,000 objects are now available in Calisphere -- an immediate 70% increase in content from the previous site. However, there are challenges to scaling and streamlining our processes, from staging collections for harvest through to quality-control checking of the results. This talk will highlight where we've been with metadata aggregation, and where we're planning to go. We'll discuss points of pain and lessons learned with the existing infrastructure. We'll also report on new requirements that we've developed, and directions that we are planning to take to improve on and ramp up our processes.
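
The abstract doesn't specify which harvest protocols are in play, but OAI-PMH is a common choice for this kind of aggregation; as an illustration only, here is a minimal Python harvester that pages through ListRecords responses by following resumption tokens (the endpoint URL is a placeholder):

    import requests
    import xml.etree.ElementTree as ET

    OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

    def harvest(endpoint, metadata_prefix="oai_dc"):
        """Yield <record> elements from an OAI-PMH endpoint, following
        resumptionTokens until the repository reports no more pages."""
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        while True:
            resp = requests.get(endpoint, params=params)
            resp.raise_for_status()
            root = ET.fromstring(resp.content)
            for record in root.iter(f"{OAI_NS}record"):
                yield record
            token = root.find(f".//{OAI_NS}resumptionToken")
            if token is None or not (token.text or "").strip():
                break
            params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

    # for rec in harvest("https://example.org/oai"):  # hypothetical endpoint
    #     ...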

149
  • John LaDue

In 2011, the University of Pittsburgh Health Sciences Library System (HSLS) was licensing an expensive Electronic Resource Management System (ERMS) that communicated poorly with our Integrated Library System (ILS), provided no individual title detail or usage statistics for non-journal e-resources, and did not offer an intuitive user interface. With the ERMS license due for renewal, we had four options: (1) continue with the current product and hope for improvements, (2) move to a new licensed tool, (3) adapt an open-source tool, or (4) build our own ERM from scratch. Knowing that the first option was never really an option, and how much work the fourth option would be, we had high hopes for options two and three. Unfortunately, none of the proprietary or open-source tools we examined in our environmental scan met the library's needs.

This talk will introduce HERMIT, the HSLS Electronic Resource Management and Information Tool, focusing on five key stages in the development process:
- Gathering requirements through initial surveys and interviews with potential stakeholders and users
- Adapting the Digital Library Federation's Electronic Resource Management Initiative to meet our needs
- Building the web-based product using CakePHP
- Testing and launch in 2012
- Ongoing maintenance and additional features

In addition to sharing our experiences developing a complex library tool, this talk will also demonstrate HERMIT's features, including:
- Individual title detail
- Near real-time data sharing with our ILS
- Usage statistics
- Vendor and contact data
- Reporting

147
  • Jeffrey Spies

Openness is inclusivity, and open data fosters innovation. But access to scholarly (meta)data is exclusive, and innovation is stifled. As long as this data is closed and/or programmatically inaccessible, only the copyright owners and/or groups that purchase/license the rights can create tools using that data. For example, no one can build a competitive open source Google Scholar. Theses and dissertations cannot produce the next Scopus or Web of Science. Further, scholars outside of these groups cannot conduct meta-science/scholarship. Scholars cannot use their own methods to study themselves--they have no data. SHARE looks to change this.

SHARE--a collaboration between ARL, AAU, APLU, and the Center for Open Science--is building a free, open data set about research and scholarly activities across their life cycle. Built on an entirely free, open source software stack, SHARE is collecting, connecting, and enhancing metadata that describes research activities and outputs--from data management plans and grant proposals to preprints, journal articles, and data repository deposits.

This presentation will cover the goals, challenges, and software stack of SHARE, culminating in an invitation to get involved. There is plenty of work to do, and SHARE is a community-based project. There are no competitors, only collaborators. Anyone can contribute, no matter their technical skill level. Openness, after all, is inclusivity.

144
  • Bruce Washburn

How do we ascertain truth on the Web? That's a question being pursued by researchers at Google who have articulated a flow of data that generates discrete statements of fact from countless Web sources, relates those statements to previously assembled stores of knowledge, and fuses them mathematically to identify which statements may be more "truthful" than others. They describe this assembly of scored statements as a "Knowledge Vault." As OCLC works with data from library, archive, and museum sources, we may benefit by taking a closer look at the Google Knowledge Vault idea to see how it applies to a vault of library knowledge. In this discussion we will describe how OCLC is:
- extracting simple statements about entities and their relationships from bibliographic and authority records,
- establishing a relevance score for similar statements provided by different sources,
- viewing the Library Knowledge Vault data using a prototype application, and
- testing how statements contributed by users of that prototype can find their way back to the Vault.
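
Google's fusion models are far more sophisticated, but the core idea -- combine per-source confidence for the same statement -- can be sketched in a few lines. In this illustration the source weights and the triple format are invented, and the fused score is simply the probability that at least one asserting source is correct, treating sources as independent:

    from collections import defaultdict

    # Invented per-source trust weights between 0 and 1.
    SOURCE_WEIGHT = {"authority_file": 0.9, "bib_record": 0.6, "user_contribution": 0.3}

    def fuse(statements):
        """statements: iterable of (subject, predicate, object, source) tuples.
        Returns each distinct triple with a naive fused score."""
        p_all_wrong = defaultdict(lambda: 1.0)
        for s, p, o, source in statements:
            p_all_wrong[(s, p, o)] *= 1.0 - SOURCE_WEIGHT.get(source, 0.1)
        return {triple: 1.0 - wrong for triple, wrong in p_all_wrong.items()}

    statements = [
        ("ex:personA", "ex:birthDate", "1817", "authority_file"),
        ("ex:personA", "ex:birthDate", "1817", "bib_record"),
        ("ex:personA", "ex:birthDate", "1818", "user_contribution"),
    ]
    print(fuse(statements))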

140
  • Weiwei Shi

University of Alberta Libraries is currently building a centralized Digital Asset Management System (DAMS) to consolidate the various digital platforms and repositories we have developed over the years, including a seven-year-old institutional repository based on Fedora 3. The new system is based on the Fedora 4/Hydra/Sufia stack. Phase 1 of the project focused on implementing institutional repository functionality and migrating digital objects from the existing system. The new system launched successfully in October 2015, inevitably with a few stumbles and tumbles along the way. In this presentation we will give an overview of the steps we took to plan and execute the content and service migration, the expected and unexpected challenges we faced during application deployment and the migrations, and the valuable lessons we have learned.

139
  • Dan Gillean

Access to Memory (AtoM) is an open-source application for standards-based description and access in a multilingual, multi-repository environment. Built around the concept of easy-to-use templates based on international and national content standards, AtoM was originally developed for use in an archival context. However, the flexible nature of the application makes it easy to adapt and use in a library special collections context without further development. This session will discuss the history and mission of the AtoM project and introduce users to some of AtoM's powerful features, including built-in crosswalking between standards templates, user interface label and menu customization, digital object support, digital rights management, and integration with the open-source digital preservation system Archivematica.

138
  • Dan Kerchner, Laura Wrubel, Justin Littman

GW Libraries developed and released Social Feed Manager (SFM) in 2012 to help faculty and students study collections of tweets. SFM has been a useful workhorse in providing this new service to researchers and archivists at GW and beyond, but over three years of practical and sometimes painful experience in this area led us to rethink our approach. We'll describe how we're rebuilding the app from the ground up to create a more archivist- and researcher-friendly, sustainable service that is easier to deploy and allows users to define their own social media collection strategies without mediation. Its new modular design scales up to harvest from several platforms, including Twitter, Flickr, Tumblr, and Sina Weibo, and potentially from more platforms at greater volume. We'll discuss our approaches to process scheduling, queueing, containerized deployment, export and delivery, using the WARC format to store social media content, and using messaging to connect multiple harvesters, along with our specification for writing new harvesters that integrate with SFM. Cultural heritage and research institutions now recognize the value of building collections of social media content for research and archival purposes, and we hope SFM might provide a solid foundation for doing so at your institution.
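
SFM's own storage layout and harvester specification are what the talk covers; purely as a generic illustration of the WARC idea, here is a minimal Python sketch that stores already-harvested API payloads as resource records in a gzipped WARC file using the warcio library (the URI and file names are placeholders):

    import json
    from io import BytesIO

    from warcio.warcwriter import WARCWriter

    def write_posts_as_warc(posts, warc_path, source_uri):
        """Store a list of harvested social media posts (already parsed JSON)
        as resource records in a gzipped WARC file. This is a generic
        illustration, not SFM's actual storage layout."""
        with open(warc_path, "wb") as fh:
            writer = WARCWriter(fh, gzip=True)
            for post in posts:
                payload = json.dumps(post, ensure_ascii=False).encode("utf-8")
                record = writer.create_warc_record(
                    source_uri,
                    "resource",
                    payload=BytesIO(payload),
                    warc_content_type="application/json",
                )
                writer.write_record(record)

    # write_posts_as_warc(harvested_tweets, "tweets.warc.gz",
    #                     "https://api.twitter.com/1.1/search/tweets.json")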

130
  • Amy Jiang

We are at a time when hardware manufacturing costs have dropped dramatically. The Arduino and Raspberry Pi place nearly limitless possibilities in programmers' hands. This session will focus on how a library can identify applications that are costly to buy but are good candidates to develop in house using a Raspberry Pi. The speaker will discuss the digital display system the library is running now: built on a Raspberry Pi for under $100, it serves the same purpose and functions as commercial products that cost at least $1,000. The exciting part of this case study is showing that we are at a revolutionary time for making products in house. The speaker will also discuss the possibility of developing low-cost collaborative tables, interactive touch screens, and other devices that might eventually make the library an innovation center for the university and the local community.

121
  • Nathan Rogers

Read-Eval-Print Loop (REPL) toolchains are a powerful way to develop and debug applications of any complexity. PsySH lets you bring this power to your PHP environment and is even compatible with Drupal. In this talk I will give a brief overview of how to set it up, walk through some simple examples, and then demonstrate how it can change the way we look at the development process.

109
  • Jason Fleming

At the beginning of summer 2015, we received funding for two standing kiosks and two touch monitors to create a building directory for the library. We designed an interface to give students, faculty, and guests an easy way to find people, books, and library spaces in a friendly, unmediated manner. In addition, we created a way to survey users about topics on which we would like their input, and a quiz that highlighted our resources and provided prizes for participants. The library is open 24 hours but is not staffed overnight. To build the interface, we created a Drupal View and worked with campus to set up Windows 7 PCs to create a kiosk experience that was safe for us and responsive for users.

108
  • Patrice-Andre Prud'homme

From the compilation of this exemplary collection of short poems, the Ogura Hyakunin Isshu, in 1235 to its interpretation in ceramics exhibited at the Tokyo American Club in 1981, this presentation will examine the interactive digital transformation of a collection of one hundred pieces of pottery, displayed using a combination of HTML, PHP, CSS3, and JavaScript with the jQuery library. Along with these existing technologies, Object2VR was used to produce a programmatically generated interactive display from the multiple generated images.

Through his work, the potter Mitsuya Niiyama reveals the nature of Japanese sensitivity. The purpose of this presentation is three-fold: 1) demonstrate the collaborative work of the digitization process across areas of interest inside and outside an academic library, 2) explore the innovative process of producing an interactive presentation by getting the most out of existing tools and by integrating other types of tools such as Object2VR, and 3) find solutions for building a website when the balance between visual appeal and practical design is key to rendering information accessible in a variety of formats.

103
  • Scott Fradkin

How can we lower the barriers to teaching programming to kids and adults? Let's use music! By using Sonic Pi software on a Raspberry Pi, Mac, or Windows computer, kids and adults alike can learn how to program by creating their own music. We'll take a short tour through Sonic Pi and see how easy it is to make some great tunes while learning real programming concepts.

101
  • Eric Weig

My talk will outline the creation of a PHP viewer that works within the Blacklight discovery platform and displays harvested hypertext newspapers using an XML structure designed to hold the harvested HTML as CDATA. The viewer will also be demonstrated.
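
The viewer itself is PHP; just to illustrate the packaging idea, here is a minimal sketch in Python (using lxml, with an invented element layout) of wrapping harvested HTML in a CDATA section so it survives untouched inside an XML record:

    from lxml import etree

    def wrap_page(issue_id, page_number, html):
        """Build an XML <page> element (invented layout) that carries the
        harvested HTML untouched inside a CDATA section."""
        page = etree.Element("page", issue=issue_id, number=str(page_number))
        content = etree.SubElement(page, "content")
        content.text = etree.CDATA(html)
        return page

    html = "<p>Harvested newspaper text &amp; markup stays intact.</p>"
    print(etree.tostring(wrap_page("1897-05-01", 1, html), pretty_print=True).decode())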

78
  • Janelle Varin

This report looks at three treatises on orchestration written around the turn of the twentieth century and draws conclusions about their content based on text mining and analysis. It then compares the results and determines possible reasons for the differences and similarities. The question being answered is: what effect, if any, do date of publication and the author's main area of employment have on the topics discussed in a treatise on orchestration? To answer this question, I analyzed three texts gathered from the Internet Archive using the online software Voyant Tools.
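
Voyant Tools runs this kind of analysis in the browser; as a rough illustration of the underlying term-frequency comparison, here is a minimal Python sketch (file names and the stopword list are placeholders) that reports the most common content words in each plain-text treatise:

    import re
    from collections import Counter

    # Placeholder file names for the three plain-text treatises downloaded
    # from the Internet Archive.
    TREATISES = ["treatise1.txt", "treatise2.txt", "treatise3.txt"]

    # A tiny stopword list for illustration; Voyant ships a much fuller one.
    STOPWORDS = {"the", "and", "of", "to", "a", "in", "is", "it", "for",
                 "be", "or", "as", "with", "that", "are"}

    def top_terms(path, n=20):
        """Return the n most frequent non-stopword terms in a plain-text file."""
        with open(path, encoding="utf-8") as fh:
            words = re.findall(r"[a-z']+", fh.read().lower())
        counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
        return counts.most_common(n)

    for path in TREATISES:
        print(path, top_terms(path))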