This talk will introduce a number of natural language processing techniques and their applications in computational linguistics and machine learning. Attention will be given to data preparation and model building, as well as the statistical and theoretical underpinnings of many current techniques.
Examples will be derived from experiments using DPLA metadata as a document corpus. Techniques discussed will include clustering with Latent Dirichlet Allocation (LDA), feature vector generation using continuous bag of words (CBOW), and semantic encoding vectors built with neural networks like Doc2Vec. Additionally, classification, clustering, and recommendation techniques using the output of these models will be examined. Current research in these areas will be explored, including application to a variety of common text analysis problems.
The presentation will conclude with a demonstration of attempts to use Twitter profiles and recent tweets to build a search vector for querying DPLA according to vector cosine similarity and nearest neighbor algorithms.
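As a toy illustration of the nearest-neighbor querying described above (not the presenter's actual code), ranking documents by cosine similarity against a query vector might look like the following, with made-up three-dimensional vectors and document ids standing in for Doc2Vec embeddings of DPLA records:

```python
# Rank documents by cosine similarity to a query vector, then take the
# top k as nearest neighbors. Vectors and ids here are illustrative.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_neighbors(query_vec, doc_vecs, k=3):
    """Return the ids of the k documents most similar to the query."""
    scored = sorted(doc_vecs.items(),
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

docs = {
    "maps": [0.9, 0.1, 0.0],
    "letters": [0.1, 0.8, 0.3],
    "photos": [0.7, 0.3, 0.1],
}
query = [0.8, 0.2, 0.0]
print(nearest_neighbors(query, docs, k=2))  # → ['maps', 'photos']
```

In practice the query vector would be inferred from a Twitter profile's text by the trained Doc2Vec model, and an approximate nearest-neighbor index would replace the brute-force sort for large corpora.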
When we don’t negotiate, what do we lose? There’s a systemic reluctance to negotiate for a higher salary or benefits package, especially among women in tech, but this reluctance is not confined to women. There is a pay gap; helping to close it requires negotiating with employers to compensate us for our actual worth. If we do not negotiate and we undersell ourselves, we allow that wage gap to remain.
Salary negotiation is intimidating and difficult to navigate if you don’t have the right tools and knowledge to do so. You need to know how to assess your value, identify target salaries, ask the right way, respond to red flags, and evaluate a total benefits package. Negotiating takes having a strategy in place and knowing tactics to be successful. Of course, success isn’t always the outcome and you may choose to walk away from a job offer. Knowing your breaking point is also important.
We’ve had our fair share of successes and failures in past negotiations at a variety of institution types (public and academic libraries, state government, and private corporations). This talk will present what we’ve learned and share tools and tactics that helped us along the way.
Participatory User Experience Design with Underrepresented Populations: A Model for Disciplined Empathy
At Code4Lib 2014, Sumana Harihareswara delivered a keynote titled, “UX Is A Social Justice Issue.” Harihareswara encouraged attendees to practice a disciplined empathy in the work of designing library services, because “a better user experience is the best force multiplier we have at our command.”
This inspirational keynote motivated a new initiative at the Montana State University Library: User Experience with Underrepresented Populations (UXUP). The goal of the UXUP project is to create better user experiences for all users in our community—from the mainstream to the margin. To achieve this goal, we followed the methodology of Participatory Design, which has its roots in Scandinavian industrial design of the 1980s. Participatory Design begins with the idea that the user and the designer each possess skills and perspectives of equal worth. This equity is realized through a practice of collaborative power-sharing and decision-making that deeply connects the user with the design outcome. For the UXUP project, our library worked with our Native American community to develop a participatory design practice anchored by a Native American student group empowered to make design decisions.
This talk will overview our design methods, including website usability and emotional response assessment, journey mapping, interviews, and the student working group. We will also describe our design outcomes, including outreach programming, website enhancements, and physical space improvements. The UXUP project can serve as a model for empathetic and collaborative design, with the ultimate outcome of creating more inclusive library experiences for all users.
Many information professionals deal with impostor phenomenon (IP): an observed anxiety caused by one’s feelings of fraudulence, fear of being exposed as a fraud, and inability to internalize personal achievement. These feelings negatively impact our work, potential for future projects, and perception of our self-value. Feelings of IP extend into conference participation and attendance in which members of our community believe that they do not possess the skills to attend or benefit from conferences such as Code4Lib.
Current research on IP within library technology positions is scarce. This leaves our community with some questions: 1) How is IP affecting professional development and involvement? 2) Is IP influencing Code4Lib attendance and participation?
Through a survey, this study will measure 1) levels of IP within the Code4Lib and information professionals’ community, and 2) how IP influences community members’ perception of Code4Lib. Based on survey responses and research, this presentation will offer solutions as to how the Code4Lib community can help combat and prevent IP. These solutions can be applied to individuals of all skill and experience levels to promote confidence, professional development, and community involvement.
As a developer or librarian in library technology, you probably ask yourself questions like “What do I need the system to do?” all the time. How would things change, if you rephrased the question to be user oriented like "What are the problems I want library software to help solve for my users?”
Whether you are buying software or services or developing applications, changing the conversation about requirements from features sets to a focus on end-user problems can open the door to more creative and innovative solutions and build better collaborations with your developer colleagues, your library staff, and your service providers.
For example, when you think about APIs and integrations, providing a basic understanding of the problems you are trying to solve for the end-user creates the opportunity to review the multiple possible solution paths outside the context of existing system designs and capabilities.
This talk will focus on strategies and methods that you can use to reframe the conversations about library software to foster increased understanding and innovation within the library software ecosystem.
The DevOps Handbook is a guide to re-organizing your workplace for greater speed in producing new ideas, more stability in key systems, and increased satisfaction among employees and customers. If technology should be at the centre of libraries, here is one way to make that real. Development becomes Public Services, Operations becomes Systems, and Products become Services. Public Services, traditionally handled by librarians and library technicians, is merged with the system admins, web developers, and programmers usually found in the Library IT department. By combining forces, the process of turning ideas into reality becomes streamlined. When getting things done takes less time, assessment is that much more meaningful, allowing for continuous improvement of services. With IT and librarians working more on the same page, priorities become shared, trust between departments improves, and the patron’s experience becomes better. Can this magical world exist in real life?
Although no one would argue that web accessibility is unimportant, the current state of library search catalogs indicates otherwise. Many existing products barely, if at all, pass the most basic of accessibility standards. Even then, an accessible site may not be efficiently usable for a disabled user. We certainly need to do better, but what would better (or even best) look like? Is displaying results in a list or table better for screen readers? Should we present more results via pagination, infinite scrolling, or a mix of the two?
Answering such questions is not immediately straightforward. Users of different [dis]abilities have different priorities on what makes a site both accessible and usable for them. We must also support the needs of other stakeholder groups beyond the catalog users. The site should not hinder library staff, including web designers customizing the look-and-feel and librarians working directly with patrons, from readily completing their duties. Thus, any design decision should clearly articulate the reasoning and tradeoffs among the competing priorities. Creating a "gold standard" of accessibility for library search catalogs is certainly a difficult task but is not insurmountable. This talk will begin this conversation. Through code examples, I will discuss the needs, priorities, and complexities behind designing for the diverse understanding of accessibility necessary for building the most accessible catalog results page ever.
We've heard a lot about natural language tools and interfaces over the past several years--Google's Translate, IBM's Watson, Apple's Siri--but what's the true state of the art once we've parsed the fine print and extracted the key ideas?
Takeaways from this talk:
• Natural language understanding remains a monumental challenge because computers typically lack the situational context and world knowledge necessary to make sense of writing and speech.
• Despite the hype about big data driven machine learning techniques, these challenges have not been meaningfully addressed by new technologies.
• There are, however, a handful of interesting (and even useful) tasks that NLP technology is currently capable of tackling effectively.
In the Digital Library Initiatives department at NCSU Libraries, we often have new professionals, student workers, and full-time staff who are interested in advancing their software development skills. Through colleague-led opportunities like an informal discussion series related to software development, reading groups (both code and literature), code reviews, and a low-barrier, department-wide scrum-like process, we have been seeking to foster a greater culture of peer mentorship, team building, and sharing/learning opportunities. We’ve also extended this beyond our department, offering technical mentorship opportunities to NCSU Libraries’ Fellows, new librarians who have assignments in a broad range of departments. This includes group discussion, peer-to-peer consultation, and compiling lists of technical resources to assist them in learning skills that will help them in their future professional careers. This talk will describe what we’ve done, what we’ve learned, how it has affected the general culture of our department, and what we are thinking of for the future to sustain these types of programs.
Over the years, libraries have searched for software solutions to satisfy their needs. With the advent of open-source solutions, we suddenly have great flexibility in the customizations we can make to existing software products to fit our needs. But have we gone too far? Are we making organizational decisions based on existing software tools rather than evaluating our needs and building solutions around them?
Building your own solution is not cheap, but neither is adapting an existing open-source solution. Building your own tool has many advantages: your organization gets exactly what it needs, and it knows the code inside out, the architecture that supports it, and the organizational decisions behind it. Software maintenance on a known codebase is simpler than on a codebase that you inherited. What about the complexity of the code? A tool that does one specific thing for your organization is simpler than one that does many things that you might not need.
This session will cover pros and cons when deciding whether to build our own tool vs adopting an existing one, as well as the long-term considerations of creating and maintaining software systems that drive critical areas of our organizations.
This presentation will show how the worldwide surge of work on distributed technologies like the InterPlanetary File System (IPFS) opens the door to a flourishing of community-oriented librarianship in the digital age. The centralized internet and the rise of cloud services have forced libraries to act as information silos that compete with other silos to be the place where content and metadata get stored. We will look at how decentralized technologies allow libraries to break this pattern and resume their missions of providing discovery, access and preservation services on top of content that exists in multiple places.
Making Your Library IT Defensible: 5 Easy Things To Prevent 85% of All Targeted Cyber Intrusions
All libraries can easily follow the top 5 mitigation strategies, which block the vast majority of all attacks:

• Use application whitelisting to help prevent malicious software and unapproved programs from running.
• Patch applications such as Java, PDF viewers, Flash, web browsers, and Microsoft Office.
• Patch operating system vulnerabilities.
• Restrict administrative privileges to operating systems and applications based on user duties.
• Harden user application configurations.

We’ll cover the cheap/free tools you can run on any OS in the library to make things safer for everyone.
This talk will introduce the concept of Coordinated Discovery (CD) at the University of Wisconsin-Madison Libraries. CD aims to provide a coherent experience across resource types—bibliographic, digital collections, article, etc.—while at the same time optimizing the experience of any one type.
Discovery across resource types is coordinated rather than aggregated in the sense that we do not provide a single global search as exemplified by the super-index products, e.g. Primo, Summon, EDS, nor do we aggregate results from different sources on a single page bento box style. In CD, patrons begin their searches against the index for a single resource type, e.g. digital collections, viewing search result sets and full record or object displays tuned to the features and behaviors of that type. Behind the scenes a suggester service evaluates search parameters and propagates searches using local and vendor APIs to indexes for other appropriate resource types. When results from these sidebar searches satisfy closeness-of-fit rules, we display links to these possibly interesting resources in “ad space” sidebars. Transitions from searching one resource type to another are smoothed by forwarding the search to another resource “bucket”, along the lines of Google’s search interface.
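A minimal sketch of the suggester pattern just described; the function names, stub indexes, and closeness-of-fit thresholds are illustrative assumptions, not the UW-Madison implementation:

```python
# Each "index" is a callable standing in for a local or vendor search API;
# it takes a query and returns (hit_count, top_relevance_score).
def suggest(query, indexes, min_hits=3, min_score=0.5):
    """Run sidebar searches against other resource-type indexes and
    return the types whose results satisfy the closeness-of-fit rules."""
    suggestions = []
    for resource_type, search in indexes.items():
        hits, top_score = search(query)
        if hits >= min_hits and top_score >= min_score:
            suggestions.append(resource_type)
    return suggestions

# Stub indexes with canned results, for illustration only.
indexes = {
    "articles": lambda q: (12, 0.8),
    "digital_collections": lambda q: (1, 0.9),   # too few hits
    "catalog": lambda q: (40, 0.3),              # many hits, poor fit
}
print(suggest("wisconsin dairy history", indexes))  # → ['articles']
```

In a real deployment the sidebar searches would run asynchronously so they never delay the patron's primary result set.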
This talk will discuss our motivations and philosophy for Coordinated Discovery and the technology and development we have invested in to support it.
The ubiquity of Google Drive solves many problems (file sharing, web publishing, bulk editing) that are cumbersome to build in a home-grown application.
This presentation will describe the Google Apps Script platform and the APIs available on the platform. Citing sample applications built for the Georgetown University Library, this presentation will describe the various ways that your custom code can be deployed for a library audience (formula function, embedded in a document, web service, document add-on).
You want to build a blazingly fast, secure, and feature-rich website. Perhaps it needs to include dynamic features like search, responsive design, and easy editing options for content creators. A static website might not be the first tool that comes to mind to accomplish this task, but it might actually be exactly what you are looking for. Static sites are fast, incredibly secure, and have a low technical barrier to deployment. Current static site generators, such as Jekyll, automate the painful aspects of static website creation and ongoing maintenance, greatly reducing the long-term cost of running the site.
This talk will focus on the migration of a Rails website to a static website, undertaken to lower the technical debt of ongoing maintenance while continuing and enhancing support for feature-rich aspects of the site such as search and content creation. The resulting website is responsive, simple to deploy, inexpensive to maintain, and provides a simple solution for ongoing content creation.
This presentation will be an overview of the current state of the BagIt ecosystem. BagIt is an important part of transferring and storing digital assets. The format has several open source libraries and tools that can be used for creating BagIt archives. Despite this, many of the tools are not fully featured, and some have had features removed over time. In this presentation we will talk about how we have modified the Ruby and Python implementations in order to get functionality that we require and our experiences testing existing libraries. We'll also discuss some possibilities for future development of bagging tools and the implications of using open-source software for digital preservation.
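To ground the discussion, here is a spec-level sketch of what creating a bag involves, using only the standard library: a `bagit.txt` declaration, a `data/` payload directory, and a checksum manifest. This is a teaching toy, not a conformant implementation; real projects should prefer a maintained library such as the Library of Congress bagit-python.

```python
# Minimal BagIt-style bag writer (illustrative, not fully spec-conformant).
import hashlib
import os

def make_bag(payload, bag_dir, version="0.97"):
    """Write files from the payload dict {relative_path: bytes} into a
    BagIt structure: bagit.txt, a data/ payload, and a sha256 manifest."""
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir, exist_ok=True)
    manifest_lines = []
    for rel_path, content in payload.items():
        path = os.path.join(data_dir, rel_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{rel_path}")
    with open(os.path.join(bag_dir, "bagit.txt"), "w") as f:
        f.write(f"BagIt-Version: {version}\n"
                "Tag-File-Character-Encoding: UTF-8\n")
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), "w") as f:
        f.write("\n".join(manifest_lines) + "\n")
```

Validation is the inverse: re-hash each payload file and compare against the manifest, which is exactly the functionality that varies most across the existing tools.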
HTML5 has become the preferred, widely supported method of serving video on the web. This is a tools talk...but not an Amazon sales pitch. I was genuinely surprised at how easy yet flexible Amazon's Elastic Transcoder (ET) is, and how well it dovetails with Amazon's other services; I think other people will be just as surprised.
Over the last year, NYPL has been focused on transcoding decades (and terabytes) worth of archival-quality video into high-quality video accessible via HTML5. Using AWS has allowed us to focus on making the media and its technical and intellectual metadata available without having to own the responsibility of running and monitoring a fleet of transcoding machines.
Metadata is frequently cited as the most significant cost of a digital library project. As such, care should be taken to ensure that metadata is created at a quality high enough that it does not become a liability to current and future library applications. Agile software development uses the concept of "technical debt" to place metrics on the ongoing costs associated with low-quality code and to determine resource allocation for strategies to improve it.
This talk presents the technical debt metaphor in the context of metadata management, identifies common themes across the qualitative and quantitative literature related to technical debt, and connects them to similar themes in the literature on metadata quality assessment. It presents a set of metrics for determining metadata management debts and taking steps to pay them down. Finally, it concludes with areas of future research in the area of technical debt and metadata management, and ways in which the metaphor may be integrated into other current avenues of metadata research.
Intentionally Horrible Markup: Strategies for Testing and Enhancing Web Accessibility on a Larger Scale
As more institutions seek to improve their user experience through advancement of web accessibility of interfaces and content, the need emerges for a set of patterns to serve as a framework for accomplishing this. Focusing on strategies for improving web accessibility on a larger, distributed scale, this talk will detail models for continuous accessibility testing and validation by developers, content editors, and site auditors in production and development sites.
Methods for judiciously evaluating software frameworks for their conformance and adaptability to web accessibility measures will be detailed, as well as techniques for selecting and incorporating tools and methodologies into accessibility enhancement efforts, such as automated validators, test-driven tools for software development, and workflows for accessibility testing with end users.
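As a flavor of what an automated check in such a pipeline might look like, the toy validator below flags `<img>` elements that lack alt text, using only the standard library. Real continuous-testing setups would use mature validators (axe, pa11y, and the like); the class and function names here are invented for illustration.

```python
# Toy accessibility check: report <img> tags with no alt attribute.
from html.parser import HTMLParser

class MissingAltChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # dict(attrs) maps attribute names to values for this start tag.
        if tag == "img" and "alt" not in dict(attrs):
            self.violations.append(self.getpos())  # (line, column)

def check_images(html):
    checker = MissingAltChecker()
    checker.feed(html)
    return checker.violations

page = '<p><img src="logo.png"><img src="photo.jpg" alt="Reading room"></p>'
print(check_images(page))
```

Wired into a CI job, a check like this fails the build whenever new templates introduce violations, which is one concrete way to make accessibility testing continuous rather than episodic.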
This talk will address the need for multiple approaches to accessibility testing at different levels of creation and enhancement of content and software for the web, and will provide suggested strategies for improving accessibility in development, remediating it in production, and crafting a stellar set of tools that all members of an organization can use to create software, interfaces, and content that can truly help to craft an open, fair, and equivalent user experience for all.
The apps-on-a-platform model is a growing paradigm -- iOS and Android, Salesforce’s Force, and Google’s G Suite are examples. FOLIO is an app platform created for the needs of libraries. With built-in support for descriptive metadata, patrons, and other library-related content types, FOLIO provides a toolkit for app developers to build new functionality atop traditional data sources. FOLIO is also a user-experience-centered project; the user interface is designed and tested first using a prototyping tool and then the code is created to match. The platform is providing libraries an avenue for thinking "outside the monolithic box" for new ways to provide services to patrons.
Participants will learn how the user interface prototype informs code development and how data in the domain models are expressed within the RESTful microservices. Participants will also learn about the roadmap for the platform and app development through to the minimum viable product expected in 2018. Lastly, participants will hear about the community of users engaging in FOLIO development and how to join in.
Like the weather, everybody talks about Docker, but few do anything with it. To be useful, Docker needs to be part of an integrated continuous integration and deployment workflow. This session will explain how we selected tools and environments to make Docker work for us.
This talk will describe the design and features of the new, totally re-written version of SolrMarc, an indexing tool for creating Solr add documents from MARC records. SolrMarc has been used by both Project Blacklight and VuFind to create the Solr index that they rely on for searching. The new version provides a more powerful and more configurable index specification language that is backwards-compatible with the previous version but adds significant new capabilities such as conditional qualifiers, specification modifiers, multiple translation maps and support for dynamically-compiled custom indexing methods. It can work with different versions of Solr without needing to be re-compiled, and is easier to install, configure and upgrade.
After completing a reorganization of our stacks, the Georgetown University Library needs to perform an inventory of the 900,000 books they contain. This project needs to be completed before migrating our catalog to a new ILS (integrated library system) vendor in 2017.
The inventory tool options provided by our ILS vendor were ruled out due to cost and complexity. After learning about a similar project at the University of Dayton Library, we decided to build our own application to support this effort.
The Georgetown University Library has developed an application that allows student workers with a barcode scanner and a Chromebook to rapidly move through the stacks and scan each item. The tool queries the ILS database to verify the item's status and location, while allowing the user to verify the Title, Call Number and Volume of each item. The tool will also highlight call number sorting errors with each scan.
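A simplified sketch of the shelf-order check described above: as items are scanned in sequence, flag any scan whose call number sorts before its predecessor. Real LC call number collation is considerably more involved; the normalization and function names below are toy stand-ins, not Georgetown's code.

```python
# Flag scans that appear out of call-number order on the shelf.
import re

def sort_key(call_number):
    """Crude LC-ish key: class letters, then the first number numerically."""
    m = re.match(r"([A-Z]+)\s*(\d+)", call_number)
    if not m:
        return (call_number, 0, call_number)
    letters, number = m.groups()
    return (letters, int(number), call_number[m.end():])

def find_order_errors(scanned):
    """Return indexes of scans that sort before the item scanned before them."""
    errors = []
    for i in range(1, len(scanned)):
        if sort_key(scanned[i]) < sort_key(scanned[i - 1]):
            errors.append(i)
    return errors

shelf = ["QA76 .D47", "QA76.9 .A25", "QA9 .B3", "QB54 .S15"]
print(find_order_errors(shelf))  # → [2]  ("QA9" belongs before "QA76")
```

Note what the naive lexical comparison would get wrong: "QA9" sorts after "QA76" as a string, which is exactly why the key compares the class number numerically.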
This presentation will describe the solution that we have developed and the challenges that were overcome during the project. The code (and a video demonstration of the project) are available at https://github.com/Georgetown-University-Libraries/BarcodeInventory.
Privacy-protection technology tools help individuals to thwart surveillance and data collection efforts by corporate, government, or other entities. Prior research findings indicate that the average Internet user is highly concerned with privacy and loss of control over the collection and use of their information. However, most users take little action to protect their privacy, either by making behavioral changes or through use of privacy-enhancing tools. In this complex landscape, information professionals play a vital role in conveying technical topics and advising their users and patrons as to what privacy-protection tools to employ. This talk will present the preliminary findings of a research study aimed at understanding information professionals' use and understanding of privacy-protection technology tools, as related to their mental models of how the Internet works. This talk will also provide an opportunity for the code4lib community to provide feedback on and potentially contribute to ongoing research in this area.
This talk will present on work at UW-Madison to enhance the library catalog with info cards for the authors of bibliographic works using Linked Open Data sources. At the heart of this work is BibCard (https://github.com/UW-Madison-Library/bibcard), a Ruby gem that serves as a reference implementation for working with bibliographic and Linked Data. Included in this presentation will be a discussion of working with RDF, graph structures and SPARQL in a web application environment. Discussion will include issues of speed, availability and reliability of Linked Open Data sets and the implications for suitability with core library services and possible solutions, such as caching strategies, to dealing with them. The goal of this work is to explore ways to provide greater context for library resources from data sets on the Web that are not traditionally curated by libraries.
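One caching strategy of the kind mentioned above, sketched in minimal form: memoize Linked Open Data lookups with a time-to-live, so the catalog can serve info cards without re-dereferencing a slow or unreliable endpoint on every page view. The class name, fetch hook, and TTL value are illustrative assumptions, not BibCard's actual design.

```python
# TTL memoization for slow/unreliable Linked Open Data lookups.
import time

class TTLCache:
    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self.fetch = fetch          # e.g. a SPARQL query or RDF dereference
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}            # uri -> (expires_at, value)

    def get(self, uri):
        entry = self._store.get(uri)
        if entry and entry[0] > self.clock():
            return entry[1]         # fresh cached value
        value = self.fetch(uri)     # cache miss or expired: re-fetch
        self._store[uri] = (self.clock() + self.ttl, value)
        return value
```

A production variant would also want negative caching for endpoints that are down and a stale-while-revalidate mode, so a dead SPARQL endpoint degrades the info card rather than the whole catalog page.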
Open data in government is an initiative designed to ensure that the public can freely access government data and easily share the data with other citizens. Future mandates will include transparency of both government funded research as well as federal spending. Our panel will facilitate a conversation about the current state of open data in government and highlight several case studies of current initiatives at FDA’s Center for Device and Radiological Health. These initiatives include but are not limited to a data catalog, a research project database that includes current and past projects and their budget data, and our collaboration on the Data Curation Network project.
Additionally, we will assess techniques that worked and those that didn’t work, and processes that can be used to bridge the gap between high level regulations and local level work in other agencies and libraries.
Finally, we will discuss the possible implications of how the upcoming administration will prioritize the initiative going forward.
So you’ve hired a talented team, developed a shared vision, and implemented a bunch of user-centered tools and services. Now, how do you keep it going? Following up on Sibyl Schaefer’s 2015 talk “Designing and Leading a Kick A** Tech Team,” we’ll discuss how our small team of archivists provides ongoing technical leadership and expertise to our organization, an independent archive. Informed by sustained engagement with users, we strategically deploy low-cost, loosely coupled and easily maintainable (and abandon-able) tools that solve specific user needs and build political capital. Using staff turnover as an opportunity, we look for ways to build confidence and competence with technology across the organization, resulting in shared operational commitments and collaborative project management. Through informal horizontal skill-sharing, we value the existing expertise and labor of staff, break silos of tech knowledge, foster ethical computational thinking, and have fun experimenting, learning and teaching together.
Over several weeks, our online catalog was hit by millions of ISBN searches from thousands of IP addresses, with several hundred thousand requests/day at the peak of the attack. I'll talk about how we tried, failed, and ultimately succeeded in blocking this attack while minimizing the impact on our legitimate users.
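One common building block for this kind of defense is a per-IP sliding-window rate limit, sketched below. The thresholds and class name are illustrative; a real deployment also needs allowlists for proxies and legitimate heavy users, and, as in our case, additional signals when the attack comes from thousands of addresses at once.

```python
# Per-IP sliding-window rate limiter (illustrative thresholds).
from collections import defaultdict, deque

class RateLimiter:
    def __init__(self, max_requests=60, window_seconds=60):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)   # ip -> timestamps of recent requests

    def allow(self, ip, now):
        q = self.hits[ip]
        while q and q[0] <= now - self.window:
            q.popleft()                  # drop requests outside the window
        if len(q) >= self.max_requests:
            return False                 # over budget: block or challenge
        q.append(now)
        return True
```

Per-IP limits alone fail against widely distributed attacks, which is why signatures on the requests themselves (here, the ISBN-search pattern) end up mattering as much as the counters.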
More than just deciding what to buy, modern collection development for libraries occupies an increasingly complex space, a space structured by forces like predatory vendor pricing, institutional pressures on research and teaching, the high cost of physical storage, and the migration from print to digital media. As we strive to navigate this space with collaborative, transparent, and data-savvy decisions about our collections, we need tools and processes adequate to the task. This presentation addresses this need by way of a question: What can collection development learn from software development?
You’ll probably like this one, too: Using circulation data to automate recommendations in a special collections library
Book recommendation systems are increasingly common, from Amazon to public library interfaces. However, in the land of archives and special collections, such automated assistance has been rare. This is partly due to the complexity of descriptions -- EADs describing whole collections -- and partly due to the complexity of the collections themselves -- what is this collection “about”, and how is it related to another collection?
The American Philosophical Society Library is using circulation data collected through Aeon to automate recommendations. In our system, recommendations are offered in two ways: based on interests (“You’re interested in X, other people interested in X looked at these collections”) and on specific requests (“You’ve looked at finding aid Y, other people who looked at finding aid Y also looked at these finding aids”).
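A minimal sketch of the "people who looked at Y also looked at Z" idea: count how often pairs of collections are requested together, then recommend the strongest co-occurring partners. The collection ids and session structure below are invented for illustration, and (in keeping with the privacy concerns discussed) only aggregated pairs are retained, never patron-level records.

```python
# Co-occurrence recommender built from anonymized request sessions.
from collections import Counter
from itertools import combinations

def build_cooccurrence(sessions):
    """sessions: list of sets of collection ids requested together."""
    pairs = Counter()
    for items in sessions:
        for a, b in combinations(sorted(items), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1      # store both directions for easy lookup
    return pairs

def recommend(collection_id, pairs, k=2):
    scored = [(count, other) for (seed, other), count in pairs.items()
              if seed == collection_id]
    return [other for count, other in sorted(scored, reverse=True)[:k]]

sessions = [{"franklin_papers", "peale_letters"},
            {"franklin_papers", "peale_letters", "darwin_corr"},
            {"franklin_papers", "darwin_corr"}]
pairs = build_cooccurrence(sessions)
print(recommend("franklin_papers", pairs))
```

Raw counts like these are exactly what produces the rich-get-richer concern raised below; normalizing by each collection's overall popularity is one standard way to surface less-used material.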
We will discuss the development of this system, the central role of patron privacy, possibilities for generalizing the project for other environments, and future plans. We will also discuss ongoing concerns and issues. For example, do these recommendations increase the use of already highly used collections at the expense of less well-known resources? What nuance are we missing by focusing on the collection-level data, and what alternatives could be developed?
Homing in on the needs of your patrons is crucial for libraries now and in the future. Academic libraries in particular are under increased scrutiny to demonstrate how the library, its collections, and its services contribute to student success.
To ensure that the work you do adds value to your library services, it’s important to identify the problem you are trying to solve first. This upfront investment in understanding the needs of the end-user results in solutions that are useful and used, providing enviable metrics. So how do you do that?
In this session, OCLC will share experiences for engaging with end users, verifying problem statements, and testing whether the proposed application will solve the problem or not. Addressing end user needs is a team effort and we will discuss high-level roles and responsibilities that you’ll need to cover to build a dedicated team.
Institutional repositories are at the mercy of would-be depositors, although this is somewhat mitigated by institutional requirements and workflows. Even so, some locally generated content is poorly represented or missed altogether. At Los Alamos National Laboratory we have policies and workflows intended to populate the IR. A complementary effort, Autoload, attempts to locate and harvest items that were missed. Autoload starts with basic metadata and, in some cases, a version of the publication, and uses the APIs of several public services in an effort to locate and retrieve copies of the material that are appropriate for archiving. Autoload contends with several challenges: 1) can the content be archived locally, 2) can we find an archivable version of the publication, and 3) how do we determine whether what we have found is the correct version? We look at the capabilities of, and challenges with, using Crossref, Microsoft Academic, SHERPA/RoMEO, and oaDOI as part of a harvesting pipeline. We also consider aggregating, matching, and verifying metadata and content. ResourceSync, a standard for resource synchronization, was implemented locally atop our IR to expose its content to search engines such as Google. It also provides a loosely coupled mechanism for initiating the Autoload pipeline for discovering and adding missing content to the IR in a timely fashion, without requiring modifications to the IR.
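To illustrate one step in a pipeline like this, the sketch below decides from an oaDOI-style lookup response whether an archivable open-access copy exists. The JSON fields (`is_oa`, `best_oa_location`, `url_for_pdf`) are modeled on the oaDOI/Unpaywall v2 response shape but should be treated as illustrative rather than a guaranteed match for the live API; the function name is invented.

```python
# Decide whether an OA-lookup response yields an archivable PDF.
import json

def archivable_copy(response_text):
    """Return the PDF URL of the best OA location, or None."""
    record = json.loads(response_text)
    if not record.get("is_oa"):
        return None                          # nothing open to harvest
    location = record.get("best_oa_location") or {}
    return location.get("url_for_pdf")       # may still be None (HTML-only)

sample = json.dumps({
    "doi": "10.1000/example",
    "is_oa": True,
    "best_oa_location": {"url_for_pdf": "https://example.org/paper.pdf"},
})
print(archivable_copy(sample))  # → https://example.org/paper.pdf
```

The harder problems named above (is this the accepted manuscript or the version of record, and may it be archived?) start after this step, which is where SHERPA/RoMEO policy data would enter the pipeline.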
The California State University libraries are moving to a shared library services platform (Alma) that allows developers to add, update, and delete data through RESTful APIs. We will present our experiences and future plans in creating code and applications that can be adapted by other institutions.
The Sufia project (http://sufia.io/) used to poll the server continuously for updates in order to alert our website users when something needed their attention. This technique has several drawbacks. Within the last few years, all web browsers in common use have begun to ship with WebSocket support. By using the ActionCable library, now bundled with Ruby on Rails, we were able to change the notification functionality in Sufia to take advantage of the benefits of WebSockets. In this talk we will discuss the benefits that WebSockets have over polling, the necessary server architecture, and how we refactored our client-side code. We'll take a close look at how we authenticate users and deploy the application. We'll show how ActionCable makes WebSockets easy.
Linked data promises to make library metadata more accessible and powerful. The clearly-defined URIs of linked data will form chains that lead to new connections and insights. But is there a flip side to such sharply-delineated data? Real life is messy and natural language doesn’t come with precise definitions. How will catalogers work in an environment that seems to require black-and-white conclusions? What happens when you have only incomplete information and it’s impossible or impractical to obtain the missing details? Sometimes the information you find is contradictory, with no clear resolution. Some things fit neatly into categories and some things don’t. Some questions don’t have objective, factual answers. Sometimes you don’t have the expertise to identify the answer. Sometimes user needs are in tension. Of course, these problems confront catalogers now, but they will become more prominent and problematic as we rely more on machine-actionable metadata.
Are there things that we might lose in the transition to linked data? How much will we be able to infer from converted data? What about the things that RDF isn’t so good at, such as grouping and ordering metadata statements?
This presentation will look at some challenges for cataloging in a linked data environment and discuss some possible approaches to handling them.
Sci-Hub is a surprisingly sophisticated website that does a good job of facilitating evasion of research article paywalls; use of Sci-Hub may be a violation of copyright law in many jurisdictions. So Sci-Hub's release, early in 2016, of complete, de-identified usage logs would be a serious user-privacy breach to the extent that users can be re-identified.
I have explored re-identification of the Sci-Hub logs. Because of the nature of scholarly literature, I find that usage can frequently be tied to small groups, and occasionally to single individuals. In particular, article DOIs provide convenient keys to databases containing personally-identifiable information. Libraries that rely on log de-identification or anonymization to protect user privacy should be aware of the limitations of these strategies.
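The mechanics of such a re-identification are simple enough to sketch in a few lines. This is a hypothetical toy example (names and DOIs are invented) of joining "anonymous" usage events against a public authorship database keyed on DOI:

```python
# "De-identified" usage events: no names, just DOIs and coarse location.
usage_log = [
    {"doi": "10.1000/aaa", "city": "Lansing"},
    {"doi": "10.1000/bbb", "city": "Lansing"},
]

# Public bibliographic metadata listing authors per DOI.
authors_by_doi = {
    "10.1000/aaa": {"J. Smith", "R. Jones"},
    "10.1000/bbb": {"J. Smith", "L. Chen"},
}

# Researchers often download papers that cite or extend their own work,
# so intersecting the author sets across one user's downloads can
# shrink the candidate pool rapidly, sometimes to a single name.
candidates = set.intersection(*(authors_by_doi[e["doi"]] for e in usage_log))
print(candidates)  # {'J. Smith'}
```

With real logs the join keys are richer (timestamps, geolocation, article topics), which only tightens the inference.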
You are attending - or teaching - a workshop on the latest tech hotness. The ad said it was "For Beginners -- No Experience Necessary". You get there and a third of the attendees don't have the right equipment and software, a third are on the verge of tears, and a third are bored out of their minds. What's worse, the presenters want to sneak out the back door. Attendees suck at self-selecting for these workshops because we suck at teaching for beginners. We need to be better at understanding what it means to teach for true beginners and at communicating the real expectations for attendees. This presentation will cover some ideas to get us on the right path for better experiences teaching and learning about technology.
Community-oriented projects keep library technology vibrant, and can provide life support in those times when it could otherwise sink. Whether open source or not, a stated commitment to community is something that we tend to look for in choosing a technical solution, digital project, or professional development opportunity. Understanding how to effectively and ethically improve community around a project--at whatever scale--can improve the project. I will present an analysis of the community efforts around a variety of library technology projects, from large, well-funded, and well-known to small but feisty efforts. Learn from history what works to grow and maintain a diverse community, potential pitfalls, and how to make it happen no matter how tiny you start.
Micro-volunteering is an easy, low commitment, and time-bound volunteer assignment. It is a 'byte-sized' way to make a difference by helping with micro tasks or sharing expertise, and lends itself to workplace volunteering. This presentation will provide an introduction to micro-volunteering, including benefits, how to create a micro-volunteering project, platforms, and volunteer management. From repository development to digital curation, we will share examples of utilizing micro-volunteers to bootstrap library technology projects. We will also discuss lessons learned and how micro-volunteering can be leveraged for no-cost professional development and employee engagement.
Information architecture (IA) can be a powerful tool for clarifying who you are as a library and what you’re doing both for your staff and for your users. Yet the IA skillset and the actual work that is done to produce an IA is rarely defined clearly. This talk will discuss ways in which IA work can support organizational change by establishing a common language and aligning it with how we present the work we do.
We will define what IA competencies are for librarians, discuss what IA can and cannot do for you, and provide examples of what IA work looks like. This will be presented from the perspective of web-based IA design and implementation, but will address IA broadly.
The Fedora 4 repository offers message based workflows through Apache Camel. AWS Lambda offers serverless computing resources that allow you to only pay for the compute time you actually use. And, the Amazon API Gateway gives you a way to create Web APIs that expose these Lambda resources to other workflows on the Web (like Apache Camel's). This talk proposes to plumb the usefulness of integrating Camel workflows and AWS Lambda through Amazon's API Gateway. In a series of experiments, we'll look briefly at storing repository contents in S3, indexing metadata in AWS' ElasticSearch, sending email or text alerts about repository activity from AWS' SES/SNS, and logging repository events to CloudWatch. These are simple examples but serve as proof of concepts for more sophisticated Fedora and AWS integrations, such as spinning up on-demand image processing. While Apache Camel offers out-of-the-box AWS components, using AWS Lambda provides more flexibility in the process, giving you greater control over Fedora's integration with AWS' myriad of services, and it allows you to offload the work to Amazon's servers.
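As a flavor of what one of these functions might look like, here is a hypothetical Python Lambda handler in the response shape API Gateway's proxy integration expects; the message fields and downstream actions are illustrative assumptions, not the talk's actual code:

```python
import json

def handler(event, context):
    """Hypothetical AWS Lambda handler behind API Gateway: receives a
    repository event message (e.g., forwarded by a Camel route) and
    returns the fields we would index or log downstream."""
    message = json.loads(event.get("body") or "{}")
    summary = {
        "resource": message.get("id", "unknown"),
        "type": message.get("type", []),
    }
    # In a real deployment, this is where we would index into
    # Elasticsearch, publish to SNS/SES, or log to CloudWatch.
    return {"statusCode": 200, "body": json.dumps(summary)}

# Simulate an API Gateway invocation locally:
resp = handler({"body": json.dumps({"id": "/rest/obj1", "type": ["Update"]})}, None)
print(resp["statusCode"])  # 200
```

Because the handler is a plain function, it can be exercised locally with fake events before wiring it to the gateway.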
As born-digital archives become a core part of archival processing, traditional processing archivists who lack Digital Archivist job titles face numerous challenges when meeting the processing and preservation needs of born-digital collections. Gaps in knowledge and skills, as well as institutional assumptions about who is responsible for the management of born-digital archives, can all be barriers for practicing archivists in traditional processing roles. On the other hand, relying on the expertise of one Digital Archivist per institution is not a sustainable practice, as it can overburden and overwork those in these roles. This presentation will focus on steps archivists can take to break through these barriers, based on the collaborative practices of the Manuscript Division processing team at Princeton University.
Our talk will discuss steps the Manuscripts Division has taken to manage its born-digital archives, including building our own digital processing workstation; creating extensible workflows, documentation, and policies; and investigating means of access informed through collaboration with our colleagues at Princeton and with those in the profession at large. Our hope is that sharing our experiences can empower more archivists to reach across their institutions and the profession to meet the challenges of managing born-digital collections, no matter what their job descriptions entail.
Bento-box search has been widely used as users’ initial gateway into a library’s offerings, where the typical candidates for inclusion in search results are resources like the catalog, articles, journals, and databases. However, it has not been broadly applied to special collections. At NCSU, we’ve implemented a second bento-box-style search, Historical State Search, dedicated to our Special Collections (SC) materials, integrating search across different relevant platforms (the SC database, a historical events API, the general catalog, etc.). We recently migrated this project onto the QuickSearch open source toolkit that we released earlier in 2016, and are currently revamping the website to reflect more content from different collections and provide better access overall. This talk will share what we learned: why we decided to use bento-box search for special collections, the design decisions we made, the improvements made to QuickSearch in the process, and the migration from a homegrown system to the QuickSearch open source project, to help you understand how you can use this code base for your own special collections.
Increased emphasis on the reproducibility of research has ignited a shift toward more open practices, requiring researchers to improve infrastructure and develop new skills. As a result, reproducible and portable computing environments are critical for future research success. This talk will define a modern research skill set, discuss its relationship to the principles of open science, and introduce the Scholar’s Backpack, a project to help researchers create the scientific computing environments they need to be productive using virtual environments and "dev ops" tools. We will show how we are simplifying the learning experience for novice data scientists and increasing the reproducibility of scientific computing environments. We will also demonstrate how these environments can be applied to a variety of library services serving a range of disciplines through a case study of our own Summer of Open Science workshop series.
Out-of-the-Box, In-a-Box, or Outside the Box? Lessons from an Environmental Scan of Digital Library Systems
Research library seeks digital library system for asset management, long metadata uploads on the beach (or near it), and multimedia user experiences.
This talk presents the results of interviews, agonizing functional requirements brainstorming sessions, message-board lurking, and random Google searches done by UCLA Digital Library Program staff to find systems not only most likely to meet our current backend and frontend needs for digital library collection management and access, but that are also sufficiently flexible to adapt to an ever-changing digital landscape of media types and access and dissemination methods.
Beginning with a review of the menagerie of DAMS and front-end systems adopted and maintained in the UCLA Digital Library over the past several years -- including a homegrown digital collections system, a Drupal CMS, and a full Islandora “stack” -- we will then discuss the features we seek in the next iteration of such systems and the alternatives we are exploring. This latter group currently comprises a mixture of mature “out-of-the-box” options like DSpace, Fedora-based stacks like Hydra/Blacklight, and management systems originating from outside the library world proper, such as the Nuxeo platform currently being championed by the California Digital Library.
More than 1,000 libraries have already adopted library services platforms in the last 4 years. While these solutions offer unified management of print and electronic resources, digital content has remained somewhat siloed and separate, leaving important and unique content hidden or hard to discover for users. In some cases, due to the technical nature of the digital content, it has been handled by library IT in cooperation with special collections. In many cases, the workflows are still separate and cumbersome.
Creating digital workflows across departments and within an existing cloud-based solution presented both opportunities and challenges: (A) which storage to use? (B) what to do with existing repositories? (C) what ingest standards, such as SWORD, are critical today? (D) how to enhance the content discoverability?
This presentation will discuss objectives, outcomes, and lessons learned.
Take Your Relationships to the Next Level: Transforming Relational Data to Linked Data using ETL Processes
There are many paths to publishing linked data. One increasingly-popular method is using an extract, transform, load (ETL) process to transform existing data, stored in a data store such as a relational database, into linked data. The Global Open Knowledgebase (GOKb) project aims to create an open repository of electronic resources metadata that is managed collaboratively by the library community. To facilitate this openness GOKb has successfully used ETL processes to transform metadata about e-resources, including titles, packages, holdings, platforms, and organizations, into linked data, which is loaded into a triple store and exposed through a SPARQL endpoint. This talk will focus on the GOKb linked data pilot, expanding on topics such as designing an appropriate linked data model and ETL pipeline. This pilot highlights a practical approach to exposing relational data as linked data and serves as an example of how linked open data can take your e-resource data to the next level.
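The core transform step can be sketched generically. This toy example (the base URI and the choice of Dublin Core predicates are illustrative, not GOKb's actual model) turns relational-style rows into N-Triples ready for loading into a triple store:

```python
# Toy ETL transform: rows from a relational table become RDF triples
# serialized as N-Triples.
BASE = "http://example.org/titles/"

rows = [
    {"id": 1, "name": "Journal of Examples", "publisher": "Example Press"},
    {"id": 2, "name": "Annals of Sketches", "publisher": "Example Press"},
]

def row_to_triples(row):
    """Map one row to subject-predicate-object statements."""
    subj = f'<{BASE}{row["id"]}>'
    yield f'{subj} <http://purl.org/dc/terms/title> "{row["name"]}" .'
    yield f'{subj} <http://purl.org/dc/terms/publisher> "{row["publisher"]}" .'

ntriples = "\n".join(t for row in rows for t in row_to_triples(row))
print(ntriples)
```

In practice, an RDF library would handle escaping and datatypes, and the output would be loaded into the triple store behind the SPARQL endpoint, but the extract-map-serialize shape is the same.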
User research is critical for good User Experience Design (UXD). UXD deliverables should illuminate customer problems in a way that helps your team ideate and design solutions your organization can uniquely solve. Wireframes, a popular UXD deliverable, can represent good design decisions when backed by smart user research - not magic or hunches. Wireframes without user flows defined and informed by user research can result in rework and delay as your team continues to ask: What happens when I click this?
In this presentation we'll take a look at one of the most overlooked UXD deliverables, the user flow - sometimes called a journey map - and show how it can help clarify and visually explain a proposed solution to a known customer problem. By documenting both the current experience and the desired one, not only do user flows help everyone stay on the same page during design and development, they can also be a powerful tool for gaining user validation before investing in code or vendor arrangements.
The Georgetown University Library has joined the Academic Preservation Trust (APTrust) project to provide long term preservation of our digital assets.
Using tools developed by the APTrust project team and tools developed by other open source communities, we have developed a collection of automated workflows to support ingest into the APTrust platform.
This presentation will describe the process we followed to develop our workflows and our strategy of reusing existing metadata and identifiers from DSpace and ArchivesSpace.
The presentation will walk through each of our primary workflows highlighting the unique characteristics of each workflow and the tools integrated into the workflows.
A description and demonstration video for this project are available at https://github.com/Georgetown-University-Libraries/APTUploadVerification/wiki/Workflow-Designs
Informed by our experience over the last decade developing CKAN, OpenSpending, and other data-driven projects, Open Knowledge International is keenly aware that there is too much friction involved in working with data. We have identified a clear need for a lightweight and extensible format for describing data. This specification, the “Data Package”, is the heart of what we call Frictionless Data.
Frictionless Data is an ongoing project for a set of tools, specifications, and best practices for describing, publishing, and validating data. The mission is to remove the friction in working with data by making it easier for researchers and others to easily transport data among the tools and services that make the most sense for their work. We have found that “packaging” data in a standard format enables the development, for example, of a generic framework for tabular data validation (see our “GoodTables” tool).
This talk will describe our progress to date.
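As a taste of the specification, a Data Package is little more than a datapackage.json descriptor sitting alongside the data it describes. A minimal, invented example, built in plain Python:

```python
import json

# A minimal Data Package descriptor per the Frictionless Data
# specifications: name the package and list its data files as
# resources. The field values here are invented for illustration.
descriptor = {
    "name": "example-package",
    "resources": [
        {"name": "observations", "path": "data/observations.csv"}
    ],
}

# Writing this file next to the data is all it takes to "package" it;
# generic tooling can then locate and validate the resources.
print(json.dumps(descriptor, indent=2))
```

Because the descriptor is ordinary JSON, any tool in any language can read it, which is what makes the format lightweight and extensible.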
As digital preservation becomes a significant feature within archives, the differentiation between “digital archivist” and “archivist” is blurring. With this in mind, Special Research Collections (SRC) at UC Santa Barbara is removing "digital" as a premodifier, viewing born-digital processing not as a separate unit but as an embedded mechanism for all archivists to process collections. This shift includes a reinterpretation of building a born-digital program in which archival processors take the lead in developing born-digital policies and workflows as well as processing born-digital media themselves. This project aims to incorporate digital preservation holistically into SRC’s existing processing procedures, confronting the larger challenges of implementing total end-to-end digital archiving while maintaining sustainability. The presentation will outline how we’ve developed our proposal for such integration as well as our steps in developing a born-digital component to our everyday archival procedures.
The WI+RE (Writing Instruction and Research Education) team at UCLA was tasked with creating innovative online instructional modules to meet undergraduate student needs in research technology skills, information literacy, writing skills, and research planning. Looking for a framework that enables students at a range of skill and knowledge departure points to find the modules useful, we turned to a tool called reveal.js. Its novel slideshow structuring allows users to select paths through the content, moving downward to get additional information about a topic or skipping forward to bypass entry-level steps. This afforded agency respects users’ time while remaining comprehensive. For example, in Introduction to Zotero, we offer users text installation instructions, in-depth videos if they need them, and the ability to bypass the whole section if need be. The slides have been created collaboratively in GitHub using simple HTML for formatting. By conference time, we will have conducted user testing and publicly deployed a number of these projects, and will have data on their efficacy.
[A funny thing happened after we titled our talk with Russian; diebold-o-tron doesn’t like Russian encoding!] What happens when money gets thrown at a problem for a limited period of time – not once, but twice?
A group of volunteers set out to redesign the ArchivesSpace Public User Interface (PUI) nearly two years ago with the goal of making it a premier tool for the public delivery of archival metadata. We had a budget, a vision, and not much else. Our problem was that we had to balance limited ArchivesSpace developer support with the fact that there are many ArchivesSpace users who are eager to see it evolve. As of 2016, in fact, ArchivesSpace has over 300 member organizations but less than ten percent use the current PUI. The audience is also global, since current users include The Strong National Museum of Play in Rochester, New York, as well as The Chinese University of Hong Kong. Luckily, the entire ArchivesSpace application was built with internationalization in mind (primarily driven by YAML files) in addition to being backed by a solid data model. Time was needed to build support for a new design that would be accessible to as wide an audience as possible, however, which is why our development process has favored a “progressive enhancement” approach.
So what happened when we were confronted with this problem? We hope to prove that the persistence of a well-rounded, quality team has paid off. Good banter helped, too.
For the past several years, the IMLS National Digital Platform (NDP) funding priority has focused on expanding the digital capacity and capability of libraries and archives across the country. It is a way of thinking about the combination of software applications, social and technical infrastructures, and staff expertise that provide digital content, collections, and related services to users in the United States. Approaches that underpin much of the work funded under this national priority include engaging and integrating shared and distributed digital services. Ultimately, we hope to support the development of systems and networks that reflect the ethics and values important to libraries and archives.
This talk will introduce broad, emergent themes under the NDP priority, as well as specific developments and examples of recent grants, highlighting how they reflect the library values outlined by the American Library Association. We are intently focused on expanding equitable access to digital information, diversifying the profession and our collections, engaging community memory initiatives, and building infrastructures for inclusive digital collections. Broadband access, OER (open educational resources), and accessibility for a range of users are also strong themes. Our talk will highlight successfully funded projects and suggest next steps for work enhancing the NDP.
In the 2015-2016 academic year librarians at San Diego State University strategically deployed proximity beacons throughout the library. The beacons delivered informational websites to users via their mobile phones upon entering specific areas of the library building. This presentation will talk about the Google Eddystone beacons selected for this project, and the experience of the library developing and marketing the project. The challenges encountered programming for the beacons will also be discussed.
Do digital screens have electric dreams? From community research to prototyping a digital exhibit service at EPL
Digital visualization spaces are not new in academic environments, but truly community-led interactive digital walls are an emergent service in public libraries. As such, the Digital Exhibits Intern Librarian project seeks to inform the Edmonton Public Library in achieving the vision for the Stanley A. Milner Library Digital Display Wall serving as a shared community platform for all manner of digitally accessible and interactive exhibits. In this presentation, I share the findings of the environmental scan and literature review on the contemporary landscape of public video walls and visualization spaces in learning environments around the world. Based on these findings and the results of the community consultation conducted to understand the local context in which this service would function, I present several approaches to prototyping projects without physical infrastructure. How do we design for a space under construction? Where to come up with content for exhibit projects? How can usability research and best practices in digital interactivity guide a suite of digital exhibits? Edmonton Public Library shares the lessons from one exciting year of research and testing.
Collections of digitized cultural heritage materials (photographs, diaries, newspapers, etc.) are often described in ways that make a lot of sense when you’re working with just that collection. But those descriptions can sometimes get in the way of sharing the metadata with other systems. OCLC’s Research staff and its CONTENTdm team have been working with a group of libraries and archives to devise some simple, web-based tools to analyze, clean up, reconcile, and transform their locally-defined record-oriented metadata into RDF Linked Data, and to provide a more efficient and impactful workflow for contributing this data to DPLA and other aggregations. We refer to this experimental application as the Metadata Refinery. In this talk we’ll describe and demonstrate the Refinery app, summarize the feedback we’ve received from the test sites, and share our lessons learned.
Library Exhibits are stale and lifeless, right? Not anymore. With a mobile TV cart, DVD player and a Mac Mini you can put together a digital presence that takes your flat wall graphics and curated realia to the next level to augment exhibits and events.
As new events or exhibits are rolled out, I work on a project to decide what kind of content is appropriate to accompany the exhibit and how to best display it.
“Perfect” Data in a Crowd-Sourced, Open-Access World: Perspectives from the New Schoenberg Database of Manuscripts Project
Handing data creation and maintenance over to users in a crowd-sourced environment, as we are doing with the New Schoenberg Database of Manuscripts, raises the intertwined problems of modeling, user-engagement, community building, and data quality. With data drawn from auction and sale catalogues and other sources dating back to the 15th century, and now our users’ own personal observations, the SDBM assists researchers in locating pre-1600 manuscript books from Europe, Asia, and Africa, establishing provenance, and aggregating descriptive information. Building on a robust search and discovery interface in the old SDBM, the new SDBM provides its user community with the means to actively contribute and maintain the data as a by-product of their own research process.
Technology infuses the world of libraries, but have we adapted to work in a way that promotes innovation? This presentation explores the ways in which tech and entrepreneurship techniques can be applied to library product development and team management. We will start with a high-level overview of the key concepts of Agile, Lean Start Up, User Experience Design, and Design Thinking. We will then showcase examples of how we have successfully utilized associated principles and techniques to build better products, reduce waste, improve user experience, and promote a culture of creativity and innovation. From lean business model canvas to journey mapping to Kanban and Minimum Viable Products, we will share what has worked and where we have encountered challenges. Highlighted projects include: an idea competition; mobile application, software, and database development; business transformation; and general team management. We will also discuss training experiences like: Project Management, User Experience Design, and Product Management courses; incubator program; and tech start-up internship. The presentation will be followed by an interactive discussion about the value and challenges associated with adopting start-up principles in a library context.
Libraries - rightfully so - embrace emerging technologies, eager to provide patrons with a dedicated MakerSpace or to become the go-to spot for the VR-curious. While significant consideration is given to the impact these new technologies will have on our budgets, few libraries take the time to analyze the impact this unfamiliar tech will have on library staff. At the JPL Library, one staff member is in charge of all things technical: me. For the past three years, this has included two very temperamental 3D printers. Diving in head first, I have over time taught myself the basic principles of additive manufacturing, printer maintenance and repair, and have developed user guidelines from the ground up. I have learned first-hand the true costs of these 3D printers for my library – not only their financial cost, but their effects on my daily work as well. This talk will describe how the printers were obtained through a fruitful partnership with JPL’s Office of the Chief Intelligence Officer, and will examine how their acquisition has benefited library patrons and the JPL community alike. More importantly, however, it will be a candid conversation about the printers’ significant impact on my daily workload, with suggestions for fellow frontline staff on how to ease the pain and temper their library’s eagerness for new tech with a practical ground game. The talk will advocate that any library considering an investment in emergent technology should think beyond the bottom line, incorporating the needs of technical staff in its cost analysis.
There exist various ontologies that represent a large body of domain knowledge. These ontology systems can be used by librarians creating item level metadata to enhance the search system. With automated full text indexing, there is a great opportunity to exploit domain specific ontologies to algorithmically enrich metadata assignment and inform information retrieval techniques. Using the structure and features of a chosen ontology, custom similarity measures can be designed to match document terms with related ontology concepts that have the highest similarity. This method allows for the automation of associating semantically similar terms with documents based on their contents, all without the intervention of subject specialists. The talk will focus on the experience of working with the medical domain ontology, SNOMED-CT, to enrich the indexing of the Open Access Subset of PubMed Central. The process of system design and results will be presented.
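As a toy illustration of the general idea (not SNOMED CT itself, and not the custom measures the talk will present), here is a Wu-Palmer-style similarity computed over a tiny invented is-a hierarchy:

```python
# Toy is-a hierarchy standing in for a domain ontology: each concept
# maps to its parent; the root's parent is None.
parent = {
    "myocardial infarction": "heart disease",
    "angina": "heart disease",
    "heart disease": "disorder",
    "disorder": None,
}

def ancestors(c):
    """Chain from a concept up to the root, inclusive."""
    chain = [c]
    while parent[c] is not None:
        c = parent[c]
        chain.append(c)
    return chain

def wu_palmer(a, b):
    """Wu-Palmer-style similarity: twice the depth of the lowest common
    ancestor over the sum of the two concepts' depths (root depth = 1)."""
    pa, pb = ancestors(a), ancestors(b)
    lca = next(x for x in pa if x in pb)
    depth = lambda c: len(ancestors(c))
    return 2 * depth(lca) / (depth(a) + depth(b))

# Sibling concepts share a close common ancestor, so they score high:
print(wu_palmer("myocardial infarction", "angina"))
```

Scoring document terms against concepts this way is what lets semantically related (not just string-identical) ontology terms be attached to a document automatically.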
A new library service often requires a custom application. Technologists - whether developers or librarians - are usually called in to gather specifications and build the application. By that point, ideally, those specifications are already identified and the functions and tasks the application is to perform are clear. In reality, however, a technologist often struggles with an insufficiently defined service workflow, while the staff, who expect to rely closely on the application in their work, are unsure how to define their workflow before they have seen what that application will look like.
What is a technologist to do in such a case? With two new library services, 3D printing and poster printing, as examples, this session will discuss how library technologists and library staff can resolve such a problem together by (1) breaking down the new service into individual actions, (2) attributing each action either to a user, the application, or a staff member, and (3) fully describing and discussing each action, along with scenarios for potentially problematic cases. This participatory design process simultaneously addresses and articulates the staff workflow and the application design, and can be used for adding new features in the future.
Living Dangerously: Skipping the Test Phase in a Web Development Project -- Successes and the Opposite
ibiblio.org's Omeka instance broke over the summer of 2016 after a version change. Taking this opportunity to make a more lasting change that would also improve user experience, the new deliverable became a WordPress 2015 theme "lookalike" that would match ibiblio's homepage. But when the dev environment had no usable data and resources were slim, testing the site redesign became a problem. Enter the Firefox browser developer tools and the Wayback Machine. At the 11th hour, the site worked (mostly), and future workflows will be easily improved. This session will describe the project that came about during my part-time work at ibiblio as a graduate student employee, and will focus on making use of browser developer tools as a core component of testing CSS and responsive web design. The audience will learn about the tips, tricks, and gotchas involved in using the browser as an editor, and why they might choose to do so.
Your application makes first contact with real data and grinds to a halt — now what? We'll give an overview of performance tuning strategies and tooling, based on recent experience performance tuning Princeton's new Hydra/CurationConcerns app, Plum. From when to throw hardware at the problem, to when to break out a profiler and dive deep into the code, we'll provide guidance on where to start, and what approaches make the most sense in certain circumstances.
Research institution managers need to increase the visibility, understanding, and value of their department’s research work. They need to demonstrate how primary research increases donors’ and others’ perception of their institution’s value, and they need measurements to back up their statements and reasoning. Librarians play a key role in enabling research activities and online information exchange.
Citation count tools provide some indication of awareness of published research. Researchers increasingly use online and social media resources to promote research activities. The challenge is to harness this online chatter to enable a better awareness of the extended value of their research.
Funders are increasingly requiring evidence of broader impacts and dissemination of research efforts to ensure the money provided is being used to maximum effect. Tools are becoming available to measure attention to formal research discussed online. These tools are referred to as alternative or enhanced metrics, often called altmetrics. They enable tracking of attention to research articles and provide numbers of views, demographics of viewers, and other attention measurements.
This presentation uses a case study at the Natural History Museum of Los Angeles County to focus on why altmetrics are important to senior management in research institutions. The analysis review includes an overview of data compiled using the Altmetric for Institutions tool from Altmetric, a Digital Science company. It explains the conclusions reached and how librarians can utilize altmetrics tools to expand the understanding of the value of their information services to support an institution’s mission and goals.
Like many institutions, the University of Virginia has in the past done archival description using Microsoft Word. To better manage this descriptive and structural metadata (finding aids), it became necessary to migrate to a structured format. To save time, we developed a tool to facilitate converting MS Word documents into EAD XML finding aids, which could then be imported into ArchivesSpace. The tool helps in two main ways. First, it allows for rule-based assignment of sections of the Word document and exposes an XML-based rule language in which arbitrary rules can be written. For example, it's fairly easy to write a rule that says "treat the first line as the title" or "treat all paragraphs below the heading 'scope and content' as paragraphs within a scopecontent tag". Second, the tool exposes a powerful user interface that allows sections of the document to be assigned their place in the structured format, including drag-and-drop reordering and nesting as well as bulk handling of content that appears to be tabular. Through its initial testing and use, many convenience features were added to the user interface.
The tool has since been adapted to support conversion of MS Word documents in other contexts.
In Fall 2016, the University Library at California State University San Marcos [CSUSM] launched a comprehensive web presence assessment project. The goal of this assessment effort is to gather user feedback in support of the Library’s strategic goal of identifying and implementing improvements to the Library's online presence in order to increase awareness of, and access to, Library resources. This assessment effort utilized five methodologies: online surveys, in-person pop-up sessions, focus groups, one-on-one usability testing, and site usage analytics.
This presentation will focus on the rationale and value of utilizing in-person pop-up sessions. The presenters will describe the design of the pop-up sessions and discuss the value of utilizing this method alongside other assessment types. Findings from the pop-up assessments and the impact on the final redesign recommendation will also be discussed. The presentation will include lessons learned and how the CSUSM Library plans to incorporate pop-up sessions into its ongoing UX assessment program. Attendees will learn why and how they might implement a similar assessment methodology at their institution. A template for building a series of pop-up sessions, and a description of the time and resource requirements, will be made available to attendees.
Creating Streamlined Workflow across Library Services: Case Study at California State University, Fresno
Duplicate work across multiple library projects has a negative effect on overall organizational efficiency. At the Henry Madden Library of California State University, Fresno, efforts were made to create a streamlined metadata workflow, supported by technology, that eliminates unnecessary duplicate work across multiple library services such as the institutional repository, bibliography, and MARC cataloging. In this talk, the presenter will share experiences reviewing, redesigning, and evaluating the workflow. The methods introduced are applicable to similar projects at other academic libraries, and the concepts can be considered by all kinds of institutions pursuing high efficiency in daily work.
You’ve probably faced some people within your organization who are fearful of change, especially technological change. It can be difficult to convince techno-leery individuals that technological change is imminent and can be exciting rather than scary. Changes can take the form of a web redesign, a new integrated library system, virtual servers, morphing roles of library vs. organization IT, etc. Convincing library stakeholders that change needs to happen can be a daunting task as a systems administrator, web developer, technology consultant, or technology manager. This presentation will frame technology projects as complex sales or “long sales;” it will also provide practical examples for modifying communication patterns to get people excited about, or at least grudgingly accepting of, change.
This presentation will showcase the workflow we employed to generate three-sixty views of our objects. It will also highlight the techniques we employed to capture images and to create virtual objects. But above all, it will demonstrate how any archives or museum can develop interactive virtual objects on a limited budget by harnessing the power of collaboration and open source technology.
Since 2014, the California Digital Library (CDL) has been piloting the use of a metadata harvesting infrastructure to aggregate unique collections from across the 10-campus University of California library system -- and libraries, archives, and museums throughout the state. These collections are now available through the newly redesigned Calisphere (http://calisphere.cdlib.org/) website in addition to the Digital Public Library of America (DPLA). Our metadata harvesting infrastructure, based on DPLA's early code base, allows us to harvest from a wide range of sources and pull much more content into Calisphere. However, there are challenges to scaling and streamlining this infrastructure: our processes for staging collections for harvest, adjusting metadata mappings, quality-control checking of results, and performing re-harvests need to be more flexible and user-friendly in order to maximize available resources. This talk will provide an overview of our existing processes, including bottlenecks in the software stack and the growing pains of harvesting from more and more disparate sources. It will also cover an environmental scan we conducted to evaluate possible replacement software, including DPLA's Heiðrún stack, Digital New Zealand's Supplejack, and other large-scale aggregators. Last, we will discuss new requirements that we've developed in order to improve our processes and ramp up harvesting work. With these requirements in mind, we hope to adapt a tool that can provide fast-acting relief for all of our workflow woes!
Each year the volume of research data produced grows exponentially. The capacity to share, analyze, process, and re-use this vital scholarly material is central to the advancement of science. Research institutions are expanding data services—assisting with data management plans and funder mandates, assigning DOIs and archiving data in institutional repositories, supporting robust data documentation, mediating intellectual property law, and facilitating access to open data. How do RDM services fit into current workflows? What are intersections with tangential fields like digital archiving and data science? What emerging technologies and metrics are ideal in a data curation environment?
As William & Mary’s first Digital Services Librarian, I organize and implement research data services at the Virginia Institute of Marine Science. My presentation shares experiences configuring the Digital Commons IR platform for data management, including development of data ingest and access procedures, and metadata customization. I examine digital scholarship and data management trends, and discuss building support for open data initiatives via instruction, outreach, and consultation. Those committed to openness, who embrace the Code4Lib vision of “a diverse and inclusive community of technologists seeking to share ideas and build collaboration,” must lead the development of rigorous, adaptive approaches to data stewardship, to ensure the acceleration and integrity of scientific and academic discoveries.
Building software to support institutional repositories is uniquely challenging. Luckily, there are many techniques we can borrow from the practice of Continuous Integration to not only tackle our unique challenges, but actually speed up our process of software delivery. Last year, bepress decided to implement a Continuous Integration and Delivery pipeline, which has now been in place for over a year. During that time we have found that it produced far better results than our traditional waterfall process. Among other benefits, our customers get bug fixes faster, our platform is more stable, and the lead time on new features has been dramatically reduced. The results have been so astounding that we are in the process of moving all of our software products to similar pipelines.
Streaming videos are a growing share of our libraries' licensed collections and demand-driven offerings. We rely on MARC records to populate our discovery layer, since the videos are not indexed to the degree that journals and eBooks are. This means that the quality and extent of MARC-encoded description are crucial to the discovery of streaming video for the people who use the library. It is established that we can develop automated pre-load error checks for eBooks, and doing the same for streaming videos is possible. This talk will, at a high level, discuss what types of checks are specific to streaming videos and which of those are better suited for pre-loading versus post-loading. Then, I will survey and discuss the strengths and weaknesses of free products for pre-load or post-load checks as compared to customized checks developed with pymarc, a Python library for working with MARC data. The talk will provide Python equivalents for various components of the error check products. The outcome will be to provide an entry point for easing the creation and sharing of Python error checks and establishing a community of practice around batch loading of high-quality streaming video MARC records.
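A pre-load check of the kind described might be sketched with pymarc as below. The specific rules checked here (leader/06 = 'g' for projected medium, an 856 $u streaming URL, a 245 title) are illustrative assumptions, not the actual rule set from the talk:

```python
def check_record(record):
    """Return a list of human-readable problems found in one MARC record.

    `record` is a pymarc Record (or anything exposing the same interface:
    .leader, .get_fields(), and tag lookup via record['245']).
    """
    problems = []
    # Streaming video records should be coded as projected medium.
    if record.leader[6] != 'g':
        problems.append("leader/06 is not 'g' (projected medium)")
    # Every streaming video needs at least one access URL in 856 $u.
    urls = [u for f in record.get_fields('856') for u in f.get_subfields('u')]
    if not urls:
        problems.append('no 856 $u streaming URL')
    # A title field is required for discovery.
    if record['245'] is None:
        problems.append('missing 245 title field')
    return problems

def check_file(path):
    """Run the checks over a binary MARC file before batch loading."""
    from pymarc import MARCReader  # third-party: pip install pymarc
    with open(path, 'rb') as fh:
        for i, record in enumerate(MARCReader(fh)):
            for problem in check_record(record):
                print(f'record {i}: {problem}')
```

Because `check_record` only relies on the record interface, individual checks can be added, shared, and unit-tested independently of any particular vendor file.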
The aimHI Summer Incubator Program: A government-civic partnership to hack the innovation and diversity pipeline
aimHI is an award-winning pilot program at the U.S. Food & Drug Administration (FDA) in partnership with Montgomery County Public Libraries. The goal is to grow a diverse and civic-minded innovator pipeline through a tech incubator model. This collaborative, experiential learning approach turns the traditional summer internship on its head through a unique government-civic collaboration. Students split their time between FDA and library incubator sites to work on team mobile application development projects. 80% of participants are underrepresented in tech, and of these, more than 50% are girls. 2016 participants presented at the White House and at a public shark tank and demo day. The program is the first of its kind in its unique partnership approach and diversity focus. It is also an innovative example of leveraging resources and collaborations to experiment with new STEAM education and outreach models in a low-risk, cost-effective way. This presentation will provide an overview of the aimHI Summer Incubator Program, including best practices, lessons learned, and how other organizations can start similar programs using this model.
Librarians and information professionals are scrambling to learn many programming languages and new software, but not enough time is spent on learning alternative uses of existing tools. While no one would say that acquiring engineering skills is a bad idea, solving problems with common tools already at our disposal is easy and efficient.
Throughout the course of a digitization project at The New York Academy of Medicine Library, we realized that we could co-opt Microsoft Word’s mail merge technology to generate unique, well-formed XML files. We were then able to batch ingest these records into our Islandora repository. After receiving an Excel document filled with beautiful metadata, we decided that the benefits of exploring this hack outweighed the time needed to manually input and collocate XML tags in Excel or to develop a script.
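For comparison, the spreadsheet-to-XML transformation accomplished by the mail merge can also be sketched as a small script. The column names (`identifier`, `title`, etc.) and the flat `<record>` output schema below are illustrative assumptions, not the Academy's actual metadata profile:

```python
import csv
import os
import xml.etree.ElementTree as ET

def rows_to_xml(csv_path, out_dir='.'):
    """Write one XML metadata file per spreadsheet row.

    Assumes each row has an 'identifier' column used for the filename;
    every other non-empty column becomes a child element of <record>.
    """
    with open(csv_path, newline='', encoding='utf-8') as fh:
        for row in csv.DictReader(fh):
            root = ET.Element('record')
            for column, value in row.items():
                if column == 'identifier' or not value:
                    continue
                ET.SubElement(root, column).text = value
            out_path = os.path.join(out_dir, f"{row['identifier']}.xml")
            ET.ElementTree(root).write(out_path, encoding='utf-8',
                                       xml_declaration=True)
```

The mail merge route trades this kind of scripting for a Word template, which can be maintained by staff who are comfortable with Office but not with code.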
Please join us for a brief discussion about the challenges and successes we encountered while using Microsoft Word’s mail merge tool to batch generate XML files. In addition to sharing with the Code4Lib community an innovative use of existing technology, we also feel that showcasing this strategy will draw attention to the necessity of cultural institutions attracting and retaining professionals who are comfortable in both the tech world and the traditional librarianship world.
Article writing is a complicated and time-consuming endeavor, and one that is only made less attractive by the prospect of producing several versions for various publisher workflows and specifications. An additional, undesirable effect is that the end results of these processes do not lend themselves easily to metadata extraction, customizability, or reusability without some sort of mediation.
We propose a simplified document language, FINNEGAN, that fosters a type of article creation that allows automated metadata extraction, which can be piped into different standards and systems, and makes use of embedded identifiers and other syntactic structures to create an RDF graph of the article's universe. By moving away from ultra-stylized word processing tools, the publishing workflow can facilitate a more fluid and extensible product that serves a larger variety of library and publishing needs.
This presentation will explain the FINNEGAN model, its benefits, and its syntax, as well as how it can be incorporated into existing tools like Pandoc. Attention will also be given to the (very real) adoption challenges it faces and how these can be recast as innovative opportunities for reevaluating existing metadata and publishing models.
The world of digital humanities is rapidly growing. If student digital humanities projects are to be seen as robust and legitimate research, then the emerging literacy skills necessary to produce that high level of research (including the use of primary sources, proper attributions and citations, and clear metadata and documentation) need to be integrated into the classroom as part of the syllabus.
As a Civic Engagement Fellow with the UCLA Center for Jewish Studies, I worked with CJS faculty and staff to integrate archival literacy into the syllabus for two digital humanities/service learning courses. Over time, I learned to distill my archival literacy instruction down to the basics in order to balance the needs of the students with the needs of the university to maintain an accurate archival record. I have found that, due to the short academic terms, it is most effective and less intimidating for students to use tools with which they have some familiarity. Thus, CJS faculty, staff, and I have encouraged students to use their smartphones (or another device they already own) to record interviews and take high-resolution photographs of artifacts; upload images to a classroom Dropbox account set up with a hierarchical folder structure specific to the project; and describe artifacts and capture metadata through a simplified archival data entry form created through Google Forms and customized for each course.
Timing is critical in making sure students provide clear metadata and documentation for the work they contribute, given the transient nature of undergraduate students, who are typically in a course for only one term. Once an academic term has passed, it is far more difficult to retroactively gather any information or metadata not already provided to the department or university.
As universities begin to incorporate digital humanities projects into undergraduate courses in a wide range of disciplines, there is a growing need for information professionals to collaborate with faculty to introduce archival principles into the course in order to ensure an accurate archival record of student work and contribution to these projects as the courses unfold. Faculty and information professionals, then, should be encouraged to collaborate and set up an infrastructure for the course in which students learn archival principles and why it is important for research and then have the chance to put the principles into practice through following simplified professional standards in archival description and digitization.
My office within the U.S. National Library of Medicine (NLM) has used ColdFusion for many years for certain critical public applications such as MedlinePlus. Now, NLM is re-writing many of our key applications in Python/Django. We like a lot of things about the framework and the language. We are concerned that Python does not multi-thread well, especially compared to byte-code languages that run on a JVM.
In order to determine the number of Python/Django application servers required for on-premise applications, and the resource utilization at which cloud application servers must scale, we measure the response time of a ColdFusion server and a similar server running a Python/Django application generating the same HTML content. We also configure a varnish cache in front of the Python/Django application server and measure this against static HTML on disk.
The methodology is simple: measure response time as a function of the number of concurrent clients. The response time should be linear up to a certain number of concurrent clients, and then exponential. After the knee, we measure resource utilization on the application server and determine which resource is the best signal for scaling in the cloud.
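The measurement loop described above might be sketched like this: for each concurrency level, fire that many simultaneous requests and record the mean response time. The URL and the client counts are placeholders, not the NLM test configuration:

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def timed_get(url):
    """Fetch a URL once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    return time.perf_counter() - start

def response_curve(url, client_counts=(1, 2, 4, 8, 16, 32, 64)):
    """Return [(clients, mean_seconds), ...] for locating the knee.

    Each step launches `n` concurrent GET requests; the mean response
    time should stay roughly flat until the server saturates, after
    which it climbs sharply.
    """
    curve = []
    for n in client_counts:
        with ThreadPoolExecutor(max_workers=n) as pool:
            times = list(pool.map(timed_get, [url] * n))
        curve.append((n, sum(times) / len(times)))
    return curve
```

In practice a dedicated load-testing tool (ab, JMeter, wrk) would give steadier numbers, but the shape of the experiment is the same: concurrency on one axis, response time on the other, then find the knee.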
We expect Python/Django to have a knee at a smaller number of concurrent client connections, because Python does not multi-thread well. We are interested to see the differences in response time, but we have no theory here and expect it will be fast enough if the database connections and server are properly optimized.
In addition to pure performance measurement, we discuss performance optimizations of the Python/Django application and the WSGI application server configuration. Multiple WSGI application servers may be tried, beginning with the open-source Phusion Passenger, depending on the results.
Dataverse is an open source research data repository platform developed at the Harvard Institute for Quantitative Social Science and hosted at universities around the world. Scholars Portal, a service of the Ontario Council of University Libraries (OCUL), hosts an instance of the Dataverse platform for use by the 21 institutions in the OCUL consortium.
In February of 2016, the Dataverse team at Scholars Portal began the process of migrating from Dataverse 3.6 to 4.5. This poster will explore the migration process, including custom development, project management, training, considerations in a consortial environment, and lessons learned along the way.
Scholars Portal (SP) Journals is a digital repository of over 47 million scholarly articles drawn from journals covering every academic discipline. It’s a service provided to 21 OCUL libraries across Ontario.
SP Journals loads data from about 30 publishers. The metadata from each publisher is normalized and loaded into a MarkLogic database for display and searching, while the full-text source data resides in the file system. The loader gets the physical directory for new source data from properties files, and updating the physical path in a properties file has been a manual task for the programmer. Our project automates updating the properties files for the ejournal loader programs so that the loading process can move ejournal source data seamlessly from the file system to the database without any human intervention. A Java program looks for a possible new directory once there are no new datasets in the old directory and updates the properties file as needed. This guarantees the most up-to-date content on our platform and is a much more efficient method overall.
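The check-and-update cycle described above might look roughly like the following, sketched in Python for brevity (the production program described in the abstract is Java). The dated-directory naming convention (YYYYMMDD) and the property key are assumptions for illustration:

```python
import os
import re

def latest_data_dir(base_dir):
    """Pick the newest subdirectory whose name looks like a dated batch.

    Assumes incoming publisher batches arrive in directories named
    YYYYMMDD; returns the full path of the newest one, or None.
    """
    dated = [d for d in os.listdir(base_dir)
             if re.fullmatch(r'\d{8}', d)
             and os.path.isdir(os.path.join(base_dir, d))]
    return os.path.join(base_dir, max(dated)) if dated else None

def update_properties(props_path, key, new_dir):
    """Rewrite one key=value line in a Java-style .properties file,
    leaving all other lines untouched."""
    with open(props_path, encoding='utf-8') as fh:
        lines = fh.readlines()
    with open(props_path, 'w', encoding='utf-8') as fh:
        for line in lines:
            if line.split('=', 1)[0].strip() == key:
                fh.write(f'{key}={new_dir}\n')
            else:
                fh.write(line)
```

Run on a schedule, the scan-then-update pair replaces the manual edit: once the old directory stops yielding new datasets, the loader's properties file is pointed at the newest batch automatically.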