Kathi Fletcher's Blog: June 2011

Tuesday, June 28, 2011

Not convinced yet that SWORD is sharp?

My prior blog post discussed the benefits of choosing SWORD for depositing open education resources (OER) in repositories that make it easy to share and remix. It was very general, however, and here I will attempt to make a few more details concrete.

The key to SWORD's appropriateness is that it is specifically geared toward exchanging packaged content along with some information (metadata) about that package, and SWORD includes metadata in a format common to OER (Dublin Core).

Packaging and the "atomic" nature of SWORD: SWORD's key advantage over AtomPub (which it is built on top of) is that it has a notion of a "package". Connexions, for example, is a repository of modules and collections. These are educational packages and can properly be thought of as containers of stuff. A module contains document XML, resources (images, slides, attachments, etc.), and some metadata (title, authors, subject, etc.). It is good to have an API that can deal with that whole thing at once as a logical unit. And SWORD also gives you a way to ask for the whole logical item back (perhaps transformed). For Connexions someone can then rightly ask for their module back as a zip of all its parts, as a web view, as an EPUB, etc.

SWORD would be too heavyweight to publish images to flickr or status updates to Facebook. SWORD also isn't appropriate in an environment where you could deposit a bunch of things at once, but they really aren't a logical unit. If you just wanted to upload 20 pictures in bulk to save time, SWORD wouldn't be the right protocol, because it would require the repository to be able to give you back precisely those 20. But it is just the right size for publishing modules or collections to the Connexions repository. There is really no extra overhead that isn't completely necessary to the process of submitting content to Connexions.

A protocol like WebDAV is much more complex, because its goal is to make collaborative, distributed authoring of web resources possible, or perhaps to act as a networked file system. It has all sorts of locking and detailed synchronization, along with ways to query individual properties on individual resources. This is just much more fine-grained than is needed for an educational repository like Connexions. The same goes for CMIS, which is a complete system for making general purpose content management systems interoperable, and like WebDAV it is oriented toward general purpose files and folders.

Commonly used metadata: One more factor makes SWORD appropriate for educational repositories with modular, remixable content. SWORD uses Dublin Core metadata and Dublin Core support has become standard in the world of scholarly and educational content.
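
To make the metadata point a bit more concrete, here is a minimal sketch of an Atom entry that carries Dublin Core terms, built with Python's standard library. The field choices (title, creator, subject, language) follow the module metadata mentioned above; the author name and subject value are made up for illustration, and the exact set of fields a given repository expects is an assumption that would need to be checked against that repository.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Use readable prefixes when the entry is serialized.
ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("dcterms", DCTERMS_NS)


def build_entry(title, creators, subject, language):
    """Return an Atom entry (as a string) with Dublin Core metadata.

    Which fields a repository actually requires is an assumption here.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}title").text = title
    for name in creators:
        ET.SubElement(entry, f"{{{DCTERMS_NS}}}creator").text = name
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}subject").text = subject
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}language").text = language
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    # "A. Author" and the subject are placeholder values.
    print(build_entry("Electric Circuits - Grade 10", ["A. Author"],
                      "Science and Technology", "en"))
```

Because the metadata lives in its own namespace, a repository could add its own extra fields under a different namespace without disturbing the Dublin Core ones.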

Sunday, June 26, 2011

OER Roadmap : The First Quarter in Review

Investigation:
To start, it was important to look for an existing API to adapt for publishing OER that can support a learning ecosystem around open education repositories (especially delicious remixable ones). Like Goldilocks, I found a cottage in the woods with several to try. WebDAV, CMIS, and gdata are all interesting and well-established protocols for publishing to the web, but they are much too hot, or in other words they are either too complex for the task at hand, or too specific to particular services. Next I tried Atom Publishing Protocol, but unfortunately it was too cool -- it was too general to specify the work flow natural to publishing packages of learning content. I found two bowls of very, very similar and tasty looking publishing APIs, called Simple Publishing Interface and Simple Webservice Offering Repository Deposit. The bowl I found most satisfying will become evident as the story unfolds.

Birth of an OER Roadmap wiki and reserved parking spot for code (March 14): After some investigation of different hosting sites, I chose Google Code, because it had a lot of functionality that was easily available, and a very small learning curve. Welcome to OER-Roadmap.

Hewlett OER Grantees Meeting and Wikimedia: (Mar 29 - Apr 1) I went to the Hewlett's annual OER grantees meeting and spent time reconnecting with old friends and meeting new ones regarding the broad goals of this fellowship and the potential to help catalyze education content production and consumption. After the grantees meeting I met with Erik Möller of Wikimedia about potential wikipedia/wikieducator bridges to Connexions and potentials for implementing the API in wikimedia projects.

I attended the NITLE Summit (April 6,7) where liberal arts college leaders think about education in a digital age. John Seely Brown’s keynote (my notes here) on educating for change provided a thought-provoking challenge regarding the kind of education needed when content changes constantly. I participated in the OER workshop led by Hal Plotkin of the US Department of Education.

Cataloging the current Connexions' API: (April 14th on) : Because Connexions software provides a publishing platform that supports “frictionless remix”, its functionality is a good model for the actions that a publishing API for OER should support. With the help of Connexions’ Systems Engineer, we catalogued the data and metadata that is available, the current publishing implementation, and ideas for how to build licensing, versioning, derived copies, and authorship roles into an API. Those details are found on the Roadmap Wiki here.

Connexions' Google Summer of Code projects were accepted and two students were chosen (April 25th on): Both of Connexions' Google Summer of Code projects have the potential to increase OER production in open repositories.

Creating a Google Docs editor for Connexions would result in a simple pathway for authors to produce content, and the publishing API from this fellowship would allow docs authors to push their content in from wherever they create it.
The second project, Enhanced Author Profiles and Kudos, is also relevant to APIs for OER, because it will make it much easier for authors to advertise their publication of open education materials from Connexions.

Sprinting at the Plone East Symposium (May 19-22): We sprinted (communal coding) to extend an existing partial publish implementation in Connexions. The extension allows creation of modules from deposited Word files or CNXML (Connexions semantic document format) files and improves the handling of metadata (title, language, etc.). With the help of fellow fellow Mark Horner, we found and invited Carl Scheffler, whose background is in machine learning and whose interests include improving education, to participate in the sprint, along with Connexions' own Phil Schatz and Ross Reedstrom, with Penn State's Mike Halm and Michael Mulich advising. The sprint planning is here and the full description of the day is here.

IMS Learning Impact and an emphasis on phased implementation strategy: (May 16-18) The IMS Learning Impact conference was the perfect place to corner, I mean get advice from, those with experience creating APIs and making pathways between software and services. Many thanks to Chuck Severance, Jeff Kahn, Gerry Hanley, Brad Felix and several folks I met at the conference. These discussions led to a general approval for (minor spoiler alert) SWORD, and a planned phased approach to implementing it in Connexions (sooner is better than complete). A simple first cut at the API and an implementation in Connexions allows us to start building tools that use the API, which in turn generates interest and excitement, along with software that others can use, improve, and copy as well.

Open Repositories 2011: The SWORD Workshop: The Choice Revealed (June 7,8)
After meeting with the SWORD technical team, learning more about the second version of SWORD, and conferring with Connexions' Systems Engineer, SWORD built on AtomPub became the clear winner among the API servings. (And now that Goldilocks metaphor officially ends.) SWORD is simple, flexible, popular, and has a head start in Connexions where we will test it first. The full reasoning is published in the blog entry before this one and a bit more detailed reasoning here.

Client investigations: Translation tools
So with the choice of SWORD, and V2 in particular, investigating potential clients is in full swing. Translating content allows multilingual domain experts to contribute to OER without creating something from scratch. Siyavula's translation sprints for the Free High School Science textbooks demonstrate that people who know the subject and the language are willing to help out. Carl Scheffler is investigating a couple of different approaches to translation.

Using CodeMirror: For geeks unafraid to see XML, this demonstration protects tags so only text gets changed – demo here.
Using Google Translator Toolkit: Translator Toolkit is designed for all translators and the beginnings of the investigation are described here. Using Translator Toolkit would require a web service that manages the various format transformations that are needed.

The Specification of the API and implementation are underway. The Specification of an OER Publishing API extension of SWORD V2 has begun. The SWORD protocol should work as is, so the extensions should not change the protocol, but rather make use of the natural SWORD flexibility to include extra metadata and return repository specific information. For Connexions, the first phase of implementation of a SWORD V2 service will support creating, updating, publishing, versioning, and deriving copies of modules through the existing editing spaces (to hold the modules as they are being constructed). The following pages show the progress and ongoing work.

Choosing a SWORD for publishing OER (a pen may be mightier but ...)

Choosing or developing a standard way to publish open educational resources (OER) to libraries (repositories) that encourage remixability (sharing and adapting) was one of my main goals for the first three months of my fellowship with the Shuttleworth Foundation. Fortuitously, the way forward seems clear and smooth by using an existing publishing standard called SWORD.

For non-techies and techies alike, I highly recommend watching the video below that Cottage Labs made for SWORD V2. It is very short, clear, and quite nicely done. If you are a Connexions person, think of the package as a module (the document/page/topic itself, plus all the goodies like images, movies, sound clips, and handouts that the document contains). The package could also be an entire collection. If you work with other educational materials in learning management systems, the package could contain anything from a single PDF or geo-tagged image, to a whole common cartridge course.

As the video shows, SWORD is a simple protocol for depositing content into repositories. It is a specialization of the Atom Publishing Protocol which is itself widely used for publishing web content like blogs. After attending Open Repositories 2011, meeting some of the SWORD technical team (Stuart Lewis and Richard Jones), and getting a few technical details ironed out, SWORD looks like a winner. In particular, the second version (V2) has everything that we need for publishing open education resources.

The reasons for choosing SWORD in a nutshell (well really in a blog).

SWORD is simple, but not too simple. SWORD V2 will handle all the basics: finding locations to publish to, creating items, updating them, and signaling that they are ready to publish. And that is about it. SWORD doesn't specify authentication and authorization, so you can use other standards for that. SWORD doesn't get into details of organization (like creating and managing folders and such) so much of the complexity of CMIS-like systems is avoided. SWORD V2 specifies one simple packaging format (an Atom entry plus a zip) that must be supported, and then leaves all other packaging up to the repository to negotiate with the client. So creating a SWORD service is straightforward, which encourages lots of implementations. Client toolkits can provide lots of usable code, but clients do have to know a bit about how the repositories they want to use expect content to be formatted. But that is true anyhow. You can't send a bunch of PDFs to Flickr, because it wants images, right?
SWORD is flexible. SWORD provides specific returned URLs that can be used to give repository specific requirements like signing a license. Repository specific metadata can be added to the entry at will (with a nice namespace to keep them sorted). Repositories can choose to replace content or to create new versions.
SWORD is popular. SWORD is implemented in many existing repositories (DSpace, ePrints, Fedora, arXiv, Zentity, Invenio), the Open Journal System (OJS), and budding data repositories like Chem#. The US Government-led Learning Registry is also including a SWORD service for depositing metadata and paradata about learning materials. The SWORD working team includes many different organizations including JISC, UKOLN, US Library of Congress, and the teams from all those implementors. Client toolkits are available in a variety of popular languages including Java, Python, Ruby, and PHP.
SWORD has a head start in Connexions. Since I want to show how clients and services can make publishing OER easier while keeping them in remixable formats, implementing SWORD in Connexions is crucial to the process. Connexions already has a partial implementation of the first version of SWORD that was built for a very specific work flow with OJS. So a full implementation of SWORD has a head start in Connexions.
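
For the technically curious, here is a rough sketch of what "creating an item" and "signaling that it is ready to publish" might look like from a client's side when depositing a zip. It only builds the HTTP headers and sends nothing over the network. The header names and the SimpleZip packaging identifier reflect my reading of the SWORD V2 profile and should be checked against the specification; the filename is a placeholder.

```python
import hashlib

# Packaging identifier for a plain zip, as I understand the SWORD V2 profile.
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def deposit_headers(payload, filename, in_progress=True):
    """Build headers for a zip deposit (a sketch, not a full client).

    in_progress=True means "keep this as a work in progress";
    in_progress=False signals the item is ready to publish.
    """
    return {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        "Content-MD5": hashlib.md5(payload).hexdigest(),
        "Packaging": SIMPLE_ZIP,
        "In-Progress": "true" if in_progress else "false",
    }


if __name__ == "__main__":
    # The payload would normally be the bytes of the module's zip file.
    for name, value in deposit_headers(b"hello", "m32830.zip").items():
        print(f"{name}: {value}")
```

A client would POST the zip bytes with these headers to a collection URL discovered from the repository's service document, then later repeat the final step with in_progress=False once the author is ready to publish.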

Saturday, June 25, 2011

Publishing API and a new service could make translating Connexions modules easy

Specialized tools for translating and publishing OER are one possible use of an API for publishing to open education repositories. Repositories may have general purpose editors for creating content, but they aren’t likely to have great facilities for translating content.

Carl Scheffler and I spent some time in the Geneva airport investigating whether Google Translator Toolkit could be the translation editor of choice for Connexions modules. Translator Toolkit has to be convinced and helped along, however, because it was designed for HTML (web pages), rather than for the structured XML format of Connexions' modules. It just might be possible, however, and advice and comments would be most welcome.

The workflow would be just a bit more complicated than the normal route for translation and would look something like this:

Find a module that you want to translate on Connexions and record its ID. Let's say the module is Electric Circuits - Grade 10, http://cnx.org/content/m32830/latest/. Then the ID is “m32830”.
Open Google Translator Toolkit and select a URL something like this: http://www.coolhelperservice.org/cnxtranslate/m32830. This would fetch the module in a format that Translator Toolkit can use well.
Translate it using the Translator Toolkit.
Save the file to your laptop.
Go to something like http://www.coolhelperservice.org/cnxpublish and upload the saved file. Fill out a bit of information and then push a button to sign the license and publish it to Connexions.

Although it would be more straightforward to enter a cnx.org web address into the Translator Toolkit and then publish straight from the toolkit, we don't have the technical hooks into the Translator Toolkit to be able to do that. So instead, we would create this new “coolhelperservice” that would know how to format Connexions content for Translator Toolkit and how to take translations and reformat them and publish them to Connexions.

Does that work flow seem reasonable? Is there a better work flow that you can think of and suggest?

Some technical details for those that are interested. Those that aren't can safely stop here and still be able to give feedback on the process from a translator's perspective.

Google Translator Toolkit doesn't work with XML formats. But Connexions does produce an HTML format for modules that can be converted back into Connexions XML without any loss. So the “coolhelperservice” needs to retrieve the module, format it in HTML for the Translator Toolkit, and then do the opposite transform (HTML → CNXML) on the way back into Connexions.

To get the HTML for the body of a module from Connexions, you append “/body” to the module URL. And the module metadata (title and such) is available by appending /metadata to the module URL. So with the module ID, the “coolhelperservice” can put together a nice package of HTML for the translator to use, and still be able to reconstruct the XML to publish the translated version.
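
As a small illustration, the URL bookkeeping that the “coolhelperservice” would need might look like the sketch below. The function name is hypothetical, and the "latest" version segment is taken from the example module address earlier in this post.

```python
CNX_BASE = "http://cnx.org/content"


def module_urls(module_id, version="latest"):
    """Return the module, body, and metadata URLs for a Connexions module ID."""
    module_url = f"{CNX_BASE}/{module_id}/{version}"
    return {
        "module": module_url + "/",
        "body": module_url + "/body",          # HTML for the module body
        "metadata": module_url + "/metadata",  # title and such
    }


if __name__ == "__main__":
    for name, url in module_urls("m32830").items():
        print(name, url)
```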

One tricky bit is that Google Translator Toolkit makes a mess of the mathematics that comes in from Connexions, so the math has to be protected somehow. Carl and I experimented with a few ideas for how to do that, and the Toolkit didn't cooperate with most of those, but Carl came up with the idea of putting all the math into an HTML id. Amazingly, that worked. It comes out all escaped, but that is good enough. (The Toolkit won't keep around a random attribute, so “id” was the way to go.) Carl is pretty sure that there is a web service that will take a snippet of MathML and give back an image. He is going to investigate that further. So in principle, you can stuff the math into an image ID (so it doesn't get lost) and replace the math with a URL to this service that will render the math. The translator won't be able to translate words that were inside the math, but Carl had previously looked around and found that this isn't very common, so this might just be good enough.
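
Here is a minimal sketch of that math-protection idea, not the code Carl and I actually tried. It assumes the math appears as unprefixed <math>...</math> elements in the HTML, and the rendering service URL is a made-up placeholder, since we have not yet found the real service.

```python
import html
import re
from urllib.parse import quote

# Hypothetical MathML-to-image service; the real one is still to be found.
RENDER_URL = "http://example.org/render?mathml="

MATH_RE = re.compile(r"<math\b.*?</math>", re.DOTALL)
IMG_RE = re.compile(r'<img id="([^"]*)" src="[^"]*"/>')


def protect_math(html_text):
    """Replace each math element with an image whose id holds the escaped MathML."""
    def to_img(match):
        mathml = match.group(0)
        escaped = html.escape(mathml, quote=True)
        return '<img id="%s" src="%s"/>' % (escaped, RENDER_URL + quote(mathml, safe=""))
    return MATH_RE.sub(to_img, html_text)


def restore_math(html_text):
    """Turn the placeholder images back into the original MathML."""
    return IMG_RE.sub(lambda m: html.unescape(m.group(1)), html_text)


if __name__ == "__main__":
    page = '<p>Ohm\'s law: <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>V</mi><mo>=</mo><mi>I</mi><mi>R</mi></math></p>'
    protected = protect_math(page)
    print(protected)
    print(restore_math(protected) == page)
```

The restore step only works if the translation tool hands the img element back unchanged, which is exactly the part that needs more testing with the Toolkit.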

At the end, the “coolhelperservice” will use a publishing API (SWORD V2) to publish the translation back to Connexions. Implementing that API is part of my fellowship work so it is coming later this year. There will have to be a bit of license signing back at Connexions, but the “coolhelperservice” can make that smoother also.

I think something like this could work. What do you think? And did we miss some clever idea or service that could be of help? Actually, I am sure we did, since this was a two-hour experiment. So send help, advice, etc. Carl will keep investigating, and maybe we will have some screenshots to clarify all this for a future post.