battis.net and I'm all out of bubble gum…

So, I’ve spent the last few days wrestling with a curriculum unit that an outside consultant built. In PowerPoint. On Windows. And which we have been trying to set up in such a way that we can share the interactive document with students. Who are using Macs. And, perhaps, without asking each student to download a ~200MB file to use it.

I have learned and grown much in the process. And have discovered that Microsoft PowerPoint 2008 does an execrable job of exporting PowerPoints as web pages (it does an execrable job of doing a lot of other things too, but we can talk about that at another time). Here are the key fixes that I made to the exported web page and supporting files so that the presentation would fundamentally work (all of this was done using regular expressions in TextMate):

  1. I stripped out all of the fancy Javascript calls that PowerPoint inserted as links to navigate from one slide to another. It turns out that a simple HREF to the actual page’s HTML file works (and the JavaScript Does Not.)
    Find:

    (href=")[^"]*(slide\d{4,4}.htm)[^"]*(")

    Replace with:

    $1$2$3
  2. The export to web page takes all of the already URL-encoded links in the PowerPoint and reencodes them, rendering them useless. I stripped off the second encoding.
    Find:

    (%)25([a-fA-F0-9]{2,2})

    Replace with:

    $1$2
  3. Finally, because the links were built in Windows and then URL-encoded, all of the Windows-style paths needed to be turned into POSIX paths for use on the web.
    Find:

    %5[cC]

    Replace with:

    /

At this point, in an average PowerPoint, most of the damage has been fixed and things more or less work. However, the curriculum unit that we were working with also linked to external Word documents (hence some of the Windows-style path issues above). This meant I had a few more fixes along the way that are worthy of note:

  1. I replaced the links to Word documents with links to the corresponding PDF files (and script I used generated PDF files with .doc.pdf extensions and I didn’t bother to fix that).
    Find:

    (href="[^"]*docx?)(")

    Replace with:

    $1.pdf$2
  2. These links to external documents open in the same frame as the slideshow. Which defeats the purpose of the slideshow being a navigational tool. So I redirected all of the new PDF links to a new window in the browser. As the hyperlinks are broken across two lines in the HTML source code, this took two steps.
    1. Find (changing {{name of Links & Sources folder}} to the, well, actual name of the Links & Sources folder):
      (href="((http://)|({{name of Links & Sources folder}}))[^"]*")\n

      Replace with:

      $1
    2. Find (modifying as noted above):
      (href="((http://)|({{name of Links & Sources folder}}))[^"]*"\starget=")_top(")

      Replace with:

      $1_blank$5

September 24th, 2009

Posted In: Educational Technology, How To

Tags: , , , , , , , , , , ,

So, a couple days ago, Nate posted about my Yahoo Pipes solution for a blog comment aggregating conundrum he was running into in his history class. My solution ended up being a little technical, but I think it’s an interesting enough example of the power of Yahoo Pipes that it’s worth talking through what’s going on. (I love Yahoo Pipes — although one of my colleagues is a major fan of WebRSS, which meets similar needs.)

A mildly cleaned-up version of the solution is below, in the Pipes’ visual programming diagram:

Yahoo Pipes Solution (click to embiggen)

  1. The pipe itself starts on the left, with Fetch Feed module, in which Nate had helpfully plugged in the comments feeds for all of his students. This takes all of the comment feeds from their blogs and aggregates them into one, enormous comment feed.
  2. Following the blue “pipe” from the Fetch Feed module, we reach the Rename module, which is actually be used to rename a copy of the “link” field of each item to “orig_author”. Clearly, this sentence needs some explanation. In an RSS feed, each item represents a single entry (in this case, a single comment on a particular blog). The RSS feed is made up of a bunch of items (usually the ten or so most recent items in the feed, so the ten most recent comments on each blog). We have aggregated these ten most recent comments from each blog into a feed that includes the hundred or so most recent comments from all the blogs (ten blogs x ten comments each = one hundred comments). Each item is made up of a series of fields that describe the item (title, description, link, etc.). Normally, in a feed reader, we only see the title and description, although we may also “see” the link field when the title is turned into a link back to the original comment on the originating blog. In this case, I have made a copy of the link field, for future reference, so that every item now has a second copy of the link field, named “orig_author” (short for original author, because programmers are lazy). It turns out that, in the WordPress.com comment feeds, the link is the only part of the item that refers to the name of the blog on which the comment was made, and that cryptically (e.g. “http://nkogan.wordpress.com/2009/08/…” — i.e., as the first word in the address of the blog itself). More on this in a second.
  3. Again, following the blue pipe from the Rename module to the Regex module, we see two lines, which read, more or less:
    1. In item.orig_author replace
      http://(.*)\.wordpress.*

      with

      $1
    2. In item.title replace
      (Comment on )(.*)( by .*)

      with

      $1${orig_author}’s post “$2”$3

    What this means, in layman’s terms, is basically that I want to take whatever text appears in the orig_author field (which is a copy of the link field, which was a link to the original comment on the originating blog), and extract only the name of the blog from the URL. I won’t dive into the details of regular expressions right now, but what the first one above essentially says is “look for a pattern that starts with ‘http://’ followed by any number of characters — (.*) — followed by ‘.wordpress’ followed by any number of characters — .* — and replace the text that matches that pattern with whatever was in the first set of parentheses — $1.” In other words, replace the entire link with just the name of the blog that’s between ‘http://’ and ‘.wordpress’.
    The second regular expression is a bit more involved. At this point, it helps to realize that each set of parentheses is referred to by its order in the sequence of the original pattern, prefaced by a dollar sign — $1, $2, $3, etc. In this case, we’re looking at the original title of the comment item, which started off something like “Comment on My First Blog Post by Seth B.” I want to take my newly discovered blog name (from the first regular expression) and insert it into this in a meaningful way. To do that, I create a pattern that breaks the original title into its component phrases “Comment on “, “My First Blog Post” and “by Seth B.” With this in hand, I just plug in the current value of the orig_author field — which I just clipped in the previous regular expression, add some nice curly quotes, and put it all back together again to read something like “Comment on nkogan’s post “My First Blog Post” by Seth B.”

  4. Again, following the pipe down to the Sort module, I’ve added the handy little fillip of sorting all of the comments in our aggregated feed by the time at which they were posted (rather than grouping them by the blog on which they were posted, as they would be coming out of the original Fetch Feeds module — the first blog linked to, followed by the second blog linked to, etc.

I hope this is useful, or at least intriguing, in thinking about using the both Yahoo Pipes and regular expressions. A really wonderful reference on regular expressions can be found at Regular-Expressions.info, and I really like JRX as a tool for fine-tuning my regular expressions as I write them.

August 31st, 2009

Posted In: How To

Tags: , , , , , , ,

css.php