battis.net and I'm all out of bubble gum…

For the last few years (my JSON feed tells me: since 2008), I have been tagging and annotating articles of interest as they passed before my eyes in Google Reader. This served a two-fold purpose:

  1. I could find them again later, easily, because they were tagged and annotated.
  2. I could share an RSS feed with those annotations to particular interest groups that I worked with (e.g. anything tagged “for robotics” would show up on my advanced computer science class’ portal page, or anything tagged “for academic computing” would show up on my school home page).

This was a great way to share (and manage) resources. Granted, much of what passed before my eyes in Google Reader was trivial and not of lasting value, but this filtering allowed me to hang on to at least a few gems for future reference.

And then Google Reader got the Google+ treatment and sharing items broke. But you could download a JSON dump of all the items that you had ever shared. It wasn’t entirely clear what you could do with this JSON dump, but… there it was. And then: I realized that all of my other information is stashed on my web server (and that I have become increasingly distrustful of relying on cloud services to maintain my data and workflows — e.g. my weekly backup of all my Google Docs… just in case).

Wouldn’t it be handy to import that JSON feed into a new blog on my server? So I wrote a PHP script that converts (at least my) Google Reader JSON dump into an XML file that WordPress can import as a list of posts. With the tags and annotations converted over. In fact, with all of the data in the JSON dump embedded in the XML file (although WordPress doesn’t read all of it).

This comes with a few caveats:

  • For items that came from blogs with a full feed, the result is a republication of the original post — which feels ethically dubious to me. (I have made my new blog of Google Reader shared items private, so that I have the data but I’m not sharing it with the world.)
  • I’ve made guesses as to how to treat some of Google’s data. Reasoned, educated guesses, but guesses nonetheless. For example, I’m not super-clear on which dates in the file correspond with what events — does a publication date refer to when the item was shared or the original post was posted?
  • I’ve added in some arbitrary (and therefore, ideally, eventually, configurable) WordPress tags to make the import go more smoothly. Where I have done that, I mark it in the script as a TODO item. (And, in truth, I didn’t really test to see if all of these items were necessary.)
  • The original authors of the posts are transferred to the XML file, which means that when the actual import into WordPress is done, you will have the option to either laboriously create a new user for each distinct author or simply revert authorship to the currently logged-in WordPress user. It doesn’t seem like WordPress has a format for exporting or importing users (or, at least, my cursory search didn’t find it). Clearly an ancillary SQL query could be generated that pre-populates the WordPress database with the users that the XML file refers to. But I haven’t bothered to do that.
  • You’ll need your own PHP-compatible webserver to run the script, since I have been quick and dirty and simply imported the JSON file from and exported the XML file to the script’s local directory. And I have no interest in setting up my world-facing webserver to take the traffic hit of processing other people’s multi-megabyte JSON dumps.
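As a sketch of what that ancillary SQL generation could look like (this assumes the default `wp_` table prefix and a subset of the WordPress 3.2 `wp_users` columns; it is not part of the script below, and you would want to review the output before running it against a real database):

```php
<?php
/* Sketch: emit SQL INSERTs to pre-populate wp_users with the distinct
   authors found in the Google Reader JSON dump. Table name and columns
   are assumptions (default "wp_" prefix, WordPress 3.2-era schema) --
   adjust to taste. */
function author_sql(array $json)
{
	$authors = array();
	foreach ($json["items"] as $item) {
		if (isset($item["author"])) {
			$authors[$item["author"]] = true; // de-duplicate by name
		}
	}
	$sql = "";
	foreach (array_keys($authors) as $author) {
		/* same sluggification as the script below uses for post names */
		$login = preg_replace("|[^a-z0-9]+|", "-", strtolower($author));
		$sql .= "INSERT INTO wp_users (user_login, user_nicename, display_name, user_registered) "
		      . "VALUES ('" . addslashes($login) . "', '" . addslashes($login) . "', '"
		      . addslashes($author) . "', NOW());\n";
	}
	return $sql;
}

/* e.g.: echo author_sql(json_decode(file_get_contents("shared-items.json"), true)); */
```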
With that said, here is the script, as it stands this morning.
<?php
/**********************************************************************
 * Google Reader to WordPress
 *
 * 2011-11-27
 * Seth Battis (seth@battis.net)
 * 
 * This script takes the output of Google Reader's JSON export of
 * shared items and converts it into an XML file that can be imported
 * into a Wordpress blog as posts. All of the data in the original JSON
 * file is preserved in the XML file, either by transferring it to
 * an appropriate format (e.g. Google Reader categories are converted
 * to WordPress post tags) or simply as an additional XML tag (e.g. the
 * Google Reader commentInfo metadata for recent shared items). In
 * situations where actual data has to be converted to make it readable
 * for WordPress, the original data is included as the JSON attribute
 * of that tag (e.g. timestamps and categories).
 *
 * As currently written, the script looks in its local directory for
 * Google Reader "shared-items.json" file and generates a matching
 * "shared-items.xml" file, also in its local directory.
 *
 * There are a number of potentially configurable (i.e. arbitrary)
 * values marked as TODO.
 *
 * Caveat emptor: this has been tested against my ~1000 item Google
 * Reader shared items feed and on my WordPress 3.2.1 site. I would
 * presume that it should work fairly well for others, but make no
 * guarantees!
 *********************************************************************/
 
/* returns a Wordpress slug-version of the given text (only
   alphanumeric characters and dashes) */
function sluggify($text)
{
	return preg_replace("|[^a-z0-9]+|", "-", strtolower($text));
}
 
/* SimpleXML doesn't really support namespaces unless you force it */
$xml = '<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0"
	xmlns:excerpt="http://wordpress.org/export/1.1/excerpt/"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:wp="http://wordpress.org/export/1.1/"
></rss>';
$rss = new SimpleXMLElement($xml);
$namespaces = $rss->getDocNamespaces(true);
 
/* load the Google Reader JSON file */
$file = file_get_contents("shared-items.json");
$json = json_decode($file, true);
 
/* Wordpress will choke if our post names aren't unique, so we track
   them separately */
$post_names = array();
 
/* header information describing the file itself */
$channel = $rss->addChild("channel");
$channel->addAttribute("direction", $json["direction"]);
$channel->addAttribute("id", $json["id"]);
$channel->addAttribute("self", $json["self"][0]["href"]);
$channel->addAttribute("author", $json["author"]);
$channel->addChild("title", $json["title"]);
$channel->addChild("link");
$channel->addChild("description");
$pubDate = $channel->addChild("pubDate", gmdate("D, j M Y G:i:s O", $json["updated"]));
$pubDate->addAttribute("json", $json["updated"]);
$channel->addChild("language");
$channel->addChild("wxr_version", "1.1", $namespaces["wp"]);
 
/* run through the list of items and add them to the XML */
foreach ($json["items"] as $item)
{
	$rssItem = $channel->addChild("item");
 
	/* a bunch of Google Reader-specific metadata */
	if (isset($item["isReadStateLocked"]))
	{
		$rssItem->addAttribute("isReadStateLocked", $item["isReadStateLocked"]);
	}
	$rssItem->addAttribute("crawlTimeMsec", $item["crawlTimeMsec"]);
	$rssItem->addAttribute("timestampUsec", $item["timestampUsec"]);
	$rssItem->addAttribute("id", $item["id"]);
	if (isset($item["commentInfo"]))
	{
		foreach ($item["commentInfo"] as $commentInfoKey => $commentInfoValue)
		{
			$commentInfo = $rssItem->addChild("commentInfo", $commentInfoKey);
			$commentInfo->addAttribute("permalinkUrl", $commentInfoValue["permalinkUrl"]);
			$commentInfo->addAttribute("commentState", $commentInfoValue["commentState"]);
		}
	}
 
	/* annoyingly, not every item has its content in the content
	   element -- sometimes it's in the summary element (and once in a
	   while, it's just not there). I think this is an artifact of how
	   the original RSS feeds were constructed. I think. */
	if (isset($item["content"]))
	{
		$content = $item["content"]["content"];
	}
	else if (isset($item["summary"]["content"]))
	{
		$content = $item["summary"]["content"];
	}
	else
	{
		$content = "";
	}
 
	/* sometimes items don't even have titles */
	if (isset($item["title"]))
	{
		$rssItem->addChild("title", $item["title"]);
	}
 
	/* most items store the original linkback information in the
	   alternate element -- Wordpress won't honor this link tag when
	   it's imported (i.e. it won't treat it like the Daring Fireball
	   feed), so I have embedded a more descriptive linkback after the
	   annotations at the start of the content, using some information
	   from the origin element. The linkback is in the
	   "google-reader-alternate-href" div (for easy CSS-wrangling!) */
	if (isset($item["alternate"]))
	{
		$rssItem->addChild("link", htmlentities($item["alternate"][0]["href"], ENT_COMPAT, "UTF-8"));
		$content = htmlspecialchars("<div class=\"google-reader-alternate-href\"><p>Originally posted at <a href=\"{$item["alternate"][0]["href"]}\">{$item["origin"]["title"]}</a></p></div>", ENT_COMPAT, "UTF-8") . $content;
	}
 
	/* I haven't bothered to figure out if the dates are really GMT or
	   localized -- GMT was an easier assumption to make */
	$pubDate = $rssItem->addChild("pubDate", gmdate("D, j M Y G:i:s O", $item["published"]));
	$pubDate->addAttribute("json", $item["published"]);
 
	/* not every item has an author, either -- again, an artifact of
	   the original RSS feeds */
	if (isset($item["author"]))
	{
		$rssItem->addChild("creator", $item["author"], $namespaces["dc"]);
	}
 
	/* annotations were tricky -- I have added them as their own XML
	   tags _and_ inserted them within a "google-reader-annotation" div
	   at the top of the post content (to match the original format on-
	   screen). All of the original data is preserved in the XML tag,
	   with an ID that matches the embedded div ID. */
	foreach($item["annotations"] as $annotation)
	{
		$annotationHTML = htmlentities("<div id=\"" . md5($annotation["content"] . $annotation["author"]) . "\" class=\"google-reader-annotation\"><blockquote><p>{$annotation["content"]}</p><p class=\"author\">{$annotation["author"]}</p></blockquote></div>", ENT_COMPAT, "UTF-8");
		$content = $annotationHTML . $content;
		$rssAnnotation = $rssItem->addChild("annotation", $annotation["content"]);
		$rssAnnotation->addAttribute("id", md5($annotation["content"] . $annotation["author"]));
		$rssAnnotation->addAttribute("author", $annotation["author"]);
		$rssAnnotation->addAttribute("userId", $annotation["userId"]);
		$rssAnnotation->addAttribute("profileId", $annotation["profileId"]);
		$rssAnnotation->addAttribute("profileCardParams", $annotation["profileCardParams"]);
	}
 
	/* again, sometimes content is in content, sometimes it's in the
	   summary element */
	$rssContent = $rssItem->addChild("encoded", $content, $namespaces["content"]);
	if (isset($item["content"]))
	{
		$rssContent->addAttribute("direction", $item["content"]["direction"]);
	}
	if (isset($item["summary"]))
	{
		$excerpt = $rssItem->addChild("encoded", $item["summary"]["content"], $namespaces["excerpt"]);
		$excerpt->addAttribute("direction", $item["summary"]["direction"]);
	}
 
	/* more Google Reader metadata, this time about the original feed
	   that the item came from -- which is used above to format the
	   linkback that is embedded at the start of the content */
	$origin = $rssItem->addChild("origin");
	$origin->addAttribute("streamId", $item["origin"]["streamId"]);
	$origin->addAttribute("title", $item["origin"]["title"]);
	$origin->addAttribute("htmlUrl", $item["origin"]["htmlUrl"]);
 
	/* it's not clear to me whether the published or modified date is
	   when the original post was published or when the item was
	   shared -- I think published refers to when it was shared. */
	$postDate = $rssItem->addChild("post_date", date("Y-m-d G:i:s", $item["published"]), $namespaces["wp"]);
	$postDate->addAttribute("json", $item["published"]);
	$rssItem->addChild("comment_status", "open", $namespaces["wp"]);	// TODO make configurable
	$rssItem->addChild("ping_status", "open", $namespaces["wp"]);		// TODO make configurable
 
	/* make a Wordpress friendly title slug for the post */
	if (isset($item["title"]))
	{
		$slug = sluggify($item["title"]);
	}
	else
	{
		/* if no title, generate the slug from the timestamp */
		$slug = date("Y-m-d-G-i-s", $item["published"]);
	}
 
	/* make sure that our slug is unique -- add a counter to the end
	   if it is not, and track those counter values in $post_names[] */
	if (isset($post_names[$slug]))
	{
		$post_names[$slug]++;
		$slug .= "-" . $post_names[$slug];
	}
	else
	{
		$post_names[$slug] = 0;
	}
	$rssItem->addChild("post_name", $slug, $namespaces["wp"]);
 
	/* more Wordpress metadata -- all of which could be tweaked */
	$rssItem->addChild("status", "publish", $namespaces["wp"]);	// TODO make configurable
	$rssItem->addChild("post_parent", 0, $namespaces["wp"]);	// TODO make configurable
	$rssItem->addChild("menu_order", 0, $namespaces["wp"]);		// TODO make configurable
	$rssItem->addChild("post_type", "post", $namespaces["wp"]);	// TODO make configurable
	$rssItem->addChild("post_password", "", $namespaces["wp"]);	// TODO make configurable
	$rssItem->addChild("is_sticky", 0, $namespaces["wp"]);		// TODO make configurable
 
	/* convert categories to post tags -- nota bene that Google Reader
	   has conflated the reader's categories with the original post's
	   tags, creating a... mish-mash. */
	foreach($item["categories"] as $category)
	{
		if (!preg_match("|.*/com\.google/.*|", $category))
		{
			$cleanCategory = $category;
			$cleanCategory = preg_replace("|user/\d+/label/(.*)|", "$1", $cleanCategory);
			$rssCategory = $rssItem->addChild("category", htmlentities($cleanCategory, ENT_COMPAT, "UTF-8"));
			$rssCategory->addAttribute("domain", "post_tag");
			$rssCategory->addAttribute("nicename", sluggify($cleanCategory));
			$rssCategory->addAttribute("json", $category);
		}
	}
 
	/* add comments -- note that for privacy reasons, while the
	   commenter's metadata is added as an XML tag, it is not embedded
	   in the Wordpress-readable wp:comment tags */
	foreach($item["comments"] as $comment)
	{
		$rssComment = $rssItem->addChild("comment", "", $namespaces["wp"]);
		$rssComment->addAttribute("id", $comment["id"]);
		$commentContent = $rssComment->addChild("comment_content", $comment["htmlContent"], $namespaces["wp"]);
		$commentContent->addAttribute("plainContent", $comment["plainContent"]);
		$author = $rssComment->addChild("comment_author", $comment["author"], $namespaces["wp"]);
		$author->addAttribute("userId", $comment["userId"]);
		$author->addAttribute("profileId", $comment["profileId"]);
		$author->addAttribute("profileCardParams", $comment["profileCardParams"]);
		$author->addAttribute("venueStreamId", $comment["venueStreamId"]);
		$commentDate = $rssComment->addChild("comment_date", $comment["createdTime"], $namespaces["wp"]);
		$commentDate->addAttribute("modifiedTime", $comment["modifiedTime"]);
		$rssComment->addAttribute("isSpam", $comment["isSpam"]);
	}
}
 
/* dump the converted XML out as a file */
header ("Content-type: text/xml");
echo $rss->asXML();
file_put_contents("shared-items.xml", $rss->asXML());

November 27th, 2011

Posted In: How To, Social Bookmarking, Social Media, Useful Tools


For the last couple of months, one of my high school classes at Jewish Day School has been working on building an interactive tool about the Six Day War for a middle school curriculum unit. They have put a lot of work into researching their data and laying it out in Google Earth, and now we’re putting together a tour of the data for the middle school students that will also teach them a little about how to use Google Earth. It’s been enormously fun.

But. But, we now have to do some programming or some very careful recording in Google Earth to create this tour (à la Al Gore’s Climate Change tours). And my students are far more interested in the design end of things than in the coding end of things. They’re great at what they do, but niggling coding details give them headaches.

So I built a Google Docs spreadsheet, into which they can plug various pertinent information and out of which come valid KML instructions that will define the tour. I’m a little proud of this — I don’t think that there’s anything else quite like it out there. There are three kinds of information that need to be entered:

  1. SoundCue — any audio that should be played in the background. You can either enter the whole URL to the MP3 file that should be played, or just the file name (and make sure that you enter the path in which all the files can be found on the KML Generator tab). You also need to tell us both when the cue should be played and how long it should last.
  2. AnimatedUpdate — anything that should be happening in Google Earth in terms of showing photo overlays or placemarks or polygons or what-have-you. Enter what should happen, and when (and, if relevant, for how long).
  3. FlyTo — any time we change where (or when) we’re looking at Google Earth. Enter when and how long the transition should take, and the date (and time) that we’re meant to be looking at.

The AnimatedUpdate and the FlyTo also expect the user to enter KML code — which can be copy-and-pasted from Google Earth KML exports for each transition. Happily, if something needs to happen more than once, entering the KML for the first instance will automatically populate future instances. In addition, the pauses necessary to synch up all of the action are calculated by the spreadsheet.
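For orientation, the KML that the spreadsheet assembles boils down to a `<gx:Tour>` playlist made of exactly these element types; a hand-trimmed sketch (the URL, coordinates, and IDs are placeholders, not values from our actual tour):

```xml
<gx:Tour>
  <name>Six Day War</name>
  <gx:Playlist>
    <!-- SoundCue: background audio started at this point in the tour -->
    <gx:SoundCue>
      <href>http://example.com/audio/narration.mp3</href>
    </gx:SoundCue>
    <!-- FlyTo: move the camera over a given duration -->
    <gx:FlyTo>
      <gx:duration>5.0</gx:duration>
      <LookAt>
        <longitude>35.2</longitude>
        <latitude>31.8</latitude>
        <range>50000</range>
      </LookAt>
    </gx:FlyTo>
    <!-- AnimatedUpdate: show/hide overlays, placemarks, polygons -->
    <gx:AnimatedUpdate>
      <gx:duration>2.0</gx:duration>
      <Update>
        <targetHref/>
        <Change>
          <Placemark targetId="placemark-1">
            <visibility>1</visibility>
          </Placemark>
        </Change>
      </Update>
    </gx:AnimatedUpdate>
    <!-- Wait: the calculated pauses that keep everything in sync -->
    <gx:Wait>
      <gx:duration>3.0</gx:duration>
    </gx:Wait>
  </gx:Playlist>
</gx:Tour>
```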

Update March 25, 2011: The spreadsheet above is our live work (since I kept updating the spreadsheet after this post).

The end result is both a visual timeline of the tour (helpful for debugging any weird errors) and also KML code that can be copy-and-pasted out of Google Docs and into a KML file. (Caveat emptor: depending on what you’re pasting the KML into — I like Xmplify — you may see that there is a leading and trailing quote that need to be manually deleted.)

Right now the spreadsheet can handle tours up to four minutes and 10 seconds in length (250 seconds, for those of you keeping score at home). This is because I was originally copying the KML out of the KML Orderer worksheet, and Google Docs supports pasting of up to 1000 rows in total. You’ll see that the current KML worksheet is a single cell, getting around this limitation, but I didn’t bother to extend any of the other worksheets. Just make sure you fill down all the formulas if you extend any of the sheets!
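The single-cell KML worksheet mentioned above can be built with a joining formula rather than row-by-row copying; a sketch, assuming the ordered KML lines sit in column A of the KML Orderer sheet (the range here is illustrative):

```
=JOIN(CHAR(10), 'KML Orderer'!A1:A1000)
```

`CHAR(10)` supplies the newline between rows — and a multi-line cell like this is, I suspect, also why the pasted result picks up the stray leading and trailing quotes mentioned earlier.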

Here’s a link to a scratch copy of the spreadsheet. Feel free to copy and use it for your own purposes — let me know how you used it!

February 12th, 2011

Posted In: Educational Technology, How To, Social Media, Useful Tools


This post is part of a series that makes up my “Expert Plan” at my school, looking to create a shared resource for my colleagues as the school moves towards greater adoption of laptops and technology in our pedagogy.

The Model

This started out as simply a way of addressing my own personal predilections: I’m tired of trying to build a syllabus on a calendar (the calendar doesn’t present the syllabus in a useful view), or in a spreadsheet (changing dates becomes a chore), so I decided to separate the various issues out and automate as much as I could. The result is a Google Docs spreadsheet that I embed on my course web site, which allows me to think in terms of units and lessons, and which automatically sequences lessons on to class meetings and updates me with notes about those specific days (Moadim L’Simcha, vacation days, etc.).

This is an example of one of my syllabi as the students see it:

In Practice

This turned out to be some pro-level spreadsheet work. I like working in spreadsheets. Not everyone likes working in spreadsheets. This link takes you to a “scratch” version of one of my syllabi (you’re welcome to edit it to see how changes propagate — this isn’t the live version). The organization is thus:

  • There is a worksheet for each unit of the syllabus, named sequentially Unit 1, Unit 2, Unit 3, etc. (The best way to create a new unit is to duplicate an old one and replace the information).
  • On a unit worksheet, a few of the columns are automatically filled in. You just have to worry about editing the title of the lesson, the lesson description, and the assignment summary. Everything else is filled in automatically.
  • The integrated view of all the units, sequenced together and lined up to days with notes is the Syllabus worksheet.
  • The Meetings worksheet is just a list of days when the class meets (which I entered manually) and any notes about that day specifically that might be helpful for lesson planning.
  • There are a bunch of “working” sheets that you can look at, but don’t edit — they’re collating and organizing all of the units automatically.
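As a sketch of the kind of formula doing that automatic sequencing (the sheet and column names here are illustrative, not the ones in the actual spreadsheet): if column A of a unit worksheet holds the running lesson number, the matching class meeting date can be pulled from the Meetings sheet with a simple lookup:

```
=INDEX(Meetings!$A:$A, $A2 + 1)
```

The `+ 1` skips the Meetings header row; renumbering lessons then reflows every date automatically, which is the whole point of separating the schedule from the content.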

Reflection

This was way more work than it was worth for a single syllabus. But as a tool that I intend to reuse again and again, I’m pretty happy with it and feel good about the investment. It is mildly idiosyncratic, in the sense that it meets my specific needs. But it could be used as a model for other people’s style of syllabus design, separating the schedule from the concepts in a way that makes visualizing the lesson flow much, much easier.

November 22nd, 2009

Posted In: "Expert Plan", Educational Technology, How To, Teaching, Useful Tools



This post is part of a series that makes up my “Expert Plan” at my school, looking to create a shared resource for my colleagues as the school moves towards greater adoption of laptops and technology in our pedagogy.

This AppleScript application converts any word-processing files that Pages can open into PDFs. It will only work on Macs.

Use

To use this application, drag the icon of a file (or a group of files) onto the icon for the application. When asked, pick which folder you would like to save the PDFs into. As the application runs, if Pages cannot open a particular file, you will see a message warning you of this. When the application completes, it will display a list of all the files that could not be converted (or simply quit if all of the files were converted).
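The droplet behavior described above can be sketched in AppleScript along these lines. This is a sketch, not the shipped application, and it assumes Pages exposes an `export … as PDF` command in its scripting dictionary — check the actual dictionary in Script Editor for your Pages version before relying on it:

```applescript
on open droppedFiles
	set failures to {}
	set destFolder to choose folder with prompt "Save PDFs into:"
	repeat with f in droppedFiles
		try
			tell application "Pages"
				set doc to open f
				-- assumed command: Pages' "export" with a PDF format constant
				set pdfPath to (destFolder as text) & (name of doc) & ".pdf"
				export doc to file pdfPath as PDF
				close doc saving no
			end tell
		on error
			-- remember files Pages could not open or convert
			set end of failures to (f as text)
		end try
	end repeat
	if failures is not {} then
		display dialog "Could not convert: " & (failures as text)
	end if
end open
```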

Install

To install this application, click the link below to download it as a ZIP archive. Double-click the “Convert Word Processing Files to PDFs.zip” icon to expand the ZIP archive and drag the application icon to where you want to use it.

Download

November 22nd, 2009

Posted In: "Expert Plan", Educational Technology, How To, Useful Tools

