battis.net and I'm all out of bubble gum…

For the last few years (my JSON feed tells me: since 2008), I have been tagging and annotating articles of interest as they passed before my eyes in Google Reader. This served a two-fold purpose:

  1. I could find them again later, easily, because they were tagged and annotated.
  2. I could share an RSS feed with those annotations to particular interest groups that I worked with (e.g. anything tagged “for robotics” would show up on my advanced computer science class’ portal page, or anything tagged “for academic computing” would show up on my school home page).

This was a great way to share (and manage) resources. Granted, much of what passed before my eyes in Google Reader was trivial and not of lasting value, but this filtering allowed me to hang on to at least a few gems for future reference.

And then Google Reader got the Google+ treatment and sharing items broke. But you could download a JSON dump of all the items that you had ever shared. It wasn’t entirely clear what you could do with this JSON dump, but… there it was. And then: I realized that all of my other information is stashed on my web server (and that I have become increasingly distrustful of relying on cloud services to maintain my data and workflows — e.g. my weekly backup of all my Google Docs… just in case).

Wouldn’t it be handy to import that JSON feed into a new blog on my server? So I wrote a PHP script that converts (at least my) Google Reader JSON dump into an XML file that WordPress can import as a list of posts. With the tags and annotations converted over. In fact, with all of the data in the JSON dump embedded in the XML file (although WordPress doesn’t read all of it).

This comes with a few caveats:

  • For items that came from blogs with a full feed, the result is a republication of the original post — which feels ethically dubious to me. (I have made my new blog of Google Reader shared items private, so that I have the data but I’m not sharing it with the world).
  • I’ve made guesses as to how to treat some of Google’s data. Reasoned, educated guesses, but guesses nonetheless. For example, I’m not super-clear on which dates in the file correspond with what events — does a publication date refer to when the item was shared or the original post was posted?
  • I’ve added in some arbitrary (and therefore, ideally, eventually, configurable) WordPress tags to make the import go more smoothly. Where I have done that, I mark it in the script as a TODO item. (And, in truth, I didn’t really test to see if all of these items were necessary.)
  • The original authors of the posts are transfered to the XML file, which means that when the actual import into WordPress is done, you will have the option to either laboriously create a new user for each distinct author or simply revert authorship to the currently logged-in WordPress user. It doesn’t seem like WordPress has a format for exporting or importing users (or, at least, my cursory search didn’t find it). Clearly an ancillary SQL query could be generated that pre-populated the WordPress database with the users that the XML file refers to. But I haven’t bothered to do that.
  • You’ll need your own PHP-compatible webserver to run the script, since I have been quick and dirty and simply imported the JSON file from and exported the XML file to the script’s local directory. And I have no interest in setting up my world-facing webserver to take the traffic hit of processing other people’s multi-megabyte JSON dumps.
With that said, here is the script, as it stands this morning.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
/**********************************************************************
 * Google Reader to Wordpress
 *
 * 2011-11-27
 * Seth Battis (seth@battis.net)
 * 
 * This script takes the output of Google Reader's JSON export of
 * shared items and converts it into an XML file that can be imported
 * into a Wordpress blog as posts. All of the data in the original JSON
 * file is preserved in the XML file, either by transfering it to
 * an appropriate format (e.g. Google Reader categories are converted
 * to WordPress post tags) or simply as an additional XML tag (e.g. the
 * Google Reader commentInfo metadata for recent shared items). In
 * situations where actual data has to be converted to make it readable
 * for WordPress, the original data is included as the JSON attribute
 * of that tag (e.g. timestamps and categories).
 *
 * As currently written, the script looks in its local directory for
 * Google Reader "shared-items.json" file and generates a matching
 * "shared-items.xml" file, also in its local directory.
 *
 * There are a number of potentially configurable (i.e. arbitrary)
 * values marked as TODO.
 *
 * Caveat emptor: this has been tested against my ~1000 item Google
 * Reader shared items feed and on my WordPress 3.2.1 site. I would
 * presume that it should work fairly well for others, but make no
 * guarantees!
 *********************************************************************/
 
/* returns a Wordpress slug-version of the given text (only
   alphanumeric characters and dashes) */
function sluggify($text)
{
	return preg_replace("|[^a-z0-9]+|", "-", strtolower($text));
}
 
/* SimpleXML doesn't really support namespaces unless you force it */
$xml = '<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0"
	xmlns:excerpt="http://wordpress.org/export/1.1/excerpt/"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:wp="http://wordpress.org/export/1.1/"
></rss>';
$rss = new SimpleXMLElement($xml);
$namespaces = $rss->getDocNamespaces(true);
 
/* load the Google Reader JSON file */
$file = file_get_contents("shared-items.json");
$json = json_decode($file, true);
 
/* Wordpress will choke if our post names aren't unique, so we track
   them separately */
$post_names = array();
 
/* header information describing the file itself */
$channel = $rss->addChild("channel");
$channel->addAttribute("direction", $json["direction"]);
$channel->addAttribute("id", $json["id"]);
$channel->addAttribute("self", $json["self"][0]["href"]);
$channel->addAttribute("author", $json["author"]);
$channel->addChild("title", $json["title"]);
$channel->addChild("link");
$channel->addChild("description");
$pubDate = $channel->addChild("pubDate", gmdate("D, j M Y G:i:s O", $json["updated"]));
$pubDate->addAttribute("json", $json["updated"]);
$channel->addChild("language");
$channel->addChild("wxr_version", "1.1", $namespaces["wp"]);
 
/* run through the list of items and add them to the XML */
foreach ($json["items"] as $item)
{
	$rssItem = $channel->addChild("item");
 
	/* a bunch of Google Reader-specific metadata */
	if (isset($item["isReadStateLocked"]))
	{
		$rssItem->addAttribute("isReadStateLocked", $item["isReadStateLocked"]);
	}
	$rssItem->addAttribute("crawlTimeMsec", $item["crawlTimeMsec"]);
	$rssItem->addAttribute("timestampUsec", $item["timestampUsec"]);
	$rssItem->addAttribute("id", $item["id"]);
	if (isset($item["commentInfo"]))
	{
		while (list($commentInfoKey, $commentInfoValue) = each($item["commentInfo"]))
		{
			$commentInfo = $rssItem->addChild("commentInfo", $commentInfoKey);
			$commentInfo->addAttribute("permalinkUrl", $commentInfoValue["permalinkUrl"]);
			$commentInfo->addAttribute("commentState", $commentInfoValue["commentState"]);
		}
	}
 
	/* annoyingly, not every item has its content in the content
	   element -- sometimes it's in the summary element (and once in a
	   while, it's just not there). I think this is an artifact of how
	   the original RSS feeds were constructed. I think. */
	if (isset($item["content"]))
	{
		$content = $item["content"]["content"];
	}
	else if (isset($item["summary"]["content"]))
	{
		$content = $item["summary"]["content"];
	}
	else
	{
		$content = "";
	}
 
	/* sometimes items don't even have titles */
	if (isset($item["title"]))
	{
		$rssItem->addChild("title", $item["title"]);
	}
 
	/* most items store the original linkback information in the
	   alternate element -- Wordpress won't honor this link tag when
	   it's imported (i.e. it won' treat it like the Daring Fireball
	   feed), so I have embedded a more descriptive linkback after the
	   annotations at the start of the content, using some information
	   from the origin element. The linkback is in the
	   "google-reader-alternate-href" div (for easy CSS-wrangling!) */
	if (isset($item["alternate"]))
	{
		$rssItem->addChild("link", htmlentities($item["alternate"][0]["href"], ENT_COMPAT, "UTF-8"));
		$content = htmlspecialchars("<div class=\"google-reader-alternate-href\"><p>Originally posted at <a href=\"{$item["alternate"][0]["href"]}\">{$item["origin"]["title"]}</a></p></div>", ENT_COMPAT, "UTF-8") . $content;
	}
 
	/* I haven't bothered to figure out if the dates are really GMT or
	   localized -- GMT was an easier assumption to make */
	$pubDate = $rssItem->addChild("pubDate", gmdate("D, j M Y G:i:s O", $item["published"]));
	$pubDate->addAttribute("json", $item["published"]);
 
	/* not every item has an a author, either -- again, an artifact of
	   the original RSS feeds */
	if (isset($item["author"]))
	{
		$rssItem->addChild("creator", $item["author"], $namespaces["dc"]);
	}
 
	/* annotations were tricky -- I have added them as their own XML
	   tags _and_ inserted them within a "google-reader-annotation" div
	   at the top of the post content (to match the original format on-
	   screen). All of the original data is preserved in the XML tag,
	   with an ID that matches the embedded div ID. */
	foreach($item["annotations"] as $annotation)
	{
		$annotationHTML = htmlentities("<div id=\"" . md5($annotation["content"] . $annotation["author"]) . "\" class=\"google-reader-annotation\"><blockquote><p>{$annotation["content"]}</p><p class=\"author\">{$annotation["author"]}</p></blockquote></div>", ENT_COMPAT, "UTF-8");
		$content = $annotationHTML . $content;
		$rssAnnotation = $rssItem->addChild("annotation", $annotation["content"]);
		$rssAnnotation->addAttribute("id", md5($annotation["content"] . $annotation["author"]));
		$rssAnnotation->addAttribute("author", $annotation["author"]);
		$rssAnnotation->addAttribute("userId", $annotation["userId"]);
		$rssAnnotation->addAttribute("profileId", $annotation["profileId"]);
		$rssAnnotation->addAttribute("profileCardParams", $annotation["profileCardParams"]);
	}
 
	/* again, sometimes content is in content, sometimes it's in the
	   summary element */
	$rssContent = $rssItem->addChild("encoded", $content, $namespaces["content"]);
	if (isset($item["content"]))
	{
		$rssContent->addAttribute("direction", $item["content"]["direction"]);
	}
	if (isset($item["summary"]))
	{
		$excerpt = $rssItem->addChild("encoded", $item["summary"]["content"], $namespaces["excerpt"]);
		$excerpt->addAttribute("direction", $item["summary"]["direction"]);
	}
 
	/* more Google reader metadata, this time about the original feed
	   that the item came from -- which is used above to format the
	   linkback that is embedded a the start of the content */
	$origin = $rssItem->addChild("origin");
	$origin->addAttribute("streamId", $item["origin"]["streamId"]);
	$origin->addAttribute("title", $item["origin"]["title"]);
	$origin->addAttribute("htmlUrl", $item["origin"]["htmlUrl"]);
 
	/* it's not clear to me whether the published or modified date is
	   when the original post was published or when the item <span class="hiddenGrammarError" pre="item ">was
	   shared</span> -- I think when published refers to when it was shared. */
	$postDate = $rssItem->addChild("post_date", date("Y-m-d G:i:s", $item["published"]), $namespaces["wp"]);
	$postDate->addAttribute("json", $item["published"]);
	$rssItem->addChild("comment_status", "open", $namespaces["wp"]);	// TODO make configurable
	$rssItem->addChild("ping_status", "open", $namespaces["wp"]);		// TODO make configurable
 
	/* make a Wordpress friendly title slug for the post */
	if (isset($item["title"]))
	{
		$slug = sluggify($item["title"]);
	}
	else
	{
		/* if no title, generate the slug from the timestamp */
		$slug = date("Y-m-d-G-i-s", $item["published"]);
	}
 
	/* make sure that our slug  is unique -- add a counter to the end
	   if it is not, and track those counter values in $post_names[] */
	if (isset($post_names[$slug]))
	{
		$post_names[$slug]++;
		$slug .= "-" . $post_names[$slug];
	}
	else
	{
		$post_names[$slug] = 0;
	}
	$rssItem->addChild("post_name", $slug, $namespaces["wp"]);
 
	/* more Wordpress metadata -- all of which could be tweaked */
	$rssItem->addChild("status", "publish", $namespaces["wp"]);	// TODO make configurable
	$rssItem->addchild("post_parent", 0, $namespaces["wp"]);	// TODO make configurable
	$rssItem->addChild("menu_order", 0, $namespaces["wp"]);		// TODO make configurable
	$rssItem->addChild("post_type", "post", $namespaces["wp"]);	// TODO make configurable
	$rssItem->addChild("post_password", "", $namespaces["wp"]);	// TODO make configurable
	$rssItem->addChild("is_sticky", 0, $namespaces["wp"]);		// TODO make configurable
 
	/* convert categories to post tags -- nota bene that Google Reader
	   has conflated the reader's categories with the original post's
	   tags, creating a... mish-mash. */
	foreach($item["categories"] as $category)
	{
		if (!preg_match("|.*/com\.google/.*|", $category))
		{
			$cleanCategory = $category;
			$cleanCategory = preg_replace("|user/\d+/label/(.*)|", "$1", $cleanCategory);
			$rssCategory = $rssItem->addChild("category", htmlentities($cleanCategory, ENT_COMPAT, "UTF-8"));
			$rssCategory->addAttribute("domain", "post_tag");
			$rssCategory->addAttribute("nicename", sluggify($cleanCategory));
			$rssCategory->addAttribute("json", $category);
		}
	}
 
	/* add comments -- note that for privacy reasons, while the
	   commenter's metadata is added as an XML tag, it is not embedded
	   in the Wordpress-readable wp:comment tags */
	foreach($item["comments"] as $comment)
	{
		$rssComment = $rssItem->addChild("comment", "", $namespaces["wp"]);
		$rssComment->addAttribute("id", $comment["id"]);
		$commentContent = $rssComment->addChild("comment_content", $comment["htmlContent"], $namespaces["wp"]);
		$commentContent->addAttribute("plainContent", $comment["plainContent"]);
		$author = $rssComment->addChild("comment_author", $comment["author"], $namespaces["wp"]);
		$author->addAttribute("userId", $comment["userId"]);
		$author->addAttribute("profileId", $comment["profileId"]);
		$author->addAttribute("profileCardParams", $comment["profileCardParams"]);
		$author->addAttribute("venueStreamid", $comment["venueStreamId"]);
		$commentDate = $rssComment->AddChild("comment_date", $comment["createdTime"], $namespaces["wp"]);
		$commentDate->addAttribute("modifiedTime", $comment["modifiedTime"]);
		$rssComment->addAttribute("isSpam", $comment["isSpam"]);
	}
}
 
/* dump the converted XML out as a file */
header ("Content-type: text/xml");
echo $rss->asXML();
file_put_contents("shared-items.xml", $rss->asXML());

November 27th, 2011

Posted In: How To, Social Bookmarking, Social Media, Useful Tools

Tags: , , , , , , ,

A few days ago (well, maybe a couple weeks ago), I was chatting with one of my colleagues about how I go about testing out new plugins and themes for WordPress µ before loading them on our school blog server. It seems like documenting my process might be generally helpful, so…

To start with, I decided (after ten years of mucking out Apache config files and PHP extensions and custom MySQL installs — thank you so, so much Marc Liyange for your timely and helpful installers!), that I was a grown-up and could spend $60 on a tool that makes my life easier: I run MAMP Pro on my MacBook. This means that I have a generic Apache/PHP/MySQL stack that supports commonly-used PHP extensions, Apache configurations, etc. I have redirected the document root of my install to my regular user’s Sites directory in OS X (~/Sites) so that I have ready access to the backend files of for my test installs. The net result: WordPress’ famous “Five Minute Install” is now true of almost any LAMP-based web application — I had a five-minute install of Drupal, Moodle, Joomla… you name it.

I’ve also settled into using Coda ($99) to edit HTML/PHP source code, since I particularly like the built-in terminal and publishing management features.

With WordPress µ installed (which, I guess, is now calling itself WPMU or WordPress MU or even WordPress 3 in betas), I now do the following:

  • I install create a new blog for each new theme or plugin that I want to test out. I follow a pretty intuitive naming scheme: the URL for the blog is the URL for the plugin or theme, and the name of the blog is the name of the plugin or theme (so WordPress Hashcash is at …/wp-hashcash and named WordPress Hashcash).
  • As I create each new blog, I create a new user to be that blog’s administrator. I almost never use this login, but it means that I have one user who is matched to each blog. In doing this, I make heavy use of Gmail’s + modifiers, so new user emails look like mygmailaddress+talyn+wpmu+blogurl@gmail.com — this lets me catch and filter relevant emails easily on the other end. (I developed this system when I was testing plugins that sent email notifications). For the curious, Talyn is the name of my laptop (so I know which server is sending me email) and WPMU is the keyword to distinguish these emails from, say, Drupal notifications.
  • I also have six generic users that I add to most (not all — I add them as needed) blogs, each with their own standard privileges:
    • Anna “Annie” Administrator
    • Edward “Eddie” Editor
    • Allison “Allie” Author
    • Christine “Chrissy” Contributor
    • Samuel “Sammy” Subscriber
    • Nathan “Nate” No Privileges

    I actually included nicknames so that I could control for how different themes displayed usernames (since I’m thinking about FERPA and how it may apply to our students on our school blogserver).

  • I have one blog on which I never activate themes or plugins, which I lyrically call “Is this blog in the blast radius?” This is based on my experience installing Digress.it on WordPress µ at the start of the year (it hosed every blog on the server, rather than just the one where it was activated). I check this before I deem any test complete.
  • One tricky thing that I did was that I set up MAMP to run Apache and MySQL as my local user account on my MacBook, and I have set permissions on my Sites directory so that my local user has all privileges, as does the www group, and other users have read/execute privileges (chown -R seth ~/Sites; chgrp -R www ~/Sites; chmod -R 775 ~/Sites). This means that I usually don’t run into problems with web apps that want to move or create files. This is also, of course, totally insecure. Que sera, sera.
  • I have an extra blog set up on my WordPress µ install that runs Feed WordPress, and it republishes the feeds for all of the other blogs on the server tagged Note. This means that I can post something tagged Note to any blog that I’m working on and then have all my notes together in one place. Adding the subscriptions to the Feed WordPress blog is a manual step, but not prohibitively difficult. And it really does mean that I have one place for all of my notes on how things went (or didn’t went) in my WordPress µ testing. I have the feeds categorized as Plugins, Themes, Configuration and Hacks, since those are generally what I’m testing (and mostly Plugins, at that).
  • One hitch in my system is that I have opted to keep my system entirely up-to-date (I’m running WordPress µ 2.9.2 with the most recent versions of all my plugins), while our school blog server is still at 2.8.4a. Generally speaking, this hasn’t been much of a problem, but when I’m particularly concerned, I will sometimes check things out on a lingering 2.8.4a install before loading it.

May 18th, 2010

Posted In: How To

Tags: , , , , , , , , , ,

I just slapped together a very quick plugin for a teacher’s blog that adds a [category] shortcode to WordPress. Basically, it just passes through all of the attributes of the shortcode as parameters to wp_list_categories(), allowing the user to embed a list of blog categories in any page, post or widget. This feels like something that should already exist (but I couldn’t find it).

category_shortcode.php

April 23rd, 2010

Posted In: Blogs, How To

Tags: , , , , ,

As part of my education technology role at my school, I am a member of our high school “Laptop Leaders” group. A few weeks ago, at the end of our first quarter, the Laptop Leaders were asked to document the work they were doing, to create a shared resource, both for themselves and for other teachers. Ultimately, this is preparation for more large-scale adoption of laptops and technology in general as teaching tools in the high school.

The teachers in this Laptop Leaders group were selected last spring, so I joined the group late, at the beginning of the school year and had, really, only a sketchy plan for what I would be working on. The outline (lightly revised) is below. My intention is to share my various write-ups related to this process in this space.

Collaborative Writing and Editing

I’m working with students to develop a class wiki as a collaborative information source, with students contributing class notes, screencasts and other updates and expansions on course content.

Blogs

I’m working with students to use the class blog as a publication platform for ideas/questions relevant to the greater community in their discipline (e.g. develop [my class] blog into a discussion of [media and design] and related ideas in the outside world).

Social Bookmarking

I’m working with faculty (and students) to use social bookmarking tools (specifically Diigo) to create dynamic and annotated resources for each other (and for and by students).

Social Media

I’m working with faculty and students to develop personal learning networks that tie together all of these Web 2.0 tools to create an online identity and a group of “fellow travelers” studying and exploring the same area. In students’ case, we’re working on this as a class (blogging), but for faculty tools like Twitter (and personal blogs) may also be useful. Also looking at other sharing sites (e.g. Flickr) for use as collaborative tools.

Useful Tools

In the interests of sharing, when I was at my last school, I sat down and created an iusethis.com profile of the handy applications that I use day-to-day. I’ve added this to my profile [on the school wiki], along with a (slowly growing) list of tools that I’ve built for special purposes around school.

Updated November 22, 2009: I should mention that I have Bowdler-ized some of these posts to protect (at least a little), the identities of my students. When posted to our school wiki, there are a number of links to examples. If you pop me an email or a comment and identify yourself, I’m happy to share these examples. Just trying to do some due diligence with regard to my students’ privacy.

November 22nd, 2009

Posted In: "Expert Plan", Educational Technology, Teaching

Tags: , , , , , , , , , , , , , , , , , , , ,

css.php