Posts tagged script
Linking to a Cell in a Google Spreadsheet
1Because I can’t stop playing with something that already works fine, I’ve continued to tweak my syllabus-generating spreadsheet. With the addition of an external script, I’m able to link directly to a specific cell in the spreadsheet — letting me share a link to my syllabi that takes people not just to the overall document, but to the current day (or the next day that’s closest to the current day) in the syllabus. The script could be used to link to any cell in any published Google Spreadsheet, with the caveat that if there are multiple cells with that match that value, the link will take you to the first appearance of the value in the spreadsheet.
$url = $_REQUEST["url"]; $key = $_REQUEST["key"]; $anchor = $_REQUEST["anchor"]; $html = file_get_contents($url); $output = str_replace($key, "<a name="$anchor"></a>$key", $html); $output = str_replace("href='/", "href='https://docs.google.com/", $output); header("Content-Type: text/html"); echo $output; |
This takes a GET URL of the format http://server/script?url=[url]&key=[key]&anchor=[anchor]#[anchor], where…
- The URL is the URL of the published Google spreadsheet location.
- The key is the text to search for in the spreadsheet (so you could link to anything in the published HTML, but it’s easiest to link to a unique value in a cell).
- The anchor is the name of the HREF anchor you will be creating (note that you then need to, well… link to that anchor).
Nota Bene: You have to add your own link to the anchor — this is a quick, slapped-together connection to my script.
Update: I was sitting here staring at the script, trying to figure out why I hadn’t put the anchor tag around the key, and then I realized what’s going on. The key (on my spreadsheet) is formatted to be white text on a white background (Rothko-style, if you will). If it gets converted to an anchor, its styling is affected and (without more work in the script) it turns blue and underlined. Lame. So… the anchor goes before the key, so the key’s CSS style won’t be affected.
This is why you document your code. Even when it’s short.
Transmogrifying those Google Reader JSON dumps into something useful
9For the last few years (my JSON feed tells me: since 2008), I have been tagging and annotating articles of interest as they passed before my eyes in Google Reader. This served a two-fold purpose:
- I could find them again later, easily, because they were tagged and annotated.
- I could share an RSS feed with those annotations to particular interest groups that I worked with (e.g. anything tagged “for robotics” would show up on my advanced computer science class’ portal page, or anything tagged “for academic computing” would show up on my school home page).
This was a great way to share (and manage) resources. Granted, much of what passed before my eyes in Google Reader was trivial and not of lasting value, but this filtering allowed me to hang on to at least a few gems for future reference.
And then Google Reader got the Google+ treatment and sharing items broke. But you could download a JSON dump of all the items that you had ever shared. It wasn’t entirely clear what you could do with this JSON dump, but… there it was. And then: I realized that all of my other information is stashed on my web server (and that I have become increasingly distrustful of relying on cloud services to maintain my data and workflows — e.g. my weekly backup of all my Google Docs… just in case).
Wouldn’t it be handy to import that JSON feed into a new blog on my server? So I wrote a PHP script that converts (at least my) Google Reader JSON dump into an XML file that WordPress can import as a list of posts. With the tags and annotations converted over. In fact, with all of the data in the JSON dump embedded in the XML file (although WordPress doesn’t read all of it).
This comes with a few caveats:
- For items that came from blogs with a full feed, the result is a republication of the original post — which feels ethically dubious to me. (I have made my new blog of Google Reader shared items private, so that I have the data but I’m not sharing it with the world).
- I’ve made guesses as to how to treat some of Google’s data. Reasoned, educated guesses, but guesses nonetheless. For example, I’m not super-clear on which dates in the file correspond with what events — does a publication date refer to when the item was shared or the original post was posted?
- I’ve added in some arbitrary (and therefore, ideally, eventually, configurable) WordPress tags to make the import go more smoothly. Where I have done that, I mark it in the script as a TODO item. (And, in truth, I didn’t really test to see if all of these items were necessary.)
- The original authors of the posts are transfered to the XML file, which means that when the actual import into WordPress is done, you will have the option to either laboriously create a new user for each distinct author or simply revert authorship to the currently logged-in WordPress user. It doesn’t seem like WordPress has a format for exporting or importing users (or, at least, my cursory search didn’t find it). Clearly an ancillary SQL query could be generated that pre-populated the WordPress database with the users that the XML file refers to. But I haven’t bothered to do that.
- You’ll need your own PHP-compatible webserver to run the script, since I have been quick and dirty and simply imported the JSON file from and exported the XML file to the script’s local directory. And I have no interest in setting up my world-facing webserver to take the traffic hit of processing other people’s multi-megabyte JSON dumps.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 | /**********************************************************************
* Google Reader to Wordpress
*
* 2011-11-27
* Seth Battis (seth@battis.net)
*
* This script takes the output of Google Reader's JSON export of
* shared items and converts it into an XML file that can be imported
* into a Wordpress blog as posts. All of the data in the original JSON
* file is preserved in the XML file, either by transfering it to
* an appropriate format (e.g. Google Reader categories are converted
* to WordPress post tags) or simply as an additional XML tag (e.g. the
* Google Reader commentInfo metadata for recent shared items). In
* situations where actual data has to be converted to make it readable
* for WordPress, the original data is included as the JSON attribute
* of that tag (e.g. timestamps and categories).
*
* As currently written, the script looks in its local directory for
* Google Reader "shared-items.json" file and generates a matching
* "shared-items.xml" file, also in its local directory.
*
* There are a number of potentially configurable (i.e. arbitrary)
* values marked as TODO.
*
* Caveat emptor: this has been tested against my ~1000 item Google
* Reader shared items feed and on my WordPress 3.2.1 site. I would
* presume that it should work fairly well for others, but make no
* guarantees!
*********************************************************************/
/* returns a Wordpress slug-version of the given text (only
alphanumeric characters and dashes) */
function sluggify($text)
{
return preg_replace("|[^a-z0-9]+|", "-", strtolower($text));
}
/* SimpleXML doesn't really support namespaces unless you force it */
$xml = '<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0"
xmlns:excerpt="http://wordpress.org/export/1.1/excerpt/"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:wp="http://wordpress.org/export/1.1/"
></rss>';
$rss = new SimpleXMLElement($xml);
$namespaces = $rss->getDocNamespaces(true);
/* load the Google Reader JSON file */
$file = file_get_contents("shared-items.json");
$json = json_decode($file, true);
/* Wordpress will choke if our post names aren't unique, so we track
them separately */
$post_names = array();
/* header information describing the file itself */
$channel = $rss->addChild("channel");
$channel->addAttribute("direction", $json["direction"]);
$channel->addAttribute("id", $json["id"]);
$channel->addAttribute("self", $json["self"][0]["href"]);
$channel->addAttribute("author", $json["author"]);
$channel->addChild("title", $json["title"]);
$channel->addChild("link");
$channel->addChild("description");
$pubDate = $channel->addChild("pubDate", gmdate("D, j M Y G:i:s O", $json["updated"]));
$pubDate->addAttribute("json", $json["updated"]);
$channel->addChild("language");
$channel->addChild("wxr_version", "1.1", $namespaces["wp"]);
/* run through the list of items and add them to the XML */
foreach ($json["items"] as $item)
{
$rssItem = $channel->addChild("item");
/* a bunch of Google Reader-specific metadata */
if (isset($item["isReadStateLocked"]))
{
$rssItem->addAttribute("isReadStateLocked", $item["isReadStateLocked"]);
}
$rssItem->addAttribute("crawlTimeMsec", $item["crawlTimeMsec"]);
$rssItem->addAttribute("timestampUsec", $item["timestampUsec"]);
$rssItem->addAttribute("id", $item["id"]);
if (isset($item["commentInfo"]))
{
while (list($commentInfoKey, $commentInfoValue) = each($item["commentInfo"]))
{
$commentInfo = $rssItem->addChild("commentInfo", $commentInfoKey);
$commentInfo->addAttribute("permalinkUrl", $commentInfoValue["permalinkUrl"]);
$commentInfo->addAttribute("commentState", $commentInfoValue["commentState"]);
}
}
/* annoyingly, not every item has its content in the content
element -- sometimes it's in the summary element (and once in a
while, it's just not there). I think this is an artifact of how
the original RSS feeds were constructed. I think. */
if (isset($item["content"]))
{
$content = $item["content"]["content"];
}
else if (isset($item["summary"]["content"]))
{
$content = $item["summary"]["content"];
}
else
{
$content = "";
}
/* sometimes items don't even have titles */
if (isset($item["title"]))
{
$rssItem->addChild("title", $item["title"]);
}
/* most items store the original linkback information in the
alternate element -- Wordpress won't honor this link tag when
it's imported (i.e. it won' treat it like the Daring Fireball
feed), so I have embedded a more descriptive linkback after the
annotations at the start of the content, using some information
from the origin element. The linkback is in the
"google-reader-alternate-href" div (for easy CSS-wrangling!) */
if (isset($item["alternate"]))
{
$rssItem->addChild("link", htmlentities($item["alternate"][0]["href"], ENT_COMPAT, "UTF-8"));
$content = htmlspecialchars("<div class=\"google-reader-alternate-href\"><p>Originally posted at <a href=\"{$item["alternate"][0]["href"]}\">{$item["origin"]["title"]}</a></p></div>", ENT_COMPAT, "UTF-8") . $content;
}
/* I haven't bothered to figure out if the dates are really GMT or
localized -- GMT was an easier assumption to make */
$pubDate = $rssItem->addChild("pubDate", gmdate("D, j M Y G:i:s O", $item["published"]));
$pubDate->addAttribute("json", $item["published"]);
/* not every item has an a author, either -- again, an artifact of
the original RSS feeds */
if (isset($item["author"]))
{
$rssItem->addChild("creator", $item["author"], $namespaces["dc"]);
}
/* annotations were tricky -- I have added them as their own XML
tags _and_ inserted them within a "google-reader-annotation" div
at the top of the post content (to match the original format on-
screen). All of the original data is preserved in the XML tag,
with an ID that matches the embedded div ID. */
foreach($item["annotations"] as $annotation)
{
$annotationHTML = htmlentities("<div id=\"" . md5($annotation["content"] . $annotation["author"]) . "\" class=\"google-reader-annotation\"><blockquote><p>{$annotation["content"]}</p><p class=\"author\">{$annotation["author"]}</p></blockquote></div>", ENT_COMPAT, "UTF-8");
$content = $annotationHTML . $content;
$rssAnnotation = $rssItem->addChild("annotation", $annotation["content"]);
$rssAnnotation->addAttribute("id", md5($annotation["content"] . $annotation["author"]));
$rssAnnotation->addAttribute("author", $annotation["author"]);
$rssAnnotation->addAttribute("userId", $annotation["userId"]);
$rssAnnotation->addAttribute("profileId", $annotation["profileId"]);
$rssAnnotation->addAttribute("profileCardParams", $annotation["profileCardParams"]);
}
/* again, sometimes content is in content, sometimes it's in the
summary element */
$rssContent = $rssItem->addChild("encoded", $content, $namespaces["content"]);
if (isset($item["content"]))
{
$rssContent->addAttribute("direction", $item["content"]["direction"]);
}
if (isset($item["summary"]))
{
$excerpt = $rssItem->addChild("encoded", $item["summary"]["content"], $namespaces["excerpt"]);
$excerpt->addAttribute("direction", $item["summary"]["direction"]);
}
/* more Google reader metadata, this time about the original feed
that the item came from -- which is used above to format the
linkback that is embedded a the start of the content */
$origin = $rssItem->addChild("origin");
$origin->addAttribute("streamId", $item["origin"]["streamId"]);
$origin->addAttribute("title", $item["origin"]["title"]);
$origin->addAttribute("htmlUrl", $item["origin"]["htmlUrl"]);
/* it's not clear to me whether the published or modified date is
when the original post was published or when the item <span class="hiddenGrammarError" pre="item ">was
shared</span> -- I think when published refers to when it was shared. */
$postDate = $rssItem->addChild("post_date", date("Y-m-d G:i:s", $item["published"]), $namespaces["wp"]);
$postDate->addAttribute("json", $item["published"]);
$rssItem->addChild("comment_status", "open", $namespaces["wp"]); // TODO make configurable
$rssItem->addChild("ping_status", "open", $namespaces["wp"]); // TODO make configurable
/* make a Wordpress friendly title slug for the post */
if (isset($item["title"]))
{
$slug = sluggify($item["title"]);
}
else
{
/* if no title, generate the slug from the timestamp */
$slug = date("Y-m-d-G-i-s", $item["published"]);
}
/* make sure that our slug is unique -- add a counter to the end
if it is not, and track those counter values in $post_names[] */
if (isset($post_names[$slug]))
{
$post_names[$slug]++;
$slug .= "-" . $post_names[$slug];
}
else
{
$post_names[$slug] = 0;
}
$rssItem->addChild("post_name", $slug, $namespaces["wp"]);
/* more Wordpress metadata -- all of which could be tweaked */
$rssItem->addChild("status", "publish", $namespaces["wp"]); // TODO make configurable
$rssItem->addchild("post_parent", 0, $namespaces["wp"]); // TODO make configurable
$rssItem->addChild("menu_order", 0, $namespaces["wp"]); // TODO make configurable
$rssItem->addChild("post_type", "post", $namespaces["wp"]); // TODO make configurable
$rssItem->addChild("post_password", "", $namespaces["wp"]); // TODO make configurable
$rssItem->addChild("is_sticky", 0, $namespaces["wp"]); // TODO make configurable
/* convert categories to post tags -- nota bene that Google Reader
has conflated the reader's categories with the original post's
tags, creating a... mish-mash. */
foreach($item["categories"] as $category)
{
if (!preg_match("|.*/com\.google/.*|", $category))
{
$cleanCategory = $category;
$cleanCategory = preg_replace("|user/\d+/label/(.*)|", "$1", $cleanCategory);
$rssCategory = $rssItem->addChild("category", htmlentities($cleanCategory, ENT_COMPAT, "UTF-8"));
$rssCategory->addAttribute("domain", "post_tag");
$rssCategory->addAttribute("nicename", sluggify($cleanCategory));
$rssCategory->addAttribute("json", $category);
}
}
/* add comments -- note that for privacy reasons, while the
commenter's metadata is added as an XML tag, it is not embedded
in the Wordpress-readable wp:comment tags */
foreach($item["comments"] as $comment)
{
$rssComment = $rssItem->addChild("comment", "", $namespaces["wp"]);
$rssComment->addAttribute("id", $comment["id"]);
$commentContent = $rssComment->addChild("comment_content", $comment["htmlContent"], $namespaces["wp"]);
$commentContent->addAttribute("plainContent", $comment["plainContent"]);
$author = $rssComment->addChild("comment_author", $comment["author"], $namespaces["wp"]);
$author->addAttribute("userId", $comment["userId"]);
$author->addAttribute("profileId", $comment["profileId"]);
$author->addAttribute("profileCardParams", $comment["profileCardParams"]);
$author->addAttribute("venueStreamid", $comment["venueStreamId"]);
$commentDate = $rssComment->AddChild("comment_date", $comment["createdTime"], $namespaces["wp"]);
$commentDate->addAttribute("modifiedTime", $comment["modifiedTime"]);
$rssComment->addAttribute("isSpam", $comment["isSpam"]);
}
}
/* dump the converted XML out as a file */
header ("Content-type: text/xml");
echo $rss->asXML();
file_put_contents("shared-items.xml", $rss->asXML()); |
Google Mail Envelopes Implementation
2
I came across Rahul Mahtani and Yofred Moik’s conceptual design of a Google Mail Envelope a few days ago and was… instantly captivated. I’m not sure if it’s just the aesthetics of a design on the envelope in general, the way it hearkens back to an old school airmail envelope, or the conceptual neatness of the route between the two addresses. I just know that I love it and I want it.
So, I spent some time making a version of it.
Right now, my implementation is very much hacked together (I was teaching myself the Google Maps API as I went — it’s not hard, but it’s not familiar vocabulary, yet — I have a few other projects that will get me more expert soon, I hope). The things to know are:
- Change the addresses and the map will (should) update to reflect the new information.
- The first line of the address is removed on the assumption that it’s a name and not part of the address (and users are cruelly constrained to 3-line addresses right now).
- The resulting envelope template is pretty much exactly a full-bleed letter-size page. Which means that printing it is a hassle.
- I strongly suspect that there should be a dampening-down of the colors on the map so that the USPS can automatically scan the right information easily. My recollection from constructing bulk mailings a few years back is that the address just needs to have a bit of white space around it, but having a mess of other geographic information scattered nearby may not be helpful…
- The snazzy orientation of the address infoWindows on the original design hasn’t happened yet. I think I have an idea of how to do it with some CSS (they won’t be “real” infoWindows), but haven’t taken the time to fiddle with it yet.
- There’s something hinky with the borders of the side-flaps due to the not-yet-standard border-radius CSS.
More to come as way opens.
Convert Word Processing Files to PDF
0![]()
This post is part of a series that are components of my “Expert Plan” at my school, looking to create a shared resource for my colleagues as the school moves towards greater adoption of laptops and technology in our pedagogy.
This AppleScript application converts any word processing files that Pages can open into PDFs. This application will only work on Macs.
Use
To use this application, drag a icon(s) of a file or group of files on to the icon for the application. When asked, pick which folder you would like to save the PDFs into. As the application runs, if Pages cannnot open a particular file, you will see a message warning you of this. When the application completes, it will display a list of all the files that could not be converted (or simply quit if all of the files were converted).
Install
To install this application, click the link below to download it as a ZIP archive. Double-click the “Convert Word Processing Files to PDFs.zip” icon to expand the ZIP archive and drag the application icon to where you want to use it.