20 Apr 2014

Migrating From Blogger.com to Hexo.io Static Site Generator

I found writing articles with Blogger.com had become slow and a little frustrating, so I decided to switch to Hexo.io, as I can write articles anywhere I have a text editor (usually Emacs). Hexo also creates a responsive and fast static website, so when people want to read the articles (including myself when I have forgotten something) they can do so quickly and across multiple devices. As it's a static site, I can deploy it anywhere.

So how do I get all of that content I created out of Blogger and into Hexo? Luckily, Hexo has a migration tool to make things easier.

Hexo migration package

Hexo has a separate tool called hexo-migrator to pull in content from an RSS feed, and there is a more specific migrator for WordPress. These migrators are installed as npm packages just like any other:

npm install hexo-migrator -g

Unfortunately, the npm packaged version of hexo-migrator failed when I tried to import from Blogger, regardless of whether I used the blog URL or downloaded the XML file generated by the RSS feed. The error I got had already been reported as an issue on the hexo-migrator GitHub site and a fix had already been applied. This fix had not yet been packaged up as a new npm version at the time of writing.

Hexo migration from GitHub

As a fix for the Blogger import problem exists in the GitHub repository, I installed the Hexo migration tool directly from there. npm (the Node package manager) allows you to install directly from a GitHub repository, which is handy when a fix has not yet been published as an npm package. So to install the latest version of hexo-migrator, I used the command:

npm install "git+https://github.com/hexojs/hexo-migrator-rss.git"

I used the https address for the GitHub repository as I don't have SSH access. However, you also have to put git+ in front of the repository address for npm to work. The git+ prefix tells npm to treat the address as a Git repository to clone, rather than as a plain URL to download a package archive from.
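
As a small sketch of the same idea, the helper below builds the address npm expects from a plain https repository URL. The function name npm_git_spec is made up for illustration, and the optional #commit-ish suffix (a branch, tag or commit) is a general npm feature rather than something this migration needed.

```shell
# Hypothetical helper: turn an https repository URL into an npm git spec.
# $1: https URL of the repository
# $2: optional branch, tag or commit to pin the install to
npm_git_spec() {
  if [ -n "${2:-}" ]; then
    printf 'git+%s#%s\n' "$1" "$2"
  else
    printf 'git+%s\n' "$1"
  fi
}

# Example (not run here): install the migrator straight from its repository.
# npm install "$(npm_git_spec https://github.com/hexojs/hexo-migrator-rss.git)"
```

Pinning to a commit or tag is worth considering if you want the install to be repeatable later, as the default branch of a repository keeps moving.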

Running the migration

The migration tool is very simple to use: run hexo migrate, specifying the type of input (rss) and the location of your content. In my case I just pulled the Blogger content directly from the website, although you could download the XML generated by the RSS feed links and save it as a file for importing.

I created a new Hexo site specifically to import the Blogger posts, so I would not interfere with the posts that I had already written using Hexo. If everything went wrong I could easily delete the new site and still have my new posts running.

To import content directly from my Blogger site into a new Hexo project I used the following commands:

hexo init hexo-blogger-import
cd hexo-blogger-import
hexo migrate rss "http://blog.jr0cket.co.uk/default?alt=rss"
hexo server

It worked, brilliant. I have a whole bunch of migrated articles in the source/_posts/ folder. Running the hexo server allowed me to quickly see the results.

Oh, not everything is there

Whilst the Hexo migration tool successfully grabbed articles from my blog, it only got the first 25 posts. I have about 200 posts, so my excitement was short-lived. It turns out that this is not a problem with the Hexo migration tool, but with the RSS feed from Blogger.

I clicked the RSS link on the Blogger website and, looking at the XML (a horrible thing to do), I saw that it was only giving me the first 25 posts.
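
A quick way to spot this kind of shortfall is to count the markdown files the migrator produced. The sketch below assumes the posts end in .md and live directly in the posts folder; count_posts is an illustrative name, not a Hexo command.

```shell
# Count the markdown files in a Hexo posts folder.
# $1: posts folder (defaults to source/_posts, relative to the project root)
count_posts() {
  find "${1:-source/_posts}" -maxdepth 1 -type f -name '*.md' | wc -l | tr -d ' '
}

# Example (not run here), from the root of the Hexo project:
# count_posts
```

If the number is suspiciously round, or far below the number of posts on the old blog, the feed is a likely culprit.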
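
Rather than reading the XML by eye, the items in a saved copy of the feed can be counted. The sketch below also builds a paged feed URL. It assumes that Blogger feeds accept the start-index and max-results query parameters and that the feed lives under /feeds/posts/default. This is untested against this blog, so treat it as an assumption, and note that both function names are made up for illustration.

```shell
# Count the <item> elements in a saved RSS file.
# $1: path to the downloaded feed XML
count_feed_items() {
  grep -o '<item>' "$1" | wc -l | tr -d ' '
}

# Build a paged feed URL (assumed parameters: start-index, max-results).
# $1: blog base URL, $2: index of the first post (1-based), $3: page size
paged_feed_url() {
  printf '%s/feeds/posts/default?alt=rss&start-index=%s&max-results=%s\n' \
    "${1%/}" "$2" "$3"
}

# Example (not run here): ask for posts 26 to 50.
# hexo migrate rss "$(paged_feed_url http://blog.jr0cket.co.uk 26 25)"
```

Remember to quote the URL when passing it to a command, as the ? and & characters are special to the shell.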

Migration by labels

Checking the sites I syndicate some of my blog posts to, I noticed a different form of RSS web address (URL). I share selected posts with Planet Clojure and Planet Emacsen. This is done using specific Blogger labels (aka tags), i.e. PlanetClojure and PlanetEmacsen. These RSS syndication sites were given the following RSS URLs:

http://blog.jr0cket.co.uk/feeds/posts/default/-/PlanetClojure
http://blog.jr0cket.co.uk/feeds/posts/default/-/PlanetEmacsen

So by using the different labels I could pull out more posts from Blogger, even though each request would only return a maximum of 25 posts. Instead of the default RSS feed used in the first Hexo migration, I used the following commands:

hexo migrate rss http://blog.jr0cket.co.uk/-/Clojure
hexo migrate rss http://blog.jr0cket.co.uk/-/Emacs
hexo migrate rss http://blog.jr0cket.co.uk/-/Ubuntu
hexo migrate rss http://blog.jr0cket.co.uk/-/Agile
hexo migrate rss http://blog.jr0cket.co.uk/-/Kanban

I carried on for each Blogger label I had defined on my posts until I thought I had most of the posts migrated. Not perfect, but until I know how to get Blogger to give me more than 25 posts from its RSS feed, that will have to do.
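
Repeating the command by hand for every label gets tedious, so it could be wrapped in a loop. This is only a sketch: it assumes the label feeds follow the /feeds/posts/default/-/LABEL form of the Planet Clojure and Planet Emacsen URLs above, that hexo is on the PATH, and that labels contain no spaces or other characters needing URL encoding. The function names are made up for illustration.

```shell
# Build the feed URL for one Blogger label.
# $1: blog base URL, $2: label name
label_feed_url() {
  printf '%s/feeds/posts/default/-/%s\n' "${1%/}" "$2"
}

# Run the RSS migrator once per label.
# $1: blog base URL, remaining arguments: label names
# Must be run from the root of the Hexo project.
migrate_labels() {
  base="$1"
  shift
  for label in "$@"; do
    hexo migrate rss "$(label_feed_url "$base" "$label")"
  done
}

# Example (not run here):
# migrate_labels http://blog.jr0cket.co.uk Clojure Emacs Ubuntu Agile Kanban
```

Each label is still limited to the 25 posts that Blogger returns per request, so a heavily used label may need another approach.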

Testing the migrated content

As I was already running hexo server, I could see the results as I imported the posts for each particular Blogger label. All I needed to do was refresh the browser each time and click on the relevant tag in the tag cloud sidebar.

If you are not running the server during the migration, you can start it by using the following command in the root of your hexo project:

hexo server

Now open your browser at http://localhost:4000 and see the results of the migration.

Each of the posts I migrated is in my blog, although the tags need tidying up (I wasn't very consistent in Blogger). The great thing is that all the posts are in date order, as the published date of each post was put into the markdown file generated by the migration.

Migration isn't perfect

Whilst my articles were copied over to markdown files okay, some of my posts brought along additional styling (divs, class styles, non-breaking spaces, etc.) and other artefacts that messed up the styles that Hexo applies.

Some of the headers and subheaders use the markdown notation for bold rather than for headings. Headers in particular are a good thing to correct, as search engines base some of an article's relevance on them.

For some of the migrated posts, I open them up in an editor and delete any offending styling that came with them. To tell which ones to open, I use the Unix command grep to find which of my posts have <div in their text:
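
Some of this clean-up could be scripted instead of done by hand. The filter below is a rough sketch that removes opening and closing div tags and turns &nbsp; entities into ordinary spaces. It assumes the tags are lower case and that the non-breaking spaces arrived as &nbsp; entities rather than raw characters, and strip_blogger_markup is an illustrative name.

```shell
# Read a post on standard input and write a cleaned copy to standard output.
# Removes <div ...> and </div> tags and replaces &nbsp; with a plain space.
strip_blogger_markup() {
  sed -E -e 's#</?div( [^>]*)?>##g' -e 's/&nbsp;/ /g'
}

# Example (not run here): clean one post into a new file, then compare.
# strip_blogger_markup < source/_posts/my-post.md > /tmp/my-post.md
```

Writing to a separate file first makes it easy to diff the result before replacing the original post.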
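
Lines that consist only of bold text are good candidates for conversion. The sketch below rewrites such lines as second-level headings. The choice of ## is an assumption, so pick whichever level suits your theme, and bold_lines_to_headings is an illustrative name.

```shell
# Read a post on standard input and turn lines that are entirely bold,
# such as **Setup**, into markdown headings, such as ## Setup.
# Bold text in the middle of a sentence is left alone.
bold_lines_to_headings() {
  sed -E 's/^\*\*([^*]+)\*\*[[:space:]]*$/## \1/'
}

# Example (not run here):
# bold_lines_to_headings < source/_posts/my-post.md > /tmp/my-post.md
```

It is worth reading through the result, as a line of bold text is not always meant to be a heading.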

grep "<div" source/_posts/*

It turns out that most of my posts do, so to see which ones I really need to fix it is probably easiest to look at the locally running website created by hexo server. So I opened my browser at http://localhost:4000 and had a look at the posts to see which ones needed the most attention.

My basic strategy was to work from the most recent blog post, working backwards until I didn't care about any older posts.
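
Another way to prioritise is to rank the posts by how much stray markup they contain. The sketch below counts the lines containing <div in each post and lists the worst offenders first. It assumes the posts end in .md, and rank_posts_by_divs is an illustrative name.

```shell
# List posts with at least one line containing <div, worst first.
# Output is one "count path" pair per line.
# $1: posts folder (defaults to source/_posts)
rank_posts_by_divs() {
  grep -cH '<div' "${1:-source/_posts}"/*.md |
    awk -F: '{ n = $NF + 0; sub(/:[0-9]+$/, ""); if (n > 0) print n, $0 }' |
    sort -rn
}

# Example (not run here): show the ten posts most in need of attention.
# rank_posts_by_divs | head -n 10
```

Note that grep -c counts matching lines, not individual tags, so the numbers are only a rough guide.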

Updating Categories and Tags

The Hexo RSS migrator pulled in all the tags (labels) from my posts on Blogger and listed them correctly in the front matter of each post.

Whilst editing the posts to remove the rogue style code, I took the chance to refine the tags I used and select a category for each post. Using the local hexo server, it was quite quick to refine the tags by looking at all the words in the tag cloud sidebar. Where I had used similar tags I could just pick one, making it simpler to find the most relevant content on the site.
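
The same overview can be had from the command line by counting how often each tag appears in the front matter. This sketch assumes the tags are written as a YAML list (a line reading tags: followed by one "- name" line per tag), which may not match every post, and tag_counts is an illustrative name.

```shell
# Print each tag with the number of posts using it, most used first.
# $1: posts folder (defaults to source/_posts)
tag_counts() {
  awk '
    FNR == 1                      { in_tags = 0 }        # new file, reset state
    /^tags:[ \t]*$/               { in_tags = 1; next }  # start of the tag list
    in_tags && /^[ \t]*-[ \t]+/   { sub(/^[ \t]*-[ \t]+/, ""); print; next }
                                  { in_tags = 0 }        # anything else ends the list
  ' "${1:-source/_posts}"/*.md | sort | uniq -c | sort -rn
}

# Example (not run here), from the root of the Hexo project:
# tag_counts
```

Tags that appear only once or twice, or that differ only slightly from a popular tag, are the ones most likely to need merging.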

Adding Summary breaks

A nice feature of Hexo is that you can define how much of a summary view you want to have with each article. The summary view is the main view of the blog and shows the title and the first part of your article.

You define where the summary view ends by using the following syntax in the article markdown file:

<!-- more -->

This is something you need to add manually to each article [TODO: check if there is a tool to do this], so if you have a lot of posts it may take a little while. However it does help your audience (and yourself) scan through your content quickly.
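
In the absence of a ready-made tool, a small script could make a first pass. The sketch below inserts the marker after the first paragraph of a post. It assumes the front matter sits between two lines containing only ---, and that the first paragraph is a sensible summary, which will not be true for every post. The function name add_more_marker is made up for illustration.

```shell
# Insert <!-- more --> after the first paragraph of a markdown post.
# Posts that already contain the marker are left untouched, as are posts
# with a single paragraph or without two --- front matter lines.
# $1: path to the post
add_more_marker() {
  if grep -q '<!-- more -->' "$1"; then
    return 0
  fi
  tmp="$(mktemp)"
  awk '
    {
      if (!done && seps >= 2) {
        if (NF > 0) { in_para = 1 }                      # inside the first paragraph
        else if (in_para) { print ""; print "<!-- more -->"; done = 1 }
      }
      if (seps < 2 && $0 == "---") { seps++ }            # count front matter lines
      print
    }
  ' "$1" > "$tmp" && cat "$tmp" > "$1"
  rm -f "$tmp"
}

# Example (not run here): add the marker to every post.
# for post in source/_posts/*.md; do add_more_marker "$post"; done
```

As this edits the posts in place, it is safest to commit them to version control first and review the changes afterwards.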

If you are importing a lot of older posts, then it's not going to be a big problem, as they will be many pages into your blog summary view.

Images still on Blogger

The migration is not yet finished, even after I tidy up my posts. Many of the images in my posts are stored in Blogger, which is actually Google Picasa and now Google Plus photos. There is another Hexo tool called hexo-migrator-image which will copy all the remote images to your local filesystem and fix your links (hopefully).

Install hexo-migrator-image using the following command:

npm install hexo-migrator-image

Then run the hexo-migrator-image command and wait for all the images to download.

Image importing was not so successful

The image migrator does not like https links and I had quite a lot of them. When the image migrator hits an https link it just crashes.

Even after changing all the https links to http the results were not as expected. Whilst images had been copied to the local filesystem, the names were all changed to long numbers rather than the original descriptive filenames. To compound the issue, the links in the posts were not updated to point to the local images.

I wonder if the Hexo image migrator failed because the images were all within hypertext anchor links (a hrefs).

Rather than wrestle with the hexo image migrator, I decided to leave the images where they were on Google Plus.
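
Even when leaving the images in place, it is handy to have a list of where they all live. The sketch below pulls every remote image address out of the posts. It assumes the posts end in .md and that image addresses end in .png, .jpg, .jpeg or .gif, so anything served without a file extension will be missed. The function name list_remote_images is made up for illustration.

```shell
# Print the unique http and https image addresses found in the posts.
# $1: posts folder (defaults to source/_posts)
list_remote_images() {
  grep -ohiE 'https?://[^" )>]+\.(png|jpe?g|gif)' "${1:-source/_posts}"/*.md |
    sort -u
}

# Example (not run here): count how many of the images use https.
# list_remote_images | grep -c '^https://'
```

The list also serves as a record of what would need copying if the images ever have to move.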

GitHub is not great for images

There is not a lot of advantage in putting your images in GitHub, except that they are right there with the rest of your website. However, using a good image repository that acts like a Content Delivery Network (CDN) should give you the same speed without wasting space in the GitHub repository.

Keeping images out also makes your Git repository quicker to fork and clone.

So I will keep all my images on Google Plus. Any photos I take with my Android phone end up on Google Plus anyway, so it makes sense to keep all my images there.

Final migration check

As a final sanity check that everything has been migrated correctly, I ran the hexo-broken-link-checker. This Hexo plugin detects links that don’t work, missing images and redirects.

As I occasionally link to my own posts, it was good to check that these links still worked.

In Summary

Although I had a bit of editing to do on my blog posts after the migration, it was worth it to have all my blog content in markdown. Now I can manage my posts much more easily and make any updates in my favourite editor, Emacs.

Thank you
@jr0cket

This work is licensed under a Creative Commons Attribution 4.0 ShareAlike License, including custom images & stylesheets. Permissions beyond the scope of this license may be available at @jr0cket