Importing Drupal blogs to Jekyll

I had about 5 Drupal blogs that I wanted to move to Jekyll. Most of them were a mix of blog and HTML website, although the biggest one also used Drupal to generate some info pages, like the About page. They ranged from 1 to 200 posts, and of course the biggest blog had comments. Obviously, I started with the smallest blogs and worked my way up.

I started with the jekyll-import plug-in on GitHub, which creates a _posts directory containing your blog posts with Jekyll-ized file names, like so:

    Jan 21:37 2009-07-18-getting-started.md
    Jan 21:37 2009-07-19-ardubot.md
    Jan 21:37 2009-07-20-simple-ardubot-programming.md

Each post gets front matter that looks like so:

    ---
    categories:
    - administrivia
    layout: story
    title: Getting Started
    created: 1247898135
    ---

Breaking that down:
  • All tags (are there categories in Drupal? I only ever used tags) become categories. With Jekyll's default permalinks every category becomes part of the post's URL, which makes for a very funky directory structure and makes links to your other posts a pain to type. So one of the first things I had to do was go in manually, pick one or two categories, and change the rest to tags. If you've got bigger blogs, you will probably want to mod the conversion plug-in to do that part for you, especially if you do any cross-linking of your own posts.
  • The layout was variously "story" or "blog", and since my theme used "post" as the blog post layout, I just symlinked both "story" and "blog" to "post" and was done with that problem 5 seconds later.
  • created is a date/time stamp, specifically a Unix timestamp (seconds since 1970-01-01 UTC). 1247898135 works out to 2009-07-18, which matches the date in the filename, so there's no issue there.
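If you'd rather not shuffle categories by hand, here is a minimal Python sketch of that rewrite for one post's text. It assumes the front matter only uses the flat "key: value" and "- item" shapes shown above (a real YAML library would be more robust), keeps the first category by default and demotes the rest to tags, and maps the story/blog layouts to post as an alternative to symlinking. It also turns created into a UTC date so you can spot-check it against the filename. All function names here are hypothetical, not part of jekyll-import.

```python
from datetime import date, datetime, timezone

# The importer's layout names, mapped onto the theme's post layout
# (an alternative to symlinking the layout files).
LAYOUT_MAP = {"story": "post", "blog": "post"}


def split_front_matter(text):
    """Return (front-matter lines, body) for a post that starts with '---'."""
    lines = text.split("\n")
    if lines[0].strip() != "---":
        raise ValueError("post does not start with front matter")
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            return lines[1:end], "\n".join(lines[end + 1:])
    raise ValueError("front matter is never closed")


def parse_simple_yaml(lines):
    """Parse only 'key: value' lines and '- item' block lists, nothing fancier."""
    data, current_list = {}, None
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- ") and current_list is not None:
            current_list.append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        if value.strip():
            data[key.strip()] = value.strip()
            current_list = None
        else:
            current_list = data.setdefault(key.strip(), [])
    return data


def dump_simple_yaml(data):
    """Write the dict back out in the same flat shape."""
    out = []
    for key, value in data.items():
        if isinstance(value, list):
            out.append(f"{key}:")
            out.extend(f"- {item}" for item in value)
        else:
            out.append(f"{key}: {value}")
    return out


def rewrite_post(text, keep_categories=1):
    """Keep the first category (or few), demote the rest to tags, fix the layout."""
    lines, body = split_front_matter(text)
    data = parse_simple_yaml(lines)
    if "categories" in data:
        categories = data["categories"]
        if isinstance(categories, str):  # 'categories: a b' is also legal
            categories = categories.split()
        data["categories"] = categories[:keep_categories]
        extra = categories[keep_categories:]
        if extra:
            tags = data.get("tags", [])
            if isinstance(tags, str):
                tags = tags.split()
            data["tags"] = tags + extra
    if "layout" in data:
        data["layout"] = LAYOUT_MAP.get(data["layout"], data["layout"])
    return "\n".join(["---", *dump_simple_yaml(data), "---", body])


def created_date(data):
    """Drupal's 'created' is a Unix timestamp; return it as a UTC date."""
    return datetime.fromtimestamp(int(data["created"]), tz=timezone.utc).date()


def filename_date(filename):
    """Jekyll post file names start with YYYY-MM-DD."""
    return date.fromisoformat(filename[:10])
```

For the sample front matter above, created_date gives 2009-07-18, the same as filename_date("2009-07-18-getting-started.md"). If the filename date was worked out in the server's local time zone, a post written near midnight could legitimately come out a day off, so treat a mismatch as something to look at rather than an error.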

The last detail is the post text itself, which looks like so:

    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus
    eu lacus quis lacus tempor tincidunt. Mauris vulputate diam lorem,
    nec lobortis tortor gravida in.^M
    ^M
    In hac habitasse platea dictumst.^M    
    ^M
    <ul>^M
      <li>Duis vitae erat et enim commodo placerat sed id massa. Nam semper sed elit ac hendrerit. Nulla facilisi. Ut rhoncus erat at venenatis ultricies.
      <li>Ut eget turpis est.^M
    </ul>^M

The ^M characters (carriage returns left over from DOS-style line endings) cause problems if they appear in the excerpt, so I remove them wholesale while I'm editing the categories/tags; it's a quick search-and-replace keyboard macro in Emacs.
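For anyone without an Emacs macro handy, the same ^M cleanup as a minimal Python sketch. It assumes the posts are UTF-8 .md files in _posts; strip_carriage_returns and strip_posts are hypothetical names. Files are read and written as bytes because Python's text mode would silently translate line endings on its own.

```python
from pathlib import Path


def strip_carriage_returns(text):
    """Remove the ^M (carriage return) characters from a post's text."""
    # CRLF line endings become plain LF; any other stray CRs are dropped.
    return text.replace("\r\n", "\n").replace("\r", "")


def strip_posts(posts_dir="_posts"):
    """Strip ^M from every .md file in posts_dir in place; return changed paths."""
    changed = []
    for path in sorted(Path(posts_dir).glob("*.md")):
        # Bytes in, bytes out, so text-mode newline translation can't interfere.
        raw = path.read_bytes().decode("utf-8")
        fixed = strip_carriage_returns(raw)
        if fixed != raw:
            path.write_bytes(fixed.encode("utf-8"))
            changed.append(path)
    return changed
```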

Speaking of excerpts, the manual ones you put into Drupal do transfer (usually marked with <!--break -->, which Jekyll will reuse if you add excerpt_separator: <!--break --> to the post's front matter). Sometimes the conversion auto-generates a chunk of excerpt text and adds it to the front matter. The rest of the time there is no excerpt, and the documented "By default this is the first paragraph of content in the post" is an inconsistent lie (as of Jekyll 3.8.5). So, if you're a wordy blogger, you need to fix the excerpts too, for the sake of your blog lists.
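One way to make excerpts predictable, sketched in Python under a few assumptions: the front matter has already been split from the body as a list of lines, any auto-generated excerpt is stored under an excerpt key (check what your import actually wrote), and when a post has no manual <!--break --> you're happy to use its first blank-line-separated paragraph. fix_excerpt is a hypothetical helper.

```python
import re

# Drupal's manual teaser marker, which Jekyll can reuse as the separator.
BREAK = "<!--break -->"


def fix_excerpt(front_matter_lines, body):
    """Give one post a predictable excerpt.

    front_matter_lines are the lines between the two '---' markers. Returns
    the (possibly extended) front matter lines and the (possibly edited) body.
    """
    keys = {line.split(":", 1)[0].strip()
            for line in front_matter_lines if ":" in line}
    if keys & {"excerpt", "excerpt_separator"}:
        return front_matter_lines, body  # the import already set something up
    if BREAK not in body:
        # No manual break: put one after the first blank-line-separated paragraph.
        parts = re.split(r"\n\s*\n", body.lstrip("\n"), maxsplit=1)
        if len(parts) < 2:
            return front_matter_lines, body  # one paragraph: excerpt is the post anyway
        body = parts[0] + "\n\n" + BREAK + "\n\n" + parts[1]
    return front_matter_lines + ["excerpt_separator: " + BREAK], body
```

Because <!--break --> is an HTML comment, the inserted marker doesn't show up in the rendered post; it only tells Jekyll where the excerpt ends.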

Starting to sound like a lot of manual work, eh? Of course, you could either mod the import plug-in or write a supplemental post-processing script, because very little of this needs human-guided decision making. I was just doing it by hand for my 1-, 5-, and 20-post blogs. With my 200-post blog, I might decide that writing and debugging a script is less work than just doing it with keyboard macros...
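If you go the script route, a small dry-run harness helps with the debugging half. This minimal Python sketch (hypothetical names, not part of jekyll-import) runs a list of plain str -> str transforms, such as a ^M stripper or a front-matter rewrite, over every .md post and prints a unified diff of what would change, writing nothing until you pass write=True.

```python
import difflib
from pathlib import Path


def apply_transforms(text, transforms):
    """Run text through each str -> str transform, in order."""
    for transform in transforms:
        text = transform(text)
    return text


def process_posts(posts_dir, transforms, write=False):
    """Apply transforms to every .md post; diff by default, write only on request.

    Returns the paths whose text would change (or did change, if write=True).
    """
    changed = []
    for path in sorted(Path(posts_dir).glob("*.md")):
        # Bytes in, bytes out, so text-mode newline handling can't hide a ^M.
        original = path.read_bytes().decode("utf-8")
        updated = apply_transforms(original, transforms)
        if updated == original:
            continue
        changed.append(path)
        if write:
            path.write_bytes(updated.encode("utf-8"))
        else:
            diff = difflib.unified_diff(
                original.splitlines(keepends=True),
                updated.splitlines(keepends=True),
                fromfile=str(path),
                tofile=f"{path} (fixed)",
            )
            print("".join(diff))
    return changed
```

For instance, process_posts("_posts", [lambda t: t.replace("\r\n", "\n")]) previews the line-ending fix without touching any files; once the diffs look right, rerun it with write=True.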

I forgot to mention that there are also all manner of directories full of post stubs that redirect (via a refresh) to the real posts in _posts, for both the category/tag directories and the date directories. I just deleted all of those and let Jekyll recreate whatever directories it wants.
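Finding those stubs before deleting them can be scripted too. This Python sketch assumes a stub is recognizable either by a "layout: refresh" line in its front matter or by an HTML <meta http-equiv="refresh"> tag; look at a couple of your own stubs to confirm which (if either) applies. It skips underscore and dot directories such as _posts, _layouts, and .git, and only lists candidates so you can review them before deleting anything. The names are hypothetical.

```python
import re
from pathlib import Path

# Assumed stub markers; confirm against a couple of your own stub files.
STUB_MARKERS = [
    re.compile(r"^layout:[ \t]*refresh[ \t]*$", re.MULTILINE),  # front matter
    re.compile(r"<meta[^>]+http-equiv\s*=\s*[\"']?refresh", re.IGNORECASE),  # HTML
]
PAGE_SUFFIXES = {".md", ".markdown", ".html"}


def looks_like_redirect_stub(text):
    """True if the page text matches any of the assumed stub markers."""
    return any(marker.search(text) for marker in STUB_MARKERS)


def find_redirect_stubs(site_dir="."):
    """List candidate redirect stubs, skipping _posts, _layouts, .git and the like."""
    root = Path(site_dir)
    found = []
    for path in sorted(root.rglob("*")):
        if path.suffix not in PAGE_SUFFIXES or not path.is_file():
            continue
        if any(part.startswith(("_", ".")) for part in path.relative_to(root).parts):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        if looks_like_redirect_stub(text):
            found.append(path)
    return found
```

Running for p in find_redirect_stubs("."): print(p) from the site root prints the list to review; the now-empty directories can then be removed by hand.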

Lastly, I was not using my Drupal install in the most standard of manners, so sometimes I had to go hunting on the server for wherever I had squirreled away my images, although some images did come along for the ride, as you'd hope and expect.
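To narrow down that hunt, here is a Python sketch that lists image references in your posts with no matching local file. It assumes images are referenced by <img src=...> tags or Markdown ![alt](url) syntax, skips absolute URLs on other hosts, and treats every other path as relative to the site root. A relative path like images/foo.png inside a post would really resolve against the post's URL, so expect some false positives there. The names are hypothetical.

```python
import re
from pathlib import Path
from urllib.parse import unquote, urlparse

IMG_TAG = re.compile(r"""<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']""", re.IGNORECASE)
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)")


def image_refs(text):
    """Yield image URLs from HTML <img> tags, then from Markdown ![alt](url)."""
    yield from IMG_TAG.findall(text)
    yield from MD_IMAGE.findall(text)


def missing_images(site_dir=".", posts_dir="_posts"):
    """Map each locally hosted image URL with no matching file to the posts using it."""
    root = Path(site_dir)
    missing = {}
    for post in sorted((root / posts_dir).glob("*.md")):
        text = post.read_text(encoding="utf-8", errors="replace")
        for url in image_refs(text):
            parsed = urlparse(url)
            if parsed.scheme or parsed.netloc:
                continue  # absolute URL: lives on some other server
            # Assumption: treat every path as relative to the site root.
            local = root / unquote(parsed.path).lstrip("/")
            if not local.is_file():
                missing.setdefault(url, []).append(post.name)
    return missing
```

The result maps each missing image URL to the posts that use it, which is a shorter list to take to the server than grepping post by post.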