How to easily migrate pages from Drupal 6 CCK content types to Drupal 7 fields using the Feeds module

One of the easiest ways to upgrade between version 6 and version 7 of Drupal is to re-build your site in Drupal 7, and then use the Views Data Export and Feeds XPath Parser modules to move your pages and articles into your new site.

This post shows you the details of setting up both a D6 View you can export and a D7 Feeds importer you can use to migrate that View content. 

If you notice anything missing or confusing in these instructions, please post a comment, so that I can improve this tutorial. Thanks!

Before you begin

  1. Read other good posts on general updating strategies:
  2. Do a COMPLETE backup of your site (files, database, everything).
  3. Create a SEPARATE Test site to use for this process, so you don't kill your Live, Production site.

Set up your old Drupal 6 site

  1. On your old D6 site, install Views, Views Data Export, CTools & any modules they depend on.
  2. I'm using "Views Data Export", instead of the RSS display that is native to Views, because RSS made it difficult to output the path and Date fields correctly.
  3. Finally, create a separate View for each content type you want to move.

View settings in Drupal 6

If you want to skip the instructions, you can just import this View code using CTools.


  1. Add a view type of "Node"
  2. Set filters for only Published Pages.
  3. Add fields for each bit of data you want to migrate, such as these common fields:
    • Uid
    • Nid
    • Post date
    • Path
    • Title
    • Body
  4. Add a "Data Export" display to the View.
    • Under "Style settings", choose "XML file". For "Data export: Style options", UNCHECK "Provide as file" and CHECK "Transform spaces". For transforming spaces, choose "Dashes", because XML element names cannot contain spaces.
    • Under "Data export settings", set a "Path" to this new feed. For example: feeds/pages/all.
    • Under "Fields", choose "Node: Path" and, under "Rewriting", check "Rewrite the output of this field". Enter [path] into the text box, so that you will only get the internal path and not an entire URL.
    • Again under "Fields", choose "Node: Post date" and, under "Rewriting", check "Rewrite the output of this field". Enter [created] into the text box, so you can change the date formatting. For "Date format", choose "Custom" and enter Y-m-d H:i:s O into the text box. This outputs your date in a format that is easier to import into the new Drupal 7 site.
    • Don't link "Title" or "User" fields to their nodes, or they will output link tags in the feed.
  5. SAVE your View, or all your changes will be LOST!
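
If you want to confirm that the custom date format came out as intended, a small script can try to parse a Post-date value copied from the feed. This is only a sketch: the function name parse_export_date is made up for illustration, and the format string below is my Python spelling of the PHP-style format Y-m-d H:i:s O entered in the View.

```python
from datetime import datetime

# Python strptime spelling of the PHP-style format "Y-m-d H:i:s O".
EXPORT_DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def parse_export_date(text):
    """Parse a Post-date value such as '2011-04-28 13:04:39 -0400'.

    Raises ValueError if the text does not match the expected format.
    """
    return datetime.strptime(text.strip(), EXPORT_DATE_FORMAT)


if __name__ == "__main__":
    stamp = parse_export_date("2011-04-28 13:04:39 -0400")
    # The parsed value keeps the UTC offset that came with the feed.
    print(stamp.isoformat())
```

A ValueError here usually means the "Date format" setting in the View is not the custom one described above.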

XML feed output from Drupal 6

When you click on the link to your new Feed (http://mylivesite.gatech.edu/feeds/pages/all), you may see something like this XML code:

<?xml version="1.0" encoding="utf-8" ?>
<nodes>
    <node>
        <Uid>2</Uid>
        <Nid>71</Nid>
        <Path>/about/staff</Path>
        <Post-date>2011-04-28 13:04:39 -0400</Post-date>
        <Title>Our Staff</Title>
        <Body><p>Our employees are brilliant!&nbsp; And attractive, too.</p></Body>
    </node>
    <node>
        <Uid>2</Uid>
        <Nid>81</Nid>
        <Path>/about/location</Path>
        <Post-date>2011-04-28 13:06:27 -0400</Post-date>
        <Title>Our Location and Hours</Title>
        <Body><p>More fascinating HTML goes here, including an <a href="http://mysite.gatech.edu/fakedirectory/pagename">absolute link</a> whose URL may need replaced if I am changing my site's Domain Name.</p></Body>
    </node>
</nodes>
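
Before importing, it can help to check that every node in the feed carries all the fields you expect. The sketch below is one way to do that with Python's standard library. The function name missing_fields and the SAMPLE data are invented for illustration, and the sample assumes the HTML inside Body arrives entity-escaped in the raw XML (view the page source of your feed to check how yours looks).

```python
import xml.etree.ElementTree as ET

# Element names from the example feed (spaces already turned into dashes).
EXPECTED = ("Uid", "Nid", "Path", "Post-date", "Title", "Body")


def missing_fields(xml_text):
    """Return {nid: [names of missing or empty elements]} for incomplete nodes."""
    root = ET.fromstring(xml_text)
    problems = {}
    for node in root.iter("node"):
        missing = [
            name for name in EXPECTED
            if not (node.findtext(name) or "").strip()
        ]
        if missing:
            problems[node.findtext("Nid", default="?")] = missing
    return problems


# Hypothetical sample: the second node lacks a Post-date and has an empty Body.
SAMPLE = """<nodes>
  <node>
    <Uid>2</Uid><Nid>71</Nid><Path>/about/staff</Path>
    <Post-date>2011-04-28 13:04:39 -0400</Post-date>
    <Title>Our Staff</Title>
    <Body>&lt;p&gt;Our employees are brilliant!&lt;/p&gt;</Body>
  </node>
  <node>
    <Uid>2</Uid><Nid>81</Nid><Path>/about/location</Path>
    <Title>Our Location and Hours</Title>
    <Body></Body>
  </node>
</nodes>"""

if __name__ == "__main__":
    print(missing_fields(SAMPLE))
```

An empty result means every node has all six elements filled in.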

 

Set up your test site in Drupal 7

  1. Set up your new D7 site with whatever themes, modules & configurations you would like.
  2. Create custom content types with the same fields as you used in your Drupal 6 site. The Bundle Copy module will speed up your CCK re-creation: it allows you to import a generic content type with a pre-set collection of fields and settings you often use.
  3. Warning: pay attention to Text Input Formats or you might strip out important HTML tags from your Body field when importing. So, allow all users (for now) permission to use the "Full HTML" text format. Likewise, set the default text format for your new Page content type to use "Full HTML".
  4. Install Feeds, Feeds XPath Parser, CTools & any modules they depend on.
    • Warning: the Pathauto module, if enabled, will overwrite/re-create path aliases for all the pages you import, so you might want to disable Pathauto before importing.

    Feed importer settings in Drupal 7

    If you want to skip the instructions, you can just import this Feeds Importer code using CTools.

    1. Add a feed importer at http://mysite.gatech.edu/admin/structure/feeds.
    2. For "Basic settings", choose:
      • Attach to content type: "Use standalone form".
      • Periodic import: "Off"
      • CHECK: Import on submission
    3. For "Fetcher", use "HTTP Fetcher" and choose:
      • CHECK: Auto detect feeds
    4. For "Parser", choose "XPath XML parser".
    5. For "Processor", choose "Node processor" and then use these Settings:
      • Update existing nodes: Replace existing nodes
      • Text format: Full HTML
      • Content type: Page
      • Author: YourUserName (Note: To import page authors, you have to import your users BEFORE importing pages).
      • Expire nodes: Never
    6. For "Node processor Mapping", add "XPath Expression" for each of these fields:
      • "Node ID" and make it Unique
      • "User ID"
      • "Title"
      • "Body"
      • "Published date"
      • "Path alias"
    7. Under "XPath XML parser", type in your XPath queries like this:
      • Context: //node
      • nid: Nid
      • uid: Uid
      • title: Title
      • body: Body
      • created: Post-date
      • path_alias: Path
      • At the bottom of the page, do NOT check any boxes under "Select the queries you would like to return raw XML or HTML", as this will wrap your field data in an extra <Body> tag.
    8. Be sure to Save your settings.
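
To see how the context query and the per-field queries relate, here is a rough Python sketch of the same idea: the context selects each item, and every other query is evaluated relative to that item. This is only an illustration using the standard library's limited XPath support, not how Feeds XPath Parser is implemented. The names extract_items and FEED are made up, and ".//node" is ElementTree's spelling of the //node context.

```python
import xml.etree.ElementTree as ET

CONTEXT = ".//node"  # ElementTree spelling of the //node context query

# Mapping target -> query, mirroring the settings listed above.
QUERIES = {
    "nid": "Nid",
    "uid": "Uid",
    "title": "Title",
    "body": "Body",
    "created": "Post-date",
    "path_alias": "Path",
}


def extract_items(xml_text):
    """Return one dict per context match, with each query run relative to it."""
    root = ET.fromstring(xml_text)
    return [
        {target: node.findtext(query) for target, query in QUERIES.items()}
        for node in root.findall(CONTEXT)
    ]


# Hypothetical one-item feed with the same element names as the export.
FEED = """<nodes>
  <node>
    <Uid>2</Uid><Nid>71</Nid><Path>/about/staff</Path>
    <Post-date>2011-04-28 13:04:39 -0400</Post-date>
    <Title>Our Staff</Title>
    <Body>&lt;p&gt;Our employees are brilliant!&lt;/p&gt;</Body>
  </node>
</nodes>"""

if __name__ == "__main__":
    for item in extract_items(FEED):
        print(item["nid"], item["path_alias"], item["created"])
```

If a query name does not match the element name in the feed exactly (including the dash in Post-date), that field comes back empty, which is a common reason for blank values after an import.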

    Using your feed importer with Drupal 7


    1. Go to the /import page on your site (for example: http://mysite.gatech.edu/import) & choose the importer you just created (D6 XML pages).
    2. In the Import > URL text box, enter the web address of the feed view you created earlier, for example: http://mylivesite.gatech.edu/feeds/pages/all and click on "Import".
    3. Hopefully, you'll see a successful Status message that says something like "10 imported items total".

    Quality assurance

    1. Do some sample checking of the pages you imported. Make sure your new pages are identical to those on the old site.
    2. Consider using Views Bulk Operations (VBO) as a great way to add tags or do mass corrections to this imported content in your new Drupal 7 site.
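
One check that is easy to automate is looking for absolute links that still point at the old domain, like the one in the sample feed. The sketch below scans a piece of Body HTML with a regular expression. The function name find_old_domain_links is hypothetical, and a regular expression is only a rough tool for HTML, so treat the result as a list of pages to review by hand.

```python
import re

QUOTE = "[\"']"  # either kind of quote around an href value


def find_old_domain_links(body_html, old_host):
    """Return href values in body_html that still point at old_host."""
    pattern = re.compile(
        r"href\s*=\s*" + QUOTE
        + r"(https?://" + re.escape(old_host) + r"(?:[/:?#][^\"']*)?)"
        + QUOTE,
        re.IGNORECASE,
    )
    return pattern.findall(body_html)


if __name__ == "__main__":
    body = (
        '<p>An <a href="http://mysite.gatech.edu/fakedirectory/pagename">'
        'absolute link</a> and a <a href="/about/staff">relative link</a>.</p>'
    )
    for url in find_old_domain_links(body, "mysite.gatech.edu"):
        print(url)
```

Relative links such as /about/staff are left alone, since they keep working after a domain change.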

    Known import Problems

    For more information

    Comments

    Rob

    Uploads File Attachments

    In case it's of any help to someone else: I was trying to follow this tutorial, except my content type had a File Attachment. Each time I would try to import it, I would get an error that the FID could not be null. I read all the error logs and could not find the solution. Turned out that on the Drupal 6 site where I was generating the XML from, for the File Attachment field, I needed to check "Only show listed file attachments". That solved it for me.
     
    Great writeup! It helped me a lot. 
    Thanks

    Leora

    Stripping HTML even though not checked

    I am trying to use your method.  Everything worked except for the Body field.  I don't have Strip HTML checked, but the XML strips the HTML tags out of the Body.
    See: http://njsgc.rutgers.edu/feeds/pages/all
    If you look at the one for Space and Science Links, for example, you would expect the link tags to appear. But I get nothing.
    I switched that page to Full HTML (it was originally entered with TinyMCE), but it still strips the code. Wondering if this has to do with TinyMCE messing it up.

    Editor

    Maybe Text Input Format

    Hi, Leora: 

    Is it possible that the Input Format for text (for example: Plain Text) is set to a format that strips out tags?

    M

    Awesome tutorial! It worked!

    Two things we wanted to mention, for any other beginners reading this.
    1. In our case, we discovered YOU MUST IMPORT YOUR USERS FIRST and make sure they have permission to create pages. If you don't import your users first, the page import won't work. You can use the Feeds importer "User import" to import them, but first install the patch found at http://drupal.org/node/1570544 (it lets you map the UID from Drupal 6 to the User ID of Drupal 7, so that when you import your pages they can "find" their creators).
    2. When you're setting up your feed importer settings in Drupal 7, these instructions tell you to do your Node Processor Mapping BEFORE you select the XPath XML Parser as your parser. We discovered we had to choose the parser FIRST, or else the XPath Expression option doesn't appear in Node Processor Mapping. Just a little thing.
     
    Other than that, I just want to say that you ROCK, Adelle! I wish this post showed up as the first Google hit for migrating or importing pages from Drupal 6 to Drupal 7, then we wouldn't have wasted so much time fumbling with Node Export and feeling totally demoralized when it kept failing to import any nodes (or importing half of our pages with a "page not found" error!) We're beginners and the way you outlined everything step by step was incredibly useful.
    Completely thanks to you, we just wrapped up our first successful import! We followed your instructions exactly and imported all but 5 pages successfully.
     
    I was really starting to think that migrating pages was going to be over my head, which felt bad because it seems like such a basic part of migration. I'm so happy that I got to do this myself, and the skills I learned from your tutorial can be transferred to other imports (users, blog posts, etc).
    Thanks again from Canada!

    Adelle Frank

    Going to update post with your feedback

    Thanks, M, for giving such detailed feedback: I will update my post with your excellent additions!

    Adelle

    Date field Timezone problem

    There is a known issue for Feeds, where it converts imported dates to GMT, even if they are in a UNIX timestamp format.
    There are a number of workarounds right now, including the Feeds Tamper module, or setting your Content Type's date field to "UTC". 
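
To recognize this shift when you check imported pages, it helps to know what a given Post-date looks like once it is expressed in UTC. The sketch below does that conversion in Python. It is only an aid for comparing values by eye, not one of the workarounds named above, and the function name to_utc_string is made up.

```python
from datetime import datetime, timezone


def to_utc_string(text):
    """Convert 'YYYY-MM-DD HH:MM:SS +HHMM' to the same instant in UTC."""
    local = datetime.strptime(text.strip(), "%Y-%m-%d %H:%M:%S %z")
    return local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # A date exported with a -0400 offset lands four hours later in UTC.
    print(to_utc_string("2011-04-28 13:04:39 -0400"))
```

If an imported page shows the UTC value where you expected local time, you are probably seeing this issue rather than a bad export.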

    Andy

    Thanks and a question

    This is a great tutorial! I really appreciate you taking the time to write it. I think it will work for me except for one thing: Do you know how to use the D6 UID to set the author of the created node?
    I realized I couldn't map users because of my own silly typo -- go figure.

    Dawn

    Great resource! Attached files and images?

    Hi there - thank you for a very clear tutorial!
    I'm wondering how you would handle adding multiple attached files? 
    I'm currently using the Data Export module in D6, and it only outputs attached files as a comma-separated list, which would be fine if I could figure out how to get Feeds XPath Parser in D7 to import these. I outlined the problem here: http://drupal.org/node/1874380.
    Is there a standard way to attach and migrate files using feeds?
    thank you

    Adelle Frank

    Try Feeds Tamper module

    Hi, Dawn:

    I don't think there is a standard way to migrate attached files.  I believe this should work, but I haven't had time to test.  If you do try it would you let me know if it works?

    1. In your Drupal 6 View, add a field from the Uploads category, called "Upload: Attached files"
    2. When configuring that field, choose "Rewrite the output of this field" and put the [upload_fid-url] token into the textarea, so that the absolute link to this file is output in the XML feed. Be sure to choose a comma WITHOUT a space as the separator for multiple values.
    3. In your Drupal 7 site, make sure your content type for Page allows more than one value for the attached files field & that your field allows the extensions (txt, png, etc) you will be migrating.
    4. In your Feed importer, add a mapping for the Attached File field.
    5. Under the Settings for XPath XML parser page, enter "Attached-files".
    6. Install the Feeds Tamper module, and configure it to explode multiple values.
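
The idea behind "explode" is just splitting one text value into a list. The Python sketch below shows that idea and why the separator in step 2 should be a comma WITHOUT a space. It is not Feeds Tamper's own code; the function name explode and the example.com URLs are placeholders.

```python
def explode(value, separator=","):
    """Split one field value into a list, dropping empty pieces."""
    return [part for part in value.split(separator) if part]


if __name__ == "__main__":
    # Separator matches the feed: clean values.
    print(explode("http://example.com/a.txt,http://example.com/b.png"))
    # Feed used ", " but we split on ",": the second value keeps a leading
    # space, which can break a file URL on import.
    print(explode("http://example.com/a.txt, http://example.com/b.png"))
```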

    Adelle Frank

    Text Find & Replace Solution (for small amounts of data)

    Because Views Export as XML does NOT nest multivalued field values properly in parent/child tags, my co-worker, Matt, came up with this solution:

    Use a text editor to replace <elements> in the exported XML. You can give those fields and their multiple values a parent/child structure manually.
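
For larger exports, the same restructuring could be scripted instead of done by hand. The sketch below is one possible approach in Python, under some assumptions: the multivalued element is called Attached-files (as in the earlier reply), values are comma-separated, and the child element name file is made up for illustration. It has not been tested against a real Feeds import.

```python
import xml.etree.ElementTree as ET


def nest_values(xml_text, field="Attached-files", child="file", separator=","):
    """Turn 'a,b' inside each <field> into <child>a</child><child>b</child>."""
    root = ET.fromstring(xml_text)
    # Copy the matches into a list first, since we modify the tree below.
    for element in list(root.iter(field)):
        values = [v.strip() for v in (element.text or "").split(separator)]
        element.text = None
        for value in values:
            if value:
                ET.SubElement(element, child).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    flat = (
        "<nodes><node><Nid>71</Nid>"
        "<Attached-files>http://example.com/a.txt,http://example.com/b.png"
        "</Attached-files></node></nodes>"
    )
    print(nest_values(flat))
```

With the values nested this way, an XPath query such as Attached-files/file would match each file separately.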

    Sean

    More Resources

    Hey,
    I found this resource when looking into the same problem.  The only difference is the example they run through is done with a CSV file.  The process is identical.
     
    http://prezi.com/g6chbltwdcbx/drupal-data-migration-made-simple-with-feeds/

    Adelle Frank

    Great resource!

    Thanks, Sean :)
