I think I've done it. I've got all post content importing the way I'd like. I've got a view that redirects requests for old Wordpress post URLs, categories, and category feeds to their new UUID-based permalinks. I can customize the site name. And I've added pagination to all of the public views.
Now all I have to do is actually make the switch!
Got image rewriting working, so all posts will use the Tanzawa standard photo insert... and then noticed a bug in how I import images.
Photos taken with an iPhone rely on EXIF data to indicate the proper orientation. However, I strip all EXIF data from photos when I save them, to reduce file size and protect privacy.
With no EXIF data to orient them, vertical images appear sideways. Thankfully the fix is simple: rotate the image according to its EXIF orientation before stripping / extracting the EXIF data.
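To make the ordering concrete, here is a minimal pure-Python sketch that treats an image as a grid of pixel rows plus an EXIF dict keyed by tag id. It applies the standard EXIF Orientation tag (`0x0112`, values 1 to 8) to the pixels first and only then throws the metadata away. The grid representation and the `prepare_for_storage` name are illustrative assumptions, not Tanzawa's code; with Pillow on real files, `ImageOps.exif_transpose` performs the same orientation step.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]  # rows of pixel values, top row first

ORIENTATION_TAG = 0x0112  # EXIF "Orientation" tag id


def _flip_lr(g: Grid) -> Grid:
    return [row[::-1] for row in g]


def _flip_tb(g: Grid) -> Grid:
    return [list(row) for row in g[::-1]]


def _transpose(g: Grid) -> Grid:
    # Mirror across the top-left to bottom-right diagonal.
    return [list(col) for col in zip(*g)]


def _rotate_cw(g: Grid) -> Grid:
    # Rotate 90 degrees clockwise.
    return [list(col) for col in zip(*g[::-1])]


def _rotate_ccw(g: Grid) -> Grid:
    # Rotate 90 degrees counter-clockwise.
    return [list(col) for col in zip(*g)][::-1]


def _rotate_180(g: Grid) -> Grid:
    return [row[::-1] for row in g[::-1]]


# Transform that turns the stored pixels into the upright image, per EXIF value.
ORIENTATION_FIXES = {
    1: lambda g: [list(row) for row in g],
    2: _flip_lr,
    3: _rotate_180,
    4: _flip_tb,
    5: _transpose,
    6: _rotate_cw,
    7: lambda g: _rotate_180(_transpose(g)),  # mirror across the other diagonal
    8: _rotate_ccw,
}


def prepare_for_storage(pixels: Grid, exif: Dict[int, int]) -> Tuple[Grid, Dict[int, int]]:
    """Bake the orientation into the pixels first, then strip every EXIF tag."""
    orientation = exif.get(ORIENTATION_TAG, 1)
    fix = ORIENTATION_FIXES.get(orientation, ORIENTATION_FIXES[1])
    return fix(pixels), {}
```

The order is the whole fix: the orientation is read while the EXIF dict still exists, and the returned metadata is empty, so nothing identifying is kept.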
I've got streams being set properly for each category. The last bit is to clean up the content automatically and rewrite / swap out image tags, particularly for photos posted with Sunlit. Posts made with Sunlit are displayed as an <a> tag (which links to your original image with a "-scaled" suffix) wrapping an <img> tag whose source is proxied through micro.blog.
It also has an attachment of the second photo, which I am automatically inserting into the post.
I need to extract all <a> tags, check whether their href contains "-scaled", and strip it. Then I can look up the matching attachment entry in my database and rewrite the tag as a Tanzawa image insert. Shouldn't take too long. Maybe tomorrow.
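A minimal sketch of that plan using only the standard library: collect every `<a href>` with `html.parser`, drop a `-scaled` suffix that sits right before the file extension, and look the cleaned URL up in a dict standing in for the attachment table. The dict, its id values, and the assumption that `-scaled` always directly precedes the extension (with no query string) are illustrative, not taken from Tanzawa.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List

# "-scaled" immediately before the extension, e.g. photo-scaled.jpg -> photo.jpg
SCALED_SUFFIX = re.compile(r"-scaled(?=\.[A-Za-z0-9]+$)")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a chunk of post HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def unscale(href: str) -> str:
    """Strip the '-scaled' suffix to get the originally uploaded file's URL."""
    return SCALED_SUFFIX.sub("", href)


def find_attachments(html: str, attachments_by_url: Dict[str, str]) -> Dict[str, str]:
    """Map each '-scaled' link in the post to the id of the matching attachment."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    found: Dict[str, str] = {}
    for href in collector.hrefs:
        if "-scaled" not in href:
            continue  # ordinary link, not a Sunlit photo wrapper
        original = unscale(href)
        if original in attachments_by_url:
            found[href] = attachments_by_url[original]
    return found
```

The returned mapping from the old `href` to an attachment id is what the rewrite step would then turn into a Tanzawa image insert.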
There's light at the end of the tunnel. I can import all of my post content, including check-in meta-data, bookmark URLs, everything. The only remaining tasks are to build a custom 404 handler that redirects visits to the old Wordpress URLs to their new Tanzawa permalinks, and to use the configured Category -> Stream mapping records.
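In Django, a custom 404 handler is wired up by setting `handler404` in the root URLconf to a view that takes `(request, exception)`. The interesting part is the lookup that view runs before giving up, sketched here without Django. The WordPress-style paths (`/YYYY/MM/DD/slug/`, `/category/slug/`, `/category/slug/feed/`), the new URL shapes, and the two lookup dicts are assumptions standing in for Tanzawa's import records.

```python
import re
from typing import Mapping, Optional

# Assumed WordPress permalink shapes.
CATEGORY_FEED = re.compile(r"^/category/(?P<slug>[\w-]+)/feed/?$")
CATEGORY = re.compile(r"^/category/(?P<slug>[\w-]+)/?$")


def legacy_redirect(
    path: str,
    post_uuids: Mapping[str, str],
    category_streams: Mapping[str, str],
) -> Optional[str]:
    """Return the new URL for an old WordPress path, or None for a genuine 404."""
    feed = CATEGORY_FEED.match(path)
    if feed and feed["slug"] in category_streams:
        return f"/{category_streams[feed['slug']]}/feed/"

    category = CATEGORY.match(path)
    if category and category["slug"] in category_streams:
        return f"/{category_streams[category['slug']]}/"

    # Normalise the trailing slash so /2021/01/02/hello and /2021/01/02/hello/ match.
    key = path if path.endswith("/") else path + "/"
    if key in post_uuids:
        return f"/{post_uuids[key]}"
    return None
```

Returning `None` leaves the decision with the view: answer with `HttpResponsePermanentRedirect` on a match, otherwise render the normal 404 page.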
I’ve got all of my extractors written. Next up is actually importing the content. As part of the import, I’m also going to automate cleaning up some of the markup:
- Removing link wrappers around images. For example, Sunlit wraps every image in an <a> tag, which I want to strip.
- Rewriting all attachment links to their new Tanzawa permalink.
- Rewriting all internal links to their new Tanzawa permalink.
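The link-unwrapping and link-rewriting steps can be sketched as two regex passes over the post HTML. This is a minimal sketch that assumes fairly regular markup (an `<a>` whose only child is a single `<img>`, and double-quoted `href` values); messier content would call for a real HTML parser. `link_map` is a hypothetical dict from old URL to new Tanzawa permalink.

```python
import re
from typing import Mapping

# <a ...><img ...></a>  ->  <img ...>   (only when the image is the link's sole child)
LINKED_IMAGE = re.compile(r"<a\b[^>]*>\s*(<img\b[^>]*>)\s*</a>", re.IGNORECASE)
HREF = re.compile(r'href="([^"]+)"')


def unwrap_images(html: str) -> str:
    """Drop the <a> wrapper around images, keeping the <img> tag itself."""
    return LINKED_IMAGE.sub(r"\1", html)


def rewrite_links(html: str, link_map: Mapping[str, str]) -> str:
    """Swap any href found in link_map for its new permalink; leave others alone."""
    return HREF.sub(lambda m: f'href="{link_map.get(m.group(1), m.group(1))}"', html)


def clean_post(html: str, link_map: Mapping[str, str]) -> str:
    # Unwrap first so image wrapper links never get rewritten.
    return rewrite_links(unwrap_images(html), link_map)
```

Attachment links could go through the same `rewrite_links` pass by adding their old URLs to the map.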
I have a few pages on my blog. I’m not sure I want to support pages yet (at least not in such a free form). I could import them as posts, so the content moves over, but I think I’m going to move them to my wiki instead.
There's still a slog ahead for importing posts, but it seems manageable. I wrote a bunch of utility functions (with tests) to extract and normalize individual fields of data from a post.
The idea being that once I can extract the data easily, I should be able to construct my records by simply calling each function (more or less).
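As a sketch of "construct the record by calling each function", here is one small extractor per field over a WordPress export (WXR) `<item>`, composed into a record. The namespace URIs are the ones WXR 1.2 files declare; the `ImportedPost` fields and the function names are hypothetical, not Tanzawa's actual code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List

NS = {
    "wp": "http://wordpress.org/export/1.2/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}


@dataclass
class ImportedPost:
    title: str
    published: datetime
    status: str
    categories: List[str]
    body: str


def extract_title(item: ET.Element) -> str:
    return (item.findtext("title") or "").strip()


def extract_published(item: ET.Element) -> datetime:
    # WXR stores GMT timestamps as "YYYY-MM-DD HH:MM:SS".
    raw = item.findtext("wp:post_date_gmt", namespaces=NS)
    return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


def extract_status(item: ET.Element) -> str:
    return item.findtext("wp:status", default="draft", namespaces=NS)


def extract_categories(item: ET.Element) -> List[str]:
    # Tags share the <category> element, so keep only domain="category".
    return [c.get("nicename", "") for c in item.findall("category") if c.get("domain") == "category"]


def extract_body(item: ET.Element) -> str:
    return item.findtext("content:encoded", default="", namespaces=NS)


def build_post(item: ET.Element) -> ImportedPost:
    """Each field comes from its own small, separately testable extractor."""
    return ImportedPost(
        title=extract_title(item),
        published=extract_published(item),
        status=extract_status(item),
        categories=extract_categories(item),
        body=extract_body(item),
    )
```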
Migrating comments is going to be tricky, as Tanzawa only supports webmentions and not all comments on my blog originate from webmentions. I'll probably hold off on importing comments/webmentions until after I migrate my blog to Tanzawa.
The last thing I need to do is import individual post content. Maybe it's because I'm not building fun features, but this last Wordpress import feels like such a slog.
I made a fun hack for importing images. I'm using part of the Hotwire stack for the dynamic portions of Tanzawa. Most dynamic web applications today use client-side rendering: the server sends a JSON data structure, and your browser runs code/templates/logic that turn it into HTML for display. Hotwire is "HTML over the wire", so all of your logic and rendering happen on the server and the browser just displays the result.
Turbo also supports lazy loading, which means a frame isn't loaded until it scrolls into view on the page. So I can import all of my images just by scrolling down the page.
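Roughly what such a page could look like, rendered on the server: one `<turbo-frame>` per pending attachment, each with `loading="lazy"` so Turbo only requests its `src` once the frame scrolls into view, and that request is what performs the import. The `id` and `src` formats here are hypothetical; `loading="lazy"` is Turbo's own frame attribute. In Tanzawa this would presumably be a Django template; plain string building keeps the sketch self-contained.

```python
from html import escape
from typing import Iterable


def lazy_import_frames(attachment_ids: Iterable[str]) -> str:
    """One lazily loaded frame per attachment; scrolling triggers each import."""
    frames = []
    for attachment_id in attachment_ids:
        frame_id = escape(f"import-{attachment_id}", quote=True)
        # Hypothetical endpoint that imports one attachment and renders the result.
        src = escape(f"/a/wordpress/attachments/{attachment_id}/import", quote=True)
        frames.append(
            f'<turbo-frame id="{frame_id}" src="{src}" loading="lazy">'
            "Waiting to import..."
            "</turbo-frame>"
        )
    return "\n".join(frames)
```

The import endpoint's response needs a `<turbo-frame>` with the same `id` (for example, showing the imported thumbnail), since Turbo swaps in only the matching frame.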
I broke the chain with a weekend off after about 3 months of working on Tanzawa a bit each and every day. Today I'm back at it, and I made a small API that imports images from Wordpress. Tomorrow I should be able to build a small interface that'll loop through the attachments and automatically download them.
Managed to get the category to stream and post kind to (tanzawa) post kind mappings working. I also got the attachment import records saving properly.
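Once configured, both mappings boil down to lookups. A minimal sketch, assuming each mapping has been loaded into a plain dict; every slug, kind name, and the fallback to `"note"` below are made up for illustration.

```python
from typing import Dict, Iterable, Set

# Hypothetical configured mappings, as they might be loaded from the mapping records.
CATEGORY_TO_STREAM: Dict[str, str] = {
    "photos": "photos",
    "links": "bookmarks",
    "asides": "notes",
}
WORDPRESS_KIND_TO_TANZAWA: Dict[str, str] = {
    "note": "note",
    "article": "article",
    "photo": "note",
    "checkin": "checkin",
    "bookmark": "bookmark",
}
DEFAULT_KIND = "note"  # assumed fallback for kinds without a mapping


def streams_for(categories: Iterable[str]) -> Set[str]:
    """All streams a post belongs to; categories without a mapping are skipped."""
    return {CATEGORY_TO_STREAM[c] for c in categories if c in CATEGORY_TO_STREAM}


def kind_for(wordpress_kind: str) -> str:
    """Tanzawa post kind for a WordPress kind, falling back to DEFAULT_KIND."""
    return WORDPRESS_KIND_TO_TANZAWA.get(wordpress_kind, DEFAULT_KIND)
```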
My basic plan for importing attachments is as follows. Each photo in Wordpress is exported as a post with the post type "attachment". The guid for the item is the URL of the attached file. So I've created a record that has a foreign key to the originating Wordpress import record, the post guid, my own UUID, and a nullable foreign key to the resulting Tanzawa file attachment.
Once the file has been imported, I'll have a Tanzawa file attachment set so I'll easily be able to pick up where I left off.
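A minimal sketch of that record and the resume logic, using a dataclass in place of a Django model. The field names are hypothetical; the nullable `file_id` plays the role of the nullable foreign key to the Tanzawa file, and `fetch` stands in for whatever actually downloads the guid URL and saves the file.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional
from uuid import UUID, uuid4


@dataclass
class AttachmentImport:
    wordpress_import_id: int        # the export this attachment came from
    guid: str                       # WordPress guid = URL of the uploaded file
    uuid: UUID = field(default_factory=uuid4)
    file_id: Optional[int] = None   # set once the Tanzawa file exists


def pending(records: Iterable[AttachmentImport]) -> List[AttachmentImport]:
    """Anything without a file hasn't been imported yet."""
    return [r for r in records if r.file_id is None]


def import_pending(records: Iterable[AttachmentImport], fetch: Callable[[str], int]) -> int:
    """Import every pending attachment; safe to re-run after an interruption."""
    count = 0
    for record in pending(records):
        record.file_id = fetch(record.guid)
        count += 1
    return count
```

Because "done" is just "has a file", a second run skips everything already imported, which is what makes picking up where I left off free.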
Also, since I'm keeping the originally uploaded Wordpress export file around, along with references between Tanzawa data and the imported data, I'll have the option, as I add features and capabilities to Tanzawa, to go back and pull in meta-data from Wordpress that I skipped on the initial import.