Home Converting Medium Posts to Markdown
Post
Cancel

Converting Medium Posts to Markdown

Converting Medium Posts to Markdown

Writing a small tool to back up Medium articles & convert them to Markdown format

[ZhgChgLi](https://github.com/ZhgChgLi){:target="_blank"} [ZMediumToMarkdown](https://github.com/ZhgChgLi/ZMediumToMarkdown){:target="_blank"}

ZhgChgLi / ZMediumToMarkdown

[EN] ZMediumToMarkdown

I’ve written a project to let you download Medium post and convert it to markdown format easily.

Features

  • Support download post and convert to markdown format
  • Support download all posts and convert to markdown format from any user without login access.
  • Support download paid content
  • Support download all of post’s images to local and convert to local path
  • Support parse Twitter tweet content to blockquote
  • Support download paid content
  • Support command line interface
  • Convert Gist source code to markdown code block
  • Convert youtube link which embed in post to preview image
  • Adjust post’s last modification date from Medium to the local downloaded markdown file
  • Auto skip when post has been downloaded and last modification date from Medium doesn’t changed (convenient for auto-sync or auto-backup service, to save server’s bandwidth and execution time)
  • Support using Github Action as auto sync/backup service
  • Highly optimized markdown format for Medium
  • Native Markdown Style Render Engine (Feel free to contribute if you any optimize idea! MarkupStyleRender.rb )
  • jekyll & social share (og: tag) friendly
  • 100% Ruby @ RubyGem

[CH] ZMediumToMarkdown

A backup tool that can crawl the content of Medium article links or all articles of a Medium user, convert them into Markdown format, and download them along with the images in the articles.

[2022/07/18 Update]: Step-by-step guide to seamlessly migrate Medium to a self-hosted site

Features

  • No login required, no special permissions needed
  • Supports downloading and converting single articles or all articles of a user into Markdown
  • Supports downloading and backing up all images in the articles and converting them to corresponding image paths
  • Supports deep parsing of Gist embedded in articles and converting them into corresponding language Markdown Code Blocks
  • Supports parsing Twitter content and reposting it in the article
  • Supports parsing YouTube videos embedded in articles, converting them into video preview images and links displayed in Markdown
  • When downloading all articles of a user, it will scan for embedded related articles and replace the links with local ones if found
  • Specially optimized for Medium format styles
  • Automatically changes the last modified/created time of the downloaded articles to the same as the Medium article’s publication time
  • Automatically compares the last modification of the downloaded articles, and skips if it is not less than the last modification time of the Medium article (This mechanism can save server traffic/time, making it convenient for users to use this tool to create automatic Sync/Backup tools)
  • CLI operation, supports automation

This project and this article are for technical research only. Please do not use it for any commercial purposes or illegal purposes. The author is not responsible for any illegal activities conducted using this tool. This is a disclaimer.

Please ensure you have the rights to use and download the articles before backing them up.

Origin

In the third year of managing Medium, I have published over 65 articles; all articles were written directly on the Medium platform without any other backups. Honestly, I have always been afraid that issues with the Medium platform or other factors might cause the disappearance of my hard work over the years.

I had manually backed up before, which was very boring and time-consuming, so I have been looking for a tool that can automatically download and back up all articles, preferably converting them into Markdown format.

Backup Requirements

  • Markdown format
  • Automatically download all Medium posts of a user
  • Article images should also be downloadable and backed up
  • Ability to parse Gist into Markdown Code Block (I use gist a lot to embed source code in my Medium articles, so this feature is very important)

Backup Solutions

Medium Official

Although the official provides an export backup function, the export format can only be used for importing into Medium, not Markdown or common formats, and it does not handle embedded content like Github Gist.

The API provided by Medium is not well-maintained and only offers the Create Post function.

Reasonable, because Medium does not want users to easily transfer content to other platforms.

Chrome Extension

I found and tried several Chrome Extensions (most of which have been taken down), but the results were not good. First, you have to manually click into each article to back it up, and second, the parsed format had many errors and could not deeply parse Gist source code or back up all images in the articles.

medium-to-markdown command line

Some expert wrote it in JS, which can achieve basic download and conversion to Markdown functionality, but still lacks image backup and deep parsing of Gist Source Code.

ZMediumToMarkdown

After struggling to find a perfect solution, I decided to write a backup conversion tool myself; it took about three weeks of after-work time to complete using Ruby.

Technical Details

How to get the article list by entering the username?

  1. Obtain UserID: View the user’s homepage (https://medium.com/@#{username}) source code to find the Username corresponding to the UserID Note that because Medium reopened custom domains, you need to handle 30X redirects

  2. Sniffing network requests reveals that Medium uses GraphQL to get the homepage article list information

  3. Copy the Query & replace UserID in the request information
    1
    2
    
    HOST: https://medium.com/_/graphql
    METHOD: POST
    
  4. Get the Response

You can only get 10 items at a time, so you need to paginate.

  • Article list: can be obtained in result[0]->userResult->homepagePostsConnection->posts
  • homepagePostsFrom pagination information: can be obtained in result[0]->userResult->homepagePostsConnection->pagingInfo->next Include homepagePostsFrom in the request to paginate, nil means there are no more pages

How to parse article content?

Viewing the article source code reveals that Medium uses the Apollo Client service for setup; its HTML is actually rendered from JS; therefore, you can find the window.__APOLLO_STATE__ field in the

We need to do the same, parse this JSON, match the Type to the Markdown style, and assemble the Markdown format.

Technical Difficulties

A technical difficulty here is rendering paragraph text styles, where Medium provides the structure as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
"Paragraph": {
    "text": "code in text, and link in text, and ZhgChgLi, and bold, and I, only i",
    "markups": [
      {
        "type": "CODE",
        "start": 5,
        "end": 7
      },
      {
        "start": 18,
        "end": 22,
        "href": "http://zhgchg.li",
        "type": "LINK"
      },
      {
        "type": "STRONG",
        "start": 50,
        "end": 63
      },
      {
        "type": "EM",
        "start": 55,
        "end": 69
      }
    ]
}

This means that for the text code in text, and link in text, and ZhgChgLi, and bold, and I, only i:

1
2
3
4
- Characters 5 to 7 should be marked as code (wrapped in `Text` format)
- Characters 18 to 22 should be marked as a link (wrapped in [Text](URL) format)
- Characters 50 to 63 should be marked as bold (wrapped in *Text* format)
- Characters 55 to 69 should be marked as italic (wrapped in _Text_ format)

Characters 5 to 7 & 18 to 22 are easy to handle in this example because they do not overlap; but 50–63 & 55–69 will have overlapping issues, and Markdown cannot represent overlapping in the following way:

1
code `in` text, and [ink](http://zhgchg.li) in text, and ZhgChgLi, and **bold,_ and I, **only i_

The correct combination result is as follows:

1
code `in` text, and [ink](http://zhgchg.li) in text, and ZhgChgLi, and **bold,_ and I, _**_only i_

50–55 STRONG 55–63 STRONG, EM 63–69 EM

Additionally, please note:

  • The beginning and end of the packaging format string should be distinguishable. Strong just happens to have ** at both the beginning and end. If it is a Link, the beginning will be [ and the end will be ](URL).
  • When combining Markdown symbols with strings, be careful not to have spaces before or after, otherwise, it will fail.

See the full issue here.

This has been studied for a long time, and for now, we are using an existing package to solve it reverse_markdown.

Special thanks to former colleagues Nick , Chun-Hsiu Liu , and James for their collaborative research. I will write and convert it to native code when I have time.

Results

Original text -> Converted Markdown result

If you have any questions or comments, feel free to contact me.

===

本文中文版本

===

This article was first published in Traditional Chinese on Medium ➡️ View Here


This post is licensed under CC BY 4.0 by the author.

Practical Application Record of Design Patterns

Implementing iOS NSAttributedString HTML Render Yourself