It's Easy to Build: Custom docs project | Part 2 - HTML to MD

In this second part of the series It's Easy to Build: Custom docs project, we'll learn how to convert HTML code to Markdown text using Ruby.

Hello there, welcome to part 2 of my series on building a custom docs project. Since I'm kind of a DRY person, please see part 1 of this series for the introduction and problem statement of this project.

In this tutorial, we'll learn how to convert HTML code to Markdown text using the reverse_markdown gem. We'll also learn how to define a converter for a custom HTML tag (<bold></bold>) and parse it with this gem.

Note: This part is not something that we'll implement in our docs project. I wanted to write it to show you that you can do something like this in your future projects.

Getting Started

For this tutorial, you'll need Ruby installed on your machine. You can use this GoRails guide to set up Ruby.

  1. First, create a new folder and cd into it:

    $ mkdir htmltomd
    $ cd htmltomd
    
  2. Install the reverse_markdown gem:

    • First, you need to create a file named Gemfile
      $ touch Gemfile
      
    • Then add reverse_markdown gem to this file:
      # Gemfile
      source 'https://rubygems.org'
      git_source(:github) { |repo| "https://github.com/#{repo}.git" }
      
      # Your ruby version
      ## It's a good practice to mention it here!
      ruby '2.6.5'
      
      gem 'reverse_markdown', '~> 2.0'
      
  3. Install the dependencies:

    $ bundle install
    
  4. Now create four files:

    • config.rb: We'll write our Ruby logic in this file.
    • bold_converter.rb: Our example uses a custom <bold></bold> tag that reverse_markdown doesn't know about, so we'll define how to convert it here.
    • index.html: We'll write our HTML code in this file.
    • hello.md: The Markdown converted from the HTML will be written to this file.
      $ touch config.rb
      $ touch bold_converter.rb
      $ touch index.html
      $ touch hello.md
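
The file setup above can also be done from Ruby itself, which may help on systems where the touch command isn't available. This is only a minimal sketch: `scaffold_files` and `TUTORIAL_FILES` are hypothetical names made up for illustration, and the file list simply mirrors the files named in this tutorial.

```ruby
require 'fileutils'

# File names used throughout this tutorial.
TUTORIAL_FILES = %w[Gemfile config.rb bold_converter.rb index.html hello.md].freeze

# Hypothetical helper: creates each file in `dir` if it is missing
# (FileUtils.touch leaves existing content alone) and returns the paths.
def scaffold_files(dir)
  paths = TUTORIAL_FILES.map { |name| File.join(dir, name) }
  FileUtils.touch(paths)
  paths
end
```

Calling `scaffold_files(Dir.pwd)` from inside the htmltomd folder would create any of the files that don't exist yet.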
      

Codebase:

Let's start writing the code:

  1. First, open up your favorite editor and write some sample HTML in the index.html file:

    <h1>Hello world</h1>
    
    <p>Hey there</p>
    
    <hr />
    
    <blockquote>Let there be the end</blockquote>
    
    <table>
      <tr>
        <td>Hello 1</td>
        <td>Hello 2</td>
      </tr>
      <tr>
        <td>Body 1</td>
        <td>Body 2</td>
      </tr>
      <tr>
        <td>Body 3</td>
        <td>Body 4</td>
      </tr>
    </table>
    
    <bold>This is bold</bold>
    
  2. Now, open the config.rb file and write our logic:

    # First, we need to import `reverse_markdown` library
    require 'reverse_markdown'
    
    # Then import the `bold_converter.rb` - See point 3
    require './bold_converter'
    
    # Now let us open our `index.html` and `hello.md` files:
    ## `index.html` is opened in read-only mode ('r')
    ## `hello.md` is opened in 'w+' mode, which truncates the file
    ### This means, every time we run the script,
    ### it clears `hello.md` and writes the new output.
    index_html_file = File.open('./index.html', 'r')
    hello_md_text_file = File.open('./hello.md', 'w+')
    
    # Read the contents of `index.html` file
    html_code = index_html_file.read
    
    # Register our custom `<bold></bold>` tag - See point 3
    ReverseMarkdown::Converters.register :bold, ReverseMarkdown::Converters::Bold.new
    
    # Config for reverse_markdown
    ReverseMarkdown.config do |config|
      ## If there is an unknown tag, raise an error
      config.unknown_tags = :raise
      config.github_flavored = true
    end
    
    # Now, convert the HTML code to MD
    md_text = ReverseMarkdown.convert(html_code)
    
    # Now, inject the MD text in `hello.md` file
    hello_md_text_file.puts md_text
    
    # Now, close the files
    index_html_file.close
    hello_md_text_file.close
    
    • This was simple; everything is explained in the code comments above. To summarize:
    • We first require the reverse_markdown gem.
    • Then we require our bold_converter.rb file (see the next point).
    • Then we open our index.html and hello.md files.
    • We then register our new Bold converter, configure reverse_markdown, and convert our HTML code to MD.
    • We write the converted MD text to the hello.md file.
    • Finally, we close both files as a good practice.
    • If you run the script now, you'll see an error, because the Bold converter class that we register is not defined yet. Without a registered converter, the conversion would fail on the <bold></bold> tag as well, since config.unknown_tags = :raise tells reverse_markdown to raise an error for tags it doesn't know. For more details on the options, see the docs. We'll define the converter now!
  3. Now, open the bold_converter.rb file and write our custom tag logic:

    # bold_converter.rb
    
    module ReverseMarkdown
      module Converters
        class Bold < Base
          # This is the main convert logic
          def convert(node, state = {})
            content = treat_children(node, state.merge(already_bolded_out: true))
            if content.strip.empty? || state[:already_bolded_out]
              content
            else
              "**#{content}**"
            end
          end
        end
      end
    end
    
    • The treat_children method converts all child nodes. We pass already_bolded_out: true down to them, so a <bold></bold> tag nested inside another <bold></bold> tag knows it is already inside bold text and returns its content without adding a second pair of **.
    • If the content is empty or only whitespace, we return it as is.
    • Otherwise, we wrap the content in **.
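
To see how the already_bolded_out flag behaves with nested tags, here is a toy model that runs without the gem. It is only a sketch under assumptions: `Node` and `convert_bold` are hypothetical stand-ins for the parsed HTML node and for the converter's convert method, and the recursion imitates what treat_children does for the children.

```ruby
# Toy stand-in for an HTML node: a tag name plus children, where a child is
# either another Node or a plain String of text.
Node = Struct.new(:name, :children)

# Mirrors the shape of Bold#convert: children are converted first with the
# flag set, and the ** wrapper is added only when no ancestor set the flag.
def convert_bold(node, state = {})
  content = node.children.map do |child|
    child.is_a?(Node) ? convert_bold(child, state.merge(already_bolded_out: true)) : child
  end.join

  if content.strip.empty? || state[:already_bolded_out]
    content
  else
    "**#{content}**"
  end
end

nested = Node.new('bold', ['outer ', Node.new('bold', ['inner'])])
puts convert_bold(nested)                              # prints **outer inner**
puts convert_bold(Node.new('bold', ['   '])).inspect   # prints "   "
```

The inner node receives the flag from its parent and returns plain text, so only the outermost tag adds the ** markers.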
  4. Running the script:

    • Run the script by using this command:
      $ ruby config.rb
      
  5. Now, in the hello.md file, you'll see the HTML to MD converted text.
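
The open, read, write, and close steps in config.rb can also be written with File.read and File.write, which close the file for you. This is a minimal sketch rather than the code used above: `convert_file` is a hypothetical helper, and the conversion is passed in as a block so the snippet itself doesn't depend on the gem.

```ruby
# Hypothetical helper: reads `input_path`, passes the text through the
# given block, and writes the block's result to `output_path`.
def convert_file(input_path, output_path)
  html = File.read(input_path)       # opens, reads, and closes the file
  markdown = yield(html)             # the caller decides how to convert
  File.write(output_path, markdown)  # creates or truncates the file, then closes it
  markdown
end

# In config.rb this could be used roughly as:
#   convert_file('./index.html', './hello.md') { |html| ReverseMarkdown.convert(html) }
```

Because both calls manage the file handle internally, there is no handle left open if the conversion in the block raises an error.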

Summary and conclusion

Let's take a quick look at what we learned today:

  1. First of all, I introduced myself, so don't forget to connect with me on Twitter or elsewhere.

  2. Since I'm a DRY person, I referred to my part 1 article where you can see the introduction and problem statement.

  3. Then we saw what the prerequisites are, and we also set up and understood our file structure for this tutorial.

  4. We then wrote our Ruby logic to solve our problem, i.e., to convert HTML code to Markdown text and write it into a hello.md file.

  5. Finally, we took a quick look at this very summary...

That's it, and let there be the end. 🙏