It's Easy to Build: Custom docs project | Part 2 - HTML to MD
In this second part of the series - It's Easy to Build: Custom docs project, we'll learn how to convert HTML code to Markdown text using Ruby.
Hello there, welcome to part 2 of my series on building a custom docs project. Since I'm kind of a DRY person, I won't repeat myself: see part 1 of this series here for the introduction and problem statement of this project.
In this tutorial, we'll learn how to convert HTML code to Markdown text using the reverse_markdown gem. We'll also learn how to define a custom HTML tag (<bold></bold>) and parse it with this gem.
Note: This part is not something that we'll implement in our docs project. I wrote it to show that you can do something like this in your future projects.
Getting Started
For this tutorial, you'll need Ruby installed on your machine. You can use this GoRails guide to set up Ruby.
-
First, create a new folder and cd into it:
$ mkdir htmltomd
$ cd htmltomd
-
Install the reverse_markdown gem:
- First, you need to create a file named Gemfile:
$ touch Gemfile
- Then add the reverse_markdown gem to this file:
    # Gemfile
    source 'https://rubygems.org'
    git_source(:github) { |repo| "https://github.com/#{repo}.git" }

    # Your ruby version
    ## It's a good practice to mention it here!
    ruby '2.6.5'

    gem 'reverse_markdown', '~> 2.0'
-
Install the dependencies:
$ bundle i
-
Now create four files:
- config.rb: We'll write our Ruby code logic in this file.
- bold_converter.rb: Our example uses a custom tag, <bold></bold>. reverse_markdown doesn't know about this tag, so we'll add the logic for it here.
- index.html: We'll write our HTML code in this file.
- hello.md: The Markdown output of the converted HTML code will be written to this file.
$ touch config.rb
$ touch bold_converter.rb
$ touch index.html
$ touch hello.md
Codebase:
Let's start writing the code:
-
First, open up your favorite editor and write some sample HTML code in the index.html file:
    <h1>Hello world</h1>
    <p>Hey there</p>
    <hr />
    <blockquote>Let there be the end</blockquote>
    <table>
      <tr>
        <td>Hello 1</td>
        <td>Hello 2</td>
      </tr>
      <tr>
        <td>Body 1</td>
        <td>Body 2</td>
      </tr>
      <tr>
        <td>Body 3</td>
        <td>Body 4</td>
      </tr>
    </table>
    <bold>This is bold</bold>
-
Now, open the config.rb file and write our logic:
    # First, we need to import `reverse_markdown` library
    require 'reverse_markdown'

    # Then import the `bold_converter.rb` - See point 3
    require './bold_converter'

    # Now let us import our `index.html` and `hello.md` files:
    ## For `index.html`, we only give READ permission
    ## For `hello.md`, we'll give RE-WRITE permission
    ### This means, every time we parse code,
    ### It will clean `hello.md` file and rewrite code.
    index_html_file = File.open('./index.html', 'r')
    hello_md_text_file = File.open('./hello.md', 'w+')

    # Read the contents of `index.html` file
    html_code = index_html_file.read

    # Register our custom `<bold></bold>` tag - See point 3
    ReverseMarkdown::Converters.register :bold, ReverseMarkdown::Converters::Bold.new

    # Config for reverse_markdown
    ReverseMarkdown.config do |config|
      ## If there is an unknown tag, raise an error
      config.unknown_tags = :raise
      config.github_flavored = true
    end

    # Now, convert the HTML code to MD
    md_text = ReverseMarkdown.convert(html_code)

    # Now, inject the MD text in `hello.md` file
    hello_md_text_file.puts md_text

    # Now, close the files
    index_html_file.close
    hello_md_text_file.close
- This was simple; everything is explained in the code comments above. To summarize:
  - We first import the reverse_markdown gem.
  - Then we import our bold_converter.rb file (see the next point).
  - Then we open our index.html and hello.md files.
  - We then register our new Bold converter, configure reverse_markdown, and convert our HTML code to MD.
  - We write the converted MD text to the hello.md file.
  - Finally, we close both files as a good practice.
- If you run the script now, you'll see an error because the Bold converter that we register for the <bold></bold> tag is not defined yet. Note that we've also set config.unknown_tags = :raise, so reverse_markdown raises an error whenever it meets a tag that has no registered converter. For more details on the available options, see the docs. We'll define the converter now!
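To make the register call and the unknown_tags = :raise setting a bit more concrete, here is a toy sketch in plain Ruby. It is not how reverse_markdown is implemented; ToyRegistry and its methods are hypothetical names made up for this illustration. It only shows the idea: a tag either has a registered converter, or the lookup raises.

```ruby
# A toy registry, NOT the gem's internals: maps tag names to converter lambdas.
class ToyRegistry
  class UnknownTagError < StandardError; end

  def initialize(unknown_tags: :raise)
    @converters = {}
    @unknown_tags = unknown_tags
  end

  # Remember which converter handles which tag.
  def register(tag, converter)
    @converters[tag.to_sym] = converter
  end

  # Convert the text of a tag, or apply the unknown-tag policy.
  def convert(tag, text)
    converter = @converters[tag.to_sym]
    return converter.call(text) if converter
    raise UnknownTagError, "unknown tag: #{tag}" if @unknown_tags == :raise

    text # any other policy in this toy: fall back to the plain text
  end
end

registry = ToyRegistry.new(unknown_tags: :raise)

begin
  registry.convert(:bold, 'This is bold') # nothing registered yet
rescue ToyRegistry::UnknownTagError => e
  puts "Error: #{e.message}"
end

# After registering a converter, the same call succeeds.
registry.register(:bold, ->(text) { "**#{text}**" })
puts registry.convert(:bold, 'This is bold')
```

The strict policy is handy while developing, since a typo in a tag name fails loudly instead of silently leaking into the output.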
-
Now, open the bold_converter.rb file and write our custom tag logic:
    # bold_converter.rb
    module ReverseMarkdown
      module Converters
        class Bold < Base
          # This is the main convert logic
          def convert(node, state = {})
            content = treat_children(node, state.merge(already_bolded_out: true))
            if content.strip.empty? || state[:already_bolded_out]
              content
            else
              "**#{content}**"
            end
          end
        end
      end
    end
- The treat_children method converts all child nodes. We pass already_bolded_out: true in the state so that a <bold></bold> tag nested inside another one knows it is already within bold text; in that case we return the content unchanged instead of wrapping it in ** a second time.
- If the content is empty (or only whitespace), we also return it unchanged.
- Otherwise, we wrap the content in **.
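The nesting guard can be tried out without the gem. Below is a minimal sketch that uses stand-in nodes (a String for text, a Hash for a tag) instead of the parsed HTML nodes that the real converter receives. The render method and the node shape are assumptions made up for this illustration; only the if/else mirrors the convert method above.

```ruby
# Stand-in node: a String is plain text; a Hash is { tag: ..., children: [...] }.
# `render` is a hypothetical helper, not part of reverse_markdown.
def render(node, state = {})
  return node if node.is_a?(String)

  bold = node[:tag] == :bold
  # Children of a bold node are told that they are already inside bold text.
  child_state = bold ? state.merge(already_bolded_out: true) : state
  content = node[:children].map { |child| render(child, child_state) }.join
  return content unless bold

  # Same decision as in Bold#convert: skip empty content and nested bolds.
  if content.strip.empty? || state[:already_bolded_out]
    content
  else
    "**#{content}**"
  end
end

nested = { tag: :bold, children: ['outer ', { tag: :bold, children: ['inner'] }] }
empty  = { tag: :bold, children: ['   '] }

puts render(nested) # only the outer tag adds the ** markers
puts render(empty).inspect
```

Without the state flag, the inner tag would add its own pair of ** and the output would no longer be valid bold Markdown.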
-
Running the script:
- Run the script by using this command:
$ ruby config.rb
-
Now, in the hello.md file, you'll see the HTML converted to Markdown.
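As a side note on the open, read, write, and close steps in config.rb: Ruby's File.read and File.write open and close the file for you, so the same flow can be written without explicit close calls. This is a minimal sketch, not a replacement for the script above. convert_file is a hypothetical helper name, and the block below uses a simple string substitution as a stand-in for ReverseMarkdown.convert so that it runs without the gem.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of the tutorial's config.rb):
# reads input_path, hands the text to the given block, and writes
# whatever the block returns to output_path.
def convert_file(input_path, output_path)
  html = File.read(input_path)       # opens, reads, and closes the file
  markdown = yield(html)
  File.write(output_path, markdown)  # creates or truncates, writes, and closes
  markdown
end

# Demo in a temporary directory so no real files are touched.
Dir.mktmpdir do |dir|
  input  = File.join(dir, 'index.html')
  output = File.join(dir, 'hello.md')
  File.write(input, '<h1>Hello world</h1>')

  # Stand-in conversion; in the tutorial this block would be
  # { |html| ReverseMarkdown.convert(html) }
  convert_file(input, output) { |html| html.sub('<h1>', '# ').sub('</h1>', '') }

  puts File.read(output)
end
```

Passing the conversion as a block keeps the file handling separate from the converter, which makes it easy to swap in the real gem call.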
Summary and conclusion
Let's take a quick look at what we learned today:
-
First of all, I introduced myself, so don't forget to connect with me on Twitter or elsewhere.
-
Since I'm a DRY person, I referred to my part 1 article where you can see the introduction and problem statement.
-
Then we saw what the prerequisites are, and we also set up and understood our file structure for this tutorial.
-
We then wrote our Ruby code logic to solve our problem, i.e., to convert HTML code to Markdown text and write it into a hello.md file.
-
Finally, we took a quick look at this very summary...
That's it, and let there be the end. 🙏