Markdown is a lightweight markup language created by John Gruber and Aaron Swartz in 2004. It was designed to be easy to read and write in plain text form while being convertible to HTML for web publishing. Unlike HTML with its verbose angle-bracket tags, Markdown uses intuitive ASCII punctuation characters to indicate formatting.
The philosophy behind Markdown is that the plain-text source should be readable without being processed. For example, compare these equivalent representations:
HTML: <h1>Welcome</h1><p>This is <strong>bold</strong> text.</p>
Markdown: # Welcome
This is **bold** text.
Even without rendering, the Markdown version clearly shows the heading and emphasis. This readability makes Markdown ideal for documentation, README files, blog posts, technical writing, and any content where both source and output matter.
Markdown has become ubiquitous in software development. GitHub, Stack Overflow, Reddit, Discord, and countless other platforms use Markdown for user-generated content. Static site generators like Jekyll, Hugo, and Gatsby build entire websites from Markdown files. Note-taking apps like Obsidian and Bear are built around Markdown, and Notion supports Markdown shortcuts, import, and export. Learning Markdown is now an essential skill for developers and technical writers.
There are numerous practical reasons to convert HTML content to Markdown format. Understanding these use cases helps you choose the right conversion approach and tools.
Modern software projects use Markdown for README files, wikis, and documentation. If you have existing HTML documentation, converting it to Markdown makes it easier to maintain in version control systems like Git. Markdown files are plain text, so changes are clearly visible in diffs, merge conflicts are easier to resolve, and the documentation source is readable without special tools.
Many content creators migrate from traditional CMS platforms (WordPress, Medium, Ghost) to static site generators (Jekyll, Hugo, Eleventy, Gatsby). These static site generators typically use Markdown for content. Converting HTML blog posts to Markdown is a crucial step in the migration process. Once in Markdown, content can be version-controlled, edited in any text editor, and processed by various build tools.
Web content changes frequently -- sites go offline, articles get deleted, designs change. Converting important HTML articles to Markdown creates a lightweight, portable archive that is readable in any text editor. Markdown archives are much smaller than full HTML pages with embedded CSS and JavaScript, making them practical for long-term storage.
HTML emails and rich text editor output often include excessive formatting markup. Converting to Markdown strips away the clutter while preserving semantic structure. This is useful for including email content in documentation, extracting formatted quotes for blog posts, or cleaning up pasted content from web pages.
Markdown is platform-agnostic. The same Markdown file can be rendered as a web page, PDF, Word document, or presentation slide deck using tools like Pandoc. Converting HTML to Markdown makes content more flexible and reusable across different output formats and platforms.
Editing Markdown is faster and less error-prone than editing HTML. You do not need to remember to close tags, you can easily see the document structure, and most text editors provide Markdown syntax highlighting. For technical writers and developers, Markdown accelerates content creation and maintenance.
Understanding how HTML elements map to Markdown syntax is essential for effective conversion. Here is a comprehensive mapping of common HTML elements to their Markdown equivalents.
HTML:
<h1>Heading 1</h1>
<h2>Heading 2</h2>
<h3>Heading 3</h3>
Markdown:
# Heading 1
## Heading 2
### Heading 3
Markdown supports six heading levels (# through ######), corresponding to HTML's <h1> through <h6>. An alternative syntax uses underlines (=== for h1, --- for h2) but the hash syntax is more common.
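The heading mapping above is mechanical enough to sketch in a few lines. The following is a minimal illustration using only Python's standard library; the function name headings_to_atx is hypothetical, and the regular expression assumes well-formed, non-nested heading tags, which real-world HTML does not always provide.

```python
import re

# Matches <h1>...</h1> through <h6>...</h6>, ignoring attributes on the opening tag.
HEADING_RE = re.compile(r"<h([1-6])\b[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)


def headings_to_atx(html):
    """Replace each HTML heading with a hash-style (ATX) Markdown heading."""
    def repl(match):
        level = int(match.group(1))
        text = " ".join(match.group(2).split())  # collapse internal whitespace
        return f"{'#' * level} {text}\n"

    return HEADING_RE.sub(repl, html)


print(headings_to_atx("<h1>Heading 1</h1><h3 class='intro'>Heading 3</h3>"))
```

A production converter would walk a parsed DOM instead of using a regular expression, but the level-to-hash-count rule is the same.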
HTML:
<p>First paragraph.</p>
<p>Second paragraph.</p>
Markdown:
First paragraph.

Second paragraph.
In Markdown, paragraphs are separated by blank lines. A single line break inside a paragraph is rendered as a space, not a new line, unless you end the line with two or more spaces (a hard line break).
HTML:
<em>italic</em> or <i>italic</i>
<strong>bold</strong> or <b>bold</b>
Markdown:
*italic* or _italic_
**bold** or __bold__
***bold italic*** or ___bold italic___
Both asterisks (*) and underscores (_) work for emphasis. Asterisks are more common. You can nest emphasis: **bold with *italic* inside**.
HTML:
<a href="https://example.com">Link text</a>
Markdown:
[Link text](https://example.com)
Or with title attribute:
[Link text](https://example.com "Title")
Markdown also supports reference-style links for repeated URLs: [Link][ref] with [ref]: https://example.com at the bottom of the document.
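As a rough sketch of how inline links can be rewritten into reference style, the snippet below numbers each distinct URL once and appends the definitions at the end. The function name to_reference_links is hypothetical, and the pattern is deliberately simple: it assumes links without titles and URLs that contain no spaces or parentheses.

```python
import re

# [text](url) that is not preceded by "!" (which would make it an image).
INLINE_LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\((\S+?)\)")


def to_reference_links(markdown):
    """Rewrite [text](url) as [text][n] and append numbered link definitions."""
    refs = {}  # url -> reference number, in first-seen order

    def repl(match):
        text, url = match.group(1), match.group(2)
        number = refs.setdefault(url, len(refs) + 1)
        return f"[{text}][{number}]"

    body = INLINE_LINK_RE.sub(repl, markdown)
    if not refs:
        return body
    definitions = "\n".join(f"[{n}]: {url}" for url, n in refs.items())
    return f"{body.rstrip()}\n\n{definitions}\n"


print(to_reference_links(
    "See [docs](https://example.com) and [home](https://example.com)."
))
```

Repeated URLs share one definition, which is where reference style pays off in link-heavy documents.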
HTML:
<img src="image.jpg" alt="Description">
Markdown:
![Description](image.jpg)
With title:
![Description](image.jpg "Title")
Image syntax is identical to link syntax with a leading exclamation mark. Reference-style images are also supported.
HTML Unordered:
<ul>
<li>Item 1</li>
<li>Item 2</li>
</ul>
Markdown:
- Item 1
- Item 2
Or: * Item 1 / + Item 1
HTML Ordered:
<ol>
<li>First</li>
<li>Second</li>
</ol>
Markdown:
1. First
2. Second
For nested lists, indent by four spaces or one tab. Unordered lists can use -, *, or +.
HTML Inline:
<code>inline code</code>
Markdown:
`inline code`
HTML Block:
<pre><code>code block</code></pre>
Markdown:
```
code block
```
Or indent by 4 spaces:
    code block
Fenced code blocks (```) can include a language identifier for syntax highlighting: ```javascript.
HTML:
<blockquote>
<p>Quoted text</p>
</blockquote>
Markdown:
> Quoted text
Nested quotes:
> Level 1
>> Level 2
HTML:
<hr>
Markdown:
---
or ***
or ___
HTML:
<table>
<tr><th>Header</th><th>Header</th></tr>
<tr><td>Cell</td><td>Cell</td></tr>
</table>
Markdown (GFM):
| Header | Header |
|--------|--------|
| Cell | Cell |
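To make the table mapping concrete, here is a minimal sketch built on Python's standard html.parser module. The names TableParser and table_to_gfm are illustrative, and the code assumes a simple table: the first row is the header, there are no merged cells, and there are no nested tables.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            self.rows[-1].append("")
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data


def table_to_gfm(html):
    """Render a simple HTML table as a GFM pipe table (first row = header)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    # Normalize whitespace and escape pipes so cell text cannot break the row.
    rows = [
        [" ".join(cell.split()).replace("|", "\\|") for cell in row]
        for row in parser.rows if row
    ]
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]  # pad short rows

    def fmt(cells):
        return "| " + " | ".join(cells) + " |"

    lines = [fmt(rows[0]), fmt(["---"] * width)]
    lines += [fmt(row) for row in rows[1:]]
    return "\n".join(lines)


print(table_to_gfm(
    "<table><tr><th>Header</th><th>Header</th></tr>"
    "<tr><td>Cell</td><td>Cell</td></tr></table>"
))
```

The delimiter row needs only three dashes per column; wider dashes, as in the example above, are purely cosmetic.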
While the original Markdown specification defined core syntax, various extensions and flavors have emerged to support additional features. When converting HTML to Markdown, you should choose the flavor that matches your target platform.
CommonMark is a standardized specification created in 2014 to resolve ambiguities in the original Markdown. It defines precise parsing rules and includes a comprehensive test suite. CommonMark is the foundation for many modern Markdown implementations and provides maximum compatibility across platforms.
GitHub Flavored Markdown (GFM) extends CommonMark with features commonly needed in software development: tables, task lists, strikethrough, and automatic linking of URLs.
Use GFM when targeting GitHub repositories, README files, or issues/pull requests.
Markdown Extra adds features useful for technical writing and publishing, including tables, fenced code blocks, footnotes, definition lists, and abbreviations.
MultiMarkdown extends Markdown with metadata, cross-references, tables, math support, and glossaries. It is popular for academic writing and book authoring.
Pandoc Markdown is an extended syntax supported by the Pandoc document converter. It includes citations, math, footnotes, tables, definition lists, and many other features needed for academic and technical publishing.
Converting HTML to Markdown is not always straightforward. Understanding common challenges helps you anticipate issues and choose appropriate solutions.
HTML supports complex nested structures, CSS styling, and semantic tags that have no direct Markdown equivalent. Elements like <div>, <span>, custom classes, and inline styles typically get discarded during conversion. If the visual presentation depends on these elements, the Markdown output may look different from the original HTML.
Standard Markdown does not support tables -- you need GitHub Flavored Markdown or similar extensions. Complex HTML tables with merged cells, nested tables, or heavy styling cannot be accurately represented in Markdown. Converters typically flatten complex tables or fall back to HTML.
HTML supports embedded videos, audio, iframes, and interactive content. Markdown only natively supports images. Converters handle media in different ways: some preserve the HTML embed code, some convert to simplified Markdown image syntax pointing to the media URL, and some discard the media entirely.
HTML pages often include JavaScript-generated content, interactive widgets, and dynamic elements. Markdown is a static format -- it cannot represent dynamic behavior. Converters can only capture the HTML as it exists at the time of conversion, missing any dynamically loaded content.
HTML forms, buttons, input fields, and interactive controls have no Markdown equivalents. These elements are either discarded or preserved as raw HTML within the Markdown document (if the Markdown flavor supports HTML fallback).
Semantic HTML5 elements like <article>, <section>, <nav>, <aside>, and <figure> provide meaningful structure but have no Markdown syntax. During conversion, their semantic meaning is typically lost, though their content is preserved.
Characters that have special meaning in Markdown (*, _, #, [, ], etc.) must be escaped with backslashes if they appear as literal characters in the content. Converters need to detect these situations and add escapes to prevent rendering issues.
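The escaping step can be sketched as follows. This is an illustration, not a complete implementation: the function name escape_markdown is hypothetical, the set of characters treated as special is an assumption, and the rules err on the side of over-escaping (a leading hyphen is escaped even when it could not start a list).

```python
import re

# Characters with inline meaning; the exact set is a judgment call.
INLINE_SPECIALS = re.compile(r"([\\`*_\[\]])")
# Characters that open a heading, blockquote, or bullet list at line start.
BLOCK_MARKER = re.compile(r"^([ \t]*)([#>+-])", re.MULTILINE)
# "1." or "1)" at line start would otherwise open an ordered list.
ORDERED_MARKER = re.compile(r"^([ \t]*\d+)([.)])(?=\s)", re.MULTILINE)


def escape_markdown(text):
    """Backslash-escape characters that Markdown would treat as syntax."""
    text = INLINE_SPECIALS.sub(r"\\\1", text)
    text = BLOCK_MARKER.sub(r"\1\\\2", text)
    text = ORDERED_MARKER.sub(r"\1\\\2", text)
    return text


print(escape_markdown("2 * 3 = 6"))
print(escape_markdown("# not a heading"))
```

This should be applied only to text nodes. Running it over already-converted Markdown would escape the syntax the converter just produced.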
Numerous tools and libraries are available for converting HTML to Markdown. The best choice depends on your use case, programming language, and required features.
Pandoc is a powerful command-line tool that converts between dozens of document formats, including HTML to Markdown. It supports multiple Markdown flavors and offers extensive customization:
# Convert HTML to Pandoc's own Markdown dialect
pandoc -f html -t markdown input.html -o output.md
# Or target a specific flavor, such as GitHub Flavored Markdown
pandoc -f html -t gfm input.html -o output.md
Pandoc is written in Haskell and available for all major platforms. It is the gold standard for document conversion and handles complex documents better than most alternatives.
Turndown is a popular JavaScript library for converting HTML to Markdown in Node.js or browsers. It offers a clean API and extensive customization:
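const TurndownService = require('turndown');
// headingStyle defaults to 'setext' (underlined headings); 'atx' yields # headings
const turndownService = new TurndownService({ headingStyle: 'atx' });
const markdown = turndownService.turndown('<h1>Hello World</h1>');
console.log(markdown); // # Hello World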
Turndown supports plugins for GitHub Flavored Markdown tables and other extensions. It is actively maintained and well-documented.
html2text is a Python library and command-line tool for converting HTML to Markdown. It is simple to use and widely deployed:
import html2text
h = html2text.HTML2Text()
h.ignore_links = False  # keep hyperlinks in the output (this is the default)
markdown = h.handle('<h1>Hello</h1>')
The library supports configuration options for handling links, images, emphasis styles, and more.
markdownify is another Python library with a simpler API than html2text:
from markdownify import markdownify as md
markdown = md('<strong>Bold</strong>')
# Returns: **Bold**
It is lightweight and good for simple conversions but offers less configurability than html2text.
Web-based converters provide quick one-off conversions without installing software. Our HTML to Markdown Converter is one such tool, offering client-side conversion with full privacy. Other popular online tools include Browserling's HTML to Markdown and ConvertSimple.
Following these best practices ensures clean, maintainable Markdown output from your HTML conversions.
HTML extracted from web pages or rich text editors often contains excessive markup, inline styles, and tracking codes. Pre-process the HTML to remove unnecessary elements before conversion. Tools like Mozilla's Readability.js can extract the main content and strip away navigation, ads, and other clutter.
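When a full content extractor is not available, a much cruder pre-cleaning pass can still help. The sketch below is not Readability; it only drops a few element types wholesale and strips most attributes. The tag and attribute lists are assumptions to adjust for your content, and the names Cleaner and clean_html are illustrative.

```python
from html import escape
from html.parser import HTMLParser

# Assumed lists; tune them for the pages you are converting.
DROP_WITH_CONTENT = {"script", "style", "nav", "aside", "footer", "noscript"}
KEEP_ATTRS = {"href", "src", "alt", "title"}
VOID_TAGS = {"br", "hr", "img"}


class Cleaner(HTMLParser):
    """Re-emit HTML without dropped elements, comments, or styling attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif not self.skip_depth:
            kept = "".join(
                f' {name}="{escape(value or "")}"'
                for name, value in attrs if name in KEEP_ATTRS
            )
            self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_html(html_text):
    cleaner = Cleaner()
    cleaner.feed(html_text)
    cleaner.close()
    return "".join(cleaner.out)


print(clean_html(
    '<div style="color:red"><script>track()</script>'
    '<p onclick="x()">Hi <a href="/a" class="b">link</a></p></div>'
))
```

The cleaned HTML can then be passed to whichever converter you use, which has less noise to misinterpret.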
Match the Markdown flavor to your target platform. Use GitHub Flavored Markdown for GitHub repositories, CommonMark for maximum compatibility, Pandoc Markdown for academic writing, or standard Markdown for simple content. Using the wrong flavor may result in unsupported syntax that does not render correctly.
Focus on preserving the document's semantic structure (headings, lists, emphasis) rather than its visual appearance. Markdown is about content structure, not presentation. If specific styling is critical, consider keeping that section as HTML within the Markdown document (most flavors allow inline HTML).
Decide how to handle images before converting. Will you keep the original image URLs, download images locally, or convert to data URIs? For documentation, relative paths to local images are often best. For archival purposes, consider downloading images to ensure they remain available.
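If you choose local copies, the rewriting half of that decision might look like the sketch below. It only rewrites the Markdown and returns a mapping of remote URLs to local paths; it does not download anything. The function name localize_images and the assets directory name are assumptions, and two different URLs with the same file name would collide in this simple version.

```python
import posixpath
import re
from urllib.parse import urlparse

# ![alt](http(s)://url optional-title)
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\((https?://[^\s)]+)([^)]*)\)")


def localize_images(markdown, assets_dir="assets"):
    """Point remote image references at local files.

    Returns the rewritten Markdown and a {remote_url: local_path} mapping
    that a separate download step can work through.
    """
    downloads = {}

    def repl(match):
        alt, url, rest = match.groups()
        name = posixpath.basename(urlparse(url).path) or "image"
        local = f"{assets_dir}/{name}"
        downloads[url] = local
        return f"![{alt}]({local}{rest})"

    return IMAGE_RE.sub(repl, markdown), downloads


text, todo = localize_images('![Logo](https://example.com/img/logo.png "Title")')
print(text)
print(todo)
```

Keeping the download list separate from the rewrite makes it easy to retry failed downloads without touching the Markdown again.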
Always render the converted Markdown to verify it looks correct. Different Markdown processors may interpret syntax slightly differently. Test headings, lists, code blocks, and links to ensure nothing broke during conversion.
Automated conversion rarely produces perfect results. Budget time for manual cleanup: fix escaped characters that should not be escaped, improve list formatting, adjust heading levels, and reformat code blocks. Good converters get you 90% of the way there, but the final 10% requires human judgment.
Maintaining document structure during conversion is crucial for readability and maintainability.
Ensure heading levels are consistent. If your HTML uses <h2> for main sections, they should become ## in Markdown. Nested subsections should use ### and ####. Avoid skipping heading levels (do not jump from ## to ####).
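One possible way to close skipped heading levels after conversion is sketched below. It keeps the level of each top-most heading and places every deeper heading exactly one level below its nearest shallower ancestor. The function name normalize_heading_levels is hypothetical, only hash-style headings are handled, and lines inside fenced code blocks are left alone so that shell comments are not mistaken for headings.

```python
import re

ATX_RE = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # fenced code block delimiter


def normalize_heading_levels(markdown):
    """Rewrite heading levels so that no level is skipped."""
    out = []
    stack = []  # (original level, emitted level) for each open ancestor
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else ATX_RE.match(line)
        if match:
            level = len(match.group(1))
            # Close every ancestor that is not shallower than this heading.
            while stack and stack[-1][0] >= level:
                stack.pop()
            emitted = stack[-1][1] + 1 if stack else level
            stack.append((level, emitted))
            line = f"{'#' * emitted} {match.group(2)}"
        out.append(line)
    return "\n".join(out)


print(normalize_heading_levels("## Setup\n#### Install\n#### Configure\n## Usage"))
```

When the source mixes skipped and non-skipped levels under the same parent, the intended hierarchy is ambiguous, so the result is worth a quick manual review.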
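Nested HTML lists should convert to properly indented Markdown lists. Four spaces or one tab per nesting level is the safe choice: the original Markdown expects it, while CommonMark only requires the nested marker to be indented at least as far as the parent item's text. Verify that bullets and numbers align correctly after conversion.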
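The nesting rule can be illustrated with a small parser-based sketch that indents by four spaces per level. The names ListConverter and html_list_to_markdown are illustrative, and the code makes simplifying assumptions: inline markup inside items is reduced to plain text, and any text that follows a nested list inside the same item is dropped.

```python
from html.parser import HTMLParser


class ListConverter(HTMLParser):
    """Convert nested <ul>/<ol> markup to indented Markdown list lines."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one [tag, item counter] entry per open list
        self.lines = []   # [prefix, raw text] pairs, one per list item
        self.in_item = False

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.stack.append([tag, 0])
            self.in_item = False
        elif tag == "li" and self.stack:
            current = self.stack[-1]
            current[1] += 1
            marker = "-" if current[0] == "ul" else f"{current[1]}."
            indent = "    " * (len(self.stack) - 1)  # four spaces per level
            self.lines.append([f"{indent}{marker} ", ""])
            self.in_item = True

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.stack:
            self.stack.pop()
            self.in_item = False
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.lines[-1][1] += data


def html_list_to_markdown(html):
    parser = ListConverter()
    parser.feed(html)
    parser.close()
    return "\n".join(
        prefix + " ".join(text.split()) for prefix, text in parser.lines
    )


print(html_list_to_markdown(
    "<ul><li>Fruit<ul><li>Apple</li><li>Pear</li></ul></li><li>Veg</li></ul>"
))
```

Ordered lists restart their numbering inside each nested list because every open list keeps its own counter on the stack.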
Convert <pre><code> blocks to fenced code blocks (```) with language identifiers when possible. This enables syntax highlighting in most Markdown renderers. Extract the language from class names like class="language-javascript" if present.
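A minimal sketch of that language extraction is shown below. The function name code_blocks_to_fences is hypothetical, and the code assumes the class follows the language-xxx or lang-xxx convention and that the code element contains plain text, not syntax-highlighting spans.

```python
import html
import re

CODE_BLOCK_RE = re.compile(
    r"<pre[^>]*>\s*<code([^>]*)>(.*?)</code>\s*</pre>",
    re.IGNORECASE | re.DOTALL,
)
LANGUAGE_RE = re.compile(r"\b(?:language|lang)-([\w+#-]+)")
BACKTICK = "`"


def code_blocks_to_fences(source):
    """Replace <pre><code> blocks with fenced code blocks."""
    def repl(match):
        attrs, body = match.group(1), match.group(2)
        lang = LANGUAGE_RE.search(attrs)
        code = html.unescape(body).strip("\n")  # &lt; becomes < and so on
        fence = BACKTICK * 3
        while fence in code:  # lengthen the fence if the code contains one
            fence += BACKTICK
        return f"\n{fence}{lang.group(1) if lang else ''}\n{code}\n{fence}\n"

    return CODE_BLOCK_RE.sub(repl, source)


print(code_blocks_to_fences(
    '<pre><code class="language-javascript">const x = 1 &lt; 2;</code></pre>'
))
```

Lengthening the fence when the code itself contains three backticks keeps the block from closing early.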
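Nested blockquotes (<blockquote> within <blockquote>) should convert to nested Markdown quotes (> and >>). Blockquotes containing multiple paragraphs need a > on the blank line between paragraphs, and conventionally on every line, so that the paragraphs stay inside a single quote.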
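The prefixing rule for multi-paragraph quotes is small enough to show directly. In this sketch, the function name to_blockquote and its depth parameter are illustrative.

```python
def to_blockquote(text, depth=1):
    """Prefix every line, including blank ones, so paragraphs stay in one quote."""
    prefix = ">" * depth
    lines = text.strip("\n").splitlines()
    return "\n".join(
        f"{prefix} {line}" if line.strip() else prefix for line in lines
    )


print(to_blockquote("First paragraph.\n\nSecond paragraph."))
print(to_blockquote("Level 2", depth=2))
```

Without the bare > on the blank line, most renderers would end the quote after the first paragraph and start a second, separate quote.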
Understanding practical use cases helps you apply conversion techniques effectively.
When migrating documentation from HTML-based systems to Markdown-based documentation generators (Sphinx, MkDocs, Docusaurus), convert HTML files to Markdown while preserving internal links, code examples, and navigation structure. Use reference-style links for repeated URLs to keep Markdown clean.
Export blog posts from CMS platforms as HTML, then convert to Markdown for static site generators. Include frontmatter (YAML metadata) for title, date, tags, and categories. Convert image references to local file paths and download images to the static site's assets directory.
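As a sketch of the frontmatter step, the snippet below prepends a YAML block to converted Markdown without a YAML library. The function name with_front_matter and the field names are assumptions, since each static site generator documents its own expected keys. Strings are written as JSON-style double-quoted values, which YAML accepts, so titles containing colons or quotes stay valid.

```python
import json
from datetime import date


def with_front_matter(body, title, published, tags):
    """Prepend a YAML frontmatter block to a Markdown document."""
    lines = [
        "---",
        f"title: {json.dumps(title, ensure_ascii=False)}",
        f"date: {published.isoformat()}",
    ]
    if tags:
        lines.append("tags:")
        lines.extend(f"  - {json.dumps(tag, ensure_ascii=False)}" for tag in tags)
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body.lstrip("\n")


print(with_front_matter("# Hello\n", "My: Post", date(2024, 1, 15), ["markdown"]))
```

For anything beyond flat strings and lists, a proper YAML serializer is the safer choice.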
Archive important web articles by converting them to Markdown. Use Readability or similar tools to extract the main content, then convert to Markdown. Include metadata (original URL, author, date) in frontmatter. Download referenced images locally to prevent link rot.
Convert HTML emails to Markdown for inclusion in documentation or issue tracking. Strip signatures, disclaimers, and formatting noise. Preserve quoted replies as blockquotes. Extract attachment information as links.
Our free HTML to Markdown Converter provides instant, client-side conversion of HTML to clean Markdown. All processing happens in your browser -- your HTML content never leaves your device.
Paste your HTML content into the input area. The tool automatically converts it to Markdown using your selected flavor and options. Preview the rendered output to verify formatting. Copy the Markdown to your clipboard or download it as a file. Adjust conversion options if needed to fine-tune the output.
Stop manually converting HTML markup. Use our free converter to transform HTML to clean, readable Markdown in seconds -- with support for multiple flavors and full privacy.