# gorltom

Golang URL-to-Markdown API.

gorltom is a simple-to-use API that takes a full URL as a string on this endpoint:

https://gorltom.corbia.net/api/url

It opens the page with chromedp (in case it has to wait for JavaScript-generated content) and takes this HTML atrocity:

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta property="og:url" content="https://literally_the_current_url_thank_you.com">
    <meta property="og:something-else" content="but not used properly">
    <meta property="og:something-useful" content="if only dev followed some standards">
    <link type="text/css" rel="stylesheet" href="https://cdn.why.com/too-much.min.css">
    <link type="text/css" rel="stylesheet" href="https://cdn.bootstrap.com/flexbox-are-hard.min.css">
    <title>Title of the example webpage</title>
  </head>
  <body>
    <div class="basically-the-body-tag">
      <noscript>This website works better with JavaScript.</noscript>
      <div class="bloat that is only useful for browsers">
        <div class="some-ugly-class">
          <nav id="top-menu">
            <ul class="nostyle ul bs">
              <li class="random bs 342345234fffDDD">
                <span class="menu item obviously">
                  <a href="/about" target="_blank">ABOUT</a>
                </span>
              </li>
              <li class="random bs 342345234fffDDD">
                <span class="menu item obviously">
                  <a href="/blog" target="_blank">BLOG</a>
                </span>
              </li>
            </ul>
          </nav>
        </div>
        <aside>
          ...
        </aside>
        <div>
          <section class="main">
            <article>
              <header class="article-header-top-max">
                <h3>Title of the article</h3>
              </header>
              <p>Text of the first paragraph of the article.</p><br>
              <p>Text of the second paragraph of the article.</p><br>
              <p>Text of the third paragraph of the article but this time it contains a <a href="https://link-to-another-website.com/example">link</a> inside of the text.</p><br>
            </article>
          </section>
        </div>
      </div>
    </div>
    <script src="https://cdn.spyware.com/lib.min.js"></script>
    <script src="https://cdn.spyware.com/other-lib.min.js"></script>
    <script src="https://cdn.google.com/something-probably-evil.min.js"></script>
  </body>
</html>
```

It then returns this beautiful Markdown as a string (for this example, the page is assumed to live at `https://notexample.com/`):

```md
# Title of the example webpage
###### (*gorltom extract of https://notexample.com/*)

###### *assumed_menu*
- [ABOUT](https://notexample.com/about)
- [BLOG](https://notexample.com/blog)

###### *article*

### Title of the article

Text of the first paragraph of the article.

Text of the second paragraph of the article.

Text of the third paragraph of the article but this time it contains a [link](https://link-to-another-website.com/example) inside of the text.
```
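
In the output above, the relative menu links (`/about`, `/blog`) come out as absolute URLs, and the already absolute article link is left unchanged. One way to do this in Go is `net/url`'s `ResolveReference`. The sketch below assumes the page URL is used as the base. The `absolutize` helper is hypothetical and is not taken from gorltom's source.

```go
package main

import (
	"fmt"
	"net/url"
)

// absolutize resolves href against the URL of the page it was found on.
// Relative links such as "/about" become absolute; absolute links are
// returned unchanged. Hypothetical helper, not gorltom's actual code.
func absolutize(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	hrefs := []string{"/about", "/blog", "https://link-to-another-website.com/example"}
	for _, href := range hrefs {
		abs, err := absolutize("https://notexample.com/", href)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", href, abs)
	}
}
```

`ResolveReference` follows the RFC 3986 resolution rules, so relative paths such as `other` are resolved against the page's directory and are not simply glued onto the host.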

The API expects the following JSON body:

```json
{
  "url": "https://full-url-of.com/the/page"
}
```

It returns the following JSON:

```json
{
  "md": "# Home of full-url-of\n###### (*gorltom extract of https://full-url-of.com/the/page*)\n\n## Some header\n\n#### A tagline maybe\n\n###### *assumed_menu*\n- [HTML for newbies](https://full-url-of.com/html)\n- [CSS for artists](https://full-url-of.com/css)"
}
```

gorltom is opinionated.

Every `<nav>` is treated as an "assumed_menu". If the HTML contains `<main>` or `<article>` tags, they are marked in the Markdown output (e.g. `###### *article*`).

Every table is turned into CSV. For example, this table:

```html
<table>
  <thead>
    <tr>
      <td>First Name</td>
      <td>Age</td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Alice</td>
      <td>32</td>
    </tr>
    <tr>
      <td>Bob</td>
      <td>34</td>
    </tr>
  </tbody>
</table>
```

becomes:

```csv
First Name,Age
Alice,32
Bob,34
```
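
Suppose the cell texts of a table have already been collected row by row, header first. Emitting the CSV can then be left to Go's `encoding/csv`, which also handles quoting. This is a sketch with a hypothetical `rowsToCSV` helper, not gorltom's actual implementation, and it leaves out the HTML-parsing step.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// rowsToCSV serializes table rows (header row first) as CSV.
// Cell text is trimmed, so " 32 " becomes "32"; encoding/csv quotes
// cells that contain commas, double quotes, or line breaks.
func rowsToCSV(rows [][]string) (string, error) {
	var b strings.Builder
	w := csv.NewWriter(&b)
	for _, row := range rows {
		cells := make([]string, len(row))
		for i, c := range row {
			cells[i] = strings.TrimSpace(c)
		}
		if err := w.Write(cells); err != nil {
			return "", err
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	rows := [][]string{
		{"First Name", "Age"},
		{"Alice", "32"},
		{"Bob", "34"},
	}
	out, err := rowsToCSV(rows)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Using `encoding/csv` instead of joining cells with commas means a cell such as `Smith, Jane` gets quoted, so it does not silently split into two columns.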

The HTML is parsed from top to bottom, node after node.