md rules added to README
parent 35b4f016bf
commit 41211d0774

89 README.md

@ -2,20 +2,27 @@

golang URL-to-markdown API

gorltom is a simple-to-use API that takes in a full URL as a string on this endpoint:

https://gorltom.corbia.net/api/url

And returns a markdown file with the following format:
It will then open the page with chromedp (just in case we need to wait for some JS-generated content...) and will then take this HTML atrocity:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta property="og:url" content="https://literally_the_current_url_thank_you.com">
<meta property="og:something-else" content="but not used properly">
<meta property="og:something-useful" content="if only dev followed some standards">
<link type="text/css" rel="stylesheet" href="https://cdn.why.com/too-much.min.css">
<link type="text/css" rel="stylesheet" href="https://cdn.bootstrap.com/flexbox-are-hard.min.css">
<title>Title of the example webpage</title>
</head>
<body>
<div class="basically-the-body-tag">
<noscript>This website works better with JavaScript.</noscript>
<div class="bloat that is only usefull for browsers">
<div class="some-ugly-class">
<nav id="top-menu">

@ -49,30 +56,17 @@ And returns a markdown file with the following format:
</section>
</div>
</div>
</div>
<script src="https://cdn.spyware.com/lib.min.js"></script>
<script src="https://cdn.spyware.com/other-lib.min.js"></script>
<script src="https://cdn.google.com/something-probably-evil.min.js"></script>
</body>
</html>
```

And return this beautiful markdown as a string:

```md
# Title of the example webpage
# (*gorltom extract of https://notexample.com/*)

# *assumed_menu*
- [ABOUT](https://notexample.com/about)
- [BLOG](https://notexample.com/blog)

# *article*

### Title of the article

Text of the first paragraph of the article.

Text of the second paragraph of the article.

Text of the third paragraph of the article but this time it contains a [link](https://link-to-another-website.com/example) inside of the text.

```

# Title of the example webpage
###### (*gorltom extract of https://notexample.com/*)

@ -88,4 +82,55 @@ Text of the first paragraph of the article.

Text of the second paragraph of the article.

Text of the third paragraph of the article but this time it contains a [link](https://link-to-another-website.com/example) inside of the text.

```

The API expects the following JSON:

```json
{
  "url": "https://full-url-of.com/the/page"
}
```

And it will return the following:

```json
{
  "md": "# Home of full-url-of\n###### (*gorltom extract of https://full-url-of.com/the/page*)\n\n## Some header\n\n#### A tagline maybe\n\n###### *assumed_menu*\n- [HTML for newbies](https://full-url-of.com/html)\n- [CSS for artists](https://full-url-of.com/css)"
}
```

gorltom is opinionated.

Every `<nav>` is treated as an "assumed_menu". If the HTML contains `<main>` or `<article>` tags, they will be indicated in the markdown version.

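The `<nav>` rule can be sketched as a small formatter that reproduces the `###### *assumed_menu*` heading and `- [text](href)` list items from the response example. The `link` type and `renderAssumedMenu` are hypothetical, extracting the links from the HTML is left out, and gorltom may format menus differently internally.

```go
package main

import (
	"fmt"
	"strings"
)

// link is a hypothetical representation of one <a> found inside a <nav>.
type link struct {
	Text string
	Href string
}

// renderAssumedMenu formats the links of a single <nav> as a markdown
// list under an "assumed_menu" heading, one "- [text](href)" per link.
func renderAssumedMenu(links []link) string {
	lines := []string{"###### *assumed_menu*"}
	for _, l := range links {
		lines = append(lines, fmt.Sprintf("- [%s](%s)", l.Text, l.Href))
	}
	return strings.Join(lines, "\n")
}

func main() {
	menu := renderAssumedMenu([]link{
		{Text: "HTML for newbies", Href: "https://full-url-of.com/html"},
		{Text: "CSS for artists", Href: "https://full-url-of.com/css"},
	})
	fmt.Println(menu)
}
```

With these two links the output matches the `assumed_menu` part of the `md` string in the response example.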
Every table will be turned into CSV:

```html
<table>
  <thead>
    <tr>
      <td>First Name</td>
      <td>Age</td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Alice</td>
      <td>32</td>
    </tr>
    <tr>
      <td>Bob</td>
      <td>34</td>
    </tr>
  </tbody>
</table>
```

```csv
First Name,Age
Alice,32
Bob,34
```

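A stdlib-only Go sketch of the table-to-CSV rule. It is an assumption, not gorltom's implementation: it uses `encoding/xml` as a stand-in tokenizer, which only copes with well-formed snippets like the table above (a real page would need a proper HTML parser), and `tableToCSV` is a hypothetical name. Cells are trimmed and written with `encoding/csv`; a row is emitted at each `</tr>`, and also at `</thead>` so header cells placed directly inside `<thead>` are not lost.

```go
package main

import (
	"encoding/csv"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// tableToCSV converts a well-formed <table> snippet into CSV text.
func tableToCSV(table string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(table))
	dec.Strict = false // be a little more forgiving with HTML-ish input

	var rows [][]string
	var row []string
	var cell strings.Builder
	inCell := false

	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == "td" || t.Name.Local == "th" {
				inCell = true
				cell.Reset()
			}
		case xml.CharData:
			if inCell {
				cell.Write(t) // collect text, including text nested in <a>, <b>, ...
			}
		case xml.EndElement:
			switch t.Name.Local {
			case "td", "th":
				row = append(row, strings.TrimSpace(cell.String()))
				inCell = false
			case "tr", "thead":
				// Flush the pending row; </thead> covers cells without a <tr>.
				if len(row) > 0 {
					rows = append(rows, row)
					row = nil
				}
			}
		}
	}

	var out strings.Builder
	w := csv.NewWriter(&out)
	if err := w.WriteAll(rows); err != nil { // WriteAll also flushes
		return "", err
	}
	return out.String(), nil
}

func main() {
	html := `<table>
  <thead><tr><td>First Name</td><td>Age</td></tr></thead>
  <tbody>
    <tr><td>Alice</td><td>32</td></tr>
    <tr><td>Bob</td><td>34</td></tr>
  </tbody>
</table>`
	out, err := tableToCSV(html)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Running it prints the three CSV lines shown above: `First Name,Age`, `Alice,32` and `Bob,34`.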
The HTML is parsed from top to bottom, node after node.

18 index.html

@ -6,6 +6,22 @@
<title>Document</title>
</head>
<body>

<table>
<thead>
<td>First Name</td>
<td>Age</td>
</thead>
<tbody>
<tr>
<td>Alice</td>
<td>32</td>
</tr>
<tr>
<td>Bob</td>
<td>34</td>
</tr>
</tbody>
</table>
</body>
</html>