md rules added to README
parent
35b4f016bf
commit
41211d0774
89
README.md
|
@ -2,20 +2,27 @@
|
||||||
|
|
||||||
golang url to mark-down API
|
golang url to mark-down API
|
||||||
|
|
||||||
gorltom is a simple to use API that takes in a full url
|
gorltom is a simple-to-use API that takes a full URL as a string on this endpoint:
|
||||||
|
|
||||||
https://gorltom.corbia.net/api/url
|
https://gorltom.corbia.net/api/url
|
||||||
|
|
||||||
And returns a markdown file with the following format:
|
It will then open the page with chromedp (in case we need to wait for JS-generated content) and take this HTML atrocity:
|
||||||
|
|
||||||
```html
|
```html
|
||||||
<!DOCTYPE html>
|
<!DOCTYPE html>
|
||||||
<html lang="en">
|
<html lang="en">
|
||||||
<head>
|
<head>
|
||||||
<meta charset="UTF-8">
|
<meta charset="UTF-8">
|
||||||
|
<meta property="og:url" content="https://literally_the_current_url_thank_you.com">
|
||||||
|
<meta property="og:something-else" content="but not used properly">
|
||||||
|
<meta property="og:something-useful" content="if only dev followed some standards">
|
||||||
|
<link type="text/css" rel="stylesheet" href="https://cdn.why.com/too-much.min.css">
|
||||||
|
<link type="text/css" rel="stylesheet" href="https://cdn.bootstrap.com/flexbox-are-hard.min.css">
|
||||||
<title>Title of the example webpage</title>
|
<title>Title of the example webpage</title>
|
||||||
</head>
|
</head>
|
||||||
<body>
|
<body>
|
||||||
|
<div class="basically-the-body-tag">
|
||||||
|
<noscript>This website works better with JavaScript.</noscript>
|
||||||
<div class="bloat that is only usefull for browsers">
|
<div class="bloat that is only usefull for browsers">
|
||||||
<div class="some-ugly-class">
|
<div class="some-ugly-class">
|
||||||
<nav id="top-menu">
|
<nav id="top-menu">
|
||||||
|
@ -49,30 +56,17 @@ And returns a markdown file with the following format:
|
||||||
</section>
|
</section>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
</div>
|
||||||
|
<script src="https://cdn.spyware.com/lib.min.js"></script>
|
||||||
|
<script src="https://cdn.spyware.com/other-lib.min.js"></script>
|
||||||
|
<script src="https://cdn.google.com/something-probably-evil.min.js"></script>
|
||||||
</body>
|
</body>
|
||||||
</html>
|
</html>
|
||||||
```
|
```
|
||||||
|
|
||||||
|
And return this beautiful markdown as a string:
|
||||||
|
|
||||||
```md
|
```md
|
||||||
# Title of the example webpage
|
|
||||||
# (*gorltom extract of https://notexample.com/*)
|
|
||||||
|
|
||||||
# *assumed_menu*
|
|
||||||
- [ABOUT](https://notexample.com/about)
|
|
||||||
- [BLOG](https://notexample.com/blog)
|
|
||||||
|
|
||||||
# *article*
|
|
||||||
|
|
||||||
### Title of the article
|
|
||||||
|
|
||||||
Text of the first paragraph of the article.
|
|
||||||
|
|
||||||
Text of the second paragraph of the article
|
|
||||||
|
|
||||||
Text of the third paragraph of the article but this time it contains a [link](https://link-to-another-website.com/example) inside of the text.
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
# Title of the example webpage
|
# Title of the example webpage
|
||||||
###### (*gorltom extract of https://notexample.com/*)
|
###### (*gorltom extract of https://notexample.com/*)
|
||||||
|
|
||||||
|
@ -88,4 +82,55 @@ Text of the first paragraph of the article.
|
||||||
|
|
||||||
Text of the second paragraph of the article
|
Text of the second paragraph of the article
|
||||||
|
|
||||||
Text of the third paragraph of the article but this time it contains a [link](https://link-to-another-website.com/example) inside of the text.
|
Text of the third paragraph of the article but this time it contains a [link](https://link-to-another-website.com/example) inside of the text.
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
The API expects the following JSON:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"url": "https://full-url-of.com/the/page"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
And will return the following:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"md" : "# Home of full-url-of\n###### (*gorltom extract of https://full-url-of.com/the/page*)\n\n## Some header\n\n#### A tagline maybe\n\n###### *assumed_menu*\n- [HTML for newbies](https://full-url-of.com/html)\n- [CSS for artists](https://full-url-of.com/css)"
|
||||||
|
}
|
||||||
|
```
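
The request and response shapes above could be handled from Go roughly as follows. This is a minimal sketch, not gorltom's own client: the names `extractRequest`, `extractResponse`, `encodeRequest`, `decodeResponse` and `fetchMarkdown` are made up for illustration, and the use of `POST` with an `application/json` content type is an assumption, since the README does not state the HTTP method.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// extractRequest mirrors the JSON body the API expects (hypothetical name).
type extractRequest struct {
	URL string `json:"url"`
}

// extractResponse mirrors the JSON reply (hypothetical name).
type extractResponse struct {
	MD string `json:"md"`
}

// encodeRequest builds the JSON body for a given page URL.
func encodeRequest(pageURL string) ([]byte, error) {
	return json.Marshal(extractRequest{URL: pageURL})
}

// decodeResponse pulls the markdown string out of a reply body.
func decodeResponse(body []byte) (string, error) {
	var out extractResponse
	if err := json.Unmarshal(body, &out); err != nil {
		return "", fmt.Errorf("decode gorltom reply: %w", err)
	}
	return out.MD, nil
}

// fetchMarkdown sends the request and returns the markdown string.
// Assumption: the endpoint accepts POST with a JSON content type.
func fetchMarkdown(client *http.Client, endpoint, pageURL string) (string, error) {
	payload, err := encodeRequest(pageURL)
	if err != nil {
		return "", err
	}
	resp, err := client.Post(endpoint, "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return decodeResponse(body)
}
```

A caller would pass `http.DefaultClient`, the endpoint `https://gorltom.corbia.net/api/url` and the page URL to `fetchMarkdown`, then write the returned string to a `.md` file or render it directly.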
|
||||||
|
|
||||||
|
gorltom is opinionated.
|
||||||
|
|
||||||
|
Every nav is treated as an "assumed_menu". If the HTML contains `<main>` or `<article>` tags, they will be indicated in the markdown version.
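
To illustrate the nav rule, here is a small sketch of how links collected from a `<nav>` might be rendered under an `assumed_menu` heading. It is not gorltom's actual code: `navLink` and `renderAssumedMenu` are hypothetical names, the `######` heading level is copied from the response example in this README, and resolving relative `href` values against the page URL is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// navLink is a hypothetical text/href pair taken from one <a> inside a <nav>.
type navLink struct {
	Text string
	Href string
}

// renderAssumedMenu writes the links as a markdown list under an
// "assumed_menu" heading. Relative hrefs are resolved against pageURL
// (an assumption, not something the README states).
func renderAssumedMenu(pageURL string, links []navLink) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", fmt.Errorf("parse page url: %w", err)
	}
	var b strings.Builder
	b.WriteString("###### *assumed_menu*\n")
	for _, l := range links {
		ref, err := url.Parse(l.Href)
		if err != nil {
			continue // skip hrefs that cannot be parsed
		}
		fmt.Fprintf(&b, "- [%s](%s)\n", strings.TrimSpace(l.Text), base.ResolveReference(ref))
	}
	return b.String(), nil
}
```

With the page URL `https://notexample.com/` and the links `ABOUT` (`/about`) and `BLOG` (`/blog`), this yields the heading followed by two list items pointing at the absolute URLs.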
|
||||||
|
|
||||||
|
Every table will be turned into CSV:
|
||||||
|
|
||||||
|
```html
|
||||||
|
<table>
|
||||||
|
<thead>
|
||||||
|
<td>First Name</td>
|
||||||
|
<td>Age</td>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td>Alice</td>
|
||||||
|
<td>32</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>Bob</td>
|
||||||
|
<td>34</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
```
|
||||||
|
|
||||||
|
```csv
|
||||||
|
First Name,Age
|
||||||
|
Alice,32
|
||||||
|
Bob,34
|
||||||
|
```
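
The table rule can be sketched with the standard library alone. This is an illustration under assumptions, not gorltom's implementation: `tableToCSV` is a hypothetical name, and the markup is read with `encoding/xml`, which only works for well-formed fragments like the one above (a real HTML parser would be needed for arbitrary pages). Tokens are visited top to bottom; each `<td>` or `<th>` adds a cell, and each closing `</tr>` or `</thead>` emits a row.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/xml"
	"io"
	"strings"
)

// tableToCSV converts a well-formed <table> fragment to CSV text.
func tableToCSV(table string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(table))
	var (
		out    bytes.Buffer
		row    []string
		cell   strings.Builder
		inCell bool
	)
	w := csv.NewWriter(&out)
	// flushRow writes the pending row, if any, and starts a new one.
	flushRow := func() error {
		if len(row) == 0 {
			return nil
		}
		err := w.Write(row)
		row = nil
		return err
	}
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == "td" || t.Name.Local == "th" {
				inCell = true
				cell.Reset()
			}
		case xml.CharData:
			if inCell {
				cell.Write(t)
			}
		case xml.EndElement:
			switch t.Name.Local {
			case "td", "th":
				row = append(row, strings.TrimSpace(cell.String()))
				inCell = false
			case "tr", "thead":
				if err := flushRow(); err != nil {
					return "", err
				}
			}
		}
	}
	w.Flush()
	return out.String(), w.Error()
}
```

Using `encoding/csv` for the output means cells containing commas or quotes are escaped correctly instead of breaking the row.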
|
||||||
|
|
||||||
|
The HTML is parsed from top to bottom, node after node.
|
||||||
|
|
||||||
|
|
18
index.html
|
@ -6,6 +6,22 @@
|
||||||
<title>Document</title>
|
<title>Document</title>
|
||||||
</head>
|
</head>
|
||||||
<body>
|
<body>
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<thead>
|
||||||
|
<td>First Name</td>
|
||||||
|
<td>Age</td>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td>Alice</td>
|
||||||
|
<td>32</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>Bob</td>
|
||||||
|
<td>34</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
</body>
|
</body>
|
||||||
</html>
|
</html>
|