md rules added to README

ed barz 2023-11-14 23:56:54 +01:00
parent 35b4f016bf
commit 41211d0774
2 changed files with 84 additions and 23 deletions

View File

@ -2,20 +2,27 @@
golang url to mark-down API
gorltom is a simple to use API that takes in a full url
gorltom is a simple-to-use API that takes a full URL as a string at this endpoint:
https://gorltom.corbia.net/api/url
And returns a markdown file with the following format:
It will then open the page with chromedp (just in case we need to wait for some JS-generated content...) and take this HTML atrocity:
```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta property="og:url" content="https://literally_the_current_url_thank_you.com">
<meta property="og:something-else" content="but not used properly">
<meta property="og:something-useful" content="if only dev followed some standards">
<link type="text/css" rel="stylesheet" href="https://cdn.why.com/too-much.min.css">
<link type="text/css" rel="stylesheet" href="https://cdn.bootstrap.com/flexbox-are-hard.min.css">
<title>Title of the example webpage</title>
</head>
<body>
<div class="basically-the-body-tag">
<noscript>This website works better with JavaScript.</noscript>
<div class="bloat that is only useful for browsers">
<div class="some-ugly-class">
<nav id="top-menu">
@ -49,30 +56,17 @@ And returns a markdown file with the following format:
</section>
</div>
</div>
</div>
<script src="https://cdn.spyware.com/lib.min.js"></script>
<script src="https://cdn.spyware.com/other-lib.min.js"></script>
<script src="https://cdn.google.com/something-probably-evil.min.js"></script>
</body>
</html>
```
And return this beautiful markdown as a string:
```md
# Title of the example webpage
# (*gorltom extract of https://notexample.com/*)
# *assumed_menu*
- [ABOUT](https://notexample.com/about)
- [BLOG](https://notexample.com/blog)
# *article*
### Title of the article
Text of the first paragraph of the article.
Text of the second paragraph of the article.
Text of the third paragraph of the article but this time it contains a [link](https://link-to-another-website.com/example) inside of the text.
```
# Title of the example webpage
###### (*gorltom extract of https://notexample.com/*)
@ -89,3 +83,54 @@ Text of the first paragraph of the article.
Text of the second paragraph of the article.
Text of the third paragraph of the article but this time it contains a [link](https://link-to-another-website.com/example) inside of the text.
```
The API will be expecting the following JSON:
```json
{
"url": "https://full-url-of.com/the/page"
}
```
And will return the following:
```json
{
"md" : "# Home of full-url-of\n###### (*gorltom extract of https://full-url-of.com/the/page*)\n\n## Some header\n\n#### A tagline maybe\n\n###### *assumed_menu*\n- [HTML for newbies](https://full-url-of.com/html)\n- [CSS for artists](https://full-url-of.com/css)"
}
```
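A minimal Go client sketch for the request and response shapes above. The JSON field names `url` and `md` come from the examples; the `POST` method, the `Content-Type` header, and every Go identifier here (`extractRequest`, `extractResponse`, `buildRequestBody`, `parseResponseBody`, `fetchMarkdown`) are assumptions made for illustration, not part of gorltom.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// extractRequest mirrors the request JSON shown above.
type extractRequest struct {
	URL string `json:"url"`
}

// extractResponse mirrors the response JSON shown above.
type extractResponse struct {
	MD string `json:"md"`
}

// buildRequestBody encodes the page URL as the JSON body the API expects.
func buildRequestBody(pageURL string) ([]byte, error) {
	return json.Marshal(extractRequest{URL: pageURL})
}

// parseResponseBody pulls the markdown string out of a response body.
func parseResponseBody(body []byte) (string, error) {
	var resp extractResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	return resp.MD, nil
}

// fetchMarkdown sends the request and returns the markdown.
// Assumption: the endpoint accepts a POST with a JSON body.
func fetchMarkdown(client *http.Client, endpoint, pageURL string) (string, error) {
	payload, err := buildRequestBody(pageURL)
	if err != nil {
		return "", err
	}
	res, err := client.Post(endpoint, "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", res.Status)
	}
	var out extractResponse
	if err := json.NewDecoder(res.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.MD, nil
}

func main() {
	// Offline demo: no network call, only the encoding and decoding steps.
	payload, _ := buildRequestBody("https://full-url-of.com/the/page")
	fmt.Println(string(payload))

	md, _ := parseResponseBody([]byte(`{"md": "# Home of full-url-of\n## Some header"}`))
	fmt.Println(md)
}
```

To hit the real service, something like `fetchMarkdown(http.DefaultClient, "https://gorltom.corbia.net/api/url", pageURL)` should work if the assumptions about the method and header hold.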
gorltom is opinionated.
Every `<nav>` is treated as an "assumed_menu". If the HTML contains `<main>` or `<article>` tags, they are indicated in the markdown version.
Every table is turned into CSV:
```html
<table>
<thead>
<td>First Name</td>
<td>Age</td>
</thead>
<tbody>
<tr>
<td>Alice</td>
<td>32</td>
</tr>
<tr>
<td>Bob</td>
<td>34</td>
</tr>
</tbody>
</table>
```
```csv
First Name,Age
Alice,32
Bob,34
```
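The table rule can be sketched with the standard library alone. This is not gorltom's implementation: it assumes the table fragment is well-formed enough for `encoding/xml` to tokenize, and `tableToCSV` is a hypothetical helper name. Real-world HTML would need a proper HTML parser.

```go
package main

import (
	"encoding/csv"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// tableToCSV walks a well-formed <table> fragment and renders it as CSV.
// Cells are <td> or <th>; a row ends at </tr>, or at </thead> when the
// header cells sit directly inside <thead> as in the example above.
func tableToCSV(fragment string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(fragment))
	var rows [][]string
	var row []string
	var cell strings.Builder
	inCell := false

	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == "td" || t.Name.Local == "th" {
				inCell = true
				cell.Reset()
			}
		case xml.CharData:
			if inCell {
				cell.Write(t)
			}
		case xml.EndElement:
			switch t.Name.Local {
			case "td", "th":
				row = append(row, strings.TrimSpace(cell.String()))
				inCell = false
			case "tr", "thead":
				// Store the row only if it collected any cells.
				if len(row) > 0 {
					rows = append(rows, row)
					row = nil
				}
			}
		}
	}

	var out strings.Builder
	w := csv.NewWriter(&out)
	if err := w.WriteAll(rows); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	table := `<table>
<thead><td>First Name</td><td>Age</td></thead>
<tbody>
<tr><td>Alice</td><td>32</td></tr>
<tr><td>Bob</td><td>34</td></tr>
</tbody>
</table>`
	out, err := tableToCSV(table)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Going through `encoding/csv` rather than joining strings by hand means cells that contain commas or quotes are escaped correctly.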
The HTML is parsed from top to bottom, node after node.
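A rough sketch of what "top to bottom, node after node" can look like: a depth-first walk in document order that emits a marker when it meets `<nav>`, `<main>` or `<article>`. The `node` type, `walk`, and `render` are hypothetical stand-ins, and the `######` level for the `main`/`article` markers is an assumption modeled on the `assumed_menu` line in the JSON example.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, minimal stand-in for a parsed HTML element.
type node struct {
	tag      string
	text     string // already-converted markdown for this node, if any
	children []*node
}

// walk visits n first, then its children in document order
// (depth-first, pre-order), appending markdown to out.
func walk(n *node, out *strings.Builder) {
	switch n.tag {
	case "nav":
		out.WriteString("###### *assumed_menu*\n")
	case "main", "article":
		// Assumption: these markers use the same heading level as the menu.
		fmt.Fprintf(out, "###### *%s*\n", n.tag)
	}
	if n.text != "" {
		out.WriteString(n.text + "\n")
	}
	for _, child := range n.children {
		walk(child, out)
	}
}

// render walks the whole tree and returns the collected markdown.
func render(root *node) string {
	var out strings.Builder
	walk(root, &out)
	return out.String()
}

func main() {
	page := &node{tag: "body", children: []*node{
		{tag: "nav", children: []*node{
			{tag: "a", text: "- [ABOUT](https://notexample.com/about)"},
			{tag: "a", text: "- [BLOG](https://notexample.com/blog)"},
		}},
		{tag: "article", children: []*node{
			{tag: "h3", text: "### Title of the article"},
		}},
	}}
	fmt.Print(render(page))
}
```

Because the walk is pre-order, the output keeps the same order as the source HTML, so the menu lands before the article whenever the `<nav>` comes first in the page.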

View File

@ -7,5 +7,21 @@
</head>
<body>
<table>
<thead>
<td>First Name</td>
<td>Age</td>
</thead>
<tbody>
<tr>
<td>Alice</td>
<td>32</td>
</tr>
<tr>
<td>Bob</td>
<td>34</td>
</tr>
</tbody>
</table>
</body>
</html>