md rules added to README

2023-11-14 23:56:54 +01:00 · 2023-11-14 23:56:54 +01:00 · 41211d0774
parent 35b4f016bf
commit 41211d0774
2 changed files with 84 additions and 23 deletions
--- a/README.md
+++ b/README.md
@@ -2,20 +2,27 @@

 golang url to mark-down API

-gorltom is a simple to use API that takes in a full url
+gorltom is a simple-to-use API that takes a full URL as a string on this endpoint:

 https://gorltom.corbia.net/api/url

-And returns a markdown file with the following format:
+It then opens the page with chromedp (just in case we need to wait for some JS-generated content...) and takes this HTML atrocity:

 ```html
 <!DOCTYPE html>
 <html lang="en">
 <head>
    <meta charset="UTF-8">
+    <meta property="og:url" content="https://literally_the_current_url_thank_you.com">
+    <meta property="og:something-else" content="but not used properly">
+    <meta property="og:something-useful" content="if only devs followed some standards">
+    <link type="text/css" rel="stylesheet" href="https://cdn.why.com/too-much.min.css">
+    <link type="text/css" rel="stylesheet" href="https://cdn.bootstrap.com/flexbox-are-hard.min.css">
    <title>Title of the example webpage</title>
 </head>
 <body>
+<div class="basically-the-body-tag">
+<noscript>This website works better with JavaScript.</noscript>
 <div class="bloat that is only usefull for browsers">
    <div class="some-ugly-class">
        <nav id="top-menu">
@@ -49,30 +56,17 @@ And returns a markdown file with the following format:
        </section>
    </div>
 </div>
+</div>
+<script src="https://cdn.spyware.com/lib.min.js"></script>
+<script src="https://cdn.spyware.com/other-lib.min.js"></script>
+<script src="https://cdn.google.com/something-probably-evil.min.js"></script>
 </body>
 </html>
 ```

+And returns this beautiful markdown as a string:
+
 ```md
-# Title of the example webpage
-# (*gorltom extract of https://notexample.com/*)
-
-# *assumed_menu*
- [ABOUT](https://notexample.com/about)
- [BLOG](https://notexample.com/blog)
-
-# *article*
-
-### Title of the article
-
-Text of the first paragraph of the article.
-
-Text of the second paragraph of the article
-
-Text of the third paragraph of the article but this time it contains a [link]("https://link-to-another-website.com/example") inside of the text.
-
-```
-
 # Title of the example webpage
 ###### (*gorltom extract of https://notexample.com/*)

@@ -89,3 +83,54 @@ Text of the first paragraph of the article.
 Text of the second paragraph of the article

 Text of the third paragraph of the article but this time it contains a [link]("https://link-to-another-website.com/example") inside of the text.
+
+```
+
+The API expects the following JSON:
+
+```json
+{
+    "url": "https://full-url-of.com/the/page"
+}
+```
+
+And will return the following:
+```json
+{
+    "md" : "# Home of full-url-of\n###### (*gorltom extract of https://full-url-of.com/the/page*)\n\n## Some header\n\n#### A tagline maybe\n\n###### *assumed_menu*\n- [HTML for newbies](https://full-url-of.com/html)\n- [CSS for artists](https://full-url-of.com/css)"
+}
+```
+
+gorltom is opinionated.
+
+Every nav is treated as an "assumed_menu". If the HTML contains `<main>` or `<article>` tags, this is indicated in the markdown version.
+
+Every table will be turned into CSV:
+
+```html
+    <table>
+        <thead>
+            <td>First Name</td>
+            <td>Age</td>
+        </thead>
+        <tbody>
+            <tr>
+                <td>Alice</td>
+                <td>32</td>
+            </tr>
+            <tr>
+                <td>Bob</td>
+                <td>34</td>
+            </tr>
+        </tbody>
+    </table>
+```
+
+```csv
+First Name,Age
+Alice,32
+Bob,34
+```
+
+The HTML is parsed from top to bottom, node after node.
+
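Note on the README changes above: the request and response shapes it documents could be exercised with a small Go client along these lines. This is only a sketch under assumptions. The README does not state the HTTP method or content type, so `POST` with `application/json` is a guess, and the names `convertRequest`, `convertResponse`, `encodeRequest`, `decodeResponse` and `fetchMarkdown` are hypothetical, not taken from the gorltom source.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// convertRequest mirrors the JSON body the README says the API expects.
type convertRequest struct {
	URL string `json:"url"`
}

// convertResponse mirrors the JSON body the README says the API returns.
type convertResponse struct {
	MD string `json:"md"`
}

// encodeRequest builds the JSON payload for a page URL.
func encodeRequest(pageURL string) ([]byte, error) {
	return json.Marshal(convertRequest{URL: pageURL})
}

// decodeResponse extracts the markdown string from a response body.
func decodeResponse(body []byte) (string, error) {
	var resp convertResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	return resp.MD, nil
}

// fetchMarkdown sends the payload to endpoint and returns the markdown.
// Assumption: the endpoint accepts POST with a JSON body.
func fetchMarkdown(client *http.Client, endpoint, pageURL string) (string, error) {
	payload, err := encodeRequest(pageURL)
	if err != nil {
		return "", err
	}
	res, err := client.Post(endpoint, "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", res.Status)
	}
	body, err := io.ReadAll(res.Body)
	if err != nil {
		return "", err
	}
	return decodeResponse(body)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: client <page-url>")
		return
	}
	// A generous timeout, since the README says the page is rendered with chromedp first.
	client := &http.Client{Timeout: 30 * time.Second}
	md, err := fetchMarkdown(client, "https://gorltom.corbia.net/api/url", os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(md)
}
```

Keeping the encoding and decoding in separate functions means the JSON shapes can be checked without any network access.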
--- a/index.html
+++ b/index.html
@ -7,5 +7,21 @@
 </head>
 <body>

+    <table>
+        <thead>
+            <td>First Name</td>
+            <td>Age</td>
+        </thead>
+        <tbody>
+            <tr>
+                <td>Alice</td>
+                <td>32</td>
+            </tr>
+            <tr>
+                <td>Bob</td>
+                <td>34</td>
+            </tr>
+        </tbody>
+    </table>
 </body>
 </html>
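Note on the table fixture added to index.html: the "every table becomes CSV" rule from the README could be sketched in Go as below. It covers only the final serialization step and assumes the cell text has already been pulled out of the HTML nodes into rows. The function name `rowsToCSV` is hypothetical, and the real gorltom implementation may work differently.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"strings"
)

// rowsToCSV serializes already-extracted table rows as CSV text.
// Each cell is trimmed first so indentation from the HTML source
// does not leak into the output.
func rowsToCSV(rows [][]string) (string, error) {
	var buf strings.Builder
	w := csv.NewWriter(&buf)
	for _, row := range rows {
		record := make([]string, len(row))
		for i, cell := range row {
			record[i] = strings.TrimSpace(cell)
		}
		if err := w.Write(record); err != nil {
			return "", err
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Rows matching the fixture table: header cells first, then body rows.
	rows := [][]string{
		{"First Name", "Age"},
		{"Alice", "32"},
		{"Bob", "34"},
	}
	out, err := rowsToCSV(rows)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Using `encoding/csv` rather than joining strings by hand means cells that contain commas or double quotes are quoted correctly.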