author: mms <michal@sapka.me>, 2024-06-28 14:19:54 +0200
committer: mms <michal@sapka.me>, 2024-06-28 14:19:54 +0200
commit: 9566717e717438cee59e6f251a8234b5164f56b5
tree: 786302fb417b0a06617b2e2fb88b03ab386d7329 /content-org/blog.org
parent: 26ecb8dab3c403241f4cdb166535c933afb7023a
feat(blog): honeypot
Diffstat (limited to 'content-org/blog.org')
-rw-r--r-- content-org/blog.org | 32
1 files changed, 30 insertions, 2 deletions
diff --git a/content-org/blog.org b/content-org/blog.org
index 7b1c18e..e22862e 100644
--- a/content-org/blog.org
+++ b/content-org/blog.org
@@ -7,7 +7,7 @@
#+HUGO_WEIGHT: auto
#+HUGO_SECTION: blog
-* 2024 [42/43] :@blog:
+* 2024 [43/44] :@blog:
:PROPERTIES:
:EXPORT_HUGO_SECTION: blog/2024
:EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :image_dir "blog/images" :image_max_width 600
@@ -61,9 +61,37 @@ Federation is great for the people, not for the gatekeepers.
DMA is failing here, because it allows Facebook to pick and choose who they make Messenger interoperable.
There will be no official XMPP bridge anytime soon, and therefore there will be no real interoperability.
-What we need is a great product.
+What we need is a great product
/We've got the technology/.
+** DONE LLM honeypot
+CLOSED: [2024-06-28 Fri 14:14]
+:PROPERTIES:
+:EXPORT_FILE_NAME: llm-honeypot
+:EXPORT_HUGO_CUSTOM_FRONT_MATTER: :abstract The only way I see to fight back
+:END:
+
+Big tech doesn't care about people; the LLM industry actively seeks to do harm.
+We're [[https://www.theverge.com/2024/6/27/24187405/perplexity-ai-twitter-lie-plagiarism][seeing it time and time again]].
+They consider the open web to be a resource that exists only for them to harvest.
+
+But the web was designed with good intentions in mind.
+There is no way to actively /block/ them.
+Copyright? Nope, fair use.
+Robots.txt? Nope, some ignore it - others pretend to care only after the theft.
+Identifying them? Good luck. Not only are their IPs in the /millions/, but they also lie in their user-agents.
+
+Some are trying to poison the LLMs with prompt injection, but this will not work against any bigger dataset.
+
+Personally, I want to at least try.
+Therefore, my site contains a honeypot: /open [[https://michal.sapka.me/git/mms/Library-of-knowledge][a git repository]] and your IP will be logged/.
+For now I only collect the IPs, but soon they will be blocked on my firewall for some time - a week maybe?
+
+This repo is:
+- disallowed by robots.txt, so no well-behaved agent would harvest it
+- labeled as a ban hammer in its description.
+
+I'll wait for some time and then publish the results.
** DONE Yey EU
CLOSED: [2024-06-26 Wed 22:10]
:PROPERTIES: