author: mms <michal@sapka.me> 2024-06-28 14:19:54 +0200
committer: mms <michal@sapka.me> 2024-06-28 14:19:54 +0200
commit: 9566717e717438cee59e6f251a8234b5164f56b5
tree: 786302fb417b0a06617b2e2fb88b03ab386d7329 /content/blog/2024/llm-honeypot.md
parent: 26ecb8dab3c403241f4cdb166535c933afb7023a
feat(blog): honeypot
Diffstat (limited to 'content/blog/2024/llm-honeypot.md')
-rw-r--r-- content/blog/2024/llm-honeypot.md 32
1 files changed, 32 insertions, 0 deletions
diff --git a/content/blog/2024/llm-honeypot.md b/content/blog/2024/llm-honeypot.md
new file mode 100644
index 0000000..aa6240f
--- /dev/null
+++ b/content/blog/2024/llm-honeypot.md
@@ -0,0 +1,32 @@
++++
+title = "LLM honeypot"
+author = ["Michał Sapka"]
+date = 2024-06-28T14:14:00+02:00
+categories = ["blog"]
+draft = false
+weight = 2002
+abstract = "The only way to fight back that I see"
++++
+
+Big tech doesn't care about people; the LLM industry actively seeks to do harm.
+We're [seeing it time and time again](https://www.theverge.com/2024/6/27/24187405/perplexity-ai-twitter-lie-plagiarism).
+They consider the open web to be a resource that exists only for them to harvest.
+
+But the web was designed with good intentions in mind.
+There is no way to actively _block_ them.
+Copyright? Nope, fair use.
+Robots.txt? Nope, some don't care - others pretend to care after the theft.
+Identifying them? Good luck. Not only are their IPs in the _millions_, but they also lie in their user agents.
+
+Some are trying to poison the LLMs with prompt injection, but this will not work against any larger dataset.
+
+Personally, I want to at least try.
+Therefore, my site contains a honeypot: _open [a git repository](https://michal.sapka.me/git/mms/Library-of-knowledge) and your IP will be logged_.
+For now I only collect them, but soon they will be blocked at my firewall for some time - a week, maybe?
+
+This repo is:
+
+- disallowed by robots.txt, so no well-behaved agent would harvest it
+- labeled as a ban hammer in its description.
+
+I'll wait for some time and then publish the results.
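
The post describes the honeypot only in outline: log every IP that opens the repository, then block it at the firewall for about a week. A minimal Python sketch of that logging-and-expiry step could look like the following. It is not the author's implementation. It assumes the web server writes common/combined-format access logs, takes the path prefix from the repository link in the post, and uses a fixed seven-day ban; the names `collect_honeypot_hits` and `active_bans` are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: path of the honeypot repository, taken from the link in the post.
HONEYPOT_PREFIX = "/git/mms/Library-of-knowledge"
# Assumption: a one-week ban, the duration the post floats as a possibility.
BAN_DURATION = timedelta(days=7)

# Start of a common/combined log format line:
# client IP, two ignored fields, [timestamp], "METHOD path ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)')


def collect_honeypot_hits(lines, prefix=HONEYPOT_PREFIX):
    """Return {ip: time of its most recent request under `prefix`}."""
    hits = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        ip, stamp, path = match.groups()
        if not path.startswith(prefix):
            continue  # ordinary traffic, not the honeypot
        seen = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        if ip not in hits or seen > hits[ip]:
            hits[ip] = seen
    return hits


def active_bans(hits, now, ban_for=BAN_DURATION):
    """Return the sorted IPs whose last honeypot hit is newer than `ban_for`."""
    return sorted(ip for ip, seen in hits.items() if now - seen < ban_for)


if __name__ == "__main__":
    # Hypothetical log lines using documentation-only IP ranges.
    sample = [
        '203.0.113.7 - - [28/Jun/2024:14:20:00 +0200] '
        '"GET /git/mms/Library-of-knowledge/tree/ HTTP/1.1" 200 512',
        '198.51.100.4 - - [28/Jun/2024:14:21:00 +0200] '
        '"GET /blog/2024/llm-honeypot/ HTTP/1.1" 200 1024',
    ]
    hits = collect_honeypot_hits(sample)
    for ip in active_bans(hits, datetime(2024, 7, 1, tzinfo=timezone.utc)):
        print(ip)
```

The list returned by `active_bans` would still have to be handed to a firewall. That step is left out here because the post does not say which firewall is in use.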