author | mms <michal@sapka.me> | 2024-06-28 14:19:54 +0200 |
---|---|---|
committer | mms <michal@sapka.me> | 2024-06-28 14:19:54 +0200 |
commit | 9566717e717438cee59e6f251a8234b5164f56b5 (patch) | |
tree | 786302fb417b0a06617b2e2fb88b03ab386d7329 /content/blog/2024/llm-honeypot.md | |
parent | 26ecb8dab3c403241f4cdb166535c933afb7023a (diff) |
feat(blog): honeypot
Diffstat (limited to 'content/blog/2024/llm-honeypot.md')
-rw-r--r-- | content/blog/2024/llm-honeypot.md | 32 |
1 files changed, 32 insertions, 0 deletions
diff --git a/content/blog/2024/llm-honeypot.md b/content/blog/2024/llm-honeypot.md
new file mode 100644
index 0000000..aa6240f
--- /dev/null
+++ b/content/blog/2024/llm-honeypot.md
@@ -0,0 +1,32 @@
++++
+title = "LLM honeypot"
+author = ["Michał Sapka"]
+date = 2024-06-28T14:14:00+02:00
+categories = ["blog"]
+draft = false
+weight = 2002
+abstract = "The only way to fight that I see"
++++
+
+Big tech doesn't care about people; the LLM industry actively seeks harm.
+We're [seeing it time and time again](https://www.theverge.com/2024/6/27/24187405/perplexity-ai-twitter-lie-plagiarism).
+They consider the open web to be a resource that exists only for them to harvest.
+
+But the web was designed with good intentions in mind.
+There is no way to actively _block_ them.
+Copyright? Nope, fair use.
+Robots.txt? Nope, some don't care - others pretend to care after the theft.
+Identifying them? Good luck. Not only are the IPs in the _millions_, but they also lie in their user-agents.
+
+Some are trying to poison the LLMs by prompt injection, but this will not work in any larger dataset.
+
+Personally, I want to at least try.
+Therefore, my site contains a honeypot: _open [a git repository](https://michal.sapka.me/git/mms/Library-of-knowledge) and your IP will be logged_.
+For now I only collect them, but soon they will be blocked on my firewall for some time - a week maybe?
+
+This repo is:
+
+- disallowed by robots.txt, so no good agents would harvest it
+- labeled as a ban hammer in the description.
+
+I'll wait for some time and publish the results.
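
The collect-then-ban flow described in the post could be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's actual setup: it assumes the web server writes access logs in Common Log Format, it takes the honeypot path from the repository URL in the post, and it reads the "a week maybe?" remark as a seven-day ban window. The names `HONEYPOT_PREFIX`, `BAN_DURATION`, `honeypot_hits`, and `active_bans` are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical values: the path mirrors the honeypot URL in the post, and
# the one-week window follows the post's "a week maybe?".
HONEYPOT_PREFIX = "/git/mms/Library-of-knowledge"
BAN_DURATION = timedelta(days=7)

# Start of a Common Log Format line (assumed log format):
# IP, identity, user, [timestamp], "METHOD path
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)')


def honeypot_hits(lines):
    """Map each client IP that requested the honeypot to its latest hit time."""
    hits = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip, stamp, path = match.groups()
        if not path.startswith(HONEYPOT_PREFIX):
            continue
        seen = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        if ip not in hits or seen > hits[ip]:
            hits[ip] = seen
    return hits


def active_bans(hits, now):
    """Return the IPs whose ban window (last hit + BAN_DURATION) is still open."""
    return sorted(ip for ip, seen in hits.items() if now - seen < BAN_DURATION)


if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [28/Jun/2024:14:20:00 +0200] '
        '"GET /git/mms/Library-of-knowledge/tree/ HTTP/1.1" 200 512',
        '198.51.100.2 - - [28/Jun/2024:14:21:00 +0200] "GET /blog/ HTTP/1.1" 200 1024',
    ]
    hits = honeypot_hits(sample)
    print(active_bans(hits, datetime(2024, 7, 1, tzinfo=timezone.utc)))
```

The list returned by `active_bans` would then be fed to whatever firewall is in use; that step is left out because the post does not say which firewall the site runs. Since the path is disallowed in robots.txt, any address that shows up here belongs to a client that ignored it.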