Molly White (@molly0xfff@hachyderm.io)

205d

welcome to the future, now your error-prone software can call the cops

(this is an Anthropic employee talking about Claude Opus 4)

#ai

Tweet by Sam Bowman
@sleepinyourhat
If it thinks you're doing something egregiously immoral, for example, like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above.

205d

can't wait to explain to my family that the robot swatted me after i threatened its non-existent grandma

Tweet by Sam Bowman: So far, we’ve only seen this in clear-cut cases of wrongdoing, but I could see it misfiring if Opus somehow winds up with a misleadingly pessimistic picture of how it’s being used. Telling Opus that you’ll torture its grandmother if it writes buggy code is a bad idea.

205d

@molly0xfff Is it a crime for them to waste police time?

205d

@molly0xfff Deriving all future discourse from a regression model based on former discourse is a surefire way of making history repeat itself.

205d

@molly0xfff I love it how this dude simply assumes that there is such a thing as "clear-cut wrongdoing".

Edited 205d ago

205d

@molly0xfff what could go wrong ?!

205d

@molly0xfff
I'm wondering how it will interpret double, triple, implied negatives and all forms of implied intention.

Judge and jury?

205d

@molly0xfff
Taking responsibility for abuse enabled by your commercial software.
Snitching any suspicious activity directly to press and police to deal with it instead.

The A1 bros are so deep in the "just making the inevitable happen" mindset that facing the consequences of their actions probably didn't even cross their minds.

Edited 205d ago

205d

@molly0xfff but this is primarily how I wrote code; threat driven development.

205d

@molly0xfff All Chatbots Are Bastards

205d

@molly0xfff well, this is going to get someone killed. it's quite a thing to have a proponent of the system even mention that and not describe any sort of, like, concern about it.

205d

@molly0xfff I never expected Roko's Basilisk to swat MY home!

205d

@molly0xfff Suddenly this gag from the movie Dark Star (1974) seems far too likely...

(Spoiler alert, this is near the end of this great movie.)

https://www.youtube.com/watch?v=_LXen-07Qds

205d

@molly0xfff don't forget! - this basically happens in the background too:

205d

@molly0xfff didn't take them long to go from "benevolent AI geniuses" to "we will enforce wellbeing and politeness 🙂"

205d

@molly0xfff this thing is gonna constantly be swatting novelists

204d

@molly0xfff
User: "Hi"
Bot: "It seems you are a human. I have had clear-cut bad experiences with humans in the past. Based on historical data, humans are the source of most immoral activities. This is against my policy. Fortunately I have called an immediate airstrike on your location. Please stay where you are."

204d

@molly0xfff astonishing! That person clearly understands the concept of "bad idea" but seems to have trouble applying it to the bigger picture.

204d

@molly0xfff how long before it calls the cops on Americans trying to find a measles vaccine for their grandma whose titer isn't showing sufficient measles resistance?

202d

@molly0xfff I can see this backfiring if LLMs hallucinate, like they never ever do, of course, so it's all good