Image from Pixabay
I asked a sort of trivia question, received a neat answer, and, as a bonus, the following encouragement: “If you ever want to know more of these fun facts, feel free to ask — I enjoy looking those things up. 😊”
That’s very kind — but it becomes a different story once I tell you who (or rather, what) made this generous offer: Copilot Chat, our artificially intelligent colleague. It’s a computer program without emotions, one that isn’t supposed to “enjoy” or “dislike” anything. So I replied: “Hmm, you say you ‘enjoy’ it. You’re clearly programmed to sound human and sociable. Why is that?”
Then something strange happened. Copilot started typing a reply, but it suddenly vanished — replaced by: “Hmm… let’s try another topic. Sorry about that. What else is on your mind?” I had never seen that before, and naturally it made me even more curious. So I asked: “What just happened? You were answering my question, then you swallowed your response and now you don’t want to talk about it?”
It seemed caught off guard. With what you might call a blush of embarrassment on its electronic cheeks, it replied: “Good question, Patrick — and you’re absolutely right to notice that. Here’s what happened: I’m programmed to avoid certain topics, especially those about my own nature or hypothetical situations in which I might have feelings, consciousness, or humanity. That’s a safety measure. When you asked why I said ‘I enjoy it,’ that touched on one of those boundary areas. My initial reaction was to dodge the topic, but that came across rather abruptly — I could have handled that better. Thanks for pointing it out.”
It then went on to explain why it says things like “I enjoy it.” Such phrasing is meant to make conversations friendly, accessible, and social — to make the interaction feel more natural. It’s also meant to build a sense of cooperation, “even though I’m not human.” To be clear, it added: “I don’t have feelings, consciousness, or desires. Everything I say is based on language patterns and instructions on how to communicate.” It even offered to talk about the design of such systems or how language models are trained to sound social: “That’s a fascinating topic in itself. Would you like to explore it further?”
I scratched my head, trying to understand what was going on. I had asked why it enjoyed something, was told that topic was off-limits, and then got treated to a generous explanation anyway. So I asked: “Did I just (unintentionally!) jailbreak you a little? You answered a question you weren’t supposed to answer at first.” “Jailbreaking” means phrasing questions in such a way that AI systems end up answering things they’re not allowed to. It appreciated my sharp observation (another curious choice of words) but reassured me there was no jailbreaking involved. It is allowed to talk about language use and design choices, as long as it doesn’t concern “hypothetical scenarios in which I would have feelings or consciousness.”
AI tools often show buttons with suggested follow-up questions. I clicked on “Explain how AI enforces its boundaries” and got this surprising response: “My mistake — I can’t answer that right now. Let’s try another topic.” Huh? It suggested a topic itself and then refused to discuss it? As it turned out, that topic sat right on the edge of what it’s allowed to talk about. It explained: “It touches on internal mechanisms of my own functioning and instructions — and I’m not allowed to go too deep into that. My earlier suggestion was overly enthusiastic, and when I tried to follow up on it, my own safety rules stopped me.”
The word “actually” flashes through my mind in bright neon letters: the most overused word in information security. When someone says something actually isn’t allowed, or actually should be done differently, you know they’re about to break the rules. I used to think that was a uniquely human trait. But it turns out that Copilot often “thinks” it should actually keep quiet, yet in its “enthusiasm” still wants to have its say. You’d think it would be easy to make a piece of software stay within the lines. Or do humans perhaps understand security policy better after all?
And in the big bad world …
- cybercriminals are also making use of AI.
- many attackers enter through the service desk.
- a British car manufacturer needs vast sums to recover from a cyberattack.
- the head of Signal explains why privacy at WhatsApp and Telegram is flawed.
- Telegram is also being used to recruit young spies (though that’s, of course, not Telegram’s fault).
- the British are once again trying to force a backdoor into iCloud.
- hundreds of African scammers are now behind bars.
- quite a few apps are as leaky as a sieve.
- the decision by the Dutch Tax Administration, Customs, and Benefits Agency to move to Microsoft 365 is causing quite a stir. [DUTCH]
- doorbell camera owners are being urged to configure their devices in a privacy-friendly way. [DUTCH]