Do you have a source for this? This sounds like fine-tuning a model, which doesn’t prevent data from the original training set from influencing the output. The method you described would only work if the AI is trained from scratch on only images of iron man and cowboy hats. And I don’t think that’s how any of these models work.
You have that backwards. Roko’s basilisk would punish anyone who didn’t help create it.