• 0 Posts
  • 24 Comments
Joined 1 year ago
Cake day: June 15th, 2023





  • Just because something is available to view online does not mean you can do anything you want with it. Most content is automatically protected by copyright. You can use it in ways that would otherwise be illegal only if you are explicitly granted permission to do so.

    Specifically, Stack Overflow licenses any content you contribute under the CC-BY-SA 4.0 (older content is covered by other licenses that I omit for simplicity). If you read the license you will note two restrictions: attribution and “share-alike”. So if you take someone’s answer, including the code snippets, and include it in something you make, even if you change it to an extent, you have to attribute it to the original source and you have to share it with the same license. You could theoretically mirror the entire SO site’s content, as long as you used the same licenses for all of it.

    So far AI companies have simply scraped everything and argued that they don’t have to respect the original license. They argue that it is “fair use” because AI is “transformative use”. If you look at the historical usage of “transformative use” in copyright cases, their case is kind of bullshit actually. But regardless of whether it will hold up in court (and whether it should hold up in court), the reality is that AI companies are going to use everybody’s content in ways that they have not been given permission to.

    So for now it doesn’t matter whether our content is centralized or federated. It doesn’t matter whether SO has a deal with OpenAI or not. SO content was almost certainly already used for ChatGPT. If you split it into 100s of small sites on the fediverse it would still be part of ChatGPT. As long as it’s easy to access, they will use it. Allegedly they also use torrents for input data so even if it’s not publicly viewable it’s not safe. If/when AI data sourcing is regulated and the “transformative use” argument fails in court and if the fines are big enough for the regulation to actually work, then sure the situation described in the OP will matter. But we’ll have to see if that ever happens. I’m not holding my breath, honestly.




  • No, the intent and the consequences of an action are generally taken into consideration in discussions of ethics and in legislation. Additionally, this is not just a matter of ToS. What OpenAI does is create and distribute illegitimate derivative works. They are relying on the argument that what they do is transformative use, which is not really congruent with what “transformative use” has meant historically. We will see in time what the courts have to say about this. But in any case, it will not be judged the same way as a person using a tool just to skip ads. And Revanced is different to both the above because it is a non-commercial service.


  • It’s definitely not “draconian” to make enshittification illegal. But you don’t regulate the turning-to-shit part. You regulate the part where they offer a service for free or too cheap so that they kill the competition. This is called anti-competitive and we supposedly address it already. You also regulate what an EULA can enforce and the ability of companies to change the EULA after a user has agreed to it. Again, these concepts already exist in law.

    We’ve essentially already identified these problems and we have decided that we need to address them, but we have been ineffective in doing so for various reasons.



  • Humans are not generally allowed to do what AI is doing! You talk about copying someone else’s “style” because you know that “style” is not protected by copyright, but that is a false equivalence. An AI is not copying “style”, but rather every discernible pattern of its input. It is just as likely to copy Walt Disney’s drawing style as it is to copy the design of Mickey Mouse. We’ve seen countless examples of AIs copying characters, verbatim passages of texts and snippets of code. Imagine if a person copied Mickey Mouse’s character design and they got sued for copyright infringement. Then they go to court and their defense is that they downloaded copies of the original works without permission and studied them for the sole purpose of imitating them. They would be admitting that every perceived similarity is intentional. Do you think they would not be found guilty of copyright infringement? And AI is this example taken to the extreme. It’s not just creating something similar, it is by design trying to maximize the similarity of its output to its training data. It is being the least creative that is mathematically possible. The AI’s only trick is that it threw so much stuff into its mixer of training data that you can’t generally trace the output to a specific input. But the math is clear. And while it’s obvious that no sane person will use a copy of Mickey Mouse just because an AI produced it, the same cannot be said for characters of lesser known works, passages from obscure books, and code snippets from small free software projects.

    In addition to the above, we allow humans to engage in potentially harmful behavior for various reasons that do not apply to AIs.

    • “Innocent until proven guilty” is fundamental to our justice systems. The same does not apply to inanimate objects. Eg a firearm is restricted because of the danger it poses even if it has not been used to shoot someone. A person is only liable for the damage they have caused, never their potential to cause it.
    • We care about people’s well-being. We would not ban people from enjoying art just because they might copy it because that would be sacrificing too much. However, no harm is done to an AI when it is prevented from being trained, because an AI is not a person with feelings.
    • Human behavior is complex and hard to control. A person might unintentionally copy protected elements of works when being influenced by them, but that’s hard to tell in most cases. An AI has the sole purpose of copying patterns with no other input.

    For all of the above reasons, we choose to err on the side of caution when restricting human behavior, but we have no reason to do the same for AIs, or anything inanimate.

    In summary, we do not allow humans to do what AIs are doing now and even if we did, that would not be a good argument against AI regulation.





  • lsblk is just lacking a lot of information and creating a false impression of what is happening. I did a bind mount to try it out.

    sudo mount -o ro --bind /var/log /mnt
    

    This mounts /var/log to /mnt without making any other changes. My root partition is still mounted at / and fully functional. However, all that lsblk shows under MOUNTPOINTS is /mnt. There is no indication that it’s just /var/log that is mounted and not the entire root partition. There is also no mention at all of /. findmnt shows this correctly. Omitting all irrelevant info, I get:

    TARGET                                                SOURCE                 [...]
    /                                                     /dev/dm-0              [...]
    [...]
    └─/mnt                                                /dev/dm-0[/var/log]    [...]
    

    Here you can see that the same device is used for both mountpoints and that it’s just /var/log that is mounted at /mnt.
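
    As a rough sketch of how to read that SOURCE column: the bracketed suffix is the subdirectory of the filesystem that is actually mounted. The helper name describe_source below is hypothetical (it is not part of findmnt or lsblk), and the sample values are the ones from the output above.

```shell
# Hypothetical helper: split a findmnt SOURCE value such as
# "/dev/dm-0[/var/log]" into the device and the mounted subdirectory.
describe_source() {
  src=$1
  case "$src" in
    *\[*\])
      dev=${src%%\[*}   # everything before the first "["
      sub=${src#*\[}    # everything after the first "["
      sub=${sub%\]}     # drop the trailing "]"
      printf 'device=%s subdir=%s\n' "$dev" "$sub"
      ;;
    *)
      # No bracketed suffix: the root of the filesystem is mounted.
      printf 'device=%s subdir=/\n' "$src"
      ;;
  esac
}

describe_source '/dev/dm-0[/var/log]'   # device=/dev/dm-0 subdir=/var/log
describe_source '/dev/dm-0'             # device=/dev/dm-0 subdir=/

# On a live system the value could be taken from findmnt itself, e.g.:
#   describe_source "$(findmnt -n -o SOURCE /mnt)"
```

    If both lines report the same device but different subdirectories, you are looking at a bind mount of part of the filesystem, not a second mount of the whole partition.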

    Snap is probably doing something similar. It is mounting a specific directory into the directory of the firefox snap. It is not using your entire root partition and it’s not doing something that would break the / mountpoint. This by itself should cause no issues at all. You can see in the issue you linked as well that the fix to their boot issue was something completely irrelevant.



  • Im not 100% comfortable with AI gfs and the direction society could potentially be heading. I don’t like that some people have given up on human interaction and the struggle for companionship, and feel the need to resort to a poor artificial substitute for genuine connection.

    That’s not even the scary part. What we really should be uncomfortable with is this very closed technology having so much power over people. There’s going to be a handful of gargantuan immoral companies controlling a service that the most emotionally vulnerable people will become addicted to.



  • Exactly this. I can’t believe how many comments I’ve read accusing the AI critics of holding back progress with regressive copyright ideas. No, the regressive ideas are already there, codified as law, holding the rest of us back. Holding AI companies accountable for their copyright violations will force them to either push to reform the copyright system completely, or to change their practices for the better (free software, free datasets, non-commercial uses, real non-profit orgs for the advancement of the technology). Either way we have a lot to gain by forcing them to improve the situation. Giving AI companies a free pass on the copyright system will waste what is probably the best opportunity we have ever had to improve the copyright system.


  • LLMs can do far more

    What does this mean? I don’t care what you (claim) your model “could” do, or what LLMs in general could do. What we’ve got are services trained on images that make images, services trained on code that write code etc. If AI companies want me to judge the AI as if that is the product, then let them give us all equal and unrestricted access to it. Then maybe I would entertain the “transformative use” argument. But what we actually get are very narrow services, where the AI just happens to be a tool used in the backend and not part of the end product the user receives.

    Can it write stories in the style of GRRM?

    Talking about “style” is misleading because “style” cannot be copyrighted. It’s probably impractical to even define “style” in a legal context. But an LLM doesn’t copy styles, it copies patterns, whatever they happen to be. Some patterns are copyrightable, eg a character name and description. And it’s not obvious what is ok to copy and what isn’t. Is a character’s action copyrightable? It depends, is the action opening a door or is it throwing a magical ring into a volcano? If you tell a human to do something in the style of GRRM, they would try to match the medieval fantasy setting and the mood, but they would know to make their own characters and story arcs. The LLM will parrot anything with no distinction.

    Any writer claiming to be so unique that they aren’t borrowing from other writers is full of shit.

    This is a false equivalence between how an LLM works and how a person works. The core ideas expressed here are that we should treat products and humans equivalently, and that how an LLM functions is basically how humans think. Both of these are objectively wrong.

    For one, humans are living beings with feelings. The entire point of our legal system is to protect our rights. When we restrict human behavior it is justified because it protects others; at least that’s the formal reasoning. We (mostly) judge people based on what they’ve done and not what we know they could do. This is not how we treat products and that makes sense. We regulate weapons because they could kill someone, but we only punish a person after they have committed a crime. Similarly a technology designed to copy can be regulated, whereas a person copying someone else’s works could be (and often is) punished for it after it is proven that they did it. Even if you think that products and humans should be treated equally, it is a fact that our justice system doesn’t work that way.

    People also have many more functions and goals than an LLM. At this point it is important to remember that an LLM does literally one thing: for every word it writes it chooses the one that would “most likely” appear next based on its training data. I put “most likely” in quotes because it sounds like a form of prediction, but actually it is based on the occurrences of words in the training data only. It has nothing else to incorporate into its output, and it has no other need. It doesn’t have ideas or a need to express them. An LLM can’t build upon or meaningfully transform the works it copies; its only trick is mixing together enough data to make it hard for you to determine the sources. That can make it sometimes look original but the math is clear: it is always trying to maximize the similarity to the training data, if you consider choosing the “most likely” word at every step to be a metric of similarity. Humans are generally not trying to maximize their works’ similarity to other people’s works. So when a creator is inspired by another creator’s work, we don’t automatically treat that as an infringement.

    But even though comparing human behavior to LLM behavior is wrong, I’ll give you an example to consider. Imagine that you write a story “in the style of GRRM”. GRRM reads this and thinks that some of the similarities are a violation of his copyright so he sues you. So far it hasn’t been determined that you’ve done something wrong. But you go to court and say the following:

    • You pirated the entirety of GRRM’s works.
    • You studied them only to gain the ability to replicate patterns in your own work. You have no other use for them, not even personal satisfaction gained from reading them.
    • You clarify that replicating the patterns is achieved by literally choosing your every word to be the one that you determined GRRM would most likely use next.
    • And just to be clear you don’t know who GRRM is or what he talks like. Your understanding of what word he would most likely use is based solely on the pirated works.
    • You had no original input of your own.

    How do you think the courts would view any similarities between your works? You basically confessed that anything that looks like a copy is definitely a copy. Are these characters with similar names and descriptions to GRRM’s characters just a coincidence? Of course not, you just explained that you chose those names specifically because they appear in GRRM’s works.