On February 6, Stability AI, the company behind the pioneering AI art model Stable Diffusion, was sued by Getty Images, one of the world's largest providers of images and videos, for "flagrant infringement of Getty Images' intellectual property." According to Getty, Stability AI copied more than 12 million images from its database without consent, violating both copyright and trademark law.
The training data for AI art tools, which require enormous quantities of images and artwork, is frequently scraped from websites without the owners' permission. The technology set off a wave of backlash from artists who felt their creative work was being openly taken. Ordinary users face privacy risks as well: social media profile photos can end up in training data, and "fake" images of a person may one day surface somewhere unexpected, without their knowledge.
What are AI art generators’ data sources?
One of the biggest frustrations with text-to-image AI models is that they feel like a black box. We know they were trained on images from the web, but which ones? The obvious question for a photographer or artist is whether their work was used to train the model, and this is remarkably difficult to determine.
Sometimes, the data isn’t available at all: OpenAI has said it trained DALL-E 2 on hundreds of millions of captioned images but hasn’t released its proprietary data. By comparison, the Stable Diffusion team has been open about how its model was built. Stable Diffusion's popularity has skyrocketed since it was made available to the general public last week, thanks in large part to its free and permissive license; the model is already included in the newest Midjourney beta, NightCafe, and Stability AI's own DreamStudio app, and it can also run on your own computer.
LAION, a nonprofit whose compute time was largely funded by Stable Diffusion’s owner, Stability AI, is one of the most popular sources of training data for AI art generators such as Stable Diffusion. All of LAION’s image datasets are built on Common Crawl, a nonprofit that scrapes billions of webpages monthly and releases the results as massive datasets. LAION collected every HTML image tag that carried an alt-text attribute, classified the resulting 5 billion image-text pairs by language, and then filtered them into separate datasets by resolution, predicted likelihood of containing a watermark, and predicted “aesthetic” score (i.e., subjective visual quality).
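The pipeline described above can be sketched in a few lines of Python. This is a simplified illustration, not LAION's actual code: the parser collects (src, alt) pairs the way LAION keeps only images with alt-text captions, and `filter_pairs` is a hypothetical stand-in for the later filtering stages, with `metadata`, its field names, and all thresholds invented for the example (in the real pipeline, watermark and aesthetic scores come from trained classifiers).

```python
from html.parser import HTMLParser

class ImageAltCollector(HTMLParser):
    """Collect (src, alt) pairs from <img> tags, mimicking LAION's first
    step: keep only images whose alt-text can serve as a caption."""
    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        # Images without a non-empty alt attribute are discarded.
        if a.get("src") and a.get("alt", "").strip():
            self.pairs.append((a["src"], a["alt"].strip()))

def filter_pairs(pairs, metadata, min_side=256, max_watermark_p=0.8, min_aesthetic=4.5):
    """Hypothetical later stage: keep pairs that pass resolution,
    watermark-probability, and aesthetic-score thresholds.
    `metadata` maps src -> {width, height, p_watermark, aesthetic};
    all names and thresholds here are illustrative, not LAION's."""
    kept = []
    for src, alt in pairs:
        m = metadata.get(src)
        if m is None:
            continue
        if min(m["width"], m["height"]) < min_side:
            continue
        if m["p_watermark"] > max_watermark_p:
            continue
        if m["aesthetic"] < min_aesthetic:
            continue
        kept.append((src, alt))
    return kept

html = """
<html><body>
<img src="a.jpg" alt="A mountain at sunset">
<img src="b.jpg">
<img src="c.jpg" alt="Stock photo">
</body></html>
"""
parser = ImageAltCollector()
parser.feed(html)

# Invented scores: c.jpg is predicted to be watermarked, so it is dropped.
metadata = {
    "a.jpg": {"width": 512, "height": 512, "p_watermark": 0.1, "aesthetic": 6.0},
    "c.jpg": {"width": 512, "height": 512, "p_watermark": 0.95, "aesthetic": 5.0},
}
print(filter_pairs(parser.pairs, metadata))  # [('a.jpg', 'A mountain at sunset')]
```

Note that `b.jpg` never even reaches the filter: with no alt-text, there is no caption to pair it with, which is exactly why alt attributes are central to this kind of dataset.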
Why is this a problem?
Although privacy regulations such as the General Data Protection Regulation (GDPR) exist, no public evidence has been found that LAION filters its data to comply with them. This raises doubts about the transparency and ethics of the datasets LAION provides, which are used to train many AI art generators.
Although there has been much debate about the law's effectiveness, there is no denying that it is one of the strictest data protection regulations in the world. In contrast to HIPAA and other US data protection laws, GDPR mandates that businesses use the highest privacy settings by default and limits data processing to six lawful bases, including consent, vital interests, and legal obligation.
The case of Stable Diffusion
When Stable Diffusion was introduced in 2022, it rapidly gained popularity online. It set itself apart from other AI text-to-image tools like DALL-E 2 and Midjourney by being open-source and free. But as soon as the program caught on, users began digging into how it was built.
We know the captioned images used to train Stable Diffusion, drawn from the LAION dataset, were scraped from the web, but from where? About 47% of the images came from just 100 domains, with Pinterest hosting the most: more than a million images, or 8.5% of the sample, were taken directly from Pinterest's pinimg.com.
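A domain breakdown like the one above can be reproduced with a simple tally over the dataset's image URLs. This is a toy sketch of the method, run on four invented URLs rather than the millions in the real analysis, and `top_domains` is a name chosen for this example:

```python
from collections import Counter
from urllib.parse import urlparse

def top_domains(urls, n=3):
    """Count which domains host the images in a dataset sample and
    report each domain's share of the total as a percentage."""
    counts = Counter(urlparse(u).netloc for u in urls)
    total = sum(counts.values())
    return [(dom, cnt, round(100 * cnt / total, 1))
            for dom, cnt in counts.most_common(n)]

# Toy sample; the real analyses ran over roughly 12 million LAION URLs.
sample = [
    "https://i.pinimg.com/736x/a1.jpg",
    "https://i.pinimg.com/736x/b2.jpg",
    "https://media.gettyimages.com/photos/c3.jpg",
    "https://example.com/d4.jpg",
]
print(top_domains(sample))
# [('i.pinimg.com', 2, 50.0), ('media.gettyimages.com', 1, 25.0), ('example.com', 1, 25.0)]
```

The same few lines, pointed at a full URL dump, are enough to surface findings like "47% of images come from 100 domains", which is why these datasets are relatively easy to audit once their URL lists are public.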
More than 15,000 Getty images were discovered in a recent analysis of 12 million images from the Stable Diffusion dataset, and a distorted Getty Images watermark frequently appears in the pictures the generator produces. Getty Images has accused Stability AI of "illegally copying and processing millions of images covered by copyright" in order to build its signature software.
This legal dispute raises significant questions about how Stable Diffusion, and AI art generators in general, handle image copyright.
Conclusion
One of the key challenges in the field of responsible AI is developing robust and comprehensive frameworks for ethical decision-making and risk management in AI development and deployment. This includes considerations such as fairness, privacy, transparency, accountability, and bias mitigation.
To address these challenges, a number of organizations, including governments, NGOs, and industry groups, have developed ethical principles and guidelines for AI development and deployment. These include frameworks such as the IEEE Global Initiative for Ethical Considerations in AI and Autonomous Systems, the European Commission's Ethics Guidelines for Trustworthy AI, and the AI Ethics Guidelines developed by the World Economic Forum.
However, despite these efforts, there is still much work to be done to ensure that AI is developed and deployed in a responsible and ethical manner. As AI continues to advance and become more integrated into our lives, it is important that we continue to focus on the ethical and societal implications of AI and work to address these challenges in a comprehensive and collaborative manner.
Pixta AI and the vision of responsible AI
We believe a better future will be built on better, more responsible AI development. PIXTA AI is proud to support AI development at industry leaders in automotive, manufacturing, retail, banking, research institutes, and beyond. We bring added value and strong partnerships to our clients through full compliance and the highest quality assurance. If you want to learn more or contribute to responsible AI, don't hesitate to contact us today!