The internet we know today is not the internet we knew 20-25 years ago. It is a much, much larger space, with so much data and knowledge that it would be impossible for any of us to read everything it has to offer in our lifetimes. But all that data and knowledge also makes it a perfect resource for training AI.
In many ways, training AI is similar to training a person: the more knowledge you feed them, the better they become at performing their tasks.
Now, not everyone is comfortable with the idea of AI scraping data from their websites for training, which is why Google has announced that it will give web publishers a choice over whether data from their sites can be used to train AI models such as Google Bard.
“Today we’re announcing Google-Extended, a new control that web publishers can use to manage whether their sites help improve Bard and Vertex AI generative APIs, including future generations of models that power those products. By using Google-Extended to control access to content on a site, a website administrator can choose whether to help these AI models become more accurate and capable over time.”
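In practice, Google's developer documentation describes Google-Extended as a robots.txt user-agent token rather than a separate crawler, so publishers opt out the same way they manage other crawler access. The snippet below is a minimal sketch under that assumption, using Python's standard urllib.robotparser and a hypothetical example.com site to show how such a rule would block the Google-Extended token while leaving normal Search crawling untouched; the exact directives are illustrative and not spelled out in the announcement quoted above.

```python
# A minimal sketch of how the Google-Extended control is expected to work,
# assuming (per Google's developer documentation) that it is a robots.txt
# user-agent token rather than a separate crawler. The site and paths here
# are purely illustrative.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Opt this site out of Bard / Vertex AI training while leaving Search crawling alone.
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The Google-Extended token is denied everything; a regular crawler is not.
print(parser.can_fetch("Google-Extended", "https://example.com/article"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/article"))        # True
```

Because the control lives in robots.txt, a site administrator can also scope it more narrowly, for example disallowing only certain directories instead of the whole site.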
What’s interesting is that back in July, a report from Gizmodo revealed an update to Google’s privacy policy in which the company essentially said it reserves the right to scrape all publicly available information to build its AI tools. Not long after, Google made a public call to create a standard that would give publishers more choice and control over whether their data could be used for AI training.
Since that new standard has yet to be created, the Google-Extended tool will be Google’s answer to the problem for the time being.
Source: Google