Protecting Your Content From AI: A Contrarian View


There has been a flurry of panicked posts about protecting your content from AI. There have been lawsuits, probes, and new software that prevents services like ChatGPT from accessing your content and absorbing it into large language models. Within 14 days of the availability of code that can prevent AI data scraping, nearly 20% of the top 1,000 websites in the world began using it.

What should you and your business do? Should you keep AI away?

My advice today might seem counterintuitive. Maybe when AI comes to suck up your content, you should say, “suck away.” Actually, we need to come up with a better phrase than that. But you know what I mean.

Let’s pause, take a deep breath, and rationally examine the issue of protecting your content from AI in the context of your future business success.

Acknowledging complexity

First, I must acknowledge that this is an insanely complex and evolving issue. The legal, ethical, and economic considerations for large enterprises, newspapers, movie studios, and other media companies are unique.

When it comes to protecting your content from AI, any individual artist, author, or other creator may disagree with me, and I honor their right to make their own decisions.

My post today specifically aims at content creators, entrepreneurs, and businesses trying to rise above the noise and achieve business benefits from their content marketing.

The bottom line is, I believe that more business benefits will accrue to you by NOT protecting your content from AI, even if it is copyrighted. To understand why, let’s begin by reviewing an important content marketing philosophy …

Unleash your content

Here is a fundamental truth: The economic value of content that is not seen and shared is zero.

Chances are you’re working hard to create amazing content. You post on social media and engage with fans to build your audience. All good. Now, your job is to get that content to move through your audience and beyond, and that means focusing on content transmission. (This strategy was the subject of my book The Content Code.)

I’ve been against gated content, and the ridiculous notion that you shouldn’t publish on “rented land.” Of course you should. My view is, publish your content everywhere your audience could possibly find it, consume it, and share it! Unleash your content!

The first consideration: If you protect your content from AI — a technology that is becoming the foundation of search and content discovery — and your competitors don’t, will you be better off? Probably not.

An old dilemma

The argument about protecting your content from AI is strangely familiar. This is the same debate we had in the early days of content marketing: “What??? You want me to give away my content and best ideas for free?”

Yes, we all had to do that because if we didn’t provide free and helpful content, the competitor down the street would. Their content would be highlighted by search, discovered, and shared … and we would lose.

Publishing free content was a radical idea. Before the internet, many businesses made money from their protected content. Research firms built profitable businesses by selling original reports for hundreds of thousands of dollars. That business model is nearly obsolete now. For better or for worse, information flows freely on the web. Once you publish anything, anywhere, it will probably find its way to the open waters of the web.

Let’s get specific about what’s happening to copyrighted content today, with or without AI. I put tremendous effort into my books, and making money from a business book is not easy! Every month, I find some nefarious group that is selling illegally digitized versions of my books. There are even sites out there selling my blog posts as aids in writing student term papers.

For a while, I tried to fight back. But it’s like that arcade game Whac-A-Mole. Every time I take a whack, another illegal site pops up somewhere else. If people really want to access and spread your content, there is no recourse and no stopping it.

So, even if you create a wall around your content, it will probably seep into the AI machine anyway. If you use software defense against AI, what would keep somebody from cutting and pasting it manually into an LLM?
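To make the “software defense” concrete, here is a minimal sketch. It assumes the blocking code in question is a robots.txt rule aimed at an AI crawler such as OpenAI’s GPTBot; the post itself does not name the mechanism, so treat that as my assumption. The robots.txt content, the `is_allowed` helper, and the example.com URL are all hypothetical. The sketch uses Python’s standard library to show which crawlers such a rule turns away.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask one AI crawler to stay out, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative URL only; any page on the site is treated the same way.
post_url = "https://example.com/blog/post"

ai_crawler_allowed = is_allowed(ROBOTS_TXT, "GPTBot", post_url)
search_crawler_allowed = is_allowed(ROBOTS_TXT, "Googlebot", post_url)

print("GPTBot allowed:", ai_crawler_allowed)
print("Googlebot allowed:", search_crawler_allowed)
```

Note that robots.txt is a request, not a lock. Well-behaved crawlers honor it, but it does nothing about a person who copies a page and pastes it into a chatbot, which is the point of the paragraph above.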

Let’s put the issue of attribution aside for a moment. If you’re not freaked out by Google using your content for free, why are you freaked out about AI using it?

My first business from AI

A few months ago, I reported getting my first consulting contract from ChatGPT.

A new client found me by searching for “top 10 marketing experts.” I tried this myself, and the list would shuffle on each query, but I was usually in the top 10. Friends tried this in Europe, and the same names came up.

Let’s be honest. Am I one of the top 10 marketing experts in the world? No, I’m not. I could easily name 10 people in my circle of immediate friends who are smarter than me!

How did I make that AI-generated list? It’s the same way I show up on “best-of” blog lists and Google search results — I’ve had the tenacity and courage to put my content into the world with fierce consistency for 15 years.

AI is the future of search. Google calls its version Search Generative Experience (SGE), and it is already being incorporated into Google search.

My new client found me because I am present on the web, and now I’m present on AI. I believe that will serve me well as search evolves.

The cost of invisibility

Beyond revenue, there is an implication for impact and influence.

One of the organizations fighting AI content practices is The New York Times. This news organization is arguably the newspaper of record in America and one of the most important news sources in the world. As more students and researchers turn to ChatGPT and other platforms for knowledge and research, is it in the best interest of The New York Times to be unaccounted for?

If you’re protecting your content from AI, you’re no longer part of the public conversation, at least as it is represented on ChatGPT and other AI platforms. Your view is invisible. What do you risk when you and your business are unaccounted for?

My smart friend Aleksandra Pimenides recently commented in our RISE marketing community:

“AI is an important source of knowledge transmission. Teachers take something and pass it on to their students. Libraries have books for people to read and learn. Likewise, LLMs act as an intermediary of transmission. Do Newton’s descendants get paid every time a student is taught the principle of gravity? Do libraries get fined when people go there to read and learn about subjects for free? To what extent should information and knowledge be monetized? Maybe there’s a distinction to be made between knowledge and information?”

A view of the true risk

I think much of the anxiety on this subject comes from an image of some AI bot cutting and pasting your unique content without attribution. That’s not exactly how it works.

Here is an explanation from Benedict Evans, which appeared in his wonderful newsletter (edited slightly for style):

“LLMs are not databases. They deduce or infer patterns in language by seeing vast quantities of text created by people — we write things that contain logic and structure, and LLMs look at that and infer patterns from it, but they don’t keep it. So ChatGPT might have looked at thousands of stories from The New York Times, but it hasn’t kept them. Moreover, those stories themselves are just a fraction of a fraction of a percent of all the training data. The purpose is not for the LLM to know the content of any given story or any given novel — the purpose is for it to see the patterns in the output of collective human intelligence.

“This is not Napster. OpenAI hasn’t ‘pirated’ your book or your story and it isn’t handing it out for free. In Tim O’Reilly’s great phrase, data isn’t oil; data is sand. It’s only valuable in the aggregate of billions and your novel is just one grain of dust in the Great Pyramid. This isn’t supposed to be an oracle or a database. It’s supposed to be inferring ‘intelligence’ from seeing as much of how people talk (as a proxy for how they think) as possible.

“If this is, at a minimum, a foundational new technology of the next decade, and it relies on all of us collectively acting as mechanical turks to feed it, do we all get paid, or do we collectively withdraw? It seems somehow unsatisfactory to argue that “this is worth a trillion dollars, and relies on using your work, but your own individual work is only 0.0001% so you get nothing.” Is it adequate or even correct to call this ‘fair use?’ Does it matter, in either direction? Do we change our definition of fair use?”

In the United States, copyright is limited by the doctrine of “fair use,” under which certain uses of copyrighted material for criticism, commentary, news reporting, teaching, scholarship, or research may be considered fair.

As an example, I took a snippet from Benedict’s copyrighted newsletter, provided proper attribution, and used it today to teach. That’s fair use.

Here’s the problem with AI. Think of your copyrighted content as a lovely cake that you baked. It is your original and distinctive work. But inside AI, your work isn’t a cake. It’s an ingredient put into a blender to make a new cake. What’s fair use in that environment?

I dabble in watercolor painting. Seeking credit in an AI model is similar to the maker of my paints wanting attribution credit for this painting:

[Image: watercolor painting]

Even if I used one unique type of paint patented by a supplier, would I give them credit for the painting? No. I actually sold this painting. Should I give part of the revenue to Arches, the company that supplied the paper? I literally could not have made this without the paper and paint, yet it is my original work, period.

Attribution

“Originality is nothing but judicious imitation.” – Voltaire

I think most of the “protecting your content from AI” conversation would disappear if we were assured we get credit for our work, in the case where credit might be important — like a meaningful, original idea. After all, we’re OK with Google scraping our content if we get credit for it in search results, right?

Let’s go back to the current state of the internet for a reality check.

In 2014, I wrote one of the most famous blog posts in marketing history, “Content Shock.” This is not idle bragging. The numbers back it up. “Content Shock” — a phrase I coined — has shown up in books, speeches, conferences, college classes, and millions of pieces of content. If you Google the term, there are 610 million results, like these:

[Image: search results for “Content Shock”]

Writing a bold post like this did its job. It helped establish thought leadership and provided thousands of links to my original article.

However.

I assure you that I have not received 610 million links back to my site! Even if I received a million links, that would mean I have attribution on just 0.16% of all references to my original idea.
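The arithmetic is easy to check. A quick sketch, using the 610 million search results and the hypothetical one million links from the paragraph above (the function name is mine, for illustration only):

```python
def attribution_share(links: int, references: int) -> float:
    """Fraction of references that carry a link back to the original."""
    return links / references


total_results = 610_000_000      # search results for the term, as cited above
hypothetical_links = 1_000_000   # the generous "even if" figure, not a real count

share = attribution_share(hypothetical_links, total_results)
print(f"Attributed share: {share:.2%}")
print(f"Roughly one credited reference in {round(1 / share)}")
```

One million links against 610 million results works out to about 0.16 percent, or roughly one credited reference in every 610.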

Clearly, people are using and abusing my work without attribution. Does this mean I should block Google from accessing my post? Of course not.

As Tim O’Reilly said, data is sand that is only valuable when aggregated into something bigger. My blog post is a grain of sand in the content economy. If you want to be part of that economy, you must put pride aside.

No matter how protective I might feel about my intellectual property, it’s sand. And even if I am credited, who reads the footnotes?

In any event, I think the problem of attribution will go away. It’s already happening. There are academic AI sites and writing assistants that allow you to search with references. I use an AI-powered tool through BuzzSumo that creates writing briefs with legitimate and relevant references. Very helpful, and it leads me to smart new content I can quote with attribution.

The option to see the original sources behind an answer will eventually become common across platforms.

Conclusion

Comparing how content works on the web today with how it works inside LLMs and AI search leads to a rational conclusion, at least for most businesses: allow AI bots to scrape content from our sites. AI will be a major component of search going forward.

This is a complex and evolving issue, but I believe that regulations and best practices will favor creators who allow their content to be used in LLMs over time. The attribution problem will likely be solved on many platforms and regulations will adjust to a new framing of “fair use.”

Having an effective presence within AI models and AI search utilities could result in business benefits that outweigh the risk of your copyrighted content being misused.

I’ll say once again that this is a complex issue but for most businesses, I think it makes sense to be part of the machine.

Mark Schaefer is the executive director of Schaefer Marketing Solutions. He is the author of some of the world’s bestselling marketing books and is an acclaimed keynote speaker, college educator, and business consultant. The Marketing Companion podcast is among the top business podcasts in the world. Contact Mark to have him speak at your company event or conference soon.

Follow Mark on Twitter, LinkedIn, YouTube, and Instagram.

Illustration courtesy MidJourney

 
