What you need to know about the current state of user data protection in AI tools.
Generative AI can seem like literal magic at work. But magic is mysterious, and one of the biggest mysteries users are concerned with right now is "What happens to the data I put into these tools? Is it protected?" The short answer is "yes," but it wouldn't be absurd to believe otherwise.
The common thinking now is, "If it's on the internet, it's there forever." So (follow me here) if the models powering AI are largely trained on data from the internet…and you enter data into tools that live on the internet…it's not farfetched to assume that your data now belongs to the internet and could be used to train the very models you're employing (or used in some other capacity you don't want).
In January, a corporate lawyer at Amazon sought to prevent that hypothetical cycle by warning employees against entering company information into ChatGPT. The lawyer suspected that Amazon's information was being used to train the GPT model because certain outputs closely matched existing Amazon materials (though it's not clear whether those materials were already publicly available online). Other large companies went on to block employees from using ChatGPT to avoid this potential pitfall, and in March, OpenAI changed its policy around using customer inputs to train future model iterations.
Thanks to developments like those, user data protection is less of a concern in AI now than it was a few months ago. But even as the tools grow increasingly secure, companies and general users should still do their due diligence to ensure that their tool of choice is secure, according to Jasper's Director of Information Security John Bullough. In fact, he and his team break this once-puzzling challenge down into pretty simple principles in their work.
“Security is relatively the same no matter what you're doing,” said Bullough. “I've spent most of my career in banking and everyone sees those entities as having to be highly secure because we're dealing with money. And securing a financial institution involves the same principles that we apply when our customers enter their data into Jasper, for example.”
Data privacy has been a top-tier concern for internet users for a while, and the advent of generative AI shines a new light on the subject. Thankfully, Bullough took a break from guiding Jasper's data protection team to lend some valuable insight on the current state of security in AI as a whole. He also gave advice on what businesses and general users can do to better protect themselves when using AI tools, and on the role governments or other regulatory bodies might play in shaping the future of security in AI.
I've been at Jasper for a year and in security for almost 20 years in various roles. At Jasper, I lead the information security team, and that covers a lot of areas. We work with the engineering team to develop a secure product. We cover compliance and just got our SOC 2 Type 1 report a few weeks ago, which is a pretty big achievement. We also cover privacy, and we've been implementing tighter controls to be more holistically compliant with the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). There are always new regulations to follow, and we work on that stuff too.
I feel like AI, and Jasper specifically, are kind of unique in that we have all the security concerns of a regular SaaS company: access control, how we're storing data, how we're transmitting data, who we share data with, and privacy concerns around what we collect and what we do with data once we collect it.
We also have other overarching concerns about LLMs. They're inherently a bit of a black box. You put data in and it spits out an answer, but it's hard to understand exactly what's happening, even for very hardcore data scientists. The concerns I hear most from customers are around the data going into the model. There's been some research into whether the data used to train a model can be extracted from it again. As far as I know, there haven't been any large-scale demonstrations of that. Because of that, we feel the models are secure and we can work with them well.
But ultimately, I think the biggest concerns right now are some of the unknowns. We feel like we have a good handle on how to protect this stuff. But in any new field, there are always going to be questions like: Are we doing everything we should? Are the protections we've put in place enough?
Well, with Jasper for example, some time ago we entered agreements with all the third-party models we use stating that they cannot train on the data our customers input. Nothing that goes into the platform will ever be used to train any of those models, whether that be a customer company's documentation to enhance their brand's voice or a custom input typed directly into Jasper.
And that's partially to protect the data itself, but also to make customers feel comfortable using AI since it's such a new technology. It's the biggest concern we hear from customers and a major one for us as we do more research in this area, so we're addressing it right away.
“Having some smart regulation around what we're doing is wise and will also provide a level of legitimacy and comfort for people knowing that there are protections in place.”
I think long term we'll see something similar to the GDPR: a body of regulation that guides us in the specific controls we need around AI. While AI has seen tons of interest, there are also some people who are a little wary of jumping into the AI deep end. Having some smart regulation around what we're doing is wise and will also provide a level of legitimacy and comfort for people knowing that there are protections in place.
If you have a good security program and you want to be GDPR compliant, sometimes there's red tape you have to follow that might require changing some processes and taking very specific actions. But you're not wholesale changing how you do business. So hopefully, if we're approaching security correctly, even when these hypothetical regulations around AI come in, we won't have to wholesale change what we're doing — we'll just formalize some processes or comply with some specific requirements. For really small startups trying to get into the AI space, that's where regulation might impact innovation the most in the short term. But overall, I think it would be good, and things wouldn't have to change significantly.
The thing I talk to businesses about a lot is choosing an AI partner that is ready to help them protect data.
There are a lot of AI companies now, and whether they can protect user data correctly is unknown. I'm sure some of them do a great job. And some of them, maybe not. So the best thing is to find a partner that can help you properly secure any data that goes into the AI. You want to make sure the platform has a good security program and controls in place to protect any data you might input.
A few things. One is, be careful what you put in there, especially if you don't have an agreement with a company — if you're looking for a free version of AI, for example. We tell people not to enter personally identifiable information, or PII, into Jasper. While we have all these agreements and protections to safeguard that data, I think the best protection is just not putting PII there. We want people to feel comfortable putting proprietary information from their company there, but PII is too sensitive.
Generally speaking, once you enter the data, it's there forever. A company can kind of do whatever it wants with your data if you don't have an agreement with it. If you're just asking AI random questions or you need help writing a speech, it's probably fine — but be careful.
And this goes for everything on the internet, not just AI: you should also have some healthy skepticism about what comes out of AI. Facts are still kind of tricky for these models, so don't just believe everything they produce, especially if you're asking for specific factual information. Go verify it.
Sometimes I see people talk about this in the same way people spoke about the early internet like, “I don't trust it, so I'm not going anywhere near it.” I feel like that's too far. That's not a reason to avoid it. Have healthy skepticism but use it in a way where you can protect yourself and your information.
This technology is best when it functions as an assistant rather than a replacement for your brain. It's much better as something that helps you create than as something that creates for you, which also helps you get around some of the big pitfalls people are concerned with, like copyright issues. If you're altering outputs and using them as a starting point for your own creation, you'll protect yourself in the day-to-day use of widely available AI tools.
I also don't buy the idea that AI is going to take over the world. These large language models are incredible but when you use them, you also quickly learn what their limitations are. To say that there's any danger of this type of AI going beyond its bounds or taking over the world is just kind of silly. I like the jokes about it, but that's a bit unrealistic.