AI tools that use data prompts and user data to train models

1. AI tools that use data prompts and user data to train models

1 Recommend
Nick Marchese
Posted 01-29-2024 09:49 AM

Hey all,

Happy Monday! I've been trying to gauge the community's response to a "stance" I'm considering taking when it comes to recommending/purchasing AI tools that openly say they use your data and prompts (albeit de-identified) to train and improve their machine learning models. The few I've come across lately are the Turnitin AI Detection tool and the KnowBe4 PhishER/PhishML tool. ChatGPT also says it trains its models on your inputs, but it offers a very simple opt-out, while the other two do not. In contrast, there are tools like Adobe CC Firefly that explicitly say they do not train on your content.

So, here are the questions I have for the community:

Is anyone else as concerned as I am about purchasing or endorsing services that actively train on user data, from employees and students alike?

Is anyone "taking a stand" on this front and willing to chat?

Should I just not be concerned because I can't stop this tidal wave from coming?

Thanks for your thoughts!

#TeachingandLearning
#CybersafetyandDataSecurity

------------------------------
Nick Marchese
Emma Willard School
Troy NY
------------------------------
2. RE: AI tools that use data prompts and user data to train models

1 Recommend
Brent Halsey
Posted 01-30-2024 07:41 AM

Hi Nick,

We have a similar stance and lack clear guardrails for our community. As we are engaged, we have advised against using tools that train on our data. We are working towards an AI governance group comprised of various faculty and staff to help form guardrails going forward, but we see the same tidal wave coming, and keeping up with what each tool is doing may be a losing battle. I'd be interested in any ongoing conversations about this topic.

Thank you,

------------------------------
Brent Halsey
Columbus Academy
Gahanna OH
------------------------------

3. RE: AI tools that use data prompts and user data to train models

1 Recommend
Stacy Hawthorne
Posted 01-31-2024 01:05 PM

This is a great conversation and one that may eventually be decided by the FTC. Newly proposed changes to privacy laws would actually prohibit edtech software providers from using data collected for one purpose to train other, even similar, products. You can read more at "FTC Proposes Strengthening Children's Privacy Rule to Further Limit Companies' Ability to Monetize Children's Data" or comment on the proposed changes before March 11, 2024.

------------------------------
Stacy Hawthorne, Ed.D., CETL
Chief Academic Officer
Learn21
------------------------------

4. RE: AI tools that use data prompts and user data to train models

1 Recommend
Vinnie Vrotny
Posted 01-30-2024 09:49 AM

Nick -

You are spot on and are being an excellent steward for your school.

As for Turnitin's policy, this has always been their stated position: they use submissions to train their tools, both the integrity/plagiarism detector and their supposed AI detection tool.

Data governance is something we should all be concerned with, given the increasing requirements from many states and the growing use of AI. It is better to get ahead of the curve now than to play catch-up later.

------------------------------
Vinnie Vrotny
The Kinkaid School
Houston TX
vinnie.vrotny@kinkaid.org
------------------------------

5. RE: AI tools that use data prompts and user data to train models

0 Recommend
Nick Marchese
Posted 01-30-2024 11:02 AM

Thank you @Brent Halsey and @Vinnie Vrotny! I'm so glad I'm not alone! I was getting worried! Do either of you have anything formal (or even casual) written up about this that you would be willing to share or collaborate on?

------------------------------
Nick Marchese
Emma Willard School
Troy NY
------------------------------

6. RE: AI tools that use data prompts and user data to train models

2 Recommend
Hudson Harper
Posted 01-30-2024 07:04 PM
Edited by Hudson Harper 01-30-2024 07:06 PM

Hey @Nick Marchese, I agree with @Brent Halsey and @Vinnie Vrotny that data and AI governance are super important. There are still a lot of unintended and frankly unknown consequences to allowing large language models to be trained on sensitive or sensitive-adjacent data.

We have a formal AI policy that applies to the whole school, which I'm attaching to this post. While I think it addresses what you're mentioning, Nick, I don't think it fully sets out all the guidelines we've internally set for ourselves when it comes to using AI for analyzing or manipulating data.

In our internal PD, what we've discussed are the following:

1) Any AI tool has to pass the same vetting we use for any of our tools. As a recent leak of user credentials via ChatGPT indicates, our AI tools need to meet the same basic cybersecurity guidelines as all of our other systems/platforms. This means ensuring compliance with things like SOC2, GDPR, CCPA, etc.

2) For now, our internal policy is to not use AI tools that train on our data or allow for things like human review/reinforcement when it comes to sensitive data. This includes de-identified/redacted data as there are examples of data being pieced back together by LLMs with enough context. We haven't gotten this far institutionally, but using synthetic data is a way to get around this issue and is perhaps something to look for in the way companies are training on data.

3) Again, we're not at the level of deciding to invest in enterprise features like OpenAI's Team or Enterprise plans (nor do we necessarily have the budget for it), but in my opinion that feels like a baseline before even considering entering sensitive data into an AI product. Either that or only using AI at the API level, which is not going to be accessible to most.

4) We still have ethical dilemmas around allowing models to train on student work, because of concerns that we might be giving away our students' voices, ideas, and creativity to an AI model. We also don't want to inadvertently reinforce biases present in these models.

5) There's been some discussion of open-source LLMs like Llama or Mistral, but there are questions about reliability, bias, deployment, logging, access control, etc.
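
As a rough illustration, points 1 and 2 above could be encoded as a small checklist script. This is only a sketch under stated assumptions: the `AITool` fields, the `REQUIRED_COMPLIANCE` set, and the `vet` function are hypothetical names invented for this example, not part of any school's policy or any vendor's API, and the answers for each field would still have to come from reading the vendor's terms.

```python
from dataclasses import dataclass, field

# Hypothetical baseline, taken from the frameworks named in point 1.
REQUIRED_COMPLIANCE = {"SOC2", "GDPR", "CCPA"}


@dataclass
class AITool:
    """Hypothetical record of what a vendor's terms say about a tool."""
    name: str
    compliance: set = field(default_factory=set)
    # Assume the worst until the vendor's terms say otherwise.
    trains_on_customer_data: bool = True
    allows_human_review: bool = True


def vet(tool: AITool, handles_sensitive_data: bool) -> list:
    """Return a list of concerns; an empty list means no flags were raised."""
    concerns = []

    # Point 1: same baseline vetting as any other system or platform.
    missing = REQUIRED_COMPLIANCE - tool.compliance
    if missing:
        concerns.append("missing compliance: " + ", ".join(sorted(missing)))

    # Point 2: no training or human review where sensitive data is involved
    # (this includes de-identified or redacted data).
    if handles_sensitive_data:
        if tool.trains_on_customer_data:
            concerns.append("trains on customer data")
        if tool.allows_human_review:
            concerns.append("allows human review of submitted data")

    return concerns


if __name__ == "__main__":
    example = AITool(
        name="ExampleTool",  # made-up tool, not a real product
        compliance={"SOC2", "GDPR", "CCPA"},
        trains_on_customer_data=True,
        allows_human_review=False,
    )
    for concern in vet(example, handles_sensitive_data=True):
        print(f"{example.name}: {concern}")
```

A tool that comes back with an empty list has only cleared this automated screen; it would still need the regular review described in point 1.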

We'll continue to iterate, and we're very much interested in what other schools are doing, so I'd love to collaborate with you/anyone else in the ATLIS community on this issue.

------------------------------
Hudson Harper
The Downtown School
Seattle WA
------------------------------

Attachment(s)

DTS AI Policy_9.22.23.pdf 51 KB 1 version
