Ensuring data privacy and protecting intellectual property (IP) become major concerns when working with public large language models (LLMs). The harsh truth about public LLMs and the companies behind them, such as OpenAI, is that they have copied or, should I say, stolen much of the world's data from the people who spent decades creating it. Can you trust these LLM companies? I don't. Never give them your personal information, and keep your enterprise data away from these LLMs.
Here are some thoughts on how we can protect our enterprise data and IP from LLMs while still using GenAI for our own purposes:
a. No PII or Sensitive Data in Prompts: Sanitize inputs to remove personally identifiable information (PII) and confidential data before sending them to any LLM.
b. Prompt Redaction and Logging
c. Compliance with Regulations
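To make points a and b more concrete, here is a minimal Python sketch of sanitizing a prompt before it leaves your network and building a log entry that holds only the redacted text. Everything in it is an assumption for illustration: the names `PII_PATTERNS`, `redact`, and `audit_record` are hypothetical, and the three regular expressions cover only simple email, US-style SSN, and US-style phone formats. Regexes like these will not catch names, addresses, or free-form confidential text, so treat this as a starting point rather than a complete filter.

```python
import re

# Hypothetical, deliberately small pattern set (an assumption for this sketch).
# Real deployments need much broader coverage than three regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),
}


def redact(prompt):
    """Replace each pattern match with a [LABEL] placeholder.

    Returns the redacted prompt and a dict counting redactions per label.
    """
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        prompt, n = pattern.subn(f"[{label}]", prompt)
        if n:
            counts[label] = n
    return prompt, counts


def audit_record(redacted_prompt, counts):
    """Build a log entry that contains only the redacted text, never the original."""
    return {"prompt": redacted_prompt, "redactions": counts}


if __name__ == "__main__":
    raw = "Contact Jane at jane.doe@example.com or 555-123-4567, SSN 123-45-6789."
    safe, counts = redact(raw)
    print(safe)
    print(audit_record(safe, counts))
```

Note that the name "Jane" passes through untouched in this example, which is exactly why a pattern-based filter should be combined with user training and, where possible, a dedicated PII detection tool.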
a. Model Licensing Awareness
Understand licensing terms for both open-source and commercial LLMs. Some restrict use in commercial or derivative applications.
b. Avoid Data Leakage to Model Providers
c. Prevent IP Exposure
The best thing you can do is to train your developers, analysts, and business users on safe prompt engineering.