DataMapper: Configuration to minimize false positives

"How do I configure DataMapper's settings to minimize false positives?"


Answer: 

High-risk definition: To classify a document as high risk, DataMapper looks for high-risk numbers. For more accurate detection, choose the country format(s) that DataMapper should use to recognize these numbers.
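The exact detection rules DataMapper uses are not described in this article. As a rough illustration of why choosing country formats helps, the Python sketch below applies only the patterns for the formats a user has selected. It also adds a simple validity check to discard look-alike numbers: a plausible DDMMYY date for CPR numbers and the Luhn checksum for card numbers. The format names, the simplified regular expressions, and the `scan` function are hypothetical. They are not DataMapper's API or its actual rules.

```python
import re
from datetime import date
from typing import Callable, Optional

# Hypothetical, simplified patterns for illustration only.
# These are NOT DataMapper's actual detection rules.


def valid_cpr_date(candidate: str) -> bool:
    """Reject CPR-like numbers whose first six digits are not a plausible DDMMYY date."""
    digits = candidate.replace("-", "")
    day, month, yy = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
    try:
        # The century is not known from DDMMYY alone; 2000 is a placeholder.
        date(2000 + yy, month, day)
    except ValueError:
        return False
    return True


def luhn_ok(candidate: str) -> bool:
    """Standard Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# format name -> (regex, optional extra validator)
PATTERNS: dict[str, tuple[re.Pattern, Optional[Callable[[str], bool]]]] = {
    # Danish CPR: DDMMYY-SSSS, hyphen optional
    "DK_CPR": (re.compile(r"\b\d{6}-?\d{4}\b"), valid_cpr_date),
    # UK NINO (simplified): two prefix letters, six digits, suffix A-D
    "UK_NINO": (
        re.compile(r"\b[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z] ?\d{2} ?\d{2} ?\d{2} ?[A-D]\b"),
        None,
    ),
    # Payment card: 13-19 digits, optionally separated by spaces or hyphens
    "CARD": (re.compile(r"\b(?:\d[ -]?){12,18}\d\b"), luhn_ok),
}


def scan(text: str, selected_formats: list[str]) -> list[tuple[str, str]]:
    """Return (format, match) pairs, using only the formats the user selected."""
    findings = []
    for name in selected_formats:
        pattern, validator = PATTERNS[name]
        for match in pattern.finditer(text):
            value = match.group(0)
            if validator is None or validator(value):
                findings.append((name, value))
    return findings


if __name__ == "__main__":
    sample = ("Order 1234567890, CPR 010190-1234, "
              "card 4111 1111 1111 1111, NINO AB 12 34 56 C")
    print(scan(sample, ["UK_NINO"]))
    # [('UK_NINO', 'AB 12 34 56 C')]
    print(scan(sample, ["DK_CPR", "UK_NINO", "CARD"]))
    # [('DK_CPR', '010190-1234'), ('UK_NINO', 'AB 12 34 56 C'), ('CARD', '4111 1111 1111 1111')]
```

In this sketch, selecting only UK_NINO returns just the NINO match. Even with DK_CPR selected, the 10-digit order number is rejected because "123456" is not a valid date.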


Example: In DataMapper, you can filter findings by number type using the filters on the Risk Documents page. In this example, we are scanning for passport numbers (DK and UK), National Insurance numbers (NINO, UK), driver's license numbers, CPR numbers (DK), and bank card information (credit card numbers).

From here, you can review each file's results using Preview or Go to folder.

The Preview window will pop up as shown below:


Enhancing Accuracy of High-Risk Findings with DataMapper 

At Safe Online, we understand that companies work with diverse files and formats in many different countries. To improve accuracy in identifying high-risk findings and enhance the user experience, we offer a customized solution. Here is how the process works: 

  1. Initial Rule-Based Scanning:
  • We begin by performing an initial scan using our rule-based models. These models are designed to identify sensitive data by recognizing specific patterns and keywords relevant to high-risk data, such as NINO numbers in the UK.
  2. Review and Analysis:
  • Once the initial scan is complete, we collaborate with the customer to review the results. This step involves identifying any false positives (incorrectly flagged data) and any sensitive data that may have been missed.
  3. Customized Training with LLM:
  • Using the data from the review session, we perform additional training on our Large Language Model (LLM) with the customer’s specific data. This customized training helps our LLM to better understand the context and reduce false positives in future scans.
  4. Re-Scanning for Improved Accuracy:
  • After the LLM has been trained with the customized data, we re-scan the data. This step ensures that the findings are more accurate, with a significant reduction in false positives and missed sensitive data.
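One way to make the review step measurable, and to check whether a re-scan actually reduced false positives and missed findings, is to compare the scan's findings with the items the reviewer confirmed as sensitive. The Python sketch below does this with plain set operations. It reports precision (the share of flags that were correct) and recall (the share of sensitive items that were found). `ReviewSummary`, `summarize_review`, and the item IDs are hypothetical. This is not Safe Online's tooling, and it does not model how the LLM training itself works.

```python
from dataclasses import dataclass

# Hypothetical helper for illustration; not Safe Online's review tooling.


@dataclass
class ReviewSummary:
    true_positives: int   # flagged by the scan and confirmed as sensitive
    false_positives: int  # flagged by the scan but not sensitive
    missed: int           # sensitive, but not flagged by the scan

    @property
    def precision(self) -> float:
        """Share of flagged items that were actually sensitive."""
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged if flagged else 1.0

    @property
    def recall(self) -> float:
        """Share of sensitive items that the scan found."""
        actual = self.true_positives + self.missed
        return self.true_positives / actual if actual else 1.0


def summarize_review(flagged: set[str], confirmed: set[str]) -> ReviewSummary:
    """Compare scan findings with the items the reviewer confirmed as sensitive."""
    return ReviewSummary(
        true_positives=len(flagged & confirmed),
        false_positives=len(flagged - confirmed),
        missed=len(confirmed - flagged),
    )


if __name__ == "__main__":
    # Toy example: the item IDs are made up for illustration.
    flagged = {"doc1:cpr", "doc2:cpr", "doc3:card", "doc4:nino"}
    confirmed = {"doc1:cpr", "doc3:card", "doc5:passport"}
    summary = summarize_review(flagged, confirmed)
    print(summary.false_positives, summary.missed)  # 2 1
    print(f"precision={summary.precision:.2f} recall={summary.recall:.2f}")
    # precision=0.50 recall=0.67
```

Running the same comparison on the initial scan and on the re-scan gives a simple before-and-after measure of how much the false positives and missed findings changed.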

Benefits of This Process: 

  • Increased Accuracy: Tailored scanning and training lead to more precise identification of high-risk data. 
  • Reduced False Positives: Enhanced LLM processing minimizes incorrect flags, saving time and resources. 
  • Improved User Experience: More accurate results mean a smoother, more efficient data management process for users. 
  • Continuous Improvement: The system becomes increasingly accurate over time as more data is processed and the LLM is further trained. 

By leveraging the power of DataMapper and our advanced AIM engine, Safe Online ensures that your company's sensitive data is identified accurately and efficiently, enhancing both security and user satisfaction. 



Do you have any more questions? Please reach out so we can help! Write to us at support@safeonline.dk or use the chat button in the lower-left corner.
