Ask the expert: how to auto-label your documents and data.

Imagine your organisation was contacted by a national auditor who wanted to do a spot check of business practices. The auditor would begin by requesting records that demonstrate you are complying with regulations. However, finding that data, and being able to retrieve it when you need it, can be very challenging.

Modern organisations collect and manage more data than ever before. To make sense of these vast swathes of digital records, a labelling policy tags various kinds of documents and applies rules to them. Labelling provides plenty of benefits:

It helps you find various kinds of records quickly
It means you apply the right access permissions to different kinds of file (e.g., your customers’ credit card details will be labelled as ‘sensitive’)
It stops you from accidentally deleting records you need to hold onto for compliance reasons
It helps you comply with various rules and regulations, such as GDPR

If you are collecting a lot of records, it can be difficult to label them all manually in a consistent way. And that’s where auto-labelling helps. If your organisation uses Microsoft products, the following guide will walk you through how to auto-label your documents and data.

Auto-label your documents and data in the cloud

Auto-labeling is now native to Microsoft SharePoint Online, OneDrive, and Exchange Online. This means that files (data at rest) and emails (data in transit) that match defined sensitive information categories are automatically detected and labeled. Microsoft has over 200 out-of-the-box sensitive information types that administrators can use and customize to fit their data protection needs.

So, how do you go about setting up auto-labelling?

Define your target scope

Your scope can be all SharePoint sites, all OneDrive accounts and all email users. It is advisable to go down this route when you are confident about where the sensitive information types for your labels are located. An alternative, especially where you would like to test or simulate your auto-labeling scenario, is to target a subset of SharePoint sites or user accounts.

Running a simulation in your production environment

Once you have defined your labels and identified your target scope, we’d recommend doing a simulation. That will provide insights on which data would be classified. Simulation is non-intrusive and often needs to run for no more than a couple of hours, depending on the size of the target content.
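To make the idea of a simulation concrete, here is a minimal Python sketch of a dry run: it reports which in-scope documents would receive a label without changing anything. It is an illustration only and does not call any Microsoft API; the `simulate` function, the `CARD_LIKE` rule, the "Sensitive" label and the sample documents are all assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical rule: a 16-digit, card-like number marks a document as "Sensitive".
CARD_LIKE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def simulate(documents: dict, scope: set) -> dict:
    """Return {path: proposed_label} for in-scope documents. Nothing is modified."""
    proposed = {}
    for path, text in documents.items():
        site = path.split("/", 1)[0]  # treat the first path segment as the site
        if site not in scope:
            continue  # outside the target scope, so not evaluated
        if CARD_LIKE.search(text):
            proposed[path] = "Sensitive"
    return proposed


# Made-up sample content standing in for files in two sites.
documents = {
    "finance/invoice-001.txt": "Card: 4111 1111 1111 1111",
    "finance/minutes.txt": "Agenda and actions",
    "hr/payroll.txt": "Card 5500-0000-0000-0004",
}

# Start with a subset of sites, then review the report before widening the scope.
report = simulate(documents, scope={"finance"})
print(report)
print(Counter(report.values()))
```

Reviewing a report like this before enforcing anything is the point of the simulation step: if the proposed labels look wrong, you adjust the rule or the scope and run it again.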

Continued iteration and experimenting

In some cases, you might need to re-define your labels, protection settings for identified data, and your scope, in order to get the best experience for your environment. This process can be repeated several times over to gain confidence.

Enforcing auto-labeling

After validating the simulation analysis and results, auto-labeling can be enforced based on your model. FITTS would recommend a ring-based deployment approach for best results. Files at rest in Office apps and in SharePoint sites are labeled automatically, and so are any new files added to those locations. Data in transit is also scanned for sensitive information, and labels are applied automatically.

Creating custom labels

Microsoft has 200+ built-in sensitive information types available for common usage scenarios. These pattern-based classifiers can detect information like credit card data, personally identifiable data, national identity numbers for some countries, social security numbers, etc.

In some scenarios, you might need to identify and protect data that isn’t among Microsoft’s pre-configured options. You’ll therefore need to customize or define your own sensitive information type(s). These might include National ID numbers and Employee ID numbers, among other bespoke identifiers within your organization.

To do this, you must define the following characteristics of a sensitive information type in the Microsoft 365 Compliance Portal:

Name: How you will refer to your custom sensitive information type
Description: The kind of information you are scanning for
Pattern: This defines how sensitive information is detected. It consists of:

- Primary element: This is the main element and can be a regular expression, a keyword list, keyword dictionary or a function. Regular expressions are commonly used (this site can be helpful when creating, modifying and testing them). For example, a primary element might be eight digits, where the first two digits are constant. Sensitive information types can have several patterns.
- Supporting elements: These help to increase the confidence of the match. For example, the keywords ‘National ID’ or ‘Identity Card Number’ appearing near a candidate number support a match for a national identity card number type.
- Confidence level: A choice of high, medium or low confidence indicates how much of the supporting evidence was detected. A higher confidence level would be given when more supporting evidence is found in matched content.
- Proximity: The number of characters allowed between the primary element and its supporting elements.
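The four parts of a pattern can be sketched in a few lines of Python. This is a simplified illustration of the idea, not how Microsoft's classifier is implemented: the employee ID format (eight digits starting with "42"), the keywords, the 50-character proximity and the rule for choosing a confidence level are all assumptions chosen for the example.

```python
import re
from dataclasses import dataclass

# Primary element (hypothetical): eight digits where the first two are always "42".
PRIMARY = re.compile(r"\b42\d{6}\b")
# Supporting elements (hypothetical keywords that raise confidence).
KEYWORDS = ("employee id", "staff number")
# Proximity: characters searched on either side of the primary match.
PROXIMITY = 50


@dataclass
class Match:
    value: str
    confidence: str  # "high", "medium" or "low"


def find_matches(text: str) -> list:
    """Find primary matches and grade each by nearby supporting keywords."""
    matches = []
    for m in PRIMARY.finditer(text):
        start = max(0, m.start() - PROXIMITY)
        window = text[start:m.end() + PROXIMITY].lower()
        hits = sum(1 for keyword in KEYWORDS if keyword in window)
        # Assumed grading rule: more supporting evidence, higher confidence.
        if hits >= 2:
            confidence = "high"
        elif hits == 1:
            confidence = "medium"
        else:
            confidence = "low"
        matches.append(Match(m.group(), confidence))
    return matches


print(find_matches("Employee ID 42123456, staff number confirmed"))
print(find_matches("Order 42123456 shipped"))
```

In the first call both keywords sit within the proximity window, so the match is graded higher than the bare number in the second call. Tightening the proximity or adding keywords is the kind of adjustment you would test during simulation.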

Publishing the labels

Once your sensitivity labels are defined and configured, they are then published by applying a label policy. The label policy is deployed to the intended users and/or groups together with the protection settings to use, depending on your organization’s labeling taxonomy. Labels are re-usable and can be applied to different target groups.

Microsoft’s Information Protection labels, as defined in your organization, would align to your data classification structure. That defines controls for protecting your data and how your information is handled both internally and externally. Examples of such controls would be:

Applying content markings on documents, such as a watermark, header, or footer
Encrypting content based on the label applied
Allowing specific teams or users to view, edit or print documents based on an applied label
Sending end-user awareness notifications or pop-ups to familiarize them with proper labeling
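One way to picture how a label carries its controls is as a small lookup table. The Python sketch below is a toy model, not the Microsoft Information Protection API: the `LabelControls` class, the `TAXONOMY` table, the label names and the group names are hypothetical and only illustrate the kinds of controls listed above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class LabelControls:
    watermark: Optional[str] = None  # content marking applied to documents
    encrypt: bool = False            # whether labelled content is encrypted
    # Group name -> actions that group may perform on labelled content.
    permissions: dict = field(default_factory=dict)


# Hypothetical two-level taxonomy; real taxonomies are defined per organisation.
TAXONOMY = {
    "Public": LabelControls(),
    "Confidential": LabelControls(
        watermark="CONFIDENTIAL",
        encrypt=True,
        permissions={
            "Finance": {"view", "edit", "print"},
            "All Staff": {"view"},
        },
    ),
}


def is_allowed(label: str, groups: list, action: str) -> bool:
    """Check whether any of the user's groups may perform the action."""
    controls = TAXONOMY[label]
    if not controls.permissions:
        return True  # no restrictions are defined for this label
    return any(action in controls.permissions.get(g, set()) for g in groups)


print(is_allowed("Confidential", ["All Staff"], "print"))
print(is_allowed("Confidential", ["Finance"], "print"))
```

Keeping the controls attached to the label, rather than to individual files, is what makes labels reusable across different target groups.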

Auto-label your documents and data with a free workshop

At FITTS, we can help you begin using auto-labelling to ensure you stay up to date with all regulatory, compliance, and internal rules that apply at your organisation. To get you started with auto-labelling, we’re offering a free, 2-hour consultation workshop to help you begin automating your data security.

Contact us today to get started.

Ben Kasema

Ben Kasema is a strategic and technically-savvy Head of Technology with over a decade of experience driving transformative IT initiatives across diverse industries. He excels at crafting IT strategies that align with business objectives and drive consistent growth, with a proven track record of delivering innovative and high-value solutions. As the Head of Technology at FITTS, he spearheads strategic technical leadership and direction for the organisation's managed and professional services division, with a focus on cloud-based solutions. Benjamin has worked with businesses of all sizes across Africa and the UK and is highly skilled in infrastructure and security management, cloud migration, and business process optimisation. He is passionate about driving digital innovation, community development, building professional relationships, and technology education and training.
