DLP, Data Classification and GoAnywhere

GoAnywhere UK & Ireland User Group 2021


Please see below our DLP, Data Classification and GoAnywhere webinar. This webinar outlines how the Titus data classification software and the Clearswift DLP solution can be used alongside your GoAnywhere deployment to provide improved protection from ransomware and malware, as well as enhanced data loss protection and control.


Video Transcript

Nick Hogg & Aaron Fox, HelpSystems

You can use the HelpSystems data classification and adaptive DLP solutions alongside your GoAnywhere deployment to provide improved protection from ransomware and malware, and enhanced data loss protection and compliance controls. My name's Nick Hogg, and I'm one of the directors of technical training here at HelpSystems.

And in this session, I'm joined by Aaron Fox, who is our regional director for EMEA for data classification. In this session, Aaron is going to give you an overview of our Titus data classification solution, and then I'll cover our adaptive DLP solutions from Clearswift, before rounding off with a demonstration of how you can use both of these alongside your GoAnywhere deployment in order to enhance your malware and data loss protection and compliance controls.

Now, please feel free to use the questions panel to ask any questions as we go through the session, and we'll try and respond to them in the chat. I'll also be on the Q&A session later on to answer any further questions that you might have. As we saw in the earlier session, you can use the data classification solutions to understand what your sensitive data is and where it lives within your network.

You can use the adaptive DLP solutions to govern how that data is being shared, to ensure that it's only being shared with the correct people, both inside and outside of the organisation. And you can use our managed file transfer and our digital rights management solutions to protect that data, to ensure that when it's being shared appropriately, it's also being shared securely.

Now I'm going to hand over to Aaron, who will talk to you about our Titus data classification. Great, thanks, Nick. So let me just take you through an introduction to the Titus product and data classification in general. To start with, the concept of classifying things isn't a new one. Think about how shipping companies work, or how you label the boxes when you move house: there's the concept of putting readable labels on the side of a box.

Things like "fragile", "this way up", or even "kitchen" or "Aaron's room", whatever it might be, help to inform you or the shipping company as to how you should treat that box. At the same time, what they also do is put things like barcodes on a box, which means that it can be scanned and data can be read to understand what should happen to that box and where it should go.

And really the concept of data classification is exactly the same. So what we do is apply visual markings and metadata onto documents, which are obviously the equivalent of the labels and the barcodes. And that helps both the users that interact with the data and the downstream technologies to understand what the data is and how it should be treated.

Now, just to elaborate a bit on the value of data classification and understand where it fits into the wider data security product set, what we've done here is broken down the data lifecycle into six phases. That's just so that we can understand, in a more granular fashion, what happens to data from the point it is created.

And actually, if you take a step back from this and just think about how end users interact with data on a day-to-day basis, whether that's sharing a document with somebody who might edit it and send it on to somebody else, or perhaps you're looking to start moving data into the cloud, each of these actions exposes that data to a very specific risk.

And so what we do, by classifying the data at the point it is created, is understand, first of all, how sensitive it is. So things like: is it internal, public or confidential? But also, what's the context? So is this a financial document? Is it a legal file? Does it contain personal information? And what that means is, by applying those labels at the point the data is created, we can apply appropriate handling rules throughout the rest of the data lifecycle to ensure it's properly protected.

So more specifically, how does our product work? Again, what we do is identify the sensitivity of the data by applying labels such as General Business, Internal and Confidential, and those appear as both visual markings and metadata. The visual markings are obviously important because they mean the users have the ability to quickly identify what the data is.

Part of what we do during the deployment is a period of end user awareness training. That's really to educate the users on what data classification is, what the labels mean, and therefore how they should handle data based on the label that's been applied. We also apply the metadata that replicates the labels.

And there are some examples on the slide here. What that means, and we'll go into this in a bit more detail in a moment, is that downstream technologies have the ability to read that metadata and then use it to apply their own handling policies and rules to ensure that the data is adequately protected.

And then, just looking again at how that pieces into the wider data security picture: once you have these labels and metadata applied, it means that downstream technologies such as data loss prevention, encryption, et cetera, have the ability to read them and then apply their own policies.

Another thing that's worth mentioning is that once we classify the data, we also have the ability to apply some handling rules inherently within the product. That tends to be focused on things like printing restrictions, data retention periods and also email restrictions. So, for example, if something is classified as internal, we can prevent it from being sent to an external recipient.
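As a sketch of how such an email restriction could work, here's a toy check, assuming a simple label scheme and an illustrative list of internal domains. None of this reflects the actual Titus rule syntax.

```python
# Labels that must never leave the organisation (illustrative scheme).
INTERNAL_ONLY = {"Internal", "Confidential"}

def check_outbound_email(classification, recipients,
                         internal_domains=frozenset({"example.com"})):
    """Return (allowed, offending_recipients) for an outbound email.

    If the label is internal-only, any recipient whose domain is
    not in the internal list causes the send to be blocked."""
    if classification not in INTERNAL_ONLY:
        return True, []
    external = [r for r in recipients
                if r.rsplit("@", 1)[-1].lower() not in internal_domains]
    return not external, external
```

The same shape works for printing restrictions: a small predicate over (label, action, context) that either permits or blocks.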

So in terms of the Titus data classification process, we not only have a product for classifying data at the point of creation, but we also have a product that deals with data at rest. Just to describe how that process works, it's in four phases. First of all, we use the data discovery component of the product to detect data, and this tends to be more focused on structured data.

Then what we do is scan it to identify what kind of sensitive content might be contained within the document. That then helps us to identify how sensitive it is, so that we can apply our sensitivity labels to the document.

Then the final point is that, by understanding the value and risk of the data, and having those labels applied, we can start to put in workflows. We've got various integrations with tools like encryption. So, for example, if something is classified as confidential, and that means it must be encrypted, we can automatically create those workflows to ensure that it's got those protections applied to it.

Now, just to touch on the methods by which you can actually identify and classify the data, and this is obviously more focused on the point of creation, we can do this in a number of different ways. The first of which is automated, using our machine learning capabilities. This method essentially relies on our tools scanning for certain parameters, and if we find them, we can help to determine how sensitive the file is and automatically apply those tags.

The second is user driven, and, as you'd expect, that means that the user is given the ability to select the classification label based on their understanding of the document. I think this is actually a particularly important thing to consider. You know, people often assume the automated option is going to be the best one, but actually there are certain scenarios where the user will always understand the nuances within the document better than any machine. And so, particularly for highly sensitive data, you want a combination of machine and user working together.

The third one is system suggested, and that could be for things like default labelling. So, for example, on emails, if every recipient is part of the same organisation, we could default those labels to internal unless we also detect confidential data. Or, for example, if somebody is typing up a Word file and personal information is identified by our content scan feature, then the system would automatically suggest to the user: we think you should classify this as a minimum of confidential because of this data we found. So ultimately what we can do is work with you to identify your classification policy and help you figure out a way to enforce it using the tools that are at our disposal.
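A system-suggested label of that kind can be sketched as a content scan that maps detector hits to a minimum classification. The two detectors below are illustrative stand-ins for the hundreds of pre-built tokens a real deployment ships with.

```python
import re

# Two illustrative detectors; a real product ships many more tokens.
DETECTORS = {
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "uk_phone": re.compile(r"\b(?:\+44\s?|0)\d{4}\s?\d{6}\b"),
}

def suggest_label(text, default="Internal"):
    """Return (suggested_label, detector_hits). The suggestion is a
    minimum: the user can still raise the classification further."""
    hits = [name for name, rx in DETECTORS.items() if rx.search(text)]
    return ("Confidential", hits) if hits else (default, [])
```

The key design point from the talk is that the result is a suggestion, not a verdict: the user confirms or overrides it, combining machine detection with human judgement.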

Then you use a combination of these three components, so that for each situation we set up a policy with maximum efficiency: if a user doesn't need to be involved, we don't get them involved. But equally, in those particularly highly confidential scenarios where users' insights are important, we also ensure that they put their input into the process.

Now, one thing that's important to mention here is the impact of human error. Often when you see presentations about data security or cybersecurity tools, there's a big focus on looking at users as a risk that needs to be mitigated by reducing their access to things, and that does go a long way to fixing the problem.

But what it doesn't incorporate is the fact that users will still have access to some sensitive data, and users are always liable to make mistakes. And just to highlight that point, we get our data here from the ICO, who obviously investigate actual data breach incidents. And what they feed back to us is that the top five causes of data loss, every year, are human error.

And that's things like, for example, somebody printing off a confidential file and leaving it at the printer, attaching the wrong document to an email before they send it, or even adding the wrong recipient onto an email before it's sent. And often these are very honest, simple mistakes. You know, there wasn't any malicious intent involved, but the impact of those actions can obviously be very significant.

And so one of the USPs of data classification and this type of product is that we focus on making the user better informed about how they should handle the data. And really that covers two main things. The first of which is having the visual labels applied. It means that we're making it very clear to the user.

This is confidential data, therefore this is what should happen. And so they should be less likely to make mistakes. But equally, because we have these handling rules natively within the product, and also using things like Clearswift and other DLP tools, it means that even when mistakes are made, the likelihood is that we'll pick up on them and stop them from happening.

So just to touch again a bit more on how we piece into the wider data security strategy within organisations, and actually this should lead quite nicely on to the next piece around Clearswift and GoAnywhere, it's something that I mentioned earlier with the metadata. What Titus has also done as an organisation is build integrations with third-party and downstream technologies.

And here are just some of the key examples; I'll just take you through a couple of those. So with DLP, often what we find is that organisations struggle to get a policy that helps them mitigate the challenges of things like false positives and false negatives. And the way that we can help with that is by having the understanding of what the document is and applying the metadata.

What we do is slightly tweak the policy within DLP to search for the metadata tags that we apply, and then decide whether something should or shouldn't happen based on the label that's been applied. And what we've found in some key organisations is that that's helped to reduce the problem of things like false positives.
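The tweak being described boils down to giving a trusted label precedence over the noisier content scan. Here's a minimal sketch of that decision logic, with an invented label scheme rather than any real product's rule language:

```python
def dlp_verdict(label, content_matches):
    """Decide block/allow for a transfer.

    A definitive classification label short-circuits the content
    scan entirely, which is where the false-positive reduction
    comes from: content heuristics only decide unlabelled files."""
    if label in ("Confidential", "Top Secret"):
        return "block"   # label alone is decisive
    if label == "Public":
        return "allow"   # explicitly cleared for release
    # No label (or a neutral one): fall back to content inspection.
    return "block" if content_matches else "allow"
```

Because only the unlabelled remainder is subject to pattern matching, every correctly labelled document is one fewer chance for a heuristic to misfire.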

There's one example of an organisation that, through use of data classification, actually reduced their false positives by 80%, just by having data classification applied. So a big potential impact there. Another one that I'll give an example on is encryption. So if, for example, we have a policy with an organisation that dictates that if something is classified as confidential, it should be encrypted,

we also have the ability to create workflows so that at the point of classifying the document, it then becomes automatically encrypted. This helps to plug another gap, because obviously if you ask the user to manually go back and encrypt something, even with the best intentions, the likelihood is that there'll be certain scenarios where those kinds of things are missed.

So having the ability to automate that process again just ensures that that data is going to be adequately protected. And I guess just a general summary note on this: what we find is that when organisations deploy data classification, because of the interoperability and some of the examples that we've stated here, they tend to find that it enhances and improves the return on the existing investments they've made in things like DLP and encryption. SIEM (security information and event management) is another good example, just quickly, because from a threat detection and response perspective, within our reporting console we can make a log of, you know, somebody classifying something for the first time, or perhaps somebody going back into a confidential file and trying to down-classify it. Even if we prevent that from happening, we still make a log of the fact that it happened.

So what we can then do is export that event log information into a SIEM or user behaviour analysis tool, and that can help provide extra information and extra granularity in the threat detection and response process. And so it's essentially enhancing those existing deployments.

So I'll hand over to Nick now, who's going to take you through the adaptive DLP and GoAnywhere solutions from HelpSystems. Thanks, Aaron. So what I'd like to do now is move on to talk to you about how you can use our adaptive DLP controls plugged into your GoAnywhere deployment in order to look for a number of the threats within your managed file transfer.

And some of that can be focused around the hygiene side of things, in terms of stopping that MFT flow from becoming the vector where malware, ransomware or spyware comes into the organisation. But equally, it's how we can use those same adaptive DLP controls to complement the data classification that Aaron was talking about.

We look for your sensitive information, intellectual property and compliance-related data, to stop those kinds of accidental or potentially malicious data loss or compliance events. You can integrate our Secure ICAP Gateway with your GoAnywhere managed file transfer system in order to inspect those file transfers and look for a number of those risks.

So we can provide you with that additional layer of protection from malware and ransomware coming into the organisation. We obviously have the onboard antivirus from Kaspersky, Sophos and Avira to look for the known pieces of malware and ransomware. But we have to be conscious that there's quite a significant volume of successful ransomware and malware attacks that are making it past the more traditional antivirus defences.

So we'll look, in a few slides' time, at our sanitisation mechanisms that can deal with some of these new, emergent malware and ransomware risks. We can also inspect the managed file transfer traffic as it's moving around between your employees, or out to your business partners or your customers, to look for the sensitive information. And that could be the classification tags that have been inserted there by something like Titus, to make sure that the top secret information or the PII data is only being shared with the correct people, both inside and outside of the organisation, based upon the classification.

But equally, we can also inspect the contents of the file, to maybe look for data that's been misclassified or hasn't yet had the classification applied. And we can look for keywords or phrases that might indicate that something's your intellectual property. We can look for any of the hundreds of pre-configured tokens for personally identifiable information and PCI data within there.

And as well as blocking the transfers where appropriate, we'll look in a second at some of the additional mechanisms we have, in terms of the data redaction functionality, that allow you to deal with the data risks without maybe becoming a barrier to legitimate business communications. We looked at a number of scenarios in the earlier session, which showed you how you can integrate the adaptive DLP controls within the managed file transfer traffic. And in the first kind of compliance scenario, whether it's GDPR data or healthcare data or whatever:

if you've got data classification in place within the environment that's correctly classifying the sensitive data, we can obviously look for those classification tags and use them to control when that data is and isn't able to be shared outside or inside of the organisation. But as we also saw in the scenario, you don't necessarily need to start off by deploying the data classification solutions.

If you need to see some quick return on investment to deal with some business issues you've got, then in the case of that healthcare provider, you saw that in phase one they were able to integrate the adaptive DLP controls with the managed file transfer traffic, to look for the keywords and tokens that would indicate the presence of healthcare data, and then use that to ensure that the healthcare information was only going to the hospitals and the insurance companies.

And then in phase two, they were coming back to deploy the data classification solutions to take a more considered, longer-term approach. What we also saw in this area was how you can use the onboard antivirus and the sanitisation mechanisms to allow data to be shared with your organisation by customers, members of the public and so on.

But we can give you the confidence, with the adaptive DLP controls plugged into the MFT, to allow the data into the organisation but stop those malicious malware and ransomware risks coming into the business. One thing I did want to touch on here, actually, was this additional scenario, specifically focused on secure mail.

Because one of the things we find when we're talking to existing GoAnywhere customers is that they typically want to ensure that the messages and the files moving through secure mail within their GoAnywhere deployments actually have the same policies applied to them as the emails that pass through the corporate email systems.

Because obviously, if you give the user the button or link or whatever that allows them to send an attachment or a message through secure mail, and potentially get responses back from a customer or a business partner, having the ability to inspect that traffic and have a consistent policy within the MFT traffic, as you have on your email traffic, I think is hugely important.

So again, those same adaptive DLP controls that can be plugged into the MFT to look at the file transfers can also inspect the secure mail traffic and give you that consistency of protection that will mirror what you see within the corporate mail flow. Now, our ICAP gateway can do all the kinds of standard stop-and-block scanning that you would expect.

We can look for the sensitive data and the classification tags, and we can stop that from leaving the organisation. But one of the features that we introduced, I guess the best part of a decade ago now, was our adaptive DLP controls. And these are essentially a number of mechanisms that give you additional options.

So alongside the stop and block, where we see the data classification tag that says something is top secret and can absolutely not leave the business, we can also use these additional mechanisms to remove the data risks, the malware risks and the ransomware risks, but then leave the rest of the file and the data intact, so it can go on its way once you've removed the risks from within that communication.

And what you'll see in practice is that this really allows you to be very, almost aggressive about the policies you're putting in place, because we're not interrupting the whole communication. We're just removing the things that pose the risk within there.

Let's have a look at these in practice. The first of these, the data redaction feature, gives us the ability to look within editable file formats, like Word documents, PDF files and so on, but also within scanned documents and imagery, for the presence of sensitive information. That could be sensitive words or phrases, or it could be any of the hundreds of tokens, like PCI information and personally identifiable information.

So rather than blocking that whole transfer, what we can do is actually go in there, remove the things that pose the risk to the communication, and then allow it to go on its way. So we redact out the things that look like the PII data and the PCI data, but we leave the rest of the information intact.
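To illustrate how redaction can target just the risky tokens, here's a toy sketch of card-number (PCI) redaction that validates candidates with the Luhn checksum, so ordinary long numbers are left alone. It's a simplified stand-in for the kind of token detection described, not the product's actual rules.

```python
import re

# Runs of 13-16 digits, optionally separated by spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def luhn_ok(digits):
    """Luhn checksum; cuts false positives on plain numbers."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def redact_pans(text, mask="[REDACTED]"):
    """Replace likely payment card numbers, leaving everything
    else in the text intact."""
    def sub(m):
        digits = re.sub(r"[ -]", "", m.group())
        return mask if luhn_ok(digits) else m.group()
    return CANDIDATE.sub(sub, text)
```

The same pattern generalises: each detector swaps its match for a mask, and the rest of the document passes through untouched, which is what lets the transfer continue instead of being blocked.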

So hopefully the person on the far end of that gets everything they need in order to do their job. And that last bit there, about working in both directions, I think is quite pertinent, because more and more of the organisations we deal with want to have inbound redaction policies as well, where there's potentially sensitive information coming in to employees who don't necessarily need to see some of the more sensitive pieces of PII data or PCI data, but they might need to see the rest of the file in order to do their job.

So you can essentially limit your DLP and compliance risks by removing the sensitive data that the employees have no business seeing as it enters the business. Now, the data redaction piece is typically around the visible information that employees maybe just choose to share because they don't understand it's sensitive

and shouldn't be shared with this individual, either inside or outside of the organisation. The document sanitisation functionality gives you the ability to look for things like the metadata, the version history and the change control information within a document, and strip that out. So, potentially, as an attacker the metadata is useful to me, because it allows me to gather some information about your organisation, like the username of who's created a document, who's reviewed it, and so on. That could be potentially useful if I wanted to send a spoofed message into an organisation, to trick one of your users into sharing some credentials or opening a file that's been infected with malware or ransomware.

But equally, as well as sanitising the metadata, we can also look at the version history and the change control information. So we can make sure that the file your users choose to share outside of the organisation contains the data that they expect, with no hidden surprises in the background.
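As a rough illustration of the metadata-stripping idea, the sketch below rewrites a .docx (which is just a zip archive) without its properties parts, where fields like the author and last-modified-by live. A production sanitiser also fixes up the package relationships and handles revision data inside the document body; all of that is omitted here.

```python
import io
import zipfile

def strip_docx_metadata(data: bytes) -> bytes:
    """Return a copy of a .docx with docProps/core.xml (author,
    last-modified-by, revision number) and docProps/app.xml
    (application details) removed. All other parts are kept."""
    dropped = {"docProps/core.xml", "docProps/app.xml"}
    src = zipfile.ZipFile(io.BytesIO(data))
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            if name not in dropped:
                # Copy every remaining part verbatim.
                dst.writestr(name, src.read(name))
    return out.getvalue()
```

The allow-list approach described later in the demo (keep the classification tag, drop everything else) is the same loop with a finer-grained filter over individual properties instead of whole parts.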

So that paragraph that they thought was deleted, but is still potentially there in the version history, can be removed. We can simply sit there in your managed file transfer traffic, look for the presence of the metadata, and look for the change control information and the unaccepted changes.

We can strip that out and make sure that what leaves the organisation is what was intended to leave the organisation. But it's not just about looking for the sensitive data leaving the business. Obviously, there's a high risk of malware and ransomware coming into organisations. So to supplement our onboard antivirus detection from the likes of Kaspersky, Sophos and Avira, we can also use our active content sanitisation to deal with some of these new and emergent risks.

It could be something as simple as what looks like a purchase order or a CV coming into the organisation, to somebody in the finance department or the human resources department. It's a PDF file from their point of view. They've opened about a hundred of these already today, so what's the risk from a PDF? But as an attacker, I can put some active content in there, some JavaScript and VBScript, or some macros, that when the user opens the document will just silently install the ransomware or spyware into the organisation.

So to supplement the ability to detect the known pieces of malware and ransomware, what we can do is simply sit there in that MFT traffic and strip out the active content, but leave the data intact. So from the end user's perspective, they get all the data in the document that they need to do the job, but we've removed all of the potential malware or ransomware risks coming in hidden within that document.
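A crude way to see what a sanitiser reacts to is to scan a PDF's raw bytes for the name objects that introduce active content. Only detection is shown here; actual sanitisation rewrites the file with those objects removed, which is well beyond a few lines. The marker list is illustrative, not exhaustive.

```python
# PDF name objects that introduce scripts or auto-run behaviour.
ACTIVE_MARKERS = (b"/JavaScript", b"/JS", b"/OpenAction",
                  b"/Launch", b"/EmbeddedFile")

def find_active_content(pdf_bytes: bytes) -> list:
    """Return the active-content markers present in a PDF's raw
    bytes. A naive substring scan: good enough to flag a file
    for sanitisation, not to prove one clean (active content can
    hide inside compressed object streams)."""
    return [m.decode() for m in ACTIVE_MARKERS if m in pdf_bytes]
```

This is exactly the purchase-order scenario above: the file still opens and reads as a normal PDF, but anything that would auto-execute on open is flagged (and, in the real product, removed).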

So what I'd like to do now is step across to my demonstration environment, to show you how the data classification, the adaptive DLP controls and the managed file transfer solutions can all work together. So it could be that one of my users has just created a new document, or they've received something from a business partner.

And they recognise that, based upon the presence of PCI data or personally identifiable information, this needs to be classified as internal, in line with the corporate policy. So we can use the little plugin we have within Microsoft Word here to essentially select the correct classification.

And then if I hit the save button here, you can see that it's now marked visibly within the header and the watermark, to let the user know that this level of classification has been associated with this document. And that can be very useful from an end user education piece, because with every interaction you're reminding them of the importance of the data within the documents.

At the same time as putting the visible markings in there, we also put the metadata tags in there. And what we can do, both with Titus at the desktop level, but also within the adaptive DLP controls plugged into your managed file transfer traffic, is look for these classification tags to start to enforce a policy around the suitable sharing of data.

So if I step across to my GoAnywhere just now, I've got a simple project here which is going to call an ICAP resource. In this case, the ICAP resource is our Secure ICAP Gateway. It's worth pointing out that when you pass managed file transfer traffic across to the ICAP gateway, you can also give us more useful information alongside that, in terms of the authenticated user, or the source and the destination.
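Under the hood, that hand-off is an ICAP transaction, and the extra context travels as headers alongside the encapsulated file. The sketch below builds a REQMOD request by hand just to show the shape of the exchange; header names such as X-Authenticated-User are common ICAP conventions rather than anything GoAnywhere-specific, so check your gateway's documentation for the names it actually honours.

```python
def build_icap_reqmod(host: str, service: str, file_bytes: bytes,
                      user: str, client_ip: str) -> bytes:
    """Assemble a minimal ICAP REQMOD request (RFC 3507 shape):
    ICAP headers, an encapsulated HTTP request header, then the
    file body as a single HTTP chunk."""
    http_hdr = b"POST /upload HTTP/1.1\r\nHost: mft.internal\r\n\r\n"
    icap_hdr = (
        f"REQMOD icap://{host}/{service} ICAP/1.0\r\n"
        f"Host: {host}\r\n"
        f"X-Authenticated-User: {user}\r\n"
        f"X-Client-IP: {client_ip}\r\n"
        f"Encapsulated: req-hdr=0, req-body={len(http_hdr)}\r\n"
        "\r\n"
    ).encode("ascii")
    # Body travels as one HTTP chunk, terminated by a zero chunk.
    chunked = (f"{len(file_bytes):x}\r\n".encode("ascii")
               + file_bytes + b"\r\n0\r\n\r\n")
    return icap_hdr + http_hdr + chunked
```

The point of those extra headers is exactly what the transcript describes next: the gateway can key its policy not just off the file contents, but off who is sending it and where it's going.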

And we can actually use all this additional information to not just inspect the data, but also to apply policies based upon who's trying to share the data, or where it's coming from or where it's going to. Now, in my simple demonstration environment here, I've got a simple policy route that just has a number of checks within it.

There are a number of content rules that say what is and isn't acceptable within the MFT traffic. So we can look for pieces of malware, and we can look for dangerous file types, but we can also look for the classification tags within the documents that Titus put in place. We can also inspect the documents to look for the sensitive data and redact the visible pieces.

But we can also sanitise out the metadata and the other hidden information in the background of the file. So if I step across now to my MFT client here, I'm just going to use this simple web client, and I'm going to go and upload that Word document that we've just classified as internal. And the policy I've put in place is that this internal data simply can't leave the organisation through the MFT solution.

So if I hit the refresh button now, you can see that we've actually passed the data across from the MFT to the ICAP gateway. We've done the content scanning, we've simply said this cannot leave the business, and we've actually left a little block page behind. So you can start to use some of that end user education, both using the block page, but also you can use the workflows to kick off email notifications and so on to individuals within the organisation.

But it could be that actually we want a slightly more nuanced policy than simply blocking something based upon the classification level. So what we can do, and I'm just going to upload a whole host of files right now, is also perform a number of other tasks with the ICAP gateway inspecting the MFT traffic.

So it could be that, you know, we simply want to do some file type control. We just don't want executable file types entering the organisation. But equally, if you are a manufacturing organisation or somebody like that, you could use those same file type controls to stop AutoCAD files from leaving the business unless they were going to approved recipients.

We might potentially find sensitive data within the documents. So if we have a look at this PDF file here, we can see that it's not yet had a classification applied to it, but it does still contain the PCI data and the PII information that's of concern to us. So when that's been passed through our ICAP gateway, rather than block it, all we've done is simply gone in and redacted that data.

As I mentioned earlier, that doesn't just happen within the editable text documents. We can also do that within the scanned documents and images, because we've got the optical character recognition to extract the text, recognise that there's sensitive data present, and redact it, in line with the corporate policy, as the data moves through your MFT traffic.

We also mentioned the document sanitisation. So it could be that we've got some unaccepted changes within a document, or some document comments. If we step across and have a look at the document properties here, there could be some properties that we would like to remove, but equally there might be some that we want to whitelist, like the classification tag, because we actually want to use that somewhere else within the policy.

So let's have a look at what's been transferred through the MFT solution here. It will just take a second to pop up in protected view. We have a lot of control over which elements we sanitise out. So in my policy here, I've chosen to strip out the unaccepted changes, essentially accept them, but I've chosen to leave the document comments intact.

But you can see here that we still have the ability to do the redaction within the document comments there. And if we have a look at the metadata, you can see that we've chosen to be quite brutal about it and strip everything apart from these Titus tags that we want to use elsewhere within the policy.

We also talked about the active content sanitisation as a way of protecting yourself from these new and emergent malware risks that may come into the business. So in this case, I've got a PDF file here that's got some active content in it, which simply executes as soon as the user opens the document.

And that's potentially one of those areas that can bring the ransomware or spyware into the business. What we can do, and we saw the scenario earlier with the police force, but equally it could be a financial organisation or something like that, is where you want a member of the public to be able to upload a scan of a driving licence to you, but you don't want that potential active content risk within there.

If I open this PDF, though, you can see that there won't be a pop-up, because we've stripped out the active content but left all the underlying data intact. So that allows your employees to get hold of the document and what they need to do their job. It just doesn't serve as a way of piggybacking the malware or the ransomware risk into the organisation.

And very lastly, we mentioned secure mail in one of the scenarios earlier. So again, we've got this file that contains the PCI data and the PII data. If I send that through secure mail, we can apply the same consistent policy as we've got within the rest of the MFT traffic. But importantly, it's the same consistent policy as we have within our corporate mail flow as well.

So if I go and try to send this file over through secure mail, you can see that we've got the ability to do the data redaction within there as well. So that is a very high-level overview of how you can essentially put the data classification, the adaptive DLP controls and the GoAnywhere MFT all together, to really increase the level of protection you get around your sensitive information, and from the malware and the ransomware risks.

So if you've got any additional questions, please feel free to ask them in the chat now. The next session will cover the role of agents, with some real-life examples from other MFT users. And with that, I'll just say thank you very much from Aaron and myself; your time is very much appreciated. As you've seen, you can use the HelpSystems data classification and adaptive DLP solutions alongside your GoAnywhere deployment to provide improved protection from ransomware and malware, and enhanced data loss protection and compliance controls.