Optimize with AI! Lee Neubecker sets out on a quest to find out what’s happening with Artificial Intelligence as it relates to the eDiscovery review process. Lee visits eDiscovery Director Jeffrey Wolff of ZyLAB, and together they examine how new AI algorithms are built for priority review and can rank documents by relevance, saving countless hours and dollars for the client.
Part 2 of 3 Part Series on Smarter Solutions eDiscovery
Optimizing eDiscovery with AI Video Transcript Follows
Lee Neubecker (LN): Hi, I have Jeff Wolff back on the show again from ZyLAB. Jeff, thanks for coming back.
Jeff Wolff (JW): Thank you.
LN: And today we’re going to talk a little bit more about trends in Artificial Intelligence as it relates to eDiscovery and the review process that comes along with that. Jeff, what do you see happening right now with Artificial Intelligence as it relates to the eDiscovery review process?
JW: So what we’ve noticed over time is that, traditionally, Artificial Intelligence was deemed valid only in cases where you had hundreds of thousands or millions of documents. One of the changes over the last few years is that the Artificial Intelligence models have gotten so much better that you can now use them on much smaller data sets. So we evangelize the use of Artificial Intelligence on smaller data sets, even a thousand documents: you’re going to get a more efficient, more accurate review, faster, with AI than you would with a team of reviewers.
LN: So say you have a project on your platform with a million pages of documents that need to be reviewed. You put a review team on it, and they start categorizing and coding. As they get through the first ten thousand documents, what is your software doing to make this process more efficient and effective for them?
JW: Sure. With traditional supervised machine learning, what used to be called predictive coding, our software lets you train on a small training batch, a small sample of the documents, and code them for responsiveness, responsive or not responsive. We’ve made that very easy for users. You create issues, and for each issue you get two tabs, responsive and not responsive; you look through a batch of training documents and tag them appropriately, and the machine classifier learns very quickly what is responsive and what is not. After two, or at most three, training batches, the classifier is bringing back almost exclusively responsive documents. It’s already smart enough to do that. So you only need a few training rounds to get the classifier well past the typical 80% precision and recall threshold that most attorneys consider the limit of what a human reviewer is capable of; the machine will hit 90 to 95% precision and recall. So you can be assured that not only are you getting a more efficient and more accurate review, you’re also doing it in a whole lot less time with a whole lot fewer people.
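ZyLAB’s actual classifier is proprietary, but the supervised-learning loop Jeff describes — tag small training batches as responsive or not, let the machine learn from them, then measure precision and recall — can be sketched roughly as follows. This is a minimal toy word-count classifier, not ZyLAB’s method; the document texts and labels are hypothetical.

```python
from collections import Counter

def train(batches):
    """Accumulate word counts per class from labeled training batches."""
    counts = {"responsive": Counter(), "not_responsive": Counter()}
    for batch in batches:
        for text, label in batch:
            counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Classify by which class's vocabulary overlaps the document more."""
    words = text.lower().split()
    scores = {c: sum(counts[c][w] for w in words) for c in counts}
    return max(scores, key=scores.get)

def precision_recall(preds, truth):
    """Precision and recall for the 'responsive' class."""
    tp = sum(p == t == "responsive" for p, t in zip(preds, truth))
    fp = sum(p == "responsive" and t != "responsive" for p, t in zip(preds, truth))
    fn = sum(p != "responsive" and t == "responsive" for p, t in zip(preds, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Two small training batches, as in the workflow described above.
batch1 = [("merger price negotiation", "responsive"),
          ("lunch menu friday", "not_responsive")]
batch2 = [("price fixing agreement memo", "responsive"),
          ("office party rsvp", "not_responsive")]
model = train([batch1, batch2])

# After training, measure precision and recall on a small holdout set.
holdout = [("negotiation memo on price", "responsive"),
           ("friday party menu", "not_responsive")]
preds = [predict(model, text) for text, _ in holdout]
p, r = precision_recall(preds, [label for _, label in holdout])
```

In a real predictive-coding workflow the loop repeats: each new training batch updates the model, and review stops once measured precision and recall clear the agreed threshold.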
LN: And so, are your algorithms looking for synonyms and similar phrasing with equivalent word matches?
JW: It’s a bit of secret sauce. But yes, we use a support vector machine-based set of algorithms, roughly the most modern version of machine learning. And it is effective: it understands which topics were identified in a document and which other topics are like them. That’s how it does the identification, and you’re effectively training it on that.
LN: So the people using your platform, do they necessarily have to review all of the documents? Or, based on the trained review process, are you taking that universe of a million documents, and as they work through it, it starts to cluster, so there’s a set that probably isn’t useful and that they don’t have to look at, though they can still look through it just to check? And they have confidence that it’s not excluding relevant material, right?
JW: Yeah. What we find from an AI standpoint is that attorneys have two primary use cases when they use AI. The first is priority review. That means: I’m going to start teaching the classifier about my data set, show it what responsive documents look like, and then have it rank all the remaining documents for relevance, and I’m going to put eyes on those top-ranking documents. That’s effectively looking for the smoking gun, right? But they also use it a lot for QC, and this is where I’m trying to get a lot more attorneys to utilize AI. You’ve already done your tagging and had eyes on all of your documents; now go back, run the AI, compare it against what your human reviewers did, and see if you’ve missed things. Because inevitably, your reviewers are not all going to be at the same level. Some people will mis-tag documents, and the AI has a really good chance of picking up those mistakes and showing them to you.
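The two workflows Jeff describes — priority review (rank the unreviewed universe by relevance and read the top first) and QC (flag documents where the machine disagrees with the human tag) — can be sketched like this. The scoring function here is a hypothetical stand-in for a real trained classifier; the documents, tags, and the 0.5 threshold are all illustrative assumptions.

```python
def relevance_score(text, responsive_terms):
    """Hypothetical stand-in for a trained classifier's relevance score:
    the fraction of known responsive terms that appear in the document."""
    words = set(text.lower().split())
    return sum(term in words for term in responsive_terms) / len(responsive_terms)

responsive_terms = ["price", "merger", "agreement"]  # assumed learned from training rounds

# Priority review: rank the unreviewed universe so attorneys put eyes
# on the highest-scoring documents first.
unreviewed = {
    "doc1": "draft merger agreement with price terms",
    "doc2": "cafeteria schedule update",
    "doc3": "agreement on price revisions",
}
ranked = sorted(unreviewed,
                key=lambda d: relevance_score(unreviewed[d], responsive_terms),
                reverse=True)

# QC: compare human tags against the machine's call and flag
# disagreements for a second look.
human_tags = {"doc1": "responsive", "doc2": "responsive", "doc3": "responsive"}
flagged = [d for d in unreviewed
           if (relevance_score(unreviewed[d], responsive_terms) >= 0.5)
           != (human_tags[d] == "responsive")]
```

Here doc2 gets flagged: the human tagged it responsive, but the machine scores it low, exactly the kind of mismatch the QC pass surfaces for re-review.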
LN: So have there been any published studies that document the effectiveness of AI with the review process?
JW: There have been a bunch of them. I know LawGeex did one that was pretty interesting. What I’ve read recently is that nationally only about 4% of all cases officially use Artificial Intelligence. But then again, there’s no requirement in the meet and confer that you identify that you’re using Artificial Intelligence in discovery. So a lot of attorneys could be using it and just not reporting it. Which is fine, because back when review was manual and you went through paper and bankers boxes, you didn’t have to document the process for that review. So why should you have to document the fact that you’re using a machine to do some of the identification of documents and responsiveness today?
LN: So are there potential problems as a result of using AI for failing to produce relevant documents?
JW: No, I think the case law already demonstrates that AI is an accepted method of identifying responsive documents. And again, even if you’re just using it for QC purposes, you’re still better off; you’re still less likely to miss things than if you hadn’t used it at all.
LN: Great, well, it’s been great. Thanks a bunch for being on the show.
JW: My pleasure, my pleasure.