Posted by Jay W. Curry

Frank Perez, CEO of Sfile, gives us the lowdown on using artificial intelligence to analyze dark data.

Please excuse any typos in this hasty transcript.

Jay Curry: And we’re back. Hello, Texas. Welcome to Texas Business Radio. Wow, we’re having fun today. We’re talking big data and folks, this is state of the art. And this program, I’m just gonna cut everything short so we can spend as much time with Frank as possible. Let me just tell you to go to and everything we have, and everything we do is right there in beautiful color. I’m Jay Curry. My co-host is George Walden. George, we’re gonna talk about big data. This is gonna be fun. What do you think?

George Walden: Well we’re not just talking about big data, we’re talking about dark data. That’s a term I haven’t heard up until today and oh my goodness. It’s one thing to talk about data science and data analytics, and what we’re trying to mine out there in the marketplace, but it is … you just have no vision of how large this problem actually is until you start this conversation. And we have Frank Perez today to lead us and guide us through this.

Jay Curry: Yes, sir. Frank, thank you for joining us. Frank is the CEO of Sfile. Thanks for coming.

Frank Perez: Oh, you’re welcome. And thank you for inviting me.

Jay Curry: Take a shot at trying to tell us in ten minutes what Sfile’s doing. This is amazing stuff.

Frank Perez: Well, Sfile is a data analytics company that’s focused on actually taking dark data, which is a variation of big data. Dark data, basically, is all the unstructured data that is all over the place. You have it in public sources such as newspapers, clippings, things of that degree. You also have it in public records. In the case of the oil and gas space, you have regulatory filings, you have all the data that’s occurring about production data, characteristics of the wells that were drilled, historically, and things like that.
We take this dark data and we use artificial intelligence and machine learning to train computers how to go in there and actually read through this data, and actually extrapolate from it, with the same context as any geophysicist may have, a geologist, or even a petroleum engineer, and read and understand this data to create, basically, structured data. So, we take it from a dark data approach into well-enlightened structured data sets that we can then use for building models for doing optimizations for reservoir characteristics, for optimizations of completion designs, and so forth and so forth. You can imagine once you have data like that, the kinds of things you can actually understand about it.
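To make the idea of turning unstructured text into structured data concrete, here is a minimal sketch in Python. It is not Sfile's method: the interview describes trained machine-learning models, while this stand-in uses plain regular expressions. The filing sentence, the `WellRecord` fields, and the function names are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical filing sentence; the wording and values are invented.
FILING = (
    "Well Smith 4H was completed on 2014-03-17 in Reeves County "
    "with a lateral length of 7,450 ft and initial production of 1,120 bbl/d."
)


@dataclass
class WellRecord:
    """One structured row recovered from free text (hypothetical schema)."""
    well_name: Optional[str]
    completion_date: Optional[str]
    county: Optional[str]
    lateral_length_ft: Optional[int]
    initial_production_bbl_d: Optional[int]


def _first(pattern: str, text: str) -> Optional[str]:
    """Return the first capture group of the pattern, or None if absent."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert '7,450' to 7450; pass None through unchanged."""
    return int(value.replace(",", "")) if value is not None else None


def extract_well_record(text: str) -> WellRecord:
    """Pull each field out of the text; missing fields stay None."""
    return WellRecord(
        well_name=_first(r"Well (.+?) was completed", text),
        completion_date=_first(r"completed on (\d{4}-\d{2}-\d{2})", text),
        county=_first(r"in (\w+) County", text),
        lateral_length_ft=_to_int(_first(r"lateral length of ([\d,]+) ft", text)),
        initial_production_bbl_d=_to_int(
            _first(r"initial production of ([\d,]+) bbl/d", text)
        ),
    )


record = extract_well_record(FILING)
print(record)
```

Hand-written patterns like these break as soon as the wording of a document changes, which is presumably why a trained model that reads with a domain expert's context is the approach described in the interview.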

Jay Curry: There’s no end to it though. You go all the way back to recorded history and you bring it in. If you’re dealing with wells, you’re going back to when they started plugging holes … or drilling holes. They want … from the Rockefellers and then…

Frank Perez: Well, you have to when you think about it. A reservoir itself basically is, it’s a living natural element, a natural entity in a sense. It’s constantly being affected by what you do to it. And if you think of it like the human body, we poke and prod it a lot and ultimately, it’s gonna change because of these things. And so, as we drain that reservoir the rock characteristics change, the understanding of what we have inside those reservoirs change. So, we have to go back from a historical perspective and then drive it through time so we can actually understand exactly what has occurred, and what can we expect to occur knowing what we know? And the more we know, the better we can predict. And ultimately, the better we can predict, the better we can de-risk the large amounts of capital that are basically put into to see if we can extract these things economically.

Jay Curry: So, this is heavy, heavy artificial intelligence. You got data that’s 300 years old, you got to recognize what that data meant in their day, and project it into today. And you do that with a computer, there’s not people sitting there going, “Let’s see. Thomas Jefferson signed this will. He drilled an oil or water well.” You got to figure out what it was in the day and the only way to do that … we’re talking super computers.

Frank Perez: Oh, got to be. Massive amounts of parallel processing computing that’s occurring out there too. So, we’re leveraging this huge phenomenon of highly condensed core computers where you have … a single server can have up to 256 cores on it. And now we’re using GPUs where we have tens of thousands of GPU cores to help us, basically, paint the dots and put them together for us is what’s happening.

Jay Curry: Just the computing power alone we’re dealing with super computers. And your artificial intelligence is what’s taking all of this speed and all this information and making sense out of it. For who?

Frank Perez: Well, we’re making sense out of it, basically, for the E&P operators to help them. All the capital that goes into it, so we have all this private equity, we have all this public money, we have to de-risk this capital. And in order for them to, basically, continue this experimentation, to continue this economic boom, we’ve had to make sure that we can make these efforts economical. Because if we don’t make ’em economical, eventually we’re not gonna be able to afford the capital it’s gonna take to do it. And the more we can make it economical, and the more we can de-risk this capital, the more we can encourage this economy to continue to grow behind all this effort that’s happening out there. So, there’s a little bit of looking at it from that perspective and understanding that our big job, ultimately, is being able to find and extrapolate enough knowledge that we can actually make better decisions.

Jay Curry: Wow. How do you do this? Do you have an army of people? Do you have … You’re working with IBM, with the big super computers. I mean, this is like Google for oil.

Frank Perez: It is exactly just that as a matter of fact. We modeled a lot of the efforts that we did from what Google did to help mine oil, so we can actually have these great applications on our smartphones and things like that. So, we took those as cues and looked at how they accomplished these things across these cloud computing platforms to get these things done.
These things are very much democratized now. We can have access to this kind of computing power fairly inexpensively now, with these great ideas. Everything in the open source community has allowed us to, basically, build on a knowledge of past, understanding how to accomplish these things as well, too. We hired great data scientists, great engineers, great, great experts who have decades of petroleum engineering experience, geological experience, geophysical experience.
I’m bringing them all together so that we can actually teach computers how to understand what it’s reading. And then ultimately, to the degree that these professionals, these experienced individuals, that they can actually create as good quality, if not better quality. The only difference is this … is that we as human beings, basically, are really poor at collecting data. The reason why is because we’re subjective, we get interrupted, we get exhausted. All these different things that happen. And then you spread that across teams of people, now you just multiplied that problem, you know, times by how many people you have out there.
What computers are capable of doing, is they are capable, once you teach them how to do one thing, they can consistently do it over and over again. And even when they create an error, they can create the error in a consistent manner that we can go back and fix it and reprocess these things again real quickly. And the challenge that we have, basically, is how fast and how complex these things can be that we’re creating so much data that we can no longer measure the quality of what we’re doing very, very efficiently. So, we’re building systems now to help us deal with that even further right now. That they can actually go in there and statistically analyze whether or not our results are better than they were before, and whether or not the data has changed fast enough that we don’t even understand what’s happening.
So, we’re having to version data. We have to version software, we have to version models. All these things that require a huge degree of engineering effort, just to be able to make it work in a consistent manner over and over again.
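One common way to version data, software, and models together is to fingerprint each with a content hash and record the three in a single manifest per run. The Python sketch below assumes that approach; the interview does not say how Sfile does it, and `fingerprint`, `run_manifest`, and the sample inputs are hypothetical names and values.

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Short content hash: identical bytes always give the same id."""
    return hashlib.sha256(payload).hexdigest()[:12]


def run_manifest(data: bytes, code_version: str, model_params: dict) -> dict:
    """Record which data, code, and model produced a given result."""
    # Sorting keys makes the hash independent of dictionary ordering.
    model_bytes = json.dumps(model_params, sort_keys=True).encode("utf-8")
    manifest = {
        "data": fingerprint(data),
        "code": code_version,
        "model": fingerprint(model_bytes),
    }
    # The run id changes whenever any one of the three inputs changes.
    manifest["run_id"] = fingerprint(
        json.dumps(manifest, sort_keys=True).encode("utf-8")
    )
    return manifest


# Hypothetical inputs for illustration only.
before = run_manifest(b"well,ip\nSmith 4H,1120\n", "v1.4.0", {"depth": 6, "trees": 200})
after = run_manifest(b"well,ip\nSmith 4H,1125\n", "v1.4.0", {"depth": 6, "trees": 200})
print(before["run_id"], after["run_id"])
```

With a manifest like this stored next to every result, a consistent error can be traced to the exact data, code, and model that produced it, and only the affected runs need to be reprocessed.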

Jay Curry: I got to ask. You gotta have a gazillion PhDs because every element of this is state of the art. It’s outside of the box thinking, and it’s pulling it all together and in my day, we used to bring teams together to build a software. You’re bringing people together to build a software that’s going to do the software, kind of. Does that make sense? I mean, you’re not even working on the data, you’re working on the artificial intelligence that’s gonna work on the data.

Frank Perez: That’s right. That’s what we’re doing. We do have a great team with PhDs on our team. But to be honest with you, having a team assembled like that is actually kind of a … it’s luck. We’re in this really interesting position at Sfile that we had the fortunate opportunity that the right mix of people came together at the right time. We’ve had computer scientists who are coming into a job marketplace where we were basically in a recession of all places. We had the petroleum industry that, basically, was going through their own contraction as well too. So, we had all these people with years of history that were looking for something new to do as well, too. And then machine learning, and cloud computing was all coming about the same time.
We’ve basically had the vision to be able to attract great people early on that gave us the ability to have success. And that success as well, too, was translated by also having great clients who gave us an opportunity to look through all their rows of terabytes and terabytes of data to be able to make something of it, so that they can actually benefit from the same things as well, too. So, all of these things just came in at the right place, and the right time and it’s just a magical thing that’s happening right now. And our success, it’s just been really nice to see. After having been a participant in many startups, knowing that that’s actually kind of luck more than anything else, I really feel like we’re on something that’s very, very big. Huge.

Jay Curry: Boy, it’s exciting. Yeah.

George Walden: Well, I was fascinated by the fact that you’re not only taking what you’re doing in the oil field industry and creating your intelligence. You’re also taking competing models and looking at how competing markets are going to affect whether or not you should be doing these ventures. I think that’s fascinating. How did you get to that?

Frank Perez: Well, there was a point in time, and I think you probably heard this by now, but we’ve always said it. If you’re not a software business now, you will be; you just don’t know it. We’ve all heard it and that’s true.

Jay Curry: It is.

Frank Perez: So, my whole career has been about automating all these things like that. When you think about every venture that I’ve ever done and every problem we try to solve, you build upon a same thing to take it to the next place. And you translate that same challenge into the challenge you’re looking at right now, and you’re figuring out how you can do even better. But now you have a second iteration, third iteration, tenth iteration to try to do it all over again.

Jay Curry: Wow. Frank, thank you so much. We’ve been talking to Frank Perez, CEO of Sfile. Now, this is pretty interesting stuff. How does somebody go learn more about it? Do you have a website, do you have a-

Frank Perez: Yes.

Jay Curry: Where would you tell them to go?

Frank Perez: Well, you can visit us at, which you’ll find more about our company. But we still try to keep it a little bit stealthy. But I think as well too, there’s a lot of journals out there about big data, in the petroleum industry as well too. You’re gonna see a lot of different variations of things happening. We’re not the only ones. A lot of the leading operators are really, really taking advantage of this technology and succeeding at it.

Jay Curry: Well, we’ve spent our time on this one. But I know you’re a serial entrepreneur. We’re gonna have you back, we’re gonna talk about pumps and pipes, and we’re gonna … there’s just a lot of stuff we need to be talking about. So, we’re gonna have you back. Thank you for joining us.

Frank Perez: And thank you.

Jay Curry: Folks, we’re gonna have to cut this off ’cause somebody insists that we pay a few bills. So, don’t go anywhere. We’re gonna pay a couple bills and we’ll be right back.

About the Author
Jay W. Curry

Along with hosting “Texas Business Radio”, Jay is a Professional Certified Coach and Master Chair facilitating four Houston-based Vistage peer groups. In addition to being a best-selling non-fiction author, the 2015 release of his award-winning novel, Nixon and Dovey: the Legend Returns, adds novelist to his title. Jay holds a BS in Mathematics from Oklahoma State and an MS in Computer Science from Kansas State.
