Lara Arikan: Using Machine Learning to Predict Mixed Fluid Properties

Authors: Lara Arikan, Livia Fulchignoni, Daniel M. Tartakovsky

The increasing demand for faster oil and gas flow simulations can be satisfied in various ways. One of them is the use of proxy models with low computational cost. In this sense, Black Oil fluid modeling can be thought of as a simplified version of the more rigorous compositional approach. However, not all reservoir fluid compositions can be modeled through the Black Oil approach with reasonable accuracy. This paper investigates when the empirical Black Oil formulation can be used to model reservoir fluids and which set of equations better represents a particular composition.
A total of 8 traditional Black Oil equations for the gas solubility ratio (Rs) and 4 for the oil formation volume factor (Bo) were tested against 1626 experimental data points from a collection of 197 Brazilian fluid samples. These data were extracted from a large set of PVT reports through an automatic data mining procedure. First, samples were separated into two groups according to the quality of the Black Oil predictions. For samples that could be appropriately represented by the equations, patterns that dictate the best correlation for each property were observed. A k-nearest neighbors classifier was trained to predict the best set of correlations for a reservoir fluid characterized by its API gravity (API), gas density (dg), gas-oil ratio (GOR), bubble point pressure (Pb), and carbon dioxide molar content. It achieved 57% accuracy for the Bo and 71% accuracy for the Rs model predictions. However, plotting these characteristics against the “best” class of model for each property shows that in many cases the errors associated with the competing models are very close to one another. Therefore, a better approach might be to train a multi-label classifier that predicts several applicable models, to account for comparable accuracy levels.
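
The difference between a single "best" label and a multi-label target can be illustrated with a small sketch. It is not the authors' code: the model names, the error values, and the `tolerance` threshold are all hypothetical, chosen only to show how two correlations with nearly identical errors could both be marked as applicable.

```python
def best_model(errors):
    """Single-label target: the model with the lowest error for one sample."""
    return min(errors, key=errors.get)


def applicable_models(errors, tolerance=0.05):
    """Multi-label target: every model whose error is within `tolerance`
    of the lowest error for this sample.

    `errors` maps a (hypothetical) model name to its error against the
    measured Bo or Rs value; `tolerance` is an assumed threshold.
    """
    lowest = min(errors.values())
    return sorted(name for name, err in errors.items() if err - lowest <= tolerance)


# Made-up errors for one fluid sample; the names are placeholders.
sample_errors = {"model_A": 0.031, "model_B": 0.034, "model_C": 0.120}

single = best_model(sample_errors)
multi = applicable_models(sample_errors, tolerance=0.01)
print(single, multi)
```

With a single label, `model_B` counts as a wrong answer even though its error is almost the same as that of `model_A`; the multi-label target treats both as acceptable.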

6 Comments on “Lara Arikan: Using Machine Learning to Predict Mixed Fluid Properties”

  1. Hi Lara,
    Nice job. Two questions. What was the most interesting thing that you learned about physics in doing the research? Also, do you think that running more than one model would be implemented by oil companies?
    – Jenny

    • Hi Jenny,
      Thanks for your comment! I think what I meant was more along the lines that a better classifier would identify multiple possible “best” models, and a company could choose from among them. In either case, I’m certain oil companies do already run more than one model to check which works best for a given simulation; my aim was to help them run fewer, having provided them with a small subset of the best models already. The most interesting thing I learned was certainly the methodology itself: I picked up many of the techniques I used on the go, which is why the slide for future work is so dense. There is plenty for me to learn before I can tackle the suggestions I have made myself.

  2. Hi Lara,
    Excellent job! Could you talk a bit more about the process of fitting the KNN model? Do you think it is possible to further improve the accuracy of the KNN?

    • Hi Livia! I’m very glad for your praise. The process was as follows: the data points I worked with were measurements of environmental conditions along with values for Bo and Rs for various petroleum samples. I ran these measurements through the equations that I was evaluating to see which equation gave the lowest error compared to the measured Bo or Rs value, depending on which property I was investigating at the time. This equation was the “label” for the classifier. Here I relied largely on scikit-learn to do the heavy lifting for me, and simply fed it the measurements as a tuple along with their label. I looped through many k-values and preserved only the k of highest accuracy; I validated this accuracy by testing with a random fifth of the dataset each time. More data would certainly improve the KNN’s performance.
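
The fitting loop described above might look roughly like the following sketch with scikit-learn. It is an illustration under stated assumptions, not the code used in the study: the data are synthetic stand-ins for the five fluid characteristics and the "best equation" labels, the helper name `best_k` is hypothetical, and the feature scaling step is an added assumption (the comment does not say whether features were scaled).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def best_k(features, labels, k_values, n_repeats=10, seed=0):
    """Return (k, mean accuracy) for the k with the highest mean accuracy
    over repeated random 80/20 train/test splits."""
    mean_acc = {}
    for k in k_values:
        scores = []
        for r in range(n_repeats):
            # Hold out a random fifth of the data for testing.
            X_tr, X_te, y_tr, y_te = train_test_split(
                features, labels, test_size=0.2, random_state=seed + r
            )
            # Scaling is an assumption: KNN distances are sensitive to units.
            model = make_pipeline(
                StandardScaler(), KNeighborsClassifier(n_neighbors=k)
            )
            model.fit(X_tr, y_tr)
            scores.append(model.score(X_te, y_te))
        mean_acc[k] = float(np.mean(scores))
    k_top = max(mean_acc, key=mean_acc.get)
    return k_top, mean_acc[k_top]


# Synthetic stand-in: 200 samples, 5 features, 2 label classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

k_best, acc = best_k(X, y, range(1, 16))
print(k_best, round(acc, 3))
```

One caveat with this kind of loop: an accuracy that is also used to pick k tends to be optimistic, so a separate held-out set or nested cross-validation would give a more conservative estimate.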

    • Hello Jason! It would be interesting to have an algorithm produce actual values of Bo and Rs based on previous instances rather than rely on an equation to predict them for us. As for how exactly one can manage this, I would like to learn that myself. Hopefully I will soon be able to give you a more involved answer!
