.. _python:

.. public

Introduction to Python and pygeostat
====================================

Introduction
------------

This documentation gives a brief overview of python for researchers and
affiliates of CCG. This documentation is *not* an introduction to python in
general, but is designed to help with:

- deciding whether using python would be valuable for your work
- applying python to geostatistics problems using CCG and GSLIB software
- the basics of using python and pygeostat to analyze data, visualize results
  and interface with Paraview

Python is an interpreted, high-level, general-purpose programming and
scripting language which is very well suited to many of the common scripting
tasks where bash or matlab would typically be used, and to many tasks where an
entire Fortran program might previously have been written. The huge collection
of python libraries covering machine learning, statistics, data management and
optimization makes it very useful for geostatistical workflows. There are many
excellent tutorials for Python 3 available
(http://docs.python.org/3/tutorial/introduction.html) which do a much better
job introducing the language than I would.

Pygeostat is a python module using fortran libraries which was written to
facilitate working with GeoEAS data files (GSLIB format), CSV data files, and
GSLIB and CCG software. The integrated fortran libraries make pygeostat
extremely quick to use for many applications, such as variogram calculation
and modeling, requiring no parameter files. For other GSLIB and CCG programs,
generic interface routines are provided which build parameter files using
standard string interpolation techniques familiar to anyone who has scripted
with Bash.

Loading Modules
---------------

The following code section is normally used as a header for scripts using
pygeostat. pygeostat is imported, as well as the numerical package numpy and
the data analysis package pandas. The standard python plotting library
matplotlib is also imported.
The ``%matplotlib inline`` magic is used to include figures in this
documentation rather than saving images separately.

.. code:: python

    '''pygeostat_examples.ipynb: Demonstrate usage of pygeostat'''
    __author__ = 'Jared Deutsch'
    __date__ = '2015-11-04'
    __version__ = '4.0'

    import pygeostat as gs
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline

Super quick introduction to python for CCG researchers
------------------------------------------------------

An extremely quick introduction to python is given here. Python is a
high-level interpreted programming language. Variable assignment and math are
simple. The print function prints to the screen. Whitespace (in the form of 4
spaces) delimits blocks such as if statements or loops.

.. code:: python

    print('Hello World!')
    print('Math is easy and works as expected')
    a = 1
    b = 2
    print('a =',a,'b =',b)
    print('a + b =',a+b)
    print('a/b+0.2 =',a/b+0.2)
    if (1>2):
        print('This statement is unreachable')
    else:
        print('Note the 4 spaces for block statements like if and loops')
    print('Loops can use a list, or a range among other things!')
    for i in [1,2,10]:
        print(i)
    print('Note that everything starts from 0 in python unlike Fortran')
    for i in range(3):
        print(i)
    mystr = 'Hello World!'
    print('Lots of helpful options are built in such as string splitting:',mystr.split())

.. parsed-literal::

    Hello World!
    Math is easy and works as expected
    a = 1 b = 2
    a + b = 3
    a/b+0.2 = 0.7
    Note the 4 spaces for block statements like if and loops
    Loops can use a list, or a range among other things!
    1
    2
    10
    Note that everything starts from 0 in python unlike Fortran
    0
    1
    2
    Lots of helpful options are built in such as string splitting: ['Hello', 'World!']

Importing GSLIB files to python with pygeostat
----------------------------------------------

We will use the oilsands.dat training data set for this example.
A pygeostat DataFile object is used to store the data along with relevant
parameters such as the names of the x, y and z coordinate columns. The
DataFile class stores data by default in a Pandas DataFrame called ``data``.
Pandas provides the ``head()`` function to print the first few rows of a
DataFrame - we can see that the oilsands data is read in to this object.

.. code:: python

    datafl = gs.DataFile(flname='../../examples/data/oilsands.dat',
                         dh='Drillhole Number',x='East',y='North',z='Elevation')
    datafl.data.head()

.. parsed-literal::

       Drillhole Number  East     North  Elevation  Bitumen   Fines  Chlorides  Facies Code
    0                 2  1245  10687.09      257.5    7.378  28.784         -9           -9
    1                 2  1245  10687.09      254.5    9.176  22.897         -9           -9
    2                 2  1245  10687.09      251.5   11.543  15.144         -9           -9
    3                 2  1245  10687.09      248.5    6.808  30.598         -9           -9
    4                 2  1245  10687.09      245.5   10.657  18.011         -9           -9
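The GeoEAS/GSLIB format read above is simple plain text: a title line, the number of variables, one variable name per line, then whitespace-delimited values. As a sketch of what pygeostat handles for us, a minimal stand-in reader can be written with pandas alone (the helper name ``read_gslib`` below is hypothetical, not a pygeostat function):

.. code:: python

    import io
    import pandas as pd

    def read_gslib(path_or_buffer):
        """Minimal GeoEAS/GSLIB reader sketch: title line, number of
        variables, one variable name per line, then whitespace-delimited
        data."""
        fh = open(path_or_buffer) if isinstance(path_or_buffer, str) else path_or_buffer
        title = fh.readline().strip()
        nvar = int(fh.readline().split()[0])
        names = [fh.readline().strip() for _ in range(nvar)]
        data = pd.read_csv(fh, sep=r'\s+', header=None, names=names)
        return title, data

    # Tiny in-memory example file with two variables and two rows
    example = io.StringIO("Example data\n2\nEast\nNorth\n1245 10687.09\n1250 10690.0\n")
    title, df = read_gslib(example)
    print(title, list(df.columns), len(df))

This ignores GSLIB trimming limits and special values, which pygeostat's DataFile handles properly.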
There are a number of special variables in the DataFile class including the
drillhole identifier (dh), and coordinates x, y and z. These can be accessed
by name, or by number. Recall that as python uses zero-ordering (the same as C
and C++), column 1 in python refers to column 2 in GSLIB/Fortran one-ordering.

.. code:: python

    print('\nFirst couple rows of Z:\n', datafl.data[datafl.z].head())
    print('\nFirst couple rows of DH:\n', datafl.data['Drillhole Number'].head())
    print('\nFirst couple rows of the second column: (column 2 if GSLIB/Fortran 1 ordered or 1'
          ' if Python/C/C++ 0 ordered)\n', datafl.data.icol(1).head())

.. parsed-literal::

    First couple rows of Z:
    0    257.5
    1    254.5
    2    251.5
    3    248.5
    4    245.5
    Name: Elevation, dtype: float64

    First couple rows of DH:
    0    2
    1    2
    2    2
    3    2
    4    2
    Name: Drillhole Number, dtype: int64

    First couple rows of the second column: (column 2 if GSLIB/Fortran 1 ordered or 1 if Python/C/C++ 0 ordered)
    0    1245
    1    1245
    2    1245
    3    1245
    4    1245
    Name: East, dtype: float64

pygeostat provides a number of convenience functions which can be used.
Slicing is easy and unique categories can be identified.

.. code:: python

    print('\nUnique categories can be identified, for example facies code:')
    print(datafl.unique_cats('Facies Code'))
    print('\nSlicing can be done as well, for example data in drill hole 3:')
    print(datafl.data[datafl.data['Drillhole Number'] == 3])
    print('\nThe bitumen and fines content for fines greater than 40 (first few rows)')
    print(datafl.data[['Bitumen','Fines']][datafl.data['Fines'] > 40.0].head())
    print('Creating new columns is simple, for example bitumen * fines - 200:')
    datafl.data['Bitumen times Fines minus 200'] = datafl.data['Bitumen']*datafl.data['Fines']-200
    print(datafl.data.head())

.. parsed-literal::

    Unique categories can be identified, for example facies code:
    [-9, 23, 30, 31, 40, 50, 60, 61, 69, 70]

    Slicing can be done as well, for example data in drill hole 3:
        Drillhole Number    East    North  Elevation  Bitumen   Fines  Chlorides  \
    18                 3  1262.5  10821.3      242.5   11.378  15.659         -9
    19                 3  1262.5  10821.3      239.5   12.436  12.163         -9
    20                 3  1262.5  10821.3      236.5    8.900  23.700         -9
    21                 3  1262.5  10821.3      233.5    8.870  23.837         -9
    22                 3  1262.5  10821.3      230.5    9.733  21.057         -9
    23                 3  1262.5  10821.3      227.5   10.984  16.931         -9
    24                 3  1262.5  10821.3      224.5   11.113  16.549         -9
    25                 3  1262.5  10821.3      221.5    9.054  23.551         -9
    26                 3  1262.5  10821.3      218.5   15.029   5.716         -9
    27                 3  1262.5  10821.3      215.5   12.957  10.565         -9
    28                 3  1262.5  10821.3      212.5   13.589   8.537         -9

        Facies Code
    18           -9
    19           -9
    20           -9
    21           -9
    22           -9
    23           -9
    24           -9
    25           -9
    26           -9
    27           -9
    28           -9

    The bitumen and fines content for fines greater than 40 (first few rows)
        Bitumen   Fines
    54    2.900  43.500
    73    2.465  44.838
    74    0.000  52.900
    75    0.000  52.900
    76    0.000  52.900

    Creating new columns is simple, for example bitumen * fines - 200:
       Drillhole Number  East     North  Elevation  Bitumen   Fines  Chlorides  \
    0                 2  1245  10687.09      257.5    7.378  28.784         -9
    1                 2  1245  10687.09      254.5    9.176  22.897         -9
    2                 2  1245  10687.09      251.5   11.543  15.144         -9
    3                 2  1245  10687.09      248.5    6.808  30.598         -9
    4                 2  1245  10687.09      245.5   10.657  18.011         -9

       Facies Code  Bitumen times Fines minus 200
    0           -9                      12.368352
    1           -9                      10.102872
    2           -9                     -25.192808
    3           -9                       8.311184
    4           -9                      -8.056773

Getting basic statistics from the data set can be done using the describe
command - here just bitumen and fines are summarized:

.. code:: python

    datafl.data[['Bitumen','Fines']].describe()

.. parsed-literal::

               Bitumen        Fines
    count  5808.000000  5808.000000
    mean      7.708852    28.707298
    std       5.136709    21.247085
    min       0.000000     0.861000
    25%       2.877750    10.166000
    50%       7.480000    24.453000
    75%      12.666000    42.823250
    max      18.428000    86.777000
Correlation matrix:

.. code:: python

    datafl.data[['Bitumen','Fines']].corr()

.. parsed-literal::

              Bitumen     Fines
    Bitumen  1.000000 -0.843863
    Fines   -0.843863  1.000000
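By default, ``DataFrame.corr()`` computes the Pearson correlation coefficient; the same matrix can be cross-checked with ``np.corrcoef``. A standalone sketch on synthetic stand-ins for the bitumen/fines columns (the data below is illustrative only, not the oilsands data):

.. code:: python

    import numpy as np
    import pandas as pd

    # Synthetic, negatively correlated stand-ins for Bitumen and Fines
    rng = np.random.default_rng(0)
    bitumen = rng.uniform(0.0, 18.0, size=200)
    fines = 60.0 - 2.5 * bitumen + rng.normal(0.0, 5.0, size=200)
    df = pd.DataFrame({'Bitumen': bitumen, 'Fines': fines})

    # DataFrame.corr() uses Pearson by default; np.corrcoef agrees
    pandas_r = df.corr().loc['Bitumen', 'Fines']
    numpy_r = np.corrcoef(df['Bitumen'], df['Fines'])[0, 1]
    print(pandas_r)

``corr(method='spearman')`` or ``method='kendall'`` give rank-based alternatives when outliers are a concern.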
Conditionally setting data in Pandas requires indexing the location in the
array:

.. code:: python

    # Set all facies -9 values to -999 values
    datafl.data.loc[datafl.data['Facies Code'] == -9,'Facies Code'] = -999
    datafl.data.head()

.. parsed-literal::

       Drillhole Number  East     North  Elevation  Bitumen   Fines  Chlorides  Facies Code  Bitumen above 7
    0                 2  1245  10687.09      257.5    7.378  28.784         -9         -999                1
    1                 2  1245  10687.09      254.5    9.176  22.897         -9         -999                1
    2                 2  1245  10687.09      251.5   11.543  15.144         -9         -999                1
    3                 2  1245  10687.09      248.5    6.808  30.598         -9         -999                0
    4                 2  1245  10687.09      245.5   10.657  18.011         -9         -999                1
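The conditional assignment pattern above is worth isolating: the comparison builds a boolean mask, and ``.loc[mask, column]`` assigns in place. A self-contained sketch on illustrative values (not the oilsands data):

.. code:: python

    import pandas as pd

    # Synthetic stand-in for the 'Facies Code' column
    df = pd.DataFrame({'Facies Code': [-9, 23, -9, 40]})

    # A boolean mask selects the rows to change; .loc assigns in place
    mask = df['Facies Code'] == -9
    df.loc[mask, 'Facies Code'] = -999
    print(df['Facies Code'].tolist())  # [-999, 23, -999, 40]

``Series.replace(-9, -999)`` would give the same result here, but the ``.loc`` pattern generalizes to conditions on one column while assigning another.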
Saving data to CSV and VTK file formats
---------------------------------------

Working with data is only useful if it can be saved. Here, the data is
re-imported to discard the changes made to the data set.

.. code:: python

    datafl = gs.DataFile(flname='../../examples/data/oilsands.dat',
                         dh='Drillhole Number', x='East', y='North', z='Elevation')

We can create an indicator column which is 1 where the bitumen is above 7% and
0 otherwise.

.. code:: python

    datafl.data['Bitumen above 7'] = np.zeros(len(datafl.data['Bitumen']))
    datafl.data.loc[datafl.data['Bitumen'] >= 7.0,'Bitumen above 7'] = 1
    datafl.data[['Bitumen','Bitumen above 7']].head()

.. parsed-literal::

       Bitumen  Bitumen above 7
    0    7.378                1
    1    9.176                1
    2   11.543                1
    3    6.808                0
    4   10.657                1
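The two-step indicator construction above can also be written as a one-liner: the comparison yields a boolean Series, and casting it to int gives the 0/1 indicator directly. A standalone sketch using the values from the table above:

.. code:: python

    import pandas as pd

    # Stand-in for the bitumen column (values from the table above)
    df = pd.DataFrame({'Bitumen': [7.378, 9.176, 11.543, 6.808, 10.657]})

    # Comparison yields a boolean Series; casting to int gives 0/1 indicators
    df['Bitumen above 7'] = (df['Bitumen'] >= 7.0).astype(int)
    print(df['Bitumen above 7'].tolist())  # [1, 1, 1, 0, 1]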
This file can be saved as a GSLIB file or as a CSV file which can be opened by
Excel.

.. code:: python

    outgslibfl = 'oilsands_out.dat'
    outcsvfl = 'oilsands_out.csv'
    datafl.writefile(flname=outgslibfl, fltype='GSLIB')

.. parsed-literal::

    The datafile can be saved as a GSLIB file: oilsands_out.dat

Or as a CSV file which can be opened by Excel. A subset of columns can be
saved with any of these modes - just saving X, Y and Z for example:

.. code:: python

    datafl.writefile(flname=outcsvfl, variables=['East','North','Elevation'],
                     fltype='CSV')

A VTK file can also be created for visualization with Paraview. This requires
the DataFile class to have the X, Y and Z coordinates (.x, .y and .z
attributes).

.. code:: python

    outvtkfl = 'oilsands_out.vtk'
    datafl.writefile(flname=outvtkfl, fltype='VTK')

.. parsed-literal::

    The datafile can be saved as a VTK file: oilsands_out.vtk

A small screen capture from paraview of the oil sands data points:

.. code:: python

    from IPython.display import Image
    datavtk = Image(filename='oilsands_vtk.png')
    datavtk

.. image:: ./figures/oilsands_vtk.png

.. _classes:

Python Classes
--------------

Class objects can be viewed as containers for a structured set of variables
and functions. Because data and parameters are saved in a specific format to
specific variables, the class's functions can be used in a semi-automatic
fashion. As such, work flows can be greatly simplified by using classes!

Within the code of a class, variables stored on the class are initialized as
``self.``. The ``self`` namespace tells python that the variable should remain
accessible to the user and to any other functions within the class. These
``self`` variables stored within the class can be any permissible python
object, which includes but is not limited to: string, list, dictionary, pandas
dataframe, float, integer, etc.
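As a minimal sketch of these ideas, consider the following toy class (hypothetical, not part of pygeostat) that stores data and coordinate column names the way DataFile does:

.. code:: python

    class DataContainer:
        """Toy illustration of a class; hypothetical, not part of pygeostat."""

        def __init__(self, data, x=None, y=None, z=None):
            # Anything assigned to self.<name> stays accessible on the instance
            self.data = data
            self.x = x
            self.y = y
            self.z = z

        def coordinate_columns(self):
            # Functions within the class can use the stored self variables
            return [col for col in (self.x, self.y, self.z) if col is not None]

    container = DataContainer(data=[1.0, 2.0], x='East', y='North', z='Elevation')
    print(container.x)                     # East
    print(container.coordinate_columns())  # ['East', 'North', 'Elevation']

On the user end, ``self`` is replaced by the instance name: what the class code calls ``self.x`` is accessed as ``container.x``.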
As an example, we've already started a :class:`gs.DataFile` class above with
the oilsands.dat training data set, saved as ``datafl``. On the user end, to
access the data stored within the ``datafl`` class, you replace ``self`` with
the instance name ``datafl``. For example, let's check out the first few lines
that were loaded from the data file:

.. code:: python

    datafl.data.head()

.. parsed-literal::

       Drillhole Number  East     North  Elevation  Bitumen   Fines  Chlorides  Facies Code  Bitumen above 7
    0                 2  1245  10687.09      257.5    7.378  28.784         -9           -9                1
    1                 2  1245  10687.09      254.5    9.176  22.897         -9           -9                1
    2                 2  1245  10687.09      251.5   11.543  15.144         -9           -9                1
    3                 2  1245  10687.09      248.5    6.808  30.598         -9           -9                0
    4                 2  1245  10687.09      245.5   10.657  18.011         -9           -9                1
Within the source code for :class:`gs.DataFile`, the column IDs for the x, y,
and z data are saved within the class as they were specified when it was
created. To view what they are you can call them as:

.. code:: python

    print('The x data column ID is: ', datafl.x)
    print('The y data column ID is: ', datafl.y)
    print('The z data column ID is: ', datafl.z)

.. parsed-literal::

    The x data column ID is:  East
    The y data column ID is:  North
    The z data column ID is:  Elevation

Much like variables, functions unique to the class are also stored within it,
meaning they are only accessible through the class object. So just as you
would call a stored variable, you call functions as an extension of the class
namespace. For example, when using a :class:`gs.Variogram`, you can calculate
the experimental variograms by calling:

.. code:: python

    vario.varcalc()

Within pygeostat there are numerous classes set up. These include:

* :class:`gs.DataFile`
* :class:`gs.GridDef`
* :class:`gs.Variogram`
* :class:`gs.VariogramModel`
* :class:`gs.Program`

Using python + pygeostat for GSLIB scripting
--------------------------------------------

Python (and pygeostat to make things easier) can be used for scripting
together GSLIB and CCG programs. Normally we would use bash for scripts, but
python is very useful for complicated scripts, particularly for visualization
and for keeping track of numerous domains, variables and data files. Python
can also be used to call bash scripts or execute shell code (and vice versa).

The standard method for running a GSLIB program with python + pygeostat is to
create a Program class for each program that will be used. We can create a
string with variables (such as ``{datafl}``) which can then be interpolated. A
small example:

.. code:: python

    histpltpar = '''      Parameters for HISTPLT
                          **********************

    START OF PARAMETERS:
    {datafl}            -file with data
    {varcol}   0        -   columns for variable and weight
    -1.0     1.0e21     -   trimming limits
    {outfl}             -file for PostScript output
    0.0      -20.0      -attribute minimum and maximum
    -1.0                -frequency maximum (<0 for automatic)
    20                  -number of classes
    0                   -0=arithmetic, 1=log scaling
    0                   -0=frequency,  1=cumulative histogram
    0                   -   number of cum. quantiles (<0 for all)
    3                   -number of decimal places (<0 for auto.)
    {varname}           -title
    1.5                 -positioning of stats (L to R: -1 to 1)
    -1.1e21             -reference value for box plot
    '''
    histplt = gs.Program(program='histplt', parfile='histplt.par')
    print('Basic operations like calling programs with string interpolation is fast')
    print('A convenience "GSCOL" function is provided for getting the column given a name')
    histplt.run(parstr=histpltpar.format(datafl=datafl.flname,varcol=datafl.gscol('Bitumen'),
                                         varname='Bitumen',outfl='histplt_bitumen.ps'))

.. parsed-literal::

    Basic operations like calling programs with string interpolation is fast
    A convenience "GSCOL" function is provided for getting the column given a name
    Calling: ['histplt', 'histplt.par']

    HISTPLT Version: 2.905

     data file = ../../examples/data/oilsands.dat
     columns = 5 0
     trimming limits = -1.000000 1.000000E+21
     output file = histplt_bitumen.ps
     attribute limits = 0.000000E+00 -20.000000
     frequency limit = -1.000000
     number of classes = 20
     log scale option = 0
     cumulative frequency option = 0
     number of decimal places = 3
     title = Bitumen
     position of stats = 1.500000
     reference value = -1.100000E+21

    There are 5808 data with:
     mean value = 7.70886
     median value = 7.48000
     standard deviation = 5.13627
     minimum and maximum = .00000 18.42800

    HISTPLT Version: 2.905 Finished

    Stop - Program terminated.

Parameters can also be saved in a dictionary and used that way. In addition,
looping over multiple variables is easy - they can simply be referred to by
name.
.. code:: python

    print('Parameters can also be saved in a dictionary and used that way')
    mypars = {'datafl':datafl.flname, 'varcol':datafl.gscol('Fines'),
              'varname':'Fines', 'outfl':'histplt_fines.ps'}
    histplt.run(parstr=histpltpar.format(**mypars))
    print('Looping over multiple variables is easy with this approach')
    for variable in ['Bitumen','Fines']:
        mypars = {'datafl':datafl.flname, 'varcol':datafl.gscol(variable),
                  'varname':variable, 'outfl':'histplt_'+variable+'.ps'}
        histplt.run(parstr=histpltpar.format(**mypars))

.. parsed-literal::

    Parameters can also be saved in a dictionary and used that way
    Calling: ['histplt', 'histplt.par']

    HISTPLT Version: 2.905

     data file = ../../examples/data/oilsands.dat
     columns = 6 0
     trimming limits = -1.000000 1.000000E+21
     output file = histplt_fines.ps
     attribute limits = 0.000000E+00 -20.000000
     frequency limit = -1.000000
     number of classes = 20
     log scale option = 0
     cumulative frequency option = 0
     number of decimal places = 3
     title = Fines
     position of stats = 1.500000
     reference value = -1.100000E+21

    There are 5808 data with:
     mean value = 28.70733
     median value = 24.45354
     standard deviation = 21.24525
     minimum and maximum = .86100 86.77700

    HISTPLT Version: 2.905 Finished

    Stop - Program terminated.
    Looping over multiple variables is easy with this approach
    Calling: ['histplt', 'histplt.par']

    HISTPLT Version: 2.905

     data file = ../../examples/data/oilsands.dat
     columns = 5 0
     trimming limits = -1.000000 1.000000E+21
     output file = histplt_Bitumen.ps
     attribute limits = 0.000000E+00 -20.000000
     frequency limit = -1.000000
     number of classes = 20
     log scale option = 0
     cumulative frequency option = 0
     number of decimal places = 3
     title = Bitumen
     position of stats = 1.500000
     reference value = -1.100000E+21

    There are 5808 data with:
     mean value = 7.70886
     median value = 7.48000
     standard deviation = 5.13627
     minimum and maximum = .00000 18.42800

    HISTPLT Version: 2.905 Finished

    Stop - Program terminated.
.. parsed-literal::

    Calling: ['histplt', 'histplt.par']

    HISTPLT Version: 2.905

     data file = ../../examples/data/oilsands.dat
     columns = 6 0
     trimming limits = -1.000000 1.000000E+21
     output file = histplt_Fines.ps
     attribute limits = 0.000000E+00 -20.000000
     frequency limit = -1.000000
     number of classes = 20
     log scale option = 0
     cumulative frequency option = 0
     number of decimal places = 3
     title = Fines
     position of stats = 1.500000
     reference value = -1.100000E+21

    There are 5808 data with:
     mean value = 28.70733
     median value = 24.45354
     standard deviation = 21.24525
     minimum and maximum = .86100 86.77700

    HISTPLT Version: 2.905 Finished

    Stop - Program terminated.

Grid definitions can be used the same way as other parameters, i.e. a string
that we interpolate into parameter files, but saving it as a pygeostat GridDef
class helps with saving out the file as a VTK file for visualization with
paraview. An example using the grid definition class and calling kt3dn:

.. code:: python

    gridstr = '''35 500. 100. -nx, xmin, xsize
    60 5000. 100. -ny, ymin, ysize
    15 145. 10. -nz, zmin, zsize'''
    kt3dnpar = '''      Parameters for KT3DN
                        ********************

    START OF PARAMETERS:
    ../../examples/data/oilsands.dat    -file with data
    {cols}    0                 -columns for DH,X,Y,Z,var,sec var
    -1.0e21   1.0e21            -   trimming limits
    0                           -option: 0=grid, 1=cross, 2=jackknife
    NOFILE                      -file with jackknife data
    1   2   0   5   0           -   columns for X,Y,Z,vr and sec var
    0   100                     -debugging level: 0,3,5,10; max data for GSKV
    temp
    {outfl}
    {griddef}
    4   4   4                   -x,y and z block discretization
    2   40                      -min, max data for kriging
    0   0                       -max per octant, max per drillhole (0-> not used)
    4000.0  4000.0  4000.0      -maximum search radii
    0.0  0.0  0.0               -angles for search ellipsoid
    1   7.71   0.6              -0=SK,1=OK,2=LVM(resid),3=LVM((1-w)*m(u))),4=colo,5=exdrift, skmean, corr
    0 0 0 0 0 0 0 0 0           -drift: x,y,z,xx,yy,zz,xy,xz,zy
    0                           -0, variable; 1, estimate trend
    NOFILE                      -gridded file with drift/mean
    4                           -   column number in gridded file
    2   0.08                    -nst, nugget effect
    2   0.25  40.0  0.0  0.0    -it,cc,ang1,ang2,ang3
    400.0  300.0  25.0          -a_hmax, a_hmin, a_vert
    2   0.67  40.0  0.0  0.0    -it,cc,ang1,ang2,ang3
    8000.0  4500.0  30.0        -a_hmax, a_hmin, a_vert
    '''
    print('For grids it is useful to save the grid definition which can be loaded from a string or specified as nx=,ny=,...')
    griddef = gs.GridDef(grid_str=gridstr)
    print(str(griddef))
    print('Values can be accessed directly using this way, for example nx =',griddef.nx)
    print('This can be used for scripting, for example with kt3dn:')
    kt3dn = gs.Program(program='kt3dn', parstr=kt3dnpar, parfile='kt3dn.par')
    kt3dnpars = {'datafl':datafl.flname, 'griddef':str(griddef), 'outfl':'kt3dn.out',
                 'cols':datafl.gscol([datafl.dh,datafl.x,datafl.y,datafl.z,'Bitumen'],string=True)}
    print('Note that the string representation of columns can be returned using the GSCOL function'
          ' for a list of variables:', kt3dnpars['cols'])
    kt3dn.run(parstr=kt3dn.parstr.format(**kt3dnpars))

.. parsed-literal::

    For grids it is useful to save the grid definition which can be loaded from a string or specified as nx=,ny=,...
    35 500.0 100.0
    60 5000.0 100.0
    15 145.0 10.0
    Values can be accessed directly using this way, for example nx = 35
    This can be used for scripting, for example with kt3dn:
    Note that the string representation of columns can be returned using the GSCOL function for a list of variables: 1 2 3 4 5
    Calling: ['kt3dn', 'kt3dn.par']

    KT3DN Version: 4.202

     data file = ../../examples/data/oilsands.dat
     columns = 1 2 3 4 5 0
     trimming limits = -1.0000000E+21 1.0000000E+21
     kriging option = 0
     jackknife data file = NOFILE
     columns = 1 2 0 5 0
     debugging level = 0
     debugging file = temp
     output file = kt3dn.out
     nx, xmn, xsiz = 35 500.0000 100.0000
     ny, ymn, ysiz = 60 5000.000 100.0000
     nz, zmn, zsiz = 15 145.0000 10.00000
     block discretization: 4 4 4
     ndmin,ndmax = 2 40
     max per octant = 0
     max per drillhole = 0
     search radii = 4000.000 4000.000 4000.000
     search anisotropy angles = 0.0000000E+00 0.0000000E+00 0.0000000E+00
     ktype, skmean = 1 7.710000
     drift terms = 0 0 0 0 0 0 0 0 0
     itrend = 0
     external drift file = NOFILE
     variable in external drift file = 4
     nst, c0 = 2 7.9999998E-02
     it,cc,ang[1,2,3]; 2 0.2500000 40.00000 0.0000000E+00 0.0000000E+00
     a1 a2 a3: 400.0000 300.0000 25.00000
     it,cc,ang[1,2,3]; 2 0.6700000 40.00000 0.0000000E+00 0.0000000E+00
     a1 a2 a3: 8000.000 4500.000 30.00000

    Checking the data set for duplicates
    No duplicates found

    Data for KT3D: Variable number 5
     Number   = 5808
     Average  = 7.708860
     Variance = 26.38120

    Presorting the data along an arbitrary vector
    Data was presorted with angles: 12.50000 12.50000 12.50000
    Setting up rotation matrices for variogram and search
    Setting up super block search strategy

    Working on the kriging
     currently on estimate 3150
     currently on estimate 6300
     currently on estimate 9450
     currently on estimate 12600
     currently on estimate 15750
     currently on estimate 18900
     currently on estimate 22050
     currently on estimate 25200
     currently on estimate 28350
     currently on estimate 31500

    KT3DN Version: 4.202 Finished

Saving gridded GSLIB files in VTK format
----------------------------------------

Gridded files can be saved out in VTK format after loading with pygeostat,
making sure to specify a grid definition.

.. code:: python

    krigedfl = gs.DataFile(flname='kt3dn.out',readfl='True',dftype='grid',
                           griddef=griddef)

After that just specify to save as VTK, same as before:

.. code:: python

    krigedfl.writefile(flname='kt3dn_out.vtk',fltype='vtk')

A small screen capture from paraview of the kriged grid:

.. code:: python

    from IPython.display import Image
    krigedvtk = Image(filename='kt3dn_vtk.png')
    krigedvtk

.. image:: ./figures/kt3dn_vtk.png

Parallel GSLIB scripting with pygeostat
---------------------------------------

Running a GSLIB program in parallel using pygeostat is straightforward using a
list of parameter dictionaries. Suppose we wanted histograms of bitumen, fines
and chlorides for this data set. Of course, it would be faster to just run
these one after another, but this quick task will be used to demonstrate how
parallelization with pygeostat can be used. A list of argument dictionaries is
created - we specify a different parameter file name (for example
histplt\_bitumen.par, histplt\_chlorides.par) for each variable and different
parameter strings (string representations of parameter files).

There is another consideration. If GSLIB programs attempt to simultaneously
open and read a file, only one process will get access to the file, causing
the other program calls to fail. We can get around this by specifying a
"testfilename" as a parameter. Before executing each parallel process,
pygeostat will attempt to open and read this file. If the file is busy,
pygeostat will wait a short period of time and try again. This works in most
cases, but will not work for massive data files, or for programs with long run
times which open a data file many times, such as trans\_trend.
.. code:: python

    # Create an empty list of GSLIB calling parameters, callpars
    callpars = []
    # For each variable we want to run in parallel, assemble a dictionary of
    # parameters and append to callpars
    for variable in ['Bitumen','Fines','Chlorides']:
        # Establish the parameter file for this variable
        mypars = {'datafl':datafl.flname, 'varcol':datafl.gscol(variable),
                  'varname':variable, 'outfl':'histplt_'+variable+'.ps'}
        # Assemble the arguments for the GSLIB call and add the arguments
        # to the list of calls
        callpars.append({'parstr':histpltpar.format(**mypars),
                         'parfile':'histplt_'+variable+'.par',
                         'testfilename':datafl.flname})
    # Now we can run in parallel:
    histplt = gs.Program(program='histplt', parfile='histplt.par')
    gs.runparallel(histplt, callpars)

.. parsed-literal::

    Creating parallel processes
    Pool assembled, asynchronously processing
    Asynchronous execution finished.

Desurveying and compositing
---------------------------

There is limited functionality in pygeostat to perform linear desurveying and
compositing of raw data.

.. code:: python

    # Read in the collars
    collarfl = gs.DataFile(flname='./collars.csv',readfl=True,x='Easting',
                           y='Northing',z='Elevation',dh='Drillhole')
    print('Collars:\n',collarfl.data.head())
    # Surveys
    surveyfl = gs.DataFile(flname='./downhole.csv',readfl=True,dh='Drillhole')
    print('\nSurveys:\n',surveyfl.data.head())
    # Raw data file
    rawdatafl = gs.DataFile(flname='./rawdata.csv',readfl=True,dh='Drillhole',
                            ifrom='From',ito='To')
    print('\nRaw data file:\n',rawdatafl.data.head())

The raw data will first be composited to 15 meter lengths using the most
frequent category for rock type and a linear average for grade.

.. code:: python

    comps = gs.utils.desurvey.set_comps(rawdatafl,15.0)
    print('Composite lengths to use:\n',comps)
    upscaled = gs.utils.desurvey.get_comps(comps,rawdatafl,{'Rock Type':'categorical',
                                                            'Grade':'continuous'})
    rawdatafl.origdata = rawdatafl.data.copy()  # Save a copy of the original data just in case...
    rawdatafl.data = upscaled  # Make the upscaled data the primary data set
    print('\nComposited data:\n',upscaled)

.. parsed-literal::

    Composite lengths to use:
       Drillhole  From   To
    0      DH101     0   15
    1      DH101    15   30
    2      DH101    30   45
    3      DH101    45   60
    4      DH101    60   75
    5      DH101    75   90
    6      DH101    90  105
    7      DH102     0   15
    8      DH102    15   30
    9      DH102    30   45
    10     DH102    45   60
    11     DH102    60   75
    12     DH102    75   90
    13     DH102    90  105
    14     DH102   105  120
    15     DH102   120  135
    16     DH102   135  150

    Composited data:
           Grade  Rock Type Drillhole  From   To
    0   5.370400          2     DH101     0   15
    1   4.765147          1     DH101    15   30
    2   5.151280          1     DH101    30   45
    3   5.127600          2     DH101    45   60
    4   4.936727          3     DH101    60   75
    5   4.376967          3     DH101    75   90
    6   4.437598          3     DH101    90  105
    7   4.876240          4     DH102     0   15
    8   4.926200          4     DH102    15   30
    9   5.072747          4     DH102    30   45
    10  4.294660          4     DH102    45   60
    11  5.264047          4     DH102    60   75
    12  5.272420          3     DH102    75   90
    13  5.004067          4     DH102    90  105
    14  4.906380          2     DH102   105  120
    15  4.464113          4     DH102   120  135
    16  4.276362          3     DH102   135  150

Now these composites will be desurveyed and X,Y,Z values assigned to the
midpoints of the composites.

.. code:: python

    drillholes = gs.utils.set_desurvey(collarfl,surveyfl,'Along','Azimuth','Inclination')
    print('Drillhole objects are created which can be used to desurvey:')
    print(drillholes['DH101'])
    print(drillholes['DH102'])

.. parsed-literal::

    Drillhole objects are created which can be used to desurvey:
    Drillhole DH101 collared at (1867.0,12022.0,342.0)
    Drillhole DH102 collared at (1927.0,12880.0,335.0)

.. code:: python

    gs.utils.get_desurvey(rawdatafl,drillholes,x=collarfl.x,y=collarfl.y,z=collarfl.z)
    print(rawdatafl.data.head())
.. parsed-literal::

          Grade  Rock Type Drillhole  From  To      Easting      Northing  \
    0  5.370400          2     DH101     0  15  1867.715534  12023.239341
    1  4.765147          1     DH101    15  30  1869.369505  12025.832072
    2  5.151280          1     DH101    30  45  1871.068058  12028.447613
    3  5.127600          2     DH101    45  60  1872.766610  12031.063155
    4  4.936727          3     DH101    60  75  1874.442209  12033.693319

        Elevation
    0  334.637796
    1  319.956883
    2  305.284669
    3  290.612455
    4  275.940241

The data file has been desurveyed; the last step is to convert the
alphanumeric drill hole IDs to numeric values and save the file out for
working with GSLIB programs.

.. code:: python

    # Convert to numeric drill hole IDs
    dhdict = {'DH101':101, 'DH102':102}
    rawdatafl.applydict('Drillhole','DH',dhdict)
    print(rawdatafl.data.head())
    # Save out as VTK
    rawdatafl.writefile(flname='desurveyed_data.vtk',variables=['Rock Type','Grade','DH'],fltype='vtk')

.. parsed-literal::

          Grade  Rock Type Drillhole  From  To      Easting      Northing  \
    0  5.370400          2     DH101     0  15  1867.715534  12023.239341
    1  4.765147          1     DH101    15  30  1869.369505  12025.832072
    2  5.151280          1     DH101    30  45  1871.068058  12028.447613
    3  5.127600          2     DH101    45  60  1872.766610  12031.063155
    4  4.936727          3     DH101    60  75  1874.442209  12033.693319

        Elevation   DH
    0  334.637796  101
    1  319.956883  101
    2  305.284669  101
    3  290.612455  101
    4  275.940241  101

A small screen capture of the desurveyed data:

.. code:: python

    from IPython.display import Image
    desurveyedvtk = Image(filename='desurveyed_data_vtk.png')
    desurveyedvtk

.. image:: ./figures/desurveyed_data_vtk.png

Writing out a subset of data values
-----------------------------------

Write out just a subset of the data, such as grades for a specific category.

.. code:: python

    print('\nThe bitumen and fines content for fines greater than 40 (first few rows)')
    small_dataset = datafl.data[['Bitumen','Fines']][datafl.data['Fines'] > 40.0]
    print(small_dataset.head())
    datafl.writefile(flname='bit_and_fines_gr_40.dat',fltype='GSLIB',data=small_dataset)
.. parsed-literal::

    The bitumen and fines content for fines greater than 40 (first few rows)
        Bitumen   Fines
    54    2.900  43.500
    73    2.465  44.838
    74    0.000  52.900
    75    0.000  52.900
    76    0.000  52.900

Summary
-------

This documentation has demonstrated some of the capabilities of Python +
pygeostat. As with all of the tasks we do, there are many ways to accomplish
the same thing. Advantages of using Python are that it is a powerful,
high-level language with hundreds of packages for everything from machine
learning algorithms to plotting, and a very large userbase, which means
answers to nearly any question about the language are only a google away.
Everything presented here is also 100% free and open source, unlike many other
tools such as Matlab and commercial software. There are other advantages for
us: Python can dynamically link with C and Fortran code, manage parallel
processes across many computers, and even send you an email when your script
is completed. The primary disadvantage is that Python is not widely used
within CCG, so there is limited support and CCG-specific reference material
available. This is a significant disadvantage: the depth of reference material
and example scripts available for Bash and Matlab within CCG is substantial.
Python provides just one more alternative!