.. _python:
.. public
Introduction to Python and pygeostat
====================================
Introduction
------------
This documentation gives a brief overview of python for researchers and
affiliates of CCG. This documentation is *not* an introduction to python
in general, but is designed to help with:
- deciding whether using python would be valuable for your work
- applying python to geostatistics problems using CCG and GSLIB
software
- the basics of using python and pygeostat to analyze data, visualize
results and interface with Paraview
Python is an interpreted, high-level, general-purpose programming and
scripting language which is very well suited to many of the common
scripting tasks where bash or Matlab would typically be used, and many
tasks where an entire Fortran program might have previously been used.
The huge collection of libraries for Python covering machine learning,
statistics, data management and optimization makes it very useful for
geostatistical workflows. There are many excellent tutorials for Python
3 available (http://docs.python.org/3/tutorial/introduction.html) which
do a much better job introducing the language than I would.
Pygeostat is a Python module using Fortran libraries which was written
to facilitate working with GeoEAS data files (GSLIB format), CSV data
files, and GSLIB and CCG software. The integrated Fortran libraries
make pygeostat extremely quick to use for many applications, such as
variogram calculation and modeling, which require no parameter files.
For other GSLIB and CCG programs, generic interface routines are
provided to interface with the programs using standard string
interpolation techniques familiar to anyone who has scripted with Bash.
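The string interpolation used by these interface routines is plain Python. A minimal sketch (the template text and the ``{datafl}``/``{outfl}`` placeholder names here are illustrative, not actual pygeostat keywords):

```python
# A hypothetical parameter-file template; {datafl} and {outfl} are
# placeholders filled in by str.format
template = '''START OF PARAMETERS:
{datafl}    -file with data
{outfl}     -file for output
'''

# Interpolate concrete file names into the template
parstr = template.format(datafl='oilsands.dat', outfl='histplt.ps')
print(parstr)
```

The same template can be reused for any number of data files or variables by calling ``format`` with different arguments.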
Loading Modules
---------------
The following code section is normally used as a header for scripts
using pygeostat. pygeostat is imported as well as the numerical package
numpy and the data analysis package pandas. The standard Python
plotting library matplotlib is also imported. The ``%matplotlib inline``
magic is used to include figures in this documentation rather than
saving images separately.
.. code:: python
'''pygeostat_examples.ipynb: Demonstrate usage of pygeostat'''
__author__ = 'Jared Deutsch'
__date__ = '2015-11-04'
__version__ = '4.0'
import pygeostat as gs
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Super quick introduction to python for CCG researchers
------------------------------------------------------
An extremely quick introduction to Python is given here. Python is a
high-level interpreted programming language. Variable assignment and
math are simple. The print function prints to the screen. Whitespace
(in the form of 4 spaces) is used for blocks such as if statements and
loops.
.. code:: python
print('Hello World!')
print('Math is easy and works as expected')
a = 1
b = 2
print('a =',a,'b =',b)
print('a + b =',a+b)
print('a/b+0.2 =',a/b+0.2)
if 1 > 2:
print('This statement is unreachable')
else:
print('Note the 4 spaces for block statements like if and loops')
print('Loops can use a list, or a range among other things!')
for i in [1,2,10]:
print(i)
print('Note that everything starts from 0 in python unlike Fortran')
for i in range(3):
print(i)
mystr = 'Hello World!'
print('Lots of helpful options are built in such as string splitting:',mystr.split())
.. parsed-literal::
Hello World!
Math is easy and works as expected
a = 1 b = 2
a + b = 3
a/b+0.2 = 0.7
Note the 4 spaces for block statements like if and loops
Loops can use a list, or a range among other things!
1
2
10
Note that everything starts from 0 in python unlike Fortran
0
1
2
Lots of helpful options are built in such as string splitting: ['Hello', 'World!']
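Functions and list comprehensions round out the basics; this short sketch (not part of pygeostat) shows both:

```python
def grade_label(grade, cutoff=7.0):
    """Return 'above' or 'below' for a grade value relative to a cutoff."""
    if grade >= cutoff:
        return 'above'
    return 'below'

# List comprehensions build a new list from an existing one in a single line
labels = [grade_label(g) for g in [7.378, 6.808, 10.657]]
print(labels)  # ['above', 'below', 'above']
```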
Importing GSLIB files to python with pygeostat
----------------------------------------------
We will use the oilsands.dat training data set for this example. A
pygeostat DataFile object is used to store the data, and relevant
parameters such as the names of x, y and z coordinates. The DataFile
class stores data by default in a Pandas DataFrame called 'data'. Pandas
provides the 'head()' function to print the first few rows of a
DataFrame - we can see that the oilsands data has been read into this object.
.. code:: python
datafl = gs.DataFile(flname='../../examples/data/oilsands.dat',
dh='Drillhole Number',x='East',y='North',z='Elevation')
datafl.data.head()
==  ================  ====  ========  =========  =======  ======  =========  ===========
..  Drillhole Number  East  North     Elevation  Bitumen  Fines   Chlorides  Facies Code
==  ================  ====  ========  =========  =======  ======  =========  ===========
0   2                 1245  10687.09  257.5      7.378    28.784  -9         -9
1   2                 1245  10687.09  254.5      9.176    22.897  -9         -9
2   2                 1245  10687.09  251.5      11.543   15.144  -9         -9
3   2                 1245  10687.09  248.5      6.808    30.598  -9         -9
4   2                 1245  10687.09  245.5      10.657   18.011  -9         -9
==  ================  ====  ========  =========  =======  ======  =========  ===========
There are a number of special variables in the DataFile class including
the drillhole identifier (dh), and coordinates x, y and z. These can be
accessed by name, or number. Recall that as python uses zero-ordering
(the same as C and C++), column 1 in python refers to column 2 in
GSLIB/Fortran one-ordering.
.. code:: python
print('\nFirst couple rows of Z:\n', datafl.data[datafl.z].head())
print('\nFirst couple rows of DH:\n', datafl.data['Drillhole Number'].head())
print('\nFirst couple rows of the second column: (column 2 if GSLIB/Fortran 1 ordered or 1'
' if Python/C/C++ 0 ordered)\n', datafl.data.iloc[:, 1].head())
.. parsed-literal::
First couple rows of Z:
0 257.5
1 254.5
2 251.5
3 248.5
4 245.5
Name: Elevation, dtype: float64
First couple rows of DH:
0 2
1 2
2 2
3 2
4 2
Name: Drillhole Number, dtype: int64
First couple rows of the second column: (column 2 if GSLIB/Fortran 1 ordered or 1 if Python/C/C++ 0 ordered)
0 1245
1 1245
2 1245
3 1245
4 1245
Name: East, dtype: float64
pygeostat provides a number of convenience functions. Slicing is easy
and unique categories can be identified.
.. code:: python
print('\nUnique categories can be identified, for example facies code:')
print(datafl.unique_cats('Facies Code'))
print('\nSlicing can be done as well, for example data in drill hole 3:')
print(datafl.data[datafl.data['Drillhole Number'] == 3])
print('\nThe bitumen and fines content for fines greater than 40 (first few rows)')
print(datafl.data[['Bitumen','Fines']][datafl.data['Fines'] > 40.0].head())
print('Creating new columns is simple, for example bitumen * fines - 200:')
datafl.data['Bitumen times Fines minus 200'] = datafl.data['Bitumen']*datafl.data['Fines']-200
print(datafl.data.head())
.. parsed-literal::
Unique categories can be identified, for example facies code:
[-9, 23, 30, 31, 40, 50, 60, 61, 69, 70]
Slicing can be done as well, for example data in drill hole 3:
Drillhole Number East North Elevation Bitumen Fines Chlorides \
18 3 1262.5 10821.3 242.5 11.378 15.659 -9
19 3 1262.5 10821.3 239.5 12.436 12.163 -9
20 3 1262.5 10821.3 236.5 8.900 23.700 -9
21 3 1262.5 10821.3 233.5 8.870 23.837 -9
22 3 1262.5 10821.3 230.5 9.733 21.057 -9
23 3 1262.5 10821.3 227.5 10.984 16.931 -9
24 3 1262.5 10821.3 224.5 11.113 16.549 -9
25 3 1262.5 10821.3 221.5 9.054 23.551 -9
26 3 1262.5 10821.3 218.5 15.029 5.716 -9
27 3 1262.5 10821.3 215.5 12.957 10.565 -9
28 3 1262.5 10821.3 212.5 13.589 8.537 -9
Facies Code
18 -9
19 -9
20 -9
21 -9
22 -9
23 -9
24 -9
25 -9
26 -9
27 -9
28 -9
The bitumen and fines content for fines greater than 40 (first few rows)
Bitumen Fines
54 2.900 43.500
73 2.465 44.838
74 0.000 52.900
75 0.000 52.900
76 0.000 52.900
Creating new columns is simple, for example bitumen * fines - 200:
Drillhole Number East North Elevation Bitumen Fines Chlorides \
0 2 1245 10687.09 257.5 7.378 28.784 -9
1 2 1245 10687.09 254.5 9.176 22.897 -9
2 2 1245 10687.09 251.5 11.543 15.144 -9
3 2 1245 10687.09 248.5 6.808 30.598 -9
4 2 1245 10687.09 245.5 10.657 18.011 -9
Facies Code Bitumen times Fines minus 200
0 -9 12.368352
1 -9 10.102872
2 -9 -25.192808
3 -9 8.311184
4 -9 -8.056773
Getting basic statistics from the data set can be done using the
describe command - here just bitumen and fines are summarized:
.. code:: python
datafl.data[['Bitumen','Fines']].describe()
=====  ===========  ===========
..     Bitumen      Fines
=====  ===========  ===========
count  5808.000000  5808.000000
mean   7.708852     28.707298
std    5.136709     21.247085
min    0.000000     0.861000
25%    2.877750     10.166000
50%    7.480000     24.453000
75%    12.666000    42.823250
max    18.428000    86.777000
=====  ===========  ===========
Correlation matrix:
.. code:: python
datafl.data[['Bitumen','Fines']].corr()
=======  =========  =========
..       Bitumen    Fines
=======  =========  =========
Bitumen  1.000000   -0.843863
Fines    -0.843863  1.000000
=======  =========  =========
Conditionally setting data in pandas is done by indexing the location
in the array with ``.loc``:
.. code:: python
# Set all facies -9 values to -999 values
datafl.data.loc[datafl.data['Facies Code'] == -9,'Facies Code'] = -999
datafl.data.head()
==  ================  ====  ========  =========  =======  ======  =========  ===========  ===============
..  Drillhole Number  East  North     Elevation  Bitumen  Fines   Chlorides  Facies Code  Bitumen above 7
==  ================  ====  ========  =========  =======  ======  =========  ===========  ===============
0   2                 1245  10687.09  257.5      7.378    28.784  -9         -999         1
1   2                 1245  10687.09  254.5      9.176    22.897  -9         -999         1
2   2                 1245  10687.09  251.5      11.543   15.144  -9         -999         1
3   2                 1245  10687.09  248.5      6.808    30.598  -9         -999         0
4   2                 1245  10687.09  245.5      10.657   18.011  -9         -999         1
==  ================  ====  ========  =========  =======  ======  =========  ===========  ===============
Saving data to CSV and VTK file formats
---------------------------------------
Working with data is only useful if it can be saved. Here, the data is
re-imported to get rid of the changes to the data set.
.. code:: python
datafl = gs.DataFile(flname='../../examples/data/oilsands.dat', dh='Drillhole Number', x='East',
y='North', z='Elevation')
We can create an indicator column which is 1 where the bitumen is above
7% and 0 otherwise.
.. code:: python
datafl.data['Bitumen above 7'] = np.zeros(len(datafl.data['Bitumen']))
datafl.data.loc[datafl.data['Bitumen'] >= 7.0,'Bitumen above 7'] = 1
datafl.data[['Bitumen','Bitumen above 7']].head()
==  =======  ===============
..  Bitumen  Bitumen above 7
==  =======  ===============
0   7.378    1
1   9.176    1
2   11.543   1
3   6.808    0
4   10.657   1
==  =======  ===============
This file can be saved as a GSLIB file or as a CSV file which can be
opened by Excel.
.. code:: python
outgslibfl = 'oilsands_out.dat'
outcsvfl = 'oilsands_out.csv'
datafl.writefile(flname=outgslibfl, fltype='GSLIB')
A subset of columns can be saved with any of these file types - just
saving X, Y and Z to a CSV file, for example:
.. code:: python
datafl.writefile(flname=outcsvfl, variables=['East','North','Elevation'], fltype='CSV')
A VTK file can also be created for visualization with Paraview. This
requires the DataFile class to have the X, Y and Z coordinates (``.x``,
``.y`` and ``.z`` attributes):
.. code:: python
outvtkfl = 'oilsands_out.vtk'
datafl.writefile(flname=outvtkfl, fltype='VTK')
.. parsed-literal::
The datafile can be saved as a VTK file: oilsands_out.vtk
A small screen capture from Paraview of the oil sands data points:
.. code:: python
from IPython.display import Image
datavtk = Image(filename='oilsands_vtk.png')
datavtk
.. image:: ./figures/oilsands_vtk.png
.. _classes:
Python Classes
--------------
Class objects can be viewed as containers for a structured set of variables and functions. As a
result of this controlled structure, where variables containing data or parameters are saved in a
specific format to a specific variable, functions can be used in a semi-automatic fashion. As such,
workflows can be greatly simplified by using classes!
Within the code of a class, variables stored within the class are initialized as
``self.`` attributes. The ``self`` namespace tells Python that the variable should remain accessible
to the user and to any other functions within the class. These ``self`` variables stored within the
class can be any permissible Python object, which includes but is not limited to: string, list,
dictionary, pandas DataFrame, float, integer, etc.
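As a minimal sketch (a made-up class, not part of pygeostat), storing variables on ``self`` and using them from a method looks like:

```python
class Histogram:
    """Toy class illustrating variables stored on self."""

    def __init__(self, data, nbins=20):
        # Variables assigned to self remain accessible to the user
        # and to the other functions within the class
        self.data = data
        self.nbins = nbins

    def binwidth(self):
        # Methods can use the stored self variables semi-automatically,
        # so nothing needs to be passed in
        return (max(self.data) - min(self.data)) / self.nbins

hist = Histogram([0.0, 5.0, 10.0], nbins=10)
print(hist.data)        # call a stored variable: replace self with hist
print(hist.binwidth())  # 1.0
```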
As an example, we've already started a :class:`gs.DataFile` class above with the
oilsands.dat training data set, saved as ``datafl``.
On the user end, to call the data stored within the ``datafl`` object, you replace
``self`` with ``datafl``. For example, let's check out the first few lines that were loaded from
the data file:
.. code:: python
datafl.data.head()
==  ================  ====  ========  =========  =======  ======  =========  ===========  ===============
..  Drillhole Number  East  North     Elevation  Bitumen  Fines   Chlorides  Facies Code  Bitumen above 7
==  ================  ====  ========  =========  =======  ======  =========  ===========  ===============
0   2                 1245  10687.09  257.5      7.378    28.784  -9         -9           1
1   2                 1245  10687.09  254.5      9.176    22.897  -9         -9           1
2   2                 1245  10687.09  251.5      11.543   15.144  -9         -9           1
3   2                 1245  10687.09  248.5      6.808    30.598  -9         -9           0
4   2                 1245  10687.09  245.5      10.657   18.011  -9         -9           1
==  ================  ====  ========  =========  =======  ======  =========  ===========  ===============
Within the source code for :class:`gs.DataFile`, the
column IDs for the x, y, and z data are saved within the class as they
were specified when it was created. To view them you can call
them as:
.. code:: python
print('The x data column ID is: ', datafl.x)
print('The y data column ID is: ', datafl.y)
print('The z data column ID is: ', datafl.z)
.. parsed-literal::
The x data column ID is: East
The y data column ID is: North
The z data column ID is: Elevation
Much like variables, functions unique to the class are stored within it, meaning they
are only accessible through the class object. Just as you would call a stored variable, you call
functions as an extension of the class namespace. For example, when using a
:class:`gs.Variogram`, you can calculate the experimental variograms by calling:
.. code:: python
vario.varcalc()
Within pygeostat there are numerous classes set up. These include:
* :class:`gs.DataFile`
* :class:`gs.GridDef`
* :class:`gs.Variogram`
* :class:`gs.VariogramModel`
* :class:`gs.Program`
Using python + pygeostat for GSLIB scripting
--------------------------------------------
Python (and pygeostat to make things easier) can be used for scripting
together GSLIB and CCG programs. Normally we would use bash for scripts,
but python is very useful for complicated scripts, particularly for
visualization and keeping track of numerous domains, variables and data
files. Python can also be used to call bash scripts or execute shell
code (and vice versa).
The standard method for running a GSLIB program with python + pygeostat
is to create a :class:`gs.Program` object for each program that will be used. We
can create a string with variables (such as ``{datafl}``) which can then be
interpolated. A small example:
.. code:: python
histpltpar = ''' Parameters for HISTPLT
**********************
START OF PARAMETERS:
{datafl} -file with data
{varcol} 0 - columns for variable and weight
-1.0 1.0e21 - trimming limits
{outfl} -file for PostScript output
0.0 -20.0 -attribute minimum and maximum
-1.0 -frequency maximum (<0 for automatic)
20 -number of classes
0 -0=arithmetic, 1=log scaling
0 -0=frequency, 1=cumulative histogram
0 - number of cum. quantiles (<0 for all)
3 -number of decimal places (<0 for auto.)
{varname} -title
1.5 -positioning of stats (L to R: -1 to 1)
-1.1e21 -reference value for box plot
'''
histplt = gs.Program(program='histplt', parfile='histplt.par')
print('Basic operations like calling programs with string interpolation is fast')
print('A convenience "GSCOL" function is provided for getting the column given a name')
histplt.run(parstr=histpltpar.format(datafl=datafl.flname,varcol=datafl.gscol('Bitumen'),
varname='Bitumen',outfl='histplt_bitumen.ps'))
.. parsed-literal::
Basic operations like calling programs with string interpolation is fast
A convenience "GSCOL" function is provided for getting the column given a name
Calling: ['histplt', 'histplt.par']
HISTPLT Version: 2.905
data file = ../../examples/data/oilsands.dat
columns = 5 0
trimming limits = -1.000000 1.000000E+21
output file = histplt_bitumen.ps
attribute limits = 0.000000E+00 -20.000000
frequency limit = -1.000000
number of classes = 20
log scale option = 0
cumulative frequency option = 0
number of decimal places = 3
title = Bitumen
position of stats = 1.500000
reference value = -1.100000E+21
There are 5808 data with:
mean value = 7.70886
median value = 7.48000
standard deviation = 5.13627
minimum and maximum = .00000 18.42800
HISTPLT Version: 2.905 Finished
Stop - Program terminated.
Parameters can also be saved in a dictionary and used that way. In
addition, looping over multiple variables is easy - they can simply be
referred to by name.
.. code:: python
print('Parameters can also be saved in a dictionary and used that way')
mypars = {'datafl':datafl.flname,
'varcol':datafl.gscol('Fines'),
'varname':'Fines',
'outfl':'histplt_fines.ps'}
histplt.run(parstr=histpltpar.format(**mypars))
print('Looping over multiple variables is easy with this approach')
for variable in ['Bitumen','Fines']:
mypars = {'datafl':datafl.flname,
'varcol':datafl.gscol(variable),
'varname':variable,
'outfl':'histplt_'+variable+'.ps'}
histplt.run(parstr=histpltpar.format(**mypars))
.. parsed-literal::
Parameters can also be saved in a dictionary and used that way
Calling: ['histplt', 'histplt.par']
HISTPLT Version: 2.905
data file = ../../examples/data/oilsands.dat
columns = 6 0
trimming limits = -1.000000 1.000000E+21
output file = histplt_fines.ps
attribute limits = 0.000000E+00 -20.000000
frequency limit = -1.000000
number of classes = 20
log scale option = 0
cumulative frequency option = 0
number of decimal places = 3
title = Fines
position of stats = 1.500000
reference value = -1.100000E+21
There are 5808 data with:
mean value = 28.70733
median value = 24.45354
standard deviation = 21.24525
minimum and maximum = .86100 86.77700
HISTPLT Version: 2.905 Finished
Stop - Program terminated.
Looping over multiple variables is easy with this approach
Calling: ['histplt', 'histplt.par']
HISTPLT Version: 2.905
data file = ../../examples/data/oilsands.dat
columns = 5 0
trimming limits = -1.000000 1.000000E+21
output file = histplt_Bitumen.ps
attribute limits = 0.000000E+00 -20.000000
frequency limit = -1.000000
number of classes = 20
log scale option = 0
cumulative frequency option = 0
number of decimal places = 3
title = Bitumen
position of stats = 1.500000
reference value = -1.100000E+21
There are 5808 data with:
mean value = 7.70886
median value = 7.48000
standard deviation = 5.13627
minimum and maximum = .00000 18.42800
HISTPLT Version: 2.905 Finished
Stop - Program terminated.
Calling: ['histplt', 'histplt.par']
HISTPLT Version: 2.905
data file = ../../examples/data/oilsands.dat
columns = 6 0
trimming limits = -1.000000 1.000000E+21
output file = histplt_Fines.ps
attribute limits = 0.000000E+00 -20.000000
frequency limit = -1.000000
number of classes = 20
log scale option = 0
cumulative frequency option = 0
number of decimal places = 3
title = Fines
position of stats = 1.500000
reference value = -1.100000E+21
There are 5808 data with:
mean value = 28.70733
median value = 24.45354
standard deviation = 21.24525
minimum and maximum = .86100 86.77700
HISTPLT Version: 2.905 Finished
Stop - Program terminated.
Grid definitions can be used the same way as other parameters, i.e. a
string that we interpolate into parameter files, but saving it as a
pygeostat GridDef class helps with saving out the file as a VTK file for
visualization with Paraview. An example using the grid definition class
and calling kt3dn:
.. code:: python
gridstr = '''35 500. 100. -nx, xmin, xsize
60 5000. 100. -ny, ymin, ysize
15 145. 10. -nz, zmin, zsize'''
kt3dnpar = ''' Parameters for KT3DN
********************
START OF PARAMETERS:
../../examples/data/oilsands.dat
{cols} 0 -columns for DH,X,Y,Z,var,sec var
-1.0e21 1.0e21 - trimming limits
0 -option: 0=grid, 1=cross, 2=jackknife
NOFILE -file with jackknife data
1 2 0 5 0 - columns for X,Y,Z,vr and sec var
0 100 -debugging level: 0,3,5,10; max data for GSKV
temp
{outfl}
{griddef}
4 4 4 -x,y and z block discretization
2 40 -min, max data for kriging
0 0 -max per octant, max per drillhole (0-> not used)
4000.0 4000.0 4000.0 -maximum search radii
0.0 0.0 0.0 -angles for search ellipsoid
1 7.71 0.6 -0=SK,1=OK,2=LVM(resid),3=LVM((1-w)*m(u))),4=colo,5=exdrift, skmean, corr
0 0 0 0 0 0 0 0 0 -drift: x,y,z,xx,yy,zz,xy,xz,zy
0 -0, variable; 1, estimate trend
NOFILE -gridded file with drift/mean
4 - column number in gridded file
2 0.08 -nst, nugget effect
2 0.25 40.0 0.0 0.0 -it,cc,ang1,ang2,ang3
400.0 300.0 25.0 -a_hmax, a_hmin, a_vert
2 0.67 40.0 0.0 0.0 -it,cc,ang1,ang2,ang3
8000.0 4500.0 30.0 -a_hmax, a_hmin, a_vert
'''
print('For grids it is useful to save the grid definition which can be loaded from a string or specified as nx=,ny=,...')
griddef = gs.GridDef(grid_str=gridstr)
print(str(griddef))
print('Values can be accessed directly using this way, for example nx =',griddef.nx)
print('This can be used for scripting, for example with kt3dn:')
kt3dn = gs.Program(program='kt3dn', parstr=kt3dnpar, parfile='kt3dn.par')
kt3dnpars ={'datafl':datafl.flname,
'griddef':str(griddef),
'outfl':'kt3dn.out',
'cols':datafl.gscol([datafl.dh,datafl.x,datafl.y,datafl.z,'Bitumen'],string=True)}
print('Note that the string representation of columns can be returned using the GSCOL function for a list of variables:',
kt3dnpars['cols'])
kt3dn.run(parstr=kt3dn.parstr.format(**kt3dnpars))
.. parsed-literal::
For grids it is useful to save the grid definition which can be loaded from a string or specified as nx=,ny=,...
35 500.0 100.0
60 5000.0 100.0
15 145.0 10.0
Values can be accessed directly using this way, for example nx = 35
This can be used for scripting, for example with kt3dn:
Note that the string representation of columns can be returned using the GSCOL function for a list of variables: 1 2 3 4 5
Calling: ['kt3dn', 'kt3dn.par']
KT3DN Version: 4.202
data file = ../../examples/data/oilsands.dat
columns = 1 2 3 4 5
0
trimming limits = -1.0000000E+21 1.0000000E+21
kriging option = 0
jackknife data file = NOFILE
columns = 1 2 0 5 0
debugging level = 0
debugging file = temp
output file = kt3dn.out
nx, xmn, xsiz = 35 500.0000 100.0000
ny, ymn, ysiz = 60 5000.000 100.0000
nz, zmn, zsiz = 15 145.0000 10.00000
block discretization: 4 4 4
ndmin,ndmax = 2 40
max per octant = 0
max per drillhole = 0
search radii = 4000.000 4000.000 4000.000
search anisotropy angles = 0.0000000E+00 0.0000000E+00 0.0000000E+00
ktype, skmean = 1 7.710000
drift terms = 0 0 0 0 0
0 0 0 0
itrend = 0
external drift file = NOFILE
variable in external drift file = 4
nst, c0 = 2 7.9999998E-02
it,cc,ang[1,2,3]; 2 0.2500000 40.00000 0.0000000E+00
0.0000000E+00
a1 a2 a3: 400.0000 300.0000 25.00000
it,cc,ang[1,2,3]; 2 0.6700000 40.00000 0.0000000E+00
0.0000000E+00
a1 a2 a3: 8000.000 4500.000 30.00000
Checking the data set for duplicates
No duplicates found
Data for KT3D: Variable number 5
Number = 5808
Average = 7.708860
Variance = 26.38120
Presorting the data along an arbitrary vector
Data was presorted with angles: 12.50000 12.50000 12.50000
Setting up rotation matrices for variogram and search
Setting up super block search strategy
Working on the kriging
currently on estimate 3150
currently on estimate 6300
currently on estimate 9450
currently on estimate 12600
currently on estimate 15750
currently on estimate 18900
currently on estimate 22050
currently on estimate 25200
currently on estimate 28350
currently on estimate 31500
KT3DN Version: 4.202 Finished
Saving gridded GSLIB files in VTK format
----------------------------------------
Gridded files can be saved out in VTK format after loading with
pygeostat, making sure to specify a grid definition.
.. code:: python
krigedfl = gs.DataFile(flname='kt3dn.out', readfl=True, dftype='grid',
griddef=griddef)
After that, just specify VTK as the file type, the same as before:
.. code:: python
krigedfl.writefile(flname='kt3dn_out.vtk',fltype='vtk')
A small screen capture from Paraview of the kriged grid:
.. code:: python
from IPython.display import Image
krigedvtk = Image(filename='kt3dn_vtk.png')
krigedvtk
.. image:: ./figures/kt3dn_vtk.png
Parallel GSLIB scripting with pygeostat
---------------------------------------
Running a GSLIB program in parallel using pygeostat is straightforward
using a list of parameter dictionaries. Suppose we wanted histograms of
bitumen, fines and chlorides for this data set. Of course, it would be
faster to just run these one after another but this quick task will be
used to demonstrate how parallelization with pygeostat can be used. A
list of argument dictionaries is created - we specify a different
parameter file name (for example histplt\_bitumen.par,
histplt\_chlorides.par) for each variable and different parameter
strings (string representations of parameter files).
There is another consideration. If GSLIB programs attempt to
simultaneously open and read a file, only one process will get access to
the file, causing the other program calls to fail. We can get around this
by specifying a "testfilename" as a parameter. Before executing each
parallel process, pygeostat will attempt to open and read the file. If
the file is busy, pygeostat will wait a short period of time and try
again. This works in most cases, but will not work for massive data
files, or programs with long run times which open a data file many times,
such as trans\_trend.
.. code:: python
# Create an empty list of GSLIB calling parameters, callpars
callpars = []
# For each variable we want to run in parallel, assemble a dictionary of parameters and append to callpars
for variable in ['Bitumen','Fines','Chlorides']:
# Establish the parameter file for this variable
mypars = {'datafl':datafl.flname,
'varcol':datafl.gscol(variable),
'varname':variable,
'outfl':'histplt_'+variable+'.ps'}
# Assemble the arguments for the GSLIB call and add the arguments to the list of calls
callpars.append({'parstr':histpltpar.format(**mypars),
'parfile':'histplt_'+variable+'.par',
'testfilename':datafl.flname})
# Now we can run in parallel:
histplt = gs.Program(program='histplt', parfile='histplt.par')
gs.runparallel(histplt, callpars)
.. parsed-literal::
Creating parallel processes
Pool assembled, asynchronously processing
Asynchronous execution finished.
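Under the hood, this style of asynchronous execution can be sketched with the standard library's ``multiprocessing`` module; the ``run_program`` helper below is a hypothetical stand-in for a GSLIB call, not pygeostat's actual implementation:

```python
from multiprocessing import Pool

def run_program(pars):
    # Hypothetical stand-in for launching a GSLIB executable; here we
    # just echo the parameter file name instead of calling a program
    return 'ran ' + pars['parfile']

if __name__ == '__main__':
    # One dictionary of call arguments per variable, as with gs.runparallel
    callpars = [{'parfile': 'histplt_' + var + '.par'}
                for var in ['Bitumen', 'Fines', 'Chlorides']]
    # A pool of 3 workers processes the calls in parallel
    with Pool(3) as pool:
        results = pool.map(run_program, callpars)
    print(results)
```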
Desurveying and compositing
---------------------------
There is limited functionality in pygeostat to perform linear
desurveying and compositing of raw data:
.. code:: python
# Read in the collars
collarfl = gs.DataFile(flname='./collars.csv',readfl=True,x='Easting',
y='Northing',z='Elevation',dh='Drillhole')
print('Collars:\n',collarfl.data.head())
# Surveys
surveyfl = gs.DataFile(flname='./downhole.csv',readfl=True,dh='Drillhole')
print('\nSurveys:\n',surveyfl.data.head())
# Raw data file
rawdatafl = gs.DataFile(flname='./rawdata.csv',readfl=True,dh='Drillhole',
ifrom='From',ito='To')
print('\nRaw data file:\n',rawdatafl.data.head())
The raw data will first be composited to 15 meter lengths using the most
frequent category for rock type and a linear average for grade.
.. code:: python
comps = gs.utils.desurvey.set_comps(rawdatafl,15.0)
print('Composite lengths to use:\n',comps)
upscaled = gs.utils.desurvey.get_comps(comps,rawdatafl,{'Rock Type':'categorical',
'Grade':'continuous'})
rawdatafl.origdata = rawdatafl.data.copy() # Save a copy of the original data just in case...
rawdatafl.data = upscaled # Make the upscaled data the primary data set
print('\nComposited data:\n',upscaled)
.. parsed-literal::
Composite lengths to use:
Drillhole From To
0 DH101 0 15
1 DH101 15 30
2 DH101 30 45
3 DH101 45 60
4 DH101 60 75
5 DH101 75 90
6 DH101 90 105
7 DH102 0 15
8 DH102 15 30
9 DH102 30 45
10 DH102 45 60
11 DH102 60 75
12 DH102 75 90
13 DH102 90 105
14 DH102 105 120
15 DH102 120 135
16 DH102 135 150
Composited data:
Grade Rock Type Drillhole From To
0 5.370400 2 DH101 0 15
1 4.765147 1 DH101 15 30
2 5.151280 1 DH101 30 45
3 5.127600 2 DH101 45 60
4 4.936727 3 DH101 60 75
5 4.376967 3 DH101 75 90
6 4.437598 3 DH101 90 105
7 4.876240 4 DH102 0 15
8 4.926200 4 DH102 15 30
9 5.072747 4 DH102 30 45
10 4.294660 4 DH102 45 60
11 5.264047 4 DH102 60 75
12 5.272420 3 DH102 75 90
13 5.004067 4 DH102 90 105
14 4.906380 2 DH102 105 120
15 4.464113 4 DH102 120 135
16 4.276362 3 DH102 135 150
Now these composites will be desurveyed, with X, Y and Z values assigned
to the midpoints of the composites.
.. code:: python
drillholes = gs.utils.set_desurvey(collarfl,surveyfl,'Along','Azimuth','Inclination')
print('Drillhole objects are created which can be used to desurvey:')
print(drillholes['DH101'])
print(drillholes['DH102'])
.. parsed-literal::
Drillhole objects are created which can be used to desurvey:
Drillhole DH101 collared at (1867.0,12022.0,342.0)
Drillhole DH102 collared at (1927.0,12880.0,335.0)
.. code:: python
gs.utils.get_desurvey(rawdatafl,drillholes,x=collarfl.x,y=collarfl.y,z=collarfl.z)
print(rawdatafl.data.head())
.. parsed-literal::
Grade Rock Type Drillhole From To Easting Northing \
0 5.370400 2 DH101 0 15 1867.715534 12023.239341
1 4.765147 1 DH101 15 30 1869.369505 12025.832072
2 5.151280 1 DH101 30 45 1871.068058 12028.447613
3 5.127600 2 DH101 45 60 1872.766610 12031.063155
4 4.936727 3 DH101 60 75 1874.442209 12033.693319
Elevation
0 334.637796
1 319.956883
2 305.284669
3 290.612455
4 275.940241
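The linear desurvey itself is simple trigonometry. A hedged sketch, independent of pygeostat's implementation and assuming azimuth is measured in degrees clockwise from north and inclination in degrees down from horizontal (pygeostat's conventions may differ):

```python
import math

def desurvey_offsets(along, azimuth, inclination):
    # Offsets (dx, dy, dz) for a straight segment of length `along`,
    # assuming azimuth in degrees clockwise from north and inclination
    # in degrees down from horizontal
    az = math.radians(azimuth)
    incl = math.radians(inclination)
    horiz = along * math.cos(incl)    # horizontal projection of the segment
    return (horiz * math.sin(az),     # easting offset
            horiz * math.cos(az),     # northing offset
            -along * math.sin(incl))  # elevation offset (down is negative)

# A vertical hole (90 degrees down) only loses elevation
dx, dy, dz = desurvey_offsets(15.0, 0.0, 90.0)
print(round(dx, 6), round(dy, 6), round(dz, 6))  # 0.0 0.0 -15.0
```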
The data file has been desurveyed; the last step is to convert the
alphanumeric drill hole IDs to numeric values and save the data out for
working with GSLIB programs.
.. code:: python
# Convert to numeric drill hole IDs
dhdict = {'DH101':101,
'DH102':102}
rawdatafl.applydict('Drillhole','DH',dhdict)
print(rawdatafl.data.head())
# Save out as VTK
rawdatafl.writefile(flname='desurveyed_data.vtk',variables=['Rock Type','Grade','DH'],fltype='vtk')
.. parsed-literal::
Grade Rock Type Drillhole From To Easting Northing \
0 5.370400 2 DH101 0 15 1867.715534 12023.239341
1 4.765147 1 DH101 15 30 1869.369505 12025.832072
2 5.151280 1 DH101 30 45 1871.068058 12028.447613
3 5.127600 2 DH101 45 60 1872.766610 12031.063155
4 4.936727 3 DH101 60 75 1874.442209 12033.693319
Elevation DH
0 334.637796 101
1 319.956883 101
2 305.284669 101
3 290.612455 101
4 275.940241 101
A small screen capture of the desurveyed data:
.. code:: python
from IPython.display import Image
desurveyedvtk = Image(filename='desurveyed_data_vtk.png')
desurveyedvtk
.. image:: ./figures/desurveyed_data_vtk.png
Writing out a subset of data values
-----------------------------------
Write out just a subset of the data, such as grades for a specific
category.
.. code:: python
print('\nThe bitumen and fines content for fines greater than 40 (first few rows)')
small_dataset = datafl.data[['Bitumen','Fines']][datafl.data['Fines'] > 40.0]
print(small_dataset.head())
datafl.writefile(flname='bit_and_fines_gr_40.dat',fltype='GSLIB',data=small_dataset)
.. parsed-literal::
The bitumen and fines content for fines greater than 40 (first few rows)
Bitumen Fines
54 2.900 43.500
73 2.465 44.838
74 0.000 52.900
75 0.000 52.900
76 0.000 52.900
Summary
-------
This documentation has demonstrated some of the capabilities of Python +
pygeostat. As with all of the tasks we do, there are many ways to
accomplish the same thing. Advantages of using Python are that it is a
powerful, high-level language with hundreds of packages for everything
from machine learning algorithms to plotting, and a very large user base,
which means answers to any question about the language are only a Google
search away. Everything presented here is also 100% free and open
source, unlike many other tools such as Matlab and commercial software.
There are other advantages for us: Python can dynamically link with C
and Fortran code, manage parallel processes across many computers, send
you an email when your script is completed... The primary disadvantage
is that it is not widely used within CCG so there is limited support and
CCG-specific reference material available. This is a significant
disadvantage. The depth of reference material and example scripts which
are available for Bash and Matlab within CCG is substantial. Python
provides just one more alternative!