This notebook gives a brief overview of python and pygeostat for researchers and affiliates of CCG. This notebook is not an introduction to python in general, but is designed to help with:
Python is an interpreted, high level general purpose programming and scripting language which is very well suited to many of the common scripting tasks where bash or matlab would typically be used, and many tasks where an entire Fortran program might have previously been used. The huge collection of libraries for python covering machine learning, statistics, data management and optimization make it very useful for geostatistical workflows. There are many excellent tutorials for Python available which do a much better job introducing the language than this notebook would.
Pygeostat is a python module using fortran libraries which was written to facilitate working with GeoEAS data files (GSLIB format), CSV data files, and GSLIB and CCG software. The integrated fortran libraries making pygeostat extremely quick to use for many applications, such as variogram calculation and modeling requiring no parameter files. For other GSLIB and CCG programs, generic interface routines are provided to interface with the programs using standard string interpolation techniques familiar to anyone who has scripted with Bash.
If you wish to run the code in this notebook yourself, a scientific python distribution and pygeostat are required. Otherwise, all output is shown in the notebook below each cell so you can see the output without install python and running the notebook yourself. The Anaconda distribution of Python is recommended and is compatible with Windows, Mac and unix distributions.
To quickly setup python and pygeostat:
Download and run the Anaconda graphical installer of Python 3.* from http://continuum.io/downloads. Have the installer add python to the path. Test that the install worked by opening up a command prompt (cygwin or cmd) and running 'python --version'. The install version should be reported.
Install pygeostat using Python packaging Index (PyPI)
pip install pygeostat
Follow the examples provided in this document
import os, sys
import pygeostat as gs
import matplotlib.pyplot as plt
exe_dir="../pygeostat/executable/"
An extremely quick introduction to python is given here. Python is a high level interpreted programming language. Variable assignment and math is simple. The print operator prints to the screen. Whitespace (in the form of 4 spaces) is used for blocks such as if statements, or loops.
print('Hello World!')
print('Math is easy and works as expected')
a = 1
b = 2
print('a =',a,'b =',b)
print('a + b =',a+b)
print('a/b+0.2 =',a/b+0.2)
if (1>2):
print('This statement is unreachable')
else:
print('Note the 4 spaces for block statements like if and loops')
print('Loops can use a list, or a range among other things!')
for i in [1,2,10]:
print(i)
print('Note that everything starts from 0 in python unlike Fortran')
for i in range(3):
print(i)
mystr = 'Hello World!'
print('Lots of helpful options are built in such as string splitting:',mystr.split())
c, d = 2.561, 10.2
print('c = {}, d = {}'.format(c, d))
print('c = {:.1f}, d = {:.2e}'.format(c, d))
Pygeostat allows for default settings to be specified. This provides added convenience, since many arguments may be left to defaults and not specified repeatedly throughout the course of a notebook/project.
Also, pygeostat has a default PlotStyle that uses Matplotlib's rcParams dictionary to setup a consistent plot style throughout the course of a notebook/project.
# Setting the cat dictionary
gs.Parameters['data.catdict'] = {1: 'Inside', 0: 'Outside'}
gs.Parameters.describe('data.catdict')
gs.Parameters
gs.Parameters['data.griddef'] = gs.GridDef('''
120 5.0 10.0
110 1205.0 10.0
1 0.5 1.0''')
print(gs.Parameters['data.griddef'])
gs.Parameters.describe('data.griddef')
# Default plot settings
gs.PlotStyle['font.size'] = 12
gs.PlotStyle['figure.figsize'] = (10, 10)
We will use the oilsands training data set for this example. A pygeostat DataFile object is used to store the data, and relevant parameters such as the names of x, y and z coordinates. The DataFile class stores data by default in a Pandas DataFrame called 'data'.
Pygeostat contains a collection of example data sets under ~\pygeostat\data\example_data that can be access through ExampleData.
datafl = gs.ExampleData('oilsands')
datafl.head()
datafl.describe()
datafl.variables
datafl.xyz
Working with data is only useful if it can be saved.
out_dir = 'Output'
gs.mkdir(out_dir)
outgslibfl = os.path.join(out_dir,'oilsands_out.dat')
outcsvfl = os.path.join(out_dir,'oilsands_out.csv')
outhdf5fl = os.path.join(out_dir,'oilsands_out.hdf5')
print('\nThe datafile can be saved as a GSLIB file:',outgslibfl)
datafl.write_file(flname=outgslibfl, fltype='GSLIB')
print('\nOr as a hdf5 file which can be opened by Excel')
print('A subset of columns, can be saved with any of these modes - just saving X, Y and Z for example:',outhdf5fl)
datafl.write_file(flname=outhdf5fl, variables=['East','North','Elevation'])
print('\nOr as a CSV file which can be opened by Excel')
print('A subset of columns, can be saved with any of these modes - just saving X, Y and Z for example:',outcsvfl)
datafl.write_file(flname=outcsvfl, variables=['East','North','Elevation'], fltype='CSV')
outvtkfl = os.path.join(out_dir,'oilsands_out.vtk')
print('\nThe datafile can be saved as a VTK file:',outvtkfl)
datafl.write_file(flname=outvtkfl, fltype='VTK')
Explore the spatial aspect of the data using location_plot
datafl = gs.DataFile(outgslibfl)
gs.location_plot(datafl, var='Fines')
gs.location_plot(datafl, var='Bitumen', orient='xz', aspect = 10)
Calculate and visualize data spacing
datafl.spacing(n_nearest=5)
datafl.head()
gs.location_plot(datafl, var='Data Spacing (m)', s=50, cmap='hot')
Visulaize multivariate relationships using pygeostat scatter_plots function that infers variables form the data file and calcultae the bivariate KDE
_ = gs.scatter_plots(datafl)
gridstr = '''40 0.5 1 -nx, xmin, xsize
40 0.5 1 -ny, ymin, ysize
40 0.5 1 -nz, zmin, zsize'''
griddef = gs.GridDef(grid_str=gridstr)
krigfl = gs.ExampleData('3d_grid', griddef=griddef)
krigfl.describe()
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15,15))
plt.subplots_adjust(wspace=0.4)
gs.slice_plot(krigfl, cbar=True, ax=ax1)
gs.slice_plot(krigfl.data['value'], griddef, cbar=True, ax=ax2, cmap='inferno')
ax = gs.slice_plot(krigfl.data['value'],griddef)
_ = gs.contour_plot(krigfl.data['value'],griddef,ax=ax)
kt3dn = gs.Program(program=exe_dir+'kt3dn', getpar=True)
# use pygeosta to infer grid definition
datafl.infergriddef(blksize=[100,100,10])
parstr = """ Parameters for KT3DN
********************
START OF PARAMETERS:
{flname} -file with data
{cols} 0 - columns for DH,X,Y,Z,var,sec var
-998.0 1.0e21 - trimming limits
0 -option: 0=grid, 1=cross, 2=jackknife
xvk.dat -file with jackknife data
1 2 0 3 0 - columns for X,Y,Z,vr and sec var
kt3dn_dataspacing.out -data spacing analysis output file (see note)
0 15.0 - number to search (0 for no dataspacing analysis, rec. 10 or 20) and composite length
0 100 0 -debugging level: 0,3,5,10; max data for GSKV;output total weight of each data?(0=no,1=yes)
kt3dn.dbg-nkt3dn.sum -file for debugging output (see note)
{outfl} -file for kriged output (see GSB note)
{griddef}
1 1 1 -x,y and z block discretization
4 40 12 1 -min, max data for kriging,upper max for ASO,ASO incr
0 0 -max per octant, max per drillhole (0-> not used)
8000.0 8000.0 400.0 -maximum search radii
0.0 0.0 0.0 -angles for search ellipsoid
1 -0=SK,1=OK,2=LVM(resid),3=LVM((1-w)*m(u))),4=colo,5=exdrift,6=ICCK
0.0 0.6 0.8 - mean (if 0,4,5,6), corr. (if 4 or 6), var. reduction factor (if 4)
0 0 0 0 0 0 0 0 0 -drift: x,y,z,xx,yy,zz,xy,xz,zy
0 -0, variable; 1, estimate trend
extdrift.out -gridded file with drift/mean
4 - column number in gridded file
onfile -gridded file with keyout (see note)
0 1 - column (0 if no keyout) and value to keep
2 0.0 -nst, nugget effect
1 0.25 0.0 0.0 0.0 -it,cc,ang1,ang2,ang3
400.0 300.0 25.0 -a_hmax, a_hmin, a_vert
1 0.75 0.0 0.0 0.0 -it,cc,ang1,ang2,ang3
800.0 450.0 30.0 -a_hmax, a_hmin, a_vert
"""
griddef = datafl.griddef
kriging_output = os.path.join(out_dir, 'kt3dn.out')
kt3dnpars ={'flname':datafl.flname,
'griddef':griddef,
'outfl': kriging_output,
'cols':datafl.gscol([datafl.dh]+datafl.xyz + ['Bitumen'])}
print('Note that the string representation of columns can be returned using the GSCOL function for a list of variables:',
kt3dnpars['cols'])
kt3dn.run(parstr=parstr.format(**kt3dnpars))
krigfl = gs.DataFile(kriging_output, griddef=datafl.griddef)
krigfl.head()
gs.slice_plot(krigfl, var='Estimate', orient='xz', pointdata=datafl, pointvar='Bitumen', aspect = 10)
gs.slice_plot(krigfl, var='Estimate', orient='yz', pointdata=datafl, pointvar='Bitumen', aspect = 10)
# Clean up
try:
gs.rmfile('temp')
gs.rmfile('kt3dn.dbg')
gs.rmdir(out_dir) #command to delete generated data file
except:
pass