UCL

4. Download and visualise data

4.1. Download

In this section, you will learn how to download data using a web-based app developed using Google Earth Engine (GEE). GEE has access to a large number of Earth Observation and other datasets, and makes it easy to access these and do some processing (at least over a limited spatial extent if you want high resolution data).

You are provided with a web App built using GEE, set up specifically for the Shenzhen area. The use of the App should be quite intuitive, but we outline how to use it below.

The app gives you access to 5 bands of Landsat surface reflectance data (see figure below), along with two vegetation indices, NDVI and NDWI. The datasets are annual composites, with each pixel the median of valid surface reflectance data for that year.

loaded
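To make the idea of a per-pixel median composite concrete, here is a minimal sketch in NumPy. It is not the App's own code: the array layout (dates, rows, columns) and the use of NaN to flag invalid observations are assumptions made for illustration.

```python
import numpy as np

def annual_median_composite(stack):
    """Per-pixel median over the time axis, ignoring invalid (NaN) samples.

    stack has shape (n_dates, rows, cols); invalid observations are NaN
    (an assumption for this sketch).
    """
    stack = np.asarray(stack, dtype=float)
    return np.nanmedian(stack, axis=0)

# Three hypothetical acquisitions of a 1 x 2 pixel image within one year
obs = np.array([
    [[0.10, np.nan]],
    [[0.30, 0.20]],
    [[0.20, 0.40]],
])
composite = annual_median_composite(obs)
print(composite)
```

The second pixel has only two valid observations, so its composite value is the median of those two alone.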

First load the App:

[1]:
%matplotlib inline
from IPython.display import IFrame
IFrame('https://plewis.users.earthengine.app/view/shenzhen','100%',490)

By default, a composite dataset for the year 1986 is loaded, along with a rough initial classification.

You can download these datasets from this interface by clicking on:

  • the 1986 link to get the Landsat composite dataset
  • the header link to get the associated ENVI header files

You should download these files and unzip them into an appropriate directory in your data directory (e.g. ~/DATA/GEOG0027/1986). You can also access the classification files through the class link, but there is little point downloading these as we will not use them.

You can change the year of the dataset in the right-hand panel. You can also load data for multiple years, although the App response may become slow (depending on network speeds). You can clear the loaded datasets with the Clear ... button.

For example (N.B. once you have downloaded the files):

[ ]:
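%%bash

# where the zip files are now, *after* having downloaded them
in=./files
# where we will put the files
out=~/DATA/GEOG0027/

# create the output directory if it does not already exist
mkdir -p "$out"
# -u: only extract files that are new or more recent than existing copies
unzip -u "$in/1986.zip" -d "$out"
unzip -u "$in/1986_hdr.zip" -d "$out"
# move the extracted contents of class1986 up into the output directory
mv "$out"/class1986/* "$out"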

You may find it more straightforward to use any tool prompted by your browser when downloading and extracting these files. In any case, make sure that all files for all archives go into a folder that is named after the year of the dataset (1986 here). This is important for automatic further processing later.
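If you prefer to do the extraction from Python, the following is one possible sketch using only the standard library. The function name `extract_into_year_dir` is hypothetical, and it assumes you want every file placed directly in the year folder, discarding any folder structure stored inside the archives.

```python
import zipfile
from pathlib import Path

def extract_into_year_dir(zip_paths, out_root, year):
    """Extract every file from each archive directly into out_root/<year>/."""
    year_dir = Path(out_root).expanduser() / str(year)
    year_dir.mkdir(parents=True, exist_ok=True)
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            for member in archive.infolist():
                if member.is_dir():
                    continue
                # Drop any folder structure stored in the archive (assumption)
                target = year_dir / Path(member.filename).name
                target.write_bytes(archive.read(member))
    return year_dir

# Example call, once the two 1986 archives have been downloaded to ./files:
# extract_into_year_dir(["files/1986.zip", "files/1986_hdr.zip"],
#                       "~/DATA/GEOG0027", 1986)
```

Check the contents of the year folder afterwards, since the internal layout of the zip files is not documented here.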

4.2. Getting the data archive

Whilst you should find it instructive to download the datasets from the App, it is a little tedious to do that for all of the datasets.

For this reason, on the UCL Geography system, you can access the entire dataset from a single file.

To use this, open a Unix shell, and type the following:

[ ]:
%%bash

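# the archive on the UCL Geography system
data=/home/plewis/public_html/GEOG0027/gee.tar.Z
# where we will put the files
out=~/DATA/GEOG0027/

mkdir -p "$out"
cd "$out"
# unpack the compressed archive into the current directory
tar xzf "$data"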

If you are outside the UCL Geography system, you can download the archive from either of these links [tar file or zip file], but beware that the compressed file is 2.9 GB.

4.3. Visualise time series

You may have noticed a time series graph on the left-hand panel of the app. This appears when you click on any location in the image panel, and provides a visualisation of the time series of reflectance (and vegetation indices) for that location. You can pop the graph out to another browser tab (the grey square with arrow in the top left of the plot).

It will look like this:

appgraph

This is useful for giving you an idea of the land cover history of this location, so you might want to produce graphs for several locations with different cover types to use in your report.

You can directly save the plot (Download png button) or save the dataset (Download csv button).

If you download the csv file, you can then produce your own plots of the data (for your report).

The examples below are given in Python, but you should use whatever graphing package you are familiar with. Please make sure that you put full titles, labels and legends on your graphs (unlike the examples below, which are for illustration only).

[2]:
import matplotlib.pyplot as plt
%matplotlib notebook
import numpy as np
import pandas as pd

d = pd.read_csv('files/ee-chart.csv')
d
[2]:
system:time_start B1_blue B2_green B3_red B4_nir B5_swir B6_NDVI B7_NDWI
0 1986-06-06 0.102 0.175 0.238 0.347 0.388 0.187 -0.055
1 1987-06-06 0.103 0.177 0.227 0.299 0.412 0.136 -0.159
2 1988-06-06 0.101 0.157 0.206 0.296 0.315 0.181 -0.030
3 1989-06-06 0.104 0.158 0.192 0.262 0.295 0.154 -0.058
4 1990-06-06 0.154 0.221 0.272 0.330 0.381 0.097 -0.072
5 1991-06-06 0.133 0.187 0.211 0.260 0.300 0.105 -0.072
6 1992-06-06 0.145 0.185 0.205 0.239 0.295 0.078 -0.104
7 1993-06-06 0.144 0.188 0.205 0.249 0.293 0.098 -0.081
8 1994-06-06 0.111 0.139 0.149 0.181 0.227 0.098 -0.114
9 1995-06-06 0.114 0.136 0.137 0.170 0.213 0.109 -0.111
10 1996-06-06 0.107 0.132 0.137 0.159 0.192 0.074 -0.095
11 1997-06-06 0.098 0.121 0.132 0.162 0.190 0.104 -0.079
12 1998-06-06 0.125 0.150 0.160 0.183 0.215 0.064 -0.081
13 1999-06-06 0.123 0.152 0.167 0.183 0.200 0.045 -0.045
14 2000-06-06 0.134 0.153 0.157 0.167 0.203 0.031 -0.096
15 2001-06-06 0.110 0.131 0.140 0.152 0.166 0.038 -0.045
16 2002-06-06 0.115 0.132 0.134 0.146 0.138 0.041 0.027
17 2003-06-06 0.113 0.128 0.138 0.153 0.152 0.054 0.004
18 2004-06-06 0.133 0.169 0.186 0.218 0.257 0.079 -0.082
19 2005-06-06 0.140 0.181 0.199 0.233 0.257 0.079 -0.050
20 2006-06-06 0.141 0.171 0.178 0.221 0.226 0.106 -0.012
21 2007-06-06 0.141 0.186 0.205 0.242 0.257 0.084 -0.029
22 2008-06-06 0.154 0.183 0.186 0.235 0.232 0.117 0.007
23 2009-06-06 0.145 0.170 0.180 0.211 0.217 0.078 -0.014
24 2010-06-06 0.151 0.180 0.184 0.213 0.228 0.073 -0.035
25 2011-06-06 0.144 0.169 0.175 0.202 0.227 0.072 -0.059
26 2012-06-06 0.144 0.169 0.175 0.202 0.227 0.072 -0.059
27 2013-06-06 0.127 0.156 0.165 0.216 0.215 0.135 0.003
28 2014-06-06 0.114 0.145 0.155 0.202 0.197 0.131 0.013
29 2015-06-06 0.128 0.158 0.169 0.226 0.240 0.144 -0.029
30 2016-06-06 0.117 0.146 0.157 0.221 0.240 0.168 -0.040
31 2017-06-06 0.123 0.153 0.161 0.209 0.209 0.129 0.001
32 2018-06-06 0.140 0.176 0.188 0.242 0.253 0.125 -0.023

Note that the time label for each dataset is YYYY-06-06. The day and month (06-06) have no significance here as this is an annual composite dataset.
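The index columns in the table can be sanity-checked against the band columns. The sketch below assumes the usual normalised-difference form, with NDVI from NIR and red and NDWI from NIR and SWIR. These definitions are not stated in these notes; they are inferred because they come close to the tabulated values, so treat them as an assumption rather than as the App's documented method.

```python
def normalised_difference(a, b):
    """Return (a - b) / (a + b)."""
    return (a - b) / (a + b)

# Band values from the first row of the table above (1986)
red, nir, swir = 0.238, 0.347, 0.388

ndvi = normalised_difference(nir, red)   # assumed NDVI definition
ndwi = normalised_difference(nir, swir)  # assumed NDWI definition (NIR and SWIR)

print(round(ndvi, 3), round(ndwi, 3))
```

The results should agree with the B6_NDVI and B7_NDWI columns only approximately, since the table shows rounded values and the composite indices may have been computed before the median was taken.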

[3]:
_=d.plot(figsize=(10,4),title='Example data plot')
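For a report-quality figure, the request for full titles, labels and legends could be met along the following lines. This is only a sketch: the function name `plot_reflectance`, the title and the axis wording are illustrative choices, and a few rows from the table are typed in directly so that the example runs without the csv file.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_reflectance(df):
    """Plot the reflectance bands against year with title, labels and legend."""
    df = df.copy()
    # Keep only the year: the 06-06 part of the time label carries no meaning
    df["year"] = pd.to_datetime(df["system:time_start"]).dt.year
    bands = ["B1_blue", "B2_green", "B3_red", "B4_nir", "B5_swir"]
    fig, ax = plt.subplots(figsize=(10, 4))
    df.plot(x="year", y=bands, ax=ax)
    ax.set_title("Annual median surface reflectance at one location")
    ax.set_xlabel("Year")
    ax.set_ylabel("Surface reflectance")
    ax.legend(title="Band")
    return fig, ax

# First three rows of the table above, so the sketch runs without the csv file
sample = pd.DataFrame({
    "system:time_start": ["1986-06-06", "1987-06-06", "1988-06-06"],
    "B1_blue": [0.102, 0.103, 0.101],
    "B2_green": [0.175, 0.177, 0.157],
    "B3_red": [0.238, 0.227, 0.206],
    "B4_nir": [0.347, 0.299, 0.296],
    "B5_swir": [0.388, 0.412, 0.315],
})
fig, ax = plot_reflectance(sample)
```

With your own download you would pass the DataFrame returned by `pd.read_csv` instead of `sample`. The two indices are left out here because they are on a different scale from the reflectance bands and are probably clearer on a separate plot.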

4.4. Visualise spatial data

You should be able to conveniently explore the spatial datasets in the GEE App. This is particularly useful for data interpretation (e.g. for training or validation) as you can have a map or high resolution image as a background.

You can also load multiple years of data, and visualise the dynamics.

Other than that, you will probably want to load the datasets into ENVI for visualisation. If you have correctly stored the data from the zip files, you should have in your data directory a series of sub-directories named by year, such as:

1986    1989    1992    1995    1998    2001    2004    2007    2010    2013    2016    2019
1987    1990    1993    1996    1999    2002    2005    2008    2011    2014    2017
1988    1991    1994    1997    2000    2003    2006    2009    2012    2015    2018

Inside each of these, you should see files such as the following:

2014.constant.tif       2014.swir.tif
2014.NDVI.tfw           2014.green.tfw
2014.NDVI.tif           2014.green.tif
2014.NDWI.tfw           2014.nir.tfw
2014.NDWI.tif           2014.nir.tif
2014.blue.tfw           2014.red.tfw
2014.blue.tif           2014.red.tif
2014.constant.tfw       2014.swir.tfw
2014_Shenzhen            2014_Shenzhen.hdr

The file 2014_Shenzhen is particularly important, as this is the file that you will load using ENVI.

2014_load

2014_view
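Before loading a year into ENVI, it can be worth checking that its directory is complete. The sketch below assumes that every year follows the same naming pattern as the 2014 listing above. The helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# Band names taken from the 2014 file listing
BANDS = ["blue", "green", "red", "nir", "swir", "NDVI", "NDWI", "constant"]

def missing_files(data_root, year):
    """Return the expected file names that are absent from data_root/<year>/."""
    year_dir = Path(data_root).expanduser() / str(year)
    expected = [f"{year}.{band}.{ext}" for band in BANDS for ext in ("tif", "tfw")]
    expected += [f"{year}_Shenzhen", f"{year}_Shenzhen.hdr"]
    return [name for name in expected if not (year_dir / name).exists()]

# Example call for the 2014 data:
# print(missing_files("~/DATA/GEOG0027", 2014))
```

An empty list means that all the expected files for that year are present.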

One further file that may be of interest is the file 2014.constant.tif which you can use as a data mask if you like. The mask has a value of zero for Shenzhen and no data outside of that.

mask_image
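If you work with the bands as arrays outside ENVI, the mask could be applied roughly as follows. This is a sketch under one assumption: that "no data" pixels appear as NaN once the mask has been read into an array. How no data is actually represented depends on the tool used to read the file, so check this first.

```python
import numpy as np

def apply_mask(band, mask):
    """Keep band values where the mask is valid (zero) and set the rest to NaN."""
    band = np.asarray(band, dtype=float)
    mask = np.asarray(mask, dtype=float)
    inside = np.isfinite(mask) & (mask == 0)
    return np.where(inside, band, np.nan)

# Hypothetical 2 x 2 example: NaN in the mask stands for "no data"
mask = np.array([[0.0, np.nan], [0.0, 0.0]])
ndvi = np.array([[0.13, 0.50], [0.08, 0.10]])
masked = apply_mask(ndvi, mask)
print(masked)
```

Statistics computed with functions such as `np.nanmean` then ignore everything outside Shenzhen.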

You should load and explore the datasets you have downloaded. Provide examples of e.g. interesting spectral profiles, transects, histograms or scatter plots for your report – things that you would normally produce as part of a data exploration exercise.

The histograms are of interest (think about what this means for classification):

histo2014

The scatter plots are, in many ways, even more interesting:

2014scatter

You will find it particularly interesting to examine scatter plots of NDVI and NDWI for this dataset as you should see that a pretty good classification can be achieved with just these two bands of data. An advantage of this would be that the processing should be faster (fewer bands), but also that it is straightforward to visualise and appreciate the clusters that will be used to describe the land cover classes.
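Outside ENVI, the same kind of NDVI against NDWI scatter can be summarised as a 2D histogram, which makes dense clusters easier to see than a plain scatter of many pixels. The sketch below uses synthetic values in place of real image bands. The function name `index_density` and the cluster positions are purely illustrative and say nothing about the real Shenzhen data.

```python
import numpy as np

def index_density(ndvi, ndwi, bins=50):
    """2D histogram of NDVI against NDWI over pixels where both are finite."""
    ndvi = np.ravel(np.asarray(ndvi, dtype=float))
    ndwi = np.ravel(np.asarray(ndwi, dtype=float))
    valid = np.isfinite(ndvi) & np.isfinite(ndwi)
    counts, ndvi_edges, ndwi_edges = np.histogram2d(
        ndvi[valid], ndwi[valid], bins=bins, range=[[-1, 1], [-1, 1]]
    )
    return counts, ndvi_edges, ndwi_edges

# Synthetic stand-in for two image bands: two clusters plus some missing pixels
rng = np.random.default_rng(0)
ndvi = np.concatenate([rng.normal(0.6, 0.05, 500), rng.normal(0.05, 0.05, 500)])
ndwi = np.concatenate([rng.normal(0.2, 0.05, 500), rng.normal(-0.1, 0.05, 500)])
ndvi[:10] = np.nan
counts, ndvi_edges, ndwi_edges = index_density(ndvi, ndwi)
```

The `counts` array can be displayed with, for example, matplotlib's `pcolormesh`, using the two edge arrays as axes. Compact, well separated blobs in that image are the clusters a classifier would be expected to pick out.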

These are not the only ‘bands’ to explore though! Collect plenty of scatter plots, histograms and images for your write-up to illustrate particular things you find. Don’t just put images in the report with no reference to what you are using them to show!

You should spend some time exploring these scatter plots for the different years' data: are the patterns consistent? (Show some evidence.) Can you simply visualise (and explain) the clusters and their relationship to land cover types? In exploring the datasets you should save interesting plots to use in your write-up – anything that helps you explain what is going on is of value!