UCL

4. Download and visualise data¶

4.1. Download¶

In this section, you will learn how to download data using a web-based app developed using Google Earth Engine (GEE). GEE has access to a large number of Earth Observation and other datasets, and makes it easy to access these and do a some processing (at least for a limited spatial extent if you want high resolution data).

You are provided with a web App built using GEE:

This is available to you as a specific App for the Shenzhen area. The use of the App should be quite intuitive, but we will outline how to use it below.

The app gives you access to 5 bands of Landsat surface reflectance data (see figure below), along with two vegetation indices, NDVI and NDWI. The datasets are annual composites, with each pixel the median of valid surface reflectance data for that year.

loaded

First load the App:

[1]:

%matplotlib inline
from IPython.display import IFrame
IFrame('https://plewis.users.earthengine.app/view/shenzhen','100%',490)

[1]:

By default, a composite dataset for the year 1986 is loaded, along with a rough initial classification.

You can download these datasets from this interface by clicking on:

the 1986 link to get the Landsat composite dataset
the header link to get the associated envi header files

You should download these files and unzip them into an appropriate directory in your Data directory (e.g. ~/Data/GEOG0027/1986). You can also access the classification files class but there is little point downloading these as we will not use them.

You can change the year of the dataset in the right-hand panel. You can also load data for multiple years, although the App response may become slow (depending on network speeds). You can clear the loaded datasets with the Clear ... button.

For example (N.B. once you have downloaded the files):

[ ]:

%%bash

# where the zip files are now, *after* having downloaded them
in=./files
# where we will put the files
out=~/DATA/GEOG0027/

mkdir -p $out
unzip -u $in/1986.zip -d $out
unzip -u $in/1986_hdr.zip -d $out
mv $out/class1986/* $out

You may find it more straightforward to use any tool prompted by your browser when downloading and extracting these files. In any case, make sure that all files for all archives go into a folder that is named after the year of the dataset (1986 here). This is important for automatic further processing later.

4.2. Getting the data archive¶

Whilst you should find it instructive to download the datasets from the App, it is a little tedious to do that for all of the datasets.

For this reason, on the UCL Geography system, you can access the entire dataset from a single file.

To use this, open a Unix shell, and type the following:

[ ]:

%%bash

data=/home/plewis/public_html/GEOG0027/gee.tar.Z
out=~/DATA/GEOG0027/

mkdir -p $out
cd $out
tar xzf $data

If you are outside the UCL Geography system, you can access the archive file as a compressed tar archive from either of these links [tar file or zip file], but beware that the compressed file is 2.9G.

4.3. Visualise time series¶

You may have noticed a time series graph on the left-hand panel of the app. This appears when you click on any location in the image panel, and provides a visualisation of the time series of reflectance (and vegetation indices) for that location. You can pop the graph out to another browser tab (the grey square with arrow in the top left of the plot).

It will look like this:

appgraph

This is useful foir giving you an idea of the land cover history of this location, so you might want to produce graphs for several locations with different cover types, to use in your report.

You can directly save the plot (Download png button) or save the dataset (Download csv button).

If you download the csv file, you can then produce your own plots of the data (for your report).

The example below are given in Python, but you should use whatever graphing package you are familiar with. Please make sure that (unlike below, which are for illustration only) you put full titles, labels and legends on graphs.

[2]:

import matplotlib.pylab as plt
%matplotlib notebook
import numpy as np
import pandas as pd

d = pd.read_csv('files/ee-chart.csv')
d

[2]:

	system:time_start	B1_blue	B2_green	B3_red	B4_nir	B5_swir	B6_NDVI	B7_NDWI
0	1986-06-06	0.102	0.175	0.238	0.347	0.388	0.187	-0.055
1	1987-06-06	0.103	0.177	0.227	0.299	0.412	0.136	-0.159
2	1988-06-06	0.101	0.157	0.206	0.296	0.315	0.181	-0.030
3	1989-06-06	0.104	0.158	0.192	0.262	0.295	0.154	-0.058
4	1990-06-06	0.154	0.221	0.272	0.330	0.381	0.097	-0.072
5	1991-06-06	0.133	0.187	0.211	0.260	0.300	0.105	-0.072
6	1992-06-06	0.145	0.185	0.205	0.239	0.295	0.078	-0.104
7	1993-06-06	0.144	0.188	0.205	0.249	0.293	0.098	-0.081
8	1994-06-06	0.111	0.139	0.149	0.181	0.227	0.098	-0.114
9	1995-06-06	0.114	0.136	0.137	0.170	0.213	0.109	-0.111
10	1996-06-06	0.107	0.132	0.137	0.159	0.192	0.074	-0.095
11	1997-06-06	0.098	0.121	0.132	0.162	0.190	0.104	-0.079
12	1998-06-06	0.125	0.150	0.160	0.183	0.215	0.064	-0.081
13	1999-06-06	0.123	0.152	0.167	0.183	0.200	0.045	-0.045
14	2000-06-06	0.134	0.153	0.157	0.167	0.203	0.031	-0.096
15	2001-06-06	0.110	0.131	0.140	0.152	0.166	0.038	-0.045
16	2002-06-06	0.115	0.132	0.134	0.146	0.138	0.041	0.027
17	2003-06-06	0.113	0.128	0.138	0.153	0.152	0.054	0.004
18	2004-06-06	0.133	0.169	0.186	0.218	0.257	0.079	-0.082
19	2005-06-06	0.140	0.181	0.199	0.233	0.257	0.079	-0.050
20	2006-06-06	0.141	0.171	0.178	0.221	0.226	0.106	-0.012
21	2007-06-06	0.141	0.186	0.205	0.242	0.257	0.084	-0.029
22	2008-06-06	0.154	0.183	0.186	0.235	0.232	0.117	0.007
23	2009-06-06	0.145	0.170	0.180	0.211	0.217	0.078	-0.014
24	2010-06-06	0.151	0.180	0.184	0.213	0.228	0.073	-0.035
25	2011-06-06	0.144	0.169	0.175	0.202	0.227	0.072	-0.059
26	2012-06-06	0.144	0.169	0.175	0.202	0.227	0.072	-0.059
27	2013-06-06	0.127	0.156	0.165	0.216	0.215	0.135	0.003
28	2014-06-06	0.114	0.145	0.155	0.202	0.197	0.131	0.013
29	2015-06-06	0.128	0.158	0.169	0.226	0.240	0.144	-0.029
30	2016-06-06	0.117	0.146	0.157	0.221	0.240	0.168	-0.040
31	2017-06-06	0.123	0.153	0.161	0.209	0.209	0.129	0.001
32	2018-06-06	0.140	0.176	0.188	0.242	0.253	0.125	-0.023

Note that the time label for each dataset is YYYY-06-06. The day and month (06-06) have no significance here as this is an annual composite dataset.

[3]:

_=d.plot(figsize=(10,4),title='Example data plot')

4.4. Visualise spatial data¶

You should be able to conveniently explore the spatial datasets in the GEE App. This is particularly useful for data interpretation (e.g. for training or validation) as you can have a map or high resolution image as a background.

You can also load multiple years of data, and visualise the dynamics.

Other than that, you will probably want to load the datasets into envi for visualisation. If you have correctly stored the data from the zip files, you should have in your data directory a series of sub-directories named by year, such as:

  1989    1992    1995    1998    2001    2004    2007    2010    2013    2016    2019
  1990    1993    1996    1999    2002    2005    2008    2011    2014    2017
  1991    1994    1997    2000    2003    2006    2009    2012    2015    2018

Inside each of these, you should see files such as the following:

constant.tif     2014.swir.tif
NDVI.tfw           2014.green.tfw
NDVI.tif           2014.green.tif
NDWI.tfw           2014.nir.tfw
NDWI.tif           2014.nir.tif
blue.tfw           2014.red.tfw
blue.tif           2014.red.tif
constant.tfw       2014.swir.tfw
2014_Shenzhen            2014_Shenzhen.hdr

The file 2014_Shenzhen is particularly important, as this is the file that you will load using envi.

2014_load

2014_view

One further file that may be of interest is the file 2014.constant.tif which you can use as a data mask if you like. The mask has a value of zero for Shenzhen and no data outside of that.

mask_image

You should load and explore the datasets you have downloaded. Provide examples of e.g. interesting spectral profiles, transects, histograms or scatter plots for your report – things that you would normally produce as part of a data exploration exercise.

the histograms are of interest (think about what this means for classification)

histo2014

and the scatterplots in many ways even more interesting:

2014scatter

You will find it particularly interesting to examine scatter plots of NDVI and NDWI for this dataset as you should see that a pretty good classification can be achieved with just these two bands of data. An advantage of this would be that the processing should be faster (fewer bands), but also that it is straightforward to visualise and appreciate the clusters that will be used to describe the land cover classes.

These are not the only ‘bands’ to explore though! Collect plenty of images of scatterplotas, histograms and images for your write up to illustrate particular things you find. Don’t just put images in the report with no reference to what you are using them to show!

You should spend some time exploring these scatterplots for the different years data: are the patterns consistent? (show some evidence). Can you simply visualise (and explain) the clusters and their relationship to land cover types? In exploring the datasets you should save interesting plots to use in your write-up – anything that helps you explain what is going on is of value!