1 Introduction

A comparison of JULES-ES-1p0 wave01 members against the original ensemble (wave00).

Wave 01 input parameter sets were picked using history matching to fall within Andy Wiltshire’s (AW’s) basic constraints on NBP, NPP, cSoil and cVeg stocks at the end of the 20th century. We use 300 of the 500 members, keeping back two fifths for emulator validation later.
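As a rough illustration of holding back two fifths of a 500-member design, the sketch below splits a stand-in design matrix into 300 training and 200 validation members. The random split, the seed, the matrix `X_all` and the parameter names `p1` to `p3` are all assumptions for illustration; the actual selection used for wave01 may differ (for example, taking the first 300 runs).

```r
# Hypothetical sketch: split a 500-member design into training and validation sets.
set.seed(42)          # arbitrary seed, for reproducibility only
n_total <- 500
n_train <- 300        # three fifths used to build emulators

# Stand-in design matrix: 500 members x 3 normalised inputs (placeholder names)
X_all <- matrix(runif(n_total * 3), ncol = 3,
                dimnames = list(NULL, c("p1", "p2", "p3")))

# Randomly choose the training members; the rest are held back for validation
train_ix <- sort(sample(n_total, n_train))
valid_ix <- setdiff(seq_len(n_total), train_ix)

X_train <- X_all[train_ix, , drop = FALSE]
X_valid <- X_all[valid_ix, , drop = FALSE]
```

Keeping the two index vectors around makes it easy to apply the same split to the output matrix later.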

We answer some basic questions.

What proportion of the new ensemble match AW’s constraints?

What do the timeseries of carbon cycle properties look like with and without AW’s constraints?

How good is a GP emulator? Does it get better overall with the new ensemble members added? In particular, does it get better for those members within the AW constraints?

Does comparison of the ensemble with atmospheric growth observations give more of a constraint?

To do:

Does the sensitivity analysis change?

2 Preliminaries

Load libraries, functions and data.

2.1 How many run failures were there?

There are no NAs, but some soil respiration (rh_lnd_sum) values are infinite. There are no “low NPP” ensemble members.

[1] 117464.6
[1] FALSE
     row col
[1,] 140   9
[2,] 232   9
[3,] 249   9
[4,] 300   9
[1] Inf Inf Inf Inf
[1] "rh_lnd_sum"
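A check of this kind can be sketched in base R as follows. The toy matrix `Y` and the column name `other_output` are placeholders, not the real ensemble output; only `rh_lnd_sum` is taken from the output above.

```r
# Toy output matrix standing in for the ensemble output (members x outputs).
# Values are made up; one entry is deliberately infinite.
Y <- matrix(c(1.2, 3.4, 5.6,
              2.1, Inf, 0.7),
            ncol = 2,
            dimnames = list(NULL, c("other_output", "rh_lnd_sum")))

# Are there any missing values?
any_na <- anyNA(Y)

# Row and column indices of infinite entries
inf_ix <- which(is.infinite(Y), arr.ind = TRUE)

# Names of the outputs that contain infinite values
bad_cols <- colnames(Y)[unique(inf_ix[, "col"])]
```

Members flagged in `inf_ix` can then be dropped, or the affected output set to NA, before fitting emulators.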

2.2 Wave00 level1a Ensemble behaviour in all outputs

Need units

2.3 Wave00/Wave01 Ensemble behaviour in key (constraining) outputs.

Global mean for the 20 years at the end of the 20th Century. There is still a significant low bias on cVeg output.

2.4 What proportion of models now fall within Andy’s constraints?

Just over a third! Better than before, but still not great. This points to a significant model discrepancy in cVeg.

Of the 400 members of the wave01 ensemble, 128 pass Andy Wiltshire’s Level 2 constraints.

[1] 128
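Counting the members that pass a set of box constraints could look something like the sketch below. The function name `within_constraints` is hypothetical, and the bounds and output values are placeholder numbers, not AW’s actual limits or real model output.

```r
# Sketch: flag ensemble members whose outputs all lie inside [lower, upper].
within_constraints <- function(Y, lower, upper) {
  stopifnot(identical(colnames(Y), names(lower)),
            identical(names(lower), names(upper)))
  above <- sweep(Y, 2, lower, FUN = ">=")   # each column against its lower bound
  below <- sweep(Y, 2, upper, FUN = "<=")   # each column against its upper bound
  rowSums(above & below) == ncol(Y)          # TRUE only if every output passes
}

# Placeholder outputs for three members (not real model output)
Y <- rbind(c(1, 5, 20, 8),
           c(-5, 5, 20, 8),
           c(1, 5, 20, 2))
colnames(Y) <- c("nbp", "npp", "cSoil", "cVeg")

# Placeholder bounds (NOT the actual Level 2 constraints)
lower <- c(nbp = 0,  npp = 3,  cSoil = 10, cVeg = 5)
upper <- c(nbp = 10, npp = 10, cSoil = 30, cVeg = 10)

pass   <- within_constraints(Y, lower, upper)
n_pass <- sum(pass)
```

Dividing `n_pass` by the number of rows gives the proportion of the ensemble inside the constraints.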

Pairs plot of the inputs that pass the constraints with respect to the limits of the original ensemble.

2.5 Timeseries of mean carbon cycle properties over whole run.

This is a plot of timeseries of Wave00, Wave01, and level2-constrained wave01 on top of one another. We see that the wave01 is closer to the standard than wave00, and the level-2 constrained wave01 ensemble is often closer again. However, there are still quite large discrepancies. For example, baresoilfrac is often way too high, shrubfrac is often too low (though both these span the standard). Treefrac is away from zero, but still often too low or too high. While fHarvest looks good, fLuc does not appear constrained by the process at all. RH (soil respiration) looks well constrained, whereas lai is often too low.

One thing we could do next is constrain input space again, using observations or “tolerance to error” on some or all of these outputs.

We could also extend sensitivity analysis to work out what controls e.g. treefrac.

A summary of what gets constrained when you constrain the top-level variables would be a useful analysis.

2.6 Anomaly timeseries

As above, but plotted as output anomalies.
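One way to form anomalies is to subtract each member’s mean over a baseline period from its own timeseries. This is only a sketch: the function name `anomaly`, the choice of baseline columns and the toy matrix are assumptions, and the baseline used for the plots here may be different.

```r
# Sketch: anomaly of each ensemble member relative to its own baseline mean.
# ts_mat: members in rows, time steps in columns.
anomaly <- function(ts_mat, baseline_cols) {
  baseline <- rowMeans(ts_mat[, baseline_cols, drop = FALSE])
  sweep(ts_mat, 1, baseline, FUN = "-")   # subtract each row's baseline
}

# Toy timeseries for two members (made-up numbers)
ts_mat <- rbind(c(1, 2, 3, 4),
                c(10, 10, 12, 14))

# Baseline assumed to be the first two time steps
anom <- anomaly(ts_mat, baseline_cols = 1:2)
```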

3 Emulator fits

We hope that running the new ensemble gives us a better emulator, and allows us to rule out more input space. We particularly hope that the emulator is better for those members that are inside AW’s constraints.

First, we can look at the emulator errors in two cases: the level1a data (a basic carbon cycle), and then the wave01 data, which should have similar characteristics. (We should have eliminated really bad simulations, but wave01 is not perfectly constrained to lie within AW’s constraints.)

Found the outlier - looks like it’s 440

integer(0)

3.1 Leave-one-out analyses of emulator prediction accuracy

The top row shows the leave-one-out prediction accuracy of the original wave00 ensemble, and the lower row the entire wave00 AND wave01 ensemble combined.

3.2 Emulator accuracy of members from wave 00 and wave 01 that pass level 2 (AW’s) constraints

We see that the error stats for some of the outputs from wave01 are worse, but there are many more ensemble members that lie within the constraints for wave 01.

“pmae” is “proportional mean absolute error”, which is the mean absolute error expressed as a percentage of the original (minimally constrained) ensemble range in that output.
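Following that definition, a pmae calculation might be sketched as below. The function name `pmae`, its argument names and the toy numbers are assumptions for illustration; the statistic stored in the leave-one-out summaries may be computed slightly differently.

```r
# Sketch: proportional mean absolute error, in percent.
# observed, predicted: simulator output and emulator prediction for the same members
# reference: output values of the original (minimally constrained) ensemble
pmae <- function(observed, predicted, reference) {
  mae <- mean(abs(observed - predicted))
  100 * mae / diff(range(reference))   # MAE as a percentage of the reference range
}

# Toy example with made-up numbers
observed  <- c(1, 2, 3)
predicted <- c(1.5, 2, 2.5)
reference <- c(0, 10)
pmae_toy  <- pmae(observed, predicted, reference)
```

Scaling by the range of the original ensemble makes the statistic comparable across outputs with very different units.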

3.3 Does the emulator improve if you look at only the 37 members that pass level 2 constraints in wave 00?

This gives us an idea of how good the emulator is where it really matters, and as the members are consistent, gives us a fairer idea of whether the emulators have improved with more members.

Good news is, the emulators are more accurate for wave01.

These leave-one-out prediction accuracy plots rank the ensemble members from largest underprediction to largest overprediction using the wave00 predictions. A perfect prediction would appear on the horizontal “zero” line.

Many of the wave01 predictions are closer to the horizontal line, and are therefore more accurate.

None of the predictions are outside the uncertainty bounds, which suggests the bounds are overconservative (they should be narrower).
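A quick way to quantify this is the fraction of leave-one-out predictions whose true value falls inside the bounds. The sketch below assumes the bounds are the prediction mean plus or minus two standard deviations, which is an assumption rather than something stated here; the function name `loo_coverage` and the numbers are made up.

```r
# Sketch: proportion of held-out values inside mean +/- k * sd bounds.
loo_coverage <- function(observed, pred_mean, pred_sd, k = 2) {
  mean(abs(observed - pred_mean) <= k * pred_sd)
}

# Toy leave-one-out results (made-up numbers)
observed  <- c(1, 2, 3, 4)
pred_mean <- c(1.1, 2.5, 3, 10)
pred_sd   <- rep(0.2, 4)

coverage <- loo_coverage(observed, pred_mean, pred_sd)
```

With well-calibrated two standard deviation bounds, coverage would be expected to sit near 95%; a value of 100% across a large ensemble hints that the bounds are wider than they need to be.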

Looking at the proportional mean absolute error (pmae), expressed in percent, we can see that it doesn’t improve much for the whole ensemble, but does improve significantly for the subset of ensemble members that fall within AW’s constraints from the first ensemble (marked "_sub").

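# Extract the proportional mean absolute error (pmae, in percent) for each output
# Whole ensemble, before and after adding wave01
pmae_wave00 <- lapply(loostats_km_Y_level1a, FUN = function(x) x$pmae)
pmae_wave01 <- lapply(loostats_km_Y_level1a_wave01, FUN = function(x) x$pmae)

# Subset of members that pass AW's constraints in the first ensemble
pmae_wave00_sub <- lapply(loostats_km_Y_level1a_sub, FUN = function(x) x$pmae)
pmae_wave01_sub <- lapply(loostats_km_Y_level1a_wave01_sub, FUN = function(x) x$pmae)

# One row per output, one column per case
pmae_table <- cbind(pmae_wave00, pmae_wave01, pmae_wave00_sub, pmae_wave01_sub)

print(pmae_table)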

     pmae_wave00 pmae_wave01 pmae_wave00_sub pmae_wave01_sub
[1,] 5.03455     4.92543     7.598085        4.290227       
[2,] 4.229792    3.839274    4.757898        3.803634       
[3,] 3.612739    3.612914    4.69579         3.638157       
[4,] 4.225543    4.445785    4.727673        3.679528       

4 Constraining to level 2 with the emulator

[1] 11.732

Next, check emulators of all the other outputs and apply the constraints to them. See how the constraints change.

4.1 Input space with emulated members passing Level 2 constraints.

4.2 Input space with emulated members passing Level 2 constraints AND low atmospheric growth error

Emulated members passing level2 constraints AND having lower error in atmospheric growth than standard.

Red point indicates the standard input.

The position of the standard input with respect to the histograms gives us an idea of what we might do to improve the simulation, at least in terms of atmospheric growth. If we were to move the input towards areas of higher density, we might expect better model performance.

5 Comparing atmospheric growth in wave00, wave01 and observations

5.1 Andy asks - what constraint does that give us in cumulative NBP?

5.2 Eddy suggests measuring cumulative NBP against atmospheric growth rate

Calculate the atmospheric growth rate over 1984-2013 using a simple linear fit.
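One reading of this step is to take the slope of a straight-line fit over the 1984-2013 window. The sketch below does that with `lm()`; the function name `linear_growth_rate` and the synthetic series are assumptions, and the quantity actually fitted in the analysis may differ.

```r
# Sketch: slope of a straight-line fit over a window of years.
linear_growth_rate <- function(years, values, start = 1984, end = 2013) {
  window <- data.frame(year = years, value = values)
  window <- window[window$year >= start & window$year <= end, ]
  fit <- lm(value ~ year, data = window)
  unname(coef(fit)["year"])   # fitted change per year
}

# Synthetic series with a known trend of 0.5 units per year (not real data)
years  <- 1980:2015
values <- 2 + 0.5 * (years - 1980)
rate   <- linear_growth_rate(years, values)
```

Applying the same function to each ensemble member and to the observations gives directly comparable rates.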

Interannual variability and cumulative NBP

(correlations are close to zero, especially in the later wave)

5.3 How close can we get the model to reality?

Using Atmospheric Growth Rate as an example, how close can we get the model to observations? Can we do better than standard? What are the trade offs of doing so? How does getting close in AGR affect performance in other outputs?

We’ve established that most of the original ensemble have an ME/MAE/RMSE larger than the standard run. More of the wave01 members (though still few) perform better than standard.
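The comparison with the standard run can be sketched as follows, computing mean error (ME), mean absolute error (MAE) and root mean squared error (RMSE) for each member against the observations. The function name `error_stats` and all the numbers are made up for illustration.

```r
# Sketch: error statistics of a simulated series against observations.
error_stats <- function(sim, obs) {
  err <- sim - obs
  c(me = mean(err), mae = mean(abs(err)), rmse = sqrt(mean(err^2)))
}

# Toy observations, standard run and two ensemble members (made-up numbers)
obs      <- c(1, 2, 3)
standard <- c(2, 3, 4)
ens      <- rbind(c(1, 2, 3.3),
                  c(3, 4, 5))

std_stats <- error_stats(standard, obs)
ens_stats <- apply(ens, 1, error_stats, obs = obs)   # one column per member

# Which members have a smaller RMSE than the standard run?
better_rmse <- ens_stats["rmse", ] < std_stats["rmse"]
```

For ME the comparison should use the absolute value, since a member can be biased in either direction.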

A map of the 2D projections of parameter space where the ensemble member performs better than standard.

The blue part is the first wave, and not subject to constraint so may be removed in the second wave (wave01).

5.4 Build emulators and find parts of parameter space that do better than standard at atmospheric growth.

Having trouble fitting RMSE, so trying mean error.

Why is there an odd collection at just under 1?

This next pairs plot looks at all the ensemble members that have a better mean atmospheric growth error than standard.

This next plot looks at all the ensemble members that have a better mean atmospheric growth error than standard AND pass the level 2 constraints.

The number is small (41/300), but the ensemble members seem spread across parameter space.

5.5 Input space with low Atmospheric Growth Error

This pairs plot shows the 2D and marginal density of emulated input points where the emulated atmospheric growth is closer to the observations than the standard model.

This technique might provide a useful set of points for optimising the model (at least to atmospheric growth).

Next, check emulators of all the other outputs and apply the constraints to them. See how the constraints change.

5.6 Input space with emulated members passing Level 2 constraints.

5.7 Input space with emulated members passing Level 2 constraints AND low atmospheric growth error

Emulated members passing level2 constraints AND having lower error in atmospheric growth than standard.

Red point indicates the standard input.

The position of the standard input with respect to the histograms gives us an idea of what we might do to improve the simulation, at least in terms of atmospheric growth. If we were to move the input towards areas of higher density, we might expect better model performance.

6 Exploring further constraints

It’s pretty clear that the “initial” (level2) constraints can be improved on. How about tree, shrub and bare soil fractions?

Build an ensemble of modern vegetation fractions.

6.1 Build emulators for the fraction data

It might be useful to try emulating log(fractions).

6.2 Leave-one-out analyses of emulator prediction accuracy

6.2.1 How far from data is the model?


6.3 Sequentially add constraints

6.3.1 individual constraint

6.4 Sequential constraint

Each constraint is added to the previous constraint
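Adding constraints one after another amounts to a cumulative logical AND over the per-constraint pass vectors. A minimal sketch, using made-up pass/fail vectors for four members and placeholder constraint names:

```r
# Sketch: sequentially combine constraints with a cumulative AND.
# Each element says which members pass that constraint alone (made-up values).
pass_each <- list(
  nbp  = c(TRUE, TRUE,  FALSE, TRUE),
  npp  = c(TRUE, FALSE, FALSE, TRUE),
  cVeg = c(TRUE, TRUE,  TRUE,  FALSE)
)

# Element i is the AND of constraints 1..i
pass_seq <- Reduce(`&`, pass_each, accumulate = TRUE)
names(pass_seq) <- names(pass_each)

# Number of members remaining after each constraint is added
n_remaining <- vapply(pass_seq, sum, integer(1))
```

The order of the list controls the order in which constraints are applied; the final count is the same whatever the order, but the intermediate counts show which constraint removes the most members.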

6.4.1 How similar are each of the constraints?

