Optimize Your Performance Intervals!

Use ANNs with custom loss functions to predict probable ceilings and floors for workers’ future job performance

Matthew E. Gladden
Mar 12, 2023

This article is the second in a three-part series on “Advanced modelling of workers’ future performance ranges through ANNs with custom loss functions.” Part 1 explored why it’s useful to predict the probable ceiling and floor for an employee’s future performance — and why it’s difficult to do so effectively, using conventional methods based on mean absolute error or standard deviation. Part 2 investigates how one can model the likely ceiling and floor using separate artificial neural networks with custom loss functions. And Part 3 combines those ceiling and floor models to create a composite prediction interval that can outperform simpler models based on MAE or SD in forecasting the probable range of workers’ future job performance.

The utility of explicitly modelling ceilings and floors

We’ve previously investigated (1) why it isn’t enough for HR predictive analytics systems to simply generate a single predicted value for a worker’s future performance and (2) why simple approaches to predicting the likely range of a worker’s future performance often yield results that have an unnecessarily high degree of error and are difficult to interpret.

Here we’ll explore an alternative approach that treats the prediction interval for a worker’s forecasted future behavior as a complex phenomenon with a potentially non-normal distribution that can (and should) be modelled in its own right through machine learning — rather than simply generating a single predicted value for a worker’s future performance and then deriving a probable range by taking that value “plus or minus some number.”

As in Part 1, we’ll be using a synthetic dataset that includes 119,005 observations of factory workers’ daily performance and behavior, generated by Synaptans WorkforceSim, a free, open-source Python package for simulating the dynamics of an organizational workforce. The observations are for a factory with 268 employees and cover a period of 300 calendar days. The dataset file was imported into the Comport_AI web app, another open-source Python-based tool for HR predictive analytics that makes it possible to compare the effectiveness of various machine-learning algorithms for forecasting the likely range of workers’ future behavior. As part of that process, Comport_AI transformed the dataset into a “person-day” format in which each row represents the behaviors observed for a particular employee on a single day, along with various historical metrics for the worker as they existed at that point in time — and future values for the worker that can be used as “targets” or “y-values” for machine learning.

Modelling the probable ceiling of a worker’s future performance

We can begin by using Comport_AI to build models whose sole purpose is to predict the likely ceiling for the mean efficacy that a worker can be expected to demonstrate during the next 30 days.

What makes for a “good” ceiling prediction? When predicting the target value itself, a model’s goal is simply to output a number that’s as close as possible to the actual future target value. However, when making a prediction for the ceiling of the probable range, a model’s aim should be to output a number that’s as close to the actual future target value as possible without being less than that value. If a prediction is truly to represent a “ceiling,” then its first and most important job is to be at least as high as the actual future target value.

For this reason, our evaluation of a ceiling model’s quality should heavily penalize instances in which an actual target value ended up being higher than the predicted ceiling. Another element that should be given much less weight — but still be taken into account — is the extent to which a predicted ceiling “overshot” the actual target value, in cases where the ceiling was (safely) higher than the actual figure. All things being equal, we would like a predicted ceiling value to match or exceed the actual target value — but by the smallest margin possible. Comport_AI helps us assess ceiling models in such ways by employing four metrics:

  • Portion of Actual Targets Greater Than Ceiling (PATGTC) reflects the share of actual target values that were greater than their predicted ceiling value. For a given ceiling model, the value of PATGTC will range from 0.0 to 1.0, with 0.0 indicating that no target values exceeded their predicted ceiling and 1.0 indicating that all of the target values exceeded their predicted ceiling.
  • Adjusted Mean Out-of-Range Proportional Distance Above Ceiling (AMORPDAC) is a complex metric that takes into account the distance by which actual target values exceeded their predicted ceiling value, in those cases when they exceeded it. It more heavily penalizes larger distances and a larger number of cases of actual values exceeding their predicted ceiling. The larger the number, the less effective a ceiling model is (because many actual target values are exceeding their predicted ceiling by a large amount).
  • Adjusted Mean In-Range Proportional Distance Below Ceiling (AMIRPDBC) is a complex metric that takes into account the distance by which actual target values were below the predicted ceiling value, in those cases when they were less than it. The larger the number, the less effective a ceiling model is (because it’s generating ceiling predictions that are unnecessarily high in value).
  • Overall Ceiling Error (OCE) is the sum of AMORPDAC and AMIRPDBC for a given ceiling model; it offers an overall measure of a ceiling model’s effectiveness. In the modelling of performance ranges, our goal is to minimize this number.
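To make these definitions concrete, here is a minimal sketch of how such metrics can be computed. The "adjusted" weighting that Comport_AI actually applies isn't spelled out above, so the two distance components below are plain (unadjusted) proportional distances used as stand-ins:

```python
import numpy as np

def ceiling_metrics(y_true, y_ceil):
    """Simplified stand-ins for Comport_AI's ceiling metrics.

    PATGTC matches the definition given in the text; the two distance
    components are unadjusted proportional distances, not the tool's
    actual 'adjusted' versions, whose weighting isn't specified here.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_ceil = np.asarray(y_ceil, dtype=float)

    # Cases where the actual target broke through its predicted ceiling.
    exceeded = y_true > y_ceil
    patgtc = float(exceeded.mean())

    # Mean proportional distance by which targets exceeded their ceiling.
    if exceeded.any():
        out_dist = float(np.mean((y_true[exceeded] - y_ceil[exceeded]) / y_true[exceeded]))
    else:
        out_dist = 0.0

    # Mean proportional distance by which in-range targets sat below their ceiling.
    in_range = ~exceeded
    if in_range.any():
        in_dist = float(np.mean((y_ceil[in_range] - y_true[in_range])
                                / np.maximum(y_true[in_range], 1e-9)))
    else:
        in_dist = 0.0

    # Overall error: sum of the two distance components (mirroring OCE).
    return {"PATGTC": patgtc, "out_dist": out_dist,
            "in_dist": in_dist, "overall": out_dist + in_dist}
```

For example, with targets `[1.0, 2.0]` and predicted ceilings `[1.5, 1.5]`, the second target breaks through its ceiling, giving a PATGTC of 0.5, while the first sits safely half its own value below the ceiling.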

By default, Comport_AI will generate one ceiling model produced by taking the Base Target Model’s predicted target values + the model’s MAE; nine ceiling models produced by taking the Base Target Model’s predicted target values + (n×SD) for the standard deviation of particular workers’ historical efficacy results and several values of n; and 20 ceiling models in the form of artificial neural networks with different custom loss functions. Of the 30 ceiling models generated for this analysis, we’ll highlight three of them here.
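The two baseline families are straightforward to compute. In the sketch below, all of the numbers (target predictions, MAE, per-worker SDs) are illustrative placeholders rather than values from the actual dataset:

```python
import numpy as np

# Illustrative placeholders: per-worker target predictions from the Base
# Target Model, that model's overall MAE, and each worker's historical
# efficacy SD.
pred_target = np.array([0.62, 0.71, 0.55])
model_mae = 0.08
worker_sd = np.array([0.10, 0.05, 0.12])

# MAE-based ceiling model: a single constant offset for every worker.
ceiling_mae = pred_target + model_mae

# SD-based ceiling models: a per-worker offset, one model per value of n.
ceilings_sd = {n: pred_target + n * worker_sd for n in (0.5, 1.0, 2.0)}
```

The ANN-based ceiling models, by contrast, learn each worker’s ceiling directly from that worker’s features rather than applying a fixed offset formula.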

First, we have the ceiling model that predicts the probable ceiling for each worker’s mean efficacy during the next 30 days by taking the predicted target value for each worker and then adding the model’s MAE:

A ceiling model based on predicted target values + MAE, generated using Comport_AI

In just over 80% of cases, the actual target value was indeed less than or equal to the predicted ceiling, which seems like a decent (but hardly spectacular) result.

Next, we have the ceiling model that predicts the probable ceiling for each worker by taking the worker’s predicted target value and adding 1.0×SD, where SD is the standard deviation of a worker’s historical efficacy values.

A ceiling model based on predicted target values + 1.0×SD, generated using Comport_AI

In this case, only 0.955% of the actual target values exceeded their predicted ceilings. While that’s far superior to the MAE-based model’s performance, the SD-based model’s results were also more “diffuse,” with many of the ceiling values resting at a significant distance above the actual target values. However, in keeping with our criterion that it’s more important for a ceiling model’s predictions to exceed the actual target values than to be close to them, the Overall Ceiling Error of the SD-based model (0.37777) is indeed much lower than that of the MAE-based model (14.11161).

Finally, we have one of our more advanced ceiling models in the form of an ANN with a custom loss function. In this case, the loss function was calculated as:

from keras import backend as K

def ceiling_loss(y_true, y_pred):  # wrapper name is illustrative
    # A mild quadratic penalty applies when the predicted ceiling
    # is safely at or above the actual target value...
    loss_pred_safely_high = K.square(y_pred - y_true)
    # ...and a 200x heavier quadratic penalty applies when the
    # predicted ceiling falls below it.
    loss_pred_too_low = 200.0 * K.square(y_pred - y_true)
    return K.switch(
        K.greater_equal(y_pred, y_true),
        loss_pred_safely_high,
        loss_pred_too_low,
    )
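As a minimal, self-contained sketch of how such an asymmetric loss can be wired into a Keras network — the layer sizes, optimizer, and eight-feature input below are illustrative assumptions, not Comport_AI’s actual architecture:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def ceiling_loss(y_true, y_pred):
    # Asymmetric penalty: 200x heavier when the predicted
    # ceiling falls below the actual target value.
    return tf.where(
        y_pred >= y_true,
        tf.square(y_pred - y_true),
        200.0 * tf.square(y_pred - y_true),
    )

# Illustrative architecture: a small regression network over 8 features.
model = keras.Sequential([
    layers.Input(shape=(8,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss=ceiling_loss)
# model.fit(X_train, y_train, epochs=50)  # train as usual
```

Because undershooting the target costs 200 times more than overshooting it by the same margin, gradient descent pushes the network toward predictions that sit just above the actual values.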

This ceiling model’s performance is reflected in the plot below:

A ceiling model in the form of an ANN with a custom loss function, generated using Comport_AI

This model combines advantageous traits of the MAE- and SD-based models: it shares the MAE-based model’s propensity to minimize the distance between actual target values and ceiling values (i.e., the black dots are bunched up close to the diagonal line, rather than being dispersed widely throughout the green region), while at the same time sharing the SD-based model’s ability to keep the overwhelming majority of actual target values less than their predicted ceilings. This superior performance is reflected in the fact that the custom ANN model’s Overall Ceiling Error (0.37289) is less than that of either of the other two models.

Modelling the probable floor of a worker’s future performance

We can now use Comport_AI to model the probable floor for workers’ future mean efficacy values for the next 30 days. The process here is very similar to the one used for ceiling models, except that — as is often the case in the kinds of dynamics modelled in HR predictive analytics — it isn’t possible for a worker’s efficacy to fall below zero. All of the models thus incorporate an adjustment that replaces any predicted floor value of less than 0.0 with 0.0. Comport_AI helps us to evaluate and compare floor models through the use of four built-in metrics:

  • Portion of Actual Targets Less Than Floor (PATLTF) reflects the share of actual target values that were less than their predicted floor value.
  • Adjusted Mean Out-of-Range Proportional Distance Below Floor (AMORPDBF) is a complex metric that takes into account the distance by which actual target values fell below their predicted floor value, in those cases when they did so. It more heavily penalizes larger distances and a larger number of cases of actual values falling below their predicted floor. The larger the number, the less effective a floor model is.
  • Adjusted Mean In-Range Proportional Distance Above Floor (AMIRPDAF) is a complex metric that takes into account the distance by which actual target values exceeded the predicted floor value, in those cases when they were greater than it. The larger the number, the less effective a floor model is.
  • Overall Floor Error (OFE) is the sum of AMORPDBF and AMIRPDAF for a given floor model; it offers an overall measure of a floor model’s effectiveness. In the modelling of performance ranges, our goal is to minimize this number.
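The zero-floor clamp and the PATLTF metric described above can be sketched in a few lines (the prediction and target values below are illustrative):

```python
import numpy as np

# Illustrative raw floor predictions from some floor model; two of them
# have drifted below zero.
raw_floor = np.array([0.45, -0.07, 0.12, -0.30])

# A worker's efficacy can't fall below zero, so any negative predicted
# floor is replaced with 0.0.
floor = np.maximum(raw_floor, 0.0)

# PATLTF: share of actual target values that fell below their predicted floor.
y_true = np.array([0.50, 0.10, 0.05, 0.20])
patltf = float(np.mean(y_true < floor))
```

Here only the third target (0.05) falls below its clamped floor (0.12), so PATLTF comes out to 0.25.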

Of the 30 floor models generated in Comport_AI for this analysis, we’ll present three here. First, we have the floor model that forecasts the probable floor for each worker’s mean efficacy during the next 30 days by taking the predicted target value for each worker and subtracting the model’s MAE:

A floor model based on predicted target values - MAE, generated using Comport_AI

Next, we have the floor model that predicts the probable floor for each worker by taking the worker’s predicted target value and subtracting 1.0×SD, where SD is the standard deviation of a worker’s historical efficacy values:

A floor model based on predicted target values - 1.0×SD, generated using Comport_AI

And finally, we have one of our more advanced floor models in the form of an ANN with the custom loss function calculated as:

from keras import backend as K
loss = K.mean(K.square(y_pred - y_true), axis=-1) \
    - 1.1 * K.mean(y_true - y_pred)

This floor model’s performance is reflected in the plot below:

A floor model in the form of an ANN with a custom loss function, generated using Comport_AI

Once again, the MAE-based model manages to keep its predicted floor values close to the actual target values — but with the problem that many of the actual target values fall below their predicted floor. The SD-based model does a better job of generating predicted floors that are indeed lower than or equal to the actual target values, but its predicted floors are dispersed more loosely throughout the green (safe) region. The custom ANN model manages to keep its predicted floor values grouped relatively tightly, close to the diagonal line, while allowing almost none of them to fall onto the “wrong” side of the line (i.e., to be greater than their actual target value). The custom ANN model’s Overall Floor Error is significantly less than that of either of the more conventional floor models.

The next step

We’ve built ANN-based ceiling and floor models that — at least in some situations — seem capable of performing better than conventional MAE- and SD-based methods at predicting the likely ceiling and likely floor of a worker’s future job performance. So what’s next?

In the third and final part of this series, we’ll combine our ceiling and floor models to create joint prediction intervals — and assess how the joint range models generated by our ANN models compare with those derived from simpler MAE- and SD-based approaches. I hope that you’ll join us for Part 3 of the series!


Matthew E. Gladden

I build enterprise AI systems and carry out research on the ways in which "posthumanizing" technologies are changing our experience of the workplace.