Just a few remarks about models I tried that didn't lead to any improvements, and some that did.
The first three outcomes are often described as 'continuous' data, but in fact they take only a small number of fixed values. So I tried models like 'ordered logit' as well as OLS ... and the ordered models' predictions were worse than OLS (on the public leaderboard).
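To make the ordered-logit idea concrete, here is a minimal sketch in Python (the post does not say which software was used, and all names here are illustrative) of how an ordered logit turns a linear predictor and a set of cutpoints into category probabilities, and then into a single numeric prediction. The cutpoints and outcome levels are hypothetical, and estimating the coefficients and cutpoints by maximum likelihood is not shown.

```python
import math

def sigmoid(z):
    # Numerically stable logistic CDF
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def ordered_logit_probs(xb, cutpoints):
    """Category probabilities for one observation under an ordered logit.

    xb        -- linear predictor x'beta
    cutpoints -- increasing thresholds c_1 < ... < c_{K-1}
    P(y <= k) = sigmoid(c_k - xb); each category's probability is the
    difference between consecutive cumulative probabilities.
    """
    cumulative = [sigmoid(c - xb) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for p in cumulative:
        probs.append(p - previous)
        previous = p
    return probs

def expected_value(probs, levels):
    # Probability-weighted average of the outcome levels
    return sum(p * v for p, v in zip(probs, levels))

# Hypothetical example: 4 outcome levels and made-up cutpoints
levels = [1.0, 2.0, 3.0, 4.0]
cutpoints = [-1.0, 0.5, 2.0]
probs = ordered_logit_probs(0.3, cutpoints)
print([round(p, 3) for p in probs], round(expected_value(probs, levels), 3))
```

If the leaderboard scores squared error, the probability-weighted average (rather than the single most likely category) is the natural point prediction to submit, since it minimizes expected squared error under the fitted model.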
OK, OLS did better than the ordered models, but there are clear 'floors' and 'ceilings' in the data for those first three outcomes (e.g. GPA is between 1 and 4, material hardship between 0 and 1). So I tried censored models like Tobits, with the minimum and maximum values as the limits. Again, this added complexity over OLS but gave no improvement (though the results were not worse either).
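A two-limit Tobit treats the observed score as a latent normal variable clipped at the floor and the ceiling. The sketch below (pure Python, hypothetical helper names, not the code used for these submissions) writes out that log-likelihood and the model's expected value of the observed, clipped outcome. An actual fit would maximize the log-likelihood over the coefficients and sigma with a numerical optimizer, which is omitted here.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    # Standard normal CDF via erfc (accurate in the lower tail)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def norm_sf(z):
    # Upper tail 1 - Phi(z), computed directly to avoid cancellation
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def tobit_loglik(y, xb, sigma, lower, upper):
    """Two-limit Tobit log-likelihood.

    Latent y* ~ Normal(xb_i, sigma^2); observed y_i = min(max(y*, lower), upper).
    Observations at a limit contribute the probability mass beyond it;
    interior observations contribute the normal density.
    """
    total = 0.0
    for yi, mu in zip(y, xb):
        if yi <= lower:
            total += math.log(norm_cdf((lower - mu) / sigma))
        elif yi >= upper:
            total += math.log(norm_sf((upper - mu) / sigma))
        else:
            total += math.log(norm_pdf((yi - mu) / sigma) / sigma)
    return total

def tobit_expected(mu, sigma, lower, upper):
    """E[observed y] when y* ~ Normal(mu, sigma^2) is clipped to [lower, upper]."""
    a, b = (lower - mu) / sigma, (upper - mu) / sigma
    mass_low, mass_high = norm_cdf(a), norm_sf(b)
    # Contribution of the uncensored middle: E[y* ; lower < y* < upper]
    middle = mu * (1.0 - mass_low - mass_high) + sigma * (norm_pdf(a) - norm_pdf(b))
    return lower * mass_low + middle + upper * mass_high

# Hypothetical outcome bounded in [0, 1], like material hardship
print(tobit_expected(0.9, 0.3, 0.0, 1.0))
print(tobit_loglik([0.0, 0.4, 1.0], [0.1, 0.5, 0.8], 0.3, 0.0, 1.0))
```

Computing both tails with `erfc` keeps the log-likelihood finite for observations moderately far from a limit, which matters once an optimizer starts exploring extreme parameter values.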
OLS models were enough to get the grit and material hardship scores into roughly the top 5-10 on the leaderboard. But the better scores required a shift to more 'machine learning' approaches based on 'trees', though still drawing on a small-ish pool of independent variables ('features').
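For readers unfamiliar with tree-based methods, the core building block is a regression tree: it repeatedly splits the sample on one feature at the threshold that most reduces squared error, and predicts the mean outcome in each leaf. The following is a minimal pure-Python sketch of that idea, not the models used here (the post does not say which tree methods were used); in practice one would more likely use an ensemble such as a random forest or gradient boosting from an established library. The toy data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    # Sum of squared deviations from the mean
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(X, y, min_leaf):
    """Return (cost, feature, threshold) for the split with the lowest
    total squared error, or None if no split keeps min_leaf rows per side."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    return best

def build_tree(X, y, max_depth=3, min_leaf=1):
    """Grow a regression tree. Internal nodes are (feature, threshold,
    left, right) tuples; leaves are the mean outcome of their rows."""
    if max_depth == 0 or len(y) < 2 * min_leaf:
        return mean(y)
    split = best_split(X, y, min_leaf)
    if split is None or split[0] >= sse(y):
        return mean(y)  # no split reduces squared error
    _, j, t = split
    left = [(row, yi) for row, yi in zip(X, y) if row[j] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] > t]
    return (
        j,
        t,
        build_tree([r for r, _ in left], [v for _, v in left], max_depth - 1, min_leaf),
        build_tree([r for r, _ in right], [v for _, v in right], max_depth - 1, min_leaf),
    )

def predict(tree, row):
    # Walk down the tree until reaching a leaf (a plain number)
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

# Hypothetical toy data: outcome is 1 only when both binary features are 1
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0.0, 0.0, 0.0, 1.0]
tree = build_tree(X, y, max_depth=2)
print([predict(tree, row) for row in X])  # prints [0.0, 0.0, 0.0, 1.0]
```

The tree recovers this interaction with two splits, something OLS with only main effects cannot represent. That is one plausible reason trees can beat OLS on a small pool of features, though these results do not establish it.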
Of course, it is still possible that the linear regression approach could be improved with better features, which would have the benefit of being easier to interpret ... And perhaps the marginal improvements achieved on the public leaderboard are simply 'over-fitting' rather than real improvements.