History index model

Short description: Model in functional data analysis


In statistical analysis, the standard framework of varying coefficient models (also known as concurrent regression models) models the current value of a response process in dependence on the current value of a predictor process.[1][2][3] This framework is inadequate when both past and present values of the predictor process are assumed to influence the current response. In contrast to these approaches, the history index model includes the effect of recent past values of the predictor through a history index function. Specifically, the influence of past predictor values is modeled by a smooth history index function, while the effects on the response are described by smooth varying coefficient functions.[4][5]

Definition

In functional data analysis, functional data are considered as realizations of a stochastic process [math]\displaystyle{ X(t), t\in \mathcal{I} }[/math] that is an [math]\displaystyle{ L^{2} }[/math] process on a bounded and closed interval [math]\displaystyle{ \mathcal{I} }[/math].[6][7]

Suppose that the current functional response process [math]\displaystyle{ Y(t) }[/math] at time [math]\displaystyle{ t }[/math] depends on the recent history of the predictor process [math]\displaystyle{ X }[/math] in a sliding window of length [math]\displaystyle{ \Delta }[/math].

Then the history index model is defined as

[math]\displaystyle{ \mathrm{E}\{Y(t)|X(t)\}=\beta_{0}+\beta_{1}(t)\int_{0}^{\Delta}\gamma(u)X(t-u)du, }[/math] (1)

for [math]\displaystyle{ t \in [\Delta,T] }[/math] with a suitable [math]\displaystyle{ T\gt 0 }[/math]. Here [math]\displaystyle{ \gamma(\cdot) }[/math] is the history index function, which quantifies the influence of the recent history of the predictor values on the response, and [math]\displaystyle{ \beta_{1}(\cdot) }[/math] is the varying coefficient function that multiplies the resulting history index. In most cases, [math]\displaystyle{ \gamma(\cdot) }[/math] is assumed to be smooth. For identifiability, [math]\displaystyle{ \gamma(\cdot) }[/math] is normalized by requiring that [math]\displaystyle{ \int_{0}^{\Delta} \gamma^{2}(u) du = 1 }[/math] and that [math]\displaystyle{ \gamma(0)\gt 0 }[/math]. The sign condition is no real restriction, because [math]\displaystyle{ \{-\beta_{1}(t)\}\{-\gamma(u)\}=\beta_{1}(t)\gamma(u) }[/math].[5]
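
As a rough numerical illustration of model (1), the following Python sketch evaluates the history index integral on an equally spaced grid and plugs it into the right-hand side of the model. The trapezoidal rule, the function names, and the particular choices of [math]\displaystyle{ \gamma }[/math], [math]\displaystyle{ \beta_{0} }[/math], [math]\displaystyle{ \beta_{1} }[/math] and the predictor path are assumptions made only for this example; they are not taken from the cited references.

```python
import math


def trapezoid(values, step):
    """Trapezoidal rule for equally spaced samples."""
    if len(values) < 2:
        return 0.0
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def normalize_gamma(gamma_vals, step):
    """Rescale so that the integral of gamma^2 is 1 and gamma(0) > 0."""
    norm = math.sqrt(trapezoid([g * g for g in gamma_vals], step))
    sign = -1.0 if gamma_vals[0] < 0 else 1.0
    return [sign * g / norm for g in gamma_vals]


def history_index(x_vals, gamma_vals, t_idx, step):
    """Approximate int_0^Delta gamma(u) X(t - u) du at t = t_idx * step."""
    lag_count = len(gamma_vals)  # gamma sampled at u = 0, step, ..., Delta
    if t_idx < lag_count - 1:
        raise ValueError("t must be at least Delta")
    integrand = [gamma_vals[k] * x_vals[t_idx - k] for k in range(lag_count)]
    return trapezoid(integrand, step)


def conditional_mean(x_vals, gamma_vals, beta0, beta1, t_idx, step):
    """Right-hand side of model (1) at t = t_idx * step."""
    index_value = history_index(x_vals, gamma_vals, t_idx, step)
    return beta0 + beta1(t_idx * step) * index_value


# Hypothetical setup: Delta = 0.5, T = 2, exponentially decaying gamma.
step = 0.01
gamma = normalize_gamma([math.exp(-4.0 * k * step) for k in range(51)], step)
x_path = [math.sin(2.0 * math.pi * j * step) for j in range(201)]
mean_at_t1 = conditional_mean(x_path, gamma, 0.5, lambda t: 1.0 + t, 100, step)
```

Because the lag grid and the time grid share the same step, the value [math]\displaystyle{ X(t-u) }[/math] is simply read off the stored path; a different grid layout would need interpolation.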

Estimation of the history index model

Estimation of the history index function

At each fixed time point [math]\displaystyle{ t }[/math], the model in (1) reduces to a functional linear model between the scalar response [math]\displaystyle{ Y(t) }[/math] and the functional predictor [math]\displaystyle{ X(s), t-\Delta \leq s \leq t. }[/math] Let [math]\displaystyle{ X^{C}(s)=X(s)-\mathrm{E}\{X(s)\} }[/math] denote the centered functional covariate and [math]\displaystyle{ Y^{C}(s)=Y(s)-\mathrm{E}\{Y(s)\} }[/math] the centered response process. Writing the model as

[math]\displaystyle{ \mathrm{E}\{Y^{C}(t)|X^{C}(t)\}=\beta_{1}(t)\int_{0}^{\Delta}\gamma(s)X^{C}(t-s)ds=\int_{0}^{\Delta}\alpha_{t}(s)X^{C}(t-s)ds, }[/math] (2)

with regression parameter functions [math]\displaystyle{ \alpha_{t}(s)=\beta_{1}(t)\gamma(s), }[/math] the functions [math]\displaystyle{ \alpha_{t}(s) }[/math] contain the common factor [math]\displaystyle{ \gamma(s) }[/math] for each [math]\displaystyle{ t }[/math]. To satisfy the constraint [math]\displaystyle{ \int_{0}^{\Delta}\gamma^{2}(u)du=1 }[/math] and to stabilize the resulting estimators, one can sum the regression parameter functions over an equidistant grid of time points [math]\displaystyle{ (t_{1},\ldots,t_{R}) }[/math] in [math]\displaystyle{ [\Delta,T] }[/math] and define

[math]\displaystyle{ \gamma(s)=\frac{\sum_{r=1}^{R}\alpha_{t_{r}}(s)}{\left[\int_{0}^{\Delta}\left\{\sum_{r=1}^{R}\alpha_{t_{r}}(s)\right\}^{2}ds\right]^{1/2}}. }[/math] (3)

When the history index function is recovered, model (1) reduces to a varying coefficient model.[5]
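
The normalization in (3) can be sketched numerically as follows. This Python example assumes that the regression parameter functions [math]\displaystyle{ \alpha_{t_{r}}(s) }[/math] are already available on an equally spaced lag grid (in practice they would come from fitting a functional linear model at each [math]\displaystyle{ t_{r} }[/math], which is not shown). The function names, the trapezoidal rule, and the noise-free test functions are assumptions for illustration only.

```python
import math


def trapezoid(values, step):
    """Trapezoidal rule for equally spaced samples."""
    if len(values) < 2:
        return 0.0
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def gamma_from_alphas(alpha_rows, step):
    """Equation (3): alpha_rows[r][k] holds alpha_{t_r}(s_k) on the lag grid."""
    lag_count = len(alpha_rows[0])
    # Sum the regression parameter functions over the time grid t_1, ..., t_R.
    summed = [sum(row[k] for row in alpha_rows) for k in range(lag_count)]
    # Divide by the L2 norm so that the integral of gamma^2 equals 1.
    norm = math.sqrt(trapezoid([v * v for v in summed], step))
    gamma_vals = [v / norm for v in summed]
    # Apply the sign convention gamma(0) > 0 from the model definition.
    if gamma_vals[0] < 0:
        gamma_vals = [-g for g in gamma_vals]
    return gamma_vals


# Noise-free illustration with a hypothetical gamma and hypothetical beta_1 values.
step = 0.05
raw = [math.exp(-3.0 * k * step) for k in range(11)]  # lags 0, 0.05, ..., 0.5
raw_norm = math.sqrt(trapezoid([v * v for v in raw], step))
true_gamma = [v / raw_norm for v in raw]
beta1_on_grid = [1.0 + 0.5 * r for r in range(1, 7)]  # beta_1(t_r) for R = 6
alphas = [[b * g for g in true_gamma] for b in beta1_on_grid]
gamma_hat = gamma_from_alphas(alphas, step)
```

In this noise-free setting the sum over [math]\displaystyle{ r }[/math] is a positive multiple of [math]\displaystyle{ \gamma }[/math], so the normalization returns the history index function itself; with estimated [math]\displaystyle{ \alpha_{t_{r}} }[/math] the result would only be an approximation.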

Estimation of the varying coefficient function

Once the estimate of [math]\displaystyle{ \gamma(s) }[/math] has been obtained, the remaining unknown component in model (2) is the varying coefficient function [math]\displaystyle{ \beta_{1} }[/math]. Define [math]\displaystyle{ \tilde{X}(t)=\int_{0}^{\Delta}\gamma(s)X^{C}(t-s)ds. }[/math] From (2),

[math]\displaystyle{ \mathrm{cov}\{X(t),Y(t)\}=\mathrm{cov}[\mathrm{E}\{X^{C}(t)|X\},\mathrm{E}\{Y^{C}(t)|X\}]+\mathrm{E}[\mathrm{cov}(X^{C}(t),Y^{C}(t)|X)]=\beta_{1}(t)\int_{0}^{\Delta}\gamma(s)\mathrm{cov}\{X(t-s),X(t)\}ds }[/math],

[math]\displaystyle{ \mathrm{cov}\{X(t),\tilde{X}(t)\}=\int_{0}^{\Delta}\gamma(s)\mathrm{cov}\{X(t-s),X(t)\}ds, }[/math]

and therefore [math]\displaystyle{ \beta_{1}(t)=\mathrm{cov}\{X(t),Y(t)\}/\int_{0}^{\Delta}\gamma(s)\mathrm{cov}\{X(t-s),X(t)\}ds. }[/math][5]
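
The covariance ratio above suggests a simple plug-in estimate in which population covariances are replaced by sample covariances across subjects. The Python sketch below does this for synthetic, noise-free curves observed on a common equally spaced grid. The data-generating choices (random linear predictor curves, the forms of [math]\displaystyle{ \gamma }[/math] and [math]\displaystyle{ \beta_{1} }[/math], an intercept of 1) and all function names are assumptions for illustration, not the estimation procedure of the cited reference, which also handles sparse and noisy longitudinal data.

```python
import math
import random


def trapezoid(values, step):
    """Trapezoidal rule for equally spaced samples."""
    if len(values) < 2:
        return 0.0
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def sample_cov(a, b):
    """Unbiased sample covariance of two equally long lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def estimate_beta1(x_curves, y_curves, gamma_vals, t_idx, step):
    """Plug-in version of cov{X(t),Y(t)} / int gamma(s) cov{X(t-s),X(t)} ds."""
    x_now = [curve[t_idx] for curve in x_curves]
    y_now = [curve[t_idx] for curve in y_curves]
    numerator = sample_cov(x_now, y_now)
    weighted = []
    for k, g in enumerate(gamma_vals):
        x_lagged = [curve[t_idx - k] for curve in x_curves]
        weighted.append(g * sample_cov(x_lagged, x_now))
    return numerator / trapezoid(weighted, step)


def beta1_true(t):
    """Hypothetical varying coefficient function."""
    return 2.0 + math.sin(t)


# Synthetic noise-free data: Delta = 0.5, T = 2, 200 subjects.
rng = random.Random(0)
step = 0.05
grid = [j * step for j in range(41)]
raw = [1.0 - k * step for k in range(11)]
raw_norm = math.sqrt(trapezoid([v * v for v in raw], step))
gamma = [v / raw_norm for v in raw]  # normalized so that int gamma^2 = 1

x_curves, y_curves = [], []
for _ in range(200):
    a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = [a + b * t for t in grid]
    y = [float("nan")] * 10  # Y(t) is only modeled for t >= Delta
    for j in range(10, len(grid)):
        index_value = trapezoid([gamma[k] * x[j - k] for k in range(11)], step)
        y.append(1.0 + beta1_true(grid[j]) * index_value)
    x_curves.append(x)
    y_curves.append(y)

beta1_hat = estimate_beta1(x_curves, y_curves, gamma, 20, step)  # t = 1.0
```

Since the synthetic responses contain no noise and the same quadrature rule is used for generation and estimation, the ratio reproduces the hypothetical [math]\displaystyle{ \beta_{1}(t) }[/math] up to rounding error; with noisy data it would only be a consistent estimate under suitable conditions.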

Application of the history index model

Applications of varying coefficient models that take both past and present information into account have received increasing attention in recent years. For example, Şentürk et al.[8] propose a time-varying lagged regression model to assess the association of predictors, such as cognitive and functional impairment scores, with the frequency of clinic visits of older adults. Also, Zemplenyi et al.[9] suggest a function-on-function regression model that leverages data from nearby DNA methylation probes to identify epigenetic regions that exhibit windows of susceptibility to ambient particulate matter smaller than 2.5 microns (PM2.5). In line with this trend, the history index model has also been used in various situations.

Delay differential equation

The modeling of dynamical systems that evolve over time is of interest in multiple scientific fields. A delay differential equation (DDE) is a natural extension of several types of differential equations, such as ordinary, random and stochastic differential equations,[10] to settings where the observed processes have an aftereffect.

For dynamic learning of random differential equations with a delay (RDED), Dubey et al.[11] utilize functional linear regression with history index to learn the distributed delay, where the regression parameter function then corresponds to a history index function for the process of interest.

Let [math]\displaystyle{ (X(\cdot),\mathbf {U}(\cdot)) }[/math] denote a multivariate stochastic process, where [math]\displaystyle{ X(\cdot) }[/math] is a continuously differentiable process of interest, [math]\displaystyle{ \mathbf {U}(\cdot)=(U_{1}(\cdot),\ldots,U_{J}(\cdot))^{T} }[/math] is a vector function of additional covariates, and [math]\displaystyle{ [t_{0},T] }[/math] is a time window of interest. The model is defined as

[math]\displaystyle{ \frac{dX(t)}{dt}=\alpha(t)+\int_{0}^{\tau_{0}}\gamma(s,t)X(t-s)ds + \int_{0}^{\tau_{1}} \gamma_{1}(s,t)U(t-s)ds+Z(t), t\in [t_{0},T], }[/math]

[math]\displaystyle{ X(t)=g(t), t\in[t_{0}-\tau_{0},t_{0}], }[/math]

where [math]\displaystyle{ g }[/math] is an initial condition process, [math]\displaystyle{ \tau_{0} }[/math], [math]\displaystyle{ \tau_{1} }[/math] are delays, [math]\displaystyle{ \alpha(t) }[/math] is a smooth function, [math]\displaystyle{ \gamma(s,t),\gamma_{1}(s,t) }[/math] are history index functions, and [math]\displaystyle{ Z(\cdot) }[/math] is a random drift process that is independent of [math]\displaystyle{ (X(\cdot),\mathbf {U}(\cdot)) }[/math]. For the purpose of illustration and technical derivations, [math]\displaystyle{ U(\cdot) }[/math] is assumed here to be a univariate process; the corresponding multivariate generalization is straightforward. Dubey et al. used this RDED to predict the growth rate of COVID-19 cases in the United States.[11]
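
To give a feel for how the delay term drives the dynamics, the following Python sketch simulates a single path of a simplified version of the equation with a forward Euler scheme. It is not the estimation method of Dubey et al.: the covariate term involving [math]\displaystyle{ U }[/math] is omitted, the drift [math]\displaystyle{ Z }[/math] defaults to zero, and the function name `simulate_rded`, the Euler discretization, the trapezoidal rule, and the example coefficient functions are all assumptions made for illustration.

```python
def trapezoid(values, step):
    """Trapezoidal rule for equally spaced samples."""
    if len(values) < 2:
        return 0.0
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def simulate_rded(alpha, gamma, g, t0, t_end, tau0, step, drift=None):
    """Forward-Euler path of dX/dt = alpha(t) + int_0^tau0 gamma(s,t) X(t-s) ds + Z(t).

    alpha(t), gamma(s, t), g(t) and the optional drift(t) are plain callables.
    Returns the time grid on [t0 - tau0, t_end] and the simulated values of X.
    """
    lag_count = int(round(tau0 / step))
    n_steps = int(round((t_end - t0) / step))
    times = [t0 + (j - lag_count) * step for j in range(lag_count + n_steps + 1)]
    # Initial condition X(t) = g(t) on [t0 - tau0, t0].
    path = [g(t) for t in times[: lag_count + 1]]
    for j in range(lag_count, lag_count + n_steps):
        t = times[j]
        # Distributed delay term: gamma(s, t) X(t - s) at s = 0, step, ..., tau0.
        integrand = [gamma(k * step, t) * path[j - k] for k in range(lag_count + 1)]
        z = drift(t) if drift is not None else 0.0
        slope = alpha(t) + trapezoid(integrand, step) + z
        path.append(path[j] + step * slope)
    return times, path


# Hypothetical example: constant forcing and a negative feedback from the recent past.
times, path = simulate_rded(
    alpha=lambda t: 1.0,
    gamma=lambda s, t: -2.0 * (1.0 - s),
    g=lambda t: 1.0,
    t0=0.0,
    t_end=1.0,
    tau0=0.5,
    step=0.01,
)
```

Passing a callable for `drift` (for example one that looks up pre-generated random values) would add a realization of the random drift process to the slope at each step.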

References

  1. Cardot, Hervé; Ferraty, Frédéric; Sarda, Pascal (1999), "Functional Linear Model", Statistics & Probability Letters 45: pp. 11–22 
  2. Morris, Jeffrey S. (2015-04-10). "Functional Regression" (in en). Annual Review of Statistics and Its Application 2 (1): 321–359. doi:10.1146/annurev-statistics-010814-020413. ISSN 2326-8298. Bibcode: 2015AnRSA...2..321M. https://www.annualreviews.org/doi/10.1146/annurev-statistics-010814-020413. 
  3. Yao, Fang; Müller, Hans-Georg; Wang, Jane-Ling (2005-12-01). "Functional linear regression analysis for longitudinal data". The Annals of Statistics 33 (6). doi:10.1214/009053605000000660. ISSN 0090-5364. 
  4. Malfait, Nicole; Ramsay, James O. (2003). "The Historical Functional Linear Model". The Canadian Journal of Statistics 31 (2): 115–128. doi:10.2307/3316063. ISSN 0319-5724. 
  5. Şentürk, Damla; Müller, Hans-Georg (2010). "Functional Varying Coefficient Models for Longitudinal Data". Journal of the American Statistical Association 105 (491): 1256–1264. doi:10.1198/jasa.2010.tm09228. ISSN 0162-1459. https://doi.org/10.1198/jasa.2010.tm09228. 
  6. Ramsay, J. O.; Silverman, B. W. (2005). "Functional Data Analysis" (in en-gb). Springer Series in Statistics. doi:10.1007/b98888. ISBN 978-0-387-40080-8. ISSN 0172-7397. https://link.springer.com/book/10.1007/b98888. 
  7. Müller, Hans-Georg (2016). "Peter Hall, Functional Data Analysis and Random Objects". The Annals of Statistics 44 (5): 1867–1887. doi:10.1214/16-AOS1492. ISSN 0090-5364. 
  8. Şentürk, Damla; Ghosh, Samiran; Nguyen, Danh V. (2014-05-01). "Exploratory time varying lagged regression: modeling association of cognitive and functional trajectories with expected clinic visits in older adults". Computational Statistics & Data Analysis 73: 1–15. doi:10.1016/j.csda.2013.11.001. ISSN 0167-9473. PMID 24436504. 
  9. Zemplenyi, M.; Meyer, M.; Cardenas, A.; Hivert, M.; Rifas-Shiman, S.; Gibson, Heike; Kloog, I.; Schwartz, J. et al. (2021). "Function-on-function regression for the identification of epigenetic regions exhibiting windows of susceptibility to environmental exposures". The Annals of Applied Statistics 15 (3): 1366–1385. doi:10.1214/20-aoas1425. PMID 36313278. 
  10. Imkeller, Peter; Schmalfuss, Björn (2001-04-01). "The Conjugacy of Stochastic and Random Differential Equations and the Existence of Global Attractors" (in en). Journal of Dynamics and Differential Equations 13 (2): 215–249. doi:10.1023/A:1016673307045. ISSN 1572-9222. https://doi.org/10.1023/A:1016673307045. 
  11. Dubey, Paromita; Chen, Yaqing; Gajardo, Álvaro; Bhattacharjee, Satarupa; Carroll, Cody; Zhou, Yidong; Chen, Han; Müller, Hans-Georg (2021). "Learning delay dynamics for multivariate stochastic processes, with application to the prediction of the growth rate of COVID-19 cases in the United States". Journal of Mathematical Analysis and Applications 514 (2): 125677. doi:10.1016/j.jmaa.2021.125677. PMID 34642503.