Datasets: LLM360/TxT360


text (string)	meta (dict)	subset (string)
abstract: The Liénard equation is of a high importance from both mathematical and physical points of view. However a question about integrability of this equation has not been completely answered yet. Here we provide a new criterion for integrability of the Liénard equation using an approach based on nonlocal transformations. We also obtain some of previously known criteria for integrability of the Liénard equation as a straightforward consequences of our approach's application. We illustrate our results by several new examples of integrable Liénard equations. author: Nikolai A. Kudryashov, Dmitry I. Sinelshchikov, `firstname.lastname@example.com`, `firstname.lastname@example.com`, Department of Applied Mathematics, National Research Nuclear University MEPhI, 31 Kashirskoe Shosse, 115409 Moscow, Russian Federation title: On the criteria for integrability of the Liénard equation Key words: The Liénard equation; integrability conditions; nonlocal transformations; elliptic functions; general solutions. # Introduction In this work we study the Liénard equation $$y_{zz}+f(y)y_{z}+g(y)=0, \label{eq:L1}$$ where $f(y)$ and $g(y)$ are arbitrary functions, which do not vanish. Eq. is widely used in various applications such as nonlinear dynamics, physics, biology and chemistry (see, e.g. ). For example, some famous nonlinear oscillators such as the van der Pol equation, the Duffing oscillator and the Helmholtz oscillator (see, e.g. ) belong to family of equations . What is more, the Liénard equation often appears as a traveling wave reduction of nonlinear partial differential equations. Examples include but are not limited to the Fisher equation , the Burgers–Korteweg–de Vries equation and the Burgers–Huxley equation . Therefore, it is an important problem to find subclasses of Eq. which can be analytically solved. A problem of the construction of analytical closed–form solutions of Eq. has been considered in a few works. 
For example, integrability of equations of type using the Prelle-Singer method was studied in and for some particular cases of general solutions were obtained. Lie point symmetries of was studied in and families of equations which can be either linearized by point transformations or integrated by the Lie method were found. Authors of reduced the Liénard equation to the Abel equation and used the Chiellini lemma to find a criterion for integrability of Eq. . A connection given by non–local transformations between a second–order linear differential equation and equation of type was studied in . However, in the above mentioned works not all possible integrable cases of Eq. have been found. In this work we find a new criterion for integrability of the Liénard equation. In other words we present a new class of the Liénard equations which can be analytically solved. To this end we use an approach that has recently been proposed in . Main idea of this approach is to find a connection between studied nonlinear differential equation and some other nonlinear differential equation which has the general closed–form analytical solution. Here we suppose that such a connection is given by means of nonlocal transformations that generalize the Sundman transformations . Then we use these transformations in order to convert Eq. into a subcase of Eq. which general solution is expressed in terms of the Jacobian elliptic functions. We illustrate effectiveness of our approach by providing several new examples of integrable Liénard equations. Furthermore, we show that some of previously obtained integrability conditions are consequences of the fact that under these conditions the Liénard equation can be linearized by means of nonlocal transformations. To the best of our knowledge our results are new. The rest of this work is organized as follows. In the next section we present a new criterion for integrability of Eq. . We also discuss previously obtained criteria for integrability of Eq. 
in the context of our approach's application. In Section 3 we illustrate our results by several new examples of integrable Liénard equations and construct the general solutions of them. In the last Section we briefly summarize and discuss our results. # Main results In this section we consider a connection between Eq. and an equation that is subcase of Eq. and its general solution can be expressed in terms of the Jacobian elliptic functions. This connection is given by means of the following transformations $$w=F(y), \quad d\zeta=G(y)dz, \quad F_{y}G\neq0, \label{eq:L3}$$ where $\zeta$ and $w$ are new independent and dependent variables correspondingly. Among equations there is an equation that is of the Painlevé type and can be solved in terms of the elliptic functions (see, e.g. ). In this work we study a connection between Eq. and this equation that has the form $$w_{\zeta\zeta}+3w_{\zeta}+w^{3}+2w=0. \label{eq:L1_3}$$ The general solution of is given by $$w=e^{-(\zeta+\zeta_{0})}\mbox{cn}\{e^{-(\zeta+\zeta_{0})}-C_{1},1/\sqrt{2}\}, \label{eq:L1_7}$$ where $\mbox{cn}$ is the Jacobian elliptic cosine, $\zeta_{0}$ and $C_{1}$ are arbitrary constants. It is worth noting that by scaling transformations Eq. can be cast into the from $w_{\zeta\zeta}+3\mu w_{\zeta}+\nu w^{3}+2\mu^{2}w=0$, where $\mu$ and $\nu$ are arbitrary nonzero parameters. Since these scaling transformations can be included into , without loss of generality we can assume that $\mu=\nu=1$. Note also that there is a singular solution of which has the form $w=\pm \sqrt{2}i e^{-\zeta}/(e^{-\zeta}-C_{1})$. Let us finally remark that Eq. is invariant under the transformation $w\rightarrow -w$ and, therefore, we can use either plus or minus sign in the right–hand side of formula . Now we are in position to present our main results. Theorem 1. Eq. 
can be transformed into by means of transformations with $$F(y)=\gamma\left(\int f(y) dy+\delta\right), \quad G(y)=\frac{1}{3}f(y), \label{eq:L1_5}$$ if the following correlation on functions $f(y)$ and $g(y)$ holds $$g(y)=\frac{f(y)}{9} \, \left[\int f(y) dy+\delta\right] \Big( \gamma^{2}\left[\int f(y) dy+\delta\right]^{2}+2 \Big) , \label{eq:L1_1}$$ where $\gamma\neq0$ and $\delta$ are arbitrary constants. Proof. One can express $y_{z}$, $y_{zz}$ via $w_{\zeta}$, $w_{\zeta\zeta}$ with the help of . Substituting these expressions along with into Eq. and requiring that the result is we obtain a system of two ordinary differential equations on functions $F$ and $G$ and an algebraic correlation on functions $f(y)$ and $g(y)$. Solving these equations with respect to $F$, $G$ and $g$ we obtain formulas and . Note that one can substitute transformations into and require that the result is , which leads to the same formulas and . This completes the proof. One can see that integrability condition does not coincide with integrability conditions previously obtained in . Using results from works it can be seen that Eq. under condition admits less than two Lie symmetries, and, thus it can be neither integrated by the Lie method nor linearized by point transformations. Moreover, below we show that condition is different from a condition for linearizability of Eq. via nonlocal transformations. Therefore, condition give us a completely new class of integrable Liénard equations. In the next section with the help of we find several new examples of integrable Liénard equations. Now we consider transformations of given by means of into a linear second order differential equation. Theorem 2. Eq. 
can be transformed into $$w_{\zeta\zeta}+\sigma w_{\zeta}+w=0, \label{eq:L3_3}$$ via transformations with $$F(y)=\lambda \left(\int f(y)dy+\kappa \right), \quad G(y)=\frac{1}{\sigma}f(y), \label{eq:L3_5}$$ if the following correlation on functions $f(y)$ and $g(y)$ holds $$g(y)=\frac{f(y)}{\sigma^{2}}\left[\int fdy +\kappa \right], \label{eq:L3_1}$$ where $\sigma\neq0$, $\lambda \neq0$ and $\kappa$ are arbitrary parameters. Proof. One can express $y_{z}$, $y_{zz}$ via $w_{\zeta}$, $w_{\zeta\zeta}$ with the help of . Substituting these expressions along with into Eq. and requiring that the result is we obtain a system of two ordinary differential equations on functions $F$ and $G$ and an algebraic correlation on functions $f(y)$ and $g(y)$. Solving these equations with respect to $F$, $G$ and $g$ we obtain formulas and . Note that one can substitute transformations into and require that the result is , which leads to the same formulas and . This completes the proof. One can see that correlation coincide with one of integrability criteria obtained in with the help of the Chiellini lemma. Therefore, we see that this integrability condition can be found directly from without transforming it into the Abel equation. What is more, substituting $f(y)=ay^{q}+b$ or $f(y)=ay+b$, where $a,q \neq 0$ and $b$ are arbitrary parameters, into we get some of integrability conditions obtained in . In the case of $f(y)=ay^{q}+b$ we have a subcase of equation which admits two Lie symmetries, while in the case of $f(y)=ay+b$ we obtain equation with maximal Lie point symmetries. Thus, we see that some of previously known integrability conditions for the Liénard equation are consequences of the fact that this equation can be linearized by nonlocal transformations providing that condition holds. 
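The linearization stated in Theorem 2 can be traced in a few lines. What follows is a worked sketch that uses only the transformations $w=F(y)$, $d\zeta=G(y)dz$ and the choices of $F$ and $G$ given in the theorem; the shorthand $I(y)=\int f(y)dy+\kappa$ is introduced here for brevity and is not notation from the original text. Since $F_{y}=\lambda f(y)$ and $G=f(y)/\sigma$, the ratio $F_{y}/G=\lambda\sigma$ is constant, so that $$\begin{gathered} w_{\zeta}=\frac{F_{y}}{G}\,y_{z}=\lambda\sigma\,y_{z}, \quad w_{\zeta\zeta}=\frac{1}{G}\frac{d}{dz}\left(\lambda\sigma\,y_{z}\right)=\frac{\lambda\sigma^{2}}{f(y)}\,y_{zz}, \\ w_{\zeta\zeta}+\sigma w_{\zeta}+w=\frac{\lambda\sigma^{2}}{f(y)}\left[y_{zz}+f(y)y_{z}+\frac{f(y)}{\sigma^{2}}I(y)\right]. \end{gathered}$$ The bracket vanishes exactly when $y$ satisfies the Liénard equation with $g(y)=f(y)I(y)/\sigma^{2}$, which is the correlation of Theorem 2.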
It is also worth noting that correlation can be found from sufficient conditions for the equation $y_{zz}+\lambda_{2}(z,y) y_{z}^{2}+\lambda_{1}(z,y) y_{z}+\lambda_{0}(z,y)=0$ to be linearizable via nonlocal transformations . In this section a new condition for integrability of the Liénard equation has been obtained. It has also been shown that some previously known conditions for integrability of the Liénard equation are consequences of linearizability of the corresponding Liénard equation by nonlocal transformations. # Examples In this section we provide three new families of integrable Liénard equations. We consider three different cases of the coefficient function $f(y)$: a linear function, a rational function and an exponential function. Example 1: a generalized Emden–type equation. Let us suppose that $f(y)=\alpha y+\beta$, $\gamma=2$ and $\delta=0$, where $\alpha\neq0$ and $\beta$ are arbitrary parameters. Then from we have that $$\begin{gathered} F(y)=y(\alpha y+2\beta), \quad G(y)=(\alpha y+\beta)/3, \label{eq:L5} \end{gathered}$$ and from , we find the corresponding Liénard equation $$y_{zz}+(\alpha y+\beta) y_{z}+\frac{y}{18}(\alpha y+\beta)(\alpha y +2\beta)(y^{2}[\alpha y+2\beta]^{2}+2)=0. \label{eq:L5_1}$$ Using , and we get the general solution of Eq. $$y=\frac{1}{\alpha}\left[\pm \left(\beta^{2}+\alpha e^{-(\zeta-\zeta_{0})}\mbox{cn}\left\{e^{-(\zeta-\zeta_{0})}-C_{1},\frac{1}{\sqrt{2}}\right\}\right)^{1/2}-\beta\right], \, z=\int \frac{3}{\alpha y+\beta} d\zeta. \label{eq:L5_3}$$ Let us remark that Eq. can be considered as a generalization of the modified Emden equation or as a traveling wave reduction of the generalized Burgers–Huxley equation . Plots of the solution corresponding to different initial conditions and different values of $\alpha$ and $\beta$ are presented in Fig.. From Fig. one can see that Eq. admits kink–type and pulse–type analytical solutions. Note that all solutions presented in Fig. correspond to the plus sign in formula . 
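The algebra of Example 1 can be checked symbolically. The sketch below assumes SymPy is available and uses variable names (`prim`, `g_printed`, `residual`) that are introduced here for illustration only. It builds $F$, $G$ and $g$ from the formulas of Theorem 1 with $f(y)=\alpha y+\beta$, $\gamma=2$, $\delta=0$, compares $g$ with the coefficient printed in Example 1, and then confirms by the chain rule that the target equation $w_{\zeta\zeta}+3w_{\zeta}+w^{3}+2w=0$ holds whenever $y(z)$ satisfies the Liénard equation.

```python
import sympy as sp

z, alpha, beta = sp.symbols("z alpha beta")
y = sp.Function("y")(z)

gamma, delta = 2, 0                              # values chosen in Example 1
f = alpha * y + beta                             # f(y) = alpha*y + beta
prim = alpha * y**2 / 2 + beta * y + delta       # int f dy + delta
F = gamma * prim                                 # w = F(y)
G = f / 3                                        # d(zeta) = G(y) dz
g = f / 9 * prim * (gamma**2 * prim**2 + 2)      # condition of Theorem 1

# g(y) as printed in the Lienard equation of Example 1
g_printed = (y / 18 * (alpha * y + beta) * (alpha * y + 2 * beta)
             * (y**2 * (alpha * y + 2 * beta)**2 + 2))
assert sp.expand(g - g_printed) == 0
assert sp.expand(F - y * (alpha * y + 2 * beta)) == 0

# Chain rule: d/d(zeta) = (1/G) d/dz
w_zeta = sp.cancel(sp.diff(F, z) / G)
w_zeta2 = sp.cancel(sp.diff(w_zeta, z) / G)
target = w_zeta2 + 3 * w_zeta + F**3 + 2 * F

# Eliminate y_zz with the Lienard equation y_zz = -f*y_z - g
lienard_rhs = -f * sp.diff(y, z) - g
residual = sp.simplify(sp.cancel(target.subs(sp.diff(y, z, 2), lienard_rhs)))
assert residual == 0
```

The same script can be reused for the other examples by changing `f`, `prim`, `gamma` and `delta`; only polynomial $f$ has been tried in this sketch.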
Example 2: an equation with rational nonlinearity. In this example we assume that $f(y)=\alpha/y^{2} +\beta$, $\gamma=1/2$ and $\delta=0$, where $\alpha$ and $\beta$ are arbitrary nonzero parameters. As a result, from , and we find functions $F$ and $G$ $$\begin{gathered} F(y)=\frac{1}{2y}(\beta y^{2}-\alpha), \quad G(y)=\frac{\beta y^{2}+\alpha}{3y^{2}}, \label{eq:L5_5} \end{gathered}$$ and corresponding Liénard equation $$y_{zz}+ \left(\frac{\alpha}{y^{2}} +\beta\right)y_{z}+\frac{\beta^{2}y^{4}-\alpha^{2}}{36y^{5}}\left[(\beta y^{2}-\alpha)^{2}+8y^{2}\right]=0. \label{eq:L5_7}$$ The general solution of is given by $$\begin{gathered} y=\frac{1}{\beta}\big(e^{-(\zeta-\zeta_{0})}\mbox{cn}\{e^{-(\zeta-\zeta_{0})}-C_{1},1/\sqrt{2}\}\pm \hfill \quad \quad \quad \quad \quad \quad \\ \quad \quad \quad\pm\sqrt{e^{-2(\zeta-\zeta_{0})}\mbox{cn}^{2}\{e^{-(\zeta-\zeta_{0})}-C_{1},1/\sqrt{2}\}+\beta\alpha}\big),\\ z=\int \frac{3y^{2}}{\beta y^{2}+\alpha} d\zeta. \label{eq:L5_9} \end{gathered}$$ Formula describes different types of solutions of Eq. depending on values of the parameters $\alpha$ and $\beta$ and initial conditions (i.e. values of $\zeta_{0}$ and $C_{1}$). In Fig. we demonstrate two pulse–type solutions of which correspond to the plus sign in . Example 3: an equation with exponential nonlinearity. Let us assume that $f(y)=\beta e^{\alpha y}$, $\gamma=1$, $\delta=-1$. Then from we have that $$F(y)=\frac{\beta}{\alpha}e^{\alpha y}-1, \quad G(y)=\frac{\beta}{3}e^{\alpha y}, \label{eq:L5_11}$$ and from , we find corresponding Liénard equation $$y_{zz}+ \beta e^{\alpha y} y_{z}+\frac{\beta}{9\alpha^{3}}e^{\alpha y}(\beta e^{\alpha y}-\alpha)([\beta e^{\alpha y}-\alpha]^{2}+2\alpha^{2})=0. 
\label{eq:L5_15}$$ The general solution of can be written as follows $$\begin{gathered} y=\frac{1}{\alpha}\ln\left\{\frac{\alpha}{\beta} \left(e^{-(\zeta-\zeta_{0})}\mbox{cn}\left\{e^{-(\zeta-\zeta_{0})}-C_{1},\frac{1}{\sqrt{2}}\right\}+1\right)\right\}, \, z=\int\ \frac{3}{\beta} e^{-\alpha y} d\zeta. \label{eq:L5_17} \end{gathered}$$ We demonstrate plots of the solution corresponding to different values of the parameters $\alpha$ and $\beta$ and to different initial conditions in Fig.. It can be seen that Eq. admits pulse–type and kink–type solutions. In this section we have presented three new integrable Liénard equations. The general closed–form solutions of these equations have been found. We have demonstrated that these solutions describe various types of dynamical structures. # Conclusion In this work we have studied the Liénard equation. We have obtained a new integrability condition for this equation. It is worth noting that the class of Liénard equations corresponding to this condition can be neither integrated by the Lie method nor linearized by point or nonlocal transformations. We have demonstrated the effectiveness of our approach by presenting three new examples of integrable Liénard equations. The general solutions of these equations have been constructed and analyzed. We have also shown that some previously obtained integrability conditions follow from linearizability of the corresponding Liénard equations by nonlocal transformations. # Acknowledgments This research was supported by Russian Science Foundation grant No. 14–11–00258.	{ "dup_signals": {}, "filename": "out/1608.01636_extract_LEC_7.tex.md" }	arxiv
abstract: Polarons, which arise from the self-trapping interaction between electrons and lattice distortions in a solid, have been known and extensively investigated for nearly a century. Nevertheless, the study of polarons continues to be an active and evolving field, with ongoing advancements in both fundamental understanding and practical applications. Here, we present a microscopic model that exhibits a diverse range of dynamic behavior, arising from the intricate interplay between two excitation-phonon coupling terms. The derivation of the model is based on an experimentally feasible Rydberg-dressed system with dipole-dipole interactions, making it a promising candidate for realization in a Rydberg atoms quantum simulator. Remarkably, our analysis reveals a growing asymmetry in Bloch oscillations, leading to a macroscopic transport of non-spreading excitations under a constant force. Moreover, we compare the behavior of excitations, when coupled to either acoustic or optical phonons, and demonstrate the robustness of our findings against on-site random potential. Overall, this work contributes to the understanding of polaron dynamics with their potential applications in coherent quantum transport and offers valuable insights for research on Rydberg-based quantum systems. author: Arkadiusz Kosior; Servaas Kokkelmans; Maciej Lewenstein; Jakub Zakrzewski; Marcin Płodzień bibliography: bibliography_polaron.bib title: Phonon-assisted coherent transport of excitations in Rydberg-dressed atom arrays # Introduction Polarons are quasi-particles that emerge from the coupling between electrons (or holes) with ions of a crystalline structure in polarizable materials. The idea of electron self-trapping due to lattice deformations dates back to Landau's seminal 1933 paper , but the modern concept of a polaron as an electron dressed by phonons was formulated in 1946 by Pekar , and developed later by Fröhlich , Feynman , Holstein , and Su, Schrieffer and Heeger . 
Since their discovery, polarons have been extensively investigated, both theoretically and experimentally, not only in the field of condensed matter physics (for reviews see Refs. ), but also in various chemical and biological contexts, e.g., in protein propagation . In particular, in the modeling of charge migration in DNA molecules, it is assumed that a localized polaron is formed in the helix near a base due to an interaction between a charge carrier and a phonon. When a uniform electric field is applied, the polaron moves at a constant velocity, and a current flows through the chain . The charge carrier transport takes place due to coupling between carrier and phonons; in contrast, in the absence of phonons, an external constant force induces Bloch oscillations , where the mean position of the carrier is constant while its width periodically changes in time. Polarons have been studied in many, seemingly different experimental setups, ranging from ultracold ions , polar molecules , mobile impurities in Bose and Fermi gases , ultracold Rydberg atoms , to quantum dots on a carbon nanotube . Although each of these platforms possesses its unique strengths and benefits, recently there has been an exceptional outburst of interest in quantum simulation and computation with Rydberg atoms, which provide a remarkable level of flexibility for executing quantum operations and constructing quantum many-body Hamiltonians . While the latter can contribute to our comprehension of the static properties of many-body systems, their main benefits are centered around exploring the complex dynamics displayed by these systems. In particular, in the context of polarons, it has been demonstrated that the dipole-dipole interactions between distinct Rydberg-dressed states can result in coherent quantum transport of electronic-like excitations , which can further be coupled to optical phonons . 
The paradigmatic one-dimensional topological Su-Schrieffer-Heeger (SSH) model describing the soliton formation in long-chain polyacetylene due to excitation-phonon coupling, has been realized in Rydberg arrays . In this paper, we continue along this path and present theoretical studies of an implementation of a microscopic model featuring the interplay of Su-Schrieffer-Heeger (SSH) and Fröhlich electron-phonon interaction terms under the influence of an external force and disorder. In particular, we focus on the directional transport of an excitation interacting with phonons. We indicate an excitation-phonon coupling regime where the competition between Bloch oscillations and interactions results in the coherent transport of a well-localized wave packet over a long distance. Moreover, we show the robustness of such a coherent transport of well-localized wave packets to the on-site random potential, indicating that a relatively strong disorder does not affect significantly the transport properties. The paper is divided into three parts. In the first part, Sec. , we describe the physical setup and derive the effective Hamiltonian in Rydberg-dressed atomic arrays. The second part, described in Section , focuses on the dynamics of the system under experimentally relevant parameters. In this section, we observe the macroscopic transport of the center of mass and a transition between Bloch oscillations and moving polaron regimes. In the third part, Sec. , we comprehensively analyze the previously derived microscopic model, which exhibits a rich phase diagram due to the interplay of two different electron-phonon coupling mechanisms. Finally, we compare the behavior of excitations with acoustic and optical phonons and demonstrate the robustness of our results. 
# The model and its Hamiltonian We consider a one-dimensional chain of $N$ equidistant Rydberg atoms with lattice constant $x_0$ and positions $x_j = j x_0$, confined in a periodic trap, implemented either by an optical lattice , an optical tweezer array , a Rydberg microtrap , or a painted potential . We assume that the spatial motion of the atoms is suppressed by the strong confinement of each Rydberg atom in local potential minima. Although the atomic motion is frozen, it is remarkable that such a Rydberg system can display highly non-trivial dynamics. In particular, the induced dipole-dipole interactions between distinct Rydberg-dressed states can lead to the emergence of coherent quantum transport of electronic-like excitations . In the following, we first briefly repeat the derivation of the Hamiltonian that characterizes the dynamics of single excitations . The purpose of this recap is to modify the setup in order to incorporate nearly arbitrary on-site potential terms. Next, after introducing phonons into the system , we derive an effective nearest-neighbor Hamiltonian that includes two excitation-phonon coupling terms, which we comprehensively study in the forthcoming sections, focusing on the dynamics in the presence of an external constant field. ## Single excitation Hamiltonian in arbitrary potentials We assume that each Rydberg atom of can be initially found in one of the ground-state hyperfine levels, $|g\rangle$ or $|g'\rangle$. By applying far-detuned dressing laser fields, with effective Rabi frequencies $\Omega_{s}$, $\Omega_{p}$ and detunings $\Delta_{s}$, $\Delta_{p}$ respectively, these two hyperfine states can be coherently coupled to selected highly excited Rydberg states, $|s\rangle$ or $|p\rangle$, with principal quantum number $n\gg 1$ and different angular momenta. 
Consequently, each atom can be found in one of the two Rydberg dressed states , which are a slight admixture of Rydberg states to the atomic ground states, $$|0\rangle_j \approx |g\rangle_j + \alpha_s |s\rangle_j \quad \mbox{or} \quad |1\rangle_j \approx |g'\rangle_j + \alpha_p |p\rangle_j ,$$ with $\alpha_{s/p} = \Omega_{s/p}/[2\Delta_{s/p}]$ and $j$ denoting the position of an atom. Treating $\alpha_{s}$, $\alpha_{p}$ as perturbation parameters in van Vleck perturbation theory, Wüster et al. have shown that the dipole-dipole interaction can exchange the internal states of a neighboring pair, e.g. $|1\rangle_1 |0\rangle_2 \rightarrow |0\rangle_1 |1\rangle_2$. This process can be viewed as a hopping of an excitation from lattice site $j=1$ to lattice site $j=2$, which conserves the number of excitations. The perturbation analysis can be extended to a chain of $N$ atoms, where the effective Hamiltonian in the single excitation manifold (up to the fourth order in $\alpha_{s}$ and $\alpha_{p}$) reads $$\label{Hamiltonian_alltoall} \hat H_0 = \sum_j \hat n_j (E_2 + E_4+ A_j) + \sum_{j,k} A _{jk} \hat a_j^\dagger \hat a_k,$$ where $\hat a_j$ ($\hat a^\dagger_j$) denotes an annihilation (creation) operator of an excitation on site $j$, while $$\label{eq:coefficents} \begin{align} \label{Aj} A _j &= \hbar \alpha_s^2 \alpha_p^2 \left(\sum_{k \ne j} \frac{1}{1-\bar U_{kj}^2} \right) (\Delta_s + \Delta_p), \\ \label{Ajk} A_{jk} &= \hbar \alpha_s^2 \alpha_p^2 \frac{\bar U_{jk}}{1-\bar U_{jk}^2} (\Delta_s + \Delta_p), \end{align}$$ with $\bar U_{jk} = C_3/[\hbar \left| x_j- x_k\right|^{3} (\Delta_s + \Delta_p)]$ and $C_3$ quantifying the transition dipole moment between the Rydberg states, describe perturbative dipole-dipole interactions. 
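To make the structure of the coefficients $A_j$ and $A_{jk}$ concrete, the following is a minimal numerical sketch, assuming NumPy and an equidistant chain. The function name `dressed_couplings` and its inputs are hypothetical: `u_nn` stands for the nearest-neighbor value of $\bar U_{jk}$, and `prefactor` for $\hbar \alpha_s^2 \alpha_p^2 (\Delta_s+\Delta_p)$ in whatever energy unit is chosen. The sketch assumes $|\bar U_{jk}|<1$ for all pairs.

```python
import numpy as np

def dressed_couplings(n_sites, u_nn, prefactor=1.0):
    """Return (A_j, A_jk) for an equidistant chain of n_sites atoms.

    u_nn      : nearest-neighbour value of U-bar (hypothetical input, |u_nn| < 1)
    prefactor : hbar * alpha_s^2 * alpha_p^2 * (Delta_s + Delta_p)
    """
    sites = np.arange(n_sites)
    dist = np.abs(sites[:, None] - sites[None, :]).astype(float)
    off_diag = ~np.eye(n_sites, dtype=bool)

    u = np.zeros_like(dist)
    u[off_diag] = u_nn / dist[off_diag] ** 3          # U-bar_jk ~ 1/|x_j - x_k|^3

    hop = prefactor * u / (1.0 - u**2)                # A_jk, zero on the diagonal
    terms = np.where(off_diag, 1.0 / (1.0 - u**2), 0.0)
    onsite = prefactor * terms.sum(axis=1)            # A_j, sum over k != j
    return onsite, hop

onsite, hop = dressed_couplings(n_sites=7, u_nn=0.1)
# Ratio of next-nearest to nearest-neighbour hopping; close to 1/8 for small u_nn
ratio = hop[0, 2] / hop[0, 1]
```

For small `u_nn` the hopping falls off roughly as the inverse cube of the distance, which gives a feel for the size of the terms neglected when only nearest-neighbor contributions are retained.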
Finally, $E_2$ and $E_4$ are constant energy shifts of the second and fourth order, respectively, $$\label{eq:E1E2} \begin{align} \label{E2} E_2 /\hbar &= (N-1)\alpha_s^2 \Delta_s + \alpha_p^2 \Delta_p,\\ \label{E4} E_4 /\hbar&= (N-1) \alpha_s^4 \Delta_s+ \alpha_p^4 \Delta_p \nonumber \\ & + (N-1)\alpha_s^2 \alpha_p^2 (\Delta_s+ \Delta_p) . \end{align}$$ Although in principle constant energy terms could be always ignored as they do not contribute to the dynamics of excitations, let us consider now a scenario where the Rabi frequency $\Omega_p$ depends on the atomic position on the lattice, i.e., we assume that $$\Omega_p \rightarrow \Omega_p(j) \equiv \Omega_p \left[1 + \delta\Omega(j) \right],$$ where $\delta\Omega(j)$ is arbitrary, but small correction of the order $(\alpha_{p/s})^2$. With this assumption, and by retaining terms up to the fourth order, the effective Hamiltonian in Eq. acquires an additional term, namely $$\label{ham_with_force} \hat H = \hat H_0 + \hbar \alpha_p^2 \Delta_p \sum_j \delta \Omega(j) \hat n_j \;.$$ Because the term proportional to $\alpha_p^2\delta\Omega(j)$ is of the same order as $A_j$, it can be incorporated into the definition of $A_j$ in Eq. . With this simple modification, we have gained a position-dependent effective potential term that can strongly affect the dynamics of excitations. Although the potential term can be tailored almost arbitrarily, from now on we consider one of its simplest forms, i.e., we choose $$\delta\Omega(j) = 2 \alpha_s^2 \left(F j + \epsilon_j \right).$$ The first term in the parentheses, being linearly proportional to position $j$, emulates the presence of a constant external field $F$. The second term, with $\epsilon_j$ being a random variable, gives rise to the on-site potential disorder. Note that both terms lead to localization of the excitation either due to Stark localization in a constant tilt, $F$, or Anderson localization due to random $\epsilon_j$. 
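The localizing effect of the tilt can be illustrated with a short simulation. The sketch below is not the model derived in this paper: it assumes a single excitation with nearest-neighbor hopping only, no phonons, dimensionless units with $\hbar=1$, and hypothetical values for the hopping amplitude `hop`, the force `force` and the chain length. Under these assumptions an excitation started on one site should show Bloch oscillations with period $2\pi/F$: its mean position stays fixed while its width breathes.

```python
import numpy as np

def tilted_chain(n_sites, hop, force, disorder=None):
    """Site indices and single-excitation Hamiltonian with on-site term j*F + eps_j."""
    j = np.arange(n_sites) - n_sites // 2             # site index, centred at 0
    onsite = force * j.astype(float)
    if disorder is not None:
        onsite = onsite + disorder                    # optional random eps_j
    h = np.diag(onsite)
    h += hop * (np.eye(n_sites, k=1) + np.eye(n_sites, k=-1))
    return j, h

def evolve(h, psi0, t):
    """Exact evolution exp(-i h t) psi0 via diagonalisation (hbar = 1)."""
    energies, vecs = np.linalg.eigh(h)
    coeffs = vecs.conj().T @ psi0
    return vecs @ (np.exp(-1j * energies * t) * coeffs)

def mean_and_width(j, psi):
    prob = np.abs(psi) ** 2
    mean = np.sum(j * prob)
    return mean, np.sqrt(np.sum((j - mean) ** 2 * prob))

n_sites, hop, force = 101, 1.0, 0.5                   # hypothetical parameters
j, h = tilted_chain(n_sites, hop, force)
psi0 = np.zeros(n_sites, dtype=complex)
psi0[n_sites // 2] = 1.0                              # excitation on the central site

period = 2 * np.pi / force                            # Bloch period
mean_half, width_half = mean_and_width(j, evolve(h, psi0, period / 2))
fidelity_full = np.abs(np.vdot(psi0, evolve(h, psi0, period))) ** 2
```

Passing an array of random numbers as `disorder` adds the $\epsilon_j$ term, so the same helper can be used to explore the disordered case in this phonon-free setting.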
As explained in the next part, the situation is not so straightforward. ## Excitation-phonon Hamiltonian In this part, we relax our previous assumption that the atoms of the array are completely immobile. Although we still assume that no atom can move through the lattice, we now let them vibrate in the vicinity of their local equilibrium points. This will affect, as we shall see, the dynamics of excitations. We consider now a scenario where an atom in the $j$-th lattice site and with mass $m$ may oscillate with a frequency $\omega_0 = \sqrt{k/m}$ inside a local potential well, that can be approximated by a quadratic potential $$\frac{k}{2} (x-j x_0)^2 \equiv \frac{k x_0^2}{2} (u_j)^2,$$ with $k$ being the force constant and where $u_j$ denotes dimensionless distortion from the local equilibrium position. The motion of atoms can be quantized $u_j \rightarrow \hat u_j$ and described by a simple quantum harmonic oscillator. This vibrational motion is responsible for the distortion of an atomic array and can be considered as a phonon. Since the Hamiltonian of the previous section describing the motion of single excitations strongly depends on the position of atoms, phonons can propagate through space due to the coupling to excitations. Before proceeding to derive the effective Hamiltonian of the system with phonon-excitation coupling, for clarity and simplicity we assume that $$\alpha \equiv \alpha_s=\alpha_p , \quad \Delta \equiv \Delta_s=\Delta_p .$$ Moreover, from now on we also fix the time and energy scales and go to the dimensionless units by dividing all the energy scales by $2\hbar \alpha^4 \Delta$. Although the setup described in Section admits only dispersionless optical phonons that correspond to local vibrations of atoms around local minima, we consider here two different types of phonons. 
We proceed by writing the phononic Hamiltonian explicitly in terms of the dimensionless position and momentum operators $\hat{u}_j$, $\hat{p}_j$ of local distortions, $$\label{ham_phonons} \hat{H}_\text{ph} = \sum_j \left[ \frac{\hat{p}^2_j}{2 m_{\text{eff}}} + \frac{m_{\text{eff}}\omega_{\text{eff}}^2}{2} (\hat{u}_j - \eta \hat{u}_{j-1})^2 \right],$$ with the effective dimensionless mass, $$\label{eq:eff_mass} m_{\text{eff}} = 2 m x_0^2 \alpha^4 \Delta / \hbar,$$ and the effective oscillator frequency, $$\label{eq:eff_omega} \omega_{\text{eff}} =\omega_0 / (2 \alpha^4 \Delta), \quad \omega_0 = \sqrt{k/m},$$ where $\omega_0$ is the bare frequency. By changing the parameter $\eta$ in Eq. , diverse phonon types can be achieved. In particular, $\eta=0$ corresponds to the aforementioned local vibrations (i.e., dispersionless optical phonons), and $\eta=1$ describes acoustic phonons. These two phonon types are characterized by the dispersion relation $$\label{eq:disp_relation} \epsilon_{q} = \begin{cases} \omega_{\text{eff}} , & \mbox{(optical phonons, $\eta = 0$)} \\ 2 \omega_{\text{eff}} \left\|\sin(q x_0/2)\right\|, & \mbox{(acoustic phonons, $\eta = 1$)} \end{cases},$$ which can be readily found by writing the phononic Hamiltonian in terms of its eigenmodes, $$\hat{H}_\text{ph} = \sum_q \epsilon_{q} \left(\hat{b}^\dagger_q\hat{b}_q+\frac{1}{2}\right),$$ where $\hat{b}^\dagger_q$ ($\hat{b}_q$) creates (annihilates) a phonon with quasi-momentum $q$. These operators are related to the local dimensionless momentum and position operators $\hat{p}_j$, $\hat{u}_j$ of distortion by $$\begin{split} \hat{u}_j &= \sum_q \frac{1}{\sqrt{2 N \epsilon_{q} m_{\text{eff}}}}(\hat{b}_q+ \hat{b}^\dagger_{-q})e^{iqjx_0},\\ \hat{p}_j &= -i\sum_q \sqrt{\frac{ \epsilon_{q} m_{\text{eff}}}{ 2 N }}(\hat{b}_q - \hat{b}^\dagger_{-q}) e^{iqjx_0}. 
\end{split}$$ Having discussed the phononic degrees of freedom, we can now write the full effective Hamiltonian governing the motion of single excitations coupled to phonons. The derivation is straightforward and requires: (i) the expansion of the position-dependent coefficients \[given by Eq. \] in the Hamiltonian of the previous section up to the first order in $\hat u_j$, and (ii) dropping the next-to-nearest neighbor contributions. By following these steps, we obtain the effective excitation-phonon Hamiltonian \[cf. Fig. \], which consists of four parts, i.e., $$\label{eq:Hamiltonian_final} \hat{H}_\text{eff} = \hat{H}_\text{ph} + \hat{H}_\text{ex} + \hat{H}_\text{J} + \hat{H}_\text{W},$$ where $\hat{H}_\text{ph}$ is the phononic Hamiltonian, Eq. , $$\label{H_ex} \hat{H}_\text{ex} = J_0 \sum_j (\hat{a}^\dagger_{j+1}\hat{a}_j + \mbox{H.c.}) + \sum_j (j F +\epsilon_j)\hat{a}^\dagger_j\hat{a}_j ,$$ describes excitations with the hopping amplitude $$\label{J0} J_0 = \kappa/(1-\kappa^2), \quad \kappa = C_3/(2\hbar \Delta x_0^3),$$ experiencing an external constant force $F$, and a local on-site disorder $\epsilon_j$. Finally, $$\label{eq:phonon-coupling} \begin{align} \label{SSH} \hat{H}_\text{J} &= g_J\sum_j (\hat{u}_{j+1} - \hat{u}_{j}) \hat{a}^\dagger_{j+1}\hat{a}_j + \mbox{H.c.} \, , \\ \label{Frohling} \hat{H}_\text{W} &= g_W \sum_j (\hat{u}_{j+1} - \hat{u}_{j-1})\hat{a}^\dagger_j\hat{a}_j, \end{align}$$ are the well-known SSH and Fröhlich Hamiltonians, respectively, that correspond to two different mechanisms of excitation-phonon couplings, with dimensionless coupling parameters $$\label{gjgw} \begin{align} \label{gj} g_J &= -3\kappa(1+\kappa^2)/(\kappa^2-1)^2, \\ \label{gw} g_W &= -6\kappa^2/(\kappa^2-1)^2 . 
\end{align}$$ ## Equations of motion The full numerical analysis of the polaron dynamics on the many-body level is one of the most challenging computational tasks, because the total number of phonons in the system is not conserved, which prevents one from working in a restricted, fixed particle-number sector of the phononic Hilbert space. Additionally, even without a force $F$ the effective Hamiltonian of the system depends, in principle, on many parameters, namely $J_0, g_W, g_J, \omega_{\rm eff}$ and $m_{\rm eff}$, making the full analysis of the system even more challenging. To analyze the dynamical properties of the considered system, in the following, we assume that the phononic degrees of freedom are independent in each lattice site. We make the semiclassical approximation by applying the Davydov Ansatz, i.e., we assume that phonons are in a coherent state and that the full wave function is a product state of the excitation and coherent phonons part, as $$\label{Davydov_anzatz} \| \Psi (t)\rangle = \left(\sum_j\psi_j(t)\hat{a}_j^\dagger\right)\otimes\left( e^{-i \sum_n\left[ u_n(t)\hat{p}_n-p_n(t)\hat{u}_n\right]}\right)\|\mathtt{vac}\rangle,$$ where $\|\psi_j(t)\|^2$ is the probability of finding an excitation at site $j$, and $u_j(t)$ and $p_j(t)$ are the expectation values of the phononic position and momentum operators. The equations of motion for $\psi_j(t)$ and $u_j(t)$ can subsequently be derived from the Heisenberg equations of motion for the classical conjugate variables using the generalized Ehrenfest theorem, see, for example, Ref. . By following these steps, we obtain a closed set of coupled differential equations for the excitation amplitude $\psi_j(t)$ and classical field $u_j(t)$. 
The equations can be written in a concise form, as $$\label{eq:DavidovEqs} \begin{align} \label{eq:DavidovEq1} i \dot{\psi_j} &= J_j \psi_{j+1} + J_{j-1} \psi_{j-1} + W_j \psi_j , \\ \label{eq:DavidovEq2} \ddot{u}_j &= -\omega_{\text{eff}}^2 \, {\cal D}[u_j] + {\cal S}[\psi_j], \end{align}$$ where the effective potential experienced by an excitation $W_j(t) = j F +\epsilon_j + g_W[u_{j+1}(t) - u_{j-1}(t)]$, and the effective hopping amplitude $J_j(t) = J_0 + g_J[u_{j+1}(t) - u_j(t)]$ are both time-dependent functions due to the coupling to the gradient of phononic field $u_j(t)$. As such, both $W_j(t)$ and $J_j(t)$ are responsible for the self-trapping of an excitation. Similarly, the phononic equation also depends on the excitation amplitude $\psi_j(t)$ through the ${\cal S}[\psi_j]$ operator, given by $$\begin{split} {\cal S}[\psi_j] = - \frac{g_W}{m_{\text{eff}}} (\|\psi_{j+1}\|^2 - \|\psi_{j-1}\|^2)\\ - \frac{g_J}{m_{\text{eff}}}[\psi_j^*(\psi_{j+1} - \psi_{j-1}) + \mbox{c.c.}], \end{split}$$ which acts as a time-dependent source for the phonon field $u_j(t)$. Finally, the phononic dispersion relation, given by Eq. , is necessarily present in the phononic equation through the ${\cal D}[u_j]$ operator, $${\cal D}[u_j]= \begin{cases} u_j , & \eta = 0, \\ 2u_j - u_{j+1}-u_{j-1}, & \eta = 1, \end{cases}\;,$$ which introduces a crucial difference in the propagation of optical ($\eta=0$) and acoustic ($\eta=1$) phonons, which we investigate in the next sections. ## Analysed observables Throughout this article we choose the initial conditions $\psi_j(0) = \delta_{j,0}$ and $u_j(0) = \dot u_j(0) = 0$ for the equations of motion, Eq. , that correspond to a single excitation on a central lattice site and initially unperturbed lattice. 
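As an illustration of how the coupled equations above can be propagated from these initial conditions, the following is a minimal sketch in plain Python using a fixed-step fourth-order Runge-Kutta scheme. It is not the integrator used for the results in the text, which is not specified here. The function names, the open-chain boundary treatment (amplitudes and displacements outside the chain are set to zero), and the step size are assumptions made for this example.

```python
def derivatives(psi, u, v, *, J0, gJ, gW, m_eff, omega_eff, eta, F, eps):
    """Right-hand sides for psi_j, u_j and v_j = du_j/dt on an open chain.

    Assumption: sites are labelled j = -(N-1)/2 ... (N-1)/2 and any
    neighbour outside the chain contributes psi = 0 and u = 0.
    """
    n = len(psi)
    half = (n - 1) // 2

    def P(j):
        return psi[j] if 0 <= j < n else 0j

    def U(j):
        return u[j] if 0 <= j < n else 0.0

    dpsi, dv = [], []
    for j in range(n):
        site = j - half
        J_right = J0 + gJ * (U(j + 1) - U(j))   # J_j
        J_left = J0 + gJ * (U(j) - U(j - 1))    # J_{j-1}
        W = site * F + eps[j] + gW * (U(j + 1) - U(j - 1))
        rhs = J_right * P(j + 1) + J_left * P(j - 1) + W * P(j)
        dpsi.append(-1j * rhs)                  # from i dpsi/dt = rhs
        # D[u_j]: local for optical phonons, lattice Laplacian for acoustic
        D = U(j) if eta == 0 else 2 * U(j) - U(j + 1) - U(j - 1)
        # S[psi_j]; the "+ c.c." term equals twice the real part
        S = (-(gW / m_eff) * (abs(P(j + 1)) ** 2 - abs(P(j - 1)) ** 2)
             - (gJ / m_eff) * 2.0
             * (P(j).conjugate() * (P(j + 1) - P(j - 1))).real)
        dv.append(-omega_eff ** 2 * D + S)
    return dpsi, list(v), dv


def rk4_step(psi, u, v, dt, **params):
    """Advance (psi, u, v) by one classical Runge-Kutta step of size dt."""
    def shifted(state, k, h):
        return tuple([a + h * b for a, b in zip(s, d)]
                     for s, d in zip(state, k))

    y0 = (psi, u, v)
    k1 = derivatives(*y0, **params)
    k2 = derivatives(*shifted(y0, k1, dt / 2), **params)
    k3 = derivatives(*shifted(y0, k2, dt / 2), **params)
    k4 = derivatives(*shifted(y0, k3, dt), **params)
    return tuple(
        [s + dt / 6 * (a + 2 * b + 2 * c + d)
         for s, a, b, c, d in zip(*cols)]
        for cols in zip(y0, k1, k2, k3, k4)
    )


def evolve(n_sites, t_final, dt, **params):
    """Start from psi_j(0) = delta_{j,0}, u_j(0) = du_j/dt(0) = 0."""
    psi = [0j] * n_sites
    psi[(n_sites - 1) // 2] = 1 + 0j
    u = [0.0] * n_sites
    v = [0.0] * n_sites
    params.setdefault("eps", [0.0] * n_sites)   # no disorder by default
    for _ in range(round(t_final / dt)):
        psi, u, v = rk4_step(psi, u, v, dt, **params)
    return psi, u, v


if __name__ == "__main__":
    # Small illustrative run; the lattice size and couplings are arbitrary.
    psi, u, v = evolve(21, 0.5, 0.005, J0=1.0, gJ=4.0, gW=8.0,
                       m_eff=0.5, omega_eff=10.0, eta=0, F=0.2)
    print("norm:", sum(abs(a) ** 2 for a in psi))
```

A fixed-step explicit scheme is chosen only for transparency. Because the hopping matrix in the first equation is real and symmetric, the norm $\sum_j\|\psi_j\|^2$ is conserved by the exact dynamics, so its drift gives a simple check on the step size.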
Without phonon coupling and for $F=0$, these initial conditions simply correspond to a quantum particle that spreads symmetrically in both lattice directions, characterized by a constant Lieb-Robinson velocity, so that its center of mass remains localized at the initial position. Contrary to the classical case, a quantum particle on a lattice will not even move in the presence of a constant force $F$, but instead, it starts to perform Bloch oscillations. The situation is different in interacting systems, either in the case of particle-particle interactions, which may further lead to disorder-free many-body localization, or in the presence of phonons, which can induce transient polarons at the end of Bloch oscillation periods (see also Ref. ). In this study, we investigate how the propagation of a single excitation is influenced by the two competing phonon-coupling mechanisms under the applied, constant force. Specifically, we aim at answering the two following questions: (i) how much does the excitation spread due to the coupling with phonons, and (ii) does its center of mass move in the presence of the constant force $F$? To answer these questions, we focus on three simple observables that can be calculated based on the local density measurements. First, we consider the participation ratio (PR), defined as $$\label{eq:PR} \mbox{PR}(t) = \bigg(\sum_j\left\|\psi_j(t)\right\|^4\bigg)^{-1},$$ where we have assumed a unit normalization of the wavefunction $\sum_j\|\psi_j\|^2=1$. The participation ratio PR is equal to $1$ when the excitation is localized on a single lattice site and equal to $N$ when it is completely delocalized over the whole lattice. The second observable is the center of mass position of the wave packet, i.e., $$\label{eq:xcm} x(t) = \sum_{j=-N/2}^{N/2} j \left\|\psi_j(t)\right\|^2.$$ Moreover, in some cases, analyzing the ratio of the two quantities mentioned above can provide valuable insights. 
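The observables above reduce to short sums over the site amplitudes. The sketch below assumes a normalized list of complex amplitudes on an odd number of sites, with the central entry labelled $j=0$; the function names are illustrative and do not come from the text.

```python
def participation_ratio(psi):
    """PR = 1 / sum_j |psi_j|^4, assuming sum_j |psi_j|^2 = 1."""
    return 1.0 / sum(abs(a) ** 4 for a in psi)


def center_of_mass(psi):
    """x = sum_j j |psi_j|^2 with the central list entry labelled j = 0."""
    half = (len(psi) - 1) // 2
    return sum((j - half) * abs(a) ** 2 for j, a in enumerate(psi))


def localization_ratio(psi):
    """Ratio |x| / PR of the two quantities above."""
    return abs(center_of_mass(psi)) / participation_ratio(psi)
```

For a state occupying a single site, `participation_ratio` returns 1 and `center_of_mass` returns that site's label, so the ratio is largest for a fully localized packet sitting at the edge of the chain.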
We define this ratio, denoted as $\xi$, as: $$\label{eq:xi} \xi(t) = \frac{\|x(t)\|}{\mbox{PR}(t)}.$$ $\xi$ is a quantity ranging from 0 to $\xi_{\textrm{max}} = N/2$. The maximum value $\xi_{\textrm{max}}$ corresponds to a moving, maximally-localized, non-dispersive solution that has reached the boundary of the system. As such, $\xi$ can be viewed as an indicative measure for selecting well-localized solutions moving in one direction. Finally, it is worth mentioning that it is often not necessary to analyze the entire time range of the above observables. In fact, to discern various dynamic behaviors, it is usually sufficient to look at $\mbox{PR}(t)$, $x(t)$ and $\xi(t)$ at the final evolution time $t_f \gg 1$. For example, a large $\mbox{PR}(t_f)$ (relative to the system size $N$) suggests that the excitation is not stable and has delocalized over the lattice. # Polaron dynamics: experimental considerations In this section we elaborate on the results of the previous sections and study the dynamics of a Rydberg excitation in the presence of the external force $F$, solving the equations of motion for a physically relevant range of parameters. The effective Hamiltonian of the system relies on several effective, dimensionless parameters, including $m_{\text{eff}} = 2 m x_0^2 \alpha^4 \Delta / \hbar$, $\omega_{\text{eff}} =\omega_0 / (2 \alpha^4 \Delta)$, as well as $J_0$, $g_J$, $g_W$, given by Eq. and Eqs. . However, it is worth noting that the latter three parameters are not independent within our setup, and their values are determined by a single parameter $\kappa = C_3 / (2\hbar \Delta x_0^3)$. This provides us with significant flexibility in selecting appropriate physical parameters for our convenience. In the following, we choose the highly excited Rydberg states $\|s\rangle$, $\|p\rangle$ of Rubidium-87 with principal quantum number $n=50$ and angular momentum equal to 0 or $\hbar$, for which $C_3 = 3.224$ GHz$\times\mu \mbox{m}^{3}$. 
We fix the lattice spacing $x_0 = 2\, \mu \mbox{m}$, and the local trap frequency $\omega_0 =20$ kHz. In the numerical simulations, we vary the dimensionless parameter $\kappa$ between $0.80$ and $0.86$, which is equivalent to changing the detuning $\Delta$ within the range $\sim 234-252$ MHz, and corresponds to the dressing parameter $\alpha \sim 0.04$. Importantly, by increasing $\kappa$ we also increase the phonon coupling strength from around $g_J/m_{\textrm{eff}} \sim g_W/m_{\textrm{eff}} \sim-4.5$ to $g_J/m_{\textrm{eff}} \sim g_W/m_{\textrm{eff}} \sim-8$. Furthermore, we remind the reader that in our setup only the optical phonons (i.e., dispersionless vibrations) are experimentally relevant and, therefore, in this section we set $\eta=0$. Finally, we fix the value of the force at $F=0.2$, and we set the system size to $N=401$. In order to characterize the transport properties of an excitation $\psi_j(t)$, in the top panel of Fig. we plot its center of mass position $x(t)$ and the corresponding participation ratio $\text{PR}(t)$, see Eqs. - for the respective definitions. In the bottom panel, we additionally illustrate the ratio $\xi = \|x\|/\text{PR}$. All these quantities are plotted as a function of $\kappa$, at a fixed time $t_f=2.1 \, T_B \approx 66$, where $T_B = 2\pi/F$ is the Bloch oscillation period. We find that up to $\kappa \sim 0.83$ both $x(t_f)$ and $\text{PR}(t_f)$ are small (relative to the system size $N$), which corresponds to Bloch oscillation-like dynamics where the influence of phonons is minimal. In contrast, phonons play an important role above $\kappa \sim 0.83$, where the system dynamics is quite sensitive to the choice of microscopic parameters. Within the chaotic-like regime, the typical Bloch oscillation dynamics is completely disrupted, as the majority of solutions become delocalized across the lattice, leading to large values of $\text{PR}(t)$. 
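The numbers quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that $C_3/\hbar$ is expressed in GHz times cubic micrometers and that $\Delta$ is quoted as a plain frequency, which is the reading that reproduces the stated detuning range; the function names are illustrative.

```python
import math

C3_MHZ_UM3 = 3.224e3   # assumed: C_3/hbar = 3.224 GHz um^3, written in MHz um^3
X0_UM = 2.0            # lattice spacing in micrometers


def detuning_mhz(kappa):
    """Invert kappa = C_3 / (2 hbar Delta x_0^3) for the detuning Delta."""
    return C3_MHZ_UM3 / (2.0 * kappa * X0_UM ** 3)


def couplings(kappa):
    """Dimensionless J_0, g_J, g_W as functions of kappa."""
    J0 = kappa / (1.0 - kappa ** 2)
    gJ = -3.0 * kappa * (1.0 + kappa ** 2) / (kappa ** 2 - 1.0) ** 2
    gW = -6.0 * kappa ** 2 / (kappa ** 2 - 1.0) ** 2
    return J0, gJ, gW


def bloch_period(force):
    """T_B = 2 pi / F in the dimensionless units of the text."""
    return 2.0 * math.pi / force


if __name__ == "__main__":
    for kappa in (0.80, 0.86):
        print(kappa, detuning_mhz(kappa), couplings(kappa))
    print("t_f =", 2.1 * bloch_period(0.2))
```

Under these assumptions, $\kappa = 0.80$ and $\kappa = 0.86$ map to detunings of roughly 252 MHz and 234 MHz, and $2.1\,T_B$ at $F = 0.2$ is close to 66, consistent with the values in the text.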
However, amidst this chaotic behavior, we also discover intervals of stability, characterized by peaks of $\xi(t_f)$, where a substantial portion of the wave packet becomes well-localized and exhibits near-constant velocity motion. We illustrate those different dynamical behaviours in Fig. , where the first column, i.e., panels (a)-(d), shows the time evolution of the excitation density $\|\psi_j(t)\|^2$, while the second column \[panels (e)-(h)\] illustrates the corresponding time evolution of the center of mass position $x(t)$ and the participation ratio PR$(t)$. In the first row ($\kappa = 0.8$), we observe almost perfect Bloch oscillations. However, upon closer examination, a subtle asymmetry becomes apparent, as evidenced by a non-zero $x(t)$. The asymmetry is enhanced for a higher $\kappa = 0.83$, as depicted in the second row of Fig. . Finally, the last two rows of Fig. illustrate the time evolution of the excitation density in the chaotic-like regime above $\kappa \sim 0.83$, cf. Fig. , where most of the solutions are delocalized over the lattice, as in Fig. (d) for $\kappa = 0.86$. In contrast, in Fig. (c) we illustrate a regular behaviour for $\kappa=0.834$, which lies inside one of the aforementioned stability windows. In this scenario, due to constructive interference after one Bloch oscillation period, a prominent portion of the wave function coalesces into a very narrow non-dispersive wave packet that moves with a nearly constant velocity. Overall, Fig. offers a comprehensive visual representation of the dynamic phenomena investigated in this section, shedding light on the varying dynamical behaviors and properties of the system with increasing phonon interaction. # Dynamical phase diagrams of the effective Hamiltonian In the previous sections, we have derived and then analysed a microscopic Hamiltonian, governing the dynamics of an excitation coupled to phonons through two different mechanisms, i.e., the SSH and Fröhlich Hamiltonians, see Eq. . 
While maintaining a close connection to the experimental platform, it is important to note that in the considered Rydberg setup, the phonon coupling strengths $g_J$ and $g_W$ are not independent. Instead, they can both be expressed in terms of a single parameter $\kappa$, as demonstrated in Eq. . Consequently, investigating the interplay between these two competing phonon-coupling mechanisms within the current Rydberg platform becomes challenging. To address this limitation and explore the complete phase diagram in a more general context, in this section, we treat $g_J$ and $g_W$ as completely independent and fix other parameters. In the initial phase, as described in Section , our primary objective is to identify a stable polaron regime. Specifically, we aim to find a regime in which an initially localized excitation does not spread during the course of time evolution. Subsequently, in Section , we demonstrate the existence of stable islands where polarons can exhibit non-dispersive motion when subjected to a constant force, even in the presence of substantial disorder. Furthermore, in this part, we thoroughly examine the quantitative differences in dynamics of optical and acoustic phonons. In the following, we set the system size to $N = 401$, and solve the equations of motions in a fixed time interval $t\in [0,t_f = 16.5]$. Unless explicitly stated otherwise, we also set $m_{\rm eff}=0.5$, $\omega_{\rm eff} = 10$, and $J_0 = 1$. ## Polaron formation In the preceding section, we have already witnessed the emergence of a non-dispersive, self-trapped polaron through the excitation-phonon coupling. Building upon this observation, here we independently vary the two coupling strengths, $g_J$, and $g_W$, to identify a stable polaron regime. It is worth noting that the Hamiltonian of the system, as described by Eq. , is invariant under the simultaneous transformation: $u_j \rightarrow - u_j$, $g_J \rightarrow - g_J$, and $g_W \rightarrow - g_W$. 
Therefore, without loss of generality, we can assume $g_J\geq0$. In Fig. , we present a phase diagram of the participation ratio PR calculated at the final evolution time for a broad range of values: $g_J\in [0,45]$ and $g_W\in [-16,20]$. Each panel of Fig. corresponds to distinct values of $m_{\rm eff}$ and $\eta$, as specified in the figure caption. In terms of the layout, the left (right) column corresponds to the optical (acoustic) phonons, and $m_{\rm eff}$ increases from top to bottom. In all panels of Fig. , we observe wide regions with both extended states (warm colors) and well-localized solutions (dark blue colors), with the latter corresponding to stable, stationary polarons. We discover a non-trivial dependence of the participation ratio on both coupling strengths. Moreover, we find qualitatively similar behavior for both types of phonon, however, the acoustic phonons exhibit greater dynamic stability. This is evident from the presence of a chaotic-like region (the light blue dotted area), \[compare with Fig. and see the discussion in Sec. \]. Finally, we indicate that a decrease of effective mass $m_{\rm eff}$ stabilizes the excitation supporting localized polaron formation. ## Robustness of coherent transport against disorder In this paragraph, we focus on the parameters regime, where a well-localized excitation can be transported over a long distance. Namely, after identifying stable polaron regimes, we proceed to apply a constant force to investigate the propagation of non-spreading solutions. For this analysis, we fix $F=0.2$, $m_{\rm eff}=0.5$ and select the coupling strengths within the range $g_J \in [4,16]$ and $g_W \in [8,20]$. These regions are indicated by a dashed square in the bottom panels of Fig. . The results are presented in Fig. . The top row of Fig. illustrates the participation ratio, PR$(t_f)$, for both optical \[panel (a)\] and acoustic \[panel (b)\] phonons. 
In both panels, we observe a shift in the boundary between extended and localized states due to the presence of the applied force. However, the prevalence of dark blue colors, indicating localized regimes, remains evident. The bottom row of Fig. displays $\xi(t_f)$, as given by Eq. . This quantity serves as a measure for selecting well-localized solutions propagating in a single direction. We observe stable transport islands of such solutions, indicated by warm colors. Panel (c) corresponds to optical phonons, while panel (d) corresponds to acoustic phonons. Finally, in Fig. , we examine the robustness of the non-dispersive moving solutions against on-site disorder, $\epsilon_j$, as in Eq. . The disorder is introduced by assuming $\epsilon_j$ to be a pseudorandom variable drawn from a uniform distribution in $[-W/2,W/2]$. Panels (a) and (b) depict the time propagation of excitations for optical and acoustic phonons, respectively. Fig. (c) illustrates the center of mass position, while Fig. (d) presents the participation ratio evaluated at the final evolution time, both plotted as functions of the disorder amplitude $W$. The results are averaged over 200 independent realizations of disorder. Notably, the participation ratio for both acoustic and optical phonons remains relatively constant, providing evidence for the robustness of the polaron self-trapping mechanism, while the center of mass is displaced over a significant distance. # Summary and conclusions In summary, we propose a quantum simulator based on Rydberg-dressed atom arrays for the SSH-Fröhlich Hamiltonian, allowing studies of polaron formation and dynamics. The interplay between two competing excitation-phonon coupling terms in the model results in a rich dynamical behavior, which we comprehensively analyze. In particular, our findings reveal the presence of asymmetry in Bloch oscillations allowing coherent transport of a well-localized excitation over long distances. 
Moreover, we compare the behavior of excitations coupled to either acoustic or optical phonons and find qualitatively similar behavior. Finally, we demonstrate the robustness of phonon-assisted coherent transport to the on-site random potential. Our analysis is restricted to weak lattice distortions related to a small number of phonons per lattice site; however, the proposed quantum simulator allows studies of the excitation dynamics in the strong distortion limit, as well as studies of a plethora of different scenarios, such as bi- and many-polaron dynamics, and investigation of the quantum boomerang effect affected by the presence of phonons, both in a single-particle and many-body scenario. We believe that our work opens up new avenues for research in Rydberg-based quantum simulators. A.K. acknowledges the support of the Austrian Science Fund (FWF) within the ESPRIT Programme ESP 171-N under the Quantum Austria Funding Initiative. S.K. acknowledges the Netherlands Organisation for Scientific Research (NWO) under Grant No. 680.92.18.05. ICFO group acknowledges support from: ERC AdG NOQIA; MICIN/AEI (PGC2018-0910.13039/501100011033, CEX2019-000910-S/10.13039/501100011033, Plan National FIDEUA PID2019-106901GB-I00, FPI; MICIIN with funding from European Union NextGenerationEU (PRTR-C17.I1): QUANTERA MAQS PCI2019-111828-2); MCIN/AEI/ 10.13039/501100011033 and by the "European Union NextGeneration EU/PRTR" QUANTERA DYNAMITE PCI2022-132919 within the QuantERA II Programme that has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No 101017733Proyectos de I+D+I "Retos Colaboración" QUSPIN RTC2019-007196-7); Fundació Cellex; Fundació Mir-Puig; Generalitat de Catalunya (European Social Fund FEDER and CERCA program, AGAUR Grant No. 
2021 SGR 01452, QuantumCAT U16-011424, co-funded by ERDF Operational Program of Catalonia 2014-2020); Barcelona Supercomputing Center MareNostrum (FI-2023-1-0013); EU (PASQuanS2.1, 101113690); EU Horizon 2020 FET-OPEN OPTOlogic (Grant No 899794); EU Horizon Europe Program (Grant Agreement 101080086 — NeQST), National Science Centre, Poland (Symfonia Grant No. 2016/20/W/ST4/00314); ICFO Internal "QuantumGaudi" project; European Union's Horizon 2020 research and innovation program under the Marie-Skłodowska-Curie grant agreement No 101029393 (STREDCH) and No 847648 ("La Caixa" Junior Leaders fellowships ID100010434: LCF/BQ/PI19/11690013, LCF/BQ/PI20/11760031, LCF/BQ/PR20/11770012, LCF/BQ/PR21/11840013). The work J.Z. was funded by the National Science Centre, Poland under the OPUS call within the WEAVE programme 2021/43/I/ST3/01142 as well as via project 2021/03/Y/ST2/00186 within the QuantERA II Programme that has received funding from the European Union Horizon 2020 research and innovation programme under Grant agreement No 101017733. A partial support by the Strategic Programme Excellence Initiative within Priority Research Area (DigiWorld) at Jagiellonian University is acknowledged. M.P. acknowledges the support of the Polish National Agency for Academic Exchange, the Bekker programme no: PPN/BEK/2020/1/00317. Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union, European Commission, European Climate, Infrastructure and Environment Executive Agency (CINEA), nor any other granting authority. Neither the European Union nor any granting authority can be held responsible for them.
abstract: We study the existence of the stochastic flow associated to a linear stochastic evolution equation $${\rm d}X= AX{\rm d}t +\sum_{k} B_k X{\rm d}W_k,$$ on a Hilbert space. Our first result covers the case where $A$ is the generator of a $C_0$-semigroup, and $(B_k)$ is a sequence of bounded linear operators such that $\sum_k\\|B_k\\|<+\infty$. We also provide sufficient conditions for the existence of stochastic flows in the Schatten classes beyond the space of Hilbert-Schmidt operators. Some new results and examples concerning the so-called commutative case are presented as well. address: School of Mathematics and Statistics, The University of Sydney, Sydney 2006, Australia; Institute of Mathematics, Jagiellonian University, Łojasiewicza 6, 30-348 Kraków, Poland author: Beniamin Goldys; Szymon Peszat title: On Linear Stochastic Flows [^1] [^2] # Introduction Consider the following linear evolution equation $$\label{E11} {\rm d}X= AX{\rm d}t + \sum_{k} B_k X{\rm d}W_k,\qquad t\ge s,\,\,X(s)=x,$$ where $(W_k)$ is a sequence of independent standard real valued Wiener processes defined on a probability space $(\Omega, \mathfrak{F},\mathbb{P})$, $A$ is the generator of a $C_0$-semigroup $\left({\rm e}^{tA}\right)$ on a Hilbert space $(H,\langle \cdot,\cdot\rangle_H)$ and $B_k$, $k=1, \ldots$, are possibly unbounded linear operators on $H$. By the solution to $\eqref{E11}$ we understand the so-called mild solution defined as an adapted to the filtration generated by $(W_k)$ and having continuous trajectories in $H$ process $X^x_s(t)$, $t\ge s$, satisfying the integral equation $$X^x_s(t)= {\rm e}^{(t-s)A}x +\sum_{k} \int_s^t {\rm e}^{(t-r)A} B_k X^x_s(r){\rm d}W_k(r), \qquad t\ge s.$$ For more details see the books of Da Prato and Zabczyk and Flandoli . Let us denote by $L(H,H)$ the space of all bounded linear operators on $H$. We denote by $\\|\cdot \\|$ the operator norm on $L(H,H)$ and by $\vert \cdot \vert _H$ the norm on $H$. 
Let $\Delta := \{(s,t)\colon 0\le s\le t<+\infty\}$. Definition 1. We say that defines a stochastic flow if there exists a mapping $\mathcal{X}\colon\Delta\times \Omega\mapsto L(H,H)$ such that: (i) for every $s\ge 0$ and $x\in H$, the process $\mathcal{X}(s,t;\cdot)(x)$, $t\ge s$, has continuous trajectories in $H$, $\mathbb P$-a.s.; (ii) for every $s\ge 0$ and $x\in H$, we have $\mathcal{X}(s,t)(x)= X^x_s(t)$, for all $t\ge s$, $\mathbb{P}$-a.s.; (iii) for all $0\le s\le t\le r$ and $\omega\in \Omega$, $\mathcal{X}(t,r;\omega)\circ \mathcal{X}(s,t;\omega)= \mathcal{X}(s,r;\omega)$. Remark 2. Property $(i)$ means that for each $s$ and $x$, the solution $X^x_s(t)= \mathcal{X}(s,t;\cdot)(x)$ has continuous trajectories. This holds in most interesting cases by means of the so-called Da Prato-Kwapień-Zabczyk factorization, see e.g. . In particular, the solution has the required property if $(B_k)$ is a finite sequence of bounded operators on $H$. Note that property $(iii)$ of the stochastic flow follows from the existence and uniqueness of the solution. Finally, note that for all $0\le s\le t\le r\le h$ and $x,y\in H$, the random variables $\mathcal{X}(s,t)(x)$ and $\mathcal{X}(r,h)(y)$ are independent. Stochastic flows with this property are known as Brownian flows. The theory of stochastic flows for linear and nonlinear stochastic differential equations in finite dimensional spaces is well established, see for example . In particular, it is known that the nonlinear SDE $$\label{E12} {\rm d}X= b(X){\rm d}t + \sigma(X){\rm d}W, \qquad X(s)=x,$$ defines a (nonlinear) stochastic flow if $b$ and $\sigma$ are $C^1$ with bounded derivatives. The proof of this result requires a Sobolev embedding theorem. The case of additive noise, where $\sigma$ does not depend on $X$, is easier. 
Namely, let $Y^x_s(t)$, $t\ge s$, be the solution to the ordinary differential equation with random coefficients $$\label{E13} {\rm d}Y(t)= b\left(Y(t)+\sigma (W(t)-W(s))\right){\rm d}t, \qquad Y(s)=x.$$ Then the solution $X^x_s$ to is given by $X^x_s(t)= Y^x_s(t)+ \sigma (W(t)-W(s))$, and consequently defines a stochastic flow if defines a flow. It is natural to ask a question about the existence of stochastic flow associated to a stochastic evolution equation $$\label{E111} {\rm d}X=(AX+F(X)){\rm d}t+\sum_k\sigma_k(X){\rm d}W_k,\quad X(0)=x\,,$$ where $H$ is an infinite-dimensional Hilbert space and $A$ is the generator of a $C_0$-semigroup on $H$. If $\sigma$ does not depend on $X$, then one can apply the same argument as in in the finite dimensional case. In this paper we focus on a linear equation , where $F=0$ and $\sigma_k(x)=B_kx$. Skorokhod showed in his famous example that the question of the existence of the flow for equation is much more intricate than in finite dimensions. In Section 4 we consider a more general version of the Skorokhod example to obtain new and interesting phenomena. In general, it is not possible to prove the existence of the flow in infinite dimensions using the Sobolev embedding theorems that are not available in infinite dimensions. However, for many nonlinear equations of the form the corresponding flows can be obtained as continuous transformations of the flows corresponding to ordinary or partial differential equations with random coefficients (see e.g. ). In fact, we will use a similar method in the proof of our main result concerning linear equations. A very general approach to the question of existence of stochastic flows for non-linear stochastic partial differential equations can be found in . 
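To make the composition property (iii) of Definition 1 concrete, the following sketch treats the simplest possible case, $H=\mathbb{R}$ with a single noise and constant coefficients $a$ and $b$. For the scalar Itô equation ${\rm d}X = aX{\rm d}t + bX{\rm d}W$ the solution is known explicitly, $X^x_s(t) = x\exp\{(a-b^2/2)(t-s) + b(W(t)-W(s))\}$, so the flow is multiplication by the exponential factor. This is a one-dimensional toy illustration only, not the infinite-dimensional construction studied in the paper; the function names and the grid-based Brownian path are assumptions of the example.

```python
import math
import random


def brownian_path(n_steps, dt, seed=0):
    """Sample W(0), W(dt), ..., W(n_steps * dt) on a uniform grid."""
    rng = random.Random(seed)
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w


def flow(a, b, w, dt, i, j):
    """Scalar flow X(s, t) for s = i * dt <= t = j * dt.

    Uses the explicit Ito solution of dX = a X dt + b X dW.
    """
    return math.exp((a - 0.5 * b * b) * (j - i) * dt + b * (w[j] - w[i]))


if __name__ == "__main__":
    dt = 0.01
    w = brownian_path(100, dt, seed=1)
    composed = flow(0.3, 0.5, w, dt, 40, 100) * flow(0.3, 0.5, w, dt, 0, 40)
    direct = flow(0.3, 0.5, w, dt, 0, 100)
    print(composed, direct)
```

In this scalar case the identity $\mathcal{X}(t,r)\circ\mathcal{X}(s,t)=\mathcal{X}(s,r)$ holds path by path because both the time increments and the Wiener increments add, so the two printed numbers agree up to rounding.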
It is tempting to obtain the existence of the flow $\mathcal X$ corresponding to linear equation by solving a linear equation $${\rm d}\mathcal X= A\mathcal X{\rm d}t + \sum_{k} B_k \mathcal X\,{\rm d}W_k,\qquad t\ge s,\,\,\mathcal X(s)=I\,,$$ on the space of bounded operators. Unfortunately, the theory of stochastic integration on the space of all bounded linear operators on infinite dimensional Hilbert space does not exist. It is possible however to integrate in some smaller spaces, such as the Schatten classes. This approach has been applied by Flandoli ) in the case of Hilbert-Schmidt operators. In Section of this paper we use the theory of stochastic integration on $M$-type 2 Banach spaces to extend the Flandoli result to the scale of Schatten classes $\mathbb S^p$ for all $p\in[2,\infty)$. We will show that in the diagonal case solutions in the Schatten classes can exist $\mathbb P$-a.s. but not in the sense of expected value. In particular, in Proposition 4.3.3 we give conditions under which $\mathrm{Tr}(\mathcal X(t))<+\infty$, $\mathbb P$-a.s. while $\mathbb E\,\mathrm{Tr}(\mathcal X(t))=+\infty$. It would be interesting to find more general conditions for such a behaviour of stochastic flows. The paper is organized as follows: in the next section we prove a theorem dealing with equation with a sequence of bounded operators $B_k$. We do not assume that the operators commute, therefore there are no explicit solutions. Then in Sections we examine the case of commuting $B_k$. In this case the solution is given in an explicit form, and this allows us to construct interesting examples that lead to new questions that remain open. In Section we study the existence of stochastic flows in Schatten classes. # Main results In this section we consider a linear stochastic equation on a separable Hilbert space $H$: $$\label{E21} {\rm d}Y= B_0Y{\rm d}t + \sum_{k}B_kY{\rm d}W_k$$ driven by a sequence of independent Wiener processes $W_k$. 
In the theorems below we assume that $B_0, B_1, \ldots$ is a sequence of bounded linear operators on $H$ such that $$\label{Assumption} M:= \sum_{k} \\|B_k\\|<+\infty.$$ We start with a result on the existence and regularity of the flow. Theorem 3. * Assume . Then equation defines a stochastic flow $\mathcal{Y}$ on $H$. Moreover, for any $T\in (0,+\infty)$, and $q\ge 1$, $$\label{E22} \sup_{0\le s\le t\le T} \mathbb{E}\left\\| \mathcal{Y}(s,t)\right\\|^{q} <+\infty.$$ Moreover, for every $T\in (0,+\infty)$, and $L= 2,3,\ldots$, there exists a constant $C$ such that for all $0\le s\le t\le u\le T$ we have $$\label{E23} \mathbb{E}\left\\| \mathcal{Y}(s,t)- \mathcal{Y}(s, u)\right\\|^{2L} \le C\left\vert t-u\right\vert ^{L-1}.$$ Consequently, by the Kolmogorov test, for any $s\ge 0$, $\mathbb{P}$-a.s. $\mathcal{Y}(s,t)$ is a Hölder continuous $L(H,H)$-valued mapping of $t\ge s$, with exponent $\gamma <1/2$. Finally, for any $T\in (0,+\infty)$, and $L= 2,3,\ldots$, there exists a constant $C$ such that for all $0\le s\le t\le u\le T$ we have $$\label{E24} \mathbb{E}\left\\| \mathcal{Y}(s,t)- \mathcal{Y}(u, v)\right\\|^{2L} \le C\left( \left \vert u-s\right\vert + \left\vert t-v\right\vert\right) ^{L-1}.$$ Consequently, by the Kolmogorov test, $\mathbb{P}$-a.s. $\mathcal{Y}(s,t)$ is a Hölder continuous $L(H,H)$-valued mapping of $(s,t)\in \Delta$, with exponent $\gamma <1/2$.* Theorem 4. * Assume . Let $\tilde B= \sum_{k\ge 1}B_k^2$, and let $\mathcal{Y}$ be the flow defined by . Then for all $0\le s\le t$, $\mathcal{Y}(s,t)$ is an invertible operator $\mathbb{P}$-a.s. and $\mathcal{Z}(s,t) = \left(\mathcal{Y}^{-1}(s,t)\right)^\star$ is the flow defined by the equation $$\label{E25} {\rm d}Z= \left( -B_0^\star+ \tilde B^\star\right) Z{\rm d}t - \sum_{k} B_k^\star Z {\rm d}W_k,\quad Z(s)=z\,.$$* Theorem 5. * Assume . Then for any generator $A$ of a $C_0$-semigroup on $H$, equation defines a stochastic flow on $H$.* Theorem 6. 
* Assume that $B_0, B_1,\ldots, B_N$ is a finite sequence of commuting generators of $C_0$-groups on $H$. Then the equation $${\rm d}X= B_0X{\rm d}t + \sum_{k=1}^N B_kX{\rm d}_S W_k,\quad X(s)=x$$ considered in the Stratonovich sense defines stochastic flow on $H$. Moreover, $$\mathcal{X}(s,t)= \exp\left\{(t-s) B_0 + \sum_{k=1}^N \left(W_k(t)-W_k(s)\right)B_k\right\}.$$* ## Proof of Theorem Write $W_0(t)\equiv t$. Then can be written as follows $${\rm d}Y= \sum_{k= 0}^{+\infty} B_kY{\rm d}W_k, \qquad Y(s)=x.$$ Let $$\Theta:= \bigcup_{n=1}^{+\infty} \{0,1, \ldots\}^n.$$ Given $\alpha \in \{0,1,\ldots\}^n$ we set $\vert \alpha \vert =n$ and $B^\alpha := B_{\alpha _1}B_{\alpha_2}\ldots B_{\alpha_n}$. Let us fix an $s\ge 0$. We define by induction the iterated stochastic integrals. If $\vert \alpha \vert = 1$ and $\alpha = (k):= k$, $k=0,1,\ldots$, then $I_k(t)=W_k(t)-W_k(s)$. For $\alpha \in \{0,1,\ldots\}^n$ we set $$I_\alpha(s,t):= \int_s^t I_{\widehat{\alpha }}(s,r){\rm d}W_{\alpha_1}(r),$$ where ${\widehat{\alpha }}:=(\alpha _2,\ldots, \alpha_n)$. Then the solution $Y^x_s$ to is given by $$Y^x_s(t) = x+ \sum_{n=1}^{+\infty} \sum_{\alpha\colon \vert \alpha \vert =n}^{+\infty} I_\alpha(s,t)B^\alpha x.$$ Therefore the corresponding flow $\mathcal{Y}$ exists and $$\mathcal{Y}(s,t;\omega)= I+ \sum_{n=1}^{+\infty} \sum_{\alpha\colon \vert \alpha \vert =n}^{+\infty} I_\alpha(s,t;\omega)B^\alpha$$ provided the series $$\sum_{n=1}^{+\infty} \sum_{\alpha\colon \vert \alpha \vert =n}^{+\infty} I_\alpha(s,t;\omega)B^\alpha$$ converges $\mathbb{P}$-a.s in $L(H,H)$. We have $$\sum_{n=1}^{+\infty} \sum_{\alpha\colon \vert \alpha \vert =n}^{+\infty} \\| I_\alpha(s,t;\omega)B^\alpha \\|\le \xi(s, t;\omega),$$ where $$\xi(s,t;\omega) := \sum_{n=1}^{+\infty} \sum_{\vert \alpha \vert =n}^{+\infty} \left\vert I_\alpha (s, t;\omega)\right\vert \, \\|B^\alpha\\|.$$ Therefore the stochastic flow exists if $\xi(s,t)<+\infty$, $\mathbb{P}$-a.s. 
Moreover, $\left\\|\mathcal{Y}(s,t;\omega)\right\\| \le 1+ \xi(s,t;\omega)$. In fact we will show that for any $T\in (0,+\infty)$ and $L=1,2,3, \ldots$, we have $$\label{E26} \sup_{0\le s \le t\le T} \mathbb{E}\, \xi^{2L}(s,t) < +\infty,$$ which obviously guarantees . To this end note that for $\alpha \colon \vert \alpha \vert=n \ge 1$ we have $$\mathbb{E} \left\vert I_\alpha (s,t)\right\vert ^{2L}\le C_L^{2Ln} \frac {\max\{ (1, (t-s)^{2Ln}\}}{n!}(t-s)^{L-1}=:\eta_{n,L}(s,t),$$ where $C_L:= \frac{2L}{2L-1}$. Let $\alpha^1, \alpha^2,\ldots, \alpha^{2L} \in \Theta$. By the Schwarz inequality we have $$\mathbb{E} \prod_{i=1}^{2L} \left\vert I_{\alpha^i} (s,t) \right\vert\le \prod_{i=1}^{2L} \left( \mathbb{E} \left\vert I_{\alpha^i}(s,t)\right\vert^{2L}\right)^{1/(2L)}.$$ Therefore $$\begin{aligned} \mathbb{E}\, \xi^{2L}(s,t)& = \mathbb{E}\left( \sum_{n=1}^{+\infty} \sum_{\alpha\colon \vert \alpha \vert =n}^{+\infty} \left\vert I_\alpha(s,t)\right\vert \\| B^\alpha\\|\right)^{2L}\\ &\le \sum_{n_1, \ldots n_{2L}=1}^{+\infty} \sum_{\alpha^i \colon \vert \alpha^i \vert =n_i} \prod _{i=1}^{2L} \\|B^{\alpha ^i} \\|\, \mathbb{E} \prod_{i=1}^{2L} \left\vert I_{\alpha^i}(s,t)\right\vert\\ &\le \left( \sum_{n=1}^{+\infty} \sum_{\alpha \colon \vert \alpha \vert =n}^{+\infty} \\|B^\alpha \\| \left( \mathbb{E} \left\vert I_{\alpha}(s,t)\right\vert^{2L}\right)^{1/(2L)}\right)^{2L} \\ &\le \left( \sum_{n=1}^{+\infty} \left(\eta _{n,L}(s,t)\right)^{1/(2L)} \sum_{\alpha \colon \vert \alpha \vert =n}^{+\infty} \\|B^\alpha \\| \right)^{2L} \end{aligned}$$ Since $$\begin{aligned} \sum_{\alpha \colon \vert \alpha \vert =n}^{+\infty} \\|B^\alpha \\| &\le \sum_{\alpha \colon \vert \alpha \vert =n}^{+\infty} \\|B_{\alpha_1}\\|\\|B_{\alpha _2}\\|\ldots \\|B_{\alpha _n}\\| \\ &\le \left( \sum_{k=1}^{+\infty} \\|B_k\\|\right)^n=:M^n, \end{aligned}$$ we have $$\begin{aligned} \left( \mathbb{E}\, \xi^{2L}(s,t)\right)^{1/(2L)} &\le \sum_{n=1}^{+\infty} M^n \left(\eta _{n,L}(s,t)\right)^{1/(2L)}\\ 
&\le \sum_{n=1}^{+\infty} M^n C_L^{n} \left( \frac {\max\{ (1, (t-s)^{2Ln}\}}{n!}\right)^{1/(2L)} (t-s)^{\frac 12 -\frac{1}{2L}} <+\infty, \end{aligned}$$ which gives . We will show using calculations from the proof of . Namely, we have $$\begin{aligned} &\mathbb{E}\left \\| \mathcal{Y}(s,t)- \mathcal{Y}(s,t+h)\right\\|^{2L} = \mathbb{E}\left\\| \sum_{n=1}^{+\infty} \sum_{\alpha\colon \vert \alpha \vert =n}^{+\infty} \left[ I_\alpha(s,t)- I_\alpha (s,t+h)\right] B^\alpha \right\\|^{2L} \\ &\le \left( \sum_{n=1}^{+\infty} \sum_{\alpha\colon \vert \alpha \vert =n}^{+\infty} \\|B^\alpha \\| \left( \mathbb{E} \left\vert I_\alpha(s,t)- I_\alpha (s,t+h)\right\vert ^{2L}\right)^{1/(2L)}\right)^{2L}. \end{aligned}$$ Since $$\mathbb{E} \left\vert I_\alpha(s,t)- I_\alpha (s,t+h)\right\vert^{2L} = \mathbb{E} \left\vert I_\alpha(0,h)\right\vert^{2L},$$ and holds, for $0\le s\le t\le u\le T$ we have $$\begin{aligned} \mathbb{E}\left \\| \mathcal{Y}(s,t)- \mathcal{Y}(s,u)\right\\|^{2L} &= \mathbb{E}\left \\| \mathcal{Y}(0,0)- \mathcal{Y}(0,u-t)\right\\|^{2L}\\ &= \mathbb{E}\, \xi^{2L}(0,u-t)\\ &\le \left( \sum_{n=1}^{+\infty} \left(MC_L\right)^{n}\left( \frac{\max\{1, T^{2Ln}\}}{n!}\right)^{1/(2L)} \right)^{2L} (u-t)^{L-1}. \qquad \square \end{aligned}$$ We are showing now . We can assume that $0\le s \le u$. Then there are three cases $(i)\colon s\le t\le u\le v$, $(ii)\colon s\le u\le t\le v$ and $(iii) \colon s\le u\le v\le t$. The first case follows from by elementary calculations. For, we have $$\begin{aligned} \mathbb{E}\, \\|\mathcal{Y}(s,t)-\mathcal{Y}(u,v)\\|^{2L} &\le 2^{2L}\left( \mathbb{E}\, \\|\mathcal{Y}(s,t)\\|^{2L} + \mathbb{E}\, \\|\mathcal{Y}(u,v)\\|^{2L}\right)\\ &\le C\left( \vert t-s\vert ^{L-1}+ \vert v-u\vert ^{L-1}\right)\\ &\le C \left( \vert t-s\vert + \vert v-u\vert \right)^{L-1}= C \left( t-s + v-u \right)^{L-1}\\ &\le C \left( v-t + u-s \right)^{L-1}= C \left( \vert t-v\vert + \vert s-u \vert \right)^{L-1}. 
\end{aligned}$$ The cases $(ii)$ and $(iii)$ follow also from , , and the flow property of $\mathcal Y$. Indeed consider $(ii)$. Then $$\begin{aligned} \\|\mathcal{Y}(s,t)-\mathcal{Y}(u,v)\\|&= \\|\mathcal{Y}(u,t)\circ \mathcal{Y}(s,u)-\mathcal{Y}(t,v)\circ \mathcal{Y}(u,t)\\|\\ &=\left\\|\left( \mathcal{Y}(u,t) \circ (\mathcal{Y}(s,u)- I\right) - \left( \mathcal{Y}(t,v)-I\right) \circ \mathcal{Y}(u,t) \right\\|\\ &\le \left\\|\mathcal{Y}(u,t)\right\\| \left\\|\mathcal{Y}(s,u)- \mathcal{Y}(s,s)\right\\| + \left\\| \mathcal{Y}(t,v)-\mathcal{Y}(t,t)\right\\| \left\\|\mathcal{Y}(u,t)\right\\| . \end{aligned}$$ Therefore applying the Schwarz inequality, and and for $4L$, we can find a constant $C$ such that $$\begin{aligned} \mathbb{E}\, \\|\mathcal{Y}(s,t)-\mathcal{Y}(u,v)\\|^{2L} &\le C\left( \|s-u\|^{2L-1} + \|t-v\|^{2L-1}\right)^{1/2}\le C'\left( \|s-u\| + \|t-v\|\right)^{L-1}. \end{aligned}$$ Finally in the last case (case $(iii)$), we have $$\begin{aligned} \\|\mathcal{Y}(s,t)-\mathcal{Y}(u,v)\\|&= \\|\mathcal{Y}(v,t)\circ \mathcal{Y}(u,v)\circ \mathcal{Y}(s,u)-\mathcal{Y}(u,v)\\|\\ &=\left\\| \mathcal{Y}(v,t)\circ \mathcal{Y}(u,v)\circ \left( \mathcal{Y}(s,u)- I\right)+ \left( \mathcal{Y}(v,t)-I\right)\circ \mathcal{Y}(u,v) \right\\|\\ &\le \left\\|\mathcal{Y}(v,t)\right\\| \left\\|\mathcal{Y}(u,v)\right\\| \left\\|\mathcal{Y}(s,u)- \mathcal{Y}(s,s)\right\\| + \left\\| \mathcal{Y}(v,t)-\mathcal{Y}(v,v)\right\\| \left\\|\mathcal{Y}(u,v)\right\\|. \end{aligned}$$ Therefore applying the Schwarz inequality, and and for $4L$ and $8L$, respectively, we can find a constant $C$ such that $$\begin{aligned} \mathbb{E}\, \\|\mathcal{Y}(s,t)-\mathcal{Y}(u,v)\\|^{2L} &\le C\left( \|s-u\|^{2L-1} + \|t-v\|^{2L-1}\right)^{1/2}\le C'\left( \|s-u\| + \|t-v\|\right)^{L-1}. \end{aligned}$$ $\square$. ## Proof of Theorem From the first part we can easily deduce the invertibility of $\mathcal{Y}(s,t)$. For, let us fix $0\le s\le t<+\infty$. 
Consider, the partition $s=t^n_0< t^n_1<\ldots t_n^n=t$ such that $t^n_{k+1}-t^n_k = \frac{t-s}{n}$. From the Hölder continuity of $\mathcal{Y}$ it follows that with probability $1$, for any $\omega\in \Omega$, there is an $n(\omega)$ such that $$\left\\| \mathcal{Y}(t^n_k,t^{n}_{k+1};\omega)- \mathcal{Y}(t^n_{k}, t^n_k; \omega)\right\\|\le \frac 12.$$ Hence, since $\mathcal{Y}(t^n_{k}, t^n_k; \omega)=I$, each $\mathcal{Y}(t^n_k,t^{n}_{k+1};\omega)$ is invertible. Since $$\mathcal{Y}(s,t;\omega)= \mathcal{Y}(t^n_{n-1},t^{n}_n;\omega)\circ \mathcal{Y}(t^n_{n-2},t^{n}_{n-1};\omega)\circ \ldots \circ \mathcal{Y}(t^n_{0},t^{n}_{1};\omega),$$ the invertibility of $\mathcal{Y}(s,t;\omega)$ follows. From the first part it also follows that defines stochastic flow $\mathcal{Z}$. We will first show that $\mathcal{Z}^\star(s,t) \mathcal{Y}(s,t)=I$. To this end note that for all $z,y\in H$, and $0\le s \le t$, we have $$\begin{aligned} \langle \mathcal{Z}^\star(s,t) \mathcal{Y}(s,t)y,z\rangle _H&= \langle \mathcal{Y}(s,t)y, \mathcal{Z}(s,t)z\rangle_H = \langle Y^y_s(t), Z^z_s(t)\rangle_H. \end{aligned}$$ Next $$\begin{aligned} {\rm d}\langle Y^y_s(t), Z^z_s(t)\rangle_H&= \langle B_0 Y^y_s(t), Z^z_s(t)\rangle _H{\rm d}t\\ &\quad + \langle Y^y_s(t), (- B_0^\star +\tilde B^\star)Z^z_s(t)\rangle _H{\rm d}t -\sum_k \langle B_kY^y_s(t), B_k^\star Z^z_s(t)\rangle _H{\rm d}t \\ &\quad + \sum_{k} \langle B_kY^y_s(t), Z^z_s(t)\rangle _H{\rm d}W_k(t) -\sum_{k}\langle Y^y_s(t), B^\star_k Z^z_s(t)\rangle _H {\rm d}W_k(t)\\ &=0. \end{aligned}$$ Hence for all $y,z\in H$ and $0\le s\le t$, $$\langle \mathcal{Z}^\star(s,t) \mathcal{Y}(s,t)y,z\rangle _H= \langle y,z\rangle _H,$$ and therefore $\mathcal{Z}^\star (s,t)\mathcal{Y} (s,t)=I$, $\mathbb{P}$-a.s. Since $\mathcal{Y}(s,t)$ is invertible, $\mathcal{Z}^\star (s,t)= \left(\mathcal{Y}(s,t)\right)^{-1}$ as required. 
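The role of the correction term $\tilde B$ in the equation for $\mathcal{Z}$ can be checked numerically in a simple special case. The sketch below is an illustration under added assumptions, not part of the argument: $H=\mathbb{R}^d$, and the $B_k$ commute because they are built as $P\,\mathrm{diag}(b_k)P^{-1}$ with a common, hypothetical matrix `P`. In this commuting case both flows have the usual Itô exponential closed forms, and the code verifies $\mathcal{Z}^\star(s,t)\mathcal{Y}(s,t)=I$ for one sampled set of increments.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 4, 3      # dimension of H = R^d and number of noises (illustrative values)
tau = 0.7        # elapsed time t - s

# Commuting but non-symmetric matrices B_k = P diag(b[k]) P^{-1}, k = 0, ..., N.
P = np.eye(d) + 0.1 * rng.normal(size=(d, d))   # hypothetical change of basis
P_inv = np.linalg.inv(P)
b = rng.normal(size=(N + 1, d))                 # b[0] is the spectrum of the drift B_0
dW = np.sqrt(tau) * rng.normal(size=N)          # increments W_k(t) - W_k(s)

noise = dW @ b[1:]                  # spectrum of sum_k B_k (W_k(t) - W_k(s))
sq = (b[1:] ** 2).sum(axis=0)       # spectrum of B~ = sum_k B_k^2

# Ito closed forms for commuting coefficients:
#   Y = exp{(B_0 - B~/2) tau + sum_k B_k dW_k}
#   Z = exp{(-B_0* + B~* - B~*/2) tau - sum_k B_k* dW_k}
y_spec = (b[0] - 0.5 * sq) * tau + noise
z_spec = (-b[0] + sq - 0.5 * sq) * tau - noise

Y = (P * np.exp(y_spec)) @ P_inv        # P diag(e^y) P^{-1}
Z = (P_inv.T * np.exp(z_spec)) @ P.T    # adjoint side: P^{-T} diag(e^z) P^T

residual = np.abs(Z.T @ Y - np.eye(d)).max()
print(f"max |Z* Y - I| = {residual:.2e}")
```

Dropping the `+ sq` term in `z_spec` (that is, omitting $\tilde B^\star$ from the drift) makes the residual visibly nonzero, which is one way to see why the drift of the equation for $\mathcal{Z}$ is $-B_0^\star+\tilde B^\star$ rather than $-B_0^\star$.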
$\square$ ## Proof of Theorem Recall that $(A,D(A))$ is the generator of a $C_0$-semigroup ${\rm e}^{tA}$, $t\ge 0$, on a Hilbert space $H$. Let $A_\lambda= \lambda A(\lambda I -A)^{-1}$ be the Yosida approximation of $A$. Consider the following approximation of , $$\label{E27} {\rm d}X=A_\lambda X{\rm d}t+\sum_{k} B_kX{\rm d}W_k,\quad X(s)=x\in H.$$ Let $X^{x,\lambda}_s$ be the solution to . Recall that $X^x_s$ is the solution to . It is easy to show that $$\label{E28} \lim_{\lambda\to +\infty}\mathbb{E}\left\vert X^{x,\lambda}_s(t)-X^x_s(t)\right\vert ^2_H=0, \qquad \forall\, t\ge s.$$ Define $B_0 =\frac 12 \sum_{k=1}^{+\infty} B_k^2$. Consider the following linear stochastic differential equation. $$\label{E29} {\rm d}Y= B_0Y {\rm d}t+\sum_{k} B_kY{\rm d}W_k,\qquad Y(s)=x\in H.$$ From the first part of the theorem we know that and define stochastic flows, $\mathcal{X}_\lambda$ and $\mathcal{Y}$, respectively. Consider the following equation with random coefficients $$\label{E210} {\rm d}\mathcal{G}(t)= \mathcal{Y}(s,t)^{-1}\left(A_\lambda - B_0 \right)\mathcal{Y}(s,t)\mathcal{G}(t){\rm d}t, \quad \mathcal{G}(s)=I.$$ Taking into account Hölder continuity of $\mathcal{Y}$ one can see that the solution $\mathcal{G}_\lambda$ exists and $$\mathcal{G}_\lambda(t,s)= \exp\left\{\int_s^t \mathcal{Y}(s,r)^{-1}\left(A_\lambda - B_0 \right)\mathcal{Y}(s,r){\rm d}r\right\}.$$ Moreover, see , there is a strongly continuous two parameter evolution system $\mathcal{G}(s,t)$ of bounded linear operators on $H$, such that for any $x\in H$, $$\label{E211} \lim_{\lambda \to +\infty} \mathcal{G}_\lambda(s,t)x= \mathcal{G}(s,t)x, \qquad \mathbb{P}-a.s.$$ We will show that defines stochastic flow and $\mathcal {X}(s, t)= \mathcal{Y}(s,t) \mathcal{G}(s,t)$. Taking into account and it is enough to show that $X^{x,\lambda}_s(t)= \mathcal{Y}(s,t) \mathcal{G}_\lambda (s,t)x$, $t\ge s$, that is that $\mathcal{Y}(s,t) \mathcal{G}_\lambda (s,t)x$, $t\ge s$, solves . 
Clearly $\mathcal{Y}(s,s) \mathcal{G}_\lambda (s,s)x=x$. Next, for any $y\in H$ we have $${\rm d}\mathcal{Y}^\star(s,t)y= \mathcal{Y}^\star (s,t)B_0^\star y {\rm d}t + \sum_{k=1}^{+\infty} \mathcal{Y}^\star (s,t)B_k^\star y{\rm d}W_k(t).$$ Therefore $$\begin{aligned} {\rm d}\left \langle \mathcal{G}_\lambda (s,t)x, \mathcal{Y}^\star(s,t) y\right \rangle_H &= \left\langle \mathcal{Y}(s,t)^{-1}\left(A_\lambda - B_0 \right)\mathcal{Y}(s,t)\mathcal{G}_\lambda (s, t)x, \mathcal{Y}^\star(s,t) y\right \rangle_H {\rm d}t\\ &\quad + \left\langle \mathcal{G}_\lambda (s,t)x, \mathcal{Y}^\star (s,t)B_0^\star y \right\rangle _H {\rm d}t\\ &\quad + \sum_{k=1}^{+\infty} \left\langle \mathcal{G}_\lambda (s,t)x, \mathcal{Y}^\star (s,t)B_k^\star y\right\rangle _H{\rm d}W_k(t), \end{aligned}$$ and consequently we have the desired conclusion $${\rm d}\mathcal{Y}(s,t) \mathcal{G}_\lambda (s,t)x= A_\lambda \mathcal{Y}(s,t) \mathcal{G}_\lambda (s,t)x{\rm d}t + \sum_{k=1}^{+\infty} B_k \mathcal{Y}(s,t) \mathcal{G}_\lambda(s,t)x{\rm d}W_k (t). \qquad \square$$

## Proof of Theorem

This part is well known. We present it only for completeness.

Remark 7. The trick used in the proof of Theorem is well known in the finite-dimensional case and in some infinite-dimensional cases, and is known as the Doss–Sussmann transformation. There are two important and interesting questions: $(i)$ whether the flow $\mathcal{X}$ defined by is a Hölder continuous $L(H,H)$-valued mapping of $(s,t)\in \Delta$, and $(ii)$ whether the flow is invertible. Clearly, see the proof of Theorem (ii), $(i)$ implies $(ii)$. Unfortunately, if $A$ is unbounded it is probably impossible that the flow is continuous in the operator norm. Let us recall that the semigroup ${\rm e}^{tA}$ generated by $A$ is continuous in the operator norm if and only if $A$ is bounded. In the case when the operators ${\rm e}^{tA}$, $t>0$, are compact it is however possible that ${\rm e}^{tA}$ is continuous in the operator norm for $t>0$.
Therefore, we can expect that in some cases the flow is continuous in the operator norm on the open set $0<s<t<+\infty$, see Section . Finally, note that if the flow $\mathcal{X}$ is invertible, then formally $\mathcal{Z}= \left(\mathcal{X}^{-1}\right)^\star$ is the flow for the equation $${\rm d}Z= \left(-A^\star +\sum_{k} \left(B_k^2\right)^\star\right)Z{\rm d}t-\sum_{k} B_k^\star Z{\rm d}W_k.$$ Unfortunately, $-A^\star$ generates a $C_0$-semigroup only in the case when $A$ generates a group, and therefore the equation is often ill-posed.

Example 8. This example is an extension of a model that is important in the theory of random evolution of spins in ferromagnetic materials, see . Let $\mathcal{O}\subset\mathbb R^3$ be a bounded domain. For a sequence $g_k\in C\left(\mathcal{O},\mathbb R^3\right)$ we define operators $$B_k\colon L^2\left(\mathcal{O};\mathbb R^3\right)\to L^2\left(\mathcal{O};\mathbb R^3\right),\quad B_kx(\xi)=x(\xi)\times g_k\left(\xi\right),\quad \xi\in\mathcal{O}\,,$$ where $\times$ stands for the vector product in $\mathbb R^3$. It is easy to see that each $B_k$ is skew-symmetric, that is $B_k^\ast=-B_k$. Assume that $\sum_k \\|g_k\\|_{\infty}<+\infty$, hence $\sum_{k} \\|B_k\\|<+\infty$. Then, by Theorem , the stochastic differential equation $${\rm d}Y=\sum_{k=1}^{+\infty}B_kY{\rm d}W_k$$ defines a stochastic flow ${\mathcal Y}$ on $\mathbb L^2$ and, by Theorem , $\mathcal{Z}(s,t) = \left(\mathcal{Y}^{-1}(s,t)\right)^\star$ is the flow defined by the equation $${\rm d}Z= \tilde B Z{\rm d}t + \sum_{k} B_k Z {\rm d}W_k,$$ where $\tilde B= \sum_{k\ge 1}B_k^2$.

# Nonlinear case

The method based on the Doss–Sussmann transformation can be generalized to some non-linear equations, see . Assume that $B_0,\ldots, B_N \in C^1(\mathbb{R})$ is a finite sequence of functions with bounded derivatives. Let $\mathcal{O}$ be an open subset of $\mathbb{R}^d$ and let $(A,D(A))$ be the generator of a $C_0$-semigroup on $H=L^2(\mathcal{O})$.
Consider the following SPDE and SDE in the Stratonovich sense $$\label{EN1} {\rm d}X= \left[ AX+ B_0(X)\right]{\rm d}t +\sum_{k=1}^N B_k(X){\rm d}_S W_k,\qquad X(s)=x\in L^2(\mathcal{O}),$$ $$\label{EN2} {\rm d}y= B_0(y){\rm d}t +\sum_{k=1}^N B_k(y){\rm d}_S W_k(t),\qquad y(s)=\xi\in \mathbb{R}.$$ By the classical theory of SDEs, see e.g. , defines stochastic flow $\eta(s,t)$, $0\le s\le t$, of diffeomorphism of $\mathbb{R}$. Consider now the following stochastic evolution equation on the Hilbert space $L^2(\mathcal{O})$, $$\label{EN3} {\rm d}Y= B_0(Y){\rm d}t +\sum_{k=1}^N B_k(Y){\rm d}_S W_k,\qquad Y(s)=x\in L^2(\mathcal{O}).$$ Then defines stochastic flow $\mathcal{Y}$ on $L^2(\mathcal{O})$ and $$\mathcal{Y}(s,t)(x)(\xi)= \eta(s,t)(x(\xi)), \qquad x\in L^2(\mathcal{O}), \quad \xi\in \mathcal{O}.$$ Let $G_s^x(t)$ be the solution to the evolution equation with random coefficients $$\frac{{\rm d}}{{\rm d}t} G_s^x(t)= \mathcal{U}(s,t ,G_s^x(t)), \qquad G_s^x(s)=x,$$ where $$\mathcal{U}(s,t,y)= \left[ D\mathcal{Y}(s,t)(y)\right]^{-1} A \mathcal{Y}(s,t)(y).$$ Above, $D\mathcal{Y}(s,t)(y)$ is the derivative with respect to initial condition $y$. Clearly we need to assume that $\mathcal{Y}(s,t)(G_s^x(t))$ is in the domain of $A$, and $D\mathcal{Y}(s,t)(G_s^x(t))$ is an invertible operator. Then by the Itô–Vencel formula $$X^x_s(t)= \mathcal{Y}(s,t)(G^x_s(t)).$$ In , this method was applied to equations of the type $${\rm d}X= \mathcal{A}(X){\rm d}t + \sum_{k=1}^N B_k X{\rm d}_S W_k,$$ where $\mathcal{A}$ is a monotone operator, and $(B_k)$ are first order differential linear operators. # Diagonal and commutative case In this section we will first consider an extension of the Skorokhod example . Let $(e_k)$ be an orthonormal basis of an infinite-dimensional Hilbert space $H$ and let $(\sigma_k)$ and $(\alpha_k)$ be sequences of real numbers. We assume that $\sigma_k\ge 0$. 
For every $k\ge 1$ we define bounded linear operators $B_k = \sigma _k e_k\otimes e_k$, and a possibly unbounded operator $$A= \sum_{j=1}^{+\infty} \alpha_j e_j\otimes e_j$$ with the domain $$D(A)=\left\{ x\in H\colon \sum_{j=1}^{+\infty} \alpha _j^2 \langle x,e_j\rangle ^2_H<+\infty\right\}.$$ Let us recall that in the Skorokhod example $\alpha_k=0$, $k=1,2,\ldots$, and $\sigma_k=\sigma$ does not depend on $k$.

Proposition 9. * Consider with $A, B_1, \ldots$ as above. Then the following holds. $(i)$ For each initial value $x\in H$, has a square integrable solution $X^x_s$ in $H$ if and only if $$\label{E31} \sup_{k} \left(2\alpha _k + \sigma_k^2\right)<+\infty.$$ $(ii)$ Assume . Then defines a stochastic flow $\mathcal{X}$ if and only if $$\label{E32} \rho(s,t):= \sup_{k} \left\{\left[ \alpha _k -\frac{ \sigma_k^2}{2}\right] \sqrt{t-s} + \sigma_k \sqrt{2\log k}\right\}<+\infty, \quad \forall\, 0\le s<t.$$*

Proof. Define $$\zeta_k(s,t;\omega):= \exp\left\{ \sigma_k\left(W_k(t;\omega)-W_k(s;\omega)\right)+ \left(\alpha _k -\frac{\sigma_k^2}{2}\right)(t-s)\right\}.$$ Clearly the random variables $\zeta_k(s,t)$, $k=1,2,\ldots$, are independent and $$\label{E33} \mathbb{E}\, \zeta_k(s,t)= {\rm e}^{\alpha_k(t-s)}\qquad\text{and}\qquad \mathbb{E}\, \zeta^2_k(s,t)= {\rm e}^{(2\alpha_k+\sigma_k^2)(t-s)}.$$ Consider a finite-dimensional subspace $V=\text{linspan}\{e_{i_1},\ldots, e_{i_M}\}$ of $H$. Then for any $x\in V$, the solution exists and $$\label{E34} X^x_s(t)= \sum_{k=1}^{+\infty} \zeta_k(s,t)\langle x,e_k\rangle_H e_k.$$ Next, note that for any $x\in H$, if $X^x_s$ is a solution, then it has to be of the form . Indeed, if $\Pi\colon H\to \text{linspan}\{e_{i_1},\ldots, e_{i_M}\}$ is a projection then $\Pi X^x_s= X^{\Pi x}_s$. Therefore, the first claim follows from .
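The two moment identities for $\zeta_k(s,t)$ used above can be sanity-checked numerically. The following sketch is only an illustration: it evaluates the Gaussian expectations with Gauss–Hermite quadrature (NumPy's probabilists' rule `hermegauss`), and the values of `sigma`, `alpha` and `tau` $=t-s$ are arbitrary choices, not taken from the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def zeta_moment(power, sigma, alpha, tau, nodes=60):
    """E[zeta^power] for zeta = exp(sigma*sqrt(tau)*Z + (alpha - sigma^2/2)*tau), Z ~ N(0,1)."""
    x, w = hermegauss(nodes)                 # quadrature rule for the weight exp(-x^2/2)
    log_zeta = sigma * np.sqrt(tau) * x + (alpha - 0.5 * sigma**2) * tau
    return float(np.sum(w * np.exp(power * log_zeta)) / np.sqrt(2.0 * np.pi))

sigma, alpha, tau = 1.2, -0.3, 0.5           # illustrative values only
m1 = zeta_moment(1, sigma, alpha, tau)
m2 = zeta_moment(2, sigma, alpha, tau)

print(m1, np.exp(alpha * tau))                      # first moment vs e^{alpha (t-s)}
print(m2, np.exp((2 * alpha + sigma**2) * tau))     # second moment vs e^{(2 alpha + sigma^2)(t-s)}
```

The second moment grows like ${\rm e}^{\sigma_k^2(t-s)}$, so condition $(i)$ of the proposition is exactly the requirement that these second moments stay bounded in $k$.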
Next, defines a stochastic flow $\mathcal X$ if and only if $$\mathbb{P}\left\{ \sup_{\|x\|_H\le 1} \left\vert X^x_s(t)\right\vert_H<+\infty\right\} = 1.$$ Clearly, $$\begin{aligned} &\mathbb{P}\left\{ \sup_{\|x\|_H\le 1} \left\vert X^x_s(t)\right\vert_H<+\infty\right\} = \mathbb{P}\left\{ \sup_k \zeta_k(s,t)<+\infty\right\} \\&\qquad = \mathbb{P}\left\{ \sup_{k} \left[ \sigma_k\left(W_k(t)-W_k(s)\right)+ \left(\alpha _k -\frac{\sigma_k^2}{2}\right)(t-s) \right]<+\infty\right\}. \end{aligned}$$ Therefore the desired conclusion follows from the fact that if $(Z_k)$ is a sequence of independent $\mathcal{N}(0,1)$ random variables, then $$\limsup_{k\to+\infty} \frac{Z_k}{\sqrt{2\log k}}=1,\qquad \mathbb{P}-a.s.$$ ◻

Remark 10. Assume and . Let $\mathcal{X}$ be the stochastic flow. Then $$\mathbb{E}\, \text{Tr} \, \mathcal{X}(s,t)= \sum_{k=1}^{+\infty}\mathbb{E}\, \zeta_k(s,t)= \sum_{k=1}^{+\infty} {\rm e}^{\alpha_k(t-s)}.$$ Note that and $\sum_{k=1}^{+\infty} {\rm e}^{\alpha_k(t-s)}<+\infty$ imply .

## Beyond integrability

As above, $(\sigma_k)$ is a sequence of nonnegative real numbers. Assume now that $\alpha_k=0$. Thus $$\zeta_k(s,t)= \exp\left\{ -\frac{\sigma^2_k}{2}(t-s)+ \sigma_k \left(W_k(t)-W_k(s)\right)\right\}.$$ Let us assume that $$\label{E35} \rho(s,t):= \sup_{k} \left\{-\frac{ \sigma_k^2}{2} \sqrt{t-s} + \sigma_k \sqrt{2\log k}\right\}<+\infty, \quad \forall\, 0\le s<t.$$ We do not assume however . Then $$\mathcal{X}(s,t;\omega)(x)= \sum_{k=1}^{+\infty}\zeta_k(s,t;\omega)\langle x,e_k\rangle _He_k, \qquad 0\le s\le t, \ x\in H,$$ is well defined, and since $\sup_{k} \zeta_k(s,t)<+\infty$, $\mathcal{X} \colon \Delta\times \Omega \to L(H,H)$. Obviously $\mathcal{X}(s,t)$ is a symmetric positive definite operator with eigenvectors $(e_k)$ and eigenvalues $(\zeta_k(s,t))$. Note that it can happen that $\mathbb{E}\left\vert \mathcal{X}(s,t)(x)\right\vert ^2_H=+\infty$.
Moreover, if $\alpha_k=0$ then necessarily $\mathbb E\,\mathrm{Tr}\left(\mathcal{X}(s,t)\right)=+\infty$.

Proposition 11. * Assume . Then the following holds:*

- If $l\in[0,+\infty]$ is an accumulation point of the sequence $(\sigma_k)$, then either $l=0$ or $l=+\infty$.
- If $\sigma_k\to 0$ then $\sup_k \sigma_k\sqrt{\log k} <+\infty$. In particular, $\mathbb{P}$-a.s. the sequence $\left(\zeta_k(s,t)\right)$ of eigenvalues of $\mathcal{X}(s,t)$ has accumulation points different from zero, hence $\mathcal{X}(s,t)$ is not compact $\mathbb P$-a.s.
- If $\sigma_k\to+\infty$ then $\frac{\sigma_k}{\sqrt{\log k}}\to+\infty$ and $\mathcal{X}(s,t)$ is trace class $\mathbb{P}$-a.s.

Proof. Statement $(i)$ is obvious as $\sigma_k\ge 0$. The first part of $(ii)$ is obvious. Assume that $\sigma_k\to 0$ and that $\sup_{k}\sigma_k\sqrt{\log k}<+\infty$. We have to show that $\mathbb{P}$-a.s. the sequence $$\exp\left\{ \sigma_k\left(W_k(t)-W_k(s)\right)\right\}, \qquad k=1,\ldots$$ has accumulation points different from zero, or equivalently that $$\liminf_{k\to +\infty} \sigma_k\left(W_k(t)-W_k(s)\right)>-\infty, \qquad \mathbb{P}-a.s.$$ Since $$\liminf_{k\to +\infty} \frac{W_k(t)-W_k(s)}{\sqrt{2 \log k}}= -\sqrt{t-s}, \qquad \mathbb{P}-a.s.$$ we have $$\liminf_{k\to +\infty} \sigma_k \left(W_k(t)-W_k(s)\right)= -\sqrt{t-s} \limsup_{k\to +\infty} \sigma_k \sqrt{2\log k}>-\infty, \quad \mathbb{P}-a.s.$$ We now prove the last statement of the proposition. Condition can be rewritten in the form $$\sup_k\sigma_k^2\left(\frac{\sqrt{2\log k}}{\sigma_k}-\frac{\sqrt{t-s}}{2}\right)<+\infty,\qquad\forall\, 0\le s<t.$$ Since $\sigma_k\to +\infty$, we have $$\limsup_{k\to +\infty} \frac{\sqrt{2\log k}}{\sigma_k}< \frac{\sqrt{t-s}}{2}, \qquad\forall\, 0\le s<t,$$ which leads to the desired conclusion that $\frac{\sigma_k}{\sqrt{\log k}}\to+\infty$.
We will use the Kolmogorov three series theorem to show that $$\text{Tr}\, \mathcal{X}(s,t)=\sum_{k=1}^{+\infty}\zeta_k(s,t)<+\infty,\qquad\mathbb{P}-a.s.$$ Let us fix $s$ and $t$. Define $Y_k=\zeta_k(s,t)\chi_{[0,1]}\left(\zeta_k(s,t)\right)$. We need to show that $$\begin{aligned} \sum_{k=1}^{+\infty}\mathbb{P}\left( \zeta_k(s,t)>1\right)&<+\infty,\label{E36}\\ \sum_{k=1}^{+\infty}\mathbb{E}\, Y_k&<+\infty, \label{E37}\\ \sum_{k=1}^{+\infty}\mathrm{Var}\, Y_k&<+\infty\label{E38} \end{aligned}$$ Denoting by $Z$ a normal $\mathcal{N}(0,1)$ random variable and putting $b=\frac{\sqrt{t-s}}{2}$ we obtain $$\begin{aligned} \sum_{k=1}^{+\infty}\mathbb{P}\left( \zeta_k(s,t)>1\right) &=\sum_{k=1}^{+\infty}\mathbb{P}\left(Z>b\sigma_k\right) \\ &\le c+\dfrac{1}{\sqrt{2\pi}}\sum_{k=2}^{+\infty}\dfrac{1}{b\sigma_k}{\rm e}^{-b^2\sigma_k^2/2}=c+ \dfrac{1}{\sqrt{2\pi}}\sum_{k=2}^{+\infty}\dfrac{1}{b\sigma_k}{\rm e}^{-\delta_k\log k}\\ &\le c+C\sum_{k=2}^{+\infty}\dfrac{1}{k^{\delta_k}} \end{aligned}$$ with $\delta_k=\frac{b^2\sigma_k^2}{2\log k}$. Since $\delta_k\to+\infty$, follows. Let $\left.\frac{{\rm d}\mathbb Q}{{\rm d}\mathbb{P}}\right\|_{\mathfrak {F}_t}=\zeta_k(s,t)$, where $\mathfrak{F}_t=\sigma\left(W_k(r), k=1,\ldots, r\le t\right)$. Then $W_k^{\mathbb Q}(r):=W_k(r)-\sigma_kr$, $r\le t$, are independent Wiener processes under $\mathbb Q$ and $$\log\zeta_k(s,t)=\frac{1}{2}\sigma_k^2(t-s)+\sigma_k\left(W_k^{\mathbb Q}(t)-W_k^{\mathbb{Q}}(s)\right).$$ Therefore, $$\mathbb{E}\, Y_k =\mathbb{E}\, \zeta_k(s,t)\chi_{[0,1]}\left(\zeta_k(s,t)\right)= {\mathbb Q}\left(\zeta_k(s,t)\le 1 \right) =\mathbb Q\left( Z>\frac{1}{2}\sigma_k\sqrt{t-s}\right)$$ and by the same arguments as above we find that holds as well. Finally, since $$\mathbb{E}\, Y_k^2=\mathbb{E}\, \zeta^2_k(s,t)\chi_{[0,1]}\left(\zeta_k(s, t)\right) \le \mathbb{E}\, Y_k,$$ condition is satisfied and the proof is complete. 
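The gap between almost sure finiteness of the trace and its infinite expectation can be seen in a small simulation. This is a sketch under illustrative assumptions only: the choice $\sigma_k=k$ (so that $\sigma_k/\sqrt{\log k}\to+\infty$), the truncation level `K`, and $t-s=1$ are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
K, tau = 10_000, 1.0                 # truncation level and t - s (illustrative)
k = np.arange(1, K + 1)
sigma = k.astype(float)              # sigma_k = k, so sigma_k / sqrt(log k) -> infinity
Z = rng.normal(size=K)               # Z_k = (W_k(t) - W_k(s)) / sqrt(tau)

# Eigenvalues zeta_k(s,t) with alpha_k = 0.
zeta = np.exp(-0.5 * sigma**2 * tau + sigma * np.sqrt(tau) * Z)

sampled_trace = zeta.sum()           # one sample of the truncated trace
tail = zeta[50:].sum()               # contribution of the indices k > 50
expected_partial_trace = float(K)    # sum of E zeta_k over k <= K, since E zeta_k = 1

print("sampled trace over k <= K :", sampled_trace)
print("contribution of k > 50    :", tail)
print("sum of E zeta_k, k <= K   :", expected_partial_trace)
```

For a typical sample only the first few eigenvalues contribute, whereas the sum of the expectations grows linearly in `K`: the mean of each $\zeta_k$ is carried by increasingly rare, very large values.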
◻ ## The case of commuting operators Let us recall that in the Skorokhod example $A=0$ and $B_j=e_j\otimes e_j$ are commuting operators. We have $$\begin{aligned} X^x_s(t)&= \sum_{j=1}^{+\infty} \exp\left\{ W_j(t)-W_j(s)-\frac{t-s}{2}\right\} e_j\otimes e_j(x)\\ &= \lim_{N\to+ \infty} \sum_{j=1}^N \exp\left\{ W_j(t)-W_j(s)-\frac{t-s}{2}\right\} e_j\otimes e_j(x)\\ &= \lim_{N\to +\infty} \exp\left\{ \sum_{j=1}^N e_j\otimes e_j \left( W_j(t)-W_j(s)-\frac{t-s}{2}\right)\right\} (x). \end{aligned}$$ The convergence is not uniform in $x$. In the last line we calculate the exponent of a bounded operator $$\sum_{j=1}^N e_j\otimes e_j \left( W_j(t)-W_j(s)-\frac{t-s}{2}\right).$$ A proof the following simple generalization of Proposition is left to the reader. Recall that a sequence of bounded operators $S_n$ converges to a bounded operator $S$ strongly if $S_nx\to Sx$ for any $x\in H$. Proposition 12. * Assume that $(B_k)$ is an infinite sequence of bounded commuting operators on a Hilbert space $H$. Then:* - For any $x\in H$ and $s\ge 0$ there is a square integrable solution $X^x_s$ to if and only for all $0\le s\le t$, the sequence $\exp\left\{(t-s)\sum_{k=1}^n B_k^2\right \}$ converges strongly as $n\to +\infty$. Moreover, $$X^x_s(t)= \lim_{n\to +\infty}\exp\left\{ \sum_{k=1}^n B_k \left(W_k(t)-W_k(s)\right)+ B_0(t-s) - \frac{1}{2} \sum_{k=1}^n B_k^2 \left(t-s\right) \right\}x,$$ where the limit is in $L^2(\Omega, \mathfrak{F},\mathbb{P};H)$. - * generates a stochastic flow if and only if for all $0\le s\le t$, $\mathbb{P}$-a.s. $$\exp\left\{ \sum_{k=1}^n B_k \left(W_k(t)-W_k(s)\right) - \frac{1}{2} \sum_{k=1}^n B_k^2 \left(t-s\right) \right\}$$ converges as $n\to+ \infty$ in the operator norm.* ## System of multiplication operators In this section we assume that $$H=L^2(\mathbb{R}^d, \vartheta(\xi){\rm d}\xi),$$ where the weight $\vartheta\colon \mathbb{R}^d\mapsto (0,+\infty)$ is a measurable function. Let $W$ be a Wiener process taking values in $H$. 
Then, see see e.g. , $$\label{E39} W=\sum_{k} W_ke_k\,,$$ where $(W_k)$ are independent real-valued Wiener processes, and $\{e_k\}$ is an orthonormal basis of the Reproducing Hilbert Kernel Space (RHKS for short) of $W$. Consider the equation $$\label{E310} {\rm d}X(t)= X(t) {\rm d}W(t), \qquad X(s)=x,$$ Taking into account we can write in the form $$\label{E311} {\rm d}X(t)= \sum_{k} B_k X{\rm d}W_k(t),\qquad X(s)=x,$$ where $B_k$ are multiplication operators; $B_kh= he_k$, $h\in H$. Clearly a multiplication operator $h\mapsto he$ is bounded on $H$ if and only if $e\in L^\infty(\mathbb{R}^d)$. Note that bounded multiplication operators commute and are symmetric. Proposition 13. * Assume that $e_k\in L^\infty(\mathbb{R}^d)$. Then:* - For any $x\in H$ and $s\ge 0$ there exists a square integrable solution $X^x_s$ to if and only if $\sum_{k=1}^{+\infty} e_k^2 \in L^\infty(\mathbb{R}^d)$. Moreover, $$X^x_s(t)= L^2-\lim_{n\to +\infty} \exp\left\{ \sum_{k=1}^n \left(W_k(t)-W_k(s)\right)e_k - \frac{t-s}{2} \sum_{k=1}^n e_k^2 \right\}x.$$ - Assume that $\sum_{k=1}^{+\infty} e_k^2 \in L^\infty(\mathbb{R}^d)$. Then defines a stochastic flow if and only if $$\mathbb{P}\left\{ \operatorname{ess}\sup\limits_{\xi\in \mathbb{R}^d}\sum_k e_k(\xi)W_k(t)<+\infty\right\}=1,$$ equivalently iff the process $W(t)= \sum_k W_k(t)e_k$ lives in $L^\infty(\mathbb{R}^d)$; that is $$\mathbb{P}\left( W(t)\in L^\infty(\mathbb{R}^d)\right)=1.$$ - If $\sum_{k} \log \sqrt{k} \vert e_k\vert \in L^\infty(\mathbb{R}^d)$, then defines stochastic flow on $H$. Proof. The first part follows from Proposition . The second part is a reformulation of the second part of Proposition . The las part follows from the law of iterated logarithm. ◻ Example 14. Assume that $W=W(t,\xi)$ is a spatially homogeneous Wiener process on $\mathbb{R}^d$, see e.g. . 
Then $W(t)= \sum_{k} e_k W_{k}$, where $e_k =\widehat{f_k\mu}$, $\{f_k\}$ is a orthonormal basis of $L^2_{(s)}(\mathbb{R}^d,\mu)$ and $\mu$ is the spectral measure of $W$. The sum over finite or infinite number of indices $k$. We can assume that $e_k\in L^\infty(\mathbb{R}^d)$ by choosing suitable $f_k$. Then $$\sum_{k} e_k^2 = \sum_{k} \left\vert \widehat {f_k \mu}\right\vert ^2 = \mu(\mathbb{R}^d).$$ Therefore, has a square integrable solution if and only if $\mu$ is finite, that is $W$ is a random field. The condition $\mathbb{P}\left( W(t)\in L^\infty(\mathbb{R}^d)\right)$ wich is if and only if condition for the existence of stochastic flow holds only in some degenerated cases. Namely assume that $$W(t,\xi)= \sum_{k=1}^{+\infty} a_k \left(W_k(t)\cos\langle \xi, \eta_k\rangle + \tilde W_k(t)\sin \langle \xi,\eta_k\rangle\right),$$ where $\{\eta_k\}\subset\mathbb{R}^d$, $(a_k)\in l^2$, and $W_k$ and $\tilde W_k$, $k\in \mathbb{N}$, are independent real-valued Brownian motions. Then $W =W(t,\xi)$ is a spatially homogeneous Wiener process. Taking into account that $$\limsup_{k\to +\infty} \frac{W_k(t)}{\sqrt{2\log k}}= \sqrt{t},$$ we see the equation defines stochastic flow on any weighted $L^2$-space if $\sum_{k} \vert a_k\vert \sqrt{\log k}<+\infty$. Example 15. Assume that $W$ is a Brownian sheet on $[0,L)^{d+1}$ where $L\in [0,+\infty]$. To be more precisely $W(t, \xi_1, \xi_2,\ldots,\xi_d)$ is a Gaussian random field on $[0,L)^{d+1}$ with the covariance $$\mathbb{E}\, W(t,\xi_1,\ldots,\xi_d)W(s,\eta_1,\ldots,\eta_d)= t\wedge s \prod_{k=1}^d \xi_k\wedge \eta_k.$$ Then $$f_k = \frac{\partial ^d}{\partial \xi_1,\ldots\partial \xi_d} e_k$$ is an orthonormal basis of $L^2([0,L)^d)$, respectively. Hence $$\sum_{k} e_k^2(\xi) = \sum_k \langle \chi_{[0,\xi_1]\times \ldots \times [0,\xi_d]}, f_k \rangle ^2 = \xi_1\xi_2\ldots \xi_d.$$ Therefore has a square integrable solution (in $L^2([0,T)^d$) if and only if $L<\infty$. 
Clearly, Brownian sheet has continuous trajectories, and therefore for arbitrary $T<+\infty$ and $L<+\infty$ we have $\mathbb{P}\left\{ \sup_{0\le t<T, \xi\in [0,L)^d} W(t, \xi)<+\infty\right\}=1$. Hence if is considered on a bounded domain the equation defines stochastic flow. Let us observe that the stochastic flow can be also well defined, but not square integrable, if $L=+\infty$. Indeed, the stochastic flow exists, if and only if $$\label{E412} \mathbb{P}\left\{\sup\limits_{\xi\in [0,+\infty)^d}\left( W(t,\xi_1,\ldots,\xi_d)- \frac t 2 \xi_1\xi_2,\ldots \xi_d\right) <+\infty\right\}=1.$$ # Equations in Schatten classes The problem of the existence of the stochastic flow in the finite dimensional case $H=\mathbb{R}^d$ is simple. The flow $\mathcal{X}(s,t)$ takes valued in the space of bounded linear operators $L(H,H)$ that can be identified with the space of $d\times d$ matrices $M(d\times d)$. We have the following SDE for $\mathcal{X}$ in the space $M(d\times d)$; $$\label{E51} {\rm d}\mathcal{X} = A\mathcal{X} {\rm d}t + ({\rm d}\mathcal{W})\mathcal{X}, \qquad \mathcal{X} (s,s)=I,$$ where $\mathcal {W}:= \sum_{k=1}^d B_k W_k$ and $I$ is the identity matrix. By a standard fixed point argument has a unique global solution. In infinite dimensional case, even if $(B_k)$ is a finite sequence of bounded linear operators, the situation is different. There is no proper theory of stochastic integration in the space $L(H,H)$ if $H$ is infinite-dimensional. One can overcome this difficulty by replacing $L(H,H)$ with a smaller space of operators, such as the Hilbert–Schmidt or, more generally, the Schatten classes of operators. 
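In the finite-dimensional case the matrix equation for the flow can also be approximated directly. The following is a minimal sketch, assuming the Itô interpretation of the equation and using a plain Euler–Maruyama scheme; the function name `euler_flow`, the coefficient matrices and the step count are hypothetical choices made only for illustration.

```python
import numpy as np

def euler_flow(A, Bs, s, t, n_steps, rng):
    """Euler-Maruyama approximation of X(s,t) for dX = A X dt + sum_k B_k X dW_k, X(s,s) = I."""
    d = A.shape[0]
    dt = (t - s) / n_steps
    X = np.eye(d)
    for _ in range(n_steps):
        dW = np.sqrt(dt) * rng.normal(size=len(Bs))      # independent Wiener increments
        noise = sum((w * B for w, B in zip(dW, Bs)), np.zeros((d, d)))
        X = X + (A * dt + noise) @ X                     # one explicit Euler step
    return X

rng = np.random.default_rng(2)
A = np.diag([-1.0, 0.5])
Bs = [np.array([[0.0, 0.3], [-0.3, 0.0]]), 0.2 * np.eye(2)]   # illustrative coefficients
X = euler_flow(A, Bs, 0.0, 1.0, 2_000, rng)
print(X)
```

With an empty list `Bs` the scheme reduces to the explicit Euler method for $\dot{\mathcal X}=A\mathcal X$ and approximates ${\rm e}^{(t-s)A}$. No such direct scheme is available in $L(H,H)$ when $H$ is infinite-dimensional, which is the difficulty addressed by passing to the Schatten classes.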
## Main result

Let us recall that for every $p\in[1,+\infty)$ the Schatten class $\mathbb S^p$ of compact operators $K\colon H\to H$ is a Banach space endowed with the norm $$\\|K\\|_p=\left(\sum_{k=1}^{+\infty} \lambda_k\left( K^\star K\right)^{p/2}\right)^{1/p}<+\infty\,,$$ where $\lambda_k(K^\star K)$ stands for the $k$-th eigenvalue of $K^\star K$. For every $p\in[1,+\infty)$ the space $\mathbb S^p$ is a separable Banach space.

Lemma 16. *For every $p\ge 2$ the space $\mathbb S^p$ is an M-type 2 Banach space.*

Proof. By Proposition 5.4.2 in , the space $\mathbb S^p$ is a UMD space for every $p\in(1,+\infty)$. By Proposition 7.1.11 in , $\mathbb S^p$ has type 2 for $p\in[2,+\infty)$, and by Proposition 4.3.13 in , the M-type 2 property follows. ◻

Lemma 16 ensures that if $p\ge 2$, then $\mathbb{S}^p$ is a suitable space for stochastic integration; for more details see e.g. . We only recall here that if $W_k$ are independent Wiener processes, and $\psi_k$ are $\mathbb{S}^p$-valued progressively measurable processes such that $$\mathbb{E} \int_0^T \\|\psi_k(s)\\|^2_p{\rm d}s <+\infty, \qquad k=1,2, \ldots,$$ then for each $k$, the integral $$\int_0^t \psi_k(s){\rm d}W_k(s)$$ is well-defined, has continuous trajectories in $\mathbb{S}^p$, and there exists a universal constant $C$ such that $$\mathbb{E}\left\\|\int_0^T \psi_k (s){\rm d}W_k(s)\right\\| ^2_p \le C\, \mathbb{E} \int_0^T \\|\psi_k(s)\\|^2_p{\rm d}s <+\infty\,.$$ Moreover, $$\mathbb{E} \left\\|\sum_{k=1}^N \int_0^t \psi_k(s) {\rm d}W_k(s)\right\\| ^2_p \le C' \sum_{k=1}^N \mathbb{E} \int_0^T \\|\psi_k(s)\\|^2_p{\rm d}s.$$ The definition of the Wiener process given below uses the fact that $L(H,H)=\left(\mathbb S^1\right)^\star$.

Definition 17. Let $B_k\in L(H,H)$ for $k\ge 1$.
We call $\mathcal{W}=\sum_{k=1}^{+\infty} W_kB_k$ a cylindrical Wiener process on $L(H,H)$ if for every $K\in\mathbb{S}^1$ the process $$W^K(t)=\sum_{k=1}^{+\infty}\mathrm{Tr}\left( KB_k\right) W_k(t)\,,$$ is a real-valued Wiener process, and there is a constant $C$ independent of $K$ such that $$\label{Cyl} \mathbb{E} \left\|W^K(t)\right\|^2=t \sum_{k=1}^{+\infty} \left(\mathrm{Tr}\left( KB_k\right)\right)^2\le Ct \\|K\\|_1^2<+\infty\,.$$ Note that condition holds if $\sum_k\left\\|B_k\right\\|^2_{L(H,H)}<\infty$. Indeed, we have (see Theorem 3.1 in ) $$\left\|\mathrm{Tr}\left(KB_k\right)\right\|\le \left\\|KB_k\right\\|_1\le \left\\|B_k\right\\|_{L(H,H)}\\|K\\|_1$$ and the claim follows. Lemma 18. * Let $(A,D(A))$ be the generator of a $C_0$-semigroup $({\rm e}^{tA})$ on $H$. For $T\in L(H,H)$ we define $$\label{E43} \mathcal{S}(t)T={\rm e}^{tA}\circ T,\quad t\ge0\,,$$ and denote $\mathcal S=\{\mathcal S(t);\, t\ge 0\}$. Then: (a) $\mathcal{ S}=\left(\mathcal{ S}(t)\right)$ defines a semigroup (but in general not a $C_0$-semigroup) of bounded operators on $L(H,H)$. (b) For every $p\in[1,+\infty)$ we have $\mathcal S(t)\mathbb S^p\subset\mathbb S^p$ and $\mathcal S$ defines a $C_0$-semigroup on $\mathbb S^p$.* Proof. Since $\mathbb S^p$ is an ideal, we have $\mathcal S(t)\mathbb S^p\subset\mathbb S^p$ and the operator $\mathcal S(t)\colon \mathbb S^p\to\mathbb S^p$ is bounded. To prove strong continuity of $\mathcal S$ on $\mathbb S^p$, let us recall that for every $p\in[1,+\infty)$ the space $\mathbb S^p$ is the closure of the space of finite rank operators in the $\mathbb S^p$-norm. Let $t>0$ and $T\in\mathbb S^p$ be fixed. There exists a sequence $\left(T_n\right)$ of finite rank operators, such that $\left\\|T_n-T\right\\|_p\to 0$. 
Choose $n$ such that for $t\le 1$, $$\left\\|T-T_n\right\\|_p+\left\\|{\rm e}^{tA}\left(T-T_n\right)\right\\|_p<\varepsilon\,.$$ Since $$\begin{aligned} \left\\|\mathcal S(t)T-T\right\\|_p&\le \left\\|\mathcal S(t)\left(T-T_n\right)\right\\|_p+\left\\|\mathcal S(t)T_n-T_n\right\\|_p+\left\\|T-T_n\right\\|_p\\ &\le\varepsilon+\left\\|\mathcal S(t)T_n-T_n\right\\|_p \end{aligned}$$ and $$\lim_{t\to 0}\left\\|\mathcal S(t)T_n-T_n\right\\|_p=0\,,$$ the strong continuity follows. ◻

Let $A$ be the generator of a $C_0$-semigroup $({\rm e}^{tA})$ on $H$ and let $\mathcal A$ be the generator of the semigroup $\mathcal S$ defined in Lemma 18. Let $\mathcal{W}$ be a cylindrical Wiener process on $L(H,H)$. Consider the following stochastic equation on $\mathbb{S}^p$, where $p\ge 2$, $$\label{E42} {\rm d}\mathcal{X} =\mathcal{A}\mathcal{X} {\rm d}t+({\rm d}\mathcal{W})\mathcal{X}, \qquad \mathcal{X}(s)=I.$$

Definition 19. Let $p\ge 1$. We will say that a process $\mathcal{X}(s,\cdot;\cdot) \colon [s,+\infty)\times \Omega\mapsto L(H,H)$ with continuous paths in $L(H,H)$ is an $\mathbb{S}^p$-valued solution to the equation above if $\mathcal{X}(s,\cdot;\cdot)\colon (s,+\infty)\times \Omega\mapsto \mathbb{S}^p$ is measurable and adapted, $$\mathbb{E}\int_s^T \\|\mathcal{X}(t)\\|_p^2{\rm d}t <+\infty,\qquad \text{for any $T\in (s,+\infty)$},$$ and $$\mathcal{X}(t)=\mathcal{S}(t-s)I+\int_s^t\mathcal{S}(t-r)({\rm d}\mathcal{W}(r))\mathcal{X}(r) \qquad \text{for $t\ge s$, $\mathbb{P}$-a.s.}$$ In the equation above $$\int_s^t\mathcal{S}(t-r)({\rm d}\mathcal{W}(r))\mathcal{X}(r)=\sum_k\int_s^t\mathcal S(t-r)B_k\mathcal X(r){\rm d}W_k(r)\,.$$ We have the following result.

Proposition 20. If $\mathcal{X}$ is a solution to with $\mathcal{A}$ as above, then $\mathcal{X}$ is the flow corresponding to .

Theorem 21. 
*Assume that there exist constants $\gamma <1/2$ and $C>0$ such that $$\label{E44} \\|S(t)\\|_p\le\frac{C}{t^\gamma},\quad t\le 1.$$ Assume that $\mathcal{W}=\sum_kB_k W_k$ is a cylindrical Wiener process on $L(H,H)$ such that $\sum_k \\|B_k\\|^2 <+\infty$. Then for any $s\ge 0$, the equation has a unique solution in $\mathbb{S}^p$. Moreover, $(s,+\infty)\ni t\mapsto \mathcal{X}(s,t)\in \mathbb{S}^p$ is continuous $\mathbb{P}$-a.s.*

Proof. Let us fix $0\le s<T<+\infty$. Let $\Psi$ be the space of all adapted measurable processes $\psi \colon (s,T] \times \Omega\mapsto \mathbb{S}^p$ such that $$\mathbb{E}\int_s^T \\|\psi(t)\\|^2_p {\rm d}t <+\infty.$$ On $\Psi$ consider the family of equivalent norms $$\|\|\| \psi \|\|\| _{\beta} := \left(\mathbb{E}\int_s^T {\rm e}^{-\beta t} \\|\psi(t)\\|^2_p {\rm d}t\right)^{1/2}, \qquad \beta \ge 0.$$ Note that, since $({\rm e}^{tA})$ is a $C_0$-semigroup on $H$, there is a constant $C_1<+\infty$ such that for $t\in (s,T]$ and $\psi\in \Psi$, $$\begin{aligned} \sum_{k} \mathbb{E} \int_s^t \\|\mathcal{S}(t-r)B_k \psi(r)\\|_p^2 {\rm d}r &\le \sum_{k} \mathbb{E} \int_s^t \\|{\rm e}^{(t-r)A}\\|^2 \\|B_k\\|^2 \\|\psi(r)\\|_p^2 {\rm d}r \\ &\le C_1 \sum_{k} \\|B_k\\|^2 \,\mathbb{E} \int_s^t \\|\psi(r)\\|_p^2 {\rm d}r <+\infty. \end{aligned}$$ Therefore, by assumption, the mapping $$\mathcal{I}\psi (t):= \mathcal{S}(t-s)I+\int_s^t\mathcal{S}(t-r)({\rm d}\mathcal{W}(r))\psi(r), \qquad \psi\in \Psi,\quad t\in (s,T],$$ is well-defined from $\Psi$ into $\Psi$. For $\beta$ large enough, $\mathcal I$ is a contraction on $(\Psi,\|\|\|\cdot\|\|\|_\beta)$. Indeed, we have $$\begin{aligned} \|\|\| \mathcal {I}(\psi)-\mathcal{I}(\phi) \|\|\|^2_\beta &\le C_2 \sum_{k}\\|B_k\\|^2 \mathbb{E} \int_s^T {\rm e}^{-\beta t} \int_s^t \\|\psi(r)-\phi(r)\\|^2_p{\rm d}r {\rm d}t\\ &\le C_3\mathbb{E} \int_s^T {\rm e}^{-\beta r}\\|\psi(r)-\phi(r)\\|^2_p\int_r^T {\rm e}^{-\beta(t-r)}{\rm d}t {\rm d}r \\ &\le C_3 \frac{1}{\beta} \|\|\| \psi-\phi \|\|\|^2_\beta. 
\end{aligned}$$ Thus, by the Banach fixed point theorem, there is an $\mathcal{X}(s,\cdot)\in \Psi$ such that $\mathcal {I} \left(\mathcal{X}(s,\cdot)\right)= \mathcal{X}(s,\cdot)$. What is left is to show that for any $\psi\in \Psi$, the stochastic integral $$\int_s^t\mathcal{S}(t-r)({\rm d}\mathcal{W}(r))\psi(r), \qquad t\ge s,$$ has continuous paths in $\mathbb{S}^p$. Since there exists an $\alpha >0$ such that $$\int_s^T (t-s)^{-\alpha}\\|\mathcal{S}(t-s) I\\|^2_{p}{\rm d}t<+\infty,$$ the desired continuity follows from the Burkholder inequality by a standard modification of the Da Prato–Kwapień–Zabczyk factorization method. ◻

Example 22. Assume that $A$ is diagonal, $A= -\sum_{k} \alpha _k e_k\otimes e_k$, where $(e_k)$ is an orthonormal basis of $H$, and $\alpha_k\ge 0$ are real numbers. Then $A$ generates a $C_0$-semigroup ${\rm e}^{tA}$, $t\ge 0$, on $H$. Moreover, ${\rm e}^{tA}\in \mathbb{S}^p$ for $t>0$ if and only if $$\\|{\rm e}^{tA}\\|_{p}= \left( \sum_{k} {\rm e}^{-p\alpha_k t}\right)^{1/p}<+\infty.$$ In particular, we can consider the heat semigroup generated by a Dirichlet Laplacian $\Delta$ in a bounded subdomain of $\mathbb R^2$ with sufficiently smooth boundary. The eigenvalues of $\Delta$ have asymptotics $\lambda_{kn}\sim k^2+n^2$, hence $$\sum_{k,n} {\rm e}^{-pt\left(k^2+n^2\right)}\sim\frac{1}{2tp}\,,$$ which yields $$\\|{\rm e}^{t\Delta} \\|_{p}\sim\left(\frac{1}{2tp}\right)^{1/p}\,.$$ Therefore, the growth condition of Theorem 21 is satisfied, with $\gamma = 1/p<1/2$, if and only if $p>2$. If $B_k\colon H\to H$ are bounded operators such that $\sum_k\left\\|B_k\right\\|^2<+ \infty$, then Theorem 21 ensures the existence of the stochastic flow in $\mathbb{S}^p$ for any $p>2$, but the Hilbert–Schmidt theory developed in cannot be directly applied. Note however that in , $B_k$ can be unbounded.

[^1]: This work was partially supported by the ARC Discovery Grant DP200101866.

[^2]: The work of Szymon Peszat was supported by Polish National Science Center grant 2017/25/B/ST1/02584.
{ "dup_signals": {}, "filename": "out/2105.04140.tex.md" }	arxiv
abstract: Updating a probability distribution in the light of new evidence is a very basic operation in Bayesian probability theory. It is also known as state revision or simply as conditioning. This paper recalls how locally updating a joint state can equivalently be described via inference using the channel extracted from the state (via disintegration). This paper also investigates the quantum analogues of conditioning, and in particular the analogues of this equivalence between updating a joint state and inference. The main finding is that in order to obtain a similar equivalence, we have to distinguish two forms of quantum conditioning, which we call lower and upper conditioning. They are known from the literature, but the common framework in which we describe them and the equivalence result are new.
author: Bart Jacobs
date: 2024-10-03
title: Lower and Upper Conditioning in Quantum Bayesian Theory[^1]

# Introduction

This paper is about quantum analogues of Bayesian reasoning. It works towards one main result, Theorem below, which gives a relation between locally updating a joint state and Bayesian inference. This is a fundamental matter, which requires some preparation in order to set the scene. We use the term 'classical' probability for the ordinary, non-quantum form. We often use the word 'state' for a probability distribution, both in the classical and the quantum case. Classical Bayesian probability is based on what is called Bayes' rule. It describes probabilities of events (evidence) in an updated state. In fact, there are two closely related rules, sometimes called the 'product rule' and 'Bayes' rule' (proper). Making this distinction is not so relevant in the classical case, but, as we shall see, it is very relevant in the quantum case. The paper starts with the back-and-forth constructions between a joint state (distribution) on the one hand, and a channel with an initial state on the other.
A channel is a categorical abstraction of a conditional probability. We shall describe this process in terms of pairing and disintegration, following . This process has a logical dimension that relates locally updating a joint state ('crossover inference') and Bayesian inference via the associated channel, in a result called the Bayesian Inference Theorem (see Theorem below). This result is already described in , but is repeated here in more concrete form, and illustrated with an example. The second part of the paper is about analogues in the quantum world. The constructions back-and-forth between a joint state and a channel exist in the literature and are adapted to the current context. What is new here is the quantum logical analogue of this back-and-forth process. It is shown that updating a state with new evidence, in the form of a predicate, splits in two operations, which we call 'lower' and 'upper' conditioning. Both forms exist already, but not as counterparts. We show that the earlier mentioned product rule holds for lower conditioning, but Bayes' rule itself holds for upper conditioning. In classical probability, the 'lower' and 'upper' versions coincide. In a next step, the main result of the paper (Theorem ) shows how 'lower' updating a joint state can equivalently be done via Bayesian inference with 'upper' conditioning, using the channel that is extracted from the joint state. This puts lower and upper conditioning into perspective and unveils some fundamental aspects of a quantum Bayesian theory. Finally, there are two separate points worth emphasising. First, several constructions in this paper are illustrated with concrete calculations, via the Python-based tool EfProb ; it works both for classical and quantum probability and uses a common language for both. Next, along the way we find a novel result about how disintegration introduces 'semi' higher order structure in discrete probability, see Subsection . 
# Basics of discrete classical probability This section recalls the basics of (classical, finite) discrete probability and fixes notation. For more information we refer to . A distribution, also called a state, on a set $X$ is a function $\omega\colon X \rightarrow [0,1]$ with finite support $\mathrm{supp}(\omega) = \{x\in X\;\|\;\omega(x) \neq 0\}$ and with $\sum_{x}\omega(x) = 1$. Such a distribution can also be written as formal convex sum $\omega = \sum_{x} \omega(x)\ensuremath{\|{\kern.1em}x{\kern.1em}\rangle}$. We write $\mathcal{D}(X)$ for the set of such distributions. The mapping $X\mapsto\mathcal{D}(X)$ is a monad on the category of sets, called the distribution monad. A joint state is a state on an $n$-ary product set. A binary state is thus a distribution $\tau\in\mathcal{D}(X_{1} \times X_{2})$. It has first and second marginals, written here as $\mathsf{M}_{1}(\tau) \in \mathcal{D}(X_{1})$ and $\mathsf{M}_{2}(\tau) \in\mathcal{D}(X_{2})$. These marginal states are defined in the standard way as $\mathsf{M}_{1}(\tau)(x_{1}) = \sum_{x_{2}}\tau(x_{1}, x_{2})$ and $\mathsf{M}_{2}(\tau)(x_{2}) = \sum_{x_{1}}\tau(x_{1}, x_{2})$. In the other direction, two states $\omega_{i}\in\mathcal{D}(X_{i})$ can be combined to product state $\omega_{1}\otimes\omega_{2}\in\mathcal{D}(X_{1}\times X_{2})$ via $(\omega_{1}\otimes\omega_{2})(x_{1},x_{2}) = \omega_{1}(x_{1}) \cdot \omega_{2}(x_{2})$. Obviously, $\mathsf{M}_{i}(\omega_{1}\otimes\omega_{2}) = \omega_{i}$. A channel is a function of the form $c\colon X \rightarrow \mathcal{D}(Y)$, that is, a map $X\rightarrow Y$ in the Kleisli category $\mathcal{K}{\kern-.4ex}\ell(\mathcal{D})$ of the distribution monad $\mathcal{D}$. Such a channel $c$ has a Kleisli extension function, or state transformer, $c \gg (-) \colon \mathcal{D}(X) \rightarrow \mathcal{D}(Y)$ given by $(c \gg \omega)(y) = \sum_{x} \omega(x) \cdot c(x)(y)$. 
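The definitions of marginals, product states and state transformation can be made concrete with a few lines of plain Python. This is only an illustrative sketch: distributions are represented as dictionaries from elements to probabilities, channels as Python functions returning such dictionaries, and the names `marginal`, `tensor`, `transform` and `noisy` are hypothetical (they are not taken from the paper or from any library).

```python
from itertools import product


def marginal(tau, i):
    """i-th marginal of a joint state tau, whose keys are tuples."""
    out = {}
    for xs, prob in tau.items():
        out[xs[i]] = out.get(xs[i], 0.0) + prob
    return out


def tensor(omega1, omega2):
    """Product state: (omega1 (x) omega2)(x1, x2) = omega1(x1) * omega2(x2)."""
    return {(x1, x2): p1 * p2
            for (x1, p1), (x2, p2) in product(omega1.items(), omega2.items())}


def transform(c, omega):
    """State transformer: (c >> omega)(y) = sum_x omega(x) * c(x)(y)."""
    out = {}
    for x, px in omega.items():
        for y, py in c(x).items():
            out[y] = out.get(y, 0.0) + px * py
    return out


coin = {"H": 0.5, "T": 0.5}
bit = {0: 0.25, 1: 0.75}
joint = tensor(coin, bit)


def noisy(x):
    # Hypothetical channel: keep the outcome with probability 0.75, flip it otherwise.
    return {x: 0.75, ("T" if x == "H" else "H"): 0.25}


pushed = transform(noisy, coin)
```

Taking `marginal(joint, 0)` and `marginal(joint, 1)` returns the two factors again, which is the identity $\mathsf{M}_{i}(\omega_{1}\otimes\omega_{2}) = \omega_{i}$ in this small case.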
For another channel $d\colon Y \rightarrow \mathcal{D}(Z)$ there is a composite channel $d \mathrel{\bullet}c \colon X \rightarrow \mathcal{D}(Z)$ given by $(d \mathrel{\bullet}c)(x) = d \gg c(x)$. Channels $c_{i} \colon X_{i} \rightarrow \mathcal{D}(Y_{i})$ can be combined into a product channel $c_{1}\otimes c_{2} \colon X_{1}\times X_{2} \rightarrow \mathcal{D}(Y_{1}\times Y_{2})$ by $(c_{1}\otimes c_{2})(x_{1}, x_{2}) = c_{1}(x_{1})\otimes c_{2}(x_{2})$. A (fuzzy) predicate on a set $X$ is a function $p\colon X \rightarrow [0,1]$. For another predicate $q\in [0,1]^{X}$ there is a (sequential) conjunction predicate $p\mathrel{\&}q$ on $X$ via $(p\mathrel{\&}q)(x) = p(x)\cdot q(x)$. For two predicates $p_{i} \in [0,1]^{X_i}$ on different sets $X_{i}$ we can form a parallel conjunction predicate $p_{1}\otimes p_{2} \in [0,1]^{X_{1}\times X_{2}}$, given by $(p_{1}\otimes p_{2})(x_{1},x_{2}) = p_{1}(x_{1}) \cdot p_{2}(x_{2})$. There is always a truth predicate $\ensuremath{\mathbf{1}}\in [0,1]^{X}$ given by $\ensuremath{\mathbf{1}}(x) = 1$. For a state $\omega\in\mathcal{D}(X)$ and a predicate $p\in [0,1]^{X}$ on the same set $X$ the validity $\omega\models p$ in $[0,1]$ is the expected value $\sum_{x} \omega(x) \cdot p(x)$. If this validity is non-zero, one can form a conditioned state $\omega\|_{p}$ on $X$, given by $\omega\|_{p}(x) = \frac{\omega(x)\cdot p(x)}{\omega\models p}$. This updated state $\omega\|_{p}$ is called '$\omega$ given $p$', and is commonly written as $\omega(-\|p)$. It is easy to check that conditioning with truth does nothing: $\omega\|_{\ensuremath{\mathbf{1}}} = \omega$. Proposition 1. 
* Assuming the conditionings of the states below exist, we have the 'product' rule on the left, and the 'Bayesian' rule on the right: $$\label{eqn:classicalbayes} \begin{array}{rclcrcl} \omega\|_{p} \models q & = & \displaystyle\frac{\omega\models p\mathrel{\&}q}{\omega\models p} & \hspace{5em} & \omega\|_{p} \models q & = & \displaystyle\frac{(\omega\|_{q}\models p)\cdot(\omega\models q)}{\omega\models p}. \end{array}$$ Moreover, successive conditioning can be reduced to a single conditioning, as on the left below, so that conditioning becomes commutative, as on the right: $$\label{eqn:classicalsuccesiveconditioning} \begin{array}{rclcrcl} (\omega\|_{p})\|_{q} & = & \omega\|_{p\mathrel{\&}q} & \hspace{5em} & (\omega\|_{p})\|_{q} & = & (\omega\|_{q})\|_{p}. \end{array}$$* Each channel $c \colon X \rightarrow \mathcal{D}(Y)$ also gives rise to a predicate transformer function $c \ll (-) \colon [0,1]^{Y} \rightarrow [0,1]^{X}$, given by $(c \ll q)(x) = \sum_{y} c(x)(y)\cdot q(y)$. We can now relate validity $\models$ and state/predicate transformation ($\gg$ and $\ll$) via the following fundamental equality of validities: $$\label{eqn:classicalvaliditytransformation} \begin{array}{rcl} (c \gg \omega) \models q & \;=\; & \omega \models (c \ll q). \end{array}$$ # Classical Bayesian nets and disintegration A major rationale for using Bayesian networks is efficiency of representation: a joint probability distribution (state) on multiple sample spaces (domains) quickly becomes very large. Representing the same distribution in graphical form, as a 'Bayesian network' is often much more efficient. The directed graph structure is determined by conditional independence. Semantically, the directed arcs are given by channels, that is by stochastic matrices, or more abstractly by Kleisli morphisms for the distribution monad . The essence of this semantical view on Bayesian network theory consists of two parts. 1. 
The ability to move back-and-forth between a joint state and a graph (network) of channels. The difficult direction is extracting the various channels of the graph from a joint state. This is often called disintegration. 2. Equivalence of inference via joint states and inference via associated channels. In general, inference (or, Bayesian learning) happens via conditioning (updating, revising) of states, in the light of evidence given by predicates. Inference involves the propagation of such conditioning via joint states and/or via channels, via the back-and-forth connections in point 1, both in a forward and backward direction (as in ). Point 1 is well-known, but point 2 is usually left implicit; it is however a crucial part of why efficient representation of (big) joint states as Bayesian network graphs can be used for Bayesian reasoning. In this section we briefly elaborate both points below, and illustrate them with an example. Note that we do not claim that with these two points 1 and 2 we capture all essentials of Bayesian network theory: e.g., we do not address the matter of how to turn a joint state into a graph, via conditional independence or via causality. This question has also been studied in a quantum setting, see e.g. . 
## Disintegration ```latex \begin{wrapfigure}[5]{r}{0pt} \begin{minipage}{11em}\centering \vspace{-1.5em} \begin{equation} \label{diag:pairing} \hspace{-1em}\ensuremath{\mathrm{pair}}(\omega,c) \coloneqq \vcenter{\hbox{% \begin{tikzpicture}[font=\small,scale=0.7] \node[state] (s) at (0,0) {$\omega$}; \node[copier] (copier) at (0,0.3) {}; \node[arrow box] (c) at (0.5,0.95) {$c$}; \coordinate (X) at (-0.5,1.5); \coordinate (Y) at (0.5,1.5); % \draw (s) to (copier); \draw (copier) to[out=150,in=-90] (X); \draw (copier) to[out=15,in=-90] (c); \draw (c) to (Y); \end{tikzpicture}}} \end{equation} \end{minipage} \end{wrapfigure} ``` Abstractly, point involves the correspondence between a joint state on the one hand, and a channel and a (single) state on the other hand. In one direction this is easy: given a state $\omega$ on $X$ and a channel $c\colon X \rightarrow Y$ we can form a joint state on $X\times Y$, namely as: $\ensuremath{\mathrm{pair}}(\omega, c) \coloneqq \big((\ensuremath{\mathrm{id}_{}}\otimes c) \mathrel{\bullet}\Delta\big) \gg \omega$, where $\Delta \colon X \rightarrow X\times X$ is the copier channel with $\Delta(x) = 1\ensuremath{\|{\kern.1em}x,x{\kern.1em}\rangle}$. This construction is drawn as a picture on the right , using the graphical language associated with monoidal categories. It will be used here only as illustration, hopefully in an intuitive self-explanatory manner. We refer to for details. 
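Unfolding the definition of the copier and of the product channel, the pairing is simply $\mathrm{pair}(\omega, c)(x, y) = \omega(x)\cdot c(x)(y)$. A minimal Python sketch of this formula follows, assuming distributions are dictionaries and channels are functions returning dictionaries; the names `pair`, `omega`, `c` and `tau` and the example numbers are illustrative assumptions only.

```python
def pair(omega, c):
    """Joint state on X x Y with pair(omega, c)(x, y) = omega(x) * c(x)(y)."""
    return {(x, y): px * py
            for x, px in omega.items()
            for y, py in c(x).items()}


# Hypothetical example data: a state on {x0, x1} and a channel to {y0, y1}.
omega = {"x0": 0.25, "x1": 0.75}


def c(x):
    table = {"x0": {"y0": 0.5, "y1": 0.5},
             "x1": {"y0": 1.0, "y1": 0.0}}
    return table[x]


tau = pair(omega, c)

# The first marginal of the paired state gives back omega.
first_marginal = {}
for (x, _y), prob in tau.items():
    first_marginal[x] = first_marginal.get(x, 0.0) + prob
```

Summing out the second component recovers the original state, which is the easy half of the correspondence between joint states and channel-state pairs.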
```latex \begin{wrapfigure}[6]{r}{0pt} \begin{minipage}{13em}\centering %\vspace{-0em} \begin{equation} \label{diag:disintegration} \vcenter{\hbox{% \begin{tikzpicture}[font=\small,scale=0.7] \node[state] (omega) at (0,0) {$\,\;\tau\;\,$}; \coordinate (X) at (-0.4,0.55) {}; \coordinate (Y) at (0.4,0.55) {}; % \draw (omega) ++(-0.4, 0) to (X); \draw (omega) ++(0.4, 0) to (Y); \end{tikzpicture}}} \; = \; \vcenter{\hbox{% \begin{tikzpicture}[font=\small] \node[state] (omega) at (0.25,0) {$\,\;\tau\;\,$}; \node[copier] (copier) at (0,0.4) {}; \node[arrow box] (c) at (0.5,0.95) {$c$}; \coordinate (X) at (-0.5,1.5); \coordinate (Y) at (0.5,1.5); \coordinate (omega1) at ([xshiftu=-0.25]omega); \coordinate (omega2) at ([xshiftu=0.25]omega); \node[discarder] (d) at ([yshiftu=0.2]omega2) {}; % \draw (omega1) to (copier); \draw (omega2) to (d); \draw (copier) to[out=150,in=-90] (X); \draw (copier) to[out=15,in=-90] (c); \draw (c) to (Y); \end{tikzpicture}}} \end{equation} \end{minipage} \end{wrapfigure} ``` Going in the other direction, from a joint state to a channel is less trivial. It is called disintegration* e.g. in . It involves a joint state $\tau$ on $X,Y$ from which a channel $c\colon X \rightarrow Y$ is extracted, in such a way that $\tau$ itself can be reconstructed from its first marginal $\mathsf{M}_{1}(\tau)$ and the channel $c$. Pictorially this marginal is represented by blocking its second wire via the ground symbol $\mathbin{\text{\raisebox{-0.2ex}{\usebox\sbground}}}$. We write $\ensuremath{\mathrm{extr}}(\tau)$ for this extracted channel $c$. Then we can write Equation as $\tau = \ensuremath{\mathrm{pair}}\big(\mathsf{M}_{1}(\tau), \ensuremath{\mathrm{extr}}(\tau)\big)$. Lemma 2. 
* Extracted channels $\ensuremath{\mathrm{extr}}(\tau)$ exist and are unique in classical discrete probability, for joint states $\tau$ whose first marginal has full support.* For a more systematic, diagrammatic description of disintegration, also for continuous probability, we refer to . Here we only need it for discrete probability, as a preparation for the quantum case. ## Excursion on disintegration and semi-exponentials We conclude this part on disintegration with a novel observation. It is interesting in itself, but it does not play a role in the sequel. It shows that disintegration gives rise to higher order 'semi-exponential' structure, originally introduced in . Recall that a categorical description of a (proper) exponential in a cartesian closed category involves exponent objects $Y^X$ with an evaluation map $\mathrm{ev}\colon Y^{X} \times X \rightarrow Y$ such that for each map $f\colon Z\times X \rightarrow Y$ there is an abstraction map $\Lambda(f) \colon Z \rightarrow Y^{X}$. These $\mathrm{ev}$ and $\Lambda$ should satisfy: $$\label{eqn:exponential} \begin{array}{rclcrclcrcl} \mathrm{ev}\mathrel{\circ}(\Lambda(f)\times\ensuremath{\mathrm{id}_{}}) & = & f & \qquad & \Lambda(f \mathrel{\circ}(g\times\ensuremath{\mathrm{id}_{}})) & = & \Lambda(f) \mathrel{\circ}g & \qquad & \Lambda(\mathrm{ev}) & = & \ensuremath{\mathrm{id}_{}}. \end{array}$$ The last two equations ensure that $\Lambda(f)$ is the unique map $h$ with $\mathrm{ev}\mathrel{\circ}(h\times\ensuremath{\mathrm{id}_{}}) = f$, since: $h = \ensuremath{\mathrm{id}_{}}\mathrel{\circ}h = \Lambda(\mathrm{ev}) \mathrel{\circ}h = \Lambda(\mathrm{ev}\mathrel{\circ} (h\times\ensuremath{\mathrm{id}_{}})) = \Lambda(f)$. For a semi-exponential, the last equation in need not hold. A semi-exponential is thus more than a 'weak' exponential (only the first equation) since it also satisfies naturality (the second equation). 
In the language of the $\lambda$-calculus, having 'semi-exponentials' means that one has a $\beta$-equation, but not an $\eta$-equation, see or for more details.

Theorem 1. *Let $\mathcal{K}{\kern-.4ex}\ell_{\mathsf{f}}(\mathcal{D})$ be the subcategory of the Kleisli category $\mathcal{K}{\kern-.4ex}\ell(\mathcal{D})$ of the distribution monad on $\ensuremath{\mathbf{Sets}}$ with only finite sets as objects. This category $\mathcal{K}{\kern-.4ex}\ell_{\mathsf{f}}(\mathcal{D})$ is symmetric monoidal 'semi' closed: it has semi-exponentials, which are semi-right adjoint to the (standard) tensor product.*

## Bayesian inference and disintegration

We now turn to the second point from the very beginning of this section, about Bayesian inference, especially in relation to the passage back-and-forth between joint states and channels via pairing and extraction, as just described. It may happen that a joint state $\tau\in\mathcal{D}(X\times Y)$ is equal to the product of its two marginals, i.e. $\tau = \mathsf{M}_{1}(\tau) \otimes \mathsf{M}_{2}(\tau)$. The state $\tau$ is then called non-entwined. The more common case is that a joint state is entwined, and its marginal components are correlated. If we then update in one component, we see a change in the other component. This is called crossover influence in . The essence of point 2, from the beginning of this section, about inference and disintegration is that for a joint state $\tau$, this crossover influence can be propagated through the channel $c$ that is extracted from the state $\tau$ via disintegration. This is expressed in the next result, called the Bayesian Inference Theorem.

Theorem 2. *Let $\tau\in\mathcal{D}(X\times Y)$ be a joint state, and $c = \ensuremath{\mathrm{extr}}(\tau) \colon X \rightarrow \mathcal{D}(Y)$ the extracted channel obtained via disintegration, as described in the Disintegration subsection above. 
For predicates $p\in [0,1]^{X}$ and $q\in [0,1]^{Y}$ we then have: $$\label{eqn:classicalconditioning} \begin{array}{rclcrcl} \mathsf{M}_{2}\big(\tau\|_{p\otimes\ensuremath{\mathbf{1}}}\big) & = & c \gg \big(\mathsf{M}_{1}(\tau)\big\|_{p}\big) & \qquad\mbox{and}\qquad & \mathsf{M}_{1}\big(\tau\|_{\ensuremath{\mathbf{1}}\otimes q}\big) & = & \mathsf{M}_{1}(\tau)\big\|_{c \ll q}. \end{array}$$*

The first equation describes crossover inference on the left-hand-side as forward inference on the right: first update and then do state transformation $\gg$. The second equation describes crossover inference in the other component as backward inference: first do predicate transformation $\ll$ and then update. The terminology of 'forward' and 'backward' inference comes from , see also . An abstract graphical proof of the equations is given in . But it is not hard to prove these equations concretely, by unwrapping the definitions.

## An illustration of inference in a classical Bayesian network

We consider the relation between smoking and the presence of ashtrays and (lung) cancer, in the following simple Bayesian network. It has a root node smoking, with an arrow $a$ to the node ashtray and an arrow $c$ to the node cancer, and the following probability tables: $$\begin{array}{ccc} \begin{array}{cc} \text{smoking} & \mathsf{P}(\text{ashtray}) \\ \hline t & 0.95 \\ f & 0.25 \end{array} & \qquad \mathsf{P}(\text{smoking}) = 0.3 \qquad & \begin{array}{cc} \text{smoking} & \mathsf{P}(\text{cancer}) \\ \hline t & 0.4 \\ f & 0.05 \end{array} \end{array}$$ Thus, 95% of people who smoke have an ashtray in their home, and 25% of the non-smokers too.
On the right we see that in this situation a smoker has 40% chance of developing cancer, whereas a non-smoker only has 5% chance. The question we want to address is: what is the influence of the presence or absence of an ashtray on the probability of developing cancer? Here the presence/absence of the ashtray is the 'evidence', whose influence is propagated through the network. We shall describe the outcome using the EfProb tool , concentrating on evidence propagation, and not so much on the precise representation of the above network, using channels `a` and `c` associated with the conditional probability tables. We first consider the prior probabilities of smoking, ashtray, and cancer: The network gives rise to a joint state, by tupling the ashtray, identity and cancer channels, and applying them to the `smoking` state. We can then obtain the above three prior probabilities alternatively via three marginalisations of this joint state, namely as first, second, third marginals, by using in EfProb the corresponding masks `[1,0,0]`, `[0,1,0]`, `[0,0,1]` after the marginalisation sign `\%`. We now wish to infer the (adapted) cancer probability when we have evidence of ashtrays. We shall do this in two ways, first via crossover inference using the above joint state. The ashtray evidence `tt` needs to be extended (weakened) to a predicate with the same domain as the joint state. In the Equations this is written as: $p\otimes\ensuremath{\mathbf{1}}$, but in EfProb it is: `tt @ truth(bnd) @ truth(bnd)`. We first use this predicate for updating the joint state, written as `/` in EfProb, and then we marginalise to obtain the third 'cancer' component that we are interested in: Alternatively we can compute this posterior cancer probability by following the graph structure. The ashtray evidence `tt` is now first turned into predicate `a << tt` on the state `smoking`. After updating this state, we transform it to an updated cancer probability, via state transformation `>>`. 
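The two routes just described can also be checked without EfProb. The following plain-Python sketch is an independent illustration, not EfProb code: it uses the probabilities of the network (0.3 for smoking, 0.95 and 0.25 for ashtray, 0.4 and 0.05 for cancer), represents states as dictionaries, and introduces hypothetical helper names such as `condition`, `transform` and `pull_back`.

```python
def bernoulli(r):
    return {True: r, False: 1.0 - r}


smoking = bernoulli(0.3)


def a(s):
    # Ashtray channel: probability of an ashtray, given smoking status s.
    return bernoulli(0.95 if s else 0.25)


def c(s):
    # Cancer channel: probability of cancer, given smoking status s.
    return bernoulli(0.4 if s else 0.05)


def validity(omega, p):
    return sum(prob * p(x) for x, prob in omega.items())


def condition(omega, p):
    v = validity(omega, p)
    return {x: prob * p(x) / v for x, prob in omega.items()}


def transform(ch, omega):
    """State transformation ch >> omega."""
    out = {}
    for x, px in omega.items():
        for y, py in ch(x).items():
            out[y] = out.get(y, 0.0) + px * py
    return out


def pull_back(ch, q):
    """Predicate transformation ch << q."""
    return lambda x: sum(py * q(y) for y, py in ch(x).items())


def tt(y):
    # Sharp predicate "true" on a Boolean domain.
    return 1.0 if y else 0.0


# Joint state on (ashtray, smoking, cancer).
joint = {(ash, s, can): ps * pa * pc
         for s, ps in smoking.items()
         for ash, pa in a(s).items()
         for can, pc in c(s).items()}

# Route 1: crossover inference on the joint state, then third marginal.
updated = condition(joint, lambda xs: tt(xs[0]))
crossover = {}
for (_ash, _s, can), prob in updated.items():
    crossover[can] = crossover.get(can, 0.0) + prob

# Route 2: backward along a, update smoking, forward along c.
via_channels = transform(c, condition(smoking, pull_back(a, tt)))
```

Both `crossover[True]` and `via_channels[True]` evaluate to $0.12275/0.46 \approx 0.267$, compared with the prior cancer probability $0.3\cdot 0.4 + 0.7\cdot 0.05 = 0.155$.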
We can do this down-and-up propagation in one go: The fact that we get the same distribution is an instance of the equations . As expected, in presence of ashtrays the probability of cancer is higher. Aside: clearly, ashtrays influence (the probability of) cancer, but they are not the cause; in the graph this influence happens via a common ancestor, namely smoking, working statistically as 'confounder', and as the actual cause of cancer. # Towards quantum Bayesian theory The main aim of this paper is to investigate quantum analogues of the Bayesian Inference Theorem , from the conviction that any adequate quantum Bayesian network theory should address these points and from the beginning of Section in a satisfactory manner. Point has received ample attention in quantum theory, see for instance . But Point involving quantum conditioning has not really been studied this explicitly. Our main result is that one can also describe quantum conditioning consistently, both on joint states and via channels, as in Equations , but this requires in the quantum case that one distinguishes two forms of conditioning, which we shall call lower and upper conditioning, written as $\sigma\|_{p}$ and $\sigma\|^{p}$ respectively[^2]. Classically these two forms of conditioning coincide, but the quantum world is more subtle — as usual. Lower conditioning has appeared in effectus theory and upper conditioning in the approach of . Here they are clearly distinguished for the first time, and used jointly to capture quantum inference and propagation of evidence. Interestingly, what is commonly called Bayes' rule holds for upper conditioning, but not for lower conditioning, for which we "only" have the product rule. First we introduce the basics about states and predicates in the quantum world. We shall do so for finite-dimensional quantum theory, using the formalism of Hilbert spaces. ## Basics of quantum probability Let $\mathscr{H}$ be a finite-dimensional complex Hilbert space. 
A state $\sigma$ of $\mathscr{H}$ is a positive operator on $\mathscr{H}$ with trace one. That is, $\sigma$ is a linear function $\sigma \colon \mathscr{H} \rightarrow \mathscr{H}$ satisfying $\sigma \geq 0$ and $\ensuremath{\mathrm{tr}}(\sigma) = 1$. A state is often called a density matrix. The canonical way to define a state is to start from a vector $\ensuremath{\|{\kern.1em}v{\kern.1em}\rangle} \in \mathscr{H}$ with norm $1$, and consider the operator $\ensuremath{\|{\kern.1em}v{\kern.1em}\rangle}\langle\,v\,\| \colon \mathscr{H} \rightarrow \mathscr{H}$. It sends any element $\ensuremath{\|{\kern.1em}w{\kern.1em}\rangle}\in\mathscr{H}$ to the vector $\langle v\|w\rangle\cdot\ensuremath{\|{\kern.1em}v{\kern.1em}\rangle}$. An arbitrary state is a convex combination of such vector states $\ensuremath{\|{\kern.1em}v{\kern.1em}\rangle}\langle\,v\,\|$. A joint state $\tau$ on two Hilbert spaces $\mathscr{H}$ and $\mathscr{K}$ is a state on the tensor product $\mathscr{H}\otimes\mathscr{K}$. A predicate, also called an effect, is a positive operator $p$ on $\mathscr{H}$ below the identity: $0 \leq p \leq \ensuremath{\mathrm{id}_{}}$. The identity $\ensuremath{\mathrm{id}_{}}$ is given by the identity/unit matrix, and corresponds to the truth predicate, often written as $\ensuremath{\mathbf{1}}$. For each predicate $p$ there is an orthosupplement, written as $p^\bot$, playing the role of negation. It is defined by $p^{\bot} = \ensuremath{\mathrm{id}_{}}- p$, and satisfies: $p^{\bot\bot} = p$ and $p + p^{\bot} = \ensuremath{\mathbf{1}}$. The most interesting logical operation on quantum predicates is sequential conjunction $\mathrel{\&}$. It is defined via the square root operation on predicates, as: $$\label{AndthenEqn} \begin{array}{rcl} p \mathrel{\&}q & = & \sqrt{p}\,q\sqrt{p}. \end{array}$$ We pronounce $\mathrel{\&}$ as 'and-then', and read it as: after $p$ with its side-effect, the predicate $q$ holds.
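As a small numerical illustration of the and-then operation, the sketch below computes $\sqrt{p}\,q\sqrt{p}$ with NumPy for two projections on $\mathbb{C}^{2}$. The helper names `psd_sqrt`, `and_then` and `orthosupplement` are made up for this example, and the square root is taken through an eigendecomposition, which is one standard way to obtain it for a positive matrix.

```python
import numpy as np


def psd_sqrt(p):
    """Square root of a positive semidefinite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(p)
    vals = np.clip(vals, 0.0, None)              # guard against tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.conj().T


def and_then(p, q):
    """Sequential conjunction p & q = sqrt(p) q sqrt(p)."""
    root = psd_sqrt(p)
    return root @ q @ root


def orthosupplement(p):
    """Negation of a predicate: identity minus p."""
    return np.eye(p.shape[0]) - p


# Two sharp predicates (projections) on C^2 that do not commute.
p = np.array([[1.0, 0.0],
              [0.0, 0.0]])                       # projection onto |0>
plus = np.array([1.0, 1.0]) / np.sqrt(2.0)
q = np.outer(plus, plus)                         # projection onto |+>

pq = and_then(p, q)
qp = and_then(q, p)
print(pq)
print(qp)
```

Here `pq` equals one half of `p` while `qp` equals one half of `q`, so the order of the two predicates matters, in contrast with the classical (diagonal) case.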
This operation $\mathrel{\&}$ has been studied in , and re-emerged in effectus theory . The square root of the matrix $p$ exists since $p$ is positive. It is computed via diagonalisation $\sqrt{p} = U\sqrt{D}U^{-1}$, where $p = UDU^{-1}$, in which $\sqrt{D}$ is obtained from the diagonal matrix $D$ by taking the square roots of the (positive) eigenvalues on the diagonal. States $\sigma$ and predicates $p$ of the same Hilbert space $\mathscr{H}$ can be combined in validity, defined as: $$\label{ValidityEqn} \begin{array}{rcl} \sigma\models p & \;\coloneqq\; & \ensuremath{\mathrm{tr}}(\sigma\,p) \;\in\; [0,1]. \end{array}$$ This standard definition is also known as the Born rule. Remark 1. * There is a standard way to embed classical probability into quantum probability. Suppose we have a classical state $\omega$ and predicate $p$ on a finite set $X = \{x_{1}, \ldots, x_{n}\}$ with $n$ elements. Then we consider the Hilbert space $\mathbb{C}^{n}$ with standard basis given by vectors $\ensuremath{\|{\kern.1em}i{\kern.1em}\rangle}$ with a $1$ on the $i$-th position and zeros elsewhere. We write $\widehat{\omega} = \sum_{i} \omega(x_{i})\ensuremath{\|{\kern.1em}i{\kern.1em}\rangle}\langle\,i\,\|$ for the 'diagonal' quantum state on $\mathbb{C}^n$. By construction it is positive and has trace $\sum_{i}\omega(x_{i}) = 1$.* Similarly, a classical predicate $p \in [0,1]^{X}$ gives a quantum predicate $\widehat{p}$ on $\mathbb{C}^{n}$ via $\widehat{p} = \sum_{i} p(x_{i})\ensuremath{\|{\kern.1em}i{\kern.1em}\rangle}\langle\,i\,\|$. By construction, $0 \leq \widehat{p} \leq \ensuremath{\mathrm{id}_{}}$. It is easy to see that the classical and quantum validities coincide: $$\begin{array}{rcccccl} \omega\models p & = & \sum_{i} \omega(x_{i})\cdot p(x_{i}) & = & \ensuremath{\mathrm{tr}}\big(\widehat{\omega}\,\widehat{p}\big) & = & \widehat{\omega} \models \widehat{p}.
\end{array}$$ The mapping $\smash{\widehat{(\;\cdot\;)}}$ preserves the logical structure on predicates, including sequential conjunction $\mathrel{\&}$. Remark 2. * In both classical and quantum probability, as described here, a state is also a predicate. This is peculiar. When one moves to a higher level of abstraction, this is no longer the case — for instance by using von Neumann algebras instead of Hilbert spaces, or by using continuous probability distributions on measurable spaces instead of discrete distributions on sets. In the next section we sometimes 'convert' a state into a predicate, but we shall make explicit when we do so. A more abstract approach is possible, using the duality between states and effects, see also Remark .* ## Two forms of quantum conditioning This subsection introduces two forms of quantum conditioning of a state by a predicate, called 'lower' and 'upper' conditioning, and describes their basic properties. Definition 1. Let $\sigma$ be a state, and $p$ a predicate, on the same Hilbert space, for which the validity $\sigma\models p$ is non-zero. We shall use the following terminology, notation and definition for two forms of conditioning: $$\begin{array}{rclcrcl} \mbox{lower:}\quad \sigma\|_{p} & \coloneqq & \displaystyle\frac{\sqrt{p}\,\sigma\sqrt{p}}{\sigma\models p} & \qquad\qquad & \mbox{upper:}\quad \sigma\|^{p} & \coloneqq & \displaystyle\frac{\sqrt{\sigma}\,p\sqrt{\sigma}}{\sigma\models p}. \end{array}$$ It is easy to see that both $\sigma\|_{p}$ and $\sigma\|^{p}$ are states again — using the familiar 'rotation' property of traces: $\ensuremath{\mathrm{tr}}(AB) = \ensuremath{\mathrm{tr}}(BA)$. Lower conditioning $\sigma\|_{p}$ arises in effectus theory, whereas upper conditioning $\sigma\|^{p}$ comes from . We first observe that this difference between 'lower' and 'upper' does not exist classically. Lemma 3. 
* For classical (non-quantum) states and predicates, lower and upper conditioning coincide with classical conditioning. To express this more precisely we use the notation $\smash{\widehat{(\;\cdot\;)}}$ from Remark to translate from classical to quantum: for a classical state $\omega$ and predicate $p$, $$\begin{array}{rcccl} \widehat{\omega}\|_{\widehat{p}} & = & \widehat{\omega\|_{p}} & = & \widehat{\omega}\|^{\widehat{p}}. \end{array}$$* A second observation is about truth $\ensuremath{\mathbf{1}}$ and sequential conjunction $\mathrel{\&}$. Both lower and upper conditioning with truth $\ensuremath{\mathbf{1}}$ do nothing, like in the classical case, but successive conditioning cannot be reduced to a single conditioning, as it can classically in the first equation in , in Proposition . In addition, the order in quantum conditioning matters, just like the order of priming in psychology matters . Remark 3. * We have $\sigma\|_{\ensuremath{\mathbf{1}}} = \sigma$ and $\sigma\|^{\ensuremath{\mathbf{1}}} = \sigma$, but in general successive quantum conditionings cannot be reduced to a single conditioning via sequential conjunction: $$\begin{array}{rclcrcl} (\sigma\|_{p})\|_{q} & \neq & \sigma\|_{p\mathrel{\&}q} & \qquad\mbox{and also}\qquad & (\sigma\|^{p})\|^{q} & \neq & \sigma\|^{p\mathrel{\&}q}. \end{array}$$* Similarly, in general, quantum conditionings do not commute: $$\begin{array}{rclcrcl} (\sigma\|_{p})\|_{q} & \neq & (\sigma\|_{q})\|_{p} & \qquad\mbox{and}\qquad & (\sigma\|^{p})\|^{q} & \neq & (\sigma\|^{q})\|^{p}. \end{array}$$ Interestingly, the two classical equations in Proposition hold separately for the two kinds of quantum conditioning. Proposition 4.
* The 'product' rule holds for lower conditioning and Bayes' rule holds for upper conditioning: $$\label{eqn:quantumbayes} \begin{array}{rclcrcl} \sigma\|_{p} \models q & = & \displaystyle\frac{\sigma\models p\mathrel{\&}q}{\sigma\models p} & \hspace{5em} & \sigma\|^{p} \models q & = & \displaystyle\frac{(\sigma\|^{q}\models p)\cdot(\sigma\models q)}{\sigma\models p}. \end{array}$$

# Quantum channels

In order to express the quantum analogues of the equations in Theorem we need the notion of 'channel' in a quantum setting. It exists, and is alternatively often called a quantum operation, see e.g. . There are several variations possible in the requirements, such as just positive or completely positive, unital or subunital, normal or not. These variations are not essential for what follows. For a finite-dimensional Hilbert space $\mathscr{H}$ we write $\mathcal{B}(\mathscr{H})$ for the set of linear maps $A\colon \mathscr{H} \rightarrow \mathscr{H}$. Because $\mathscr{H}$ has finite dimension, such $A$ are automatically bounded, or equivalently, continuous. The set of operators $\mathcal{B}(\mathscr{H})$ is in fact a Hilbert space itself, with Hilbert-Schmidt inner product $\ensuremath{\langle A\|B\rangle_{\mathrm{HS}}} = \ensuremath{\mathrm{tr}}(A^{\dag}B)$, where $A^{\dag}$ is the conjugate transpose of $A$, as a matrix. Moreover, there are canonical isomorphisms $\mathcal{B}(\mathscr{H}\otimes\mathscr{K}) \cong \mathcal{B}(\mathscr{H}) \otimes \mathcal{B}(\mathscr{K})$ and $\mathcal{B}(\mathbb{C}) \cong \mathbb{C}$. If $\mathscr{K}$ is another finite-dimensional Hilbert space, then a CP-map $\mathscr{H} \rightarrow \mathscr{K}$ is a completely positive linear map $c\colon \mathcal{B}(\mathscr{K}) \rightarrow \mathcal{B}(\mathscr{H})$. Notice the change of direction. This CP-map $c$ is called a channel if it preserves the unit/identity matrix: $c(\ensuremath{\mathrm{id}_{}}) = \ensuremath{\mathrm{id}_{}}$.
It may be called a subchannel if $c(\ensuremath{\mathrm{id}_{}}) \leq \ensuremath{\mathrm{id}_{}}$. Each CP-map $c\colon \mathcal{B}(\mathscr{K}) \rightarrow \mathcal{B}(\mathscr{H})$ has a 'dagger', written as $c^{\#}\colon \mathcal{B}(\mathscr{H}) \rightarrow \mathcal{B}(\mathscr{K})$, so that $\ensuremath{\langle c(A)\|B\rangle_{\mathrm{HS}}} = \ensuremath{\langle A\|c^{\#}(B)\rangle_{\mathrm{HS}}}$, that is, $\ensuremath{\mathrm{tr}}(c(A)^{\dag}B) = \ensuremath{\mathrm{tr}}(A^{\dag}c^{\#}(B))$. For a channel $c \colon \mathscr{H} \rightarrow \mathscr{K}$ and a predicate (effect) $q$ on $\mathscr{K}$ we define predicate transformation via function application $c \ll q \coloneqq c(q)$. Similarly, for a state $\sigma$ on $\mathscr{H}$ we define state transformation via the dagger of the channel, as: $c \gg \sigma \coloneqq c^{\#}(\sigma)$. Then, using that positive operators are self-adjoint, we get the same relation between validity and state/predicate transformation as in the classical case:

$$\label{eqn:quantumvaliditytransformation}
\begin{array}{rcl}
c \gg \sigma \models q \hspace{\arraycolsep}=\hspace{\arraycolsep} \ensuremath{\mathrm{tr}}\big(c^{\#}(\sigma)\,q\big) & = & \ensuremath{\mathrm{tr}}\big(c^{\#}(\sigma)\,q^{\dag}\big) \\
& = & \ensuremath{\mathrm{tr}}\big(\sigma\,c(q)^{\dag}\big) \hspace{\arraycolsep}=\hspace{\arraycolsep} \ensuremath{\mathrm{tr}}\big(\sigma\,c(q)\big) \hspace{\arraycolsep}=\hspace{\arraycolsep} \sigma \models c \ll q.
\end{array}$$

Definition 2. Let $p$ be a (quantum) predicate on Hilbert space $\mathscr{H}$. It gives rise to a subchannel $\ensuremath{\mathrm{asrt}}_{p} \colon \mathscr{H} \rightarrow \mathscr{H}$ defined by: $$\begin{array}{rcl} \ensuremath{\mathrm{asrt}}_{p}(A) & \coloneqq & \sqrt{p}\,A\,\sqrt{p}.
\end{array}$$ This assert map $\ensuremath{\mathrm{asrt}}_p$ plays a fundamental role in effectus theory, see , for instance because it allows us to define sequential conjunction via predicate transformation as $p \mathrel{\&}q = \ensuremath{\mathrm{asrt}}_{p} \ll q$. Remark 4. * States/predicates on $\mathscr{H}$ are special instances of CP-maps $\mathbb{C}\rightarrow \mathscr{H}$, resp. $\mathscr{H}\rightarrow\mathbb{C}$. If we consider them as such channels, we can take their dagger $(-)^{\#}$. Then we can relate upper and lower conditioning via an exchange, namely as: $\sigma\|^{p} = p^{\#}\|_{\sigma^{\#}}$. This re-formulation may be useful in a more general setting.* ## Representation of quantum channels As mentioned, a channel $c\colon \mathscr{H} \rightarrow \mathscr{K}$ is a (completely positive) linear function $\mathcal{B}(\mathscr{K}) \rightarrow \mathcal{B}(\mathscr{H})$ between spaces of operators. Let's assume $\mathscr{H},\mathscr{K}$ have dimensions $n,m$, respectively. The space of operators $\mathcal{B}(\mathscr{K})$ then has dimension $m\times m$, so that the channel $c$ is determined by its values on the $m\times m$ base vectors $\ensuremath{\|{\kern.1em}i{\kern.1em}\rangle}\langle\,j\,\|$ of $\mathcal{B}(\mathscr{K})$. 
Thus, the channel $c$ is determined by $m\times m$ matrices of size $n\times n$, as in: $$\label{diag:channelmatrix} \begin{array}{cl} \left(\begin{array}{ccc} \left(\,\fbox{\strut\rule[-0.5em]{0em}{0em}$n\times n$}\,\right) & \quad\cdots\quad & \left(\,\fbox{\strut\rule[-0.5em]{0em}{0em}$n\times n$}\,\right) \\ \vdots & & \vdots \\ \left(\,\fbox{\strut\rule[-0.5em]{0em}{0em}$n\times n$}\,\right) & \quad\cdots\quad & \left(\,\fbox{\strut\rule[-0.5em]{0em}{0em}$n\times n$}\,\right) \end{array}\right) & \raisebox{+0.7em}{$\underset{\underset{\textstyle\downarrow}{\textstyle m}}{\textstyle\uparrow}$} \\ \leftarrow\!m\!\rightarrow \end{array}$$ The matrix entries of the channel $c$ will be written via double indexing, as $c_{k\ell,ij}$ for $1 \leq k,\ell \leq m$ and $1\leq i,j \leq n$. This matrix representation of a quantum channel is used in EfProb. It is convenient, for instance because parallel composition $\otimes$ of channels can simply be done by Kronecker multiplication of their (outer) matrices . We briefly describe how predicate and state transformation works. Let $q$ be a predicate on $\mathscr{K}$, represented as a $m\times m$ matrix. Predicate transformation $c \ll q$ is done simply by linear extension. It yields an $n\times n$ matrix, forming a predicate on $\mathscr{H}$, via: $$\label{eqn:qpredtransform} \begin{array}{rcl} c \ll q & \coloneqq & \sum_{k,\ell}\, q_{k\ell}\cdot c_{k\ell}. \end{array}$$ In the other direction we do state transformation essentially via the dagger $c^{\#}$ of the channel $c$. Explicitly, it works as follows. Let $\sigma$ be a state of $\mathscr{H}$, represented by a $n\times n$ matrix. Then we obtain the transformed state $c \gg \sigma$ as an $m\times m$ matrix given by computing traces: $$\label{eqn:qstatetransform} \begin{array}{rcl} \big(c \gg \sigma\big)_{k\ell} & \coloneqq & \ensuremath{\mathrm{tr}}(c_{\ell k}\,\sigma). 
\end{array}$$ Notice the change of order of indices: at position $(k,\ell)$ of $c \gg \sigma$ we use the inner matrix $c_{\ell k}$ from . The reason is the implicit use of the Hilbert-Schmidt inner product, given by $\ensuremath{\langle A\|B\rangle_{\mathrm{HS}}} = \ensuremath{\mathrm{tr}}(A^{\dag}\cdot B)$, where the dagger involves a conjugate transpose.

## Quantum pairing and extraction

The pairing of a classical state and a channel in involves a copier $\mathord{\usebox\sbcopier}$. It does not exist in general in a quantum setting because of the 'no-cloning' theorem. But we do have 'cup' states $\cup$ with maximal entanglement. They are basis dependent: given a finite-dimensional Hilbert space $\mathscr{H}$ with orthonormal basis $\big(\ensuremath{\|{\kern.1em}i{\kern.1em}\rangle}\big)$ of size $n$, we can form a state $\cup$ of $\mathscr{H}\otimes\mathscr{H}$ as $\cup = \frac{1}{n}\sum_{i,j}\ensuremath{\|{\kern.1em}ii{\kern.1em}\rangle}\langle\,jj\,\|$. Similarly, there is a 'cap' predicate $\cap$. The quantum pairing and extraction operations that we describe in this subsection are due to . But the more abstract description in terms of cups and caps does not occur there. These operations depend on a choice of basis. Given a state $\sigma$ of $\mathscr{H}$ and a channel $c\colon \mathscr{H} \rightarrow \mathscr{K}$ we can thus form a joint state of $\mathscr{H}\otimes\mathscr{K}$ via the 'cup' state $\cup$ of $\mathscr{H}\otimes\mathscr{H}$. Then we can define a pair state of $\mathscr{H}\otimes\mathscr{K}$ via state transformation $\gg$ as in: $$\label{eqn:quantumpairing} \begin{array}{rclcrcl} \ensuremath{\mathrm{pair}}(\sigma,c) & \coloneqq & \big(\ensuremath{\mathrm{asrt}}_{\sigma^{T}} \otimes c\big) \gg \cup & \qquad\mbox{that is} \qquad \langle\,ik\,\|\ensuremath{\mathrm{pair}}(\sigma,c)\ensuremath{\|{\kern.1em}j\ell{\kern.1em}\rangle} & = & \overline{\big(\sqrt{\sigma}c_{k\ell}\sqrt{\sigma}\big)_{ij}}.
\end{array}$$ In the other direction, given a joint state $\tau$ of $\mathscr{H}\otimes\mathscr{K}$ we write $\ensuremath{\mathrm{proj}}(\tau)$ for the transpose of its first marginal, so: $$\label{eqn:quantumproject} \begin{array}{rclcrcl} \ensuremath{\mathrm{proj}}(\tau) & \coloneqq & \mathsf{M}_{1}(\tau)^{T} & \qquad\mbox{where}\qquad & \langle\,i\,\|\mathsf{M}_{1}(\tau)\ensuremath{\|{\kern.1em}j{\kern.1em}\rangle} & \coloneqq & \sum_{k} \langle\,ik\,\|\tau\ensuremath{\|{\kern.1em}jk{\kern.1em}\rangle}. \end{array}$$ We extract a channel $\ensuremath{\mathrm{extr}}(\tau) \colon \mathscr{H} \rightarrow \mathscr{K}$ from $\tau$ in the manner defined in : $$\label{eqn:quantumextract} \begin{array}{rcl} \ensuremath{\mathrm{extr}}(\tau)_{k\ell} & \coloneqq & \sum_{i,j} \overline{\langle\,ik\,\|\tau\ensuremath{\|{\kern.1em}j\ell{\kern.1em}\rangle}} \cdot \big(\sqrt{\ensuremath{\mathrm{proj}}(\tau)^{-1}}\ensuremath{\|{\kern.1em}i{\kern.1em}\rangle}\langle\,j\,\|\sqrt{\ensuremath{\mathrm{proj}}(\tau)^{-1}}\big). \end{array}$$ The next result is the analogue of Lemma about disintegration for classical discrete probability. Proposition 5 (After ). * A quantum state $\sigma$ and channel $c$, with matching types, can be recovered from their pair, defined in , via projection and extraction: $$\begin{array}{rclcrcl} \ensuremath{\mathrm{proj}}(\ensuremath{\mathrm{pair}}(\sigma,c)) & = & \sigma & \hspace{3em}\mbox{and}\hspace{3em} & \ensuremath{\mathrm{extr}}\big(\ensuremath{\mathrm{pair}}(\sigma,c)\big) & = & c. \end{array}$$* Similarly, a joint state $\tau$ for which the transpose of its first marginal $\ensuremath{\mathrm{proj}}(\tau)$, as defined above, is invertible can be recovered as a pair, as on the left below.
In addition, $\tau$'s second marginal can be obtained via state transformation, as on the right: $$\begin{array}{rclcrcl} \tau & = & \ensuremath{\mathrm{pair}}(\ensuremath{\mathrm{proj}}(\tau), \ensuremath{\mathrm{extr}}(\tau)) & \hspace{7em} & \mathsf{M}_{2}(\tau) & = & \ensuremath{\mathrm{extr}}(\tau) \gg \ensuremath{\mathrm{proj}}(\tau). \end{array}$$* As an aside, for readers who are comfortable with diagrammatic notation (see e.g. ) one can write: $$\ensuremath{\mathrm{pair}}(\sigma,c) \;\; = \;\; \vcenter{\hbox{% \begin{tikzpicture}[font=\small] \node[arrow box] (a) at (-1,0) {$\ensuremath{\mathrm{asrt}}_{\sigma^T}$}; \node[arrow box] (c) at (0,0) {$c$}; % \draw (0.0,-0.3) to (c); \draw (0,-0.3) arc(0.0:-180:0.5); \draw (-1,-0.3) to (a); \draw (a) to (-1,0.5); \draw (c) to (0,0.5); \end{tikzpicture}}} \hspace{3em} \ensuremath{\mathrm{proj}}(\tau) \;\; = \;\; \vcenter{\hbox{% \begin{tikzpicture}[font=\small] \node[state] (t) at (0,0) {$\;\;\tau\;\;$}; \coordinate (t1) at ([xshiftu=-0.25]t); \coordinate (t2) at ([xshiftu=0.25]t); \node[discarder] (d) at ([yshiftu=0.3]t2) {}; % \draw (t1) to ([yshiftu=0.2]t1); \draw (t2) to (d); \draw (-0.25,0.2) arc(0.0:180:0.4); \draw (-1.05,-0.5) to (-1.05,0.2); \end{tikzpicture}}} \hspace{3em} \ensuremath{\mathrm{extr}}(t) \;\; = \;\; \vcenter{\hbox{% \begin{tikzpicture}[font=\small] \node[state] (t) at (0,0) {$\;\;\tau\;\;$}; \coordinate (t1) at ([xshiftu=-0.25]t); \coordinate (t2) at ([xshiftu=0.25]t); \node[arrow box, minimum height=1.5em] (a) at (-2.05,-0.3) % {\raisebox{0.2em}{$\ensuremath{\mathrm{asrt}}_{\ensuremath{\mathrm{proj}}(\tau)^{-1}}$}}; % \draw (t1) to ([yshiftu=0.1]t1); \draw (t2) to ([yshiftu=0.7]t2); \draw (-0.25,0.1) arc(0.0:180:0.9); \draw (a) to ([yshiftu=+0.5]a); \draw (a) to ([yshiftu=-0.6]a); \end{tikzpicture}}}$$ # A quantum Bayesian Inference Theorem This section contains the main result of this paper, namely the quantum analogue of Theorem . 
It describes how conditioning of a joint state can also be performed via the extracted channel. The novelty in our quantum description is that we need both lower and upper conditioning to capture what is going on. Theorem 3. * Let $\tau$ be a state of $\mathscr{H}\otimes\mathscr{K}$ and let $p,q$ be predicates, on $\mathscr{H}$ and on $\mathscr{K}$ respectively. Then: $$\begin{array}[b]{rclcrcl} \mathsf{M}_{2}\big(\tau\|_{p \otimes \ensuremath{\mathbf{1}}}\big) & = & \ensuremath{\mathrm{extr}}(\tau) \gg (\ensuremath{\mathrm{proj}}(\tau)\|^{p^{T}}) & \qquad\mbox{and}\qquad & \mathsf{M}_{1}\big(\tau\|_{\ensuremath{\mathbf{1}}\otimes q}\big) & = & \big(\ensuremath{\mathrm{proj}}(\tau)\|^{\ensuremath{\mathrm{extr}}(\tau) \ll q}\big)^{T}. \end{array} \eqno{\square}$$* The proof is omitted since it involves rather long and boring matrix calculations. Instead we include a random test: the quantum versions of pairing / projection / extraction and lower / upper conditioning have been implemented in EfProb. They can be used to test Theorem as below, by generating an arbitrary state `t`, in this case of type $\mathbb{C}^{3}\otimes \mathbb{C}^{5}$, together with arbitrary (suitably typed) predicates. The EfProb notation for lower and upper conditioning is `/` and `^`. The two equality tests `==` involve $5\times 5$ and $3\times 3$ matrices of complex numbers. In the equations in Theorem we perform lower conditioning on the joint state. One may ask if there are also 'dual' equations where upper conditioning on the joint state is re-described via state/predicate transformation. We have not found them.

## Acknowledgements

Thanks to Kenta Cho and Alex Kissinger for helpful feedback and discussions.

[^1]: The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement nr. 320571.
[^2]: The terminology 'lower' and 'upper' is simply determined by the position of the predicate $p$, low in $\sigma\|_{p}$ and up in $\sigma\|^{p}$.
abstract: Existing leading methods for spectral reconstruction (SR) focus on designing deeper or wider convolutional neural networks (CNNs) to learn the end-to-end mapping from the RGB image to its hyperspectral image (HSI). These CNN-based methods achieve impressive restoration performance while showing limitations in capturing the long-range dependencies and self-similarity prior. To cope with this problem, we propose a novel Transformer-based method, Multi-stage Spectral-wise Transformer (MST++), for efficient spectral reconstruction. In particular, we employ Spectral-wise Multi-head Self-attention (S-MSA) that is based on the HSI spatially sparse while spectrally self-similar nature to compose the basic unit, Spectral-wise Attention Block (SAB). Then SABs build up Single-stage Spectral-wise Transformer (SST) that exploits a U-shaped structure to extract multi-resolution contextual information. Finally, our MST++, cascaded by several SSTs, progressively improves the reconstruction quality from coarse to fine. Comprehensive experiments show that our MST++ significantly outperforms other state-of-the-art methods. In the NTIRE 2022 Spectral Reconstruction Challenge, our approach won the First place. Code and pre-trained models are publicly available at <https://github.com/caiyuanhao1998/MST-plus-plus>. author: Yuanhao Cai $^{1,}$, Jing Lin $^{1,}$[^1] , Zudi Lin $^2$, Haoqian Wang $^{1,\dagger}$, Yulun Zhang $^3$, Hanspeter Pfister $^2$, Radu Timofte $^{3,4}$, Luc Van Gool $^{3}$ $^{1}$ Shenzhen International Graduate School, Tsinghua University, $^2$ Harvard University, $^3$ CVL, ETH Zürich, $^4$ CAIDAS, JMU Würzburg bibliography: reference.bib title: MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction # Introduction Hyperspectral imaging records the real-world scene spectra in narrow bands, where each band captures the information at a specific spectral wavelength. 
Compared to normal RGB images, HSIs have more spectral bands to store richer information and delineate more details of the captured scenes. Because of this advantage, HSIs have wide applications such as medical image processing, remote sensing, object tracking, and so on. Nonetheless, acquiring such HSIs with plentiful spectral information is time-consuming, because spectrometers are used to scan the scenes along the spatial or spectral dimension. This limitation restricts the application scope of HSIs, especially in dynamic or real-time scenes. One way to solve this problem is to develop snapshot compressive imaging (SCI) systems and computational reconstruction algorithms from 2D measurement to 3D HSI cube. Nevertheless, these methods rely on expensive hardware devices. To reduce costs, spectral reconstruction (SR) algorithms are proposed to reconstruct the HSI from a given RGB image, which can be easily obtained by RGB cameras.

Conventional SR methods are mainly based on sparse coding or relatively shallow learning models. Nonetheless, these model-based methods suffer from limited representation capacity and poor generalization ability. Recently, with the development of deep learning, SR has witnessed significant progress. Deep convolutional neural networks (CNNs) have been applied to learn the end-to-end mapping function from RGB images to HSI cubes. Although impressive performance has been achieved, these CNN-based methods show limitations in capturing long-range dependencies and inter-spectra self-similarity.

In recent years, the natural language processing (NLP) model, Transformer, has been applied in computer vision and achieved great success. The multi-head self-attention (MSA) mechanism in Transformer does better in modeling long-range dependencies and non-local self-similarity than CNN, which can alleviate the limitations of CNN-based SR algorithms. However, directly using standard Transformer for SR will encounter two main issues.
(i) Global and local Transformers capture interactions of spatial regions. Yet, the HSI representations are spatially sparse while spectrally highly self-similar. Thus, modeling spatial inter-dependencies may be less cost-effective than capturing inter-spectra correlations. (ii) On the one hand, the computational complexity of standard global MSA is quadratic to the spatial dimension, which is a huge burden that may be unaffordable. On the other hand, local window-based MSA suffers from limited receptive fields within position-specific windows.

To address the aforementioned limitations, we propose the first Transformer-based framework, Multi-stage Spectral-wise Transformer (MST++), for efficient spectral reconstruction from RGB images. Notably, our MST++ is based on the prior work MST, which is customized for spectral compressive imaging restoration. Firstly, we note that HSI signals are spatially sparse while spectrally self-similar. Based on this nature, we adopt the Spectral-wise Multi-head Self-Attention (S-MSA) to compose the basic unit, Spectral-wise Attention Block (SAB). S-MSA treats each spectral feature map as a token to calculate the self-attention along the spectral dimension. Secondly, our SABs build up our proposed Single-stage Spectral-wise Transformer (SST) that exploits a U-shaped structure to extract multi-resolution spectral contextual information, which is critical for HSI restoration. Finally, our MST++, cascaded by several SSTs, develops a multi-stage learning scheme to progressively improve the reconstruction quality from coarse to fine, which significantly boosts the performance.

The main contributions of this work are listed as follows.

- We propose a novel framework, MST++, for SR. To the best of our knowledge, it is the first attempt to explore the potential of Transformer in this task.
- We validate a series of natural image restoration models on this SR task.
Based on them, we propose a Top-K multi-model ensemble strategy to improve the SR performance. Codes and pre-trained models of these methods are made publicly available to serve as a baseline and toolbox for further research in this topic.
- Quantitative and qualitative experiments demonstrate that our MST++ dramatically outperforms SOTA methods while requiring far fewer parameters (Params) and FLOPS. Surprisingly, our MST++ won first place in the NTIRE 2022 Spectral Reconstruction Challenge.

# Related Work

## Hyperspectral Image Acquisition

Traditional imaging systems for collecting HSIs often adopt spectrometers to scan the scene along the spatial or spectral dimensions. Three main types of scanners, namely the whiskbroom scanner, the pushbroom scanner, and the band sequential scanner, are often used to capture HSIs. These scanners have been widely used in detection, remote sensing, medical imaging, and environmental monitoring for decades. For example, pushbroom and whiskbroom scanners have been used in satellite sensors for photogrammetry and remote sensing. However, the scanning procedure usually requires a long time, which makes it unsuitable for measuring dynamic scenes. Besides, the imaging devices are usually too large physically to be mounted on portable platforms. To address these limitations, researchers have developed SCI systems to capture HSIs, where the 3D HSI cube is compressed into a single 2D measurement. Among these SCI systems, coded aperture snapshot spectral imaging (CASSI) stands out and forms one promising research direction. Nonetheless, the SCI systems remain prohibitively expensive to date for consumer-grade use. Even "low-cost" SCI systems often cost in the \$10K to \$100K range. Therefore, the SR topic has significant research and practical value.
```latex
\begin{figure}[t]\begin{center}
\begin{tabular}[t]{c} \hspace{-3mm}
\includegraphics[width=1.0\textwidth]{img/pipeline.pdf}
\end{tabular}
\end{center}
%\vspace{-3mm}
\caption{\small The overall pipeline of the proposed solution MST++. (a) Multi-stage Spectral-wise Transformer. (b) Single-stage Spectral-wise Transformer. (c) Spectral-wise Attention Block. (d) Feed Forward Network. (e) Spectral-wise Multi-head Self-Attention.}
\label{fig:pipeline}
%\vspace{0mm}
\end{figure}
```

## Spectral Reconstruction from RGB

Conventional SR methods are mainly based on hand-crafted hyperspectral priors. For instance, Paramar et al. propose a data sparsity expanding method for HSI reconstruction. Arad et al. propose a sparse coding method that creates a dictionary of HSI signals and their RGB projections. Aeschbacher et al. suggest using relatively shallow learning models from a specific spectral prior to fulfill spectral super-resolution. However, these model-based methods suffer from limited representation capacity and poor generalization ability. Recently, inspired by the great success of deep learning in natural image restoration, CNNs have been exploited to learn the underlying mapping function from RGB to HSI. For instance, Xiong et al. propose a unified HSCNN framework for HSI reconstruction from both RGB images and compressive measurements. Shi et al. use adapted residual blocks to build up a deep residual network HSCNN-R for SR. Zhang et al. customize a pixel-aware deep function-mixture network to model the RGB-to-HSI mapping. Although these CNN-based SR methods achieve impressive results, they show limitations in capturing non-local self-similarity and long-range inter-dependencies.

## Vision Transformer

The NLP model Transformer was proposed for machine translation. In recent years, it has been introduced into computer vision and gained much popularity due to its advantage in capturing long-range correlations between spatial regions.
In high-level vision, Transformer has been widely applied in image classification, object detection, semantic segmentation, human pose estimation, etc. In addition, vision Transformer has also been used in low-level vision. For instance, Cai et al. propose the first Transformer-based end-to-end framework MST for HSI reconstruction from compressive measurements. Lin et al. embed the HSI sparsity into Transformer to establish a coarse-to-fine learning scheme for spectral compressive imaging. The prior work Uformer adopts a U-shaped structure built up by Swin Transformer blocks for natural image restoration. Nonetheless, to the best of our knowledge, the potential of Transformer in spectral super-resolution has not been explored. This work aims to fill this research gap.

# Method

## Network Architecture

Fig. (a) depicts the proposed Multi-stage Spectral-wise Transformer (MST++), which is a cascade of $N_s$ Single-stage Spectral-wise Transformers (SSTs). Our MST++ takes an RGB image as the input and reconstructs its HSI counterpart. A long identity mapping is exploited to ease the training procedure. Fig. (b) shows the U-shaped SST consisting of an encoder, a bottleneck, and a decoder. The embedding and mapping blocks are single $conv$3$\times$3 layers. The feature maps in the encoder sequentially undergo a downsampling operation (a strided $conv$4$\times$4 layer), $N_1$ Spectral-wise Attention Blocks (SABs), a downsampling operation, and $N_2$ SABs. The bottleneck is composed of $N_3$ SABs. The decoder employs a symmetrical architecture. The upsampling operation is a strided deconv2$\times$2 layer. To avoid the information loss in the downsampling, skip connections are used between the encoder and decoder. Fig. (c) illustrates the components of SAB, i.e., a Feed Forward Network (FFN as shown in Fig.
(d) ), a Spectral-wise Multi-head Self-Attention (S-MSA), and two layer normalization. Details of S-MSA are given in Fig. (e). ## Spectral-wise Multi-head Self-Attention Suppose $\mathbf{X}_{in} \in \mathbb{R}^{H\times W \times C}$ as the input of S-MSA, which is reshaped into tokens $\mathbf{X} \in \mathbb{R}^{HW \times C}$. Then $\mathbf{X}$ is linearly projected into query $\mathbf{Q} \in \mathbb{R}^{HW \times C}$, key $\mathbf{K} \in \mathbb{R}^{HW \times C}$, and value $\mathbf{V} \in \mathbb{R}^{HW \times C}$: $$%\small \mathbf{Q} = \mathbf{X}\mathbf{W}^\mathbf{Q}, \mathbf{K} = \mathbf{X}\mathbf{W}^\mathbf{K}, \mathbf{V} = \mathbf{X}\mathbf{W}^\mathbf{V}, \label{linear_proj} \vspace{0.2mm}$$ where $\mathbf{W}^\mathbf{Q}$, $\mathbf{W}^\mathbf{K}$, and $\mathbf{W}^\mathbf{V} \in \mathbb{R}^{C \times C}$ are learnable parameters; $biases$ are omitted for simplification. Subsequently, we respectively split $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$ into $N$ heads along the spectral channel dimension: $\mathbf{Q} = [\mathbf{Q}_1,\ldots,\mathbf{Q}_N]$, $\mathbf{K} = [\mathbf{K}_1,\ldots,\mathbf{K}_N]$, and $\mathbf{V} = [\mathbf{V}_1,\ldots,\mathbf{V}_N]$. The dimension of each head is $d_h = \frac{C}{N}$. Please note that Fig. (e) depicts the situation with $N$ = 1 and some details are omitted for simplification. Different from original MSAs, our S-MSA treats each spectral representation as a token and calculates self-attention for $head_j$: $$\mathbf{A}_j = \text{softmax}(\sigma_j \mathbf{K}_j^\text{T} \mathbf{Q}_j), ~~{head}_j =\mathbf{V}_j \mathbf{A}_j, \label{s-attention} \vspace{-0.2mm}$$ where $\mathbf{K}_j^\text{T}$ denotes the transposed matrix of $\mathbf{K}_j$. Because the spectral density varies significantly with respect to the wavelengths, we use a learnable parameter $\sigma_j \in \mathbb{R}^{1}$ to adapt the self-attention $\mathbf{A}_j$ by re-weighting the matrix multiplication $\mathbf{K}_j^\text{T} \mathbf{Q}_j$ inside $head_j$. 
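The per-head S-MSA formula above can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions, not the released implementation: the projection weights are random, $\sigma_j$ is fixed to 1, and the softmax is taken over the last axis of $\mathbf{K}_j^\text{T}\mathbf{Q}_j$, which the text does not specify. The function name `s_msa_heads` is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def s_msa_heads(x, w_q, w_k, w_v, sigma, num_heads):
    """Spectral-wise attention on tokens x of shape (HW, C).

    Returns the concatenated heads (HW, C) and the per-head attention maps,
    each of shape (d_h, d_h) with d_h = C / num_heads.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    q_s = np.split(q, num_heads, axis=1)
    k_s = np.split(k, num_heads, axis=1)
    v_s = np.split(v, num_heads, axis=1)
    heads, attn_maps = [], []
    for j in range(num_heads):
        # A_j = softmax(sigma_j * K_j^T Q_j): attention between spectral channels.
        a_j = softmax(sigma[j] * (k_s[j].T @ q_s[j]), axis=-1)  # assumed softmax axis
        heads.append(v_s[j] @ a_j)  # head_j = V_j A_j, shape (HW, d_h)
        attn_maps.append(a_j)
    return np.concatenate(heads, axis=1), attn_maps

rng = np.random.default_rng(0)
H, W, C, N = 4, 3, 8, 2                   # toy sizes chosen for illustration
x_in = rng.standard_normal((H, W, C))
x = x_in.reshape(H * W, C)                # reshape the feature map into HW tokens
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
sigma = np.ones(N)                        # learnable in the paper; fixed here
out, attn = s_msa_heads(x, w_q, w_k, w_v, sigma, N)
print(out.shape)      # (12, 8)
print(attn[0].shape)  # (4, 4)
```

Note that each attention map has size $d_h \times d_h$ regardless of $H$ and $W$, which is why the cost of this step grows only linearly with the number of pixels.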
Subsequently, the outputs of $N$ heads are concatenated to undergo a linear projection and then is added with a position embedding: $$\text{S-MSA}(\mathbf{X}) =\big(\mathop{\text{Concat}}\limits_{j=1}^{N}(head_{j})\big)\mathbf{W} + f_p(\mathbf{V}), \label{agg_heads} %\vspace{-1.2mm}$$ where $\mathbf{W} \in \mathbb{R}^{C \times C}$ are learnable parameters, $f_p(\cdot)$ is the function to generate position embedding. It consists of two depth-wise conv3$\times$`<!-- -->`{=html}3 layers, a GELU activation, and reshape operations. The HSIs are sorted by the wavelength along the spectral dimension. Therefore, we exploit this embedding to encode the position information of different spectral channels. Finally, we reshape the result of Eq. to obtain the output feature maps $\mathbf{X}_{out} \in \mathbb{R}^{H\times W \times C}$. ```latex \begin{figure}[t]\begin{center} \begin{tabular}[t]{c} \hspace{-2mm} \includegraphics[width=0.97\textwidth]{img/msa_compare.pdf} \end{tabular} \end{center} \vspace{-5mm} \caption{\small Diagram of different MSAs. The dark colored box represents $query$ element and the dashed box denotes $key$ element. (a) Global MSA samples all the tokens (pixel vectors) as $query$ and $key$ elements. (b) W-MSA calculates the self-attention inside position-specific windows. (c) The adopted S-MSA treats each spectral channel as a token and calculates the self-attention along the spectral dimension. } \label{fig:attan_compare} \vspace{-1mm} \end{figure} ``` ## Discussion with Original Transformers In this section, we introduce the general paradigm of MSA in Transformer and then we analyze the computational complexity of the spatial-wise MSAs in original Transformers and the adopted S-MSA. ### General Paradigm of MSA We denote the input token as $\mathbf{X} \in \mathbb{R}^{n\times C}$, where $n$ is to be determined. In spatial-wise MSAs, $n$ denotes the number of tokens. In S-MSA, $n$ represents the dimension of the token. 
$\mathbf{X}$ is firstly linearly projected into query $\mathbf{Q} \in \mathbb{R}^{n\times C}$, key $\mathbf{K} \in \mathbb{R}^{n\times C}$, and value $\mathbf{V} \in \mathbb{R}^{n\times C}$: $$\mathbf{Q} = \mathbf{X} \mathbf{W^Q}, \mathbf{K} = \mathbf{X} \mathbf{W^K}, \mathbf{V} = \mathbf{X} \mathbf{W^V}, \label{eq:linear_proj}$$ where $\mathbf{W^Q},\mathbf{W^K}$, and $\mathbf{W^V} \in \mathbb{R}^{C\times C}$ are learnable parameters; biases are omitted for simplification. Subsequently, we respectively split $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$ into $N$ heads along the spectral channel dimension: $\mathbf{Q} = [\mathbf{Q}_1,\ldots,\mathbf{Q}_N]$, $\mathbf{K}=[\mathbf{K}_1,\ldots,\mathbf{K}_N]$, and $\mathbf{V}=[\mathbf{V}_1,\ldots,\mathbf{V}_N]$. The dimension of each head is $d_h=\frac{C}{N}$. Then MSA calculates the self-attention for each $head_j$: $$head_j = \text{MSA}(\mathbf{Q}_j,\mathbf{K}_j,\mathbf{V}_j). \label{eq:msa}$$ Subsequently, the outputs of $N$ $\emph{heads}$ are concatenated along the spectral dimension and undergo a linear projection to generate the output feature map $\textbf{X}_{out} \in \mathbb{R}^{n\times C}$: $$\textbf{X}_{out} = \big(\mathop{\text{Concat}}\limits_{j=1}^{N}(head_{j})\big)\mathbf{W}, \label{eq:msa_out}$$ where $\mathbf{W} \in \mathbb{R}^{C\times C}$ are learnable parameters. Please note that some other contents such as the position embedding are omitted for simplification. Because we only compare the main difference between original spatial-wise MSAs and S-MSA, i.e., the specific formulation of Eq. . 
```latex
\scalebox{0.90}{
\hspace{-1.5mm}
\begin{tabular}{l c c c}
\toprule
\rowcolor{color3}
MSA Scheme &~ Global MSA ~&~ Local W-MSA ~&~\bf S-MSA ~\\
\midrule
Receptive Field &Global &Local &Global \\
Complexity to $HW$ &Quadratic &Linear &Linear \\
Calculating Wise &Spatial &Spatial &Spectral \\
\bottomrule
\end{tabular}}
```

### Spatial-wise MSA

The spatial-wise MSA treats a pixel vector along the spectral dimension as a token and then calculates the self-attention for each $head_j$. Thus, the general per-head formula $head_j = \text{MSA}(\mathbf{Q}_j,\mathbf{K}_j,\mathbf{V}_j)$ can be specified as $$head_j = \mathbf{A}_j \mathbf{V}_j, ~~\mathbf{A}_j = \text{softmax}(\frac{\mathbf{Q}_j\mathbf{K}_j^T}{\sqrt{d_h}}). \label{eq:spatial_msa}$$ This attention needs to be calculated $N$ times, once per head. Therefore, the computational complexity of spatial-wise MSA is $$O(\text{Spatial-MSA}) = N (n^2 d_h + n^2 d_h) = 2n^2 C. \label{eq:cost_spatial_msa}$$ The spatial-wise MSA is mainly divided into two categories: global MSA and local window-based MSA. Now we analyze these two kinds of MSAs.

Global MSA. As shown in Fig. (a), global MSA samples all the tokens as $key$ and $query$ elements, and then calculates the self-attention. Thus, the number of tokens $n$ ($key$ or $query$ elements) is equal to $HW$. Then, according to the spatial-wise cost above, the computational complexity of global MSA is $$O(\text{Global MSA}) = 2(HW)^2 C, \label{eq:global_msa}$$ which is quadratic in the spatial size of the input feature map. Global MSA enjoys a very large receptive field but its computational cost is nontrivial and sometimes unaffordable. Meanwhile, sampling redundant $key$ elements may easily lead to over-smooth results and even non-convergence issues. To cut down the computational cost, researchers propose local window-based MSA.

Window-based MSA. As depicted in Fig. (b), W-MSA first splits the feature map into non-overlapping windows of size $M^2$ and samples all the tokens inside each window to calculate self-attention.
Hence, the number of tokens $n$ is equal to $M^2$ and W-MSA is conducted $\frac{HW}{M^2}$ times for all windows. Thus, the computational complexity is $$O(\text{W-MSA}) = \frac{HW}{M^2}(2(M^2)^2C) = 2M^2HWC, \label{eq:cost_w_msa}$$ which is linear in the spatial size ($HW$). W-MSA enjoys low computational cost but suffers from limited receptive fields inside position-specific windows. As a result, some highly related non-local tokens may be neglected.

Original spatial-wise MSAs aim to capture the long-range dependencies of spatial regions. However, the HSI representations are spatially sparse while spectrally similar and correlated. Capturing spatial-wise interactions may be less cost-effective than modeling the spectral-wise correlations. Based on this HSI characteristic, we adopt S-MSA.

### S-MSA

As shown in Fig. (c), S-MSA treats each spectral feature map as a token and calculates the self-attention along the spectral dimension. Then the general per-head formula is specified as $$\mathbf{A}_j = \text{softmax}(\sigma_j \mathbf{K}_j^\text{T} \mathbf{Q}_j), ~~{head}_j =\mathbf{V}_j \mathbf{A}_j, \label{eq:s_msa}$$ where $\mathbf{K}_j^\text{T}$ denotes the transposed matrix of $\mathbf{K}_j$. We note that the spectral density varies significantly with respect to the wavelengths. Therefore, we exploit a learnable parameter $\sigma_j \in \mathbb{R}^{1}$ to adapt the self-attention $\mathbf{A}_j$ by re-weighting the matrix multiplication $\mathbf{K}_j^\text{T} \mathbf{Q}_j$ inside $head_j$. Because S-MSA treats a whole feature map as a token, the dimension of each token $n$ is equal to $HW$. This attention needs to be calculated $N$ times, once per head. Thus, the complexity of S-MSA is $$O(\text{S-MSA}) = N (d_h^2 n + d_h^2 n) = \frac{2HWC^2}{N}. \label{eq:cost_Spectra_msa}$$ The computational complexities of W-MSA and S-MSA are linear in the spatial size ($HW$), which is much cheaper than that of global MSA (quadratic in $HW$). Nonetheless, S-MSA treats each spectral feature as a token.
When calculating the self-attention $\mathbf{A}_j$, S-MSA views the global representations and $\mathbf{A}_j$ functions as global spatial positions. Therefore, the receptive fields of S-MSA are global and not limited to the position-specific windows. In addition, S-MSA calculates self-attention along the spectral dimension, which is based on HSI characteristics and more suitable for HSI reconstruction when compared to spatial-wise MSAs. Thus, S-MSA is considered to be more cost-effective than global MSA and W-MSA. For brevity, we summarize the properties of global MSA, window-based MSA, and S-MSA in the comparison table above. S-MSA enjoys global receptive fields, models the spectral-wise self-similarity, and requires linear computational costs.

## Ensemble Strategy

In the NTIRE 2022 Spectral Reconstruction Challenge, we adopt three ensemble strategies, namely self-ensemble, multi-scale ensemble, and Top-K multi-model ensemble, to improve the performance and generality of our MST++. We describe them in detail below.

### Self-Ensemble

The RGB input is flipped up/down/left/right or rotated 90°/180°/270° to be fed into the network. Subsequently, the outputs are transformed back to the original orientation and averaged.

### Multi-scale Ensemble

We train separate models with patches of size 256$\times$256, 128$\times$128, and 64$\times$64. Then the outputs (whole images) are averaged to improve the restoration quality.

### Top-K Multi-model Ensemble

We also train MIRNet, MPRNet, Restormer, HINet, and MST families. The Top-K performers are selected for SR.
Then we conduct our Top-K multi-model ensemble to fuse these reconstructed HSIs as $$\mathbf{Y}_{ens} = \sum_{i=1}^{\text{K}} \alpha_i \mathbf{\hat{Y}}^t_i,$$ where $\mathbf{Y}_{ens} \in \mathbb{R}^{H\times W\times N_{\lambda}}$ denotes the ensembled HSIs, $\mathbf{\hat{Y}}^t_i$ represents the reconstructed HSIs of the $i$-th model, and $\alpha_i$ is a hyperparameter satisfying $\sum_{i=1}^{\text{K}} \alpha_i = 1$.

# Experiment

## Dataset

The dataset provided by the NTIRE 2022 Spectral Reconstruction Challenge contains 1000 RGB-HSI pairs. This dataset is split into `train`, `valid`, and `test` subsets in a ratio of 18:1:1. Each HSI of size 482$\times$512 has 31 wavelengths from 400 nm to 700 nm. To generate the corresponding RGB counterpart $\mathbf{I} \in \mathbb{R}^{H\times W\times 3}$, a transformation matrix $\mathbf{M} \in \mathbb{R}^{N_{\lambda}\times3}$ is applied to the ground-truth HSI cube $\mathbf{Y} \in \mathbb{R}^{H\times W\times N_{\lambda}}$ as $$\mathbf{I} = \mathbf{Y} \times \mathbf{M}.$$ Then the generated RGB images are injected with shot noise to simulate the real-camera situation.

## Implementation Details

During the training procedure, RGB images are linearly rescaled to [0, 1], after which $128\times 128$ RGB and HSI sample pairs are cropped from the dataset. The batch size is set to 20 and the Adam optimizer is used with $\beta_1=0.9$ and $\beta_2=0.999$. The learning rate is initialized as 0.0004 and the Cosine Annealing scheme is adopted for 300 epochs. The training data is augmented with random rotation and flipping. The proposed MST++ has been implemented in the PyTorch framework, and approximately 48 hours are required for training a network on a single RTX 3090 GPU. The MRAE loss between the predicted and ground-truth HSIs is adopted as the objective. In the implementation of our MST++, we set $N_s$ = 3, $N_1$ = $N_2$ = $N_3$ = 1, $C$ = 31.
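To make the complexity comparison from the discussion section concrete, the three closed-form costs can be evaluated for the stated image size (482$\times$512) and channel width ($C$ = 31). This is a rough sketch: the window size `M = 8` and head count `N = 1` are illustrative assumptions not stated in the text, the costs count only the attention products at full input resolution, and the real network operates on downsampled features as well.

```python
def global_msa_cost(h, w, c):
    """2 (HW)^2 C: every pixel token attends to every other pixel token."""
    return 2 * (h * w) ** 2 * c

def window_msa_cost(h, w, c, m):
    """2 M^2 HW C: attention restricted to non-overlapping M x M windows."""
    return 2 * m ** 2 * h * w * c

def spectral_msa_cost(h, w, c, n_heads):
    """2 HW C^2 / N: attention between spectral channels (S-MSA)."""
    return 2 * h * w * c ** 2 / n_heads

H, W, C = 482, 512, 31   # image size and channel width from the text
M, N = 8, 1              # hypothetical window size and head count

costs = {
    "Global MSA": global_msa_cost(H, W, C),
    "W-MSA": window_msa_cost(H, W, C, M),
    "S-MSA": spectral_msa_cost(H, W, C, N),
}
for name, cost in costs.items():
    print(f"{name:>10}: {cost:.3e}")
```

Doubling the image height doubles the W-MSA and S-MSA costs but quadruples the global MSA cost, which is the linear versus quadratic distinction drawn in the comparison table.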
During the testing phase, the RGB image is also linearly rescaled to [0, 1] and fed into the network to fulfill the spectral recovery. Our MST++ takes 102.48 ms per image (size 482$\times$512$\times$3) for reconstruction on an RTX 3090 GPU.

We adopt three evaluation metrics to assess the model performance. The first metric is mean relative absolute error (MRAE), which computes the pixel-wise disparity between all wavelengths of the reconstructed and ground-truth HSIs. MRAE can be formulated as $$\text{MRAE}(\mathbf{Y},\mathbf{\hat{Y}}) = \frac{1}{N} \sum_{i=1}^{N} \frac{\big\|~\mathbf{Y}[i] - \mathbf{\hat{Y}}[i]~\big\|}{\mathbf{Y}[i]},$$ where $\mathbf{\hat{Y}} \in \mathbb{R}^{H\times W\times N_{\lambda}}$ indicates the reconstructed HSI cube and $N = H\times W\times N_{\lambda}$ denotes the number of all values in the image. The second metric is the root mean square error (RMSE), which is defined as $$\text{RMSE}(\mathbf{Y},\mathbf{\hat{Y}}) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \big(\mathbf{Y}[i] - \mathbf{\hat{Y}}[i]\big)^2}.$$ Since the deciding metric for the NTIRE 2022 Spectral Reconstruction Challenge is MRAE, we directly set it as the training objective for our SR models. The last metric is the Peak Signal-to-Noise Ratio (PSNR).
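The three metrics can be sketched in a few lines of plain Python operating on flattened cubes. MRAE and RMSE follow the formulas above. The PSNR definition is not given in the text, so the standard form $10\log_{10}(\text{peak}^2/\text{MSE})$ with a peak of 1.0 is assumed here; the ground-truth values are also assumed to be strictly positive so that the MRAE division is well defined.

```python
import math

def mrae(y_true, y_pred):
    # Mean relative absolute error; assumes every ground-truth value is > 0.
    return sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean square error over all values of the cube.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

def psnr(y_true, y_pred, peak=1.0):
    # Assumed standard definition: 10 * log10(peak^2 / MSE).
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)

# Toy flattened "cubes" with values in (0, 1], chosen for illustration only.
y_true = [0.5, 0.25, 1.0, 0.8]
y_pred = [0.4, 0.3, 0.9, 0.8]
print(f"MRAE = {mrae(y_true, y_pred):.4f}")
print(f"RMSE = {rmse(y_true, y_pred):.4f}")
print(f"PSNR = {psnr(y_true, y_pred):.2f} dB")
```

Because MRAE divides by the ground-truth value, errors in dark regions are weighted more heavily than under RMSE, which is worth keeping in mind when the two metrics rank methods differently.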
```latex \begin{table} \begin{center} %\vspace{-1mm} \setlength{\tabcolsep}{5.9pt} \resizebox{0.97\textwidth}{26mm}{\noindent \begin{tabular}{l \| c c c c c \| l \| c c} \toprule[0.15em] \multicolumn{6}{c\|}{NTIRE 2022 HSI Dataset - \texttt{Valid}} &\multicolumn{3}{c}{NTIRE 2022 HSI Dataset - \texttt{Test}}\\ \midrule Method~~~~~~~~ & ~~Params (M)~~ & ~~FLOPS (G)~~ &~~~~~MRAE~~~~~ & ~~~~~RMSE~~~~~ &~~~~~~PSNR~~~~~~ &Username~~~~ &~~~~MRAE~~~~ &~~~~RMSE~~~~ \\ \midrule[0.15em] HSCNN+~\cite{shi2018hscnn} & 4.65 & 304.45 & 0.3814 & 0.0588 &26.36 &pipixia &0.2434 &0.0411 \\ HRNet~\cite{orange_cat} & 31.70 & 163.81 & 0.3476 & 0.0550 &26.89 &uslab &0.2377 &0.0391 \\ EDSR~\cite{edsr} & 2.42 & 158.32 & 0.3277 & 0.0437 &28.29 &orange\_dog &0.2377 &0.0376 \\ AWAN~\cite{awan} &4.04 &270.61 &0.2500 &0.0367 &31.22 &askldklasfj &0.2345 &0.0361 \\ HDNet~\cite{hdnet} &2.66 &173.81 &0.2048 &0.0317 &32.13 &HSHAJii &0.2308 &0.0364 \\ HINet~\cite{hinet} &5.21 &31.04 &0.2032 &0.0303 &32.51 &ptdoge\_hot &0.2107 &0.0365 \\ MIRNet~\cite{mirnet} &3.75 &42.95 &0.1890 & 0.0274 &33.29 &test\_pseudo &0.2036 &0.0324 \\ Restormer~\cite{restormer} &15.11 &93.77 &0.1833 &0.0274 &33.40 &gkdgkd & 0.1935 &0.0322 \\ MPRNet~\cite{mprnet} &3.62 &101.59 &0.1817 &0.0270 &33.50 &deeppf & 0.1767 &0.0322 \\ MST-L~\cite{mst} &2.45 &32.07 & 0.1772 &0.0256 &33.90 &mialgo\_ls & 0.1247 &0.0257 \\ \midrule \textbf{MST++} & \textbf{1.62} & \textbf{23.05} & \textbf{0.1645} & \textbf{0.0248} & \textbf{34.32} &\textbf{MST++} & \textbf{0.1131} & \textbf{0.0231} \\ \bottomrule[0.15em] \end{tabular}} \caption{Comparisons with SOTA methods on NTIRE 2022 HSI datasets (\texttt{valid} and \texttt{test}). * represents using ensembled models. 
} \label{tab:valid} \end{center}\vspace{-3mm} \end{table} ``` ```latex \begin{figure}[t]\begin{center} \begin{tabular}[t]{c} \hspace{-4.8mm} \includegraphics[width=1\textwidth]{img/SR_scene1.pdf} \end{tabular} \end{center} \vspace{-3mm} \caption{\small Reconstructed HSI comparisons of \emph{Scene} \texttt{ARAD\_1K\_0922} with 4 out of 31 spectral channels. 9 SOTA algorithms and our MST++ are included. The spectral curves (bottom-left) are corresponding to the selected green box of the RGB image. Please zoom in.} \label{fig:simulation} %\vspace{-4.5mm} \end{figure} ``` ```latex \begin{table}[t] \subfloat[\small Ablation study of different self-attention mechanisms.\label{tab:attention}]{\vspace{2mm} \scalebox{0.80}{ \begin{tabular}{l c c c c c} \toprule[0.15em] Method &~~Baseline~~ &~~SW-MSA~~ &~~W-MSA~~ &~~G-MSA~~ &\bf~~S-MSA~~\\ \midrule MRAE &0.3177 &0.2839 &0.2624 &0.1821 &\bf 0.1645 \\ RMSE &0.0453 &0.0399 &0.0375 &0.0271 &\bf 0.0248 \\ Params (M) &1.30 &1.60 &1.60 &1.60 &1.62 \\ FLOPS (G) &17.68 &24.10 &24.10 &25.11 &23.05 \\ \bottomrule[0.15em] \end{tabular}}}\hspace{2mm} % subfloat b - mask representation \subfloat[\small Ablation study of stage number $N_s$. \label{tab:stage}]{\vspace{2mm} \scalebox{0.80}{ \begin{tabular}{l c c c c} %\small \toprule[0.15em] $N_s$ &1 &2 &3 &4 \\ \midrule MRAE &~~0.1761~~ &~~0.1716~~ &\bf ~~0.1645~~ &~~0.1711~~ \\ RMSE &0.0266 &0.0269 &\bf 0.0248 &0.0265 \\ %\checkmark & &\checkmark &34.02 &0.930 &0.76 &10.02 \\ Params (M) &0.55 &1.08 &1.62 &2.16 \\ FLOPS (G) &8.10 &15.57 &23.05 &30.52 \\ \bottomrule[0.15em] \end{tabular}}} \vspace{-1mm} \caption{\small Ablations. We train models on the \texttt{train} set and test on the \texttt{valid} set. 
MRAE, RMSE, Params, and FLOPS are reported.} \label{tab:ablations}\vspace{2mm} \end{table} ``` ```latex \begin{figure}[t]\begin{center} \begin{tabular}[t]{c} \hspace{-2.7mm} %\vspace{-6mm} \includegraphics[width=1\textwidth]{img/SR_scene2.pdf} \end{tabular} \end{center} \vspace{-5mm} \caption{\small Reconstructed HSI comparisons of \emph{Scene} \texttt{ARAD\_1K\_0924} with 4 out of 31 spectral channels. 9 SOTA algorithms and our MST++ are included. The spectral curves (bottom-left) are corresponding to the selected green box of the RGB image. Please zoom in.} \label{fig:real} %\vspace{-2mm} \end{figure} ``` ## Main Results ### Quantitative Results on `Valid` Set We compare our MST++ with SOTA methods including two SCI reconstruction methods (MST and HDNet ), three SR algorithms (HSCNN+ , AWAN and HRNet ), and five natural image restoration models (MIRNet , MPRNet , Restormer , HINet , EDSR ) on the `valid` set. Please note that HSCNN+ , AWAN and HRNet are the winners of NTIRE 2018 and 2020 Spectral Reconstruction Challenges. The results are listed in Tab. . Our MST++ significantly outperforms SOTA methods by a large margin while requiring the least Params and FLOPS. For instance, our MST++ achieves 3.10, 7.43, and 7.96 dB improvement in PSNR while only requiring 40.10% (1.62 / 4.04), 5.11%, 34.84% Params and 8.52% (23.05 / 270.61), 14.07%, 7.57% FLOPS when compared to AWAN, HRNet, and HSCNN+. To intuitively show the superiority of MST++, we provide PSNR-Params-FLOPS comparisons of different algorithms in Fig. . The vertical axis is PSNR (performance), the horizontal axis is FLOPS (computational cost), and the circle radius is Params (memory cost). It can be seen that our MST++ takes up the top-left corner, exhibiting the extreme efficiency advantages of our method. ### Quantitative Results on `Test` Set Tab. lists the top-12 leaders of NTIRE 2022 Spectral Challenge (`test` set), where \ indicates using ensembled models. 
Impressively, our method won the championship out of 231 participants, suggesting the superiority of our MST++.

### Qualitative Results

The two scene figures (`ARAD_1K_0922` and `ARAD_1K_0924`) compare the reconstructed HSIs with 4 out of 31 spectral channels of nine SOTA methods and our MST++ on the `valid` set. Please zoom in for a better view. The top-left part depicts the input RGB image. The right part shows the reconstructed HSI patches of the selected yellow boxes in the RGB image. It can be observed that previous methods show limitations in HSI detail restoration. They either achieve over-smooth HSIs sacrificing fine-grained contents and structural details, or introduce unpleasing artifacts and blotchy textures. By contrast, MST++ does better in producing perceptually-pleasing and sharp-edge HSIs, and preserving the spatial smoothness of the homogeneous regions. This is mainly because our MST++ excels at modeling inter-spectra self-similarity and dependencies. Besides, the bottom-left part exhibits the spectral density curves corresponding to the picked region of the green box in the RGB image. The highest correlation and coincidence between our curve and the ground truth verify the effectiveness of MST++ in restoring spectral-wise consistency.

## Ablation Study

We use the `valid` subset to conduct ablations. The baseline model is derived by removing S-MSA from MST++.

### Self-Attention Mechanism

We have discussed different self-attention mechanisms in the comparison with original Transformers above. In this part, we conduct ablation studies to verify the performance of these MSAs including global MSA (G-MSA), local window-based MSA (W-MSA), Swin MSA (SW-MSA), and the adopted S-MSA. The results are reported in ablation table (a). For fairness, the Params of models using different MSAs are set to approximately the same value. Notably, the input feature of G-MSA is downscaled to $\frac{1}{4}$ size to avoid running out of memory. It can be observed that our adopted S-MSA achieves the largest improvement while requiring the fewest FLOPS among the attention variants at a comparable parameter count.
To be specific, when we respectively apply SW-MSA, W-MSA, G-MSA, and S-MSA, the performance is improved by 0.0338, 0.0553, 0.1356, and 0.1532 in MRAE while increasing 6.42, 6.42, 7.43, and 5.37 GFLOPS. As analyzed in Sec. , these results mainly stem from the HSI spatially sparse while spectrally self-similar nature. Thus, capturing inter-spectra dependencies is more cost-effective than modeling correlations of spatial regions. ### Stage Number We change the stage number $N_s$ of MST++ to investigate its effect. The results are shown in Tab. . When $N_s$ = 3, the performance achieves its peak. Therefore, we finally adopt 3-stage MST++ as our SR model. ### Ensemble Strategy In Sec. , we adopt three ensemble strategies for NTIRE 2022 Spectral Reconstruction Challenge. In this part, we perform ablations to study their effects. On the `valid` set, self-ensemble, multi-scale ensemble, and Top-K (K is set to 5) multi-model ensemble respectively achieve improvements by 0.015, 0.033, and 0.045 in terms of MRAE. # Future Work Until now, there has not been a low-cost high-accuracy open-source baseline for SR research. Our MST++ aims to fill this gap. Moreover, all the source code and pre-trained models in Tab. (`valid`) including 11 SOTA methods are made publicly available. Our goal is to provide a model zoo and toolbox to benefit the community. # Conclusion In this paper, we propose the first Transformer-based framework, MST++, for spectral reconstruction from RGB. Based on the HSI spatially sparse while spectrally self-similar nature, we adopt S-MSA that treats each spectral feature map as a token for self-attention calculation to compose the basic unit SAB. Then SABs build up SST. Eventually, our MST++ is cascaded by several SSTs. Enjoying a multi-stage learning scheme, MST++ progressively improves the reconstruction quality from coarse to fine. 
Quantitative and qualitative experiments demonstrate that our MST++ dramatically surpasses SOTA methods while requiring cheaper memory and computational costs. Impressively, our MST++ won the First place in the NTIRE 2022 Challenge on Spectral Reconstruction from RGB. Acknowledgements: This work is partially supported by the NSFC fund (61831014), the Shenzhen Science and Technology Project under Grant (ZDYBH201900000002, CJGJZD20200617102601004), the Westlake Foundation (2021B1501-2). Zudi Lin and Hanspeter Pfister acknowledge the support from NSF award IIS-2124179 and Google Cloud research credits. [^1]: Equal Contribution, $\dagger$ Corresponding Author	{ "dup_signals": {}, "filename": "out/2204.07908_extract_MST++.tex.md" }	arxiv
"abstract: In this paper, we investigate the large-time behavior for a slightly modified version of (...TRUNCATED)	{ "dup_signals": {}, "filename": "out/2212.05649_extract_LangevinII.tex.md" }	arxiv
"abstract: An algorithm is given for finding the solutions to 3SAT problems. The algorithm uses Bien(...TRUNCATED)	{ "dup_signals": {}, "filename": "out/1810.00875.tex.md" }	arxiv
"abstract: The remarkable success of deep learning has prompted interest in its application to medic(...TRUNCATED)	{ "dup_signals": {}, "filename": "out/2205.04766_extract_main.tex.md" }	arxiv
"abstract: We report a Spitzer/IRAC search for infrared excesses around white dwarfs, including 14 n(...TRUNCATED)	{ "dup_signals": {}, "filename": "out/1109.4207_extract_Spitzer7_u.tex.md" }	arxiv
"abstract: A projected Gromov-Witten variety is the union of all rational curves of fixed degree tha(...TRUNCATED)	{ "dup_signals": {}, "filename": "out/1312.2468_extract_projgw.tex.md" }	arxiv

End of preview.

TxT360: A Top-Quality LLM Pre-training Dataset Requires the Perfect Blend

We introduce TxT360 (Trillion eXtracted Text), the first dataset to globally deduplicate 99 CommonCrawl snapshots and 14 commonly used non-web data sources (e.g., FreeLaw, PG-19). It provides pretraining teams with a recipe to easily adjust data weighting, obtain the largest high-quality open-source dataset, and train the most performant models.

TxT360 Compared to Common Pretraining Datasets

Data Source	TxT360	FineWeb	RefinedWeb	RedPajamaV2	C4	Dolma	RedPajamaV1	The Pile
CommonCrawl Snapshots	99	96	90	84	1	24	5	0.6% of 74
Papers**	5 Sources	-	-	-	-	1 Source	1 Source	4 Sources
Wikipedia	310+ Languages	-	-	-	-	Included	Included	English Only
FreeLaw	Included	-	-	-	-	-	-	Included
DM Math	Included	-	-	-	-	-	-	Included
USPTO	Included	-	-	-	-	-	-	Included
PG-19	Included	-	-	-	-	Included	Included	Included
HackerNews	Included	-	-	-	-	-	-	Included
Ubuntu IRC	Included	-	-	-	-	-	-	Included
EuroParl	Included	-	-	-	-	-	-	Included
StackExchange**	Included	-	-	-	-	-	-	Included
Code	*	-	-	-	-	Included	Included	Included

* TxT360 does not include code. This decision was made due to the perceived low duplication of code with other sources.
** StackExchange and PubMed Central datasets will be uploaded shortly. All other datasets are present and complete.

Complete details on the dataset can be found in our blog post here.

TxT360 Performance

To evaluate the training efficiency of our dataset, we sampled 1.5T tokens from both FineWeb and TxT360 (using the aforementioned weighting) and conducted a training ablation on an 8x8B Mixture-of-Experts architecture, similar to Mixtral. We compared the learning curves by tracking training loss, validation scores, and performance across a wide array of diverse evaluation benchmarks. The validation set was sampled independently from SlimPajama. Note that this experiment is done on a slightly earlier version of the dataset.

Initial Data Representation

To produce TxT360, a comprehensive data processing pipeline was designed to account for the nuances of both web and curated datasets. The pipeline presents a unified framework for processing both data types, making it convenient for users to revise and fine-tune the pipeline for their own use cases.

Web datasets are inherently noisy and varied. The TxT360 pipeline implements sophisticated filtering and deduplication techniques to clean and remove redundancies while preserving data integrity.

Curated datasets are typically structured and consistently formatted, but their source-specific formatting conventions can still cause problems. TxT360 filters these sources with selective steps to maintain their integrity while providing seamless integration into the larger dataset. Both data source types are globally deduplicated together, resulting in ~5T tokens of high-quality data. The table below shows the source distribution of TxT360 tokens.

We further highlight the importance of mixing the datasets together with the right blend. The raw distribution of the deduplicated dataset is actually suboptimal; a simple working recipe is provided in the studies section. This recipe will create a dataset of 15T+ tokens, the largest high-quality open-source pre-training dataset.

Data Source	Raw Data Size	Token Count	Information Cut-Off Date
CommonCrawl	9.2 TB	4.83T	2024-30
Papers	712 GB	154.96B	Q4 2023
Wikipedia	199 GB	35.975B	-
Freelaw	71 GB	16.7B	Q1 2024
DM Math	22 GB	5.23B	-
USPTO	45 GB	4.95B	Q3 2024
PG-19	11 GB	2.63B	-
HackerNews	4.1 GB	1.08B	Q4 2023
Ubuntu IRC	4.7 GB	1.54B	Q3 2024
Europarl	6.1 GB	1.96B	-
StackExchange	79 GB	27.0B	Q4 2023
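As a quick sanity check on the "~5T tokens" figure, the token counts in the table above can be summed directly. The snippet below is only an arithmetic illustration; the values are copied from the table, expressed in billions of tokens, and the dictionary name is hypothetical.

```python
# Token counts from the table above, in billions of tokens.
token_counts_b = {
    "CommonCrawl": 4830.0,
    "Papers": 154.96,
    "Wikipedia": 35.975,
    "Freelaw": 16.7,
    "DM Math": 5.23,
    "USPTO": 4.95,
    "PG-19": 2.63,
    "HackerNews": 1.08,
    "Ubuntu IRC": 1.54,
    "Europarl": 1.96,
    "StackExchange": 27.0,
}

total_b = sum(token_counts_b.values())
web_share = token_counts_b["CommonCrawl"] / total_b

print(f"Total: {total_b / 1000:.2f}T tokens")   # Total: 5.08T tokens
print(f"CommonCrawl share: {web_share:.1%}")    # CommonCrawl share: 95.0%
```

The raw mix is dominated by web data, which is consistent with the note above that the deduplicated distribution benefits from reweighting.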

The TxT360 blog post provides all the details behind how we approached and implemented the following features:

CommonCrawl Data Filtering

Complete discussion on how 99 Common Crawl snapshots were filtered and comparison to previous filtering techniques (e.g. Dolma, DataTrove, RedPajamaV2).

Curated Source Filtering

Each data source was filtered individually with respect to the underlying data. Full details and discussion on how each source was filtered are covered.

Global Deduplication

After the web and curated sources were filtered, all sources were globally deduplicated to create TxT360. The tips and tricks behind the deduplication process are included.

Citation

BibTeX:

@misc{txt360data2024,
      title={TxT360: A Top-Quality LLM Pre-training Dataset Requires the Perfect Blend}, 
      author={Liping Tang and Nikhil Ranjan and Omkar Pangarkar and Xuezhi Liang and Zhen Wang and Li An and Bhaskar Rao and Zhoujun Cheng and Suqi Sun and Cun Mu and Victor Miller and Yue Peng and Eric P. Xing and Zhengzhong Liu},
      year={2024}
}
