We present a reinforcement learning-based (RL) control scheme for trajectory tracking of fully-actuated surface vessels. The proposed method learns online both a model-based feedforward controller, as well as an optimizing feedback policy, in order to follow a desired trajectory under the influence of environmental forces. The method's efficiency is evaluated via simulations and sea trials, with the unmanned surface vehicle (USV) ReVolt performing three different tracking tasks: the four-corner DP test, straight-path tracking, and curved-path tracking. The results demonstrate the method's ability to accomplish the control objectives and a good agreement between the performance achieved in the ReVolt digital twin and the sea trials.

2 Centre for Autonomous Marine Operations and Systems, Norwegian University of Science and Technology, Trondheim, Norway. 3 Digital Assurance Program, Group Technology and Research, DNV GL, Trondheim, Norway.

Control of marine vehicles is a challenging problem, mostly due to the unpredictable nature of the sea and the difficulty in developing accurate mathematical models to represent the varying marine vehicle dynamics. As a result, considerable research effort has been dedicated to the topic since the early 90's (Fossen, 1994), resulting in a vast literature utilizing ideas from virtually every branch of control engineering: linear, non-linear, adaptive, intelligent, optimal, fuzzy, and stochastic control approaches, to name a few, have been developed and tested over the years, and many of their properties are well understood (Hasegawa et al., 1989; Pettersen and Egeland, 1996; Katebi et al., 1997; Fossen, 2000; McGookin et al., 2000; Soetanto et al., 2003; Wang et al., 2015; Do, 2016). Because the hydrodynamic coefficients, and consequently the behavior, of a marine vehicle can vary significantly across speed regimes, a common approach has been to design controllers for specific motion control scenarios. This approach simplifies the vessel modeling process and has led to dynamic positioning (DP) and station-keeping controllers for speeds close to zero, and trajectory tracking or path following (depending on whether temporal constraints are considered) controllers when a vessel is in transit mode. Naturally, the main drawback is that, when moving from one speed regime to another, controllers and/or models with different properties are needed. Two well-researched ways to achieve such performance diversity with conventional methods are to design numerous controllers and switch among them when needed, or to use adaptive approaches. To this end, research effort has been dedicated to developing flexible methods for updating the model parameters by, for instance, using system identification methods or parameter estimation via neural networks (Källström and Åström, 1981; Kallstrom, 1982; Fossen et al., 1996; Sutton et al., 1997; Mišković et al., 2011; Dai et al., 2012; Wang et al., 2017). In the majority of the aforementioned works, model-based approaches exploiting human knowledge of hydrodynamics and the laws of motion were considered.
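As a rough illustration of the parameter-estimation idea mentioned above, the sketch below fits the coefficients of a simplified one-degree-of-freedom surge model to logged data using ordinary least squares. The model structure, variable names, and numerical values are assumptions made for illustration only and are not taken from the works cited above.

```python
# Minimal sketch (illustrative): batch least-squares identification of a
# 1-DOF surge model  m*du + d1*u + d2*u*|u| = tau,  which is linear in the
# unknown parameters theta = [m, d1, d2].
import numpy as np

def identify_surge_params(u, du, tau):
    """Estimate [m, d1, d2] from logged surge speed u, acceleration du, and thrust tau."""
    # Regressor matrix: each row is [du_k, u_k, u_k*|u_k|]
    Phi = np.column_stack([du, u, u * np.abs(u)])
    theta, *_ = np.linalg.lstsq(Phi, tau, rcond=None)
    return theta  # [m_hat, d1_hat, d2_hat]

# Synthetic usage example with an assumed "true" model and noisy measurements
rng = np.random.default_rng(0)
u = rng.uniform(-2.0, 2.0, 500)      # surge speed samples [m/s]
du = rng.uniform(-0.5, 0.5, 500)     # surge acceleration samples [m/s^2]
tau = 250.0 * du + 40.0 * u + 15.0 * u * np.abs(u) + rng.normal(0.0, 1.0, 500)
print(identify_surge_params(u, du, tau))  # roughly [250, 40, 15]
```

Neural-network parameter estimation follows the same logic, replacing the fixed regressor structure with a learned mapping from measured states to predicted forces.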
Reinforcement learning (RL), also known as neuro-dynamic programming or approximate dynamic programming, is a field of research developed by the Artificial Intelligence (AI) community for achieving optimal sequential decision making under system and environment uncertainty. The roots of RL can be traced back to the 60's, and a thorough overview of its evolution can be found in Sutton and Barto (2018) and Bertsekas (2019). Contrary to optimal control theory, RL is based on evaluative, rather than instructive, feedback and comes in different forms, which may or may not include partial knowledge of the environment or the system. The process typically involves hand-engineering a reward function, which assigns a reward or a penalty to the actions that induce desired or undesired outcomes, respectively. An RL algorithm is then assigned to find a policy (or controller, in control engineering terminology) that solves the control objective optimally, given the problem constraints and uncertainties.

Finally, we include a section with considerations about assurance for RL-based methods and where our approach stands in terms of the main challenges.
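To make the reward-shaping step above concrete, here is a minimal sketch of a hand-engineered reward for trajectory tracking that penalizes pose error and control effort with a quadratic cost. The weights, state layout, and function name are illustrative assumptions, not the cost actually used in the paper.

```python
# Minimal sketch (illustrative): a hand-engineered per-step reward for
# trajectory tracking, penalizing pose error and control effort.
import numpy as np

def tracking_reward(eta, eta_ref, tau, q=1.0, r=1e-4):
    """Reward for one time step.

    eta     : current pose [x, y, yaw]
    eta_ref : desired pose on the trajectory at this time step
    tau     : commanded generalized forces [surge, sway, yaw moment]
    q, r    : assumed weights on tracking error and actuation penalty
    """
    err = np.asarray(eta, dtype=float) - np.asarray(eta_ref, dtype=float)
    # Wrap the heading error to [-pi, pi] so small angular deviations are not over-penalized
    err[2] = (err[2] + np.pi) % (2 * np.pi) - np.pi
    tau = np.asarray(tau, dtype=float)
    return -(q * err @ err + r * tau @ tau)

# Usage example: a small tracking error and moderate actuation yield a small penalty
print(tracking_reward([1.0, 2.0, 0.1], [1.5, 2.0, 0.0], [50.0, 0.0, 5.0]))
```

An RL algorithm then searches for the policy that maximizes the accumulated reward, which here corresponds to following the reference trajectory with as little actuation as possible.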