A convolutional vision transformer for semantic segmentation of side-scan sonar data
dc.contributor.author
dc.date.accessioned
2023-10-25T09:33:23Z
dc.date.available
2023-10-25T09:33:23Z
dc.date.issued
2023-10-15
dc.identifier.issn
0029-8018
dc.identifier.uri
dc.description.abstract
Distinguishing among different marine benthic habitat characteristics is of key importance in a wide set of seabed operations ranging from installations of oil rigs to laying networks of cables and monitoring the impact of humans on marine ecosystems. The Side-Scan Sonar (SSS) is a widely used imaging sensor in this regard. It produces high-resolution seafloor maps by logging the intensities of sound waves reflected back from the seafloor. In this work, we leverage these acoustic intensity maps to produce pixel-wise categorization of different seafloor types. We propose a novel architecture adapted from the Vision Transformer (ViT) in an encoder–decoder framework. Further, in doing so, the applicability of ViTs is evaluated on smaller datasets. To overcome the lack of CNN-like inductive biases, thereby making ViTs more conducive to applications in low-data regimes, we propose a novel feature extraction module to replace the Multi-layer Perceptron (MLP) block within transformer layers and a novel module to extract multiscale patch embeddings. A lightweight decoder is also proposed to complement this design in order to further enhance multiscale feature extraction. With the modified architecture, we achieve state-of-the-art results and also meet real-time computational requirements.
dc.description.sponsorship
This study was supported by the DeeperSense project, funded by the European Union’s Horizon 2020 Research and Innovation programme under grant agreement no. 101016958. The study was also supported in part by the SIREC project, funded by the Ministerio de Ciencia e Innovación, Gobierno de España under agreement no. PID2020-116736RB-I00
Open Access funding provided thanks to the CRUE-CSIC agreement with Elsevier
dc.format.mimetype
application/pdf
dc.language.iso
eng
dc.publisher
Elsevier
dc.relation
PID2020-116736RB-I00
dc.relation.isformatof
Digital reproduction of the document published at: https://doi.org/10.1016/j.oceaneng.2023.115647
dc.relation.ispartof
Ocean Engineering, 2023, vol. 286, no. 2, p. 115647
dc.relation.ispartofseries
Published articles (D-ATC)
dc.rights
Attribution 4.0 International
dc.rights.uri
dc.subject
dc.title
A convolutional vision transformer for semantic segmentation of side-scan sonar data
dc.type
info:eu-repo/semantics/article
dc.rights.accessRights
info:eu-repo/semantics/openAccess
dc.relation.projectID
info:eu-repo/grantAgreement/EC/H2020/101016958/EU/Deep-Learning for Multimodal Sensor Fusion/DeeperSense
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-116736RB-I00/ES/ROBOT INTELIGENTE PARA LA EXPLORACION Y CLASIFICACION DEL FONDO MARINO/
dc.type.version
info:eu-repo/semantics/publishedVersion
dc.identifier.doi
dc.identifier.idgrec
037092
dc.contributor.funder
dc.type.peerreviewed
peer-reviewed
dc.relation.FundingProgramme
dc.relation.ProjectAcronym
dc.identifier.eissn
1873-5258