Journal Description

Data

Data is a peer-reviewed, open access journal on data in science, with the aim of enhancing data transparency and reusability. The journal publishes in two sections: a section on the collection, treatment and analysis methods of data in science; a section publishing descriptions of scientific and scholarly datasets (one dataset per paper). The journal is published monthly online by MDPI.

Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, Inspec, RePEc, and other databases.
Journal Rank: CiteScore - Q2 (Information Systems and Management)
Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 21 days after submission; acceptance to publication is undertaken in 4.3 days (median values for papers published in this journal in the first half of 2023).
Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.

Impact Factor: 2.6 (2022); 5-Year Impact Factor: 3.0 (2022)

ISSN: 2306-5729

Latest Articles

Open Access Data Descriptor

Anomaly Detection in Student Activity in Solving Unique Programming Exercises: Motivated Students against Suspicious Ones

and

Data 2023, 8(8), 129; https://doi.org/10.3390/data8080129 - 08 Aug 2023

Abstract

This article presents a dataset containing messages from the Digital Teaching Assistant (DTA) system, which records the results from the automatic verification of students’ solutions to unique programming exercises of 11 various types. These results are automatically generated by the system, which automates a massive Python programming course at MIREA—Russian Technological University (RTU MIREA). The DTA system is trained to distinguish between approaches to solve programming exercises, as well as to identify correct and incorrect solutions, using intelligent algorithms responsible for analyzing the source code in the DTA system using vector representations of programs based on Markov chains, calculating pairwise Jensen–Shannon distances for programs and using a hierarchical clustering algorithm to detect high-level approaches used by students in solving unique programming exercises. In the process of learning, each student must correctly solve 11 unique exercises in order to receive admission to the intermediate certification in the form of a test. In addition, a motivated student may try to find additional approaches to solve exercises they have already solved. At the same time, not all students are able or willing to solve the 11 unique exercises proposed to them; some will resort to outside help in solving all or part of the exercises. Since all information about the interactions of the students with the DTA system is recorded, it is possible to identify different types of students. First of all, the students can be classified into 2 classes: those who failed to solve 11 exercises and those who received admission to the intermediate certification in the form of a test, having solved the 11 unique exercises correctly. However, it is possible to identify classes of typical, motivated and suspicious students among the latter group based on the proposed dataset. 
The proposed dataset can be used to develop regression models that predict outbursts of student activity when interacting with the DTA system; to solve clustering problems, identifying groups of students with a similar behavior model in the learning process; and to develop intelligent data classifiers that predict the students’ behavior model and draw appropriate conclusions, not only at the end of the learning process but also during the course of it, in order to motivate all students, even those who are classified as suspicious. The results of the learning process can also be visualized using various tools. Full article
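
The program-comparison step described in this abstract (Markov-chain vector representations compared by Jensen–Shannon distance) can be illustrated with a minimal Python sketch. It is not the DTA system's implementation: the token-based representation, the function names `transition_distribution` and `jensen_shannon_distance`, and the sample token lists are assumptions made for illustration only.

```python
import math
from collections import Counter


def transition_distribution(tokens):
    """Normalised frequencies of consecutive token pairs.

    Assumption: a program is reduced to a flat token sequence and described
    by its first-order (Markov-chain style) transition frequencies.
    """
    pairs = Counter(zip(tokens, tokens[1:]))
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()} if total else {}


def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence, a value in [0, 1]."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the union of supports.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        # Kullback-Leibler divergence KL(a || M); zero-probability terms contribute 0.
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    divergence = 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)
    return math.sqrt(max(divergence, 0.0))


# Hypothetical token sequences for two student solutions.
solution_a = ["def", "name", "for", "name", "return", "name"]
solution_b = ["def", "name", "while", "name", "return", "name"]
distance = jensen_shannon_distance(
    transition_distribution(solution_a), transition_distribution(solution_b)
)
```

A matrix of such pairwise distances could then be passed to a hierarchical clustering routine (for example `scipy.cluster.hierarchy.linkage`) to group solutions by approach, which is the step the abstract describes next.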

Open Access Data Descriptor

VEPL Dataset: A Vegetation Encroachment in Power Line Corridors Dataset for Semantic Segmentation of Drone Aerial Orthomosaics

Mateo Cano-Solis

John R. Ballesteros

and

John W. Branch-Bedoya

Data 2023, 8(8), 128; https://doi.org/10.3390/data8080128 - 04 Aug 2023

Abstract

Vegetation encroachment in power line corridors poses multiple problems for modern energy-dependent societies. Failures due to contact between power lines and vegetation can result in power outages and millions of dollars in losses. To address this problem, UAVs have emerged as a promising solution due to their ability to quickly and affordably monitor long corridors through autonomous or remotely piloted flights. However, the extensive manual task of analyzing every image acquired by the UAVs in search of vegetation encroachment has led many authors to propose the use of Deep Learning to automate the detection process. Despite the advantages of combining UAV imagery and Deep Learning, there is currently a lack of datasets for training Deep Learning models on this specific problem. This paper presents a dataset for the semantic segmentation of vegetation encroachment in power line corridors. RGB orthomosaics were obtained for a rural road area using a commercial UAV. The dataset is composed of pairs of tessellated RGB images taken from the orthomosaic and corresponding multi-color masks representing three different classes: vegetation, power lines, and the background. A detailed description of the image acquisition process is provided, as well as the labeling task and the data augmentation techniques, among other details relevant to producing the dataset. Researchers can benefit from the proposed dataset by developing and improving strategies for vegetation encroachment monitoring using UAVs and Deep Learning. Full article

(This article belongs to the Section Spatial Data Science and Digital Earth)

Open Access Data Descriptor

eMailMe: A Method to Build Datasets of Corporate Emails in Portuguese

Akira A. de Moura Galvão Uematsu

and

Anarosa A. F. Brandão

Data 2023, 8(8), 127; https://doi.org/10.3390/data8080127 - 31 Jul 2023

Abstract

Knowledge management finds application in companies that are concerned with maintaining and disseminating their practices among their members. However, studies involving these two domains may suffer from data confidentiality issues. Furthermore, it is difficult to find data regarding organizations’ processes and associated knowledge. Therefore, this paper presents a method to support the generation of a labeled dataset composed of texts that simulate corporate emails containing sensitive information regarding disclosure, written in Portuguese. The method begins with the definition of the dataset’s size and content distribution; the structure of its emails’ texts; and the guidelines for specialists to build the emails’ texts. It aims to create datasets that can be used in the validation of a tacit knowledge extraction process considering the 5W1H approach for the resulting base. The method was applied to create a dataset with content related to several domains, such as Federal Court and Registry Office and Marketing, giving it diversity and realism, while simulating real-world situations in the specialists’ professional life. The dataset generated is available in an open-access repository so that it can be downloaded and, eventually, expanded. Full article

(This article belongs to the Topic Methods for Data Labelling for Intelligent Systems)

Open Access Data Descriptor

Datasets of Simulated Exhaled Aerosol Images from Normal and Diseased Lungs with Multi-Level Similarities for Neural Network Training/Testing and Continuous Learning

Mohamed Talaat

Xiuhua Si

and

Jinxiang Xi

Data 2023, 8(8), 126; https://doi.org/10.3390/data8080126 - 31 Jul 2023

Abstract

Although exhaled aerosols and their patterns may seem chaotic in appearance, they inherently contain information related to the underlying respiratory physiology and anatomy. This study presented a multi-level database of simulated exhaled aerosol images from both normal and diseased lungs. An anatomically accurate mouth-lung geometry extending to G9 was modified to model two stages of obstructions in small airways and physiology-based simulations were utilized to capture the fluid-particle dynamics and exhaled aerosol images from varying breath tests. The dataset was designed to test two performance metrics of convolutional neural network (CNN) models when used for transfer learning: interpolation and extrapolation. To this aim, three testing datasets with decreasing image similarities were developed (i.e., level 1, inbox, and outbox). Four network models (AlexNet, ResNet-50, MobileNet, and EfficientNet) were tested and the performances of all models decreased for the outbox test images, which were outside the design space. The effect of continuous learning was also assessed for each model by adding new images into the training dataset and the newly trained network was tested at multiple levels. Among the four network models, ResNet-50 excelled in performance in both multi-level testing and continuous learning, the latter of which enhanced the accuracy of the most challenging classification task (i.e., 3-class with outbox test images) from 60.65% to 98.92%. The datasets can serve as a benchmark training/testing database for validating existent CNN models or quantifying the performance metrics of new CNN models. Full article

(This article belongs to the Special Issue Artificial Intelligence and Big Data Applications in Diagnostics)

Open Access Data Descriptor

Quantitative Metabolomic Dataset of Avian Eye Lenses

Ekaterina A. Zelentsova

and

Data 2023, 8(8), 125; https://doi.org/10.3390/data8080125 - 31 Jul 2023

Abstract

Metabolomics is a powerful set of methods that uses analytical techniques to identify and quantify metabolites in biological samples, providing a snapshot of the metabolic state of a biological system. In medicine, metabolomics may help to reveal the molecular basis of a disease, make a diagnosis, and monitor treatment responses, while in agriculture, it can improve crop yields and plant breeding. However, animal metabolomics faces several challenges due to the complexity and diversity of animal metabolomes, the lack of standardized protocols, and the difficulty in interpreting metabolomic data. The current dataset includes quantitative metabolomic profiles of eye lenses from 26 bird species (111 specimens) that can aid researchers in developing new experiments and mathematical models and in integrating the data with other “-omics” data. The dataset includes raw ¹H NMR spectra, protocols for sample preparation, and data preprocessing, with the final table containing information on the abundance of 89 reliably identified and quantified metabolites. The dataset is quantitative, making it relevant for supplementing with new specimens or comparison groups, followed by data mining and expected new interpretations. The data were obtained using the bird specimens collected in compliance with ethical standards and revealed potential differences in metabolic pathways due to phylogenetic differences or environmental exposure. Full article

(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)

Open Access Article

Measuring the Effect of Fraud on Data-Quality Dimensions

Samiha Brahimi

and

Mariam Elhussein

Data 2023, 8(8), 124; https://doi.org/10.3390/data8080124 - 30 Jul 2023

Abstract

Data preprocessing moves the data from raw to ready for analysis. Data resulting from fraud compromise the quality of the data and the resulting analysis. Such data can remain in a dataset undetected and thus be included in the analysis. This study proposed a process for measuring the effect of fraudulent data during data preparation and its possible influence on quality. The five-step process begins with identifying the business rules related to the business process(es) affected by fraud and their associated quality dimensions. This is followed by measuring the business rules in the specified timeframe, detecting fraudulent data, cleaning them, and measuring their quality after cleaning. The process was implemented in a case of occupational fraud within a hospital context: the illegal issuance of undeserved sick leave. The aim of the application is to identify the quality dimensions that are influenced by the injected fraudulent data and how these dimensions are affected. This study agrees with the existing literature and confirms the effects of fraud on timeliness, coherence, believability, and interpretability. However, it did not show any effect on consistency. Further studies are needed to arrive at a generalizable list of the quality dimensions that fraud can affect. Full article

Open Access Article

Blockchain Payment Services in the Hospitality Sector: The Mediating Role of Data Security on Utilisation Efficiency of the Customer

and

Data 2023, 8(8), 123; https://doi.org/10.3390/data8080123 - 30 Jul 2023

Abstract

Blockchain technology has the potential to completely transform the hospitality sector by offering a safe, open, and effective method of payment. Increased customer utilisation efficiency may result from this. This study looks into how blockchain payment methods affect hotel customers’ intentions to stay loyal by devising four hypotheses. A questionnaire was specifically created and self-administered for this study as a data-gathering tool and distributed to hotel customers. The IBM SPSS and Amos software packages were used to analyse the data of the 301 valid responses. Findings show that hospitality customers may use blockchain payment services if the customer is satisfied with the data security of this payment system. The study also highlighted that customer data security mediated the association between utilisation efficiency and blockchain payment systems. Blockchain payment services can affect visitors’ intentions to stay loyal by impacting data security and consumer happiness. Results suggest that blockchain payment systems can be useful for hospitality firms looking to increase client utilisation efficiency. Blockchain can simplify visitor booking and payment processes by providing a safe, open, and effective transacting method. This may result in a satisfying encounter that visitors are more inclined to recall and repeat. Full article

(This article belongs to the Special Issue Blockchain Applications in Data Management and Governance)

Open Access Article

A Wavelet-Decomposed WD-ARMA-GARCH-EVT Model Approach to Comparing the Riskiness of the BitCoin and South African Rand Exchange Rates

Thabani Ndlovu

and

Delson Chikobvu

Data 2023, 8(7), 122; https://doi.org/10.3390/data8070122 - 24 Jul 2023

Abstract

In this paper, a hybrid of a Wavelet Decomposition–Generalised Auto-Regressive Conditional Heteroscedasticity–Extreme Value Theory (WD-ARMA-GARCH-EVT) model is applied to estimate the Value at Risk (VaR) of BitCoin (BTC/USD) and the South African Rand (ZAR/USD). The aim is to measure and compare the riskiness of the two currencies. New and improved estimation techniques for VaR have been suggested in the last decade in the aftermath of the global financial crisis of 2008. This paper aims to provide an improved alternative to the already existing statistical tools in estimating a currency VaR empirically. Maximal Overlap Discrete Wavelet Transform (MODWT) and two mother wavelet filters on the returns series are considered in this paper, viz., the Haar and Daubechies (d4). The findings show that BitCoin/USD is riskier than ZAR/USD since it has a higher VaR per unit invested in each currency. At the 99% significance level, BitCoin/USD has average values of VaR of 2.71% and 4.98% for the WD-ARMA-GARCH-GPD and WD-ARMA-GARCH-GEVD models, respectively; and this is slightly higher than the respective 2.69% and 3.59% for the ZAR/USD. The average BitCoin/USD returns of 0.001990 are higher than ZAR/USD returns of −0.000125. These findings are consistent with the mean-variance portfolio theory, which suggests a higher yield for riskier assets. Based on the p-values of the Kupiec likelihood ratio test, the hybrid model adequacy is largely accepted, as p-values are greater than 0.05, except for the WD-ARMA-GARCH-GEVD models at a 99% significance level for both currencies. The findings are helpful to financial risk practitioners and forex traders in formulating their diversification and hedging strategies and ascertaining the risk-adjusted capital requirement to be set aside as a cushion in the event of the occurrence of an actual loss. Full article
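
The Kupiec likelihood ratio test mentioned in this abstract checks whether the observed number of VaR exceptions matches the number expected at the chosen level. The Python sketch below shows the standard proportion-of-failures form of the test under stated assumptions; it is not the authors' code, and the function name `kupiec_pof` and the sample counts are hypothetical.

```python
import math


def kupiec_pof(num_obs, num_exceptions, var_level):
    """Kupiec proportion-of-failures likelihood ratio test.

    num_obs        -- number of backtested returns (T)
    num_exceptions -- number of losses that exceeded the VaR estimate (x)
    var_level      -- VaR level, e.g. 0.99, so the expected exception rate is 0.01

    Returns the LR statistic and its p-value under a chi-square(1) reference.
    """
    expected_rate = 1.0 - var_level
    observed_rate = num_exceptions / num_obs

    def log_likelihood(rate):
        # Binomial log-likelihood, treating 0 * log(0) as 0.
        value = 0.0
        if num_exceptions > 0:
            value += num_exceptions * math.log(rate)
        if num_obs - num_exceptions > 0:
            value += (num_obs - num_exceptions) * math.log(1.0 - rate)
        return value

    lr = -2.0 * (log_likelihood(expected_rate) - log_likelihood(observed_rate))
    # Survival function of chi-square with 1 degree of freedom: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value


# Hypothetical backtest: 1000 daily returns and 30 exceptions at the 99% level.
lr_stat, p_val = kupiec_pof(1000, 30, 0.99)
```

A p-value above 0.05, as reported for most of the models in the abstract, means the hypothesis that the exception rate matches the VaR level is not rejected.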

(This article belongs to the Special Issue Information Systems Innovation for Business: Change, Growth and Future Impact)

Open Access Data Descriptor

Knowledge Discovery and Dataset for the Improvement of Digital Literacy Skills in Undergraduate Students

Pongpon Nilaphruek

and

Pattama Charoenporn

Data 2023, 8(7), 121; https://doi.org/10.3390/data8070121 - 20 Jul 2023

Abstract

For over two decades, scholars and practitioners have emphasized the importance of digital literacy, yet the existing datasets are insufficient for establishing learning analytics in Thailand. Learning analytics focuses on gathering and analyzing student data to optimize learning tools and activities to improve students’ learning experiences. The main problem is that the ICT skill levels of the youth are rather low in Thailand. To facilitate research in this field, this study has compiled a dataset containing information from the IC3 digital literacy certification delivered at the Rajamangala University of Technology Thanyaburi (RMUTT) in Thailand between 2016 and 2023. This dataset is unique since it includes demographic and academic records about undergraduate students. The dataset was collected and underwent a preparation process, including data cleansing, anonymization, and release. This data enables the examination of student learning outcomes, represented by a dataset containing information about 45,603 records with students’ certification assessment scores. This compiled dataset provides a rich resource for researchers studying digital literacy and learning analytics. It offers researchers the opportunity to gain valuable insights, inform evidence-based educational practices, and contribute to the ongoing efforts to improve digital literacy education in Thailand and beyond. Full article

(This article belongs to the Special Issue Data Mining and Computational Intelligence for E-learning and Education)

Open Access Data Descriptor

PoPu-Data: A Multilayered, Simultaneously Collected Lying Position Dataset

Mohammad Mohammad Amini

Arlindo F. Silva

Ahmad Reza Heravi

Davood Fanaei Sheikholeslami

Filipe Fidalgo

Francisco B. Rodrigues

Osvaldo Santos

Patrícia Coelho

and

Seyyed Sajjad Aemmi

Data 2023, 8(7), 120; https://doi.org/10.3390/data8070120 - 16 Jul 2023

Abstract

This study presents a dataset containing three layers of data that are useful for body position classification and all uses related to it. The PoPu dataset contains simultaneously collected data from two different sensor sheets, one placed over and one placed under a mattress; furthermore, a segmentation data layer was added in which different body parts are identified using the pressure data from the sensors over the mattress. The data were gathered from 60 healthy volunteers who varied in the recorded characteristics, namely sex, weight, and height. This dataset can be used for position classification, for assessing the viability of sensors placed under a mattress, and in applications regarding bedded or lying people or sleep-related disorders. Full article

Open Access Data Descriptor

Proteomic Shift in Mouse Embryonic Fibroblasts Pfa1 during Erastin, ML210, and BSO-Induced Ferroptosis

Olga M. Kudryashova

Alexey M. Nesterenko

Dmitry A. Korzhenevskii

Valeriy K. Sulyagin

Vasilisa M. Tereshchuk

Vsevolod V. Belousov

and

Arina G. Shokhina

Data 2023, 8(7), 119; https://doi.org/10.3390/data8070119 - 12 Jul 2023

Abstract

Ferroptosis is a unique variety of non-apoptotic cell death, driven by massive lipid oxidation in an iron-dependent manner. Since ferroptosis was introduced as a concept in 2012, it has demonstrated its essential role in the pathogenesis of neurodegenerative diseases and an important role in therapy-resistant cancer cells. Thus, detailed molecular understanding of both canonical and alternative ferroptosis pathways is required. There is a set of widely used chemical agents to modulate ferroptosis using different pathway targets: erastin blocks the cystine–glutamate antiporter, system xc⁻; ML210 directly inactivates GPX4; and L-buthionine sulfoximine (BSO) inhibits γ-glutamylcysteine synthetase, an essential enzyme for glutathione synthesis de novo. Most studies have focused on the lipidomic profiling of model systems undergoing death in a ferroptotic modality. In this study, we developed high-quality shotgun proteome sequencing during ferroptosis induction by three widely used chemical agents (erastin, ML210, and BSO) before and after 24 and 48 h of treatment. Chromato-mass spectra were registered in DDA mode and are suitable for further label-free quantification. Both processed and raw files are publicly available and could be a valuable dynamic proteome map for further ferroptosis investigation. Full article

(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)

Open Access Data Descriptor

A Semantically Annotated 15-Class Ground Truth Dataset for Substation Equipment to Train Semantic Segmentation Models

Andreas Anael Pereira Gomes

Francisco Itamarati Secolo Ganacim

Fabiano Gustavo Silveira Magrin

Nara Bobko

Leonardo Göbel Fernandes

Anselmo Pombeiro

and

Eduardo Félix Ribeiro Romaneli

Data 2023, 8(7), 118; https://doi.org/10.3390/data8070118 - 05 Jul 2023

Abstract

The lack of annotated semantic segmentation datasets for electrical substations in the literature poses a significant problem for machine learning tasks; before training a model, a dataset is needed. This paper presents a new dataset of electric substations with 1660 images annotated with 15 classes, including insulators, disconnect switches, transformers and other equipment commonly found in substation environments. The images were captured using a combination of human, fixed and AGV-mounted cameras at different times of the day, providing a diverse set of training and testing data for algorithm development. In total, 50,705 annotations were created by a team of experienced annotators, using a standardized process to ensure accuracy across the dataset. The resulting dataset provides a valuable resource for researchers and practitioners working in the fields of substation automation, substation monitoring and computer vision. Its availability has the potential to advance the state of the art in this important area. Full article

(This article belongs to the Topic Methods for Data Labelling for Intelligent Systems)

Open Access Data Descriptor

Assessment of Maize Silage Quality under Different Pre-Ensiling Conditions

and

Data 2023, 8(7), 117; https://doi.org/10.3390/data8070117 - 02 Jul 2023

Abstract

The final quality of maize silage is affected by several factors, including, to some extent, pre-ensiling conditions that can potentially be tuned during harvesting. After assessing new indices for silage quality under lab-scale conditions, several trials were conducted to find associations between fresh maize characteristics and silage features. Among the former, we included field input levels, FAO class, maturity stage, use of bacterial inoculants, sealing delay and chemical traits, whereas, among the latter, we assessed density and porosity, pH, fermentative profile, dry matter loss and aerobic stability. The trials were conducted using vacuum bags or mini silo buckets. More than 1500 maize samples harvested in Northeast Italy were analysed during the 2016–2022 period. Moreover, to evaluate silage aerobic stability, the fermentative profile and temperature were measured 14 days after the opening of the silo. The association between silage quality and aerobic stability was assessed, and a prognostic risk score was used to calculate the probability of aerobic instability. The dataset could provide baseline information to promote the continuous improvement of maize silage management from different botanical and crop fields, thus improving agronomic and animal farm resource allocation from a precision agriculture perspective. Full article

Open Access Data Descriptor

Dataset of Linkability Networks of Ethereum Accounts Involved in NFT Trading of Top 15 NFT Collections

Aleksandar Tošić

Niki Hrovatin

and

Jernej Vičič

Data 2023, 8(7), 116; https://doi.org/10.3390/data8070116 - 28 Jun 2023

Abstract

In this paper, we present subgraphs of Ethereum wallets involved in NFT trades of the top 15 ERC721 NFT collections. To obtain the subgraphs, we have extracted the Ethereum transaction graph from a live Ethereum node and filtered out exchanges, mining pools, and smart contracts. For each of the selected collections, we identified the set of accounts involved in NFT trading, which we used to perform a breadth-first search in the Ethereum transaction graph to obtain a subgraph. These subgraphs can offer insight into the linkability of accounts participating in NFT trading on the Ethereum blockchain. Full article
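
The subgraph extraction described in this abstract (a breadth-first search from the trading accounts, after filtering out exchanges, mining pools, and smart contracts) can be sketched in Python as follows. This is an illustration under assumptions, not the authors' pipeline: the function name `bfs_subgraph`, the undirected treatment of transactions, the `max_depth` cut-off, and the toy addresses are all hypothetical.

```python
from collections import deque


def bfs_subgraph(edges, seeds, excluded=frozenset(), max_depth=2):
    """Breadth-first search from seed accounts over a transaction graph.

    edges     -- iterable of (sender, receiver) address pairs
    seeds     -- accounts known to trade in the collection of interest
    excluded  -- addresses filtered out beforehand (e.g. exchanges, contracts)
    max_depth -- assumed hop limit; the abstract does not state one

    Returns the set of reached accounts and the edges among them.
    """
    edges = list(edges)
    adjacency = {}
    for sender, receiver in edges:
        if sender in excluded or receiver in excluded:
            continue  # drop any transaction touching a filtered address
        # Assumption: direction is ignored when exploring neighbours.
        adjacency.setdefault(sender, set()).add(receiver)
        adjacency.setdefault(receiver, set()).add(sender)

    depth = {seed: 0 for seed in seeds if seed not in excluded}
    queue = deque(depth)
    while queue:
        node = queue.popleft()
        if depth[node] >= max_depth:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)

    reached = set(depth)
    sub_edges = [(s, r) for s, r in edges if s in reached and r in reached]
    return reached, sub_edges


# Toy example with made-up addresses; "exchange" plays the role of a filtered node.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "exchange"), ("exchange", "e")]
accounts, links = bfs_subgraph(toy_edges, seeds={"a"}, excluded={"exchange"})
```

In the toy example, account "e" is not reached because its only link to the seed passes through the filtered address, which is the effect the filtering step is meant to have on linkability.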

(This article belongs to the Special Issue Blockchain Applications in Data Management and Governance)

Open Access Data Descriptor

Factory-Based Vibration Data for Bearing-Fault Detection

Adam Lundström

and

Mattias O’Nils

Data 2023, 8(7), 115; https://doi.org/10.3390/data8070115 - 28 Jun 2023

Abstract

The importance of preventing failures in bearings has led to a large amount of research being conducted to find methods for fault diagnostics and prognostics. Many of these solutions, such as deep learning methods, require a significant amount of data to perform well. This is a reason why publicly available data are important, and there currently exist several open datasets that contain different conditions and faults. However, one challenge is that almost all of these data come from a laboratory setting, where conditions might differ from those found in an industrial environment where the methods are intended to be used. This also means that there may be characteristics of the industrial data that are important to take into account. Therefore, this study describes a completely new dataset for bearing faults from a pulp mill. The analysis of the data shows that the faults vary significantly in terms of fault development, rotation speed, and the amplitude of the vibration signal. It also suggests that methods built for this environment need to consider that no historical examples of faults in the target domain exist and that external events can occur that are not related to any condition of the bearing. Full article

Open Access Data Descriptor

A Survey Dataset Evaluating Perceptions of Civil Engineering Students about Building Information Modelling (BIM)

and

Data 2023, 8(7), 114; https://doi.org/10.3390/data8070114 - 28 Jun 2023

Abstract

The implementation of Building Information Modelling (BIM) technologies has become increasingly central in the design, construction and maintenance of both civil structures and infrastructures. As more and more software houses develop new BIM software solutions and a wide range of private and public stakeholders employ them, several educational institutes across the globe strive to expand their teaching portfolio to encompass learning and teaching of BIM. This dataset deals with the perceptions expressed by all the civil engineering undergraduate students who attended an academic course specifically about BIM at the University of Stavanger (UiS), Norway, during the second semester of 2022. The survey was divided into five parts and collected information regarding as many overarching aspects: socio-demographic data, perceptions about BIM before and after course attendance, satisfaction with the academic course and the way it was conducted. Considering the very moderate sample size (28 students) and potential biases due to the specific context of the University of Stavanger, the dataset can provide a useful insight into teaching approaches and future curriculum development, rather than indicating major and generalized trends in BIM education. As the questionnaire responses shed light on the feedback and perceptions expressed by university students dealing with BIM for the first time, the resulting dataset can offer a straightforward appreciation of students’ cognitive behaviour in BIM education. Full article

Open Access Article

VPTD: Human Face Video Dataset for Personality Traits Detection

and

Data 2023, 8(7), 113; https://doi.org/10.3390/data8070113 - 22 Jun 2023

Abstract

In this paper, we propose a dataset for personality traits detection based on human face videos. Ground truth data have been annotated using the IPIP-50 personality test that every participant completed. To collect the dataset, we developed a web-based platform that allows us to acquire spontaneous answers to predefined questions from the respondents. The website allows the participants to record an interactive interview in order to imitate a real-life interview. The dataset includes 38 videos (2 min on average) of people of different races, genders, and ages. In the paper, we present the top five personality traits calculated based on the test, as well as the top five personality traits calculated by our own model, which determines this information based on video analysis. We provide a statistical analysis of the collected dataset, and we also applied a K-means clustering algorithm to cluster the data and present the clustering results. Full article

Open Access Data Descriptor

RipSetCocoaCNCH12: Labeled Dataset for Ripeness Stage Detection, Semantic and Instance Segmentation of Cocoa Pods

Juan Felipe Restrepo-Arias

María Isabel Salinas-Agudelo

María Isabel Hernandez-Pérez

Alejandro Marulanda-Tobón

and

María Camila Giraldo-Carvajal

Data 2023, 8(6), 112; https://doi.org/10.3390/data8060112 - 18 Jun 2023

Abstract

Fruit counting and ripeness detection are computer vision applications that have gained strength in recent years due to the advancement of new algorithms, especially those based on artificial neural networks (ANNs), better known as deep learning. In agriculture, those algorithms capable of fruit counting, including information about their ripeness, are mainly applied to make production forecasts or plan different activities such as fertilization or crop harvest. This paper presents the RipSetCocoaCNCH12 dataset of cocoa pods labeled at four different ripeness stages: stage 1 (0–2 months), stage 2 (2–4 months), stage 3 (4–6 months), and harvest stage (>6 months). An additional class was also included for pods aborted by plants in the early stage of development. A total of 4116 images were labeled to train algorithms that mainly perform semantic and instance segmentation. The labeling was carried out with CVAT (Computer Vision Annotation Tool). The dataset, therefore, includes labeling in two formats: COCO 1.0 and segmentation mask 1.1. The images were taken with different mobile devices (smartphones), in field conditions, during the harvest season at different times of the day, which could allow the algorithms to be trained with data that includes many variations in lighting, colors, textures, and sizes of the cocoa pods. As far as we know, this is the first openly available dataset for cocoa pod detection with semantic segmentation for five classes, 4116 images, and 7917 instances, comprising RGB images and two different formats for labels. With the publication of this dataset, we expect that researchers in smart farming, especially in cocoa cultivation, can benefit from the quantity and variety of images it contains. Full article

Open Access Data Descriptor

Self-Reported Mental Health and Psychosocial Correlates during the COVID-19 Pandemic: Data from the General Population in Italy

and

Maria Cristina Verrocchio

Data 2023, 8(6), 111; https://doi.org/10.3390/data8060111 - 16 Jun 2023

Abstract

The COVID-19 pandemic tremendously impacted people’s day-to-day activities and mental health. This article describes the dataset used to investigate the psychological impact of the first national lockdown on the general Italian population. For this purpose, an online survey was disseminated via Qualtrics between 1 April and 20 April 2020, to record various socio-demographic and psychological variables. The measures included both validated (namely, the Impact of the Event Scale-Revised, the Perceived Stress Scale, the nine-item Patient Health Questionnaire, the seven-item Generalized Anxiety Disorder scale, the Big Five Inventory 10-Item, and the Whiteley Index-7) and ad hoc questionnaires (nine items to investigate in-group and out-group trust). The final sample comprised 4081 participants (18–85 years old). The dataset could be helpful to other researchers in understanding the psychological impact of the COVID-19 pandemic and its related preventive and protective measures. Furthermore, the present data might help shed some light on the role of individual differences in response to traumatic events. Finally, this dataset can increase the knowledge in investigating psychological distress, health anxiety, and personality traits.

Open Access Article

Deep Learning-Based Black Spot Identification on Greek Road Networks

Ioannis Karamanlis

Alexandros Kokkalis

Vassilios Profillidis

and

Data 2023, 8(6), 110; https://doi.org/10.3390/data8060110 - 16 Jun 2023

Abstract

Black spot identification, a spatiotemporal phenomenon, involves analysing the geographical location and time-based occurrence of road accidents. Typically, this analysis examines specific locations on road networks during set time periods to pinpoint areas with a higher concentration of accidents, known as black spots. By evaluating these problem areas, researchers can uncover the underlying causes and reasons for increased collision rates, such as road design, traffic volume, driver behaviour, weather, and infrastructure. However, challenges in identifying black spots include limited data availability, data quality, and assessing contributing factors. Additionally, evolving road design, infrastructure, and vehicle safety technology can affect black spot analysis and determination. This study focused on traffic accidents in Greek road networks to recognize black spots, utilizing data from police and government-issued car crash reports. The study produced a publicly available dataset called Black Spots of North Greece (BSNG) and a highly accurate identification method.

(This article belongs to the Special Issue Signal Processing for Data Mining)

News

31 July 2023
MDPI’s 2022 Best PhD Thesis Awards in Computer Science and Mathematics—Winners Announced

31 July 2023
MDPI’s 2022 Young Investigator Awards in Computer Science and Mathematics—Winners Announced

31 July 2023
MDPI’s 2022 Outstanding Reviewer Awards in Computer Science and Mathematics—Winners Announced

Topics

Topic in Applied Sciences, Biomedicines, BioMedInformatics, Data, Life

Machine Learning Techniques Driven Medicine Analysis
Topic Editors: Chunhua Su, Celestine Iwendi, Thippa Reddy Gadekallu, Keping Yu
Deadline: 10 August 2023

Topic in Data, Future Internet, Information, Mathematics, Symmetry

Application of Deep Learning Method in 6G Communication Technology
Topic Editors: Mohamed Abouhawwash, K. Venkatachalam
Deadline: 31 March 2024

Topic in Applied Sciences, Batteries, Buildings, Data, Electricity, Electronics, Energies, Smart Cities

Smart Energy Systems, 2nd Edition
Topic Editors: Hugo Morais, Rui Castro, Cindy Guzman
Deadline: 31 May 2024

Topic in Algorithms, Data, Information, Mathematics, Symmetry

Decision-Making and Data Mining for Sustainable Computing
Topic Editors: Sunil Jha, Malgorzata Rataj, Xiaorui Zhang
Deadline: 30 November 2024

Special Issues

Special Issue in Data

Blockchain Applications in Data Management and Governance
Guest Editors: Bijan Raahemi, Waeal J. Obidallah
Deadline: 31 August 2023

Special Issue in Data

Data Science in Fintech
Guest Editors: Henry Han, Qiannong (Chan) Gu, Diane Li, Tie Wei, Jeffrey Yi-Lin Forrest
Deadline: 10 October 2023

Special Issue in Data

Privacy and Trust in Smart Cities
Guest Editors: M Shahriar Rahman, Anirban Basu
Deadline: 16 November 2023

Special Issue in Data

Information Systems Innovation for Business: Change, Growth and Future Impact
Guest Editors: Varun Gupta, Chetna Gupta, Leandro Ferreira Pereira, Lawrence Peters, Antonio Ferreras
Deadline: 30 November 2023

Topical Collections

Topical Collection in Data

Modern Geophysical and Climate Data Analysis: Tools and Methods
Collection Editors: Vladimir Sreckovic, Zoran Mijic
