Search Results (846)

Search Parameters:
Journal = Data

Data Descriptor
Anomaly Detection in Student Activity in Solving Unique Programming Exercises: Motivated Students against Suspicious Ones
Data 2023, 8(8), 129; https://doi.org/10.3390/data8080129 - 08 Aug 2023
Viewed by 102
Abstract
This article presents a dataset containing messages from the Digital Teaching Assistant (DTA) system, which records the results from the automatic verification of students’ solutions to unique programming exercises of 11 various types. These results are automatically generated by the system, which automates a massive Python programming course at MIREA—Russian Technological University (RTU MIREA). The DTA system is trained to distinguish between approaches to solve programming exercises, as well as to identify correct and incorrect solutions, using intelligent algorithms responsible for analyzing the source code in the DTA system using vector representations of programs based on Markov chains, calculating pairwise Jensen–Shannon distances for programs and using a hierarchical clustering algorithm to detect high-level approaches used by students in solving unique programming exercises. In the process of learning, each student must correctly solve 11 unique exercises in order to receive admission to the intermediate certification in the form of a test. In addition, a motivated student may try to find additional approaches to solve exercises they have already solved. At the same time, not all students are able or willing to solve the 11 unique exercises proposed to them; some will resort to outside help in solving all or part of the exercises. Since all information about the interactions of the students with the DTA system is recorded, it is possible to identify different types of students. First of all, the students can be classified into 2 classes: those who failed to solve 11 exercises and those who received admission to the intermediate certification in the form of a test, having solved the 11 unique exercises correctly. However, it is possible to identify classes of typical, motivated and suspicious students among the latter group based on the proposed dataset. 
The proposed dataset can be used to develop regression models that will predict outbursts of student activity when interacting with the DTA system, to solve clustering problems, to identify groups of students with a similar behavior model in the learning process and to develop intelligent data classifiers that predict the students’ behavior model and draw appropriate conclusions, not only at the end of the learning process but also during the course of it in order to motivate all students, even those who are classified as suspicious, to visualize the results of the learning process using various tools. Full article

Data Descriptor
VEPL Dataset: A Vegetation Encroachment in Power Line Corridors Dataset for Semantic Segmentation of Drone Aerial Orthomosaics
Data 2023, 8(8), 128; https://doi.org/10.3390/data8080128 - 04 Aug 2023
Viewed by 259
Abstract
Vegetation encroachment in power line corridors poses multiple problems for modern energy-dependent societies. Failures due to the contact between power lines and vegetation can result in power outages and millions of dollars in losses. To address this problem, UAVs have emerged as a promising solution due to their ability to quickly and affordably monitor long corridors through autonomous flights or being remotely piloted. However, the extensive and manual task that requires analyzing every image acquired by the UAVs when searching for the existence of vegetation encroachment has led many authors to propose the use of Deep Learning to automate the detection process. Despite the advantages of using a combination of UAV imagery and Deep Learning, there is currently a lack of datasets that help to train Deep Learning models for this specific problem. This paper presents a dataset for the semantic segmentation of vegetation encroachment in power line corridors. RGB orthomosaics were obtained for a rural road area using a commercial UAV. The dataset is composed of pairs of tessellated RGB images, coming from the orthomosaic and corresponding multi-color masks representing three different classes: vegetation, power lines, and the background. A detailed description of the image acquisition process is provided, as well as the labeling task and the data augmentation techniques, among other relevant details to produce the dataset. Researchers would benefit from using the proposed dataset by developing and improving strategies for vegetation encroachment monitoring using UAVs and Deep Learning. Full article
(This article belongs to the Section Spatial Data Science and Digital Earth)

Data Descriptor
eMailMe: A Method to Build Datasets of Corporate Emails in Portuguese
Data 2023, 8(8), 127; https://doi.org/10.3390/data8080127 - 31 Jul 2023
Viewed by 324
Abstract
One of the areas in which knowledge management has application is in companies that are concerned with maintaining and disseminating their practices among their members. However, studies involving these two domains may end up suffering from the issue of data confidentiality. Furthermore, it is difficult to find data regarding organizations’ processes and associated knowledge. Therefore, this paper presents a method to support the generation of a labeled dataset composed of texts that simulate corporate emails containing sensitive information regarding disclosure, written in Portuguese. The method begins with the definition of the dataset’s size and content distribution; the structure of its emails’ texts; and the guidelines for specialists to build the emails’ texts. It aims to create datasets that can be used in the validation of a tacit knowledge extraction process considering the 5W1H approach for the resulting base. The method was applied to create a dataset with content related to several domains, such as Federal Court and Registry Office and Marketing, giving it diversity and realism, while simulating real-world situations in the specialists’ professional life. The dataset generated is available in an open-access repository so that it can be downloaded and, eventually, expanded. Full article
(This article belongs to the Topic Methods for Data Labelling for Intelligent Systems)

Data Descriptor
Datasets of Simulated Exhaled Aerosol Images from Normal and Diseased Lungs with Multi-Level Similarities for Neural Network Training/Testing and Continuous Learning
Data 2023, 8(8), 126; https://doi.org/10.3390/data8080126 - 31 Jul 2023
Viewed by 256
Abstract
Although exhaled aerosols and their patterns may seem chaotic in appearance, they inherently contain information related to the underlying respiratory physiology and anatomy. This study presented a multi-level database of simulated exhaled aerosol images from both normal and diseased lungs. An anatomically accurate mouth-lung geometry extending to G9 was modified to model two stages of obstructions in small airways and physiology-based simulations were utilized to capture the fluid-particle dynamics and exhaled aerosol images from varying breath tests. The dataset was designed to test two performance metrics of convolutional neural network (CNN) models when used for transfer learning: interpolation and extrapolation. To this aim, three testing datasets with decreasing image similarities were developed (i.e., level 1, inbox, and outbox). Four network models (AlexNet, ResNet-50, MobileNet, and EfficientNet) were tested and the performances of all models decreased for the outbox test images, which were outside the design space. The effect of continuous learning was also assessed for each model by adding new images into the training dataset and the newly trained network was tested at multiple levels. Among the four network models, ResNet-50 excelled in performance in both multi-level testing and continuous learning, the latter of which enhanced the accuracy of the most challenging classification task (i.e., 3-class with outbox test images) from 60.65% to 98.92%. The datasets can serve as a benchmark training/testing database for validating existent CNN models or quantifying the performance metrics of new CNN models. Full article
(This article belongs to the Special Issue Artificial Intelligence and Big Data Applications in Diagnostics)

Data Descriptor
Quantitative Metabolomic Dataset of Avian Eye Lenses
Data 2023, 8(8), 125; https://doi.org/10.3390/data8080125 - 31 Jul 2023
Viewed by 252
Abstract
Metabolomics is a powerful set of methods that uses analytical techniques to identify and quantify metabolites in biological samples, providing a snapshot of the metabolic state of a biological system. In medicine, metabolomics may help to reveal the molecular basis of a disease, make a diagnosis, and monitor treatment responses, while in agriculture, it can improve crop yields and plant breeding. However, animal metabolomics faces several challenges due to the complexity and diversity of animal metabolomes, the lack of standardized protocols, and the difficulty in interpreting metabolomic data. The current dataset includes quantitative metabolomic profiles of eye lenses from 26 bird species (111 specimens) that can aid researchers in developing new experiments, mathematical models, and integrating with other “-omics” data. The dataset includes raw 1H NMR spectra, protocols for sample preparation, and data preprocessing, with the final table containing information on the abundance of 89 reliably identified and quantified metabolites. The dataset is quantitative, making it relevant for supplementing with new specimens or comparison groups, followed by data mining and expected new interpretations. The data were obtained using the bird specimens collected in compliance with ethical standards and revealed potential differences in metabolic pathways due to phylogenetic differences or environmental exposure. Full article

Article
Blockchain Payment Services in the Hospitality Sector: The Mediating Role of Data Security on Utilisation Efficiency of the Customer
Data 2023, 8(8), 123; https://doi.org/10.3390/data8080123 - 30 Jul 2023
Viewed by 256
Abstract
Blockchain technology has the potential to completely transform the hospitality sector by offering a safe, open, and effective method of payment. Increased customer utilisation efficiency may result from this. This study looks into how blockchain payment methods affect hotel customers’ intentions to stay loyal by devising four hypotheses. A questionnaire was specifically created and self-administered for this study as a data-gathering tool and distributed to hotel customers. The I.B.M. SPSS and Amos software packages were used to analyse the data of the 301 valid responses. Findings show that hospitality customers may use blockchain payment services if the customer is satisfied with the data security of this payment system. The study also highlighted that customer data security mediated the association between utilisation efficiency and blockchain payment systems. Blockchain payment services can affect visitors’ intentions to stay loyal by impacting data security and consumer happiness. Results suggest that blockchain payment systems can be useful for hospitality firms looking to increase client utilisation efficiency. Blockchain can simplify visitor booking and payment processes by providing a safe, open, and effective transacting method. This may result in a satisfying encounter that visitors are more inclined to recall and repeat. Full article
(This article belongs to the Special Issue Blockchain Applications in Data Management and Governance)

Article
Measuring the Effect of Fraud on Data-Quality Dimensions
Data 2023, 8(8), 124; https://doi.org/10.3390/data8080124 - 30 Jul 2023
Viewed by 183
Abstract
Data preprocessing moves the data from raw to ready for analysis. Data resulting from fraud compromises the quality of the data and the resulting analysis. It can exist in datasets such that it goes undetected since it is included in the analysis. This study proposed a process for measuring the effect of fraudulent data during data preparation and its possible influence on quality. The five-step process begins with identifying the business rules related to the business process(es) affected by fraud and their associated quality dimensions. This is followed by measuring the business rules in the specified timeframe, detecting fraudulent data, cleaning them, and measuring their quality after cleaning. The process was implemented in the case of occupational fraud within a hospital context and the illegal issuance of undeserved sick leave. The aim of the application is to identify the quality dimensions that are influenced by the injected fraudulent data and how these dimensions are affected. This study agrees with the existing literature and confirms the effects of fraud on timeliness, coherence, believability, and interpretability. However, it did not show any effect on consistency. Further studies are needed to arrive at a generalizable list of the quality dimensions that fraud can affect. Full article

Article
A Wavelet-Decomposed WD-ARMA-GARCH-EVT Model Approach to Comparing the Riskiness of the BitCoin and South African Rand Exchange Rates
Data 2023, 8(7), 122; https://doi.org/10.3390/data8070122 - 24 Jul 2023
Viewed by 369
Abstract
In this paper, a hybrid of a Wavelet Decomposition–Generalised Auto-Regressive Conditional Heteroscedasticity–Extreme Value Theory (WD-ARMA-GARCH-EVT) model is applied to estimate the Value at Risk (VaR) of BitCoin (BTC/USD) and the South African Rand (ZAR/USD). The aim is to measure and compare the riskiness of the two currencies. New and improved estimation techniques for VaR have been suggested in the last decade in the aftermath of the global financial crisis of 2008. This paper aims to provide an improved alternative to the already existing statistical tools in estimating a currency VaR empirically. Maximal Overlap Discrete Wavelet Transform (MODWT) and two mother wavelet filters on the returns series are considered in this paper, viz., the Haar and Daubechies (d4). The findings show that BitCoin/USD is riskier than ZAR/USD since it has a higher VaR per unit invested in each currency. At the 99% significance level, BitCoin/USD has average values of VaR of 2.71% and 4.98% for the WD-ARMA-GARCH-GPD and WD-ARMA-GARCH-GEVD models, respectively; and this is slightly higher than the respective 2.69% and 3.59% for the ZAR/USD. The average BitCoin/USD returns of 0.001990 are higher than ZAR/USD returns of −0.000125. These findings are consistent with the mean-variance portfolio theory, which suggests a higher yield for riskier assets. Based on the p-values of the Kupiec likelihood ratio test, the hybrid model adequacy is largely accepted, as p-values are greater than 0.05, except for the WD-ARMA-GARCH-GEVD models at a 99% significance level for both currencies. The findings are helpful to financial risk practitioners and forex traders in formulating their diversification and hedging strategies and ascertaining the risk-adjusted capital requirement to be set aside as a cushion in the event of the occurrence of an actual loss. Full article

Data Descriptor
Knowledge Discovery and Dataset for the Improvement of Digital Literacy Skills in Undergraduate Students
Data 2023, 8(7), 121; https://doi.org/10.3390/data8070121 - 20 Jul 2023
Viewed by 356
Abstract
For over two decades, scholars and practitioners have emphasized the importance of digital literacy, yet the existing datasets are insufficient for establishing learning analytics in Thailand. Learning analytics focuses on gathering and analyzing student data to optimize learning tools and activities to improve students’ learning experiences. The main problem is that the ICT skill levels of the youth are rather low in Thailand. To facilitate research in this field, this study has compiled a dataset containing information from the IC3 digital literacy certification delivered at the Rajamangala University of Technology Thanyaburi (RMUTT) in Thailand between 2016 and 2023. This dataset is unique since it includes demographic and academic records about undergraduate students. The dataset was collected and underwent a preparation process, including data cleansing, anonymization, and release. This data enables the examination of student learning outcomes, represented by a dataset containing information about 45,603 records with students’ certification assessment scores. This compiled dataset provides a rich resource for researchers studying digital literacy and learning analytics. It offers researchers the opportunity to gain valuable insights, inform evidence-based educational practices, and contribute to the ongoing efforts to improve digital literacy education in Thailand and beyond. Full article

Data Descriptor
PoPu-Data: A Multilayered, Simultaneously Collected Lying Position Dataset
Data 2023, 8(7), 120; https://doi.org/10.3390/data8070120 - 16 Jul 2023
Viewed by 381
Abstract
This study presents a dataset containing three layers of data that are useful for body position classification and all uses related to it. The PoPu dataset contains simultaneously collected data from two different sensor sheets—one placed over and one placed under a mattress; furthermore, a segmentation data layer was added where different body parts are identified using the pressure data from the sensors over the mattress. The data included were gathered from 60 healthy volunteers distributed among the different gathered characteristics: namely sex, weight, and height. This dataset can be used for position classification, assessing the viability of sensors placed under a mattress, and in applications regarding bedded or lying people or sleep-related disorders. Full article

Data Descriptor
Proteomic Shift in Mouse Embryonic Fibroblasts Pfa1 during Erastin, ML210, and BSO-Induced Ferroptosis
Data 2023, 8(7), 119; https://doi.org/10.3390/data8070119 - 12 Jul 2023
Viewed by 433
Abstract
Ferroptosis is a unique variety of non-apoptotic cell death, driven by massive lipid oxidation in an iron-dependent manner. Since ferroptosis was introduced as a concept in 2012, it has demonstrated its essential role in the pathogenesis in neurodegenerative diseases and an important role in therapy-resistant cancer cells. Thus, detailed molecular understanding of both canonical and alternative ferroptosis pathways is required. There is a set of widely used chemical agents to modulate ferroptosis using different pathway targets: erastin blocks cystine–glutamate antiporter, system xc-; ML210 directly inactivates GPX4; and L-buthionine sulfoximine (BSO) inhibits γ-glutamylcysteine synthetase, an essential enzyme for glutathione synthesis de novo. Most studies have focused on the lipidomic profiling of model systems undergoing death in a ferroptotic modality. In this study, we developed high-quality shotgun proteome sequencing during ferroptosis induction by three widely used chemical agents (erastin, ML210, and BSO) before and after 24 and 48 h of treatment. Chromato-mass spectra were registered in DDA mode and are suitable for further label-free quantification. Both processed and raw files are publicly available and could be a valuable dynamic proteome map for further ferroptosis investigation. Full article

Data Descriptor
A Semantically Annotated 15-Class Ground Truth Dataset for Substation Equipment to Train Semantic Segmentation Models
Data 2023, 8(7), 118; https://doi.org/10.3390/data8070118 - 05 Jul 2023
Viewed by 528
Abstract
The lack of annotated semantic segmentation datasets for electrical substations in the literature poses a significant problem for machine learning tasks; before training a model, a dataset is needed. This paper presents a new dataset of electric substations with 1660 images annotated with 15 classes, including insulators, disconnect switches, transformers and other equipment commonly found in substation environments. The images were captured using a combination of human, fixed and AGV-mounted cameras at different times of the day, providing a diverse set of training and testing data for algorithm development. In total, 50,705 annotations were created by a team of experienced annotators, using a standardized process to ensure accuracy across the dataset. The resulting dataset provides a valuable resource for researchers and practitioners working in the fields of substation automation, substation monitoring and computer vision. Its availability has the potential to advance the state of the art in this important area. Full article
(This article belongs to the Topic Methods for Data Labelling for Intelligent Systems)

Data Descriptor
Assessment of Maize Silage Quality under Different Pre-Ensiling Conditions
Data 2023, 8(7), 117; https://doi.org/10.3390/data8070117 - 02 Jul 2023
Viewed by 476
Abstract
Maize silage suffers from several factors that affect the final quality and, to some extent, pre-ensiled conditions that can be potentially tuned during harvesting. After assessing new indices for silage quality under lab-scale conditions, several trials have been conducted to find associations between fresh maize characteristics and silage features. Among the first, we included field input levels, FAO class, maturity stage, use of bacterial inoculants, sealing delay and chemical traits, whereas, among the latter, we assessed density and porosity, pH, fermentative profile, dry matter loss and aerobic stability. The trials were conducted using vacuum bags or mini silo buckets. More than 1500 maize samples harvested in Northeast Italy were analysed during the 2016–2022 period. Moreover, to evaluate silage aerobic stability, the fermentative profile and temperature were measured 14 days after the opening of the silo. The association between silage quality and aerobic stability was assessed, and a prognostic risk score was used to calculate the probability of aerobic instability. The dataset could provide baseline information to promote the continuous improvement of maize silage management from different botanical and crop fields, thus improving agronomic and animal farm resource allocation from a precision agriculture perspective. Full article
Data Descriptor
Dataset of Linkability Networks of Ethereum Accounts Involved in NFT Trading of Top 15 NFT Collections
Data 2023, 8(7), 116; https://doi.org/10.3390/data8070116 - 28 Jun 2023
Viewed by 409
Abstract
In this paper, we present subgraphs of Ethereum wallets involved in NFT trades of the top 15 ERC721 NFT collections. To obtain the subgraphs, we have extracted the Ethereum transaction graph from a live Ethereum node and filtered out exchanges, mining pools, and smart contracts. For each of the selected collections, we identified the set of accounts involved in NFT trading, which we used to perform a breadth-first search in the Ethereum transaction graph to obtain a subgraph. These subgraphs can offer insight into the linkability of accounts participating in NFT trading on the Ethereum blockchain. Full article
(This article belongs to the Special Issue Blockchain Applications in Data Management and Governance)

Data Descriptor
Factory-Based Vibration Data for Bearing-Fault Detection
Data 2023, 8(7), 115; https://doi.org/10.3390/data8070115 - 28 Jun 2023
Viewed by 552
Abstract
The importance of preventing failures in bearings has led to a large amount of research being conducted to find methods for fault diagnostics and prognostics. Many of these solutions, such as deep learning methods, require a significant amount of data to perform well. This is a reason why publicly available data are important, and there currently exist several open datasets that contain different conditions and faults. However, one challenge is that almost all of these data come from a laboratory setting, where conditions might differ from those found in an industrial environment where the methods are intended to be used. This also means that there may be characteristics of the industrial data that are important to take into account. Therefore, this study describes a completely new dataset for bearing faults from a pulp mill. The analysis of the data shows that the faults vary significantly in terms of fault development, rotation speed, and the amplitude of the vibration signal. It also suggests that methods built for this environment need to consider that no historical examples of faults in the target domain exist and that external events can occur that are not related to any condition of the bearing. Full article
