Journal Description
Big Data and Cognitive Computing is an international, scientific, peer-reviewed, open access journal of big data and cognitive computing published quarterly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: CiteScore - Q1 (Management Information Systems)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 16.4 days after submission; acceptance to publication takes 3.9 days (median values for papers published in this journal in the first half of 2023).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.7 (2022)
Latest Articles
Executable Digital Process Twins: Towards the Enhancement of Process-Driven Systems
Big Data Cogn. Comput. 2023, 7(3), 139; https://doi.org/10.3390/bdcc7030139 - 08 Aug 2023
Abstract
The development of process-driven systems and advances in digital twins have given rise to new ways of monitoring and analyzing systems, namely digital process twins. Specifically, a digital process twin allows the monitoring of system behavior and the analysis of the execution status in order to improve the whole system. However, the concept of the digital process twin is still largely theoretical, and process-driven systems cannot yet benefit from it. In this regard, this work discusses how to effectively exploit a digital process twin and proposes an implementation that combines the monitoring, refinement, and enactment of system behavior. We demonstrate the proposed solution in a multi-robot scenario.
(This article belongs to the Special Issue Digital Twins for Complex Systems)
Open Access Article
Cumulative and Rolling Horizon Prediction of Overall Equipment Effectiveness (OEE) with Machine Learning
Big Data Cogn. Comput. 2023, 7(3), 138; https://doi.org/10.3390/bdcc7030138 - 02 Aug 2023
Abstract
Nowadays, high efficiency in manufacturing and assembly is an important and indispensable condition for the effectiveness and competitiveness of industrial companies. These enterprises systematically monitor their efficiency metrics through Key Performance Indicators (KPIs), based on different methods and tools. One of the most frequently used metrics is Overall Equipment Effectiveness (OEE), the product of availability, performance, and quality. In addition to monitoring, it is also necessary to predict efficiency, which can be implemented with the support of machine learning techniques. This paper presents and compares several supervised machine learning techniques, among others polynomial regression, lasso regression, ridge regression, and gradient boosting regression. The aim of this article is to determine the best estimation method for a semiautomatic assembly line with large batch sizes. The case study, presented with a real industrial example, answers the question of which of the cumulative or rolling horizon prediction methods is more accurate.
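The OEE definition cited in the abstract (the product of availability, performance, and quality) can be sketched in a few lines; the counter names and example numbers below are illustrative, not taken from the paper's case study:

```python
# Hypothetical illustration of the standard OEE formula:
# OEE = availability x performance x quality.
def oee(planned_time, run_time, ideal_cycle_time, total_count, good_count):
    """Compute Overall Equipment Effectiveness from shop-floor counters."""
    availability = run_time / planned_time
    performance = (ideal_cycle_time * total_count) / run_time
    quality = good_count / total_count
    return availability * performance * quality

# Example shift: 480 min planned, 400 min running, 0.5 min/part ideal,
# 700 parts produced, of which 680 are good.
print(round(oee(480, 400, 0.5, 700, 680), 3))  # → 0.708
```

A regression model such as those compared in the paper would then be fitted to a time series of such OEE values.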
Open Access Article
Predicting the Price of Bitcoin Using Sentiment-Enriched Time Series Forecasting
Big Data Cogn. Comput. 2023, 7(3), 137; https://doi.org/10.3390/bdcc7030137 - 31 Jul 2023
Abstract
Recently, various methods to predict the future price of financial assets have emerged. One promising approach is to combine the historical price with sentiment scores derived via sentiment analysis techniques. In this article, we focus on predicting the future price of Bitcoin, currently the most popular cryptocurrency. More precisely, we propose a hybrid approach, combining time series forecasting and sentiment prediction from microblogs, to predict the intraday price of Bitcoin. Moreover, in addition to standard sentiment analysis methods, we are the first to employ a fine-tuned BERT model for this task. We also introduce a novel weighting scheme in which the weight of each tweet's sentiment depends on the number of its creator's followers. For evaluation, we consider periods with strongly varying ranges of Bitcoin prices. This enables us to assess the models with respect to robustness and generalization to varied market conditions. Our experiments demonstrate that BERT-based sentiment analysis and the proposed weighting scheme improve upon previous methods. Specifically, our hybrid models that use linear regression as the underlying forecasting algorithm perform best in terms of mean absolute error (MAE of 2.67) and root mean squared error (RMSE of 3.28). However, more complicated models, particularly long short-term memory networks and temporal convolutional networks, tend to have generalization and overfitting issues, resulting in considerably higher MAE and RMSE scores.
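The follower-based weighting idea can be illustrated with a toy aggregate. The log-damping of follower counts below is an assumption made for this sketch, not the exact scheme the authors propose:

```python
import math

# Toy follower-weighted sentiment aggregate (illustrative only):
# each tweet's sentiment is weighted by a damped function of its
# author's follower count before averaging.
def weighted_sentiment(tweets):
    """tweets: list of (sentiment in [-1, 1], follower_count) pairs."""
    num = den = 0.0
    for sentiment, followers in tweets:
        w = math.log1p(followers)  # damp the influence of huge accounts
        num += w * sentiment
        den += w
    return num / den if den else 0.0

# A positive tweet from a large account outweighs a negative one
# from a small account, but does not dominate completely.
print(round(weighted_sentiment([(0.8, 10_000), (-0.5, 100)]), 3))
```

Such an aggregate sentiment series would then be fed, alongside the price history, into the forecasting model.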
(This article belongs to the Topic Artificial Intelligence Applications in Financial Technology)
Open Access Article
An Approach Based on Recurrent Neural Networks and Interactive Visualization to Improve Explainability in AI Systems
Big Data Cogn. Comput. 2023, 7(3), 136; https://doi.org/10.3390/bdcc7030136 - 31 Jul 2023
Abstract
This paper investigated the importance of explainability in artificial intelligence models and its application in the context of prediction in Formula 1. A step-by-step analysis was carried out, including collecting and preparing data from previous races, training an AI model to make predictions, and applying explainability techniques to that model. Two approaches were used: the attention technique, which visualizes the most relevant parts of the input data using heat maps, and the permutation importance technique, which evaluates the relative importance of features. The results revealed that feature length and qualifying performance are crucial variables for position predictions in Formula 1. These findings highlight the relevance of explainability in AI models, not only in Formula 1 but also in other fields and sectors, by ensuring fairness, transparency, and accountability in AI-based decision making. The results highlight the importance of considering explainability in AI models and provide a practical methodology for its implementation in Formula 1 and other domains.
(This article belongs to the Special Issue Deep Network Learning and Its Applications)
Open Access Article
EnviroStream: A Stream Reasoning Benchmark for Environmental and Climate Monitoring
Big Data Cogn. Comput. 2023, 7(3), 135; https://doi.org/10.3390/bdcc7030135 - 31 Jul 2023
Abstract
Stream Reasoning (SR) focuses on developing advanced approaches for applying inference to dynamic data streams; it has become increasingly relevant in application scenarios such as IoT, Smart Cities, Emergency Management, and Healthcare, despite being a relatively new field of research. The current lack of standardized formalisms and benchmarks has been hindering the comparison between different SR approaches. We propose a new benchmark, called EnviroStream, for evaluating SR systems on weather and environmental data. The benchmark includes queries and datasets of different sizes. We adopt I-DLV-sr, a recently released SR system based on Answer Set Programming, as a baseline for query modelling and experimentation. We also showcase continuous online reasoning via a web application.
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)
Open Access Article
Driving Excellence in Official Statistics: Unleashing the Potential of Comprehensive Digital Data Governance
Big Data Cogn. Comput. 2023, 7(3), 134; https://doi.org/10.3390/bdcc7030134 - 29 Jul 2023
Abstract
With the ubiquitous use of digital technologies and the consequent data deluge, official statistics faces new challenges and opportunities. In this context, strengthening official statistics through effective data governance will be crucial to ensure reliability, quality, and access to data. This paper presents a comprehensive framework for digital data governance for official statistics, addressing key components, such as data collection and management, processing and analysis, data sharing and dissemination, as well as privacy and ethical considerations. The framework integrates principles of data governance into digital statistical processes, enabling statistical organizations to navigate the complexities of the digital environment. Drawing on case studies and best practices, the paper highlights successful implementations of digital data governance in official statistics. The paper concludes by discussing future trends and directions, including emerging technologies and opportunities for advancing digital data governance.
(This article belongs to the Special Issue Revolutionizing Healthcare: Exploring the Latest Advances in Digital Health Technology)
Open Access Article
Evaluation Method of Electric Vehicle Charging Station Operation Based on Contrastive Learning
Big Data Cogn. Comput. 2023, 7(3), 133; https://doi.org/10.3390/bdcc7030133 - 24 Jul 2023
Abstract
This paper aims to address the issue of evaluating the operation of electric vehicle charging stations (EVCSs). Previous studies have commonly employed the method of constructing comprehensive evaluation systems, which greatly relies on manual experience for index selection and weight allocation. To overcome this limitation, this paper proposes an evaluation method based on natural language models for assessing the operation of charging stations. By utilizing the proposed SimCSEBERT model, this study analyzes the operational data, user charging data, and basic information of charging stations to predict the operational status and identify influential factors. Additionally, this study compared the evaluation accuracy and impact factor analysis accuracy of the baseline and the proposed model. The experimental results demonstrate that our model achieves a higher evaluation accuracy (operation evaluation accuracy = 0.9464; impact factor analysis accuracy = 0.9492) and effectively assesses the operation of EVCSs. Compared with traditional evaluation methods, this approach exhibits improved universality and a higher level of intelligence. It provides insights into the operation of EVCSs and user demands, allowing for the resolution of supply–demand contradictions that are caused by power supply constraints and the uneven distribution of charging demands. Furthermore, it offers guidance for more efficient and targeted strategies for the operation of charging stations.
(This article belongs to the Topic Application of Big Data and Deep Learning in Engineering Analysis and Design)
Open Access Article
The Development of a Kazakh Speech Recognition Model Using a Convolutional Neural Network with Fixed Character Level Filters
Big Data Cogn. Comput. 2023, 7(3), 132; https://doi.org/10.3390/bdcc7030132 - 20 Jul 2023
Abstract
This study is devoted to the transcription of human speech in the Kazakh language under dynamically changing conditions. It discusses key aspects related to the phonetic structure of the Kazakh language, technical considerations in collecting the transcribed audio corpus, and the use of deep neural networks for speech modeling. A high-quality transcribed audio corpus of 554 h was collected, capturing the frequencies of letters and syllables as well as demographic parameters such as the gender, age, and region of residence of native speakers. The corpus contains a universal vocabulary and serves as a valuable resource for the development of speech-related modules. Machine learning experiments were conducted using the DeepSpeech2 model, which includes a sequence-to-sequence architecture with an encoder, decoder, and attention mechanism. To increase the reliability of the model, filters initialized with character-level embeddings were introduced to reduce the dependence on accurate positioning in feature maps. The training process included the simultaneous preparation of convolutional filters for spectrograms and character-level features. The proposed approach, combining supervised and unsupervised learning methods, resulted in a 66.7% reduction in model weight while maintaining comparable accuracy. The evaluation on the test sample showed a 7.6% lower character error rate (CER) compared to existing models, demonstrating state-of-the-art performance. The proposed architecture also allows deployment on platforms with limited resources. Overall, this study presents a high-quality audio corpus, an improved speech recognition model, and promising results applicable to speech-related applications and languages beyond Kazakh.
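The character error rate (CER) reported above is a standard speech recognition metric: the edit distance between the reference and hypothesis transcripts divided by the reference length. A minimal implementation (not the authors' code) looks like:

```python
# CER = Levenshtein distance(reference, hypothesis) / len(reference),
# computed here with a standard rolling-row dynamic program.
def cer(reference, hypothesis):
    m, n = len(reference), len(hypothesis)
    d = list(range(n + 1))  # DP row for the empty-reference prefix
    for i in range(1, m + 1):
        prev, d[0] = d[0], i
        for j in range(1, n + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,       # deletion
                       d[j - 1] + 1,   # insertion
                       prev + (reference[i - 1] != hypothesis[j - 1]))
            prev = cur
    return d[n] / m if m else 0.0

print(cer("qazaq", "qazak"))  # one substitution over five characters → 0.2
```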
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
Open Access Article
A Real-Time Vehicle Speed Prediction Method Based on a Lightweight Informer Driven by Big Temporal Data
Big Data Cogn. Comput. 2023, 7(3), 131; https://doi.org/10.3390/bdcc7030131 - 15 Jul 2023
Abstract
At present, the design of modern vehicles requires improving driving performance while meeting emission standards, leading to increasingly complex power systems. In autonomous driving systems, accurate, real-time vehicle speed prediction is one of the key factors in achieving automated driving. Accurate prediction and optimal control based on future vehicle speeds are key strategies for dealing with ever-changing and complex real driving environments. However, driver behavior is uncertain and may be influenced by the surrounding driving environment, such as weather and road conditions. To overcome these limitations, we propose a real-time vehicle speed prediction method based on a lightweight deep learning model driven by big temporal data. First, the temporal data collected by automotive sensors are decomposed into a feature matrix through empirical mode decomposition (EMD). Then, an Informer model based on the attention mechanism is designed to extract key information for learning and prediction. During the iterative training of the Informer, redundant parameters are removed through importance measurement criteria to achieve real-time inference. Finally, experimental results demonstrate that the proposed method achieves superior speed prediction performance compared with state-of-the-art statistical modelling methods and deep learning models. Tests on edge computing devices also confirmed that the designed model can meet the requirements of actual tasks.
Open Access Article
A Guide to Data Collection for Computation and Monitoring of Node Energy Consumption
Big Data Cogn. Comput. 2023, 7(3), 130; https://doi.org/10.3390/bdcc7030130 - 11 Jul 2023
Abstract
The digital transition behind the new industrial revolution is largely driven by the application of intelligence and data. This boost leads to an increase in energy consumption, much of it associated with computing in data centers. This fact clashes with the growing need to save energy and improve energy efficiency, and it requires a more optimized use of resources. The deployment of new services in edge and cloud computing, virtualization, and software-defined networks requires a better understanding of consumption patterns, aimed at more efficient and sustainable models and a reduction in carbon footprints. These patterns are suitable to be exploited by machine, deep, and reinforcement learning techniques in pursuit of energy consumption optimization, which can ideally improve the energy efficiency of data centers and the large computing servers providing these kinds of services. For the application of these techniques, it is essential to investigate data collection processes to create initial information points. Datasets also need to be created to analyze how to diagnose systems and identify new ways of optimization. This work describes a data collection methodology used to create datasets that gather consumption data from a real-world environment dedicated to data centers, server farms, or similar architectures. Specifically, it covers the entire process of energy stimuli generation, data extraction, and data preprocessing. The evaluation and reproduction of this method is offered to the scientific community through an online repository created for this work, which hosts all the code available for download.
(This article belongs to the Special Issue Energy-Efficient IoT (Internet of Things) and Big Data Challenges for Connected Intelligence)
Open Access Article
An End-to-End Online Traffic-Risk Incident Prediction in First-Person Dash Camera Videos
Big Data Cogn. Comput. 2023, 7(3), 129; https://doi.org/10.3390/bdcc7030129 - 06 Jul 2023
Abstract
Predicting traffic-risk incidents in first-person video helps to ensure that a safe reaction can occur before an incident happens, across a wide range of driving scenarios and conditions. One challenge in building advanced driver assistance systems is to create an early warning system that lets the driver react safely and accurately while capturing the diversity of traffic-risk predictions in real-world applications. In this paper, we aim to bridge this gap by investigating two key research questions: the driver's current driving status as observed through online videos, and the types of other moving objects that lead to dangerous situations. To address these problems, we propose an end-to-end two-stage architecture: in the first stage, unsupervised learning is applied to collect all suspicious events during actual driving; in the second stage, supervised learning is used to classify the suspicious events from the first stage into common event types. To enrich the classification types, metadata from the first stage is passed to the second stage to mitigate data limitations while training our classification model. Through the online situation, our method runs fps on average with fps on standard deviation. Our quantitative evaluation shows that our method reaches 81.87% and 73.43% average F1-score on labeled data of the CST-S3D and real driving datasets, respectively. Furthermore, the proposed method has the potential to assist distribution companies in evaluating the driving performance of their drivers by automatically monitoring near-miss events and analyzing driving patterns for training programs to reduce future accidents.
(This article belongs to the Special Issue Deep Network Learning and Its Applications)
Open Access Article
Transfer Learning Approach to Seed Taxonomy: A Wild Plant Case Study
Big Data Cogn. Comput. 2023, 7(3), 128; https://doi.org/10.3390/bdcc7030128 - 04 Jul 2023
Cited by 2
Abstract
Plant taxonomy is the scientific study of the classification and naming of plant species. It is a branch of biology that aims to categorize and organize the diverse variety of plant life on earth. Traditionally, plant taxonomy has been performed using morphological and anatomical characteristics, such as leaf shape, flower structure, and seed and fruit characters. Artificial intelligence (AI), machine learning, and especially deep learning can also play an instrumental role in plant taxonomy by automating the process of categorizing plant species based on the available features. This study investigated transfer learning techniques to analyze images of plants and extract features that can be used to cluster the species hierarchically using the k-means clustering algorithm. Several pretrained deep learning models were employed and evaluated. Two separate datasets were used, comprising seed images of wild plants collected from Egypt. Extensive experiments using the transfer learning method (DenseNet201) demonstrated that the proposed methods achieved superior accuracy compared to traditional methods, with a highest accuracy of 93% and an F1-score and area under the curve (AUC) of 95%, a considerable improvement over the state-of-the-art approaches in the literature.
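The clustering step described above can be illustrated with a minimal one-dimensional k-means. The real pipeline clusters high-dimensional DenseNet201 feature vectors, so this toy version (with made-up numbers) only shows the shape of the algorithm:

```python
# Toy 1-D k-means: alternate between assigning points to the nearest
# center and recomputing each center as the mean of its group.
def kmeans_1d(points, centers, iters=20):
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            groups[i].append(p)
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return sorted(centers)

# Two well-separated "feature" groups collapse onto their means.
print([round(c, 2) for c in
       kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 5.0])])
```

With image embeddings, the same loop runs over vectors and Euclidean distances, and the resulting clusters can be nested to obtain a hierarchy.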
(This article belongs to the Special Issue Recent Advances in Deep Transfer Learning Applications for Image Processing Problems and Big Data)
Open Access Article
Arabic Sentiment Analysis of YouTube Comments: NLP-Based Machine Learning Approaches for Content Evaluation
Big Data Cogn. Comput. 2023, 7(3), 127; https://doi.org/10.3390/bdcc7030127 - 03 Jul 2023
Abstract
YouTube is a popular video-sharing platform that offers a diverse range of content. Assessing the quality of a video without watching it poses a significant challenge, especially considering the recent removal of the dislike count feature on YouTube. Although comments have the potential to provide insights into video content quality, navigating through the comments section can be time-consuming and overwhelming for both content creators and viewers. This paper proposes an NLP-based model to classify Arabic comments as positive or negative. It was trained on a novel dataset of 4212 labeled comments, with a Kappa score of 0.818. The model uses six classifiers: SVM, Naïve Bayes (NB), Logistic Regression, KNN, Decision Tree, and Random Forest. It achieved 94.62% accuracy and an MCC score of 91.46% with NB; the Precision, Recall, and F1-measure for NB were 94.64%, 94.64%, and 94.62%, respectively. The Decision Tree performed worst, with 84.10% accuracy and an MCC score of 69.64% without TF-IDF. This study provides valuable insights for content creators seeking to improve their content and audience engagement by analyzing viewers' sentiments toward their videos. Furthermore, it bridges a literature gap by offering a comprehensive approach to Arabic sentiment analysis, which is currently underexplored in the field.
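The TF-IDF weighting the abstract refers to can be sketched in plain Python. This is a generic textbook formulation, not the paper's pipeline, which would additionally need Arabic-aware tokenization and a library vectorizer:

```python
import math
from collections import Counter

# Toy TF-IDF: term frequency within a document, scaled down by how
# many documents in the corpus contain the term.
def tfidf(docs):
    n = len(docs)
    df = Counter(t for d in docs for t in set(d.split()))
    out = []
    for d in docs:
        tf = Counter(d.split())
        total = sum(tf.values())
        out.append({t: (c / total) * math.log(n / df[t])
                    for t, c in tf.items()})
    return out

vecs = tfidf(["good video", "bad video"])
# "video" appears in every document, so its idf (and weight) is zero,
# while the discriminative words "good"/"bad" keep positive weight.
print(vecs[0]["good"] > vecs[0]["video"])  # → True
```

The resulting sparse vectors are what classifiers such as Naïve Bayes or SVM are trained on.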
Open Access Article
Industrial Insights on Digital Twins in Manufacturing: Application Landscape, Current Practices, and Future Needs
Big Data Cogn. Comput. 2023, 7(3), 126; https://doi.org/10.3390/bdcc7030126 - 29 Jun 2023
Abstract
The digital twin (DT) research field is experiencing rapid expansion, yet industrial practices in this area remain poorly understood. This paper aims to address this knowledge gap by sharing feedback and future requirements from the manufacturing industry. The methodology involves a survey that received 99 responses and interviews with 14 experts from 10 prominent UK organisations, most of which are involved in the defence industry. The survey and interviews explored topics such as DT design, return on investment, drivers, inhibitors, and future directions for DT development in manufacturing. The findings indicate that DTs should possess characteristics such as adaptability, scalability, interoperability, and the ability to support assets throughout their entire life cycle. On average, completed DT projects reach the breakeven point in less than two years. The primary motivators behind DT development were identified as autonomy, customer satisfaction, safety, awareness, optimisation, and sustainability, while the main obstacles include a lack of expertise, funding, and interoperability. This study concludes that the federation of twins and a paradigm shift in industrial thinking are essential for the future of DT development.
(This article belongs to the Special Issue Digital Twins for Complex Systems)
Open Access Systematic Review
Determining the Factors Influencing Business Analytics Adoption at Organizational Level: A Systematic Literature Review
Big Data Cogn. Comput. 2023, 7(3), 125; https://doi.org/10.3390/bdcc7030125 - 28 Jun 2023
Abstract
The adoption of business analytics (BA) has become increasingly important for organizations seeking to gain a competitive edge in today's data-driven business landscape. Hence, understanding the key factors influencing the adoption of BA at the organizational level is crucial for the successful implementation of these technologies. This paper presents a systematic literature review that follows the PRISMA technique to investigate the organizational, technological, and environmental factors that affect BA adoption. By conducting a thorough examination of pertinent research, this review consolidates the current understanding and pinpoints the essential elements that shape adoption. Out of a total of 614 articles published between 2012 and 2022, 29 final articles were carefully chosen. The findings highlight the significance of organizational, technological, and environmental factors in shaping BA adoption. By consolidating and analyzing the current body of research, this paper offers valuable insights for organizations aiming to adopt BA successfully and maximize its benefits at the organizational level. The synthesized findings also contribute to the existing literature and provide a foundation for future research in this field.
Open Access Article
Cognitive Network Science Reveals Bias in GPT-3, GPT-3.5 Turbo, and GPT-4 Mirroring Math Anxiety in High-School Students
Big Data Cogn. Comput. 2023, 7(3), 124; https://doi.org/10.3390/bdcc7030124 - 27 Jun 2023
Abstract
Large Language Models (LLMs) are becoming increasingly integrated into our lives. Hence, it is important to understand the biases present in their outputs in order to avoid perpetuating harmful stereotypes, which originate in our own flawed ways of thinking. This challenge requires developing new benchmarks and methods for quantifying affective and semantic bias, keeping in mind that LLMs act as psycho-social mirrors that reflect the views and tendencies that are prevalent in society. One such tendency that has harmful negative effects is the global phenomenon of anxiety toward math and STEM subjects. In this study, we introduce a novel application of network science and cognitive psychology to understand biases towards math and STEM fields in LLMs from ChatGPT, such as GPT-3, GPT-3.5, and GPT-4. Specifically, we use behavioral forma mentis networks (BFMNs) to understand how these LLMs frame math and STEM disciplines in relation to other concepts. We use data obtained by probing the three LLMs in a language generation task that has previously been applied to humans. Our findings indicate that LLMs have negative perceptions of math and STEM fields, associating math with negative concepts in 6 cases out of 10. We observe significant differences across OpenAI’s models: newer versions (i.e., GPT-4) produce 5× semantically richer, more emotionally polarized perceptions with fewer negative associations compared to older versions and high-school students. These findings suggest that advances in the architecture of LLMs may lead to increasingly less biased models that could even perhaps someday aid in reducing harmful stereotypes in society rather than perpetuating them.
(This article belongs to the Special Issue Feature-Rich Artificial Intelligence Models and Applications of Cognition)
Open Access Article
A New Big Data Processing Framework for the Online Roadshow
Big Data Cogn. Comput. 2023, 7(3), 123; https://doi.org/10.3390/bdcc7030123 - 27 Jun 2023
Abstract
The Online Roadshow, a new type of web application, is a digital marketing approach that aims to maximize contactless business engagement. It leverages web computing to conduct interactive game sessions via the internet. As a result, massive amounts of personal data are generated during the engagement process between the audience and the Online Roadshow (e.g., gameplay data and clickstream information). The high volume of data collected is valuable for more effective market segmentation in strategic business planning through data-driven processes such as web personalization and trend evaluation. However, the data storage and processing techniques used in conventional data analytics approaches are typically overloaded in such a computing environment. Hence, this paper proposes a new big data processing framework to improve the processing, handling, and storage of these large amounts of data. The proposed framework provides a dual-mode solution for processing the data generated by the Online Roadshow engagement process in both historical and real-time scenarios. Multiple functional modules, such as the Application Controller, the Message Broker, the Data Processing Module, and the Data Storage Module, were reformulated to provide a more efficient solution matching the needs of Online Roadshow data analytics procedures. Tests were conducted to compare the performance of the proposed framework against existing similar frameworks and to verify that it fulfills the data processing requirements of the Online Roadshow. The experimental results evidenced multiple advantages of the proposed framework over similar existing big data processing frameworks.
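The dual-mode idea (serving both historical and real-time analytics from the same event stream) can be sketched as a broker that fans each engagement event out to two consumers: one appends raw events to a historical store for batch jobs, the other updates streaming statistics incrementally. The class and module names below are illustrative, not the paper's actual implementation:

```python
# Minimal sketch of dual-mode event processing. Names are illustrative
# stand-ins for the framework's Message Broker and storage modules.
from collections import deque

class MessageBroker:
    """Toy publish/subscribe hub: fans each event out to all subscribers."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        for handler in self.subscribers:
            handler(event)

historical_store = deque()  # batch path: raw events kept for later analysis
realtime_counts = {}        # streaming path: incrementally updated stats

def store_event(event):
    historical_store.append(event)

def update_realtime(event):
    kind = event["type"]
    realtime_counts[kind] = realtime_counts.get(kind, 0) + 1

broker = MessageBroker()
broker.subscribe(store_event)      # historical mode
broker.subscribe(update_realtime)  # real-time mode

for e in [{"type": "click"}, {"type": "gameplay"}, {"type": "click"}]:
    broker.publish(e)

print(realtime_counts)        # {'click': 2, 'gameplay': 1}
print(len(historical_store))  # 3
```

A production system would replace the in-process broker with a distributed message queue, but the routing pattern is the same.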
Open Access Article
Empowering Short Answer Grading: Integrating Transformer-Based Embeddings and BI-LSTM Network
Big Data Cogn. Comput. 2023, 7(3), 122; https://doi.org/10.3390/bdcc7030122 - 21 Jun 2023
Abstract
Automated scoring systems have been revolutionized by natural language processing, enabling the evaluation of students’ diverse answers across various academic disciplines. However, this presents a challenge, as students’ responses may vary significantly in length, structure, and content. To tackle this challenge, this research introduces a novel automated model for short answer grading. The proposed model uses pretrained “transformer” models, specifically T5, in conjunction with a BI-LSTM architecture, which is effective in processing sequential data by considering both past and future context. This research evaluated several preprocessing techniques and different hyperparameters to identify the most efficient architecture. Experiments were conducted using a standard benchmark dataset, the North Texas Dataset, and achieved a state-of-the-art correlation value of 92.5 percent. The proposed model’s accuracy has significant implications for education, as it has the potential to save educators considerable time and effort while providing a reliable and fair evaluation for students, ultimately leading to improved learning outcomes.
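The grading head described above can be sketched in PyTorch: a bidirectional LSTM runs over token embeddings (random tensors below stand in for T5 encoder outputs), and a linear head maps the pooled states to a single score per answer. All dimensions are illustrative, not the paper's actual hyperparameters:

```python
# Sketch of a BI-LSTM grading head over transformer embeddings.
# Random tensors stand in for T5 encoder outputs; dims are illustrative.
import torch
import torch.nn as nn

class BiLSTMGrader(nn.Module):
    def __init__(self, emb_dim=64, hidden=32):
        super().__init__()
        # bidirectional=True processes the sequence both ways, so each
        # step has access to past and future context.
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # 2*hidden: fwd + bwd states

    def forward(self, embeddings):            # (batch, seq_len, emb_dim)
        out, _ = self.lstm(embeddings)
        pooled = out.mean(dim=1)              # average over the sequence
        return self.head(pooled).squeeze(-1)  # one score per answer

model = BiLSTMGrader()
fake_t5_output = torch.randn(4, 20, 64)  # 4 answers, 20 tokens each
scores = model(fake_t5_output)
print(scores.shape)  # torch.Size([4])
```

In the full pipeline the embeddings would come from a pretrained T5 encoder, and the predicted scores would be trained against human grades with a regression loss.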
(This article belongs to the Special Issue Artificial Intelligence and Natural Language Processing)
Open Access Article
The Value of Web Data Scraping: An Application to TripAdvisor
Big Data Cogn. Comput. 2023, 7(3), 121; https://doi.org/10.3390/bdcc7030121 - 21 Jun 2023
Abstract
Social Media Analytics (SMA) is increasingly relevant in today’s market dynamics. However, it must be used wisely, whether in promoting a product/brand or in interacting with customers, and this requires effective understanding and monitoring. One route is through web data scraping (WDS) tools, which allow users to select sites and platforms and compare their performance, and which can optimize the extraction of big data published on social media. Given current challenges, tourism (and its related sectors) can particularly take advantage of this source: this year carries hopes of tourism’s revival after a pandemic whose impacts still affect several activities. Many traders and entrepreneurs have already used these versatile tools, but do they really know their potential? The present study highlights the use of WDS to collect data from TripAdvisor’s social pages. Besides comparing competitors’ performance, companies also gain new knowledge of previously unnoticed preferences/habits, which contributes to more valuable innovations and results for them and for their customers. The approach used here is based on a project for smart tourism consultancy, starting from the identification of a gap in our region, to help tourism organizations enhance their digital presence and business model. Much can be detected in this large source of unstructured data quickly and easily without programming. Moreover, exploring code, either to refine the web scraper or to connect it with other platforms/apps, can be an object of future research to leverage consumer behavior prediction for more advanced interactions.
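The core WDS step (pulling structured fields such as ratings out of review markup) can be illustrated with the standard library alone. The HTML below is a simplified stand-in, not TripAdvisor's real page structure, which changes often and is governed by the site's terms of use:

```python
# Toy illustration of web data scraping: extracting ratings from review HTML.
# The markup is a simplified stand-in for a real review page.
from html.parser import HTMLParser

PAGE = """
<div class="review"><span class="rating">5</span><p>Great stay!</p></div>
<div class="review"><span class="rating">3</span><p>Average food.</p></div>
"""

class RatingScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_rating = False
        self.ratings = []

    def handle_starttag(self, tag, attrs):
        # Flag that the next text node is a rating value.
        if tag == "span" and ("class", "rating") in attrs:
            self.in_rating = True

    def handle_data(self, data):
        if self.in_rating:
            self.ratings.append(int(data))
            self.in_rating = False

scraper = RatingScraper()
scraper.feed(PAGE)
print(scraper.ratings)                              # [5, 3]
print(sum(scraper.ratings) / len(scraper.ratings))  # 4.0
```

Dedicated WDS tools wrap this kind of extraction in a point-and-click interface; once ratings and review texts are structured, competitor comparison and trend evaluation become straightforward aggregations.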
(This article belongs to the Special Issue Challenges and Perspectives of Social Networks within Social Computing)
Open Access Article
YOLO-v5 Variant Selection Algorithm Coupled with Representative Augmentations for Modelling Production-Based Variance in Automated Lightweight Pallet Racking Inspection
Big Data Cogn. Comput. 2023, 7(2), 120; https://doi.org/10.3390/bdcc7020120 - 14 Jun 2023
Abstract
The aim of this research is to develop an automated pallet inspection architecture with two key objectives: high performance with respect to defect classification and computational efficiency, i.e., a lightweight footprint. As automated pallet racking inspection via machine vision is a developing field, the procurement of racking datasets can be a difficult task. Therefore, the first contribution of this study was the proposal of several tailored augmentations generated by modelling production floor conditions/variances within warehouses. Secondly, a variant selection algorithm was proposed, starting with extreme-end analysis and providing a protocol for selecting the optimal architecture with respect to accuracy and computational efficiency. The proposed YOLO-v5n architecture generated the highest mAP@0.5 of 96.8% compared to previous works in the racking domain, with the lowest computational footprint in terms of the number of parameters, i.e., 1.9 M compared to 86.7 M for YOLO-v5x.
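The variant-selection protocol can be sketched as a simple trade-off rule: among candidates whose accuracy is within a tolerance of the best, pick the one with the fewest parameters. The YOLO-v5n figures below follow the abstract; the YOLO-v5x mAP value and the tolerance are illustrative assumptions, not the paper's reported numbers:

```python
# Sketch of accuracy-vs-footprint variant selection. YOLO-v5n figures are
# from the abstract; the v5x mAP and the tolerance are illustrative.
variants = {
    "yolov5n": {"map50": 0.968, "params_m": 1.9},    # abstract figures
    "yolov5x": {"map50": 0.965, "params_m": 86.7},   # mAP assumed here
}

def select_variant(candidates, map_tolerance=0.01):
    """Lightest variant whose mAP@0.5 is within tolerance of the best."""
    best_map = max(v["map50"] for v in candidates.values())
    eligible = {name: v for name, v in candidates.items()
                if best_map - v["map50"] <= map_tolerance}
    return min(eligible, key=lambda name: eligible[name]["params_m"])

print(select_variant(variants))  # yolov5n
```

Extreme-end analysis in the paper serves the same purpose: evaluating the smallest and largest variants first bounds the accuracy/footprint trade-off before testing intermediate models.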
News
31 July 2023
MDPI’s 2022 Best PhD Thesis Awards in Computer Science and Mathematics—Winners Announced
31 July 2023
MDPI’s 2022 Young Investigator Awards in Computer Science and Mathematics—Winners Announced
Topics
Topic in
AI, Applied Sciences, BDCC, Remote Sensing, Sensors
Deep Learning and Transformers’ Methods Applied to Remotely Captured Data
Topic Editors: Moulay A. Akhloufi, Mozhdeh Shahbazi
Deadline: 30 September 2023
Topic in
AI, Algorithms, Applied Sciences, BDCC, MAKE, Sensors
Artificial Intelligence and Fuzzy Systems
Topic Editors: Amelia Zafra, Jose Manuel Soto Hidalgo
Deadline: 30 November 2023
Topic in
Applied Sciences, BDCC, Photonics, Processes, Remote Sensing, Automation
Advances in AI-Empowered Beamline Automation and Data Science in Advanced Photon Sources
Topic Editors: Yi Zhang, Xiaogang Yang, Chunpeng Wang, Junrong Zhang
Deadline: 20 December 2023
Topic in
AI, Applied Sciences, BDCC, Sensors, Information
Applied Computing and Machine Intelligence (ACMI)
Topic Editors: Chuan-Ming Liu, Wei-Shinn Ku
Deadline: 31 December 2023
Special Issues
Special Issue in
BDCC
Sustainable Big Data Analytics and Machine Learning Technologies
Guest Editor: Jenq-Haur Wang
Deadline: 15 August 2023
Special Issue in
BDCC
Challenges and Perspectives of Social Networks within Social Computing
Guest Editors: Maria Chiara Caschera, Patrizia Grifoni, Fernando Ferri
Deadline: 31 August 2023
Special Issue in
BDCC
Managing Cybersecurity Threats and Increasing Organizational Resilience
Guest Editors: Peter R.J. Trim, Yang-Im Lee
Deadline: 30 September 2023
Special Issue in
BDCC
Data Science in Health Care
Guest Editors: Nadav Rappoport, Yuval Shahar, Hyojung Paik
Deadline: 20 October 2023
Topical Collections
Topical Collection in
BDCC
Machine Learning and Artificial Intelligence for Health Applications on Social Networks
Collection Editor: Carmela Comito