Forthcoming Articles
International Journal of Information and Communication Technology

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.
Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.
Online First articles are also listed here. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.
International Journal of Information and Communication Technology (70 papers in press)
Regular Issues
Abstract: This research tackles the persistent challenge of uncontrolled approximation errors and unreliable convergence in neural network-based methods for solving partial differential equations. We introduce a novel error-controlled implicit neural representation framework, which incorporates a trainable error indicator network and an adaptive weighting scheme to dynamically steer the optimisation process. Our approach utilises a dual-encoding architecture to represent physical fields with high fidelity and a cooperative training mechanism that iteratively estimates and reduces local errors. Experimental validation on a National Aeronautics and Space Administration (NASA) turbulent flat-plate boundary layer benchmark demonstrates that error-controlled implicit neural representation achieves a relative L2 error of 8.73 × 10⁻⁴, outperforming the best existing baseline by 42.6% and improving boundary-layer accuracy by 52.0%. Moreover, the proposed method reduces training time by 34.7%-55.2% while maintaining physically consistent solutions, confirming its efficacy and efficiency in error-aware numerical simulation. Keywords: partial differential equations; PDEs; adaptive optimisation; numerical simulation; implicit neural representation; INR. DOI: 10.1504/IJICT.2026.10078359
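For readers unfamiliar with the metric quoted above, the relative L2 error is conventionally the norm of the difference between predicted and reference fields divided by the norm of the reference field. The following is a minimal sketch of that standard definition only; the function name and sample values are hypothetical and are not taken from the paper.

```python
import math


def relative_l2_error(predicted, reference):
    """Return ||predicted - reference||_2 / ||reference||_2."""
    if len(predicted) != len(reference):
        raise ValueError("fields must be sampled at the same points")
    diff_sq = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    ref_sq = sum(r ** 2 for r in reference)
    if ref_sq == 0.0:
        raise ValueError("reference field has zero norm")
    return math.sqrt(diff_sq / ref_sq)


# Hypothetical sample values, not data from the paper.
reference = [3.0, 4.0]
predicted = [3.0, 4.5]
error = relative_l2_error(predicted, reference)
```

Because the error is normalised by the reference norm, values from fields of different magnitude can be compared directly.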
Abstract: Academic pressure has a significant impact on students' physical and mental health and academic performance. Understanding its dynamic formation mechanism is crucial for effective intervention. This paper proposes an innovative framework that combines causal discovery with time series modeling to simulate the development path of academic pressure. The framework first uses a systematic causal discovery algorithm to learn a directed acyclic graph that captures the causal relationships between key factors; it then builds a causally constrained time series model to simulate the dynamic evolution of student pressure. In an evaluation on comprehensive longitudinal data from 300 students, the model increases the average accuracy of pressure precursor recognition by 7.4% and reduces the trajectory simulation error by 28.4% compared with the traditional benchmark model. These findings provide operational insights for early warning of stress and personalized intervention strategies. Keywords: academic pressure; simulation; causal discovery; time series modeling; oriented acyclic diagram. DOI: 10.1504/IJICT.2026.10078360
Abstract: To solve the problems of modal heterogeneity, temporal asynchrony and cognitive adaptation imbalance in multimodal real-time interaction, a CLT-driven multi-modal real-time fusion architecture is proposed. Experimental verification on the HoloAssist dataset shows that the interactive intention prediction accuracy of the proposed architecture reaches 95.2% ± 1.3%, which is 3.5 percentage points higher than that of the AlignMamba model. The end-to-end delay is 0.18 s ± 0.02 s, and the alignment delay is as low as 0.028 s. The subjective score of cognitive load was 3.2 ± 0.8, which was significantly better than the baseline model. Ablation experiments confirm that each core module is crucial to performance improvement, and the model has excellent robustness in scenarios with modal loss and noise interference. This research provides support for the implementation of real-time multimodal interaction technology. Keywords: multi-modal fusion; cognitive load theory; optimal transmission; real-time human-computer interaction; adaptive weight. DOI: 10.1504/IJICT.2026.10078361
Abstract: Digital English learning environments generate massive interaction data, offering potential for adaptive learning path optimisation. However, many existing approaches treat learning state estimation and learning path recommendation as separate tasks, restricting long-term personalised learning support. This study proposes a framework that integrates a multi-factor fusion knowledge tracing (MFFKT) model with reinforcement learning. It jointly analyses behaviour sequences, item attributes, knowledge structures and temporal features to dynamically capture learners' knowledge states, which serve as environment states for long-term reward-driven path optimisation. Experiments on ASSISTments 2017 and EdNet-KT4 show that MFFKT achieves AUC scores of 0.834 and 0.812, surpassing baseline models. Ablation studies validate the efficacy of multi-dimensional feature fusion. When combined with conservative Q-learning, the method outperforms greedy, rule-based, and random strategies in cumulative reward, completion rate, and efficiency. Overall, the proposed framework enables coordinated modelling of learning states and learning path decisions, providing an effective technical approach for adaptive and personalised English learning within digital learning environments. Keywords: multi-factor fusion knowledge tracing; MFFKT; RL; digital English learning; personalisation; sequential decision-making. DOI: 10.1504/IJICT.2026.10078362
Abstract: Detecting subtle tampering traces within complex backgrounds remains a significant challenge in image copy-move forgery detection, primarily due to the inadequacy of shallow feature extraction. To overcome these limitations, this paper proposes an enhanced DeepLabV3+ architecture designed for efficient multi-scale feature fusion. The framework utilises a lightweight MobileNetV3 backbone within an encoder-decoder structure, integrated with an improved atrous spatial pyramid pooling (ASPP) module employing depthwise separable dilated convolutions. To strictly preserve low-level details, we introduce a dual-branch shallow feature enhancement module (dual-branch SFEM) augmented by efficient channel attention (ECA). Furthermore, the feature fusion stage is optimised through architectural restructuring to reduce computational complexity while maintaining performance. A key innovation is the inclusion of a lightweight gating network that generates spatially adaptive weights, dynamically balancing the trade-off between semantic abstraction and detail preservation. Extensive experiments on the CASIA, DEFACTO, and COVERAGE datasets demonstrate the model's superiority over state-of-the-art methods. Specifically, the proposed method achieves an AUC of 95.41% and an F1 score of 77.24% on the DEFACTO dataset, while exhibiting robust generalisation capabilities on CASIA 1.0 (AUC: 78.93%, F1: 57.68%). Keywords: image forgery detection; gated fusion; efficient channel attention mechanism; dual-branch shallow feature enhancement module; DB-SFEM. DOI: 10.1504/IJICT.2026.10078363
Abstract: Existing generative models often introduce semantic distortion and over-correction when correcting text written by English learners. To address these problems, this paper proposes a generative adversarial network method that integrates grammatical rule constraints. By introducing formal grammatical knowledge as a flexible constraint in the network training process, the model is guided to correct errors while better maintaining the original meaning and overall fluency of sentences. Experiments conducted on a public learner corpus show that this method increases error correction accuracy by 12% and reduces the number of over-correction cases by 16%. The research thereby provides an effective way to balance accuracy and naturalness in automatic grammar correction. Keywords: English interlanguage correction; generative adversarial network; grammatical rule constraints; semantic retention; excessive correction. DOI: 10.1504/IJICT.2026.10078364
Abstract: To better integrate visual content and semantic information in the analysis of oil paintings, this paper proposes an emotion recognition model for oil paintings based on a multimodal adaptive deep network. The model handles visual and textual information with a two-path system: it extracts deep visual features from paintings and contextual semantic features from associated texts. An adaptive feature fusion module adjusts the fusion weights of the different modality features by using cross-modal attention and gating mechanisms. On the ArtEmis oil painting dataset, the proposed model achieves 76.8% accuracy in the discrete emotion classification task and an RMSE of 0.319 in continuous emotion dimension prediction. It achieves better classification accuracy than the basic model, which supports the validity of the adaptive fusion mechanism in the analysis of multimodal art emotions. Keywords: emotion recognition; oil painting analysis; multimodal learning; adaptive fusion. DOI: 10.1504/IJICT.2026.10078365
Abstract: This study proposes a unified framework that jointly models personalised learning path recommendation and knowledge tracing to improve individualised learning support in large-scale online education. The framework integrates learners' knowledge states, prerequisite relationships, learning load, and preferences within a single space, enabling dynamic tracking and coordinated optimisation. An online-updatable knowledge tracing model captures mastery levels, which inform a scoring and recommendation mechanism that adapts as knowledge states evolve. Experiments on the EdNet KT1 dataset show the proposed model achieves superior prediction accuracy and lower mean absolute error than recent baselines, with reduced parameters and training time. This approach balances predictive performance and computational efficiency, offering a practical solution for personalised learning support. Keywords: large-scale online education; knowledge tracing; personalised learning path recommendation; deep learning; educational data mining. DOI: 10.1504/IJICT.2026.10078366
Abstract: Addressing the urgent need for cross-language text translation quality assessment, this paper proposes a neural network-based model for evaluating English-Chinese translation quality. Current widely adopted automated evaluation methods exhibit significant limitations in handling specialised terminology and nuanced semantics, particularly when addressing culture-specific concepts. The neural network model constructed in this study integrates deep semantic representation with contextual correlation analysis, achieving remarkable results on the Chinese-English test set of the public WMT 2020 metrics shared task dataset. It achieved a core correlation metric (Pearson's r) of 0.682, along with a multi-dimensional classification evaluation (macro-F1) of 0.689 and a ranking quality metric (normalised discounted cumulative gain @10) of 0.927, comprehensively outperforming mainstream baseline models. This model provides a reliable technical tool for cross-language text quality control. Keywords: neural network; translation quality assessment; cross-language application. DOI: 10.1504/IJICT.2026.10078367
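The ranking metric named above, normalised discounted cumulative gain at a cutoff k, can be sketched as follows. This is an illustration of one common formulation (linear gain with a log2 rank discount), not the evaluation code used in the paper; the function names and relevance scores are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (linear gain)."""
    # enumerate() starts at 0, so rank 1 is discounted by log2(2) = 1.
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance scores listed in the order a system ranked them.
perfect = ndcg_at_k([3, 2, 1], k=10)
swapped = ndcg_at_k([1, 3], k=10)
```

A score of 1.0 means the system's ordering matches the ideal ordering; misplacing highly relevant items near the top lowers the score the most.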
Abstract: As a growing number of educational institutions offer online degree programs, concerns have been raised about the protection of students' privacy, the authenticity of course materials, and the safety of student communication in virtual learning settings. Older systems were designed around a centralised architecture and therefore have a single point of failure, which could be exploited to alter academic records, gain unauthorised access to systems, and cause other problems. Blockchain-enabled secure distance learning (BESDL) is one approach that could address these issues. Blockchain technology's permanent record and decentralised consensus make data more trustworthy, open, and dependable than it would otherwise be. The three primary components of this system are the secure content-based access control (SCBAC), the decentralised identity management (DIM), and the encrypted content delivery (ECD) protocols, which collaborate to safeguard educational resources. According to the test findings, BESDL enhances system security, maintains examinable academic records, and accelerates the checking process. To summarise, BESDL is not only dependable but also flexible, making it an excellent option for future higher education institutions that will incorporate online learning. Keywords: blockchain; distance learning; higher education; smart contracts; secure authentication. DOI: 10.1504/IJICT.2026.10078368
Abstract: Psychological stress among college students is increasingly prominent, requiring efficient and objective identification methods. However, existing real-time systems struggle to balance accuracy with processing speed and lack deep integration of multi-source information (such as expressions, voices, and texts). This study proposes a real-time recognition system based on faster robust bidirectional encoder representations from transformers multimodal fusion, significantly improving computing efficiency through an innovative lightweight fusion mechanism. Experiments on public datasets show the system achieves 86.5% accuracy in stress recognition, significantly improving on traditional methods (e.g., 73.2% for a single-modal convolutional neural network). Its inference speed meets real-time requirements (30 fps), with the key area under the curve indicator increasing to 0.91 (from 0.82). This study provides an effective approach for non-intrusive stress recognition in real-time environments. Keywords: psychological stress; multimodal fusion; real-time system; mental health. DOI: 10.1504/IJICT.2026.10078371
Abstract: The extraction efficiency of natural antioxidants is influenced by the dynamic coupling of multiple factors, and traditional static optimisation methods are unable to cope with real-time disturbances. This study proposes a dynamic optimisation framework based on multi-agent simulation, which realises real-time precise control of the extraction process by simulating the autonomous decisions and collaboration of agents such as solvents, temperature, and equipment. The experiment uses an open dataset of antioxidant kinetics for verification. Compared with traditional methods, this method increases the extraction rate by an average of 12.5%, and the improvements in key indicators (area under the curve, normalised discounted cumulative gain) are statistically significant (p < 0.05), providing a new approach to dynamic optimisation problems in food processing. Keywords: multi-agent simulation; dynamic optimisation; process control; antioxidant extraction. DOI: 10.1504/IJICT.2026.10078373
Abstract: In response to three major challenges in AI music generation (limited chord representation, monotonous emotions, and low audio fidelity), this research proposes a novel end-to-end framework termed PEMF that integrates PerformanceNet with a multi-emotion music generation model. The core innovations include a structured four-dimensional chord encoding method using root, third, fifth, and crown notes to expand harmonic diversity to 60 chord types, a dual-encoding transformer architecture that independently processes melody and chord streams for superior structural coherence, and a fine-grained emotion regulation mechanism mapping pitch histograms and rhythm density parameters to Russell's two-dimensional emotion space for continuous control. For audio synthesis, an asymmetric U-net structure combined with a multi-band residual learning mechanism and a flooding loss strategy significantly enhances spectral fidelity and training stability. Experimental results demonstrate that PEMF achieves a chord vocabulary coverage near 1.0, an emotion recognition accuracy of 92.3% (significantly outperforming the symbolic transformer's 78.6%), a high-frequency energy retention rate of 89.1%, and a Fréchet audio distance of 0.5. System performance shows a 36.9% improvement in emotional consistency and a 64.2% reduction in latency compared to staged training, validating its efficacy in practical applications like music therapy and film scoring. Keywords: PerformanceNet; multi-emotional music generation model; emotional regulation; audio synthesis. DOI: 10.1504/IJICT.2026.10078376
Abstract: Focused on calligraphy, this research addresses style transfer distortion and inadequate compositional aesthetics in AI-generated art. We propose an algorithm that integrates decoupling representation learning with content-aware layout modelling. A dual-encoder architecture separates character structure and brushstroke style features, enabling precise and controllable style transfer via dynamic instance normalisation. A visual-linguistic bimodal network with hierarchical spatial modules is introduced to model relationships at the character, line, and global levels. The proposed method achieves a style similarity of 0.751, a content preservation PSNR of 36.9 dB, and an 8%-16% improvement in cross-font generalisation accuracy on unseen characters. For layout generation, the framework maintains a line-spacing fluctuation coefficient of 0.032, achieves a layout aesthetics score of 4.8, and demonstrates strong long-text stability with a cross-page style consistency of 0.94. Ablation studies further confirm the effectiveness of the dynamic weight adjustment mechanism, achieving an optimisation efficiency of 0.98. This work addresses key technical bottlenecks in digital calligraphy generation, providing a practical tool for cultural heritage preservation and a transferable framework for other structured art generation tasks, thereby advancing the integration of artificial intelligence with traditional arts. Keywords: decoupled representation learning; content-aware; generative adversarial networks; GANs; digital art generation; deep learning. DOI: 10.1504/IJICT.2026.10078377
Abstract: To enhance pop music creation, this study proposes an automatic accompaniment generation method combining sliding window technology with the MuseFlow model. The sliding window segments long music sequences into short-time overlapping frames, balancing time and frequency resolution to capture local signal characteristics. MuseFlow employs an enhanced bidirectional mapping architecture and training objectives to accurately model complex relationships in multi-track music data. Experimental results show that MuseFlow achieves Frechet inception distance (FID) scores of 26.3 on the POP909 dataset and 25.4 on the FreeMidi dataset, significantly outperforming baseline models. These findings demonstrate that the proposed method generates high-quality, diverse accompaniments compatible with main melodies, providing an efficient tool for music creators. Keywords: MuseFlow; sliding windows; SWs; popular music accompaniment; STFT; audio quality; bass track generation; multi-track coordination. DOI: 10.1504/IJICT.2026.10078378
Abstract: Music-driven dance generation can effectively improve the efficiency and popularity of artistic creation, but existing generation methods have problems such as insufficient dance-music correlation, poor stability of long sequence generation, and inconsistent styles. Therefore, a novel dance generation and completion framework that integrates improved transformer and style consistency control is proposed. This framework first constructs a cross-modal generation model with a bidirectional attention mechanism, enhances the correlation between dance and music through bidirectional interaction perception between music and action modalities, and adopts a planned sampling strategy to alleviate exposure bias in autoregressive generation. By extracting and integrating music features, key action features, and global dance style features, the completed dance segments ensure consistency in music synchronisation and overall style. Experiments showed that the generative model significantly outperformed mainstream comparison models in Frechet distance (25.7), beat coverage (59.7%), hit rate (52.4%), and diversity metrics. The completion model achieved a style classification accuracy of 95.4% and a style retention rate of 90.2% in dance completion tasks. The proposed model can thus effectively improve the correlation and style consistency of generated dance, and promote the popularisation of art. Keywords: transformer; bidirectional attention; BA; style consistency; music-driven dance generation; controllable dance. DOI: 10.1504/IJICT.2026.10078390
Abstract: To enhance the utilisation efficiency of interdisciplinary learning resources in cultivating innovative talents, this study proposes a fusion generative model integrating a bi-directional generative adversarial network (Bi-GAN) with a deep contrastive clustering network (DCCN). The model integrates multi-domain curriculum resource features via an attention mechanism, uses Bi-GAN for feature analysis and enhancement, and finally applies DCCN to cluster and serialise resources into a coherent learning path. Experimental results show that the silhouette coefficient (SC), normalised mutual information (NMI), adjusted Rand index (ARI), and path coherence score (PCS) of the proposed model on the test set reach 0.36, 0.72, 0.56, and 0.78, respectively. Compared with the best results in the baseline methods, the proposed model achieves relative improvements of 24.1%, 10.8%, 14.3%, and 16.4% in SC, NMI, ARI, and PCS, respectively. Overall, the proposed model effectively realises deep cross-domain knowledge integration and coherent learning path generation, and provides a solution for personalised educational resource organisation. Keywords: two-way generation countermeasure network; deep contrast clustering; cross-domain resource integration; learning path generation; cultivation of innovative talents. DOI: 10.1504/IJICT.2026.10078416
Abstract: In the global wave of digital transformation in manufacturing, the in-depth exploration of the AI empowerment mechanism of lighthouse factories, as industry benchmarks, holds significant importance. This study innovatively analyses the AI empowerment mechanism of lighthouse factories from the perspective of complex systems theory. By constructing a theoretical framework based on the characteristics of complex systems, including multi-scale and nonlinear interactions, it derives dynamic evolution equations to accurately depict the complex operational state of lighthouse factories. The research findings show that AI empowerment significantly enhances key performance indicators of lighthouse factories, such as reducing order delivery cycles by 46.4% and improving energy efficiency by 23.6%. Additionally, it clarifies the phase transition points of complex systems, providing critical guidance for corporate digital transformation. This study provides a solid theoretical basis and practical solutions for manufacturing to achieve intelligent upgrades through AI, helping companies enhance their competitiveness in a complex and ever-changing market environment. Keywords: lighthouse factory; AI enabled; complex systems theory; digital transformation. DOI: 10.1504/IJICT.2026.10078435
Abstract: Faced with the challenges of high cost, long cycles and insufficient innovation in traditional art design, this paper develops an integrated solution for intelligent generation and interactive display. By building a knowledge-enhanced multi-view diffusion interaction model (KEMDIM), this solution integrates domain knowledge graphs to enhance semantic understanding, multi-view diffusion generation to ensure geometric consistency, and metaverse dynamic rendering technology to support real-time interaction. Verification on datasets such as WikiArt shows that the FID of the generated images is reduced to 12.1 (an improvement of 19.7%), the PSNR reaches 34.8, the user experience satisfaction score is 4.6 points, and the multi-view alignment error is reduced by 32%. The results show that the framework significantly improves the accuracy and immersion of digital generation of traditional art. Although it has limitations, such as reliance on the completeness of the knowledge base and high demand for computing resources, it provides a new technical path for intelligent protection and innovative design of cultural heritage, and its multi-modal fusion mechanism and scalability have important practical value in promoting the development of AI-empowered traditional processes. Keywords: immersive experience; traditional art; intelligent generation; interaction. DOI: 10.1504/IJICT.2026.10078436
Abstract: In the context of the current inefficiency in mobile learning application interface design and the difficulty in adapting to diverse user scenarios, this study explores ways to enhance UI generation effectiveness and user experience through automation technology. By transforming the principles of environmental interaction and dynamic cognition emphasised in situational cognition theory into computable neural network components, an encoder-decoder model based on CNN and transformer is constructed. This model introduces two-dimensional spatial position encoding in the encoder to simulate users' spatial perception of interface layout, and utilises an attention mechanism in the decoder to achieve dynamic adaptation to different task scenarios. Experiments show that this method achieves a BLEU-4 score of 82.4% on the RICO dataset, with the edit distance reduced to 7.1. Furthermore, its performance degradation is minimal after adding noise and blur interference, demonstrating good robustness. In the practicality evaluation, the generated interface received a comprehensive score of 4.23 from participants, with the highest recognition coming from the mobile learning teacher group. The method proposed in this paper effectively achieves accurate and stable generation from high-fidelity images to interface trees, providing a solution with both theoretical guidance and practical value for the automated design of mobile learning UIs. Keywords: situational cognition; mobile learning; UI design; development. DOI: 10.1504/IJICT.2026.10078450
Abstract: In current intelligent design systems, text prompt word optimisation is a key challenge in improving the quality of image generation. Aiming at the problems of uncertain direction of prompt words and difficult quality evaluation, this paper develops an adaptive co-creation model based on AIGC and reinforcement learning. The model adopts a three-stage training framework: first, supervised fine-tuning of the mapping relationship of prompt word pairs is performed, then multi-modal visual feedback of PickScore and aesthetic value models is fused through reward modelling, and finally reinforcement learning is used to fine-tune and optimise the generation strategy. The model performs well on many indexes. In the performance test, the FID of the model decreases to 15.2 ± 1.5 and the IS increases to 8.9 ± 0.6. In the robustness test, the FID of the model is 17.5 ± 1.7 under the noise prompt. In addition, in the practical test, the overall user satisfaction reaches 4.4 ± 0.3. These results show that the model realises the collaborative optimisation of prompt words and image generation through an adaptive mechanism, and provides an efficient end-to-end optimisation scheme. However, the model still needs further refinement, and future research should focus on reducing complexity and enhancing real-time feedback integration. Keywords: artificial intelligence generated content; AIGC; reinforcement learning; adaptive; co-creation model. DOI: 10.1504/IJICT.2026.10078495
Abstract: Public crisis events surface across text, sensors, imagery, and logs, yet single-source detectors miss early weak cues. To address fragmented evidence, this study presents a multimodal bidirectional transformer for crisis recognition. First, source-aware tokens preserve time, space, provenance, and quality while synchronisation gates align asynchronous streams. Then, cross-source attention separates corroboration from dissent and memory tokens retain long-range hints. Finally, self-supervised pretraining and calibrated classification deliver auditable alerts. On composite streams, the method reaches AUPRC 0.612, AUROC 0.915, F1 0.672, ECE 0.038, and average lead time 31.6 minutes, exceeding the best baseline by 7.9 AUPRC points, 3.2 AUROC points, 6.9 F1 points, and 8.5 minutes. These gains provide earlier, more reliable, and well-calibrated alerts for public response. Keywords: event identification; multimodal data; bidirectional transformer; spatiotemporal alignment.
Abstract: In response to the problems of homogenisation in rural tourism development and inefficient resource allocation, this study has designed a rural tourism development path optimisation method that integrates geographic information system and deep learning. Firstly, multiple sources of spatial data are integrated, and convolutional neural networks are used to automatically predict the potential of rural tourism development. Subsequently, using the potential spatial distribution as input, a mathematical model for path optimisation is constructed, and an improved deep reinforcement learning method is employed, incorporating local operators to perform neighbourhood search and iteratively improve the initial solution. The experiments show that the net benefit value achieved by the proposed method on the standard test set is 44.05, and the solution time is only 3.14 seconds, which is significantly better than the comparison algorithms, providing an effective solution for the optimisation of rural tourism development paths. Keywords: rural tourism; optimisation of development path; deep learning; deep reinforcement learning; neighbourhood search.
Abstract: The free movement of sound sources and listeners in immersive virtual reality poses a significant challenge for dynamic sound field reconstruction. This study proposes a real-time, high-fidelity reconstruction method based on a multi-channel loudspeaker system. A time-varying spherical harmonic coefficient field is first constructed to parametrically represent the dynamic sound field. An optimisation algorithm integrating perceptual weighting and sparse constraints is then designed to achieve high-quality reconstruction under limited physical loudspeaker channels. Experimental results demonstrate that the proposed method significantly outperforms conventional higher-order ambisonics decoding, vector base amplitude panning, and existing deep learning approaches. Key improvements include reduced normalised field error, lower perceptual spectral distortion, and higher azimuth estimation accuracy, all while satisfying real-time processing requirements. In particular, the proposed method reduces the normalised field error by more than 40% compared to the next-best method, while maintaining a frame processing latency of less than 20 milliseconds. These advancements collectively enhance the auditory immersion in dynamic virtual environments. Keywords: dynamic sound field reconstruction; immersive audio; spherical harmonics; sparse optimisation; compressed sensing; spatial hearing.
Abstract: This study addresses the inaccuracy of traditional static network models in predicting rapidly evolving interests within cultural communities by proposing an interactive network evolution model based on dynamic interest graphs. We find that members' shifting interests make information propagation paths unpredictable; for instance, trending topics in music communities may cycle as frequently as every two weeks. To tackle this challenge, we developed a graph model incorporating time-decay factors that captures real-time changes in interest similarity. Experiments demonstrate that, compared to traditional static graph methods, our model achieves a 12.7% improvement in community structure prediction accuracy and reduces prediction error for information reach by 18.3%. This work offers new insights into understanding the dynamic evolution of cultural communities and enabling precise content dissemination. Keywords: dynamic interest graph; cultural communities; network evolution; information dissemination.
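The idea of a time-decay factor applied to interest similarity can be illustrated with a minimal sketch. The abstract does not specify the decay form, so the exponential half-life weighting, the 14-day default (loosely echoing the two-week cycle mentioned above), cosine similarity, and the function names `decayed_profile` and `cosine` are all assumptions made for illustration, not the authors' model.

```python
import math
from collections import defaultdict

def decayed_profile(events, now, half_life_days=14.0):
    """Build a topic -> weight profile from (topic, day) interaction events.

    Assumption: each event's weight halves every `half_life_days`.
    """
    rate = math.log(2) / half_life_days
    profile = defaultdict(float)
    for topic, day in events:
        profile[topic] += math.exp(-rate * (now - day))
    return dict(profile)

def cosine(a, b):
    """Cosine similarity between two sparse topic -> weight profiles."""
    dot = sum(w * b.get(topic, 0.0) for topic, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical members: both liked jazz long ago, but only one still does.
alice = decayed_profile([("jazz", 0), ("jazz", 27), ("jazz", 28)], now=28)
bob = decayed_profile([("jazz", 0), ("techno", 27), ("techno", 28)], now=28)
similarity_now = cosine(alice, bob)
```

Because old interactions are down-weighted, the shared jazz history contributes little to `similarity_now`, which is the kind of edge-weight drift a static graph would miss.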
Abstract: To address issues such as uneven equipment utilisation and delayed user demand response in traditional smart laboratory resource allocation, this paper proposes a multi-agent deep reinforcement learning framework based on attention mechanisms. The research motivation stems from the collaborative scheduling challenges posed by heterogeneous equipment and dynamic task requests. This method employs a centralised training and distributed execution architecture, enabling agents to learn cooperative strategies in partially observable environments. It further incorporates a demand forecasting module to enhance allocation foresight. Experiments on public datasets and simulation environments demonstrate that the proposed method significantly outperforms traditional genetic algorithms and single-agent reinforcement learning approaches in both resource allocation quality (normalised discounted cumulative gain @5 reached 0.87) and overall utilisation (area under the curve improvement of 15.2%), validating its effectiveness and adaptability in complex laboratory scenarios. Keywords: multi-agent reinforcement learning; MARL; smart laboratory; dynamic resource allocation; attention mechanism.
Abstract: This study addresses the inefficiency and passivity of surveillance in detecting crowd anomalies across wide, dynamic environments using unmanned aerial vehicles. To this end, we propose an active perception framework for drone swarms that is driven by real-time visual-semantic feedback. The framework couples a spatiotemporal graph attention network, which models crowd interactions and infers anomaly probabilities, with a cooperative multi-agent reinforcement learning decision-making module. This integration enables the swarm to dynamically and collaboratively optimise viewpoints based on live semantic cues. Evaluated on the VisDrone dataset, our approach achieves an anomaly capture rate of 89.7%, an average response delay of 1.9 seconds, an operational efficiency of 1.86 events per kilometre flown, and a low observation redundancy of 22.1%. These results demonstrate that embedding visual semantics into a closed perception-control loop significantly enhances the performance of proactive monitoring systems compared to existing baseline methods. Keywords: drone swarm; active perception; crowd anomaly detection; visual feedback; cooperative reinforcement learning.
Abstract: The generation of adaptive interaction logic for virtual characters remains challenging, as traditional rule-driven methods often produce rigid and contextually insensitive behaviours. To overcome this, we present the multimodal meta-generation network, a multimodal behaviour data-driven framework that synthesises natural and socially appropriate interaction logic from streams including speech, posture, and facial expression. The framework employs cross-modal temporal alignment and hierarchical reinforcement learning to fuse asynchronous signals and enable joint strategy planning with action execution. A causal reasoning module is integrated to enhance social rationality. Experiments on public multimodal interaction datasets demonstrate that our method significantly outperforms baseline models, achieving an F1-score of 0.795 in accuracy and a human subjective score of 4.3 out of 5.0 in naturalness. This research provides a practical solution for deploying adaptive virtual characters in fields such as the metaverse, intelligent education, and remote collaboration. Keywords: multimodal learning; virtual characters; interaction logic generation; reinforcement learning; behaviour analysis.
Abstract: Automated emotion analysis in visual art remains a significant challenge, primarily due to the paucity of annotated data and the profound stylistic and semantic gap between generic image understanding and domain-specific artistic interpretation. This study introduces a novel meta-learning framework enhanced with structured semantic knowledge for few-shot emotion recognition in oil paintings. The proposed model integrates a dual-path architecture: a meta-learning pathway for rapid visual adaptation and a semantic pathway that incorporates contextual art historical knowledge. These pathways are fused through a hierarchical cross-modal attention module, which dynamically aligns visual features with relevant semantic concepts during the learning process. Extensive evaluations on the ArtEmis dataset demonstrate the framework's superior performance, achieving state-of-the-art macro-accuracy of 68.7% (1-shot) and 81.3% (5-shot). The results confirm the model's efficacy in achieving robust, generalisable, and interpretable emotion analysis with limited data, advancing the field of computational art understanding. Keywords: oil painting emotion recognition; few-shot learning; meta-learning; semantic enhancement; interpretable artificial intelligence.
Abstract: Coal mine safety is crucial for both life and production. However, traditional monitoring relies on a single sensor, resulting in a high rate of missed alarms in complex underground environments. To address challenges such as changing light and dust interference, this study proposes a deep learning early warning system that integrates video, infrared, and vibration data. Through cross-modal feature fusion and multi-task learning, it achieves collaborative perception of abnormal human behaviours and equipment failures. Experimental results show that the system achieves an area under the curve of 0.982 for abnormal behaviour detection on public datasets, which is approximately 7% higher than that of a single visual model; the accuracy of fire warning reaches 96.7%, and the false alarm rate is reduced by 5.3%. This method provides a highly reliable and scalable technical path for round-the-clock intelligent safety monitoring in coal mines. Keywords: coal mine security; multi-modal fusion; anomaly detection; intelligent early warning.
Abstract: This research tackles the dynamic optimisation of construction carbon emissions by proposing a novel integration of building information modelling, utility-driven simulation, and multi-objective evolutionary computation. A core methodological contribution is a formalised utility function that quantifies and embeds project-specific decision-maker preferences for time, cost, and emissions directly into the optimisation search process. This function guides a bespoke evolutionary algorithm to automatically generate efficient, low-carbon construction plans, which are accurately evaluated by a high-fidelity discrete-event simulation engine using enriched building information modelling data. In an experimental study based on a publicly available office building dataset, our framework demonstrates superior performance, outperforming state-of-the-art benchmarks with a 5.6%-15.3% gain in hypervolume and achieving an 18.7% reduction in on-site emissions. This work provides a rigorous and actionable decision-support system for advancing sustainable construction practices. Keywords: building information modelling; BIM; construction carbon emission; multi-objective optimisation; utility function; discrete-event simulation.
Abstract: Reliable evaluation of machine translation quality is essential to a dependable automatic translation system. However, adversarial attacks can degrade evaluation performance by subtly perturbing sentences, endangering the security of key applications. This paper proposes a comprehensive adversarial robustness enhancement framework designed specifically for translation quality evaluation. The framework integrates a multi-grained adversarial sample generator, a dynamic adversarial training mechanism based on the relaxation of master and apprentice labels, and an online defence module. Experiments on the machine translation and multilingual quality evaluation seminar and post-editing task data set show that the method increases model robustness by 34.2% and reduces the average prediction error under attack from 18.7% to 12.3%. The framework shows stable performance across multiple domains, providing an effective solution for building safe and reliable translation quality evaluation systems for real-world scenarios. Keywords: translation quality assessment; adversarial machine learning; robustness assessment; natural language processing; NLP; model security.
Abstract: In recent years, social media has become a key medium for public emotion and thought dynamics, making its sentiment analysis crucial for event prediction. However, while mainstream deep learning models achieve accurate predictions, their black-box decision process hampers reliable warning. This paper therefore integrates the transformer model with interpretable Shapley additive explanations values to construct a social sentiment warning framework with both high accuracy and transparency. Experiments on public datasets show that the method's comprehensive warning performance significantly outperforms traditional models: area under the curve reaches 0.872, which is approximately 7.5% higher than the classic long short-term memory model, overall warning accuracy rises to 85.6%, and the false alarm rate drops by nearly 12%. This provides an effective solution for reliable and interpretable automated social sentiment perception and early risk warning. Keywords: emotional warning; transformer; explainable artificial intelligence; SHAP value.
Abstract: The power line carrier communication system of the smart grid faces severe channel attenuation, complex and changeable interference sources, delayed fault location and slow self-healing response, and existing methods have significant shortcomings in reliability, real-time performance and scene adaptability. To address these problems, this paper proposes an intelligent optimisation scheme based on the integration of data mining and artificial intelligence (IOSDM-AI). Through a multi-model coupling mechanism, the scheme is driven by real-time communication data flow to achieve accurate fault diagnosis, rapid location and adaptive self-healing, while ensuring the stable operation of the system. The results show that the accuracy of the IOSDM-AI algorithm is 94.7%, the diagnosis delay is reduced to 108.3 ms, and the missed diagnosis rate and misdiagnosis rate are as low as 0.7% and 2.1%, respectively. The self-healing success rate is 96.8%, the average self-healing time is reduced to 2.3 s, the communication link stability index is 9.28, and the self-healing strategy execution accuracy (FSE) is 428. Keywords: smart grid; power line carrier communication; artificial intelligence; fault diagnosis; self-healing mechanism; data mining.
Abstract: This study proposes a multi-objective optimisation model to address the strategic placement of public art, aiming to balance spatial efficiency, social equity, and economic cost. The data-driven framework incorporates dynamic mobility patterns, multidimensional socioeconomic indices, and urban walkability networks. It concurrently optimises three objectives: maximising weighted accessibility coverage, minimising the Gini coefficient of accessibility, and reducing total expenditure. A novel cognitive heuristic adaptive search algorithm is introduced, which embeds domain knowledge of urban spatial structure to solve this high-dimensional problem. Empirical validation using Manhattan data confirms the algorithm's superior performance, demonstrating a 10.8% to 24.1% improvement in the hypervolume metric over standard multi-objective evolutionary algorithms. The resulting Pareto-optimal solutions quantify clear trade-offs, such as a 142% gain in coverage efficiency or a 48% reduction in accessibility inequality, thereby establishing a scientific basis for equitable cultural resource planning. Keywords: public art placement; spatial optimisation; social equity; multi-objective optimisation; data-driven decision support.
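For readers unfamiliar with the equity objective, the Gini coefficient of a set of accessibility scores can be computed as in the sketch below. It uses the standard rank-weighted formula on sorted values; the function name `gini` and the sample scores are hypothetical, and how the paper defines accessibility per zone is not stated in the abstract.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form on ascending data, ranks i = 1..n:
    # G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n

# Hypothetical accessibility scores for four neighbourhoods.
equal_access = gini([1, 1, 1, 1])        # everyone served equally
concentrated = gini([0, 0, 0, 1])        # one neighbourhood gets everything
```

A placement plan that lowers this value spreads accessibility more evenly, which is what the second objective rewards.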
Abstract: To address the issue of high cost incurred by multimodal large models in visual-language tasks, this paper proposes a lightweight model, CAD-LM, based on cross-modal attention distillation. It designs an adaptive modal reparameterisation module that utilises a multi-branch structure to enhance representational power during training and reparameterises it into an efficient single-branch structure during inference. This paper integrates an end-to-end deployment optimisation process that encompasses hardware-aware pruning, mixed-precision quantisation, and efficient parameter fine-tuning. Experiments show that CAD-LM successfully compresses the model parameter count to 118.4 M and reduces computational complexity to 12.1 GFLOPs, achieving approximately 20% and 30% reduction, respectively. Its performance on benchmark tasks such as Flickr30k and VQA v2.0 significantly surpasses baseline models like the original-scale CLIP and ALBEF. Edge deployment verification reveals that the final model occupies only 89.4 MB of memory and boasts millisecond-level inference latency, achieving an excellent balance between model performance, computational efficiency, and engineering practicality. This provides an efficient solution for multi-modal applications in resource-constrained environments. Keywords: attention distillation; multi-modal; large model; lightweight; end-side deployment.
Abstract: In online learning scenarios, the dynamic fluctuations of students' cognitive load directly impact learning outcomes. Existing scheduling models, lacking real-time perception of cognitive load states, often result in mismatches between task assignments and learner capabilities. To address this, this paper first processes student contextual features using feature selectors and self-attention mechanisms, then predicts response performance based on cognitive load state. Subsequently, an online learning task utility scheduling model is constructed based on cognitive load diagnosis results. A multidimensional utility function for online learning tasks is designed, establishing a scheduling objective function that maximises this utility. Finally, an improved particle swarm optimisation algorithm solves the objective function to derive the optimal online learning task utility scheduling strategy. Experimental results demonstrate that the proposed method achieves scheduling times of 3.3 ms and a success rate of 98.3%, outperforming baseline methods with significantly higher scheduling efficiency. Keywords: online learning; task utility scheduling; cognitive load; cognitive diagnosis; attention mechanism.
Abstract: This paper introduces a vision-based intelligent inspection system to detect surface defects in CNC-machined parts by using a YOLOv7-based transfer learning model. Traditional manual inspection is laborious, subjective, and prone to mistakes, thus necessitating automated solutions. The proposed YTL-ISDD system uses high-resolution images taken under controlled lighting to detect defects, including scratches, cracks, pits, and burrs. The model makes use of pre-trained YOLOv7 weights that improve feature extraction and minimise the training time. Data augmentation, such as rotation, scaling, flipping, and contrast adjustment, is used to enhance robustness and generalisation across different surface conditions. The system can be easily integrated into CNC production environments, and it can detect defects in real time, accurately, and consistently with minimum human involvement, which enhances product quality and production efficiency. Keywords: CNC machining; surface defect detection; YOLOv7; transfer learning; intelligent inspection.
Abstract: Faced with the challenges of complexity and uncertainty of multi-source heterogeneous data in energy industry decision support systems and the dynamic environment adaptation needs brought about by smart grids and renewable energy access, this study aims to build an adaptive, strong and robust multi-modal knowledge fusion and intelligent generation model to improve the accuracy and reliability of decision-making. By innovatively integrating hypergraph attention networks to achieve multi-modal feature fusion, cloud model quantification of data uncertainty, and reinforcement learning framework-driven intelligent policy generation, the model achieves an accuracy rate of 93.5%, an F1 score of 92.8% and RMSE 0.08 in performance tests. In addition, robustness tests show that the model has a change rate of only 3.7% under 30% noise, the decision delay is optimised to 65 milliseconds, and the accuracy rate increases to 89.2% after fine-tuning in cross-scenario generalisation. Overall, the model effectively solves the semantic gap and data quality problems, and provides efficient support for energy scheduling, fault prediction and other scenarios. Keywords: energy industry; decision-making; multi-modal; knowledge fusion; intelligent generation.
Abstract: With the global economy's deep integration, traditional manual migration methods cannot meet efficiency and accuracy needs. This study explores the application of meta-learning techniques in zero-sample accounting standard transfer and constructs an intelligent framework to adapt to new accounting standards in various fields. By analysing meta-learning mechanisms, an innovative transfer framework is designed to use a small amount of source domain data for rapid adaptation and precise transfer of accounting standards in new fields. Experimental results show that the meta-learning empowerment framework significantly improves transfer performance under zero-sample conditions. Compared with traditional methods, the average accuracy is up by 12.3%. At different data scales, as the data volume increases from 100 to 1,000, the accuracy improves by 8.5%, 10.2%, and 13.1% respectively. In new accounting standard testing, the average accuracy reaches 85.6%, a 9.4% improvement over traditional methods. Keywords: meta-learning; zero-sample learning; accounting standards migration; intelligent framework.
Abstract: To address the shortcomings of neural machine translation in handling complex sentences and terminology, this paper proposes a translation quality improvement model based on the quantum-optimised osprey optimisation algorithm (QOOA). This model integrates quantum computing and metaheuristic algorithms, enhancing population diversity through qubit encoding, dynamically adjusting individual positions using a quantum rotation gate strategy to balance global exploration and local exploitation, and constructing a multi-objective fitness function that combines semantic similarity and syntactic complexity. Experiments on the WMT2018 English-Chinese dataset show that, compared to the baseline model, this method improves the BLEU score by 3.2 percentage points and reduces the TER by 12.7%, significantly reducing translation confusion. The results demonstrate that QOOA effectively improves translation quality, especially in long sentences and technical texts. Keywords: quantum optimisation; osprey algorithm; machine translation; parameter optimisation; BLEU index; meta-heuristic algorithm. DOI: 10.1504/IJICT.2026.10077980
Abstract: In today's globalised and technology-driven world, improving spoken English is increasingly important. However, traditional automatic speech recognition (ASR) systems often produce outputs with grammatical errors, poor word choices, and pronunciation ambiguities, hindering effective communication. To address this, we propose MTG-ERR, a novel multimodal transformer-GCN framework that integrates acoustic and textual information for real-time and accurate spoken English error correction. The model uses a transformer-based acoustic encoder to capture temporal speech features and a GCN-based module with dependency syntactic trees to model grammatical structures. A dynamic fusion mechanism effectively combines both modalities, significantly enhancing error correction. Experiments on the L2-ARCTIC and LibriSpeech corpora show our framework outperforms baseline models, achieving a 92.7% F1-score in grammatical error correction. Ablation studies confirm that incorporating grammatical information improves performance on long, complex sentences by 12.1% in F1-score. With an average response latency under 320 ms, the system meets real-time interactive requirements. This research provides valuable insights for developing robust spoken language assistance systems, with significant potential for educational and commercial applications. Keywords: oral error correction; multimodal learning; transformer; graph convolutional networks; GCNs; real-time systems; grammatical dependency analysis. DOI: 10.1504/IJICT.2026.10078001
Abstract: Addressing the issue of traditional customer segmentation relying on static data and struggling to respond to behavioural changes in real-time, a real-time customer segmentation framework based on big data analysis and clustering analysis is proposed. The data comes from e-commerce websites and includes user activities, transactions, and demographic information. Preprocessing involves data cleaning, normalisation, and TF-IDF feature extraction. The key features include transaction frequency, interest in product categories, and page dwell time. The proposed model is an adaptive k-nearest neighbour (k-NN) logistic regression based on clonal selection (CS-AK-LR), integrating adaptive K-means clustering (AK) and logistic regression (LR) for customer clustering and value classification prediction. The clonal selection algorithm (CS) optimises the hyperparameters of AK and LR. The segmentation detection rate of this method reaches 96.21%, and the error rate is reduced by 1.03% compared to existing methods. Combining big data with real-time clustering analysis can effectively enhance the speed and accuracy of marketing responses. Keywords: consumer segmentation; clonal selection-based adaptive K-logistic regression; CS-AK-LR; marketing strategy; big data; cluster analysis. DOI: 10.1504/IJICT.2026.10078097
Abstract: Government agencies struggle to track and respond to public sentiment on social media platforms like Weibo. This case study describes the design and development of a monitoring system for an anonymous municipal government in China, leveraging deep learning to analyse sentiment and emerging topics. The case details the system architecture, implementation challenges, and how the outputs can be used for targeted public communication. To achieve effective management of social public opinion, this article uses deep learning and clustering algorithms to process public opinion information on the Weibo platform and establishes a Weibo public opinion analysis system. Focusing on user blog posts and comments, we first use distributed crawlers to obtain data, and then complete preprocessing through cleaning and word segmentation. Emotion analysis is implemented to obtain sentiment polarity and probability, and to explore potential themes using a latent Dirichlet allocation topic model. The experimental results show that the established model has high accuracy in emotion classification. Using real Weibo data, the emotional value change curve of netizens is plotted to determine the impact of topics on netizens' emotions. The system supports targeted public opinion intervention for governmental use. Keywords: Weibo; public opinion; analysis. DOI: 10.1504/IJICT.2026.10078157
Abstract: The proposed study presents a new AI-supported cyber-physical system (CPS) architecture that facilitates real-time co-creation among artists and intelligent machines based on an adaptive communication set. The system has a three-layer architecture comprising: the perception layer, where a generative adversarial network (GAN)-based design recommender is optimised by a reinforcement learning (RL) feedback loop that captures the system's sensory feedback; the cognitive layer, which interprets the input data into a recommended creative modification; and finally the layer that implements the suggested modification in the work environment with robotic actuators and additive manufacturing tools. The semantic communication protocol runs over the message queuing telemetry transport (MQTT) and open platform communications unified architecture (OPC UA) standards to support uninterrupted exchange and synchronisation of data between the artist's interface and the physical production space. Keywords: cyber-physical system; AI-assisted design; co-creation framework; generative adversarial network; GAN; ceramic manufacturing. DOI: 10.1504/IJICT.2026.10078237
Abstract: This paper proposes a deep learning-based model for computer music denoising, addressing accuracy and efficiency limitations in existing methods. It employs a dual-branch network to separately identify transient and periodic noise, combined with an improved spectral subtraction for precise audio separation. Model compression via pruning and knowledge distillation ensures real-time capability. Experimental results on AudioSet show recognition accuracies of 93.5% (transient) and 94.1% (periodic), with average denoising gains of 15.1 dB, 15.0 dB, and 16.3 dB for transient, periodic, and mixed noise, respectively. When processing 100 minutes of lab-recorded audio, latency remains under 21.0 ms, outperforming three benchmark models in speed and stability. The model demonstrates robust noise reduction and real-time performance, suitable for applications like live music, low-latency communication, high-quality post-production, and restoration of noisy historical recordings. Keywords: dual-branch communication; spectral subtraction; computer music denoising; model pruning; knowledge distillation. DOI: 10.1504/IJICT.2026.10078358
Abstract: Tourism route planning has traditionally emphasised shortest-path optimisation, often overlooking the importance of enhancing tourists' overall experiences. Many travellers rely on user-generated content to guide their journeys, yet manually searching and adjusting routes in real time can be inefficient and inaccurate. This study focuses on building a high-quality database to support model training for tourism route optimisation and dynamic adjustments. By leveraging graph theory and the Floyd-Warshall algorithm, the proposed approach integrates various tourism-related data factors to enhance route planning accuracy based on personalised preferences. The high-quality dataset, sourced from travel agencies and user-generated data, ensures the algorithm's adaptability in real-world scenarios. The model is tested on an online tourism platform, with its effectiveness evaluated through a framework grounded in tourism theories and user behaviour research. The results demonstrate significant improvements in both route planning accuracy and the efficiency of real-time adjustments when travellers modify their plans mid-journey. Keywords: database establishment; machine learning; tourism route planning adjustment. DOI: 10.1504/IJICT.2026.10078037
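As background to the abstract above, the Floyd-Warshall algorithm computes shortest-path costs between every pair of nodes in a weighted graph. The sketch below is a textbook version only; the four-attraction cost matrix is invented for illustration, and the way the paper folds personalised preferences into edge weights is not described in the abstract.

```python
import math

def floyd_warshall(weights):
    """Return all-pairs shortest-path costs for a dense weight matrix.

    weights[i][j] is the direct cost from i to j, math.inf when there is
    no direct link, and 0 on the diagonal.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input stays untouched
    for k in range(n):                  # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist

INF = math.inf
# Hypothetical travel costs between four attractions (illustrative only).
costs = [
    [0,   4,   INF, 10],
    [4,   0,   3,   INF],
    [INF, 3,   0,   2],
    [10,  INF, 2,   0],
]
shortest = floyd_warshall(costs)
```

Here the direct link between attractions 0 and 3 costs 10, but the route through attractions 1 and 2 costs 9, so `shortest[0][3]` reflects the cheaper detour. Precomputing the full table is what makes mid-journey re-planning a lookup rather than a fresh search.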
Abstract: As the scale of university graduates has continued to expand, the evaluation of employability has become a critical issue in higher education management and talent cultivation. This study aimed to develop a scientific, quantitative, and multidimensional method for assessing graduate employability. An employability indicator system was constructed using the analytic hierarchy process (AHP). The results indicated that professional competence had the highest weight (0.55). Within this dimension, technical operation and experimental skills, the application of theoretical knowledge, and the quality of project experience contributed most significantly to employment competitiveness. A back propagation neural network (BPNN) model was further applied to train and predict the sample data. The results demonstrated a high level of consistency between the predicted values and the actual values. The absolute error ranged from 0.03 to 0.11, the relative error remained below 2.12%, and the overall accuracy reached 0.926. Universities should strengthen professional practice and innovation capacity development and give attention to enhancing students' professional qualities to improve overall employment competitiveness. This study provides a decision-making reference for higher education management, career guidance, and policy formulation. Keywords: back propagation neural network; BPNN; neural network model; university graduates; employability. DOI: 10.1504/IJICT.2026.10077937
Abstract: As the Belt and Road project continues to expand, the China-Laos Railway has emerged as both connecting infrastructure and a place of cultural exchange. This paper examines the design of an interactive three-dimensional (3D) book that is powered by artificial intelligence and presents the geography, ethnic culture, and transport development along the railway. The study improves visual representation and multimedia incorporation by using AIGC image generation, layout, and interactive design tools. A four-dimensional framework is created, comprising railway culture, spatial structure, interactive experience and AI generation mechanism, to maximise visual narratives and dynamic content. The results show that AI enhances design efficiency and creative behaviours, opening a new avenue for merging cultural communication with technology through conventional paper-based media. Keywords: China-Laos Railway; three-dimensional book design; artificial intelligence; AI; interactive narrative; cultural communication; visual expression. DOI: 10.1504/IJICT.2026.10077981
Abstract: This research tackles the pressing challenge of real-time automatic error detection in piano performance, a task where conventional approaches often propagate inaccuracies due to the decoupling of audio-score alignment and error identification. This paper introduces the DiffAlignTransformer framework, which incorporates a differentiable dynamic programming mechanism to jointly learn probabilistic note-level alignment and error classification within a hierarchical cross-modal encoder. Evaluated on the Vienna Synchronous Library dataset using a leave-one-performer-out validation strategy, the model attains an overall F1-score of 0.872, exceeding the strongest baseline by 6.0%, with marked gains in onset (7.2%) and offset (8.1%) error recognition. Inference requires only 78 milliseconds per second of audio, satisfying strict real-time constraints. These outcomes confirm that our method successfully resolves the intertwined alignment-detection problem and delivers precise, instantaneous feedback for piano pedagogy. Keywords: piano performance assessment; error detection; differentiable alignment; cross‑modal transformer; real‑time feedback. DOI: 10.1504/IJICT.2026.10078198
Abstract: Precise prediction of the purchase intention for marine cultural and creative products is of vital importance for e-commerce marketing. Addressing the issues of the existing methods that separately analyse images and comments, and the difficulty in capturing cross-modal collaborative effects, this study proposes a dual-stream deep learning model that integrates visual saliency and text sentiment. This model achieves a deeper understanding of user preferences by simultaneously extracting the salient regions of the images and the sentiment tendencies of the comments. Experiments on public datasets show that the purchase intention prediction accuracy of this method reaches 85.6%, significantly outperforming the baseline models that only use images (72.1%) or text (78.3%), with a recall rate increase of over 10 percentage points. This study provides an effective tool for multimodal fusion analysis and personalised recommendations in the marine cultural and creative field. Keywords: purchase intention prediction; visual saliency; text sentiment analysis; multimodal fusion; marine cultural products. DOI: 10.1504/IJICT.2026.10078357
Abstract: This study proposes a framework for suppressing the spread of fake news on social networks based on multimodal sentiment analysis. The framework employs the BERT model to extract contextual semantic vectors from news texts. These are then fused with the output of a bidirectional long short-term memory (BiLSTM) network through feature concatenation, enabling simultaneous capture of local context and global long-range dependencies. Emoticon sentiment features are then extracted through autoencoders and deeply integrated to accurately identify user sentiment inclinations. The study's core innovations are: 1) a multi-tiered fake news detection and suppression architecture; 2) deep fusion of text and emoticon features through multimodal sentiment analysis; 3) dual-strategy dissemination suppression combining 'detection + sentiment immunity'. Experimental results demonstrate that the fake news detection model achieves an accuracy of up to 89.4%. The proposed model can provide effective solutions for building a timely and accurate false news prevention and control system. Keywords: fake news; multi-modal data; sentiment analysis; dissemination suppression; BERT model. DOI: 10.1504/IJICT.2026.10078158
Abstract: Japanese kana writing is fundamental to learning the Japanese language, and its standardisation has a significant impact on language learning outcomes. To address the inefficiency and subjectivity of traditional manual evaluation, this study proposes an intelligent evaluation model that integrates a convolutional long short-term memory (ConvLSTM) network with a conditional random field (CRF). First, the model utilises the ConvLSTM to efficiently extract spatiotemporal features of handwriting traces. Second, the CRF layer optimises sequence annotation to achieve automatic quantitative evaluation of kana writing accuracy, fluency, and structural standardisation. Finally, a self-constructed dataset containing 2,000 handwriting trace samples from five common hiragana and five katakana categories was used for evaluation experiments. The results show that the model achieved a 98.2% accuracy rate in kana character recognition, a Pearson correlation coefficient of 0.91 between its writing style score and expert evaluations, and a 91.2% accuracy rate in kana stroke regularity assessment, significantly outperforming the single LSTM and CNN-CRF models. Keywords: writing trajectory evaluation; ConvLSTM; CRF; Japanese kana; intelligent evaluation; sequence labelling. DOI: 10.1504/IJICT.2026.10078036
Abstract: Aiming at the problem of prediction deviation caused by ignoring deep semantic information in online marketing effect perception, this study proposes an innovative framework that deeply integrates neural networks and multi-level semantic mining. Traditional methods mostly rely on shallow interaction features, making it difficult to capture complex intentions in texts. Our model achieves deep understanding and alignment of user preferences and product connotations through collaborative fine-tuning of pre-trained language models and graph neural networks. Experiments on public datasets show that, compared with mainstream baseline models, this framework has increased the area under the receiver operating characteristic curve for click-through rate prediction by 2.1% and the ranking metric normalised discounted cumulative gain @10 by 4.7%. All improvements are statistically significant (p < 0.01). This research provides an effective approach for building a more precise and interpretable intelligent marketing system. Keywords: online marketing; deep neural networks; DNNs; semantic mining; effect perception; recommendation systems. DOI: 10.1504/IJICT.2026.10077936
Abstract: This study proposes a scenario-based stochastic optimisation framework for the optimal placement and sizing of energy storage systems (ESS) in distribution networks. The model integrates ICT-enabled data acquisition and communication infrastructures to process real-time load and renewable energy data. A complete mixed-integer linear programming (MILP) formulation is developed, incorporating power balance, ESS dynamics, and network operational constraints across multiple uncertainty scenarios. The proposed method is validated on a real distribution network case study, demonstrating operational cost reductions, improved grid stability, and enhanced renewable energy utilisation compared with deterministic approaches. Keywords: energy storage systems; ESS; distribution networks; stochastic optimisation; mixed-integer linear programming; MILP. DOI: 10.1504/IJICT.2026.10078003
Abstract: The ceramic industry generates an enormous amount of waste every year. Traditional recycling relies on manual sorting, with an accuracy rate of only about 78%, and the production line scheduling is rigid. To achieve efficient resource utilisation, this paper innovatively integrates convolutional and recurrent neural networks to construct an intelligent waste recognition model, and embeds it into a discrete event simulation system for dynamic optimisation. Experiments show that the new method increases the classification accuracy to 96.7%, the system throughput increases by 32.4% after simulation optimisation, and the utilisation rate of key equipment increases by 22.8%. This research provides an intelligent solution for the precise identification and system regulation of ceramic waste recycling, promoting the implementation of the circular economy. Keywords: ceramic waste recycling; hybrid neural network; HNN; discrete event simulation; DES; resource utilisation rate. DOI: 10.1504/IJICT.2026.10078356
Abstract: This paper proposes a quantum-threat-mitigated encryption scheme by reengineering core algorithms via mathematical lattice constructs. While offering quantum-resistant security, lattice-based homomorphic encryption suffers from high latency and storage overhead. To overcome this, we redesign the ciphertext structure and decryption algorithm, introducing a polynomial Chinese remainder theorem-based method to pack multiple complex plaintexts into a single polynomial. A reconfigurable modular unit and a hybrid crossbar-fixed interconnection network are co-designed to optimise operational efficiency. This dual approach facilitates algorithm reconstruction and optimisation. Security analysis and simulations confirm that our method not only resists quantum computing attacks but also achieves an encryption time of 0.98 ms per bit, meeting real-time requirements. Keywords: post-quantum cryptography; lattice-based construction; homomorphic encryption; algorithm reengineering. DOI: 10.1504/IJICT.2026.10078199
Abstract: The demands of customers for spiritual culture are successfully met by cultural and creative products, which are a significant carrier of museum culture. Customers may develop a closer relationship with museums through the creative and cultural products' original design, which can greatly increase museums' social awareness. This paper suggests using the KANO model to innovate the design of museum cultural products from the perspective of consumer demand, given the issues of significance homogenisation, exorbitant prices, and a lack of functional development of current museum cultural products. The KANO model is utilised to analyse and prioritise consumers' demands for museum cultural products. This analysis employs a series of metrics to assess consumer needs and determine the most pressing issues within the field. The application of the KANO model in this particular context facilitates the generation of innovative concepts in the domain of product design and development. Keywords: KANO model; cultural and creative products of museums; product design; consumer demand. DOI: 10.1504/IJICT.2026.10078159
Abstract: The complex sea ice and marine environment in the polar region significantly affects marine safety operations. How to accurately simulate the complex polar environment is a key concern at home and abroad. The greenhouse effect leads to ice melting, with the expanding area of broken ice posing new challenges to ice navigation. This paper reviews the principle of discrete element method (DEM), special features of ship navigation in broken ice areas, and the progress of DEM applications in broken ice research. Based on this foundation, it discusses the existing challenges and key research applying DEM to broken ice studies. Keywords: discrete element method; DEM; broken ice areas; computational fluid dynamics; ship navigation; review. DOI: 10.1504/IJICT.2026.10078099
Abstract: Social network data has become a vital resource driving product innovation and design. Current research struggles to fully uncover users' emotional needs toward products when dealing with unstructured, high-dimensional social data, resulting in subpar product quality. To address this, this paper first employs a multi-scale attention network to analyse product emotional needs, capturing users' emotional demands. Subsequently, a spatial cross-reconstruction module is designed within the generative adversarial network to obtain more refined features. Simultaneously, a semantic correlation attention module is designed for mapping emotional needs to product images. This extracts attribute and word encodings as semantic representations to guide image generation, enhancing semantic consistency between emotional needs and visual content. Experimental results demonstrate that the proposed method achieves 92.71% accuracy in emotional need recognition and an FID of 11.88 for product images, outperforming state-of-the-art methods and delivering outstanding performance in innovative product design tasks. Keywords: innovative product design; social network; deep learning; emotional needs analysis; generative adversarial network; GAN. DOI: 10.1504/IJICT.2026.10077935
Abstract: This study proposes a collaborative management framework for tourist destination dynamic carrying capacity based on multi-agent deep reinforcement learning (MADRL) and spatio-temporal graph neural network (STGNN). A multi-dimensional topological model is constructed to characterise the spatio-temporal correlation of passenger flow, resources, environment, and service. A STGNN module embedded with spatio-temporal attention is designed to capture dynamic evolution features. A hierarchical MADRL structure realises global coordination. Experiments show that the framework reduces MAE to 0.037, shortens response delay to within 8.2 s, and improves carrying capacity utilisation to 92.6%. It outperforms traditional models in prediction, response, and multi-objective balance, providing an effective method for intelligent and sustainable tourism management. Keywords: tourist destination; dynamic bearing capacity; multi-agent deep reinforcement learning; MADRL; spatio-temporal graph neural network. DOI: 10.1504/IJICT.2026.10078160
Abstract: In the face of increasingly covert cyber-attacks, traditional detection models struggle to effectively capture the complex contextual correlation features in the traffic, resulting in insufficient ability to identify new threats. To address this issue, this study proposes a detection model based on bidirectional self-attention mechanism, which achieves deep perception of abnormal behaviours by simultaneously learning the context information of the traffic sequence. Experimental results show that compared with mainstream long short-term memory and standard transformer methods, this model has an average area under the curve improvement of over 4.2%, and the recall rate for low-rate attacks has increased by 7.5%, significantly enhancing the accuracy and robustness of detection. This study provides a new idea for improving the active defence capability of network security. Keywords: cybersecurity; anomaly detection; self-attention; bidirectional encoding. DOI: 10.1504/IJICT.2026.10078355
Abstract: Artificial intelligence (AI), edge computing, and the internet of things (IoT) are together making real-time analytics and immersive fan interaction possible in the sports world. However, typical cloud-based sports communication networks suffer from high latency, bandwidth congestion, and limited scalability, which makes real-time sports analytics and interactive fan experiences difficult. This paper presents an AI-driven Edge-IoT sports communication framework (AIESCF) for the intelligent processing of sports data, utilising edge-based deep learning inference, adaptive bandwidth-aware communication protocols, and distributed IoT sensing infrastructures. It uses spatiotemporal event recognition models, edge-level data filtering, and AI-assisted communication optimisation to detect events and assess player performance without overloading the network. The system achieves an accuracy of 90%, a latency of 85 ms, a bandwidth optimisation of 55%, and an engagement rate of 87%. The results demonstrate that the architecture can be deployed scalably and efficiently. Keywords: AI-driven networks; Edge-IoT; sports analytics; fan engagement; real-time communication. DOI: 10.1504/IJICT.2026.10078354
Abstract: This paper proposes a federated learning framework integrated with adaptive graph convolution for accurate and privacy-preserving carbon emission calculation in cross-regional power grids. It addresses data silos and privacy concerns by training models locally, avoiding raw data transfer. The adaptive graph convolution component automatically captures the dynamic spatial dependencies and carbon flow effects between grid regions. Validated on a Chinese grid dataset, the method reduces calculation errors by 22.3% and 14.7% compared to centralised and traditional distributed approaches, respectively, while demonstrating strong robustness against grid topology and operational fluctuations. Keywords: federated learning; adaptive graph convolutional networks; grid carbon emissions; collaborative computing; privacy protection. DOI: 10.1504/IJICT.2026.10078002
Abstract: With the rapid growth of data-intensive applications, achieving low-latency and reliable content retrieval in complex networks has become a major challenge. Information-centric networking (ICN) leverages content naming and pervasive in-network caching to enable retrieval from multiple replicas, making replica selection crucial for performance. However, selection is complicated by replica capacity limits, bursty workloads, and dynamic path variations. To address these issues, we propose a replica selection strategy that integrates the multi-armed bandit (MAB) framework with dynamic redundancy control. By modelling selection as an MAB problem, the strategy incorporates path variability, service heterogeneity, and blocking risk into decision-making, enabling adaptive exploration and exploitation. An additional load-aware redundancy mechanism adjusts redundancy levels to curb exploration overhead and suppress tail latency. Simulations on a real-world topology show that the method significantly reduces latency and improves robustness. Compared with nearest-replica routing, it reduces average latency by 32.09% and P99 tail latency by 45.76%. Keywords: information-centric networking; ICN; multi-armed bandits; MAB; adaptive redundancy; in-network cache; replica selection. DOI: 10.1504/IJICT.2026.10078035
Abstract: Accurately attributing Chinese second language grammatical errors is crucial for optimising teaching strategies. However, traditional methods are prone to being disturbed by confounding factors such as the learner's level, making it difficult to distinguish between superficial correlation and true causation. To address this, this paper introduces the framework of counterfactual causal inference for the first time. By simulating 'correction' interventions on specific grammatical points, it aims to identify the root causes of the errors. Experiments based on a large-scale public Chinese proficiency test dynamic composition corpus show that this method achieves an accuracy rate of 87.5% in error attribution, an improvement of 8.2% over the best baseline model; its causal effect ranking quality reaches 0.92, significantly outperforming traditional correlation analysis. This method provides interpretable and verifiable causal insights for Chinese second language teaching, and can directly serve the construction of personalised learning paths. Keywords: second language acquisition; attribution of grammatical errors; dual machine learning; DML. DOI: 10.1504/IJICT.2026.10077934
Abstract: Addressing psychological and social factors in graduate employment prediction, this paper proposes a graph network model that integrates psychological time-series data with dynamic social relationships. Traditional methods use static academic data and cannot capture key psychological factors like anxiety and career efficacy, or their interaction with peer and alumni resources. By constructing a time-series graph from psychological scales and social ties, tested on public graduate data, the model achieves an area under the curve of 0.891 for employment prediction. It significantly outperforms long short-term memory networks (area under the curve 0.801) and static graph neural networks (area under the curve 0.832), with normalised discounted cumulative gain at rank position 5 of 0.882, demonstrating reliable destination ranking. This work provides a data-driven approach for precise employment guidance through psychological monitoring. Keywords: temporal graph convolutional network; T-GCN; employment trajectory prediction; mental health; dynamic social network. DOI: 10.1504/IJICT.2026.10078197
Abstract: Given the increasing role of culture and tourism content in promoting tourism, accurately assessing its effectiveness is an urgent issue. In this paper, a multimodal transformer-based model is proposed to evaluate the communication efficiency of cultural and tourist videos. This model combines VI, BERT and MFCC for the extraction of image, text and audio characteristics. Cross-modal attention and consistency constraints are used to improve the convergence of information. To verify the effectiveness of the proposed model, a series of experiments was carried out on measures such as user engagement, dissemination breadth, and sentiment fluctuation. Experimental results show that the prediction accuracy of the proposed model is 89.2%, the prediction of user interaction is 87.1%, and the correlation coefficient is 0.78. Compared with the conventional single-mode model, the performance of this model is significantly improved, which indicates that multimodal data fusion plays an important role in the evaluation of communication efficiency. Keywords: multimodal learning; cross-modal fusion; short video marketing; emotional fluctuation analysis; cultural tourism dissemination; sentiment analysis; user behaviour analysis. DOI: 10.1504/IJICT.2026.10078252
Abstract: News interests shift quickly, and collecting fine-grained reading logs in one place is increasingly risky, so privacy-preserving personalisation must handle heterogeneous clients and unstable feedback. This paper proposes a dynamic-threshold federated reinforcement learning scheme for personalised news delivery. In the scheme, first, each device learns a sequential policy from local interactions to optimise long-horizon utility. Then, each round estimates update reliability and adjusts a participation cutoff to filter noisy client contributions. Finally, the server aggregates selected shared updates while keeping lightweight personalisation on device. Experimental results show that the proposed scheme raises NDCG at ten from 0.401 to 0.423, improves diversity from 0.287 to 0.319, increases cumulative reward from 1.866 to 2.034, and reduces communication per round from 10.9 to 7.8 megabytes, achieving a stronger balance of utility, diversity, and efficiency. Keywords: dynamic threshold; federated reinforcement learning; personalised news recommendation; client heterogeneity; communication efficiency. DOI: 10.1504/IJICT.2026.10078353
Abstract: Combining personalised expression with high-fidelity geometric reconstruction has been a challenging task. To generate realistic virtual humans, this paper proposes an effective new framework. By combining users' emotional preferences with implicit neural radiance fields, personalised virtual human bodies are generated. This method encodes multi-modal user input into structured conditional variables, and then guides the conditional neural radiance field model to generate facial images with emotional expressiveness. The innovative learnable user-specific embeddings can capture individual expression styles. Additionally, the attention-based fusion module ensures precise alignment between emotional semantics and facial details. Through experiments on standard datasets, the proposed method achieved a Fréchet inception distance score of 15.38 and an emotion recognition accuracy of 0.892, significantly outperforming three baseline approaches. These results demonstrate its substantial advantages in emotional accuracy, identity preservation, and overall visual quality. Keywords: virtual human generation; implicit neural radiance fields; affective computing; conditional generation. DOI: 10.1504/IJICT.2026.10078000
Open Access
