Forthcoming Articles

International Journal of Information and Communication Technology (IJICT)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are also listed here. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

Register for our alerting service, which notifies you by email when new issues are published online.

International Journal of Information and Communication Technology (54 papers in press)

Regular Issues

  •   A neural network-based quality assessment model for English-to-Chinese text translation
    ( Free Full-text Access ) CC-BY-NC-ND
    by Tiejiang Hu 
    Abstract: Addressing the urgent need for cross-language text translation quality assessment, this paper proposes a neural network-based model for evaluating English-Chinese translation quality. Current widely adopted automated evaluation methods exhibit significant limitations in handling specialised terminology and nuanced semantics, particularly when addressing culture-specific concepts. The neural network model constructed in this study integrates deep semantic representation with contextual correlation analysis, achieving remarkable results on the Chinese-English test set of the public WMT 2020 metrics shared task dataset. It achieved a core correlation metric (Pearson's r) of 0.682, along with a multi-dimensional classification evaluation (macro-F1) of 0.689 and a ranking quality metric (normalised discounted cumulative gain @10) of 0.927, comprehensively outperforming mainstream baseline models. This model provides a reliable technical tool for cross-language text quality control.
    Keywords: neural network; translation quality assessment; cross-language application.
    DOI: 10.1504/IJICT.2026.10078367
     
  •   Music generation controllable dance based on improved transformer model and style consistency
    ( Free Full-text Access ) CC-BY-NC-ND
    by Xiaoyun Cao, Yibo Wang 
    Abstract: Music-driven dance generation can effectively improve the efficiency and popularity of artistic creation, but existing generation methods have problems such as insufficient dance-music correlation, poor stability of long sequence generation, and inconsistent styles. Therefore, a novel dance generation and completion framework that integrates an improved transformer and style consistency control is proposed. This framework first constructs a cross-modal generation model with a bidirectional attention mechanism, enhances the correlation between dance and music through bidirectional interaction perception between music and action modalities, and adopts a planned sampling strategy to alleviate exposure bias in autoregressive generation. By extracting and integrating music features, key action features, and global dance style features, the completed dance segments ensure consistency in music synchronisation and overall style. Experiments showed that the generative model significantly outperformed mainstream comparison models in Fréchet distance (25.7), beat coverage (59.7%), hit rate (52.4%), and diversity metrics. The complementary model achieved a style classification accuracy of 95.4% and a style retention rate of 90.2% in dance completion tasks. These results indicate that the proposed model can effectively improve the correlation and style consistency of generated dance, and promote the popularisation of art.
    Keywords: transformer; bidirectional attention; BA; style consistency; music-driven dance generation; controllable dance.
    DOI: 10.1504/IJICT.2026.10078390
     
  •   Design of a cross-domain resource integration learning path generation model for innovative talent cultivation using bi-directional GAN and deep contrastive clustering network
    ( Free Full-text Access ) CC-BY-NC-ND
    by Lian Tong, Liyan Zhou 
    Abstract: To enhance the utilisation efficiency of interdisciplinary learning resources in cultivating innovative talents, this study proposes a fusion generative model integrating a bi-directional generative adversarial network (Bi-GAN) with a deep contrastive clustering network (DCCN). The model integrates multi-domain curriculum resource features via an attention mechanism, uses Bi-GAN for feature analysis and enhancement, and finally applies DCCN to cluster and serialise resources into a coherent learning path. Experimental results show that the silhouette coefficient (SC), normalised mutual information (NMI), adjusted Rand index (ARI), and path coherence score (PCS) of the proposed model on the test set reach 0.36, 0.72, 0.56, and 0.78, respectively. Compared with the best results among the baseline methods, the proposed model achieves relative improvements of 24.1%, 10.8%, 14.3%, and 16.4% in SC, NMI, ARI, and PCS, respectively. Overall, the proposed model effectively realises deep cross-domain knowledge integration and coherent learning path generation, and provides a solution for personalised educational resource organisation.
    Keywords: two-way generation countermeasure network; deep contrast clustering; cross-domain resource integration; learning path generation; cultivation of innovative talents.
    DOI: 10.1504/IJICT.2026.10078416
     
  •   AI enabling mechanism of lighthouse factory from the perspective of complex system theory
    ( Free Full-text Access ) CC-BY-NC-ND
    by Lingjiao Wu 
    Abstract: In the global wave of digital transformation in manufacturing, the in-depth exploration of the AI empowerment mechanism of lighthouse factories, as industry benchmarks, holds significant importance. This study innovatively analyses the AI empowerment mechanism of lighthouse factories from the perspective of complex systems theory. By constructing a theoretical framework based on the characteristics of complex systems, including multi-scale and nonlinear interactions, it derives dynamic evolution equations to accurately depict the complex operational state of lighthouse factories. The research findings show that AI empowerment significantly enhances key performance indicators of lighthouse factories, such as reducing order delivery cycles by 46.4% and improving energy efficiency by 23.6%. Additionally, it clarifies the phase transition points of complex systems, providing critical guidance for corporate digital transformation. This study provides a solid theoretical basis and practical solutions for manufacturing to achieve intelligent upgrades through AI, helping companies enhance their competitiveness in a complex and ever-changing market environment.
    Keywords: lighthouse factory; AI enabled; complex systems theory; digital transformation.
    DOI: 10.1504/IJICT.2026.10078435
     
  •   Research on intelligent generation and interactive display method of traditional art for immersive experience
    ( Free Full-text Access ) CC-BY-NC-ND
    by Panpan Yun 
    Abstract: Faced with the challenges of high cost, long cycle and insufficient innovation in traditional art design, this paper develops an integrated solution for intelligent generation and interactive display. By building a knowledge-enhanced multi-view diffusion interaction model (KEMDIM), this solution effectively integrates domain knowledge graphs to enhance semantic understanding, multi-view diffusion generation to ensure geometric consistency, and metaverse dynamic rendering technology to support real-time interaction. Verification on data sets such as WikiArt shows that the method reduces the FID, a measure of generation quality, to 12.1 (a 19.7% improvement), the PSNR reaches 34.8, the user experience satisfaction score is 4.6 points, and the multi-view alignment error is reduced by 32%. The results show that the framework significantly improves the accuracy and immersion of digital generation of traditional art. Although it has limitations such as reliance on the completeness of the knowledge base and a high demand for computing resources, it provides a new technology path for intelligent protection and innovative design of cultural heritage, and its multi-modal fusion mechanism and scalability have important practical value in promoting the development of AI-empowered traditional processes.
    Keywords: immersive experience; traditional art; intelligent generation; interaction.
    DOI: 10.1504/IJICT.2026.10078436
     
  •   Design and development of mobile learning UI based on situational cognition theory
    ( Free Full-text Access ) CC-BY-NC-ND
    by Chen Liu 
    Abstract: In the context of the current inefficiency in mobile learning application interface design and the difficulty in adapting to diverse user scenarios, this study explores ways to enhance UI generation effectiveness and user experience through automation technology. By transforming the principles of environmental interaction and dynamic cognition emphasised in situational cognition theory into computable neural network components, an encoder-decoder model based on CNN and transformer is constructed. This model introduces two-dimensional spatial position encoding in the encoder to simulate users' spatial perception of interface layout, and utilises an attention mechanism in the decoder to achieve dynamic adaptation to different task scenarios. Experiments show that this method achieves a BLEU-4 score of 82.4% on the RICO dataset, with the edit distance reduced to 7.1. Furthermore, its performance degradation is minimal after adding noise and blur interference, demonstrating good robustness. In practicality evaluation, the generated interface received a comprehensive score of 4.23 from participants, with the highest recognition coming from the mobile learning teacher group. The method proposed in this paper effectively achieves accurate and stable generation from high-fidelity images to interface trees, providing a solution with both theoretical guidance and practical value for the automated design of mobile learning UIs.
    Keywords: situational cognition; mobile learning; UI design; development.
    DOI: 10.1504/IJICT.2026.10078450
     
  •   Advancing the application of intelligent design systems in adaptive co-creation models using AIGC and reinforcement learning
    ( Free Full-text Access ) CC-BY-NC-ND
    by Tian Liu 
    Abstract: In current intelligent design systems, text prompt word optimisation is a key challenge in improving the quality of image generation. Aiming at the problems of uncertain direction of prompt words and difficult quality evaluation, this paper develops an adaptive co-creation model based on AIGC and reinforcement learning. The model adopts a three-stage training framework: first, supervised fine-tuning of the mapping relationship of prompt word pairs is performed, then multi-modal visual feedback of PickScore and aesthetic value models is fused through reward modelling, and finally reinforcement learning is used to fine-tune and optimise the generation strategy. The model performs well on many metrics. In the performance test, the FID of the model decreases to 15.2 ± 1.5 and the IS increases to 8.9 ± 0.6. In the robustness test, the FID of the model is 17.5 ± 1.7 under the noise prompt. In addition, in the practical test, the overall user satisfaction reaches 4.4 ± 0.3. These results show that the model realises the collaborative optimisation of prompt words and image generation through an adaptive mechanism, and provides an efficient end-to-end optimisation scheme. However, the model still needs to be further refined. Future research should therefore focus on reducing complexity and enhancing real-time feedback integration.
    Keywords: artificial intelligence generated content; AIGC; reinforcement learning; adaptive; co-creation model.
    DOI: 10.1504/IJICT.2026.10078495
     
  •   Deep learning-based public crisis event identification for multimodal data contexts
    ( Free Full-text Access ) CC-BY-NC-ND
    by Wei Gao 
    Abstract: Public crisis events surface across text, sensors, imagery, and logs, yet single-source detectors miss early weak cues. To address fragmented evidence, this study presents a multimodal bidirectional transformer for crisis recognition. First, source-aware tokens preserve time, space, provenance, and quality while synchronisation gates align asynchronous streams. Then, cross-source attention separates corroboration from dissent and memory tokens retain long-range hints. Finally, self-supervised pretraining and calibrated classification deliver auditable alerts. On composite streams, the method reaches AUPRC 0.612, AUROC 0.915, F1 0.672, ECE 0.038, and average lead time 31.6 minutes, exceeding the best baseline by 7.9 AUPRC points, 3.2 AUROC points, 6.9 F1 points, and 8.5 minutes. These gains provide earlier, more reliable, and well-calibrated alerts for public response.
    Keywords: event identification; multimodal data; bidirectional transformer; spatiotemporal alignment.
    DOI: 10.1504/IJICT.2026.10078527
     
  •   Integrating deep learning and GIS technology for optimising rural tourism development paths
    ( Free Full-text Access ) CC-BY-NC-ND
    by Yahui Sun 
    Abstract: In response to the problems of homogenisation in rural tourism development and inefficient resource allocation, this study has designed a rural tourism development path optimisation method that integrates geographic information system and deep learning. Firstly, multiple sources of spatial data are integrated, and convolutional neural networks are used to automatically predict the potential of rural tourism development. Subsequently, using the potential spatial distribution as input, a mathematical model for path optimisation is constructed, and an improved deep reinforcement learning method is employed, incorporating local operators to perform neighbourhood search and iteratively improve the initial solution. The experiments show that the net benefit value achieved by the proposed method on the standard test set is 44.05, and the solution time is only 3.14 seconds, which is significantly better than the comparison algorithms, providing an effective solution for the optimisation of rural tourism development paths.
    Keywords: rural tourism; optimisation of development path; deep learning; deep reinforcement learning; neighbourhood search.
    DOI: 10.1504/IJICT.2026.10078528
     
  •   Dynamic sound field reconstruction with multi-channel broadcasting systems in immersive virtual environment
    ( Free Full-text Access ) CC-BY-NC-ND
    by Zhimei Li, Junping Huang, Mingzhu Zhang 
    Abstract: The free movement of sound sources and listeners in immersive virtual reality poses a significant challenge for dynamic sound field reconstruction. This study proposes a real-time, high-fidelity reconstruction method based on a multi-channel loudspeaker system. A time-varying spherical harmonic coefficient field is first constructed to parametrically represent the dynamic sound field. An optimisation algorithm integrating perceptual weighting and sparse constraints is then designed to achieve high-quality reconstruction under limited physical loudspeaker channels. Experimental results demonstrate that the proposed method significantly outperforms conventional higher-order ambisonics decoding, vector base amplitude panning, and existing deep learning approaches. Key improvements include reduced normalised field error, lower perceptual spectral distortion, and higher azimuth estimation accuracy, all while satisfying real-time processing requirements. Experiments show that the proposed method reduces the normalised field error by more than 40% compared to the next-best method, while maintaining a frame processing latency of less than 20 milliseconds. These advancements collectively enhance the auditory immersion in dynamic virtual environments.
    Keywords: dynamic sound field reconstruction; immersive audio; spherical harmonics; sparse optimisation; compressed sensing; spatial hearing.
    DOI: 10.1504/IJICT.2026.10078529
     
  •   Evolution of cultural community interaction networks and information propagation based on dynamic interest graph
    ( Free Full-text Access ) CC-BY-NC-ND
    by Lu Wang 
    Abstract: This study addresses the inaccuracy of traditional static network models in predicting rapidly evolving interests within cultural communities by proposing an interactive network evolution model based on dynamic interest graphs. We find that members' shifting interests make information propagation paths unpredictable - for instance, trending topics in music communities may cycle as frequently as every two weeks. To tackle this challenge, we developed a graph model incorporating time-decay factors that captures real-time changes in interest similarity. Experiments demonstrate that compared to traditional static graph methods, our model achieves a 12.7% improvement in community structure prediction accuracy and reduces prediction error for information reach by 18.3%. This work offers new insights into understanding the dynamic evolution of cultural communities and enabling precise content dissemination.
    Keywords: dynamic interest graph; cultural communities; network evolution; information dissemination.
    DOI: 10.1504/IJICT.2026.10078530
     
  •   Dynamic resource allocation in smart laboratories based on multi-agent reinforcement learning
    ( Free Full-text Access ) CC-BY-NC-ND
    by Hua Yang, Xiangdong Liang, Jingwei Li 
    Abstract: To address issues such as uneven equipment utilisation and delayed user demand response in traditional smart laboratory resource allocation, this paper proposes a multi-agent deep reinforcement learning framework based on attention mechanisms. The research motivation stems from the collaborative scheduling challenges posed by heterogeneous equipment and dynamic task requests. This method employs a centralised training and distributed execution architecture, enabling agents to learn cooperative strategies in partially observable environments. It further incorporates a demand forecasting module to enhance allocation foresight. Experiments on public datasets and simulation environments demonstrate that the proposed method significantly outperforms traditional genetic algorithms and single-agent reinforcement learning approaches in both resource allocation quality (normalised discounted cumulative gain @5 reached 0.87) and overall utilisation (area under the curve improvement of 15.2%), validating its effectiveness and adaptability in complex laboratory scenarios.
    Keywords: multi-agent reinforcement learning; MARL; smart laboratory; dynamic resource allocation; attention mechanism.
    DOI: 10.1504/IJICT.2026.10078531
     
  •   Visual feedback-driven active perception by drone swarms for proactive crowd anomaly capture
    ( Free Full-text Access ) CC-BY-NC-ND
    by Zhaorong Han, Dongqi Liu 
    Abstract: This study addresses the inefficiency and passivity of surveillance in detecting crowd anomalies across wide, dynamic environments using unmanned aerial vehicles. It proposes an active perception framework for drone swarms that is driven by real-time visual-semantic feedback. The framework couples a spatiotemporal graph attention network, which models crowd interactions and infers anomaly probabilities, with a cooperative multi-agent reinforcement learning decision-making module. This integration enables the swarm to dynamically and collaboratively optimise viewpoints based on live semantic cues. Evaluated on the VisDrone dataset, our approach achieves an anomaly capture rate of 89.7%, an average response delay of 1.9 seconds, an operational efficiency of 1.86 events per kilometre flown, and a low observation redundancy of 22.1%. These results demonstrate that embedding visual semantics into a closed perception-control loop significantly enhances the performance of proactive monitoring systems compared to existing baseline methods.
    Keywords: drone swarm; active perception; crowd anomaly detection; visual feedback; cooperative reinforcement learning.
    DOI: 10.1504/IJICT.2026.10078532
     
  •   Generation of virtual character interaction logic driven by multimodal behavioural data
    ( Free Full-text Access ) CC-BY-NC-ND
    by Pengfei Ma 
    Abstract: The generation of adaptive interaction logic for virtual characters remains challenging, as traditional rule-driven methods often produce rigid and contextually insensitive behaviours. To overcome this, we present the multimodal meta-generation network, a multimodal behaviour data-driven framework that synthesises natural and socially appropriate interaction logic from streams including speech, posture, and facial expression. The framework employs cross-modal temporal alignment and hierarchical reinforcement learning to fuse asynchronous signals and enable joint strategy planning with action execution. A causal reasoning module is integrated to enhance social rationality. Experiments on public multimodal interaction datasets demonstrate that our method significantly outperforms baseline models, achieving an F1-score of 0.795 in accuracy and a human subjective score of 4.3 out of 5.0 in naturalness. This research provides a practical solution for deploying adaptive virtual characters in fields such as the metaverse, intelligent education, and remote collaboration.
    Keywords: multimodal learning; virtual characters; interaction logic generation; reinforcement learning; behaviour analysis.
    DOI: 10.1504/IJICT.2026.10078533
     
  •   Emotion representation and recognition in oil paintings via meta-learning and semantic augmentation
    ( Free Full-text Access ) CC-BY-NC-ND
    by Wei Li 
    Abstract: Automated emotion analysis in visual art remains a significant challenge, primarily due to the paucity of annotated data and the profound stylistic and semantic gap between generic image understanding and domain-specific artistic interpretation. This study introduces a novel meta-learning framework enhanced with structured semantic knowledge for few-shot emotion recognition in oil paintings. The proposed model integrates a dual-path architecture: a meta-learning pathway for rapid visual adaptation and a semantic pathway that incorporates contextual art historical knowledge. These pathways are fused through a hierarchical cross-modal attention module, which dynamically aligns visual features with relevant semantic concepts during the learning process. Extensive evaluations on the ArtEmis dataset demonstrate the framework's superior performance, achieving state-of-the-art macro-accuracy of 68.7% (1-shot) and 81.3% (5-shot). The results confirm the model's efficacy in achieving robust, generalisable, and interpretable emotion analysis with limited data, advancing the field of computational art understanding.
    Keywords: oil painting emotion recognition; few-shot learning; meta-learning; semantic enhancement; interpretable artificial intelligence.
    DOI: 10.1504/IJICT.2026.10078534
     
  •   Deep learning-driven multimodal early warning analysis for intelligent security in coal mine camps
    ( Free Full-text Access ) CC-BY-NC-ND
    by Chaoyi Zhou, Lanfeng Zhang, Huiwei Wang 
    Abstract: Coal mine safety is crucial for both life and production. However, traditional monitoring relies on a single sensor, resulting in a high rate of missed alarms in complex underground environments. To address challenges such as lighting changes and dust interference, this study proposes a deep learning early warning system that integrates video, infrared, and vibration data. Through cross-modal feature fusion and multi-task learning, it achieves collaborative perception of abnormal human behaviours and equipment failures. Experimental results show that the system achieves an area under the curve of 0.982 for abnormal behaviour detection on public datasets, which is approximately 7% higher than that of a single visual model. The accuracy of fire warning reaches 96.7%, and the false alarm rate is reduced by 5.3%. This method provides a highly reliable and scalable technical path for round-the-clock intelligent safety monitoring in coal mines.
    Keywords: coal mine security; multi-modal fusion; anomaly detection; intelligent early warning.
    DOI: 10.1504/IJICT.2026.10078535
     
  •   Utility-driven simulation modelling and multi-objective evolutionary optimisation for BIM-based construction emission reduction
    ( Free Full-text Access ) CC-BY-NC-ND
    by Ling Yuan 
    Abstract: This research tackles the dynamic optimisation of construction carbon emissions by proposing a novel integration of building information modelling, utility-driven simulation, and multi-objective evolutionary computation. A core methodological contribution is a formalised utility function that quantifies and embeds project-specific decision-maker preferences for time, cost, and emissions directly into the optimisation search process. This function guides a bespoke evolutionary algorithm to automatically generate efficient, low-carbon construction plans, which are accurately evaluated by a high-fidelity discrete-event simulation engine using enriched building information modelling data. In an experimental study based on a publicly available office building dataset, our framework demonstrates superior performance, outperforming state-of-the-art benchmarks with a 5.6%-15.3% gain in hypervolume and achieving an 18.7% reduction in on-site emissions. This work provides a rigorous and actionable decision-support system for advancing sustainable construction practices.
    Keywords: building information modelling; BIM; construction carbon emission; multi-objective optimisation; utility function; discrete-event simulation.
    DOI: 10.1504/IJICT.2026.10078536
     
  •   Adversarial machine learning algorithms for English translation quality estimation
    ( Free Full-text Access ) CC-BY-NC-ND
    by Sumei Dou 
    Abstract: Reliable evaluation of machine translation quality is essential for a dependable automatic translation system. However, adversarial attacks can degrade evaluation performance by subtly perturbing sentences, endangering the security of key applications. This paper proposes a comprehensive adversarial robustness enhancement framework designed specifically for translation quality evaluation. The framework integrates a multi-grained adversarial sample generator, a dynamic adversarial training mechanism based on the relaxation of master and apprentice labels, and an online defence module. The method was verified on the machine translation and multilingual quality evaluation seminar and post-editing task data set: it increased the robustness of the model by 34.2% and reduced the average prediction error under attack from 18.7% to 12.3%. The framework shows stable performance in multiple fields, providing an effective solution for building safe and reliable translation quality evaluation systems for real-world scenarios.
    Keywords: translation quality assessment; adversarial machine learning; robustness assessment; natural language processing; NLP; model security.
    DOI: 10.1504/IJICT.2026.10078537
     
  •   Social sentiment early warning system integrating transformers and explainable SHAP values
    ( Free Full-text Access ) CC-BY-NC-ND
    by Yutan Wang 
    Abstract: In recent years, social media has become a key medium for public emotion and thought dynamics, making its sentiment analysis crucial for event prediction. However, while mainstream deep learning models achieve accurate predictions, their black-box decision process hampers reliable warning. This paper therefore integrates the transformer model with interpretable Shapley additive explanations (SHAP) values to construct a social sentiment warning framework with both high accuracy and transparency. Experiments on public datasets show that the method's comprehensive warning performance significantly outperforms traditional models: the area under the curve reaches 0.872, which is approximately 7.5% higher than the classic long short-term memory model, overall warning accuracy rises to 85.6%, and the false alarm rate drops by nearly 12%. This provides an effective solution for reliable and interpretable automated social sentiment perception and early risk warning.
    Keywords: emotional warning; transformer; explainable artificial intelligence; SHAP value.
    DOI: 10.1504/IJICT.2026.10078538
     
  •   Fault diagnosis and self-healing of power line carrier communication enabled by artificial intelligence: smart grid application based on data mining
    ( Free Full-text Access ) CC-BY-NC-ND
    by Benrong Wang, Wei Huang 
    Abstract: The power line carrier communication systems of smart grids face core problems such as severe channel attenuation, complex and changeable interference sources, delayed fault location and slow self-healing response, while existing methods have significant shortcomings in reliability, real-time performance and scene adaptability. To address these problems, this paper proposes an intelligent optimisation scheme based on the integration of data mining and artificial intelligence (IOSDM-AI). Through a multi-model coupling mechanism, the scheme is driven by real-time communication data flow to achieve accurate fault diagnosis, rapid location and adaptive self-healing, while ensuring the stable operation of the system. The results show that the accuracy of the IOSDM-AI algorithm is 94.7%, the diagnosis delay is reduced to 108.3 ms, and the missed diagnosis rate and misdiagnosis rate are as low as 0.7% and 2.1%, respectively. The self-healing success rate is 96.8%, the average self-healing time is reduced to 2.3 s, the communication link stability index is 9.28, and the self-healing strategy execution accuracy (FSE) is 428.
    Keywords: smart grid; power line carrier communication; artificial intelligence; fault diagnosis; self-healing mechanism; data mining.
    DOI: 10.1504/IJICT.2026.10078539
     
  •   Free full-text access Open AccessA multi-objective optimisation model for the spatial layout of public art
    ( Free Full-text Access ) CC-BY-NC-ND
    by Xin Wen 
    Abstract: This study proposes a multi-objective optimisation model to address the strategic placement of public art, aiming to balance spatial efficiency, social equity, and economic cost. The data-driven framework incorporates dynamic mobility patterns, multidimensional socioeconomic indices, and urban walkability networks. It concurrently optimises three objectives: maximising weighted accessibility coverage, minimising the Gini coefficient of accessibility, and reducing total expenditure. A novel cognitive heuristic adaptive search algorithm is introduced, which embeds domain knowledge of urban spatial structure to solve this high-dimensional problem. Empirical validation using Manhattan data confirms the algorithm's superior performance, demonstrating a 10.8% to 24.1% improvement in the hypervolume metric over standard multi-objective evolutionary algorithms. The resulting Pareto-optimal solutions quantify clear trade-offs, such as a 142% gain in coverage efficiency or a 48% reduction in accessibility inequality, thereby establishing a scientific basis for equitable cultural resource planning.
    Keywords: public art placement; spatial optimisation; social equity; multi-objective optimisation; data-driven decision support.
    DOI: 10.1504/IJICT.2026.10078540
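
The equity objective named in the abstract above, the Gini coefficient of accessibility, can be illustrated with a short sketch. It assumes accessibility is summarised as one non-negative score per neighbourhood; the function name and the sample scores are hypothetical and are not taken from the paper.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula for ascending data:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical accessibility scores for four neighbourhoods.
equal_layout = [1.0, 1.0, 1.0, 1.0]
concentrated_layout = [0.0, 0.0, 0.0, 4.0]
```

A layout that serves every neighbourhood equally scores 0, while one that concentrates all accessibility in a single neighbourhood approaches 1, so a minimiser of this quantity is pushed towards even coverage.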
     
  •   Free full-text access Open AccessLightweighting and end-side deployment of multi-modal large model based on cross-modal attention distillation
    ( Free Full-text Access ) CC-BY-NC-ND
    by Kaijie Liu, Yun Dong, Zhipeng Meng, Qi Chen, Qiwen Tan, Zonghui Wei, Qi Meng 
    Abstract: To address the issue of high cost incurred by multimodal large models in visual-language tasks, this paper proposes a lightweight model, CAD-LM, based on cross-modal attention distillation. It designs an adaptive modal reparameterisation module that utilises a multi-branch structure to enhance representational power during training and reparameterises it into an efficient single-branch structure during inference. This paper integrates an end-to-end deployment optimisation process that encompasses hardware-aware pruning, mixed-precision quantisation, and efficient parameter fine-tuning. Experiments show that CAD-LM successfully compresses the model parameter count to 118.4 M and reduces computational complexity to 12.1 GFLOPs, achieving approximately 20% and 30% reduction, respectively. Its performance on benchmark tasks such as Flickr30k and VQA v2.0 significantly surpasses baseline models like the original-scale CLIP and ALBEF. Edge deployment verification reveals that the final model occupies only 89.4 MB of memory and boasts millisecond-level inference latency, achieving an excellent balance between model performance, computational efficiency, and engineering practicality. This provides an efficient solution for multi-modal applications in resource-constrained environments.
    Keywords: attention distillation; multi-modal; large model; lightweight; end-side deployment.
    DOI: 10.1504/IJICT.2026.10078541
     
  •   Free full-text access Open AccessA utility-aware scheduling model for online learning tasks based on dynamic psychological cognitive load perception
    ( Free Full-text Access ) CC-BY-NC-ND
    by Zhao Wang, Jingru Yu 
    Abstract: In online learning scenarios, the dynamic fluctuations of students' cognitive load directly impact learning outcomes. Existing scheduling models, lacking real-time perception of cognitive load states, often result in mismatches between task assignments and learner capabilities. To address this, this paper first processes student contextual features using feature selectors and self-attention mechanisms, then predicts response performance based on cognitive load state. Subsequently, an online learning task utility scheduling model is constructed based on cognitive load diagnosis results. A multidimensional utility function for online learning tasks is designed, establishing a scheduling objective function that maximises this utility. Finally, an improved particle swarm optimisation algorithm solves the objective function to derive the optimal online learning task utility scheduling strategy. Experimental results demonstrate that the proposed method achieves scheduling times of 3.3 ms and a success rate of 98.3%, outperforming baseline methods with significantly higher scheduling efficiency.
    Keywords: online learning; task utility scheduling; cognitive load; cognitive diagnosis; attention mechanism.
    DOI: 10.1504/IJICT.2026.10078542
     
  •   Free full-text access Open AccessA case study on vision-based intelligent inspection for surface defect detection in CNC machining using YOLOv7 and transfer learning
    ( Free Full-text Access ) CC-BY-NC-ND
    by Peng Zhou, Shanshan Kong 
    Abstract: This paper introduces a vision-based intelligent inspection system to detect surface defects in CNC-machined parts by using a YOLOv7-based transfer learning model. Traditional manual inspection is laborious, subjective, and prone to mistakes, thus necessitating automated solutions. The suggested YTL-ISDD system takes advantage of high-resolution images taken under controlled lighting to detect defects, including scratches, cracks, pits, and burrs. The model makes use of pre-trained YOLOv7 weights that improve feature extraction and minimise the training time. Data augmentation, such as rotation, scaling, flipping, and contrast adjustment, is used to enhance robustness and generalisation across different surface conditions. The system can be easily integrated into CNC production environments, and it can detect defects in real time, accurately, and consistently with minimum human involvement, which enhances the quality of products and the efficiency of production.
    Keywords: CNC machining; surface defect detection; YOLOv7; transfer learning; intelligent inspection.
    DOI: 10.1504/IJICT.2026.10078543
     
  •   Free full-text access Open AccessMultimodal knowledge fusion and intelligent generation model in decision support systems for energy industry
    ( Free Full-text Access ) CC-BY-NC-ND
    by Wenke Li, Xinping Miao, Ruoyan Dong, Yue Tian, Qingbo Kong 
    Abstract: Faced with the challenges of complexity and uncertainty of multi-source heterogeneous data in energy industry decision support systems and the dynamic environment adaptation needs brought about by smart grids and renewable energy access, this study aims to build an adaptive, strong and robust multi-modal knowledge fusion and intelligent generation model to improve the accuracy and reliability of decision-making. By innovatively integrating hypergraph attention networks to achieve multi-modal feature fusion, cloud model quantification of data uncertainty, and reinforcement learning framework-driven intelligent policy generation, the model achieves an accuracy rate of 93.5%, an F1 score of 92.8% and RMSE 0.08 in performance tests. In addition, robustness tests show that the model has a change rate of only 3.7% under 30% noise, the decision delay is optimised to 65 milliseconds, and the accuracy rate increases to 89.2% after fine-tuning in cross-scenario generalisation. Overall, the model effectively solves the semantic gap and data quality problems, and provides efficient support for energy scheduling, fault prediction and other scenarios.
    Keywords: energy industry; decision-making; multi-modal; knowledge fusion; intelligent generation.
    DOI: 10.1504/IJICT.2026.10078544
     
  •   Free full-text access Open AccessZero-sample accounting standards migration framework empowered by meta-learning
    ( Free Full-text Access ) CC-BY-NC-ND
    by Xianyi Xiong, Shibing Zhang 
    Abstract: With the global economy's deep integration, traditional manual migration methods cannot meet efficiency and accuracy needs. This study explores the application of meta-learning techniques in zero-sample accounting standard transfer and constructs an intelligent framework to adapt to new accounting standards in various fields. By analysing meta-learning mechanisms, an innovative transfer framework is designed to use a small amount of source domain data for rapid adaptation and precise transfer of accounting standards in new fields. Experimental results show that the meta-learning empowerment framework significantly improves transfer performance under zero-sample conditions. Compared with traditional methods, the average accuracy is up by 12.3%. At different data scales, as the data volume increases from 100 to 1,000, the accuracy improves by 8.5%, 10.2%, and 13.1% respectively. In new accounting standard testing, the average accuracy reaches 85.6%, a 9.4% improvement over traditional methods.
    Keywords: meta-learning; zero-sample learning; accounting standards migration; intelligent framework.
    DOI: 10.1504/IJICT.2026.10078545
     
  •   Free full-text access Open AccessMulti-modal advertising visual information fusion using dynamic heterogeneous graph neural networks
    ( Free Full-text Access ) CC-BY-NC-ND
    by Ruiqi Du 
    Abstract: To enhance the fusion performance of multimodal advertising content in visual comprehension and behaviour prediction, this paper proposes a DHGNN-based multimodal information fusion model. The model comprises modal feature embedding, dynamic node/edge attribute modelling, improved heterogeneous graph convolutions, and a multi-task output layer, optimising CTR, CVR, and semantic classification. Compared with the conventional Hetero GCN and MM Attn models, the proposed DHGNN improved the AUC by 3.8-5.9 p.p. on CTR, and the F1-score was 0.726, which was 8.8 percentage points above Homogeneous GCN. Moreover, the inference latency is reduced to 19.4 ms, while maintaining competitive predictive accuracy. Across Ali-AD and Tencent-Ad360, DHGNN achieves AUC gains of 3.8-5.9 percentage points on CTR and reaches an F1-score of 0.726 on semantic classification, providing quantitative evidence that the proposed dynamic heterogeneous graph modelling improves multimodal fusion under heterogeneous and time-varying advertising data for multimodal recommendation systems.
    Keywords: multimodal fusion; dynamic heterogeneous graph neural network; DHGNN; ad recommendation; semantic alignment; cross-modal modelling; temporal evolution modelling; heterogeneous graph attention; click-through rate prediction; semantic consistency learning; cold-start advertising scenario.
    DOI: 10.1504/IJICT.2026.10078569
     
  •   Free full-text access Open AccessStudent performance and health management technology based on GPA model and psychological data mining
    ( Free Full-text Access ) CC-BY-NC-ND
    by Miao Li, Jibing Liu, Meng Wang, Yanhua Yang, Yanling Qu 
    Abstract: In the information age, managing student performance and mental health is critical for educational development. Addressing current analytical limitations, this study proposes an intelligent management framework grounded in data analytics and knowledge integration. The system utilises a four-layer architecture integrating distributed databases and deep learning. Specifically, performance prediction uses a GPA-based model, while an improved two-stream encoder network (TSEN) enables mental health monitoring. Experimental results demonstrate the performance model achieves an accuracy of 0.947, a 0.949 F1 score, and an area under the curve of 0.963 for top predictions. For mental health analysis, using eight influencing factors yields 87.5% sensitivity and 90.1% recognition accuracy, with a Matthews correlation coefficient of 0.962 over a 12-week sequence. These results confirm that the approach effectively integrates academic and psychological data, significantly enhancing analytical accuracy and providing robust decision support for student development and educational management.
    Keywords: knowledge-driven student management; GPA-based performance prediction; student mental health analytics; two-stream encoder networks; attention-based data mining; educational decision support systems.
    DOI: 10.1504/IJICT.2026.10078570
     
  •   Free full-text access Open AccessIntegrated design of graphic creativity and visual information visualisation in multimodal contexts
    ( Free Full-text Access ) CC-BY-NC-ND
    by Ming Ren, Tianyuke Wang 
    Abstract: With the rapid advancement of big data and artificial intelligence, information presentation has shifted to a multimodal environment encompassing text, images, audio, video, and biological signals. Traditional visual information visualisation is challenged by limited expression dimensions and insufficient emotional and narrative capacity. The purpose of this study is to explore the theory and technical path of deep integration of graphic creativity and visual information visualisation in multimodal context. Firstly, the theoretical basis of multimodal cognition and perception is constructed; then, a technical framework integrating multimodal data perception, cross modal feature alignment, creative semantic generation and visual mapping is proposed; by introducing generative AI, attention mechanism and style transfer technologies, a data story aesthetic driven fusion design method is designed; finally, two case studies verify its effectiveness in enhancing information interpretation efficiency and user experience, offering theoretical support and technical solutions for complex information design.
    Keywords: multimode; graphic creativity; visual information visualisation; fusion design.
    DOI: 10.1504/IJICT.2026.10078571
     
  •   Free full-text access Open AccessGenerative AI and cybersecurity in 6G for intelligent English language learning systems
    ( Free Full-text Access ) CC-BY-NC-ND
    by Lixia Xiang, Mei Wan, Ruobing Li, Cong Lin, Pan Zhang 
    Abstract: The convergence of generative artificial intelligence (AI) and sixth-generation (6G) communication technologies is changing how intelligent English language learning systems work. This paper presents learning with intelligible generative AI and 6G-based secure architecture (LINGUA-6G), a cohesive framework that amalgamates generative AI with ultra-low-latency, high-reliability 6G networks. Tests show that fluency improved by 28.7%, vocabulary retention by 32.4%, and latency by 41.2%. Network testing showed a response time of less than 90 ms, an availability rate of more than 96%, and stable performance. Blockchain identity verification and AI-based intrusion detection cut down on unauthorised access by 89.5% and found 96.8% of threats, proving that 6G-enabled AI learning ecosystems are safe, scalable, and efficient.
    Keywords: generative artificial intelligence; 6G communication networks; intelligent language learning; blockchain-based identity verification; AI-driven intrusion detection.
    DOI: 10.1504/IJICT.2026.10078572
     
  •   Free full-text access Open AccessExplainable AI-assisted creative system for visual communication design: based on diffusion models and user intent understanding
    ( Free Full-text Access ) CC-BY-NC-ND
    by Jue Wang, Wu Song 
    Abstract: At present, visual communication design AI assistant tools have problems such as user intention understanding deviation, unexplainable generated results, fragmented interaction and low cross-module collaboration efficiency. This paper proposes an interpretable AI assistant creative system integrating diffusion model, user intention understanding and advanced communication technology. Its core is DesignXAI with the logic of intent transfer-controllable generation-process interpretation-collaborative optimisation. Experiments show the system outperforms mainstream models, with 91% intention understanding accuracy, 89.3 user satisfaction and 0.87 intention-result semantic consistency, providing an efficient intelligent auxiliary method.
    Keywords: visual communication design; explainable AI; diffusion model; user intent understanding; semantic communication; edge computing.
    DOI: 10.1504/IJICT.2026.10078573
     
  •   Free full-text access Open AccessSTEAM art course design combining generative AI and prompt engineering
    ( Free Full-text Access ) CC-BY-NC-ND
    by Jie Shi, Yikun Li 
    Abstract: This study proposes a dual-driven STEAM art course framework integrating generative AI and prompt engineering to address key challenges in creativity support, personalisation, and AI tool integration. A four-layer topology aligns course goals, content, interaction, and evaluation, combining generative AI's creative abilities with prompt engineering's precision. Hierarchical prompt strategies enable stepwise creative guidance, while a course-AI feedback loop adapts to learner needs. A multidimensional evaluation system assesses creative expression, skill development, and thinking growth. Results show a 42.3% increase in creative work scores, 91.7% skill proficiency, 4.8 satisfaction (out of 5), and 35.6% higher teaching efficiency. Personalised teaching coverage rose from 38% to 89%. The framework performs effectively across diverse age groups and skill levels, offering a scalable path for intelligent art education in K-12 and training contexts.
    Keywords: STEAM art course; generative AI; prompt engineering; course design; creative cultivation.
    DOI: 10.1504/IJICT.2026.10078574
     
  •   Free full-text access Open AccessEmotion-driven recommender system for low-carbon products: sentiment feedback and satisfaction evaluation from online reviews
    ( Free Full-text Access ) CC-BY-NC-ND
    by Dongyi Zhang, Shulan Yu 
    Abstract: The aim of this research is to examine fully how Chinese consumers consider low-carbon products and their emotional inclination towards them, and to offer empirical evidence that helps raise public awareness of low-carbon consumption. To this end, 62,271 online reviews of common low-carbon products were gathered from JD.com, a Chinese e-commerce platform. The reviews covered six types of everyday consumer goods: paper products and cleaning supplies, household goods, electronic appliances, clothing and accessories, home improvement and decoration, and beauty and personal care. The SnowNLP natural language processing sentiment analysis component was used to determine the intensity of consumer emotion towards different low-carbon products. Moreover, most of the online reviews were analysed with the LDA topic model to identify latent topics and to establish the impact of these topics on consumers' positive and negative feelings.
    Keywords: low-carbon products; consumer sentiment; online reviews; consumer satisfaction; topic modelling; ICT-based analytics; e-commerce.
    DOI: 10.1504/IJICT.2026.10078575
     
  •   Free full-text access Open AccessMulti-modal e-commerce data analysis system based on deep learning: visual perception and emotional computing
    ( Free Full-text Access ) CC-BY-NC-ND
    by Chunsheng Zhang 
    Abstract: E-commerce platforms generate vast multi-modal data (product images and user reviews), whose integrated analysis is crucial for enhancing user experience and decision making. However, existing methods often treat visual perception and text sentiment analysis separately, limiting cross-modal semantic collaboration. Therefore, a multi-modal hierarchical collaborative fusion model (MHCFM) that unifies product visual attributes, aesthetic quality, scene context, and textual emotion is proposed via cross-modal alignment and hierarchical adaptive fusion. The model integrates a hierarchical visual transformer, a dual-branch aesthetic network, a graph convolutional scene module, and a hierarchical adaptive fusion network. Experiments on public and large-scale e-commerce datasets showed the sentiment analysis accuracy exceeded 93%, the inference time was 2223 ms, outperforming mainstream models. In cross-cultural and multi-category tests, the average accuracy was 91.5%, demonstrating robustness. The proposed model enhances visual-textual collaboration, offering an efficient solution for intelligent product analysis and user experience optimisation in e-commerce.
    Keywords: multi-modal sentiment analysis; visual transformer; cross-modal alignment; hierarchical adaptive fusion; e-commerce; multi-modal hierarchical collaborative fusion model; MHCFM.

  •   Free full-text access Open AccessData fusion-driven choreography-multimedia integration: a collaborative framework for coherent multi-sensory performance experiences
    ( Free Full-text Access ) CC-BY-NC-ND
    by Yuyu Li, Bin Li 
    Abstract: This paper presents a data fusion-driven framework that integrates choreography with multimedia technology to create coherent and immersive multi-sensory performance experiences. The study aims to enhance audience engagement by seamlessly embedding dynamic visual and auditory elements into choreographic expressions, thereby redefining traditional performance boundaries. A collaborative model between choreographers and multimedia technologists is proposed to ensure artistic and technical coherence. Methodologically, the research develops and implements techniques for embedding multimedia features into choreographic sequences, along with detection algorithms for synchronising digital and live performance components. The results demonstrate that the proposed integration framework effectively facilitates real-time coordination between movement and multimedia content, enriching narrative expression and sensory impact. This work contributes to the evolving field of digital performance by offering a structured approach to multidisciplinary collaboration and technological integration in the arts.
    Keywords: multimedia; choreography; integration.
    DOI: 10.1504/IJICT.2026.10078624
     
  •   Free full-text access Open AccessDesign of sharing model of information-based teaching materials upon deep learning
    ( Free Full-text Access ) CC-BY-NC-ND
    by Zhou Zhou, Zishuai Zhou, Fangfang Zhang 
    Abstract: The imbalance of the regional economy and the tilt of national policy lead to uneven degrees of development among universities. To help less developed universities improve their academic level and narrow the development gap between universities, we propose a teaching information-based resource-sharing (TIRS) model based on deep learning (DL) for colleges. Firstly, we analyse the types and characteristics of teaching resources and establish a sharing platform for university teaching resources. Then, we propose a DL-based label quantification method for teaching resources to extract the features of each resource in the sharing platform and assign labels. Finally, we propose a teaching resources retrieval method using the bag-of-words model to improve the efficiency of the TIRS model. The experiment demonstrates that the DL-based TIRS model for colleges can provide good teaching services for teachers and students, and that the objective accuracy and subjective accuracy of retrieval can reach 87.2% and 86.7% respectively, which provides technical support for resource sharing among colleges.
    Keywords: teaching resources; sharing model; deep learning; DL.
    DOI: 10.1504/IJICT.2026.10078625
     
  •   Free full-text access Open AccessMeasuring echo chamber effects in youth online communities with temporal clustering
    ( Free Full-text Access ) CC-BY-NC-ND
    by Xieer Jiang 
    Abstract: Echo chambers in youth online communities intensify opinion polarisation, yet existing methods rely on static snapshots and single-modal features, neglecting integration with social-psychological theories. This paper proposes the temporal clustering for echo chamber measurement framework, unifying dynamic graph neural networks with joint temporal clustering. Using Reddit data from 2018 to 2022, temporal user interaction networks are constructed, and theory-driven indicators (interaction homogeneity, topic convergence, and attitude polarisation) are extracted based on social identity and cognitive dissonance theories. A temporal graph attention network learns evolving node representations, followed by joint optimisation of temporal K-means and spectral clustering to identify echo chambers and quantify intensity trajectories. Experiments show the framework outperforms baselines, achieving an adjusted Rand index of 0.89 and an F1-score of 0.88. It captures echo chamber dynamics during major events, offering an interpretable tool for understanding online polarisation in youth communities.
    Keywords: echo chamber effect; temporal clustering; dynamic graph neural networks; youth online communities; social media analysis.
    DOI: 10.1504/IJICT.2026.10078626
     
  •   Free full-text access Open AccessGenerative AI and multimodal learning spaces for perceiving and regulating English learning anxiety
    ( Free Full-text Access ) CC-BY-NC-ND
    by Jian Zu 
    Abstract: This study has constructed a generative artificial intelligence-driven multimodal learning space for real-time perception and regulation of the anxiety states of English learners. By integrating facial expression analysis, speech feature extraction, and behaviour data modelling, the system achieved an accuracy rate of 87% in identifying learning anxiety, which was over 9% higher than that of single-modal methods. In a six-week intervention experiment, the state anxiety scores of the experimental group decreased by 31.6% compared to the control group, while the oral fluency of the experimental group improved by 24.3%. The research proves that the collaborative intervention of multimodal emotion computing and generative artificial intelligence can effectively break through the bottlenecks of delayed anxiety identification and single intervention methods in traditional teaching, providing a feasible path for the emotional adaptation of intelligent language learning environments.
    Keywords: generative artificial intelligence; multimodal learning; English learning anxiety; affective computing.
    DOI: 10.1504/IJICT.2026.10078627
     
  •   Free full-text access Open AccessAn attention-based multi-engine architecture enhances the ability of English translation to resolve ambiguity
    ( Free Full-text Access ) CC-BY-NC-ND
    by Ziao Zhang 
    Abstract: Language ambiguity is the core challenge in machine translation. To overcome the limitation of existing neural machine translation (NMT) models that are prone to semantic deviations in complex contexts, this paper proposes a multi-engine architecture based on the attention mechanism. It achieves ambiguity resolution by dynamically integrating the context modelling of neural networks, the grammar constraints of the rule engine, and the historical knowledge of the statistical engine. Experiments on public datasets such as workshop on machine translation and discourse in machine translation show that this system improves the bilingual evaluation understudy score by 2.1 points in the English-German translation task, increases the accuracy of ambiguous phrase translation by 15.7%, and its semantic coherence (normalised discounted cumulative gain) is significantly better than the mainstream baseline models. This research provides an effective solution for building a robust and accurate context-aware translation system.
    Keywords: multi-engine architecture; attention mechanism; resolution of ambiguity; neural machine translation; NMT; semantic coherence.
    DOI: 10.1504/IJICT.2026.10078628
     
  •   Free full-text access Open AccessModelling news dissemination networks using a community-based graph traversal algorithm and its performance evaluation
    ( Free Full-text Access ) CC-BY-NC-ND
    by Wei Ren, Yu Wang 
    Abstract: In this study, a directed weighted graph containing 500,000 user nodes and 100,000 news records was constructed, node attributes were labelled with user activity and news sentiment tendency, edge weights were determined according to propagation time attenuation (5%/hour) and interaction frequency, communities were divided by the Louvain algorithm, the core nodes were identified by fusing node betweenness centrality and PageRank, and path traversal was optimised with a restart probability of 0.2. Compared with traditional methods such as the shortest path algorithm and static community random walk, CGTA achieves 92.3% (76.6% for traditional methods) and 89.1% for core node recall (71.4% for traditional methods), which are 15.7% and 17.7% higher respectively. The structural equation model quantified that user activity (38.2%), emotional tendency (29.5%), and propagation time (22.3%) dominated the path formation, that the propagation speed decreased by 12.3% (r = -0.78) for every 1 hop increase in path length, and that the core node expansion effect reached 67.5%. The study integrates multi-source data from Weibo and Toutiao, and cross-validates it with 5% to confirm the effectiveness of the algorithm in breaking/regular news and communities of different sizes.
    Keywords: CGTA algorithm; news communication path; core node; communication influence factors.
    DOI: 10.1504/IJICT.2026.10078713
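
Two ingredients named in the abstract above, time-attenuated edge weights and traversal with a restart probability of 0.2, can be sketched as follows. This is a hedged illustration rather than the CGTA algorithm itself: it assumes the 5%/hour attenuation is multiplicative, and the function names and the tiny three-node graph are hypothetical.

```python
def decayed_weight(interactions, hours_elapsed, decay_per_hour=0.05):
    """Edge weight: interaction frequency attenuated by 5% per hour (assumed multiplicative)."""
    return interactions * (1.0 - decay_per_hour) ** hours_elapsed


def random_walk_with_restart(edges, seed, restart=0.2, iters=100):
    """Power iteration for a random walk that jumps back to `seed` with probability `restart`.

    `edges` maps each node to a dict of {neighbour: weight}.
    """
    nodes = set(edges) | {v for nbrs in edges.values() for v in nbrs}
    scores = {n: 0.0 for n in nodes}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            out = edges.get(u, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: send its walking mass back to the seed.
                nxt[seed] += (1.0 - restart) * scores[u]
                continue
            for v, w in out.items():
                nxt[v] += (1.0 - restart) * scores[u] * w / total
        # Restart step: a fixed share of all probability mass returns to the seed.
        nxt[seed] += restart * sum(scores.values())
        scores = nxt
    return scores


# Hypothetical three-user propagation cycle a -> b -> c -> a.
toy_graph = {
    "a": {"b": decayed_weight(10, 0)},
    "b": {"c": decayed_weight(10, 2)},
    "c": {"a": decayed_weight(10, 5)},
}
```

Nodes that accumulate high scores under such a walk are natural candidates for core nodes, since the restart keeps the ranking anchored to the source of a news item.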
     
  •   Free full-text access Open AccessAttentional dual-branch shallow feature enhancement and gated fusion for improved image copy-move forgery detection
    ( Free Full-text Access ) CC-BY-NC-ND
    by Zirui Qi, Yilihamu Yaermaimaiti, Fusheng Zhao 
    Abstract: Detecting subtle tampering traces within complex backgrounds remains a significant challenge in image copy-move forgery detection, primarily due to the inadequacy of shallow feature extraction. To overcome these limitations, this paper proposes an enhanced DeepLabV3+ architecture designed for efficient multi-scale feature fusion. The framework utilises a lightweight MobileNetV3 backbone within an encoder-decoder structure, integrated with an improved atrous spatial pyramid pooling (ASPP) module employing depthwise separable dilated convolutions. To strictly preserve low-level details, we introduce a dual-branch shallow feature enhancement module (dual-branch SFEM) augmented by efficient channel attention (ECA). Furthermore, the feature fusion stage is optimised through architectural restructuring to reduce computational complexity while maintaining performance. A key innovation is the inclusion of a lightweight gating network that generates spatially adaptive weights, dynamically balancing the trade-off between semantic abstraction and detail preservation. Extensive experiments on the CASIA, DEFACTO, and COVERAGE datasets demonstrate the model's superiority over state-of-the-art methods. Specifically, the proposed method achieves an AUC of 95.41% and an F1 score of 77.24% on the DEFACTO dataset, while exhibiting robust generalisation capabilities on CASIA 1.0 (AUC: 78.93%, F1: 57.68%).
    Keywords: image forgery detection; gated fusion; efficient channel attention mechanism; dual-branch shallow feature enhancement module; DB-SFEM.
    DOI: 10.1504/IJICT.2026.10078363
     
  •   Free full-text access Open AccessBlockchain-enabled secure distance learning platforms for higher education
    ( Free Full-text Access ) CC-BY-NC-ND
    by Jianmin Chen, Xue Chang 
    Abstract: As a growing number of educational institutions offer online degree programs, concerns have arisen about the protection of students' privacy, the authenticity of the materials used in courses, and the safety of students communicating with one another in virtual learning settings. Older systems have a single point of failure, which could be exploited to alter academic records, gain unauthorised access to systems, and cause other problems, because they were designed around a centralised architecture. Blockchain-enabled secure distance learning (BESDL) is one approach that could address these issues. Blockchain technology's permanent record and decentralised consensus make data more trustworthy, open, and dependable than it would otherwise be. The three primary components of this system are the secure content-based access control (SCBAC), the decentralised identity management (DIM), and the encrypted content delivery (ECD) protocols, which collaborate to safeguard educational resources. According to the test findings, BESDL enhances system security, maintains examinable academic records, and accelerates the checking process. To summarise, BESDL is not only dependable but also flexible, making it an excellent option for future higher education institutions that will incorporate online learning.
    Keywords: blockchain; distance learning; higher education; smart contracts; secure authentication.
    DOI: 10.1504/IJICT.2026.10078368
     
  •   Popular music accompaniment generation methods based on the MuseFlow model and sliding window design
    ( Free Full-text Access ) CC-BY-NC-ND
    by Yufeng Wang 
    Abstract: To enhance pop music creation, this study proposes an automatic accompaniment generation method combining sliding window technology with the MuseFlow model. The sliding window segments long music sequences into short-time overlapping frames, balancing time and frequency resolution to capture local signal characteristics. MuseFlow employs an enhanced bidirectional mapping architecture and training objectives to accurately model complex relationships in multi-track music data. Experimental results show that MuseFlow achieves Fréchet inception distance (FID) scores of 26.3 on the POP909 dataset and 25.4 on the FreeMidi dataset, significantly outperforming baseline models. These findings demonstrate that the proposed method generates high-quality, diverse accompaniments compatible with main melodies, providing an efficient tool for music creators.
    Keywords: MuseFlow; sliding windows; SWs; popular music accompaniment; STFT; audio quality; bass track generation; multi-track coordination.
    DOI: 10.1504/IJICT.2026.10078378
     
  •   Intelligent generation algorithm for digital image artworks based on decoupling representation and content-aware
    ( Free Full-text Access ) CC-BY-NC-ND
    by Liyuan Zhang 
    Abstract: Focused on calligraphy, this research addresses style transfer distortion and inadequate compositional aesthetics in AI-generated art. We propose an algorithm that integrates decoupling representation learning with content-aware layout modelling. A dual-encoder architecture separates character structure and brushstroke style features, enabling precise and controllable style transfer via dynamic instance normalisation. A visual-linguistic bimodal network with hierarchical spatial modules is introduced to model relationships at the character, line, and global levels. The proposed method achieves a style similarity of 0.751, a content preservation PSNR of 36.9 dB, and an 8%-16% improvement in cross-font generalisation accuracy on unseen characters. For layout generation, the framework maintains a line-spacing fluctuation coefficient of 0.032, achieves a layout aesthetics score of 4.8, and demonstrates strong long-text stability with a cross-page style consistency of 0.94. Ablation studies further confirm the effectiveness of the dynamic weight adjustment mechanism, achieving an optimisation efficiency of 0.98. This work addresses key technical bottlenecks in digital calligraphy generation, providing a practical tool for cultural heritage preservation and a transferable framework for other structured art generation tasks, thereby advancing the integration of artificial intelligence with traditional arts.
    Keywords: decoupled representation learning; content-aware; generative adversarial networks; GANs; digital art generation; deep learning.
    DOI: 10.1504/IJICT.2026.10078377
     
  •   Personalised learning path optimisation in digital English learning environments via multi-factor knowledge tracing and reinforcement learning
    ( Free Full-text Access ) CC-BY-NC-ND
    by Jie Chen 
    Abstract: Digital English learning environments generate massive interaction data, offering potential for adaptive learning path optimisation. However, many existing approaches treat learning state estimation and learning path recommendation as separate tasks, restricting long-term personalised learning support. This study proposes a framework that integrates a multi-factor fusion knowledge tracing (MFFKT) model with reinforcement learning. It jointly analyses behaviour sequences, item attributes, knowledge structures and temporal features to dynamically capture learners' knowledge states, which serve as environment states for long-term reward-driven path optimisation. Experiments on ASSISTments 2017 and EdNet-KT4 show that MFFKT achieves AUC scores of 0.834 and 0.812, respectively, surpassing baseline models. Ablation studies validate the efficacy of multi-dimensional feature fusion. When combined with conservative Q-learning, the framework outperforms greedy, rule-based, and random strategies in cumulative reward, completion rate, and efficiency. Overall, the proposed framework enables coordinated modelling of learning states and learning path decisions, providing an effective technical approach for adaptive and personalised English learning within digital learning environments.
    Keywords: multi-factor fusion knowledge tracing; MFFKT; RL; digital English learning; personalisation; sequential decision-making.
    DOI: 10.1504/IJICT.2026.10078362
     
  •   Computer vision simulation with multimodal data for real-time user interaction in industrial design
    ( Free Full-text Access ) CC-BY-NC-ND
    by Jing Li, Hui Yuan 
    Abstract: To address modal heterogeneity, temporal asynchrony and cognitive adaptation imbalance in multimodal real-time interaction, this paper proposes a cognitive load theory (CLT)-driven multimodal real-time fusion architecture. Experimental verification on the HoloAssist dataset shows that the interaction intention prediction accuracy of the proposed architecture reaches 95.2% ± 1.3%, which is 3.5 percentage points higher than that of the AlignMamba model. The end-to-end delay is 0.18 s ± 0.02 s, and the alignment delay is as low as 0.028 s. The subjective cognitive load score is 3.2 ± 0.8, significantly better than that of the baseline model. Ablation experiments confirm that each core module is crucial to the performance improvement, and the model remains robust in scenarios with modal loss and noise interference. This research provides support for the implementation of real-time multimodal interaction technology.
    Keywords: multi-modal fusion; cognitive load theory; optimal transmission; real-time human-computer interaction; adaptive weight.
    DOI: 10.1504/IJICT.2026.10078361
     
  •   Collaborative optimisation of emotion regulation and audio synthesis based on PerformanceNet and multi-emotional music generation model
    ( Free Full-text Access ) CC-BY-NC-ND
    by Li Chai 
    Abstract: In response to three major challenges in AI music generation - limited chord representation, monotonous emotions, and low audio fidelity - this research proposes a novel end-to-end framework termed PEMF that integrates PerformanceNet with a multi-emotion music generation model. The core innovations include a structured four-dimensional chord encoding method using root, third, fifth, and crown notes to expand harmonic diversity to 60 chord types, a dual-encoding transformer architecture that independently processes melody and chord streams for superior structural coherence, and a fine-grained emotion regulation mechanism mapping pitch histograms and rhythm density parameters to Russell's two-dimensional emotion space for continuous control. For audio synthesis, an asymmetric U-Net structure combined with a multi-band residual learning mechanism and a flooding loss strategy significantly enhances spectral fidelity and training stability. Experimental results demonstrate that PEMF achieves a chord vocabulary coverage near 1.0, an emotion recognition accuracy of 92.3% (significantly outperforming the symbolic transformer's 78.6%), a high-frequency energy retention rate of 89.1%, and a Fréchet audio distance of 0.5. System performance shows a 36.9% improvement in emotional consistency and a 64.2% reduction in latency compared to staged training, validating its efficacy in practical applications like music therapy and film scoring.
    Keywords: PerformanceNet; multi-emotional music generation model; emotional regulation; audio synthesis.
    DOI: 10.1504/IJICT.2026.10078376
     
  •   Personalised learning path recommendation and knowledge tracing model for large-scale online education
    ( Free Full-text Access ) CC-BY-NC-ND
    by Yanhong Song 
    Abstract: This study proposes a unified framework that jointly models personalised learning path recommendation and knowledge tracing to improve individualised learning support in large-scale online education. The framework integrates learners' knowledge states, prerequisite relationships, learning load, and preferences within a single space, enabling dynamic tracking and coordinated optimisation. An online-updatable knowledge tracing model captures mastery levels, which inform a scoring and recommendation mechanism that adapts as knowledge states evolve. Experiments on the EdNet-KT1 dataset show the proposed model achieves superior prediction accuracy and lower mean absolute error than recent baselines, with reduced parameters and training time. This approach balances predictive performance and computational efficiency, offering a practical solution for personalised learning support.
    Keywords: large-scale online education; knowledge tracing; personalised learning path recommendation; deep learning; educational data mining.
    DOI: 10.1504/IJICT.2026.10078366
     
  •   Oil painting emotion recognition using multi-modal adaptive deep network
    ( Free Full-text Access ) CC-BY-NC-ND
    by Guixiang Chang 
    Abstract: To better integrate visual content and semantic information in the analysis of oil paintings, this paper proposes an emotion recognition model for oil paintings based on a multimodal adaptive deep network. The model handles visual and textual information along two paths, extracting deep visual features from paintings and contextual semantic features from associated texts. An adaptive feature fusion module adjusts the fusion weights of the different modality features using cross-modal attention and gating mechanisms. On the ArtEmis oil painting dataset, the proposed model achieves 76.8% accuracy in the discrete emotion classification task and an RMSE of 0.319 in continuous emotion dimension prediction. It classifies more accurately than the baseline model, which supports the validity of the adaptive fusion mechanism for multimodal art emotion analysis.
    Keywords: emotion recognition; oil painting analysis; multimodal learning; adaptive fusion.
    DOI: 10.1504/IJICT.2026.10078365
     
  •   Dynamic optimisation of the extraction process for natural food antioxidants based on multi-agent simulation
    ( Free Full-text Access ) CC-BY-NC-ND
    by Ximei Wan 
    Abstract: The extraction efficiency of natural antioxidants is influenced by the dynamic coupling of multiple factors, and traditional static optimisation methods are unable to cope with real-time disturbances. This study proposes a dynamic optimisation framework based on multi-agent simulation, which achieves real-time precise control of the extraction process by simulating the autonomous decisions and collaboration of agents such as solvents, temperature, and equipment. The framework is verified on an open dataset of antioxidant kinetics. Compared with traditional methods, it increases the extraction rate by an average of 12.5%, and the improvements in key indicators (area under the curve, normalised discounted cumulative gain) are statistically significant (p < 0.05), offering a new approach to dynamic optimisation problems in food processing.
    Keywords: multi-agent simulation; dynamic optimisation; process control; antioxidant extraction.
    DOI: 10.1504/IJICT.2026.10078373
     
  •   Simulating academic stress formation via causal discovery and temporal sequence analysis
    ( Free Full-text Access ) CC-BY-NC-ND
    by Fu Yao, Mengting Yan 
    Abstract: Academic pressure has a significant impact on students' physical and mental health and academic performance. Understanding its dynamic formation mechanism is crucial for effective intervention. This paper proposes a framework that combines causal discovery with time series modelling to simulate the development path of academic pressure. The framework first uses a systematic causal discovery algorithm to learn a directed acyclic graph that represents the causal relationships between key factors; it then builds a causally constrained time series model to simulate the dynamic evolution of student pressure. In an evaluation on comprehensive longitudinal data from 300 students, the model increases the average accuracy of pressure precursor recognition by 7.4% and reduces the trajectory simulation error by 28.4% compared with the traditional benchmark model. These findings provide actionable insights for early warning of stress and personalised intervention strategies.
    Keywords: academic pressure; simulation; causal discovery; time series modelling; oriented acyclic diagram.
    DOI: 10.1504/IJICT.2026.10078360
     
  •   Generative adversarial network and grammar rule constraint optimisation for English interlanguage error correction
    ( Free Full-text Access ) CC-BY-NC-ND
    by Liu Liu 
    Abstract: To address the semantic distortion and over-correction that existing generative models often introduce when correcting English learners' texts, this paper proposes a generative adversarial network method that integrates grammatical rule constraints. By introducing formal grammatical knowledge as a flexible constraint in the network training process, the model is guided to correct errors while better maintaining the original meaning and overall fluency of sentences. Experiments conducted on a public learner corpus show that this method increases error correction accuracy by 12% and reduces the number of over-correction cases by 16%. The research thereby offers an effective way to manage the persistent trade-off between accuracy and naturalness in automatic grammar correction.
    Keywords: English interlanguage correction; generative adversarial network; grammatical rule constraints; semantic retention; excessive correction.
    DOI: 10.1504/IJICT.2026.10078364
     
  •   Implicit neural representation and error control for solving mathematical partial differential equations
    ( Free Full-text Access ) CC-BY-NC-ND
    by Huaizhe Zhang 
    Abstract: This research tackles the persistent challenge of uncontrolled approximation errors and unreliable convergence in neural network-based methods for solving partial differential equations. We introduce a novel error-controlled implicit neural representation framework, which incorporates a trainable error indicator network and an adaptive weighting scheme to dynamically steer the optimisation process. Our approach utilises a dual-encoding architecture to represent physical fields with high fidelity and a cooperative training mechanism that iteratively estimates and reduces local errors. Experimental validation on a National Aeronautics and Space Administration (NASA) turbulent flat-plate boundary layer benchmark demonstrates that the error-controlled implicit neural representation achieves a relative L2 error of 8.73 × 10⁻⁴, outperforming the best existing baseline by 42.6% and improving boundary-layer accuracy by 52.0%. Moreover, the proposed method reduces training time by 34.7%-55.2% while maintaining physically consistent solutions, confirming its efficacy and efficiency in error-aware numerical simulation.
    Keywords: partial differential equations; PDEs; adaptive optimisation; numerical simulation; implicit neural representation; INR.
    DOI: 10.1504/IJICT.2026.10078359
     
  •   Faster R-BERT multimodal fusion real-time psychological stress recognition system
    ( Free Full-text Access ) CC-BY-NC-ND
    by Ming Zhang 
    Abstract: Psychological stress among college students is increasingly prominent, requiring efficient and objective identification methods. However, existing real-time systems struggle to balance accuracy with processing speed and lack deep integration of multi-source information (such as expressions, voices, and texts). This study proposes a real-time recognition system based on Faster R-BERT (faster robust bidirectional encoder representations from transformers) multimodal fusion, which significantly improves computing efficiency through a lightweight fusion mechanism. Experiments on public datasets show that the system achieves 86.5% accuracy in stress recognition, a significant improvement over traditional methods (e.g., 73.2% for a single-modal convolutional neural network). Its inference speed meets real-time requirements (30 fps), and the area under the curve rises from 0.82 to 0.91. This study provides an effective approach for non-intrusive, real-time psychological state monitoring in campus environments.
    Keywords: psychological stress; multimodal fusion; real-time system; mental health.
    DOI: 10.1504/IJICT.2026.10078371