Forthcoming Articles
International Journal of Information and Communication Technology

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.
Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.
Online First articles are also listed here. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.
International Journal of Information and Communication Technology (52 papers in press)
Regular Issues
Abstract: In current intelligent design systems, text prompt word optimisation is a key challenge in improving the quality of image generation. Aiming at the problems of uncertain direction of prompt words and difficult quality evaluation, this paper develops an adaptive co-creation model based on AIGC and reinforcement learning. The model adopts a three-stage training framework: first, supervised fine-tuning of the mapping relationship of prompt word pairs is performed; then, multi-modal visual feedback from PickScore and aesthetic value models is fused through reward modelling; finally, reinforcement learning is used to fine-tune and optimise the generation strategy. In the performance test, the FID of the model decreases to 15.2 ± 1.5 and the IS increases to 8.9 ± 0.6. In the robustness test, the FID of the model is 17.5 ± 1.7 under noisy prompts. In addition, in the practical test, the overall user satisfaction reaches 4.4 ± 0.3. These results show that the model realises the collaborative optimisation of prompt words and image generation through an adaptive mechanism, and provides an efficient end-to-end optimisation scheme. However, the model still needs to be further refined: future research should focus on reducing complexity and enhancing real-time feedback integration. Keywords: artificial intelligence generated content; AIGC; reinforcement learning; adaptive; co-creation model. DOI: 10.1504/IJICT.2026.10078495
Abstract: Public crisis events surface across text, sensors, imagery, and logs, yet single-source detectors miss early weak cues. To address fragmented evidence, this study presents a multimodal bidirectional transformer for crisis recognition. First, source-aware tokens preserve time, space, provenance, and quality while synchronisation gates align asynchronous streams. Then, cross-source attention separates corroboration from dissent and memory tokens retain long-range hints. Finally, self-supervised pretraining and calibrated classification deliver auditable alerts. On composite streams, the method reaches AUPRC 0.612, AUROC 0.915, F1 0.672, ECE 0.038, and average lead time 31.6 minutes, exceeding the best baseline by 7.9 AUPRC points, 3.2 AUROC points, 6.9 F1 points, and 8.5 minutes. These gains provide earlier, more reliable, and well-calibrated alerts for public response. Keywords: event identification; multimodal data; bidirectional transformer; spatiotemporal alignment. DOI: 10.1504/IJICT.2026.10078527
Abstract: In response to the problems of homogenisation in rural tourism development and inefficient resource allocation, this study has designed a rural tourism development path optimisation method that integrates geographic information system and deep learning. Firstly, multiple sources of spatial data are integrated, and convolutional neural networks are used to automatically predict the potential of rural tourism development. Subsequently, using the potential spatial distribution as input, a mathematical model for path optimisation is constructed, and an improved deep reinforcement learning method is employed, incorporating local operators to perform neighbourhood search and iteratively improve the initial solution. The experiments show that the net benefit value achieved by the proposed method on the standard test set is 44.05, and the solution time is only 3.14 seconds, which is significantly better than the comparison algorithms, providing an effective solution for the optimisation of rural tourism development paths. Keywords: rural tourism; optimisation of development path; deep learning; deep reinforcement learning; neighbourhood search. DOI: 10.1504/IJICT.2026.10078528
Abstract: The free movement of sound sources and listeners in immersive virtual reality poses a significant challenge for dynamic sound field reconstruction. This study proposes a real-time, high-fidelity reconstruction method based on a multi-channel loudspeaker system. A time-varying spherical harmonic coefficient field is first constructed to parametrically represent the dynamic sound field. An optimisation algorithm integrating perceptual weighting and sparse constraints is then designed to achieve high-quality reconstruction under limited physical loudspeaker channels. Experimental results demonstrate that the proposed method significantly outperforms conventional higher-order ambisonics decoding, vector base amplitude panning, and existing deep learning approaches. Key improvements include reduced normalised field error, lower perceptual spectral distortion, and higher azimuth estimation accuracy, all while satisfying real-time processing requirements. Specifically, the proposed method reduces the normalised field error by more than 40% compared to the next-best method, while maintaining a frame processing latency of less than 20 milliseconds. These advancements collectively enhance auditory immersion in dynamic virtual environments. Keywords: dynamic sound field reconstruction; immersive audio; spherical harmonics; sparse optimisation; compressed sensing; spatial hearing. DOI: 10.1504/IJICT.2026.10078529
Abstract: This study addresses the inaccuracy of traditional static network models in predicting rapidly evolving interests within cultural communities by proposing an interactive network evolution model based on dynamic interest graphs. We find that members' shifting interests make information propagation paths unpredictable - for instance, trending topics in music communities may cycle as frequently as every two weeks. To tackle this challenge, we developed a graph model incorporating time-decay factors that captures real-time changes in interest similarity. Experiments demonstrate that, compared to traditional static graph methods, our model achieves a 12.7% improvement in community structure prediction accuracy and reduces prediction error for information reach by 18.3%. This work offers new insights into understanding the dynamic evolution of cultural communities and enabling precise content dissemination. Keywords: dynamic interest graph; cultural communities; network evolution; information dissemination. DOI: 10.1504/IJICT.2026.10078530
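As a rough illustration of what a time-decay factor on interest similarity can look like, the sketch below weights each interaction by an exponential decay and compares two members with cosine similarity. It is not the authors' model: the function names, the event format and the 14-day half-life (chosen only to echo the two-week cycle mentioned in the abstract) are all assumptions.

```python
import math


def decayed_interest_vector(events, now, half_life_days=14.0):
    """Build a topic -> weight map from (topic, day) events.

    Each event contributes exp(-lambda * age), so an interaction that is
    one half-life old counts half as much as one made today.
    The event format and the half-life are illustrative assumptions.
    """
    decay_rate = math.log(2) / half_life_days
    weights = {}
    for topic, day in events:
        age = max(0.0, now - day)
        weights[topic] = weights.get(topic, 0.0) + math.exp(-decay_rate * age)
    return weights


def cosine_similarity(a, b):
    """Cosine similarity between two sparse topic -> weight maps."""
    dot = sum(w * b.get(topic, 0.0) for topic, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical members: Alice liked jazz four weeks ago and rock today,
# Bob liked rock yesterday. Old jazz activity barely affects the score.
alice = decayed_interest_vector([("jazz", 0), ("rock", 28)], now=28)
bob = decayed_interest_vector([("rock", 27)], now=28)
similarity = cosine_similarity(alice, bob)
```

In a dynamic interest graph of this kind, the similarity would be recomputed as `now` advances, so edge weights drift even when no new interactions arrive.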
Abstract: To address issues such as uneven equipment utilisation and delayed user demand response in traditional smart laboratory resource allocation, this paper proposes a multi-agent deep reinforcement learning framework based on attention mechanisms. The research motivation stems from the collaborative scheduling challenges posed by heterogeneous equipment and dynamic task requests. This method employs a centralised training and distributed execution architecture, enabling agents to learn cooperative strategies in partially observable environments. It further incorporates a demand forecasting module to enhance allocation foresight. Experiments on public datasets and simulation environments demonstrate that the proposed method significantly outperforms traditional genetic algorithms and single-agent reinforcement learning approaches in both resource allocation quality (normalised discounted cumulative gain @5 reached 0.87) and overall utilisation (area under the curve improvement of 15.2%), validating its effectiveness and adaptability in complex laboratory scenarios. Keywords: multi-agent reinforcement learning; MARL; smart laboratory; dynamic resource allocation; attention mechanism. DOI: 10.1504/IJICT.2026.10078531
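For readers unfamiliar with the ranking metric quoted above, the following is a minimal sketch of normalised discounted cumulative gain at a cutoff k. It uses the linear-gain form of DCG; some evaluations use an exponential gain (2^rel - 1) instead, and the abstract does not say which variant the authors used. The function names and sample relevance scores are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (linear gain)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k=5):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(relevances, k) / ideal


# Hypothetical relevance of the equipment allocated to five requests,
# listed in the order the scheduler ranked them.
score = ndcg_at_k([3, 1, 2, 0, 0], k=5)
```

A score of 1.0 means the ranking already places the most relevant allocations first; lower values indicate that relevant items were pushed down the list.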
Abstract: This study addresses the inefficiency and passivity of surveillance in detecting crowd anomalies across wide, dynamic environments using unmanned aerial vehicles. To address this, this paper proposes an active perception framework for drone swarms that is driven by real-time visual-semantic feedback. The framework couples a spatiotemporal graph attention network, which models crowd interactions and infers anomaly probabilities, with a cooperative multi-agent reinforcement learning decision making module. This integration enables the swarm to dynamically and collaboratively optimise viewpoints based on live semantic cues. Evaluated on the VisDrone dataset, our approach achieves an anomaly capture rate of 89.7%, an average response delay of 1.9 seconds, an operational efficiency of 1.86 events per kilometre flown, and a low observation redundancy of 22.1%. These results demonstrate that embedding visual semantics into a closed perception-control loop significantly enhances the performance of proactive monitoring systems compared to existing baseline methods. Keywords: drone swarm; active perception; crowd anomaly detection; visual feedback; cooperative reinforcement learning. DOI: 10.1504/IJICT.2026.10078532
Abstract: The generation of adaptive interaction logic for virtual characters remains challenging, as traditional rule-driven methods often produce rigid and contextually insensitive behaviours. To overcome this, we present the multimodal meta-generation network, a multimodal behaviour data-driven framework that synthesises natural and socially appropriate interaction logic from streams including speech, posture, and facial expression. The framework employs cross-modal temporal alignment and hierarchical reinforcement learning to fuse asynchronous signals and enable joint strategy planning with action execution. A causal reasoning module is integrated to enhance social rationality. Experiments on public multimodal interaction datasets demonstrate that our method significantly outperforms baseline models, achieving an F1-score of 0.795 in accuracy and a human subjective score of 4.3 out of 5.0 in naturalness. This research provides a practical solution for deploying adaptive virtual characters in fields such as the metaverse, intelligent education, and remote collaboration. Keywords: multimodal learning; virtual characters; interaction logic generation; reinforcement learning; behaviour analysis. DOI: 10.1504/IJICT.2026.10078533
Abstract: Automated emotion analysis in visual art remains a significant challenge, primarily due to the paucity of annotated data and the profound stylistic and semantic gap between generic image understanding and domain-specific artistic interpretation. This study introduces a novel meta-learning framework enhanced with structured semantic knowledge for few-shot emotion recognition in oil paintings. The proposed model integrates a dual-path architecture: a meta-learning pathway for rapid visual adaptation and a semantic pathway that incorporates contextual art historical knowledge. These pathways are fused through a hierarchical cross-modal attention module, which dynamically aligns visual features with relevant semantic concepts during the learning process. Extensive evaluations on the ArtEmis dataset demonstrate the framework's superior performance, achieving state-of-the-art macro-accuracy of 68.7% (1-shot) and 81.3% (5-shot). The results confirm the model's efficacy in achieving robust, generalisable, and interpretable emotion analysis with limited data, advancing the field of computational art understanding. Keywords: oil painting emotion recognition; few-shot learning; meta-learning; semantic enhancement; interpretable artificial intelligence. DOI: 10.1504/IJICT.2026.10078534
Abstract: Coal mine safety is crucial for both life and production. However, traditional monitoring relies on a single sensor, resulting in a high rate of missed alarms in complex underground environments. To address challenges such as lighting changes and dust interference, this study proposes a deep learning early warning system that integrates video, infrared, and vibration data. Through cross-modal feature fusion and multi-task learning, it achieves collaborative perception of abnormal human behaviours and equipment failures. Experimental results show that the system achieves an area under the curve of 0.982 for abnormal behaviour detection on public datasets, which is approximately 7% higher than that of a single visual model; the accuracy of fire warning reaches 96.7%, and the false alarm rate is reduced by 5.3%. This method provides a highly reliable and scalable technical path for round-the-clock intelligent safety monitoring in coal mines. Keywords: coal mine security; multi-modal fusion; anomaly detection; intelligent early warning. DOI: 10.1504/IJICT.2026.10078535
Abstract: This research tackles the dynamic optimisation of construction carbon emissions by proposing a novel integration of building information modelling, utility-driven simulation, and multi-objective evolutionary computation. A core methodological contribution is a formalised utility function that quantifies and embeds project-specific decision-maker preferences for time, cost, and emissions directly into the optimisation search process. This function guides a bespoke evolutionary algorithm to automatically generate efficient, low-carbon construction plans, which are accurately evaluated by a high-fidelity discrete-event simulation engine using enriched building information modelling data. In an experimental study based on a publicly available office building dataset, our framework demonstrates superior performance, outperforming state-of-the-art benchmarks with a 5.6%-15.3% gain in hypervolume and achieving an 18.7% reduction in on-site emissions. This work provides a rigorous and actionable decision-support system for advancing sustainable construction practices. Keywords: building information modelling; BIM; construction carbon emission; multi-objective optimisation; utility function; discrete-event simulation. DOI: 10.1504/IJICT.2026.10078536
Abstract: Reliable evaluation of machine translation quality is essential to ensuring a dependable automatic translation system. However, adversarial attacks can reduce evaluation performance by subtly perturbing sentences and endanger the security of key applications. This paper proposes a comprehensive adversarial robustness enhancement framework designed specifically for translation quality evaluation. The framework integrates a multi-grained adversarial sample generator, a dynamic adversarial training mechanism based on the relaxation of master and apprentice labels, and an online defence module. Experiments on the machine translation and multilingual quality evaluation seminar and post-editing task data set show that the method increases the robustness of the model by 34.2% and reduces the average prediction error under attack from 18.7% to 12.3%. The framework shows stable performance in multiple fields, providing an effective solution for building safe and reliable translation quality evaluation systems for real-world scenarios. Keywords: translation quality assessment; adversarial machine learning; robustness assessment; natural language processing; NLP; model security. DOI: 10.1504/IJICT.2026.10078537
Abstract: In recent years, social media has become a key medium for public emotion and thought dynamics, making its sentiment analysis crucial for event prediction. However, while mainstream deep learning models achieve accurate predictions, their black box decision process hampers reliable warning. This paper therefore integrates the transformer model with interpretable Shapley additive explanations values to construct a social sentiment warning framework with both high accuracy and transparency. Experiments on public datasets show that the method's comprehensive warning performance significantly outperforms traditional models: area under the curve reaches 0.872, which is approximately 7.5% higher than the classic long short-term memory model, overall warning accuracy rises to 85.6%, and the false alarm rate drops by nearly 12%. This provides an effective solution for reliable and interpretable automated social sentiment perception and early risk warning. Keywords: emotional warning; transformer; explainable artificial intelligence; SHAP value. DOI: 10.1504/IJICT.2026.10078538
Abstract: The power line carrier communication system of the smart grid faces core problems such as severe channel attenuation, complex and changeable interference sources, delayed fault location and slow self-healing response, and existing methods have significant shortcomings in reliability, real-time performance and scene adaptability. To address these problems, this paper proposes an intelligent optimisation scheme based on the integration of data mining and artificial intelligence (IOSDM-AI). Through a multi-model coupling mechanism, the scheme is driven by real-time communication data flow to achieve accurate fault diagnosis, rapid location and adaptive self-healing, while ensuring the stable operation of the system. The results show that the accuracy of the IOSDM-AI algorithm is 94.7%, the diagnosis delay is reduced to 108.3 ms, and the missed diagnosis rate and misdiagnosis rate are as low as 0.7% and 2.1%, respectively. The self-healing success rate is 96.8%, the average self-healing time is reduced to 2.3 s, the communication link stability index is 9.28, and the self-healing strategy execution accuracy (FSE) is 428. Keywords: smart grid; power line carrier communication; artificial intelligence; fault diagnosis; self-healing mechanism; data mining. DOI: 10.1504/IJICT.2026.10078539
Abstract: This study proposes a multi-objective optimisation model to address the strategic placement of public art, aiming to balance spatial efficiency, social equity, and economic cost. The data-driven framework incorporates dynamic mobility patterns, multidimensional socioeconomic indices, and urban walkability networks. It concurrently optimises three objectives: maximising weighted accessibility coverage, minimising the Gini coefficient of accessibility, and reducing total expenditure. A novel cognitive heuristic adaptive search algorithm is introduced, which embeds domain knowledge of urban spatial structure to solve this high-dimensional problem. Empirical validation using Manhattan data confirms the algorithm's superior performance, demonstrating a 10.8% to 24.1% improvement in the hypervolume metric over standard multi-objective evolutionary algorithms. The resulting Pareto-optimal solutions quantify clear trade-offs, such as a 142% gain in coverage efficiency or a 48% reduction in accessibility inequality, thereby establishing a scientific basis for equitable cultural resource planning. Keywords: public art placement; spatial optimisation; social equity; multi-objective optimisation; data-driven decision support. DOI: 10.1504/IJICT.2026.10078540
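The equity objective named above, the Gini coefficient of accessibility, can be computed from a list of per-neighbourhood accessibility scores. The sketch below uses the standard sorted-rank formula and assumes non-negative scores; the function name and the sample values are hypothetical, and the abstract does not specify how the authors define or weight accessibility.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal).

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x_i are the values sorted ascending and i starts at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical accessibility scores for four neighbourhoods under two
# candidate placements; the second concentrates access in one area.
even_plan = gini([5, 5, 5, 5])
skewed_plan = gini([0, 0, 0, 20])
```

A placement optimiser of the kind described would treat this value as one of several objectives to minimise, alongside coverage and cost, rather than as a single score.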
Abstract: To address the issue of high cost incurred by multimodal large models in visual-language tasks, this paper proposes a lightweight model, CAD-LM, based on cross-modal attention distillation. It designs an adaptive modal reparameterisation module that utilises a multi-branch structure to enhance representational power during training and reparameterises it into an efficient single-branch structure during inference. This paper integrates an end-to-end deployment optimisation process that encompasses hardware-aware pruning, mixed-precision quantisation, and efficient parameter fine-tuning. Experiments show that CAD-LM successfully compresses the model parameter count to 118.4 M and reduces computational complexity to 12.1 GFLOPs, achieving approximately 20% and 30% reduction, respectively. Its performance on benchmark tasks such as Flickr30k and VQA v2.0 significantly surpasses baseline models like the original-scale CLIP and ALBEF. Edge deployment verification reveals that the final model occupies only 89.4 MB of memory and boasts millisecond-level inference latency, achieving an excellent balance between model performance, computational efficiency, and engineering practicality. This provides an efficient solution for multi-modal applications in resource-constrained environments. Keywords: attention distillation; multi-modal; large model; lightweight; end-side deployment. DOI: 10.1504/IJICT.2026.10078541
Abstract: In online learning scenarios, the dynamic fluctuations of students' cognitive load directly impact learning outcomes. Existing scheduling models, lacking real-time perception of cognitive load states, often result in mismatches between task assignments and learner capabilities. To address this, this paper first processes student contextual features using feature selectors and self-attention mechanisms, then predicts response performance based on cognitive load state. Subsequently, an online learning task utility scheduling model is constructed based on cognitive load diagnosis results. A multidimensional utility function for online learning tasks is designed, establishing a scheduling objective function that maximises this utility. Finally, an improved particle swarm optimisation algorithm solves the objective function to derive the optimal online learning task utility scheduling strategy. Experimental results demonstrate that the proposed method achieves scheduling times of 3.3 ms and a success rate of 98.3%, outperforming baseline methods with significantly higher scheduling efficiency. Keywords: online learning; task utility scheduling; cognitive load; cognitive diagnosis; attention mechanism. DOI: 10.1504/IJICT.2026.10078542
Abstract: This paper introduces a vision-based intelligent inspection system to detect surface defects in CNC-machined parts by using a YOLOv7-based transfer learning model. The manual inspection systems used traditionally are laborious, subjective, and prone to mistakes, thus necessitating automated solutions. The suggested YTL-ISDD system takes advantage of high-resolution images taken under controlled lighting to detect defects, including scratches, cracks, pits, and burrs. The model makes use of pre-trained YOLOv7 weights that improve feature extraction and minimise the training time. Data augmentation, such as rotation, scaling, flipping, and contrast adjustment, is used to enhance robustness and generalisation under different surface conditions. The system can be easily integrated into CNC production environments, and it can detect defects in real time, accurately, and consistently with minimum human involvement, which enhances product quality and production efficiency. Keywords: CNC machining; surface defect detection; YOLOv7; transfer learning; intelligent inspection. DOI: 10.1504/IJICT.2026.10078543
Abstract: Faced with the challenges of complexity and uncertainty of multi-source heterogeneous data in energy industry decision support systems and the dynamic environment adaptation needs brought about by smart grids and renewable energy access, this study aims to build an adaptive, strong and robust multi-modal knowledge fusion and intelligent generation model to improve the accuracy and reliability of decision-making. By innovatively integrating hypergraph attention networks to achieve multi-modal feature fusion, cloud model quantification of data uncertainty, and reinforcement learning framework-driven intelligent policy generation, the model achieves an accuracy rate of 93.5%, an F1 score of 92.8% and RMSE 0.08 in performance tests. In addition, robustness tests show that the model has a change rate of only 3.7% under 30% noise, the decision delay is optimised to 65 milliseconds, and the accuracy rate increases to 89.2% after fine-tuning in cross-scenario generalisation. Overall, the model effectively solves the semantic gap and data quality problems, and provides efficient support for energy scheduling, fault prediction and other scenarios. Keywords: energy industry; decision-making; multi-modal; knowledge fusion; intelligent generation. DOI: 10.1504/IJICT.2026.10078544
Abstract: With the global economy's deep integration, traditional manual migration methods cannot meet efficiency and accuracy needs. This study explores the application of meta-learning techniques to zero-sample accounting standard transfer and constructs an intelligent framework to adapt to new accounting standards in various fields. By analysing meta-learning mechanisms, an innovative transfer framework is designed to use a small amount of source domain data for rapid adaptation and precise transfer of accounting standards in new fields. Experimental results show that the meta-learning framework significantly improves transfer performance under zero-sample conditions. Compared with traditional methods, the average accuracy is up by 12.3%. At different data scales, as the data volume increases from 100 to 1,000, the accuracy improves by 8.5%, 10.2%, and 13.1% respectively. In new accounting standard testing, the average accuracy reaches 85.6%, a 9.4% improvement over traditional methods. Keywords: meta-learning; zero-sample learning; accounting standards migration; intelligent framework. DOI: 10.1504/IJICT.2026.10078545
Abstract: To enhance the fusion performance of multimodal advertising content in visual comprehension and behaviour prediction, this paper proposes a DHGNN-based multimodal information fusion model. The model comprises modal feature embedding, dynamic node/edge attribute modelling, improved heterogeneous graph convolutions, and a multi-task output layer, optimising CTR, CVR, and semantic classification. Compared with the conventional Hetero GCN and MM Attn models, the proposed DHGNN improved the AUC by 3.8-5.9 p.p. on CTR, and the F1-score was 0.726, which was 8.8 percentage points above Homogeneous GCN. Moreover, the inference latency is reduced to 19.4 ms, while maintaining competitive predictive accuracy. Across Ali-AD and Tencent-Ad360, DHGNN achieves AUC gains of 3.8-5.9 percentage points on CTR and reaches an F1-score of 0.726 on semantic classification, providing quantitative evidence that the proposed dynamic heterogeneous graph modelling improves multimodal fusion under heterogeneous and time-varying advertising data for multimodal recommendation systems. Keywords: multimodal fusion; dynamic heterogeneous graph neural network; DHGNN; ad recommendation; semantic alignment; cross-modal modelling; temporal evolution modelling; heterogeneous graph attention; click-through rate prediction; semantic consistency learning; cold-start advertising scenario. DOI: 10.1504/IJICT.2026.10078569
Abstract: In the information age, managing student performance and mental health is critical for educational development. Addressing current analytical limitations, this study proposes an intelligent management framework grounded in data analytics and knowledge integration. The system utilises a four-layer architecture integrating distributed databases and deep learning. Specifically, performance prediction uses a GPA-based model, while an improved two-stream encoder network (TSEN) enables mental health monitoring. Experimental results demonstrate the performance model achieves an accuracy of 0.947, a 0.949 F1 score, and an area under the curve of 0.963 for top predictions. For mental health analysis, using eight influencing factors yields 87.5% sensitivity and 90.1% recognition accuracy, with a Matthews correlation coefficient of 0.962 over a 12-week sequence. These results confirm that the approach effectively integrates academic and psychological data, significantly enhancing analytical accuracy and providing robust decision support for student development and educational management. Keywords: knowledge-driven student management; GPA-based performance prediction; student mental health analytics; two-stream encoder networks; attention-based data mining; educational decision support systems. DOI: 10.1504/IJICT.2026.10078570
Abstract: With the rapid advancement of big data and artificial intelligence, information presentation has shifted to a multimodal environment encompassing text, images, audio, video, and biological signals. Traditional visual information visualisation is challenged by limited expression dimensions and insufficient emotional and narrative capacity. The purpose of this study is to explore the theory and technical path of deep integration of graphic creativity and visual information visualisation in multimodal context. Firstly, the theoretical basis of multimodal cognition and perception is constructed; then, a technical framework integrating multimodal data perception, cross modal feature alignment, creative semantic generation and visual mapping is proposed; by introducing generative AI, attention mechanism and style transfer technologies, a data story aesthetic driven fusion design method is designed; finally, two case studies verify its effectiveness in enhancing information interpretation efficiency and user experience, offering theoretical support and technical solutions for complex information design. Keywords: multimode; graphic creativity; visual information visualisation; fusion design. DOI: 10.1504/IJICT.2026.10078571
Abstract: The convergence of generative artificial intelligence (AI) and sixth-generation (6G) communication technologies is changing how intelligent English language learning systems work. This paper presents learning with intelligible generative AI and 6G-based secure architecture (LINGUA-6G), a cohesive framework that amalgamates generative AI with ultra-low-latency, high-reliability 6G networks. Tests show that fluency improved by 28.7%, vocabulary retention by 32.4%, and latency by 41.2%. Network testing showed a response time of less than 90 ms, an availability rate of more than 96%, and stable performance. Blockchain identity verification and AI-based intrusion detection cut down on unauthorised access by 89.5% and found 96.8% of threats, proving that 6G-enabled AI learning ecosystems are safe, scalable, and efficient. Keywords: generative artificial intelligence; 6G communication networks; intelligent language learning; blockchain-based identity verification; AI-driven intrusion detection. DOI: 10.1504/IJICT.2026.10078572
Abstract: At present, visual communication design AI assistant tools have problems such as user intention understanding deviation, unexplainable generated results, fragmented interaction and low cross-module collaboration efficiency. This paper proposes an interpretable AI assistant creative system integrating diffusion model, user intention understanding and advanced communication technology. Its core is DesignXAI with the logic of intent transfer-controllable generation-process interpretation-collaborative optimisation. Experiments show the system outperforms mainstream models, with 91% intention understanding accuracy, 89.3 user satisfaction and 0.87 intention-result semantic consistency, providing an efficient intelligent auxiliary method. Keywords: visual communication design; explainable AI; diffusion model; user intent understanding; semantic communication; edge computing. DOI: 10.1504/IJICT.2026.10078573
Abstract: This study proposes a dual-driven STEAM art course framework integrating generative AI and prompt engineering to address key challenges in creativity support, personalisation, and AI tool integration. A four-layer topology aligns course goals, content, interaction, and evaluation, combining generative AI's creative abilities with prompt engineering's precision. Hierarchical prompt strategies enable stepwise creative guidance, while a course-AI feedback loop adapts to learner needs. A multidimensional evaluation system assesses creative expression, skill development, and thinking growth. Results show a 42.3% increase in creative work scores, 91.7% skill proficiency, 4.8 satisfaction (out of 5), and 35.6% higher teaching efficiency. Personalised teaching coverage rose from 38% to 89%. The framework performs effectively across diverse age groups and skill levels, offering a scalable path for intelligent art education in K-12 and training contexts. Keywords: STEAM art course; generative AI; prompt engineering; course design; creative cultivation. DOI: 10.1504/IJICT.2026.10078574
Abstract: The aim of this research is to examine the consideration and emotional inclination of Chinese consumers towards low-carbon products and to offer empirical evidence that can help raise public awareness of low-carbon consumption. To this end, 62,271 online reviews of common low-carbon products were gathered from JD.com, a Chinese e-commerce platform. The reviews covered six types of everyday consumer goods: paper products and cleaning supplies, household goods, electronic appliances, clothing and accessories, home improvement and decoration, and beauty and personal care. The sentiment analysis component of the SnowNLP natural language processing library was used to determine the intensity of consumers' emotions towards different low-carbon products. Moreover, most of the online reviews were analysed with the LDA topic model to identify latent topics and to establish the impact of these topics on consumers' positive and negative feelings. Keywords: low-carbon products; consumer sentiment; online reviews; consumer satisfaction; topic modelling; ICT-based analytics; e-commerce. DOI: 10.1504/IJICT.2026.10078575
Abstract: E-commerce platforms generate vast multi-modal data (product images and user reviews), whose integrated analysis is crucial for enhancing user experience and decision making. However, existing methods often treat visual perception and text sentiment analysis separately, limiting cross-modal semantic collaboration. Therefore, a multi-modal hierarchical collaborative fusion model (MHCFM) that unifies product visual attributes, aesthetic quality, scene context, and textual emotion is proposed via cross-modal alignment and hierarchical adaptive fusion. The model integrates a hierarchical visual transformer, a dual-branch aesthetic network, a graph convolutional scene module, and a hierarchical adaptive fusion network. Experiments on public and large-scale e-commerce datasets showed the sentiment analysis accuracy exceeded 93%, the inference time was 2223 ms, outperforming mainstream models. In cross-cultural and multi-category tests, the average accuracy was 91.5%, demonstrating robustness. The proposed model enhances visual-textual collaboration, offering an efficient solution for intelligent product analysis and user experience optimisation in e-commerce. Keywords: multi-modal sentiment analysis; visual transformer; cross-modal alignment; hierarchical adaptive fusion; e-commerce; multi-modal hierarchical collaborative fusion model; MHCFM. DOI: 10.1504/IJICT.2026.10078762
Abstract: This paper presents a data fusion-driven framework that integrates choreography with multimedia technology to create coherent and immersive multi-sensory performance experiences. The study aims to enhance audience engagement by seamlessly embedding dynamic visual and auditory elements into choreographic expressions, thereby redefining traditional performance boundaries. A collaborative model between choreographers and multimedia technologists is proposed to ensure artistic and technical coherence. Methodologically, the research develops and implements techniques for embedding multimedia features into choreographic sequences, along with detection algorithms for synchronising digital and live performance components. The results demonstrate that the proposed integration framework effectively facilitates real-time coordination between movement and multimedia content, enriching narrative expression and sensory impact. This work contributes to the evolving field of digital performance by offering a structured approach to multidisciplinary collaboration and technological integration in the arts. Keywords: multimedia; choreography; integration. DOI: 10.1504/IJICT.2026.10078624
Abstract: The imbalance of the regional economy and the tilt of national policy lead to uneven levels of development among universities. To help less developed universities improve their academic level and narrow the development gap between universities, we propose a teaching information-based resource-sharing (TIRS) model based on deep learning (DL) for colleges. Firstly, we analyse the types and characteristics of teaching resources and establish a sharing platform for university teaching resources. Then, we propose a DL-based label quantification method for teaching resources to extract the features of each resource in the sharing platform and assign labels. Finally, we propose a teaching resource retrieval method based on the bag-of-words model to improve the efficiency of the TIRS model. The experiment demonstrates that the DL-based TIRS model for colleges can provide good teaching services for teachers and students, and the objective accuracy and subjective accuracy of retrieval reach 87.2% and 86.7% respectively, which provides technical support for resource sharing among colleges. Keywords: teaching resources; sharing model; deep learning; DL. DOI: 10.1504/IJICT.2026.10078625
Abstract: Echo chambers in youth online communities intensify opinion polarisation, yet existing methods rely on static snapshots and single-modal features, neglecting integration with social-psychological theories. This paper proposes the temporal clustering for echo chamber measurement framework, unifying dynamic graph neural networks with joint temporal clustering. Using Reddit data from 2018 to 2022, temporal user interaction networks are constructed, and theory-driven indicators (interaction homogeneity, topic convergence, and attitude polarisation) are extracted based on social identity and cognitive dissonance theories. A temporal graph attention network learns evolving node representations, followed by joint optimisation of temporal K-means and spectral clustering to identify echo chambers and quantify intensity trajectories. Experiments show the framework outperforms baselines, achieving an adjusted Rand index of 0.89 and an F1-score of 0.88. It captures echo chamber dynamics during major events, offering an interpretable tool for understanding online polarisation in youth communities. Keywords: echo chamber effect; temporal clustering; dynamic graph neural networks; youth online communities; social media analysis. DOI: 10.1504/IJICT.2026.10078626
Abstract: This study constructs a generative artificial intelligence-driven multimodal learning space for real-time perception and regulation of the anxiety states of English learners. By integrating facial expression analysis, speech feature extraction, and behaviour data modelling, the system achieved an accuracy rate of 87% in identifying learning anxiety, which was over 9% higher than that of single-modal methods. In a six-week intervention experiment, the state anxiety scores of the experimental group decreased by 31.6% compared to the control group, while the oral fluency of the experimental group improved by 24.3%. The research shows that the collaborative intervention of multimodal emotion computing and generative artificial intelligence can effectively break through the bottlenecks of delayed anxiety identification and single intervention methods in traditional teaching, providing a feasible path for the emotional adaptation of intelligent language learning environments. Keywords: generative artificial intelligence; multimodal learning; English learning anxiety; affective computing. DOI: 10.1504/IJICT.2026.10078627
Abstract: Language ambiguity is the core challenge in machine translation. To overcome the limitation of existing neural machine translation (NMT) models that are prone to semantic deviations in complex contexts, this paper proposes a multi-engine architecture based on the attention mechanism. It achieves ambiguity resolution by dynamically integrating the context modelling of neural networks, the grammar constraints of the rule engine, and the historical knowledge of the statistical engine. Experiments on public datasets such as workshop on machine translation and discourse in machine translation show that this system improves the bilingual evaluation understudy score by 2.1 points in the English-German translation task, increases the accuracy of ambiguous phrase translation by 15.7%, and its semantic coherence (normalised discounted cumulative gain) is significantly better than the mainstream baseline models. This research provides an effective solution for building a robust and accurate context-aware translation system. Keywords: multi-engine architecture; attention mechanism; resolution of ambiguity; neural machine translation; NMT; semantic coherence. DOI: 10.1504/IJICT.2026.10078628
Abstract: In this study, a directed weighted graph containing 500,000 user nodes and 100,000 news records was constructed; node attributes were labelled with user activity and news sentiment tendency, edge weights were determined according to propagation time attenuation (5%/hour) and interaction frequency, communities were divided by the Louvain algorithm, core nodes were identified by fusing node betweenness centrality and PageRank, and path traversal was optimised with a restart probability of 0.2. Compared with traditional methods such as the shortest path algorithm and static community random walk, CGTA achieves 92.3% (76.6% for traditional methods) and 89.1% for core node recall (71.4% for traditional methods), which are 15.7 and 17.7 percentage points higher respectively. The structural equation model quantified that user activity (38.2%), emotional tendency (29.5%), and propagation time (22.3%) dominated the path formation, that the propagation speed decreased by 12.3% (r = -0.78) for every one-hop increase in path length, and that the core node expansion effect reached 67.5%. The study integrates multi-source data from Weibo and Toutiao, and cross-validates it with 5% to confirm the effectiveness of the algorithm in breaking/regular news and communities of different sizes. Keywords: CGTA algorithm; news communication path; core node; communication influence factors. DOI: 10.1504/IJICT.2026.10078713
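The edge weighting and restart-based traversal mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' CGTA implementation: the abstract does not say how time attenuation and interaction frequency are combined, so the multiplicative form in `edge_weight`, the function names, and the toy graph are all assumptions made for illustration. Only the 5%/hour decay and the 0.2 restart probability come from the abstract.

```python
def edge_weight(interactions, hours_elapsed, decay_per_hour=0.05):
    """Assumed form: interaction count attenuated by 5% per elapsed hour."""
    return interactions * (1.0 - decay_per_hour) ** hours_elapsed


def random_walk_with_restart(graph, seed, restart=0.2, iters=100):
    """Power iteration for a random walk that jumps back to `seed`
    with probability `restart` at every step.

    `graph` maps each node to a dict {neighbour: weight}; every node
    that appears as a neighbour must also appear as a key.
    """
    nodes = list(graph)
    scores = {n: 0.0 for n in nodes}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            total = sum(graph[u].values())
            if total == 0:
                # Dangling node: send its walking mass back to the seed.
                nxt[seed] += (1.0 - restart) * scores[u]
                continue
            for v, w in graph[u].items():
                nxt[v] += (1.0 - restart) * scores[u] * w / total
        # Scores sum to 1, so the total restart mass is `restart`.
        nxt[seed] += restart
        scores = nxt
    return scores


# Hypothetical three-node graph: (interaction count, hours since propagation).
graph = {
    "a": {"b": edge_weight(4, 1), "c": edge_weight(2, 3)},
    "b": {"c": edge_weight(1, 0)},
    "c": {"a": edge_weight(3, 2)},
}
scores = random_walk_with_restart(graph, "a")
```

Nodes with higher scores are visited more often by walks starting from the seed, which is one common way to rank candidate propagation paths around a core node.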
Abstract: This study addresses reasoning deficiencies in English reading comprehension by proposing a dual-channel framework that fuses graph neural networks (GNN) with knowledge graphs (KG). By integrating semantic relationships with GNN node feature transfer, the model significantly outperforms BERT-base and traditional GNNs. Experimental results on 10,000 texts show an 89.7% accuracy in multiple-choice questions and 83.5% in logical relationship recognition. Ablation studies confirm that the KG constraints reduce overfitting by 35% and improve efficiency by 20%. This research is the first to couple entity embedding with dynamic aggregation, providing interpretable reasoning paths. The model excels in complex long-text scenarios, offering a quantitative evaluation tool for English teaching through multi-dimensional metrics. Keywords: knowledge graph; graph neural network; GNN; English reading comprehension; reasoning enhancement; semantic fusion. DOI: 10.1504/IJICT.2026.10078784
Abstract: This study explores the stylistic switching and spatial-temporal evolution of revolutionary cultural relics using semantic segmentation and GIS spatial analysis. A dataset of relics from the 1920s–1990s was constructed, including architecture (60%), sculpture (25%), and historical sites (15%). The proposed method achieved 85% accuracy in temporal shift tasks, with spatial evolution simulation errors below 5% and overall performance at 92.1%. Key innovations include: 1) applying semantic segmentation to recognise period switching; 2) using GIS to simulate spatial-temporal dynamics; 3) establishing a multi-source evaluation framework for relic features. These findings support scientific conservation and interdisciplinary research on revolutionary heritage. Keywords: revolutionary cultural relics; semantic segmentation; GIS spatial analysis; spatial-temporal evolution. DOI: 10.1504/IJICT.2026.10078786
Abstract: In the knowledge economy, core talent loss threatens technological and trade secret leakage, hindering organisational growth. Current prediction methods often neglect temporal dynamics. This study addresses this by integrating temporal collaborative filtering (TCF) and the Prophet model into a novel early warning algorithm. TCF extracts dynamic temporal patterns in employee behaviour, while Prophet captures trend and seasonal features. A dual-view deep neural network fuses both approaches, supported by a dedicated data pipeline. Experimental results demonstrate that the fusion model achieves 85.2% test accuracy, surpassing TCF-only (76.1%) and Prophet-only (73.4%) models by 9.1 and 11.8 percentage points, respectively, with 82.7% recall and a 0.863 F1 score. Under a 6-month time window and 23,890 training samples, the model attains 85.7% accuracy, 81.5% coverage of key-period loss events, and 0.62-second response time, confirming its effectiveness for talent loss warning. Keywords: talent loss early warning; temporal collaborative filtering; prophet model; fusion algorithm; employee churn prediction; temporal forecasting; early warning system. DOI: 10.1504/IJICT.2026.10078818
Abstract: Detecting corporate financial risks in real time is essential for protecting financial stability and ensuring that businesses can grow sustainably. This work presents a distributed reinforcement learning-based detection model to overcome the shortcomings of conventional forecasting techniques. Initially, it utilises a distributed computing architecture to effectively handle multi-source financial data; subsequently, it dynamically acquires knowledge of corporate financial conditions and forecasts risks; ultimately, it improves model adaptability and predictive precision through the integration of multi-source data. Comparative and ablation experiments were conducted to validate the model's effectiveness and demonstrated superior performance. Its prediction accuracy and recall rate reached 87.6% and 82.9% respectively, a clear improvement over traditional methods. The model provides good technical support for managing corporate financial risk. Keywords: distributed reinforcement learning; corporate financial risk; real-time detection; multi-source data fusion. DOI: 10.1504/IJICT.2026.10078819
Abstract: Identifying financial risks in listed companies is crucial for capital market stability, yet traditional statistical models and single machine learning approaches struggle to address the challenges of high-dimensional nonlinearity and class imbalance in financial data. This paper proposes an ensemble learning framework integrating extreme gradient boosting, light gradient boosting machine, and random forest, while introducing an adaptive optimal threshold algorithm based on Bayesian optimisation to dynamically refine classification boundaries. Empirical analysis using A-share and global environmental, social and governance data (covering 4,837 listed companies) from 2015 to 2023 demonstrates that the ensemble model achieves an area under the curve of 0.964, surpassing the best single model by 3.5%. The adaptive optimal threshold algorithm enhances the ranking metric normalised discounted cumulative gain @10 by 12.7%, significantly improving the identification of scarce risk samples. This study provides regulators and investors with a high-precision risk screening tool. Keywords: financial risk identification; ensemble learning; adaptive thresholding; listed companies. DOI: 10.1504/IJICT.2026.10078852
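To make the idea of refining a classification boundary under class imbalance concrete, the following minimal sketch selects a decision threshold that maximises F1 on scored samples. It deliberately swaps in a plain grid search where the abstract above uses Bayesian optimisation, and the F1 objective, function names, and toy data are assumptions for illustration only; none of this reproduces the paper's algorithm.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 of the positive (risk) class when predicting 1 for score >= threshold."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, scores, steps=101):
    """Grid search over [0, 1]; a stand-in for a Bayesian optimiser."""
    candidates = [i / (steps - 1) for i in range(steps)]
    return max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))


# Hypothetical imbalanced sample: 2 risky companies out of 10, with
# model scores that never reach the default 0.5 cut-off.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.05, 0.10, 0.12, 0.20, 0.22, 0.25, 0.30, 0.35, 0.40, 0.45]

t = best_threshold(y_true, scores)
f1_default = f1_at_threshold(y_true, scores, 0.5)
f1_tuned = f1_at_threshold(y_true, scores, t)
```

With the default 0.5 threshold no risky sample is flagged in this toy data, whereas the tuned threshold recovers both, which is the kind of gain on scarce positive samples that threshold adaptation targets.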
Abstract: Employee turnover prediction is crucial for corporate human resource strategies, yet traditional static models struggle to capture complex employee relationship networks and behavioural temporal dynamics. To address this, this paper proposes a graph sample and aggregation long short-term memory fusion model that analyses employee collaboration relationships via graph neural networks while modelling historical behavioural sequences using long short-term memory networks, thereby enabling dynamic and precise turnover risk prediction. Experiments on a publicly available dataset demonstrate that the proposed model achieves an area under the receiver operating characteristic curve of 0.92, representing an improvement of approximately 0.07 over a single long short-term memory model. Its accuracy rate reaches 89.5%, surpassing traditional logistic regression by nearly 12%. This research not only validates the effectiveness of integrating graph structure and temporal information to enhance predictive performance but also provides reliable technical support for enterprises to implement early-stage risk intervention. Keywords: employee turnover risk; graph neural networks; GNNs; dynamic prediction. DOI: 10.1504/IJICT.2026.10078853
Abstract: This study addresses the challenge of achieving real-time and adaptive aerial coverage over crowds during sudden abnormal evacuations, a highly nonlinear and dynamic process. This paper proposes a physics-aware hybrid modelling framework that integrates a microscopic social force model for simulating crowd movements with an enhanced artificial potential field approach for quadrotor swarm guidance. Central to our method is a nonlinear function that maps real-time crowd density into a spatially varying potential field, along with a dynamic balancing strategy to avoid local optima. Extensive simulations based on real pedestrian trajectory datasets show that the proposed framework achieves an average coverage efficiency of 82.4% and a recovery latency of only 4.3 seconds following evacuation onset, outperforming state-of-the-art methods in coverage persistence, responsiveness, and energy efficiency. These results validate the framework's capability to support adaptive monitoring in emergency crowd management scenarios. Keywords: quadrotor swarm; dynamic coverage control; crowd evacuation simulation; physics-informed modelling; emergency monitoring system. DOI: 10.1504/IJICT.2026.10078854
Abstract: To address the issues of insufficient evaluation frequency and unclear feedback orientation in college English oral language teaching, an intelligent oral language assessment and feedback system integrating machine learning was constructed. Based on real classroom speech data, multi-dimensional feature representations of pronunciation accuracy, speaking speed, pause ratio, and fluency were established. On this basis, a multi-model weighted scoring mechanism was formed. Experimental results showed that the system scoring was highly consistent with the manual evaluation, with a comprehensive accuracy rate of 0.84. The scoring deviation was concentrated within the ±3-point range. After introducing a feedback adjustment mechanism based on learning records, the students' comprehensive scores increased by an average of 4.7 points within four weeks. Their speaking speed stabilised at 4.5-5.5 syllables per second, and the pause ratio decreased to approximately 0.21. The system is feasible in terms of process evaluation and teaching support, providing a data-driven auxiliary path for college English oral language teaching. Keywords: college oral English; intelligent evaluation; machine learning; personalised feedback. DOI: 10.1504/IJICT.2026.10078855
Abstract: This study leverages microdata from 2016 to 2022 and empirically analyses the impact of China's personal income tax reform on household consumption. According to the results, the 2019 tax reform, including measures such as raising the deduction threshold and readjusting tax rates, significantly increased residents' disposable income, thereby boosting consumption levels. However, certain issues remain: incomplete special additional deductions, a lack of consideration for household income and expenditure, and dubious justifications for deduction amounts. These issues contribute to perceived inequities in tax burdens and a decline in attitudes toward tax compliance. To address these challenges, this study proposes several recommendations, including implementing joint family tax filing, reasonably adjusting deduction standards, optimising the tax rate structure, and optimising tax administration systems. These recommendations aim to further stimulate household consumption and expand the national economy. The findings not only provide a new perspective on the economic effects of income tax reform but also offer empirical evidence to inform improvements in tax policy. Keywords: personal income tax reform; household consumption; special additional deductions; consumption stimulus; economic effect. DOI: 10.1504/IJICT.2026.10078856
Abstract: Adolescent online traces can reveal early shifts toward harmful trajectories, yet signals are scattered across text, timing, and peer exposure. To address fragile single-modality profiling, this paper proposes an evidence-linked framework that couples behaviour episode graphs with Transformer-based language representations. First, raw logs are segmented into sessions and converted into a heterogeneous interaction graph to encode rhythm and exposure. Then, event-linked texts are embedded to capture stance and intent cues. Finally, an adaptive fusion learner predicts multi-label psychological trait proxies and a risk score with traceable evidence. Experiments on a de-identified dataset of 8,420 users and 3.6 million events show the proposed method achieves AUC 0.879 and F1 0.821, improving over the strongest single-modality baseline by 0.037 AUC and 0.044 F1, with higher precision 0.833 and recall 0.815. The results indicate robust, interpretable profiling for research-oriented prevention. Keywords: large language models; network behaviour analysis; juvenile delinquency; psychological trait mining; multimodal fusion; behaviour graph learning; risk profiling. DOI: 10.1504/IJICT.2026.10078857
Abstract: Diagnosing motor resonance faults in real time on programmable logic controllers is challenging due to domain shifts across changing conditions and stringent hardware limits. This study introduces a deep causal adversarial migration transfer learning framework. It synergises multi-physics signal fusion with a physics-guided attention mechanism to disentangle invariant fault features from domain-sensitive variations, followed by adversarial domain alignment. The framework is subsequently lightweighted via structured pruning and quantisation for edge execution. Evaluations on the Case Western Reserve University dataset show the method achieves an average accuracy of 96.2% across varying load tasks, outperforming strong baselines by 3.1%. The final model attains a 31-millisecond inference time, which strictly complies with the sub-100 ms real-time requirement for industrial controllers, proving its effectiveness for dependable edge-based diagnosis. Keywords: motor resonance fault diagnosis; unsupervised transfer learning; domain adaptation; edge computing; programmable logic controller; PLC. DOI: 10.1504/IJICT.2026.10078858
Abstract: The amount of garbage continues to rise, making intelligent garbage classification increasingly important for future resource recovery. Current methods still rely largely on static images, which perform poorly in dynamic real-world settings. Moreover, practical applications such as surveillance cameras or mobile inspections face additional challenges including computational efficiency, scene diversity, and long-term robustness, which traditional approaches cannot adequately address. This paper presents a real-time garbage classification framework suitable for both image and video surveillance. We design an encoder-decoder structure that eliminates matrix multiplication, significantly reducing computational cost. Additionally, we introduce a dynamic tanh (DyT) layer to enhance normalisation, replace the traditional feedforward module with a Kolmogorov-Arnold network (KAN) for better interpretability of features, and employ dense layers without matrix multiplication to further boost efficiency. Experiments demonstrate that our method achieves an effective balance of accuracy, computational cost, and robustness, making it well-suited for complex, dynamic garbage detection scenarios. Keywords: waste sorting; image; MatMul-free; transformer; dynamic tanh; DyT; Kolmogorov-Arnold network; KAN. DOI: 10.1504/IJICT.2026.10078892
Abstract: This study addresses the limitations of existing UAV logistics research, which often neglects communication instability and multiple disturbances in real-time path planning. The objective is to optimise coordinated UAV paths by considering line crossing and multifactorial disruptions to improve delivery efficiency and system robustness. The methodology enhances a simulated annealing algorithm with a sub-path local search operator, integrated with a communication sensing strategy and energy consumption model. Results from six test cases show the improved algorithm reduces average distance by 3.13–5.41%, computation time by 13.6–17.6%, and boosts path quality by 11.5% in communication-hostile settings. The algorithm effectively manages disturbances like wind, obstacles, and demand changes, enhancing stability and adaptability. This research offers a novel co-optimisation method for UAV coordination, balancing communication and energy efficiency, with significant practical value for urban delivery systems. Keywords: terminal logistics; route planning; cruising range; simulated annealing algorithm. DOI: 10.1504/IJICT.2026.10078893
Abstract: To address the issue of existing spelling correction methods neglecting semantic relevance in academic English translation quality analysis, a semantic spelling correction algorithm is proposed. The pipeline first performs pre-analytic validation to reduce spelling and semantic noise, then conducts coherence, grammaticality, and terminology assessments using learned features. A curated corpus comprising 20,000 academic text fragments is utilised for training and evaluation, and benchmark baselines are included to ensure methodological comparability. Quantitative results demonstrate that the designed algorithm achieves a spelling error recognition rate of 93.45% and a processing speed of 240.78 words/second, significantly improving the accuracy and efficiency of spelling correction, while maintaining semantic integrity (cosine similarity 0.87), which is significant for improving the quality of academic English translation. The work reframes correction as methodological infrastructure within quality analysis, integrating a semantic-aware module that safeguards metric fidelity before analytic scoring. Keywords: academic English translation; spelling correction algorithm; semantic analysis; academic quality assessment; deep learning model. DOI: 10.1504/IJICT.2026.10078894
Abstract: In response to the challenges faced by public art design in the urbanisation process, such as low efficiency and insufficient diversity, as well as the issues of black-boxing and poor controllability inherent in traditional generative adversarial networks (GANs), this paper proposes an innovative algorithm that integrates GANs with parametric design. The algorithm aims to decouple structure and style through a dual-branch generator, achieve dynamic regulation through a parametric attention fusion module, and enhance authenticity with a multi-scale discriminator. The dual-branch generator architecture is employed to separately process structural primitives and stylistic details of the form, while introducing a parametric attention fusion module to dynamically modulate the feature fusion process. Additionally, a multi-scale discriminator and decoupling loss function are incorporated to improve generation quality and stability. Experimental results show a FID of 15.2, an inception score (IS) of 8.9, a peak signal-to-noise ratio (PSNR) of 28.5 dB, a robustness FID of only 12.5%, and a user rating of 4.3. Keywords: generative adversarial network; GAN; public art; generation algorithm; parametric design; double branch generator. DOI: 10.1504/IJICT.2026.10078895
Abstract: With societal development, traditional community planning relies on designers' experience and static norms, failing to dynamically address residents' diverse subjective needs. Existing generative adversarial network (GAN) models aid design but suffer from pixel blurring, structural defects and insufficient UX semantic guidance. This study integrates GIS environmental features, embedded coding social features and scale-analysed UX features; via a gated cross-attention mechanism, it enables the generator to focus on key spatial elements. Multi-discriminator collaboration and cycle consistency loss (CCL) ensure generation quality at pixel, structural and semantic levels. SCD-20k experiments show the peak signal-to-noise ratio of Sustainable Community user experience generation (SCUE-Gen) is 34.6 dB. Its experience vector similarity (EVS) is 0.89, outperforming benchmarks like cGAN and Pix2Pix; professional and non-professional satisfaction both exceed 8.8. The framework fits urban planning workflows, offering iterative schemes balancing professional norms and residents' needs, data-driven support for sustainable communities, and interdisciplinary backing for humanistic smart cities. Keywords: fused generative adversarial network; attention model; community user experience; visual attention mechanism; multimodal conditions. DOI: 10.1504/IJICT.2026.10078896
Abstract: This study develops a transformer-based multimodal intelligent evaluation model for college Russian translation instruction, tackling frequent pragmatic failures and context deficiency that lead to lagged feedback. It leverages XLM-RoBERTa to capture Russian's intricate morphological and syntactic features, adopts ViT for global visual context of accompanying images, and uses multi-head cross-attention (MCA) to deeply integrate and calibrate textual-visual semantics. Experiments on the improved multimodal corpus based on Wikipedia-based image text (WIT) show that the model's Pearson correlation coefficient, used to gauge the consistency of scoring, is as high as 0.835, and the error diagnosis reaches 89.4%. In identifying high-order pragmatic inconsistency errors, compared with the ResNet+XLM-R (naive fusion) baseline model, its F1-score improved significantly to 0.87 (p < 0.01). Highly consistent with expert scores in real teaching, the model proves valuable for accurate teacher feedback and cultivating students' text-image integrated translation thinking. Keywords: multimodal representation learning; Russian translation teaching; translation intelligence evaluation; XLM-RoBERTa; vision transformer. DOI: 10.1504/IJICT.2026.10078897
Abstract: This study addresses the critical challenge of automatically harvesting high-quality Korean language teaching resources from the open web, where existing methods focus on topical relevance rather than pedagogical suitability. The study proposes a novel quality-aware adaptive crawling and cleansing framework. It integrates a real-time linguistic quality assessment module, powered by universal dependencies parsing, with an adaptive crawling strategy driven by a contextual bandit algorithm. Experimental results demonstrate that the quality-aware adaptive crawling and cleansing framework significantly outperforms current state-of-the-art methods. It achieves a high-quality page acquisition rate of 7.47 pages per hour (a 39% improvement), a pedagogical precision of 0.892, and a top-ranking accuracy of 0.915. The framework successfully bridges linguistic theory and web mining, offering an effective solution for building structured, high-quality pedagogical resource repositories. Keywords: adaptive web crawling; quality assessment; universal dependencies; resource cleansing. DOI: 10.1504/IJICT.2026.10078898 |
Open Access
