
Research

AI Research Breakthrough: Multimodal Systems Move Beyond Vision and Language

University of Sheffield and Turing Institute Redefine Ethical AI Development

AI researchers at the University of Sheffield and the Alan Turing Institute have developed a new framework for building more trustworthy AI. The framework outlines a practical and ethical route for artificial intelligence systems that learn from a broader range of data types, not just vision and language, increasing their applicability to real-world situations. Published in Nature Machine Intelligence, the paper sets out a roadmap for multimodal AI: systems designed to learn from diverse information such as text, images, audio and sensor data. Despite major advances in areas such as image recognition and natural language processing, the researchers observe that most current AI models still depend heavily on these two data modalities, which limits their ability to solve complex real-world problems.

Fields such as self-driving vehicles, healthcare diagnostics and climate modelling stand to benefit considerably, gaining improved performance and safety from a broader mix of environmental, clinical and sensor-based data. For example, integrating visual and sensor data may help autonomous cars navigate more safely in unpredictable conditions, while combining clinical and genomic data could lead to more accurate disease diagnosis and more effective drug development.

The framework is also intended as a blueprint for industry developers and academic institutions. It is especially timely: an analysis of papers on arXiv, a major platform for AI research, found that nearly 89% of AI studies in 2024 focused solely on vision and language. Professor Haiping Lu, who led the study at the University of Sheffield's School of Computer Science and Centre for Machine Intelligence, said that although AI has made great strides in vision and language, the real world is a much messier landscape. Building AI from a much broader mix of data and expert knowledge, he argued, is key to addressing global problems such as pandemic control, sustainable energy and climate change, and the framework offers a deployment-focused blueprint for trustworthy AI that is safe, effective and reliable outside the laboratory.

The research was supported by the Turing Institute's Meta-learning for Multimodal Data interest group, headed by Professor Lu. Last year the group brought together AI researchers from across the UK and beyond, pooling expertise and fostering interdisciplinary collaboration. The initiative also led to the creation of the UK Open Multimodal AI Network (UKOMAIN), a project now led by Professor Lu and awarded £1.8 million by the EPSRC. The network aims to promote practical, deployment-focused multimodal AI development across the country.

Dr Louisa van Zeeland, a Research Lead at the Turing Institute, highlighted the importance of modelling large and diverse datasets, noting that this approach is already informing environmental forecasting, from Arctic conservation to agricultural planning. Her comments reflect the Turing Institute's broader agenda that AI should be not only technically advanced but also ethical and socially beneficial. The work shows how AI research at British universities continues to evolve to meet real-world needs, and offers a clear example of ethical AI applied to challenges that affect everyone, from public health to climate resilience. The University of Sheffield continues to play a leading role in shaping AI innovation, including for those pursuing AI PhD programmes or computer science with AI.

 

Editor’s Note:

AI researchers at the University of Sheffield and the Alan Turing Institute have proposed a new framework for building practical, ethical and trustworthy AI systems. The fifteen-page study, published in Nature Machine Intelligence, lays out an approach in which artificial intelligence learns from more diverse data types, not only vision and language, making such systems more effective and reliable in real-world settings. Integrating text, numerical, image, audio and sensor data allows an AI to respond to challenges more accurately and safely while tackling bigger problems. Most AI models in use today, however, are built around vision and language alone, a known limitation in domains such as healthcare, climate modelling and autonomous vehicles. Combining wider data sources, including clinical, genomic and environmental inputs, improves diagnostic accuracy, navigation safety and forecasting. With around 89 per cent of current AI research focusing on only these two types of data, the framework is intended to guide academia and industry developers. Supported by the Turing Institute's Meta-learning for Multimodal Data Interest Group, the initiative also helped establish the UK Open Multimodal AI Network (UKOMAIN), a £1.8m EPSRC-funded project led by Professor Haiping Lu. The collaboration reflects how university AI research is becoming an influential avenue for shaping ethical, practical AI development throughout the UK, with the University of Sheffield remaining a leader in this evolving field through its PhD studies in AI and computer science with AI.

Skoobuzz affirms that trustworthy AI is not only possible but essential, aligning with how AI should be developed to serve society responsibly.

 

FAQs

1. What is the new AI framework developed by the University of Sheffield and the Turing Institute?

The framework is a practical and ethical guide for building trustworthy AI systems that learn from multiple types of data—not just vision and language. It supports the development of multimodal AI for real-world applications.

2. What does a trustworthy AI system mean?

A trustworthy AI system is safe, reliable, ethical, and effective in solving real-world problems. It integrates diverse data sources and is designed to work beyond controlled lab settings.

3. How is the University of Sheffield contributing to AI innovation?

The University of Sheffield is leading research in multimodal AI through its School of Computer Science and Centre for Machine Intelligence. It also heads the UK Open Multimodal AI Network (UKOMAIN), a £1.8 million EPSRC-funded initiative.

4. What is the Turing Institute’s role in AI research?

The Alan Turing Institute supports collaborative AI research across disciplines. It backed the development of the new framework through its Meta-learning for Multimodal Data Interest Group and continues to lead in ethical AI and environmental forecasting.

5. What are multimodal AI systems?

Multimodal AI systems are designed to learn from various types of data—such as text, images, audio, and sensor readings—allowing them to form a more complete understanding of complex environments.
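For readers who want a concrete picture of what "learning from multiple data types" can look like in practice, here is a minimal, purely illustrative sketch of late fusion: each modality gets its own small encoder and the resulting embeddings are concatenated before a shared prediction head. All names, layer sizes, and the fusion strategy are hypothetical and are not taken from the Sheffield/Turing framework described in this article.

```python
# Illustrative sketch only: a toy late-fusion model combining text, image,
# and sensor features. Dimensions and architecture are hypothetical.
import torch
import torch.nn as nn

class ToyMultimodalFusion(nn.Module):
    def __init__(self, text_dim=300, image_dim=512, sensor_dim=16,
                 hidden=128, num_classes=2):
        super().__init__()
        # One small encoder per modality
        self.text_enc = nn.Linear(text_dim, hidden)
        self.image_enc = nn.Linear(image_dim, hidden)
        self.sensor_enc = nn.Linear(sensor_dim, hidden)
        # Fusion head: concatenate modality embeddings, then classify
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(3 * hidden, num_classes))

    def forward(self, text, image, sensor):
        fused = torch.cat([self.text_enc(text),
                           self.image_enc(image),
                           self.sensor_enc(sensor)], dim=-1)
        return self.head(fused)

# Example with a random batch of 4 samples, one feature vector per modality
model = ToyMultimodalFusion()
out = model(torch.randn(4, 300), torch.randn(4, 512), torch.randn(4, 16))
print(out.shape)  # torch.Size([4, 2])
```

Real multimodal systems typically use far richer encoders and fusion schemes, but the core idea of combining complementary data sources into one decision is the same.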

6. Why is relying only on vision and language data limiting for AI?

Most current AI models focus on vision and language, which restricts their ability to handle complex, real-world challenges. Broader data integration improves accuracy, safety, and usefulness in fields like healthcare and autonomous transport.

7. What are the practical applications of ethical AI?

Ethical AI can be applied to pandemic response, climate change adaptation, self-driving car safety, disease diagnosis, and drug discovery. It ensures that AI solutions are socially responsible and technically sound.

8. How does this framework support practical AI development?

The framework provides a roadmap for deploying AI systems that work in real-world conditions. It helps developers and researchers build models that are robust, transparent, and aligned with societal needs.

9. What is UKOMAIN, and how does it relate to this research?

UKOMAIN (UK Open Multimodal AI Network) is a national initiative led by the University of Sheffield to advance deployment-focused multimodal AI. It builds on the collaborative foundation established by the Turing Institute’s interest group.

10. How can students study AI at Sheffield University?

Students can pursue AI PhD programmes or study computer science with AI at the University of Sheffield, which is recognised for its leadership in AI university research and ethical innovation.

11. What does the study published in Nature Machine Intelligence reveal?

The study outlines the limitations of current AI research and proposes a new framework for building systems that integrate diverse data types. It highlights the need for trustworthy AI that can address global challenges.

12. How is multimodal AI used in environmental forecasting?

By modelling large and varied datasets, multimodal AI enables accurate predictions across spatial and temporal scales. This supports efforts in Arctic conservation, agricultural planning, and climate resilience.