Kenedict Innovation Analytics

Untangling the State of the Art in Artificial Intelligence by Applying Text Mining and Network Analysis

Artificial intelligence is being applied in a wide variety of areas and has generated a lot of attention over the past few years. Keeping up to date with the advances in this broad field is challenging because of the sheer volume of scientific output. This blog post provides an interactive overview of the most impactful recent AI research through a combination of text mining and network analysis.

Data and Search Strategy

The data used for this analysis was collected using The Lens, a free and open website which serves and integrates scholarly and patent data. The search strategy was straightforward: we focused on scientific publications with a publication date in 2018 whose titles mention “artificial intelligence”, its subdomains “machine learning” or “deep learning”, or the term “neural network”. The initial dataset consisted of 7139 documents. To focus on the most impactful articles, the data was then filtered to only include articles with 2 or more citations. This left a final dataset of 3676 articles.
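As a rough illustration of this selection step, the sketch below applies the same title terms, publication year and citation threshold to a handful of made-up records. The field names (`title`, `year`, `citations`) and the sample records are assumptions for illustration only and do not reflect the actual export format of The Lens.

```python
# Title terms taken from the search strategy described above.
SEARCH_TERMS = (
    "artificial intelligence",
    "machine learning",
    "deep learning",
    "neural network",
)


def matches_search(title):
    """Return True when the title mentions at least one search term."""
    lowered = title.lower()
    return any(term in lowered for term in SEARCH_TERMS)


def select_impactful(records, year=2018, min_citations=2):
    """Keep records from the given year that match the search and meet the citation threshold."""
    return [
        record
        for record in records
        if record["year"] == year
        and matches_search(record["title"])
        and record["citations"] >= min_citations
    ]


# Hypothetical records; the field names are assumptions, not The Lens schema.
records = [
    {"title": "Deep Learning for Wind Speed Prediction", "year": 2018, "citations": 5},
    {"title": "A Survey of Graph Databases", "year": 2018, "citations": 12},
    {"title": "Neural Network Pruning", "year": 2018, "citations": 1},
    {"title": "Machine Learning in Chemistry", "year": 2017, "citations": 30},
    {"title": "Quantum Machine Learning", "year": 2018, "citations": 2},
]

selected = select_impactful(records)
print([record["title"] for record in selected])
```

Only the first and last records survive: the others either miss the title terms, fall outside 2018, or have fewer than 2 citations.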

The number of citations received by an article can be seen as a key indicator of its overall impact. All articles in the filtered dataset have already gathered at least two citations, despite their relatively short publication lifespans. In this analysis, we see these articles as constituting the ‘state of the art’ in AI research based on the impact they’ve had within a short time frame.

Identifying Research Clusters

Groups or clusters of related documents were identified using a combination of text analysis and network analysis. The key assumption here is that documents can be linked when they share similar content in their abstracts. TF-IDF (term frequency-inverse document frequency) models are a common technique in text analysis for assigning numerical vectors to documents (here, their abstracts). These vectors can then be used to calculate pairwise similarities between all documents in the dataset, which was the first step of this analysis.
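To make the idea concrete, here is a minimal sketch of TF-IDF weighting and pairwise cosine similarity in plain Python. It is not the implementation used for this analysis: the tokenizer, the smoothed IDF formula and the three sample abstracts are all assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(document) for document in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Raw term count times a smoothed inverse document frequency (an assumed variant).
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical abstracts, shortened to a single phrase each.
abstracts = [
    "Convolutional neural networks for image classification",
    "Image classification with deep convolutional networks",
    "Forecasting wind speed with recurrent models",
]

vectors = tfidf_vectors(abstracts)
similarities = {
    (i, j): cosine_similarity(vectors[i], vectors[j])
    for i in range(len(vectors))
    for j in range(i + 1, len(vectors))
}
for pair, score in similarities.items():
    print(pair, round(score, 3))
```

The two image classification abstracts share most of their terms and score highest, while the first and third abstracts share no terms and score zero. A real pipeline would typically also remove stop words such as "with" and "for", which this sketch leaves in.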

After obtaining the similarity scores for all pairs of documents, the next step was to present the results in an interactive network visualization for exploring the dataset. The networks were created using Kenelyze, Kenedict’s network visualization platform. Links were initially drawn between documents when their cosine similarity exceeded 0.1. Clusters were then identified using a community detection algorithm and used to color the nodes in the network. The size of nodes reflects the number of citations the document has received from other documents.
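The thresholding and clustering steps can be sketched as follows. The post does not say which community detection algorithm Kenelyze applies, so this example swaps in a simple, deterministic label propagation as a stand-in. The similarity scores are invented for illustration, and only the 0.1 threshold comes from the text above.

```python
from collections import Counter, defaultdict


def build_graph(similarities, threshold=0.1):
    """Link two documents when their similarity exceeds the threshold."""
    graph = defaultdict(set)
    for (a, b), score in similarities.items():
        if score > threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def label_propagation(graph, nodes, max_rounds=20):
    """Stand-in community detection: each node adopts the most common label among its neighbours."""
    labels = {node: node for node in nodes}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(nodes):
            neighbours = graph.get(node, ())
            if not neighbours:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[neighbour] for neighbour in neighbours)
            best = max(counts.values())
            candidates = sorted(label for label, count in counts.items() if count == best)
            # On a tie, keep the current label if possible, else take the smallest one.
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Hypothetical similarity scores for six documents (ids 0 to 5).
similarities = {
    (0, 1): 0.50, (0, 2): 0.40, (1, 2): 0.60,
    (3, 4): 0.50, (3, 5): 0.30, (4, 5): 0.45,
    (2, 3): 0.05,  # below the threshold, so no link is drawn
}

graph = build_graph(similarities, threshold=0.1)
labels = label_propagation(graph, nodes=range(6))
print(labels)
```

Documents 0 to 2 end up sharing one label and documents 3 to 5 another, which is the information a visualization would use to color the nodes. Modularity-based methods are a common alternative and may behave differently on larger, denser networks.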

Exploring and Annotating the Network

The graphic above shows the results of the analysis. Clusters of related content were labeled based on manual examination of the documents. As expected, we can see a wide variety of themes and application areas. The left-hand side of the visual shows output relating to advances in various types of neural networks and model optimization, while more practical applications in health care, chemistry and biology can be found on the right-hand side of the network. ‘Classic’ AI tasks such as image and video classification can be found in the clusters at the bottom center of the graphic.

Emerging research areas such as Quantum Machine Learning are represented as well, with various connections to a cluster on molecular/atomic properties:

2018 also saw significant energy-related output, varying from documents on wind speed prediction and streamflow forecasting to general energy forecasting models:

Detection and classification of various types of cancer is an active research area as well:

Interactively Exploring the Visualization

Network visualizations are an excellent way to explore a dataset. The interactive visual below allows you to search and zoom to individual documents or filter by keywords of interest. For example, try typing ‘quantum’ in the ‘Filter network’ box to see where documents relating to quantum machine learning are located. Clicking a document shows its properties in a panel and allows you to read its abstract. You can find a full-screen version of the visual here.

Combining Network Analysis and Text Mining for New Insights

Mapping science or IP output in network visualizations is often based on the use of readily available metadata such as citations, author keywords, or listed authors and affiliations. Text mining of documents, as carried out for this blog post, can provide a valuable additional perspective on published output in an area of interest. Grouping output based on shared content/language allows for quick identification of key themes in a dataset and is an excellent way to explore trends and connections between clusters.

If you have any questions about this analysis or would like to apply a similar approach to your own area of interest, please let me know: andre.vermeij@kenedict.com.
