Publications
* denotes the corresponding author(s).
2023
-
Research complexity increases with scientists’ academic age: Evidence from library and information science
Zhentao Liang, Zhichao Ba*, Jin Mao, and Gang Li
Journal of Informetrics 2023
With the continued aging of the scientific workforce, the impact of this trend on scientists’ research performance has attracted increasing attention. The literature has predominantly focused on the productivity, impact, and collaboration pattern of scientists of different ages. A research gap is found in investigating the differences in the research topics studied by junior and senior scientists. This study focuses on the complexity of a scientist’s research portfolio (RPC). Based on the concept of economic complexity, RPC was measured to characterize the capability of scientists to study complex research topics. An economic algorithm was adopted to estimate RPC on heterogeneous author-topic bipartite networks using bibliographic data from the field of Library and Information Science between 1971 and 2020. Through comparisons among scientist groups, RPC shows promise in distinguishing outstanding scientists from peers who have similar values of other indicators (e.g., citations and H-index). The change in RPC was further probed across scientists’ careers and an increasing trend with academic age was found, even after removing the accumulated advantages of senior scientists. Moreover, top-ranked scientists distinguish themselves from their peers by a higher RPC in the first year and a greater growth rate during their careers. While many researchers have their highest RPC in the first year, most top-ranked scientists reach their peak RPC later in their careers. The results provide helpful references for studies on the aging effect in academia.
-
Bias against scientific novelty: A pre-publication perspective
Zhentao Liang, Jin Mao*, and Gang Li*
Journal of the Association for Information Science and Technology 2023
Novel ideas often experience resistance from incumbent forces. While evidence of the bias against novelty has been widely identified in science, there is still a lack of large-scale quantitative work to study this problem occurring in the pre-publication process of manuscripts. This paper examines the association between manuscript novelty and handling time of publication based on 778,345 articles in 1,159 journals indexed by PubMed. Measuring the novelty as the extent to which manuscripts disrupt existing knowledge, we found systematic evidence that higher novelty is associated with longer handling time. Matching and fixed-effect models were adopted to confirm the statistical significance of this pattern. Moreover, submissions from prestigious authors and institutions have the advantage of shorter handling time, but this advantage diminishes as manuscript novelty increases. In addition, we found longer handling time is negatively related to the impact of manuscripts, while the relationships between novelty and 3- and 5-year citations are U-shaped. This study expands the existing knowledge of the novelty bias by examining its existence in the pre-publication process of manuscripts.
-
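A method for predicting highly disruptive patents by integrating science-technology knowledge linkage
Zhentao Liang, Jin Mao, and Gang Li*
Journal of the China Society for Scientific and Technical Information (情报学报) 2023
Identifying and predicting disruptive technologies is important for serving major national science and technology strategies and for safeguarding the security of national science and technology industries. This paper treats patent families as technological units and defines highly disruptive patents in terms of the change they bring to the technological knowledge space. Drawing on two large-scale databases, one of patents (PATSTAT) and one of scientific papers (Microsoft Academic Graph, MAG), we measure and analyze patents’ disruptiveness, technological features, and science-technology knowledge linkage features, and on this basis propose a method for predicting highly disruptive patents that integrates science-technology knowledge linkage. The prediction problem is cast as a supervised binary classification task: given a patent’s science-technology knowledge linkage and other technological features in its year of publication, machine learning models are trained to predict whether its disruption indicator value five years later is high or low. The results show that (1) highly disruptive patents tend to have little and non-mainstream prior knowledge, strong technical teams, undervalued commercial value, and large long-term impact; (2) the science-technology knowledge linkage attributes of a patent are important features for predicting its disruptiveness; and (3) the LightGBM model achieves the best overall performance and training efficiency, and empirical results in the fields of semiconductor devices and electric digital data processing validate the model’s effectiveness. Predicting disruptive technologies nevertheless remains a difficult task, and future work may try to improve performance further by using patent semantic features and combining multi-source data.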
-
Quantifying scientific breakthroughs by a novel disruption indicator based on knowledge entities
Shiyun Wang, Yaxue Ma*, Jin Mao*, Yun Bai, Zhentao Liang, and Gang Li
Journal of the Association for Information Science and Technology 2023
Compared to previous studies that generally detect scientific breakthroughs based on citation patterns, this article proposes a knowledge entity-based disruption indicator by quantifying the change of knowledge directly created and inspired by scientific breakthroughs to their evolutionary trajectories. Two groups of analytic units, including MeSH terms and their co-occurrences, are employed independently by the indicator to measure the change of knowledge. The effectiveness of the proposed indicators was evaluated against four datasets of scientific breakthroughs derived from four recognition trials. In terms of identifying scientific breakthroughs, the proposed disruption indicator based on MeSH co-occurrences outperforms that based on MeSH terms and three earlier citation-based disruption indicators. It is also shown that in our indicator, measuring the change of knowledge inspired by the focal paper in its evolutionary trajectory is a larger contributor than measuring the change created by the focal paper. Our study not only offers empirical insights into the conceptual understanding of scientific breakthroughs but also provides a practical disruption indicator for scientists and science management agencies searching for valuable research.
2022
-
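The impact of overseas exchange on scholars’ careers: The case of state-sponsored jointly trained doctoral students under China’s program for building high-level universities
Zhentao Liang, Jin Mao, and Gang Li*
Library and Information Service (图书情报工作) 2022
[Purpose/Significance] Overseas exchange is an important way for scholars to explore research frontiers and seek research collaboration. Jointly trained doctoral students selected by the China Scholarship Council (CSC) are the main group of funding recipients, yet the effect of their overseas exchange experience on their subsequent careers has received little attention in prior research. This paper therefore takes jointly trained doctoral students as its subject and analyzes the impact of overseas exchange on their careers based on their curricula vitae and academic output, providing a useful reference for evaluating the effect of the national state-sponsored study-abroad policy. [Method/Process] First, doctoral students funded by the CSC for joint training are identified from the acknowledgments of journal articles and doctoral dissertations to form the experimental group, and doctoral students who had the same supervisor and graduated in the same year but did not take part in joint training are identified as the control group. Second, the résumés and publications of these samples are collected, and quantitative methods are used to analyze differences between the two groups in employment, publication output, and publication quality, thereby assessing the impact of overseas exchange on scholars’ careers. [Result/Conclusion] Jointly trained doctoral students are more likely to join research-oriented institutions and continue doing research after graduation, and their professional titles and the rankings of their employing institutions are higher than those of the control group. Overseas exchange also helps to increase scholars’ output and publication quality. Research management agencies can further improve the state-sponsored study-abroad policy to enhance China’s scientific and technological human capital.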
-
Semantic-enhanced topic evolution analysis: a combination of the dynamic topic model and word2vec
Qiang Gao, Xiao Huang*, Ke Dong, Zhentao Liang, and Jiang Wu
Scientometrics 2022
The combination of the topic model and the semantic method can help to discover the semantic distributions of topics and the changing characteristics of the semantic distributions, further providing a new perspective for the research of topic evolution. This study proposes a solution for quantifying the semantic distributions and the changing characteristics based on words in topic evolution through the Dynamic topic model (DTM) and the word2vec model. A dataset in the field of Library and information science (LIS) is utilized in the empirical study, and the topic-semantic probability distribution is derived. The evolving dynamics of the topics are constructed. The characteristics of evolving dynamics are used to explain the semantic distributions of topics in topic evolution. Then, the regularities of evolving dynamics are summarized to explain the changing characteristics of semantic distributions in topic evolution. Results show that no topic is distributed in a single semantic concept, and most topics correspond to various semantic concepts in LIS. The three kinds of topics in LIS are the convergent, diffusive, and stable topics. The discovery of different modes of topic evolution can further prove the development of the field. In addition, findings indicate that the popularity of topics and the characteristics of evolving dynamics of topics are irrelevant.
2021
-
Combining deep neural network and bibliometric indicator for emerging research topic prediction
Zhentao Liang, Jin Mao*, Kun Lu, Zhichao Ba, and Gang Li
Information Processing & Management 2021
Predicting emerging research topics is important to researchers and policymakers. In this study, we propose a two-step solution to the problem of emerging topic prediction. The first step forecasts the future popularity score, a novel indicator reflecting the impact and growth, of candidate topics in a time-series manner. The second step selects novel topics from the candidates predicted to be popular in the first step. Terms with domain characteristics are used as candidate topics. Deep neural networks, specifically LSTM and NNAR, are applied with nine features of topics to predict popularity score. We evaluated the models and five baselines on two datasets from two perspectives, i.e., the ability to (1) predict the correct indicator value and (2) reconstruct the optimal ranking order. Two types of training strategies were compared, including a global strategy that trains a model with all topics and two local strategies that train separate models with different groups of topics. Our results show that LSTM and NNAR outperform other models in predicting the value of popularity score measured by MAE and RMSE, while LightGBM is a competitive baseline in ranking the topics in terms of NDCG@20. The performance difference of global and local strategies is not significant. Emerging topics predicted by our approach are compared with those by other methods. A qualitative assessment on nominated emerging topics suggests topics nominated by machine learning methods are more alike than those by the rule-based model. Some important topics are nominated according to a preliminary literature analysis. This study exploited the strengths of both machine learning and bibliometric indicator approaches for emerging topic prediction. Deep neural networks are applied where objective optimization target can be defined and measured. Bibliometric indicator offers an efficient way to select novel topics from candidates. The hybrid approach shows promise in considering various characteristics of emerging topics when making predictions.
-
Finding citations for PubMed: a large-scale comparison between five freely available bibliographic data sources
Zhentao Liang, Jin Mao*, Kun Lu, and Gang Li
Scientometrics 2021
As an important biomedical database, PubMed provides users with free access to abstracts of its documents. However, citations between these documents need to be collected from external data sources. Although previous studies have investigated the coverage of various data sources, the quality of citations is underexplored. In response, this study compares the coverage and citation quality of five freely available data sources on 30 million PubMed documents, including OpenCitations Index of CrossRef open DOI-to-DOI citations (COCI), Dimensions, Microsoft Academic Graph (MAG), National Institutes of Health’s Open Citation Collection (NIH-OCC), and Semantic Scholar Open Research Corpus (S2ORC). Three gold standards and five metrics are introduced to evaluate the correctness and completeness of citations. Our results indicate that Dimensions is the most comprehensive data source that provides references for 62.4% of PubMed documents, outperforming the official NIH-OCC dataset (56.7%). Over 90% of citation links in other data sources can also be found in Dimensions. The coverage of MAG, COCI, and S2ORC is 59.6%, 34.7%, and 23.5%, respectively. Regarding the citation quality, Dimensions and NIH-OCC achieve the best overall results. Almost all data sources have a precision higher than 90%, but their recall is much lower. All databases have better performances on recent publications than earlier ones. Meanwhile, the gaps between different data sources have diminished for the documents published in recent years. This study provides evidence for researchers to choose suitable PubMed citation sources, which is also helpful for evaluating the citation quality of free bibliographic databases.
-
A Knowledge Representation Model for Studying Knowledge Creation, Usage, and Evolution
Zhentao Liang, Fei Liu, Jin Mao, and Kun Lu*
In iConference 2021
A knowledge representation model is proposed to facilitate studies on knowledge creation, usage, and evolution. The model uses a three-layer network structure to capture citation relationships among papers, the internal concept structure within individual papers, and the knowledge landscape in a domain. The resulting model can not only reveal the path and direction of knowledge diffusion, but also detail the content of knowledge transferred between papers, new knowledge added, and changing knowledge landscape in a domain. A pilot experiment is carried out using the PMC-OA dataset in the biomedical field. A case study on one knowledge evolution chain of Alzheimer’s Disease demonstrates the use of the model in revealing knowledge creation, usage, and evolution. Initial findings confirm the feasibility of the model for its purpose. Limitations of the study are discussed. Future work will try to address the recognized limitations and apply the model to large scale automated analysis to understand the knowledge production process.
-
A novel approach to measuring science-technology linkage: From the perspective of knowledge network coupling
Zhichao Ba and Zhentao Liang*
Journal of Informetrics 2021
Identifying and measuring science-technology linkage is important for understanding interactions between science and technology (S&T). Previous studies have focused mainly on knowledge linkages of knowledge systems between S&T but have ignored their structural linkages. To this end, we propose a novel knowledge network coupling approach to gauge network linkage between S&T by integrating knowledge linkages and structural linkages. Four network construction strategies were first adopted to determine appropriate knowledge networks of S&T, and then their coupling strengths over time were calculated based on similarities of coupling nodes’ degree distribution and similarities of coupling edges’ weight distribution. An experimental study in the field of energy conservation confirms that our approach was indeed successful in revealing interactions between S&T. The proposed approach enriches the current methodology for measuring S&T linkages and provides references for policymakers to conduct policy adjustments, by identifying the lead-lag relationship between S&T.
-
Potential index: Revealing the future impact of research topics based on current knowledge networks
Qiang Gao, Zhentao Liang, Ping Wang*, Jingrui Hou, Xiuxiu Chen, and Manman Liu
Journal of Informetrics 2021
As the volume of scientific publications has been growing at an increasingly rapid speed, it is important to identify prominent research trends for scientists and institutions. While a considerable number of researchers have attempted to map the current state of scientific research, more efforts should be made to reveal potentially influential research topics. In this study, we investigate the relationship between the scientific impact of a research topic and the structure of its knowledge network. A novel indicator, potential index, is proposed to model topic impact based on the structural information. It is an immediate indicator with two components: knowledge novelty and diversity, which are operationalized using the concepts of betweenness centrality and network entropy. The empirical results show that potential index serves as a good predictor of future topic impact, with a high R² and positive correlation. Its superiority is sustained when it is used as the input feature of regression models. Moreover, the proposed index achieves better results, and the differences between it and other features become more prominent as the model complexity increases. Quantitative and qualitative analysis on the topic evolution process is also conducted to explain the change in the proposed indicator. This study contributes to the research of scientific impact modeling by establishing an explicit relationship between the impact of topics and the knowledge structure, and is thus helpful in predicting the potential impact of research topics.
-
Exploring the effect of city-level collaboration and knowledge networks on innovation: Evidence from energy conservation field
Zhichao Ba, Jin Mao, Yaxue Ma*, and Zhentao Liang
Journal of Informetrics 2021
Collaboration and knowledge networks have been proved to play a crucial role in innovation. From a multilevel network perspective, this study integrates research on the two types of networks and investigates how city-level collaboration and knowledge networks influence innovation in the energy conservation field. To this end, we calculate a city’s influence force in its collaboration network based on the weighted PageRank algorithm and propose a novel measurement method of network embedding to gauge the embedding depth and embedding breadth of a city’s local knowledge network in the whole knowledge network. Empirical results suggest that a city’s aggregation index and influential force in the collaboration network are positively related to its innovation, while geographical distance shows an inverted U-shaped effect. The embedding depth and embedding breadth of a city’s local knowledge network have a positive effect, and the structural entropy of its knowledge network generates an inverted U-shaped effect on innovation. Our research contributes to a better understanding of the impact of city-level collaboration and knowledge networks on innovation and points to several general implications for innovation practice and complex network research.
2020
-
Quantifying cross-disciplinary knowledge flow from the perspective of content: Introducing an approach based on knowledge memes
Jin Mao, Zhentao Liang*, Yujie Cao, and Gang Li
Journal of Informetrics 2020
Knowledge flow between disciplines is typically measured through citations among publications. In this study, we quantify cross-disciplinary knowledge diffusion from the novel perspective of content by introducing knowledge memes, a special type of knowledge unit. Diffusion cascade is proposed to model the diffusion process of knowledge memes. By taking Medical Informatics (MI) as an exemplary interdisciplinary discipline, we measure the knowledge relationships between it and four related disciplines. The diffusion patterns of cross-disciplinary memes are also identified by analyzing the network structure of the diffusion cascade. The results present the knowledge relationships among disciplines measured by knowledge memes, which are different from those measured by citations. It is shown that preferential attachment takes effect in cross-disciplinary knowledge meme diffusion. In addition, cross-disciplinary knowledge memes generally originate earlier and have higher impact than the memes of MI. This study provides insights into new approaches to quantifying knowledge relationships among disciplines and furthers the understanding of content diffusion mechanisms through measurable knowledge units.
-
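Citation-based analysis of the development paths of interdisciplinary fields: The case of eye tracking
Zhentao Liang, Zhichao Ba, and Jian Xu*
Library and Information Service (图书情报工作) 2020
[Purpose/Significance] Interdisciplinary research has become an important paradigm and an inevitable trend in modern scientific innovation. Exploring the development patterns and evolutionary paths of disciplines within an interdisciplinary field is important for revealing the dynamic process by which such fields form and develop. [Method/Process] Taking the eye tracking (ET) field as an example, citation relationships among publications are extracted and labeled by discipline to build citation networks at the publication and discipline levels. The ratio of citations to other disciplines, the ratio of citations from other disciplines, and the Price index of each discipline are calculated to analyze, at the macro level, the interdisciplinary development patterns of the main disciplines in the ET field. Citation relationships between disciplines within and across stages are examined to explore the relational structure and changing roles of each discipline in the course of interdisciplinary development. Important publications connecting different disciplines are identified by betweenness centrality in the citation network, and the citation relationships among important publications, highly cited publications, and references are examined to reveal, at the micro level, the specific evolutionary paths of the ET field. [Result/Conclusion] The development of the ET field has gone through three stages (incubation, development, and maturity) and shows three disciplinary development patterns: independent, intersecting, and learning. Citation relationships between disciplines become closer and more evenly distributed from stage to stage, with neurology, psychology, and clinical medicine at the core of interdisciplinary development and knowledge export. Vertically, the development of the ET field takes the form of fundamental theoretical innovation in independent disciplines; horizontally, it takes the form of deep integration of the three types of disciplines, following an “independent-linear-networked” development path.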
-
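Analysis of domain knowledge diffusion patterns based on knowledge meme cascade networks
Zhentao Liang, Jin Mao*, Yujie Cao, and Gang Li
Information Studies: Theory & Application (情报理论与实践) 2020
[Purpose/Significance] In the era of the knowledge economy, the production, diffusion, and consumption of knowledge are important drivers of social and economic development. Knowledge diffusion is a key process for realizing the full value of knowledge, and understanding its regularities at the micro level is important for promoting knowledge use and innovation. [Method/Process] Taking the scientific literature of medical informatics as an example, this article works at the micro level of knowledge memes: a knowledge meme identification method is used to extract knowledge memes from publications to represent knowledge units, a diffusion cascade network is built for each knowledge meme on the basis of the citation network, and the basic features of the cascade networks and their distributions are computed and analyzed to examine the diffusion patterns of different knowledge memes within the field. [Result/Conclusion] Four typical diffusion patterns of knowledge memes are found in medical informatics: single-origin, multi-origin independent, multi-origin iterative, and multi-origin fused. In addition, analysis of the distributions of the cascade networks’ attributes shows that the networks are scale-free: a very small number of knowledge memes in medical informatics receive a large share of diffusion resources, and research in the field concentrates on the directions represented by these few memes, whereas the life-cycle lengths of research directions in the field differ relatively little.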
2019
-
Idea diffusion patterns: SNA on knowledge meme cascade network
Zhentao Liang, Jin Mao*, Yujie Cao, and Gang Li
In Proceedings of the 17th International Conference on Scientometrics and Informetrics (ISSI) 2019