Big Data Research on National Park Publications
We used big data tools to analyze the academic papers published on national parks over the past 25 years. This page records some preliminary observations from that analysis. The purpose was to get our feet wet and figure out some basics of this active research field.
The conclusions first: (1) national park research is receiving more and more attention, with a substantial increase in research output as measured by the number of publications; (2) the key research terms have seen a few but major changes over time; (3) China has rapidly become a major contributor in the last few years.
Our data come mainly from Google Scholar. The method is as follows: (1) Query Google Scholar with the key phrase "National Park" and record the number of publications for each year. (2) Use a web crawler to scrape the 10,000 most relevant academic papers, i.e. 400 papers per year from 1999 to 2023. (3) Parse each paper's information, including its title, publication name, and citation count. (4) Count the keywords in the paper titles and generate a word cloud from them (we were too lazy to carry out a full-blown natural language processing analysis). (5) Use natural language processing (NLP) tools to pick out the words in the titles that refer to a geographic location, if any, and translate them into country/region names.
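Step (4) can be sketched in a few lines of Python. This is a minimal illustration and not the script used for the analysis: the stop-word list, the function names, and the sample titles are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "at", "for", "from", "in", "of", "on", "the", "to", "with"}

def tokenize(title):
    """Lowercase a title and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", title.lower())

def count_title_terms(titles):
    """Count single words and two-word phrases across a list of titles."""
    counts = Counter()
    for title in titles:
        words = tokenize(title)
        # Single words, skipping stop words.
        counts.update(w for w in words if w not in STOP_WORDS)
        # Adjacent word pairs in which neither word is a stop word.
        for first, second in zip(words, words[1:]):
            if first not in STOP_WORDS and second not in STOP_WORDS:
                counts[f"{first} {second}"] += 1
    return counts

# Made-up titles for illustration only; these are not from the scraped data.
sample_titles = [
    "Forest management in a national park",
    "Tourism and conservation in the national park",
    "Species diversity of a mountain forest",
]
term_counts = count_title_terms(sample_titles)
top_terms = term_counts.most_common(3)
```

Counting two-word phrases alongside single words is what lets terms such as "National Park" or "case study" show up as one entry in a ranking.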
Let's start with the number of publications. As the chart below shows, the annual number of publications under the key term "National Park" grew to about 2.5 times its starting level over the past 25 years, from just over 17,000 in 1999 to just over 43,000 in 2023. In between, it peaked at over 60,000 per year from about 2012~2013 through 2019.
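The growth factor follows directly from the two yearly counts. A quick check, using the rounded figures of 17,000 and 43,000 (both are lower bounds, so the result is approximate):

```python
# Approximate yearly counts; both are reported as "just over" these values.
papers_1999 = 17_000
papers_2023 = 43_000

# Ratio of the final count to the initial count.
fold_change = papers_2023 / papers_1999
# The same change expressed as a percentage increase over 1999.
percent_increase = (papers_2023 - papers_1999) / papers_1999 * 100

print(f"{fold_change:.2f}x, +{percent_increase:.0f}%")
```

In other words, the 2023 figure is about 2.5 times the 1999 figure, which is an increase of roughly 150%.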
Next, the high-frequency title terms. The word cloud at the top of this page was generated from the titles of papers published in 2023. Over the past 25 years, some of these terms have appeared very frequently, while others have not. To study the trend, we selected three time periods (1999 to 2003, 2009 to 2013, and 2019 to 2023) and counted the 10 most frequent title terms in each. The results are listed in the table below, where the terms that appear in all three lists are marked in green.
An interesting observation: Yellowstone National and Mountains National (the latter covers several different mountain national parks) both made the top 10 in 1999~2003 (in 4th and 9th place respectively), slipped in 2009~2013 (to 9th and 10th place respectively) while staying in the top 10, and both fell out of the top 10 (and even the top 20) in 2019~2023. The implication is that much of the earlier research was centered on Yellowstone and a few other sites, whereas in the most recent five years research has been carried out in a much wider range of places.
Taking their places in the list are "diversity", "tourism", and "distribution". Tourism appears in the top 10 (and in the top 20) for the first time, and it is clearly different from the other terms. Apart from the names of national parks, most key terms of the past 25 years, such as forest, species, and conservation, focus on natural and/or environmental issues. Tourism research focuses instead on people's activities, such as traveling and staying. Its appearance could mean a complement to the earlier research, or a new research focus. Either way, it may be an important development for national park research.
Rank | 1999~2003 | 2009~2013 | 2019~2023 |
---|---|---|---|
1 | National Park | National Park | National Park |
2 | management | forest | forest |
3 | forest | management | case study |
4 | Yellowstone National | species | conservation |
5 | conservation | conservation | diversity |
6 | vegetation | case study | case study |
7 | species | analysis | management |
8 | analysis | diversity | analysis |
9 | Mountains National | Yellowstone National | tourism |
10 | Kruger National | Mountains National | distribution |
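The three columns of the table can be compared with plain set operations. The sketch below copies the table as printed (including "case study", which is listed twice in the 2019~2023 column) and works out which terms are shared by all three periods and which entered or left the list between the first and last period. The variable names are illustrative only.

```python
# Top-10 title terms per period, copied from the table above as printed.
top10 = {
    "1999~2003": ["National Park", "management", "forest", "Yellowstone National",
                  "conservation", "vegetation", "species", "analysis",
                  "Mountains National", "Kruger National"],
    "2009~2013": ["National Park", "forest", "management", "species",
                  "conservation", "case study", "analysis", "diversity",
                  "Yellowstone National", "Mountains National"],
    "2019~2023": ["National Park", "forest", "case study", "conservation",
                  "diversity", "case study", "management", "analysis",
                  "tourism", "distribution"],
}

first, middle, last = (set(terms) for terms in top10.values())

# Terms present in all three lists.
common = first & middle & last
# Terms in the earliest list that are missing from the latest one.
dropped = first - last
# Terms in the latest list that were not in the earliest one.
new = last - first

print("common: ", sorted(common))
print("dropped:", sorted(dropped))
print("new:    ", sorted(new))
```

Using sets removes the repeated entry automatically, so the comparison depends only on which terms appear in each column and not on their rank.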
Last, the geolocation in publications. The map below shows the countries/regions that appear in paper titles and how their frequency of occurrence has changed over the past 25 years. The US has been the leading player throughout, in both quantity and quality (the latter measured by number of citations, not shown in this figure). Developing countries in Asia and Africa have made important contributions as well: India, South Africa, and others have steadily produced a large body of excellent work.
China is rising rapidly. Before 2017, China's research on national parks was nearly absent from the map, because the country did not have any national parks at that time. The change came in 2017, when the Chinese government released its general plan for establishing a national park system, followed by the establishment of the first batch of five national parks. The plan and the later establishment of the parks sparked immense interest in this area, and research output began to surge over the same period. By 2023, China had the second-largest number of publications after the US, including a number of widely cited papers.
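The geolocation step in the method (turning place names in titles into countries) can be illustrated with a simple lookup. This is only a sketch under stated assumptions: the analysis itself relied on NLP tools that are not named here, whereas this example substitutes a tiny hand-written lookup table, and the sample titles are made up.

```python
import re
from collections import Counter

# Hypothetical, tiny gazetteer for illustration; it stands in for the NLP
# tools used in the actual analysis.
PLACE_TO_COUNTRY = {
    "yellowstone": "United States",
    "kruger": "South Africa",
    "south africa": "South Africa",
    "india": "India",
    "china": "China",
}

def countries_in_title(title):
    """Return the set of countries whose known place names occur in the title."""
    text = title.lower()
    found = set()
    for place, country in PLACE_TO_COUNTRY.items():
        # \b keeps "india" from matching inside a longer word such as "indiana".
        if re.search(rf"\b{re.escape(place)}\b", text):
            found.add(country)
    return found

def count_countries(titles):
    """Count, per country, how many titles mention it at least once."""
    counts = Counter()
    for title in titles:
        counts.update(countries_in_title(title))
    return counts

# Made-up titles for illustration only.
sample_titles = [
    "Wolf recovery in Yellowstone National Park",
    "Elephant impacts in Kruger National Park, South Africa",
    "Tiger habitat in a national park in India",
    "Visitor surveys in Indiana Dunes",
]
country_counts = count_countries(sample_titles)
```

Returning a set per title means a title that names both a park and its country is counted once for that country. A lookup table like this only recognizes the places it lists, which is one reason to use an NLP-based place recognizer for a real corpus.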
The observations above are a preliminary exploration of the national park literature using big data tools, and just the tip of the iceberg of national park research over the past 25 years. There is much more in this emerging field that is worth digging into, and we hope to explore it further when we get the chance.