The Characteristic Word Vectors of Chinese Science Websites
Keywords: popular science website, characteristic word frequency, vector space
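Wu Chensheng (吴晨生), Beijing Institute of Science and Technology Information; School of Systems Science, Beijing Normal University
Guo Jinzhong (郭金忠), School of Systems Science, Beijing Normal University
Luo Zhi (罗植), Beijing Academy of Social Sciences
Liao Tao (廖涛), Beijing Institute of Science and Technology Information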
      In China, deciding whether a website is a popular science website, and what kind of popular science content it carries, has long relied mainly on expert judgment. Such subjective judgment is time-consuming and laborious, and its results are unreliable. The main reason is that websites contain a large amount of content: manual browsing is slow, an expert can examine only a limited portion of a site in a given time, and so any judgment about the site as a whole is incomplete and subjective, with different experts possibly reaching different conclusions. Solving this problem requires a quantitative method, based on machine intelligence, that can be computed quickly. This paper proposes the characteristic word vector of a popular science website: a vector space model that a computer abstracts from the site's actual textual content. We believe it represents the site's textual content and meaning reasonably well, and that it can ultimately allow a machine to determine automatically whether a website contains popular science content and, if so, what kind.
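The abstract does not specify the vocabulary, weighting scheme, or decision rule, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the page text has already been segmented into Chinese words, uses a hypothetical list of characteristic words (`FEATURE_WORDS`), weights each word by its relative frequency, builds one prototype (centroid) vector per category from hypothetical sample pages, and labels a new page by cosine similarity to the closest prototype, rejecting it below an assumed threshold.

```python
from collections import Counter
import math

# Hypothetical characteristic words; the paper's actual word list is not given.
FEATURE_WORDS = ["科普", "科学", "实验", "知识", "探索", "健康", "宇宙", "购物", "价格"]

def feature_vector(tokens, vocab=FEATURE_WORDS):
    """Relative frequency of each characteristic word in a pre-segmented token list."""
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[w] / total for w in vocab]

def build_prototype(docs):
    """Average (centroid) of the feature vectors of sample pages for one category."""
    vecs = [feature_vector(d) for d in docs]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def classify(tokens, prototypes, threshold=0.3):
    """Return (label, score) of the closest prototype, or (None, score) below threshold."""
    vec = feature_vector(tokens)
    best_label, best_score = None, 0.0
    for label, proto in prototypes.items():
        score = cosine(vec, proto)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score

# Hypothetical, already-segmented sample pages for two kinds of popular science content.
samples = {
    "astronomy": [["宇宙", "探索", "科普", "知识"], ["宇宙", "科学", "科普"]],
    "health": [["健康", "知识", "科普"], ["健康", "实验", "科学"]],
}
prototypes = {label: build_prototype(docs) for label, docs in samples.items()}

page = ["科普", "宇宙", "探索", "网站", "宇宙"]
label, score = classify(page, prototypes)
print(label, round(score, 3))  # → astronomy 0.853
```

A page made only of non-science words such as `["购物", "价格", "购物"]` shares no characteristic words with either prototype, so `classify` returns `(None, 0.0)`, which corresponds to "contains no popular science content". Relative frequency and centroid matching were chosen here only because they are the simplest instances of a vector space model; TF-IDF weighting or a trained classifier would be natural alternatives.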
