
Big Earth Data at your fingertips

Published: 2015-05-10 00:00:00

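[Summary] The EARTHSERVER project, developed by scientists at Jacobs University in Bremen, Germany, pushes the boundaries of Big Earth Data processing and lets researchers access and analyse multi-dimensional data from a wide range of sources.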


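China Science and Technology Network (中国科技网), 25 June (compiled and translated by Zhang Wei): By pushing the boundaries of Big Earth Data processing, the EARTHSERVER project gives researchers a way to access and analyse multi-dimensional data from a wide range of sources.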

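The earth sciences, such as geology, oceanography and astronomy, generate vast quantities of Big Data. Without suitable tools, however, scientists either drown in the sea of Big Earth Data or the data lie dormant in archives, rarely used.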

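The vision of the EARTHSERVER project is to offer researchers 'Big Earth Data at your fingertips', so that they can access and process enormous data sets with just a few mouse clicks.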

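'The project was the result of a "push" and a "pull",' says project coordinator Peter Baumann, Professor of Computer Science at Jacobs University in Bremen, Germany. 'On the demand side, we needed to handle massive volumes of data. On the supply side, we had a data cube technology well suited to this field.' A data cube is a three- (or higher) dimensional array of values, commonly used to describe a time series of image data.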

Data cubes help researchers access and visualise data

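The EARTHSERVER project built more advanced data cube technology and custom web portals, making it possible for researchers to extract and visualise earth sciences data as 3-D cubes, 2-D maps or 1-D diagrams. The British Geological Survey, for example, used EARTHSERVER technology to drill down through the different layers of the earth in 3-D.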

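'For the user, data cubes hide the unnecessary complexity of the data,' says Professor Baumann. 'As a user, I don't want to see a million files: I just want to see a few data cubes.'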

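The massive data in the earth sciences take the form of sensor, image, simulation and statistics data, usually with a time dimension. The data typically form regular or irregular grids of values with space/time coordinates. EARTHSERVER turns these arrays into data cubes.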

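Besides being easy to use, data cubes can integrate data from different disciplines, and scientists can combine measurement data with simulation data.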

Building on existing technologies

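To handle Big Earth Data efficiently, the EARTHSERVER project needed to extend existing technologies and standards. The SQL database query language, for example, is oriented more towards manipulating alphanumeric data.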

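To enable data cubes, the project was built on rasdaman, a new type of database management system specialised in multi-dimensional gridded data, called rasters or arrays. Rasdaman can extract data flexibly and quickly from Big Earth Data arrays of any size.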

'Essentially, we have married the SQL database language with image processing,' says Professor Baumann. 'This is now becoming part of the ISO SQL standard.'

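In addition, the project has had a major influence on the Big Earth Data standards of the Open Geospatial Consortium and of the European Spatial Data Infrastructure.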

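EARTHSERVER's researchers also developed a 'semantic parallelisation' technique that splits a single database query into multiple sub-queries, which are sent to other database servers for processing.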

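This approach lets EARTHSERVER distribute a single incoming query across more than 1,000 cloud nodes and answer queries on hundreds of terabytes in less than a second.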


Big Earth Data at your fingertips becomes a reality


Pushing the boundaries of Big Earth Data processing, the EARTHSERVER project allows researchers to access and analyse multi-dimensional data from a wide range of sources.

The earth sciences, like geology, oceanography and astronomy, generate vast quantities of Big Data. Yet without the right tools, scientists either drown in this sea of Big Earth Data or leave it sitting in an archive, barely used.

The vision of the EARTHSERVER project is to offer researchers 'Big Earth Data at your fingertips' so that they can access and manipulate enormous data sets with just a few mouse clicks.

'The project was the result of a "push" and a "pull",' says project coordinator Peter Baumann, Professor of Computer Science at Jacobs University in Bremen, Germany. 'On the demand side there was a need for new concepts to handle the wave of data crashing down on us. On the supply side we had a data cube technology that is well-suited to this domain.' A data cube is a three- (or higher) dimensional array of values, commonly used to describe a time series of image data.

Data cubes help researchers access and visualise data

EARTHSERVER built advanced data cubes and custom web portals to make it possible for researchers to extract and visualise earth sciences data as 3-D cubes, 2-D maps or 1-D diagrams. The British Geological Survey, for example, used EARTHSERVER technology to drill down through different layers of the earth in 3-D.
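To make the idea of extracting 3-D cubes, 2-D maps and 1-D diagrams concrete, here is a minimal Python sketch. It is an illustration only and does not use EARTHSERVER or rasdaman: the cube is a plain nested list indexed as cube[t][y][x], and the names make_cube, map_at, series_at and subcube are hypothetical helpers invented for this example.

```python
from typing import List, Tuple

# Toy data cube: a time series of 2-D grids, indexed as cube[t][y][x].
Cube = List[List[List[float]]]


def make_cube(n_time: int, n_rows: int, n_cols: int) -> Cube:
    """Build a toy cube whose values encode their own coordinates."""
    return [[[100.0 * t + 10.0 * y + x for x in range(n_cols)]
             for y in range(n_rows)]
            for t in range(n_time)]


def map_at(cube: Cube, t: int) -> List[List[float]]:
    """2-D map: fix the time index, keep both spatial axes."""
    return [row[:] for row in cube[t]]


def series_at(cube: Cube, y: int, x: int) -> List[float]:
    """1-D diagram: fix the location, keep the time axis."""
    return [time_slice[y][x] for time_slice in cube]


def subcube(cube: Cube,
            t_range: Tuple[int, int],
            y_range: Tuple[int, int],
            x_range: Tuple[int, int]) -> Cube:
    """3-D cube: trim every axis to a half-open index range."""
    return [[row[x_range[0]:x_range[1]]
             for row in time_slice[y_range[0]:y_range[1]]]
            for time_slice in cube[t_range[0]:t_range[1]]]


cube = make_cube(n_time=3, n_rows=2, n_cols=4)
print(map_at(cube, 1))        # one 2-D grid
print(series_at(cube, 1, 2))  # one value per time step
```

All three views read from the same array, which is why a user can work with one cube instead of a large number of separate files.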

'For the user, data cubes hide the unnecessary complexity of the data,' says Professor Baumann. 'As a user, I don't want to see a million files: I want to see a few data cubes.'

The massive data in the earth sciences is represented by sensor, image, simulation, and statistics data, often with a time dimension. The data typically form regular or irregular grid values with space/time coordinates. EARTHSERVER made these arrays available as data cubes.

Aside from ease-of-use, the data cubes also made it possible to integrate data from different disciplines, and scientists could combine measurement data with data generated from simulations.

Building on existing technologies

To handle Big Earth Data efficiently, EARTHSERVER needed to extend existing technologies and standards. The SQL database query language, for example, is more oriented towards the manipulation of alphanumeric data.

To enable data cubes, the project was built upon rasdaman, a new type of database management system specialised in multi-dimensional gridded data, called rasters or arrays. Rasdaman enables the flexible, fast extraction of data from Big Earth Data arrays of any size.

'Essentially, we have married the SQL database language with image processing,' says Professor Baumann. 'This is now becoming part of the ISO SQL standard.'

In addition, the project has strongly influenced the Big Earth Data standards of the Open Geospatial Consortium and INSPIRE, the European Spatial Data Infrastructure.

EARTHSERVER's researchers also developed a 'semantic parallelisation' technology that sub-divides a single database query into multiple sub-queries. These are sent to other database servers for processing.

This method allows EARTHSERVER to distribute a single incoming query over more than 1,000 cloud nodes and rapidly answer queries on hundreds of terabytes in less than a second.
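The general idea of splitting one query into sub-queries and merging their answers can be sketched in a few lines of Python. This is not EARTHSERVER's actual algorithm, which the article does not describe: the sketch assumes the query is a simple mean over a 1-D array, uses local threads in place of remote database servers, and the names split_range, partial_sum and parallel_mean are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def split_range(n: int, parts: int) -> List[Tuple[int, int]]:
    """Split [0, n) into up to `parts` contiguous, non-empty ranges."""
    parts = max(1, min(parts, n))
    step, extra = divmod(n, parts)
    ranges, start = [], 0
    for i in range(parts):
        stop = start + step + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def partial_sum(values: Sequence[float],
                bounds: Tuple[int, int]) -> Tuple[float, int]:
    """Sub-query: return (sum, count) for one slice of the data."""
    lo, hi = bounds
    chunk = values[lo:hi]
    return sum(chunk), len(chunk)


def parallel_mean(values: Sequence[float], workers: int = 4) -> float:
    """Answer 'mean of all values' by merging per-slice partial results."""
    if not values:
        raise ValueError("cannot take the mean of an empty array")
    ranges = split_range(len(values), workers)
    # Threads stand in for the remote servers of a real deployment.
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        partials = list(pool.map(lambda b: partial_sum(values, b), ranges))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(parallel_mean(list(range(10)), workers=4))
```

The mean is split into per-slice sums and counts because those partial results can be merged exactly. Averaging the per-slice means would give the wrong answer whenever the slices differ in size.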


Source: China Science and Technology Network (中国科技网), 6 July 2015, 10:12

URL: http://www.wokeji.com/guojipindao/dujiabianyi/201507/t20150706_1375513.shtml

Editor: Liu Huanhuan