As we try to get to grips with the ever-increasing “Big Data”
deluge, we recognize that adequate Web services are a key prerequisite
for ubiquitous, flexible, and fast data access. In a major
concerted effort, several large European initiatives have now teamed up
to address the service challenge. On 12 and 13 November 2015, an
inaugural EUDAT Workshop on services for Big Data was successfully held
at the Supercomputing Center in Barcelona, Spain. Representatives of
three key Big Data projects (EUDAT, EarthServer, and EPOS) came
together to discuss innovative approaches to value-adding
services.
To consolidate activities around these themes, the workshop was
divided into several tracks focusing on Big Data semantics, federated
data mining, and multidimensional Array Databases for large time
series. Discussions began by capturing best practices and reviewing
the current state of development and activity in each area. Questions
such as how data processing can be orchestrated optimally, or how
scientific workflows can make use of EUDAT services, were discussed
intensively in the working groups.
Peter Wittenburg, scientific coordinator of the EUDAT Data
Infrastructure, brought together a broad range of experts from Europe
and the US. The experts focused particularly on multidimensional
arrays, because they play a major role in scientific and engineering
data. In their summary, Mark van de Sanden, EUDAT Workpackage Leader,
and Peter Baumann, facilitator of the EUDAT Array Database track,
outlined possible future roles for EUDAT:
- IaaS service provider: providing a cloud infrastructure to run Array Databases
- SaaS service provider: providing an Array Database as a
domain-independent, horizontal service
- Providing tools for easy data movement between the EUDAT DCI domain
and the user domain
- Providing domain services (e.g., geo, astro, life sciences) based on
a common horizontal platform of array services, thereby leveraging
cross-community effects
In his presentation, Peter Baumann summarized his experience of
running large-scale infrastructures:
“Of course, multidimensional arrays do not stand alone; they are
intertwined with other data types, but typically they constitute the
‘Big Data’ part. Therefore, it makes sense to integrate arrays into
common data management platforms.” Flexible querying, data
independence, scalability, and standards conformance are critical
advantages of Array Database technology. Among the challenges
identified was the integration of heterogeneous data types, including
arrays, into a single common information space for users.
Array-intensive domains such as the Earth, space, and life sciences
were considered possible candidates for future EUDAT services.
The following presenters contributed their expertise to the Array
Database track:
- Peter Baumann (Workshop Facilitator, Array Database expert) - Jacobs
University Bremen, Germany
- Kwo-Sen Kuo (Array Database expert) - NASA collaborator, US
- Stefan Pröll (Data Citation expert) - SBA Research, Austria
- Simone Mantovani (Atmospheric Analysis expert) - MEEO s.r.l., Italy
- Alessandro Spinuso (Seismology expert) - KNMI, Netherlands
- Luca Trani (Seismology expert) - KNMI, Netherlands
- Thomas Zastrow (expert in Data Analysis in the Humanities) - Max
Planck Gesellschaft, Rechenzentrum Garching, Germany
- Mark van de Sanden (EUDAT Workpackage Leader) - SURFsara, Netherlands
The European Data Infrastructure EUDAT aims to contribute to building
a Collaborative Data Infrastructure (CDI). The project's goal is to
provide a pan-European solution to the challenge of data
proliferation in Europe's scientific and research communities. The
increasing complexity and massive growth of data have outpaced the
development of tools to deal with them.
Addressing this challenge, the intercontinental initiative
EarthServer aims to unleash the potential of Big Data through a
disruptive paradigm shift in service technology. EarthServer has
established open ad-hoc analytics on massive Earth Science data, based
on, and extending, the leading-edge Array Database technology rasdaman.
The participating data centers are now extending this to a petabyte of
3-D and 4-D datacubes. Technology advances will allow real-time
scaling of such petabyte cubes as well as intercontinental data fusion.
The European Plate Observing System (EPOS) contributes by planning a
research infrastructure for European solid Earth science that
integrates existing research infrastructures to enable innovative
multidisciplinary research. EPOS was recently prioritized for
implementation by the European Strategy Forum on Research
Infrastructures (ESFRI).
--
Heike Hoenig
thinkdialog!