Significantly reducing the processing times of high speed photometry data sets using a distributed computing model

Paul Doyle, Fred Mtenzi, Niall Smith, Adrian Collins, Brendan O'Shea

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The scientific community is in the midst of a data analysis crisis. The increasing capacity and falling cost of scientific CCD instrumentation are contributing to an explosive generation of raw photometric data. This data must be cleaned and reduced before it can be used for high-precision photometric analysis. Many existing data processing pipelines either assume a relatively small dataset or are batch processed by a High Performance Computing centre. A radical overhaul of these processing pipelines is required so that terabyte-sized datasets can be cleaned and reduced at near capture rates using an elastic processing architecture. The ability to access computing resources and to let them grow and shrink as demand fluctuates is essential, as is exploiting the parallel nature of the datasets. A distributed data processing pipeline is required. It should incorporate lossless data compression, allow for data segmentation, and support processing of data segments in parallel. Academic institutes can collaborate and provide an elastic computing model without the requirement for large centralized high performance computing data centers. This paper demonstrates how an order-of-magnitude (tenfold) improvement in overall processing time has been achieved using the "ACN pipeline", a distributed pipeline spanning multiple academic institutes.
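The three requirements named in the abstract (lossless compression, data segmentation, and parallel processing of segments) can be illustrated with a minimal Python sketch. This is not the ACN pipeline itself: the function names, the constant-bias "reduction" step, and the use of local threads in place of machines at several institutes are all assumptions made for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(frames, segment_size):
    """Split a list of frames into consecutive segments of at most segment_size frames."""
    return [frames[i:i + segment_size] for i in range(0, len(frames), segment_size)]


def pack_segment(segment):
    """Losslessly compress one segment (a list of equal-length bytes frames)."""
    return zlib.compress(b"".join(segment))


def reduce_segment(packed, frame_len, bias):
    """Decompress a segment and subtract a constant bias from every pixel.

    The bias subtraction is a hypothetical stand-in for real cleaning and
    reduction; pixel values are clamped at zero.
    """
    raw = zlib.decompress(packed)
    frames = [raw[i:i + frame_len] for i in range(0, len(raw), frame_len)]
    return [bytes(max(pixel - bias, 0) for pixel in frame) for frame in frames]


def run_pipeline(frames, frame_len, segment_size=2, bias=1, workers=4):
    """Segment, compress, and reduce frames, handling segments in parallel.

    Threads stand in for remote worker nodes (an assumption of this sketch).
    """
    packed = [pack_segment(s) for s in split_into_segments(frames, segment_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input segments.
        results = list(pool.map(lambda p: reduce_segment(p, frame_len, bias), packed))
    return [frame for segment in results for frame in segment]
```

Because each segment is compressed and reduced independently, the worker count can grow or shrink without changing the result, which is the property an elastic architecture relies on.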

Original language: English
Title of host publication: Software and Cyberinfrastructure for Astronomy II
Publication status: Published - 2012
Externally published: Yes
Event: Software and Cyberinfrastructure for Astronomy II - Amsterdam, Netherlands
Duration: 1 Jul 2012 to 4 Jul 2012

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 8451
ISSN (Print): 0277-786X

Conference

Conference: Software and Cyberinfrastructure for Astronomy II
Country/Territory: Netherlands
City: Amsterdam
Period: 1/07/12 to 4/07/12

Keywords

  • Cloud computing
  • Distributed computing
  • High-speed photometry
