IBM, SAP open big data platforms for citizen science

The Guardian US/UK | January 27, 2014


Ectatomma tuberculatum, an ant species that lives in the Amazon Rainforest

Sujeevan Ratnasingham is in a race to identify all living species on earth. With the tally estimated at anywhere between 10 million and 100 million, and one-third expected to become extinct by the next century, it's a Herculean task, to say the least.

But undiscovered species are just as likely to be found in one's backyard as in the Amazon rainforest. So it's no surprise that in this age of crowdsourcing and citizen science, the bioinformatics expert and his colleagues at the International Barcode of Life (iBOL), a consortium of universities, natural history museums and research institutes, are asking people around the world to gather samples. Back in their labs, scientists can then identify each specimen's species by sequencing a short section of its DNA (a procedure known as barcoding).

With hundreds of millions of records to analyze, and even more data per record poised to come in over the next year, iBOL decided to host its database on HANA, SAP's enterprise platform that keeps data in a computer's main memory rather than on disk. The switch will allow researchers and citizen scientists to quickly analyze the huge volumes of data in the cloud.

By merging their records with other datasets, such as weather data, researchers can conduct predictive analyses that reveal relationships between species and location. The results can provide clues about how outside forces, from invasive species to climate change, are affecting the environment, and suggest how to manage wild and agricultural land more sustainably.

IBM, too, has been working on a platform to support crowdsourced citizen scientist data. Its research lab in São Paulo, Brazil, developed a portal and mobile app as a way to gain more knowledge about biodiversity in the Amazon rainforest. Users of all ages and educational backgrounds will be able to collect data points and identify species.

Sergio Borger, an IBM team lead in São Paulo, devised the crowdsourced approach when Brazil's Ministry for Environment and Innovation approached the company in 2010. The ministry was looking for a way to create a central repository for rainforest data.

Borger and his team developed a platform and mobile app that allow users to upload photos of a plant and its parts, enter its characteristics (such as color and size), compare it against a catalog photo and classify it. The classifications are then vetted through crowdsourced ratings.

Called Missions, the platform will enable multiple users to collect data on, and monitor conditions of, the same plant or tree over time by using uniquely identifying characteristics such as the diameter of a tree trunk. Borger's team is currently working out how to monitor more mobile species, such as frogs and insects.

Borger drew on lessons from IBM's first experiment with gathering crowdsourced data to build the new platform. Developed in partnership with California's state water agency, the Creekwatch app enables citizens to help the government monitor drought conditions in local watersheds by uploading photos and evaluating water levels, flow rate and the amount of trash present.

The company has also developed Accessible Way, an app allowing citizens to report accessibility problems in the urban environment.

The beginning of a new trend?

Could the forays by IBM and SAP signal a larger trend of IT companies opening up their platforms to crowdsourced citizen science projects?

Since interest in big data took off a few years ago, the startup Kaggle has helped companies, organizations and researchers gain insight from their data by holding crowdsourced predictive analysis contests, while CrowdFlower, another startup, has supplied the "crowd" itself. Although both Microsoft and Google have engaged in data-related conservation projects, large IT companies have mostly shied away from crowdsourced citizen science, for what may be an obvious reason, one scientist says.

“Citizen science is not the most lucrative [venture],” said Dawn Wright, academic oceanographer and chief scientist for Esri, the Redlands, California-based company behind the GIS (geographic information system) mapping platform ArcGIS.

Yet despite this disincentive, Wright says she has seen growing interest from industry over the past two years, following citizen science's rise in the academic community over the past five.

Benefits to business

While devoting resources to citizen science projects can be seen as part of a company's goodwill efforts, are there other business benefits to be gained?

After all, independent efforts by local groups are already under way. In the San Francisco Bay Area, Nerds for Nature has organized several 'bioblitz' events where volunteers document biodiversity using the iNaturalist smartphone app. The group is also working with a small biotech company and hackerspaces to perform DNA barcoding independently.

SAP says it's not trying to sell HANA to those who want to analyze iBOL's biodiversity database. These users will be given access to the data platform at no cost.

“This is not an effort to sell our products,” said SAP’s David Jonker, head of big data marketing. “We’re passionate about using our technology for good in this world and applying it to citizen science.”

Still, Mike Gualtieri, an IT industry analyst for Forrester Research, says that there are reasons why large IT companies might be interested in making their products available for free to a non-enterprise audience.

Gualtieri says that the rise of Hadoop, an open source framework for storing and processing big data across clusters of ordinary computers, has disrupted these companies' core products, such as databases, data analytics and data warehousing.

Although Hadoop will not necessarily replace the larger vendors' technology, Gualtieri says the vendors will have to make their products work with it.

“They see a threat, so they figure they better get it out there and let people use it,” he said. “By making them available, they’re building awareness among the average user.”

As a result, Gualtieri expects more large IT companies to open their platforms to crowdsourced citizen science data analysis projects in the future.

Commercial applications

In five to 10 years, SAP says the public will have the ability to identify species on the spot, thanks to a DNA barcoding mobile app it’s working on with the International Barcode of Life. While the technology is being developed in part for the citizen science biodiversity project, Jonker says that the technology can be used in a commercial context.

There does appear to be demand for it, if recent food mislabeling scandals, from horsemeat masquerading as beef to fox meat sold as donkey meat and mislabeled fish, are any indication. Shopkeepers would be able to verify products by identifying a sample on the spot via DNA barcoding.

SAP is already in talks with a few partners to commercialize the product. In the meantime, the company will release an app enabling anyone to contribute samples to the International Barcode of Life project by uploading a photo (with location metadata) and mailing in a sample for analysis. The app is scheduled to launch in late March.


Photo of Ectatomma tuberculatum, an ant species that lives in the Amazon Rainforest, by Alex Wild via Wikimedia Commons