TECHNICAL MEMORANDUM
BDAgro – CTBE Database of Agricultural Experiments
Angélica O. Pontes, Liu Yi Ling, Guilherme M. Sanches, Paulo S. G. Magalhães,
João E. Ferreira and Carlos E. Driemeier
Laboratório Nacional de Ciência e Tecnologia do Bioetanol – CTBE
Centro Nacional de Pesquisa em Energia e Materiais – CNPEM
Campinas, November 2014
CTBE is part of CNPEM, a Social Organization (Organização Social) qualified by the Ministry of Science, Technology and Innovation (MCTI)
1 BDAgro – CTBE Database of Agricultural Experiments
Contents
1. Introduction
2. Standard tools for software development
3. Conceptual data model
4. Logical data model
5. Physical data model
6. Examples of query and table
7. Analytical module
8. Conclusion
9. References
Figures
Figure 1: Conceptual data model of the CTBE Database of Agricultural Experiments
Figure 2: Logical data model of the CTBE Database of Agricultural Experiments
Figure 3: List of events from one selected experiment, as retrieved through the SQL query
Figure 4: Table of types of attributes specifying the applied linearization and filtering functions, as visualized in pgAdmin
Figure 5: Table of data preparation for analysis, as visualized in pgAdmin. The table contains three data states, raw (bruto), linearized (linearizado), and filtered (filtrado), for all types of attributes
1. Introduction
Contemporary scientific and technological research is becoming increasingly
data-intensive and collaborative. Proper computing capabilities for data
acquisition, storage, sharing, modelling, and analysis are pivotal in this new
research perspective (HEY & TREFETHEN, 2003; TANSLEY & TOLLE, 2009). The
pervasive role of computing is observed across virtually all disciplines, and
agricultural experiments aimed at enhancing biomass production in an
environmentally benign way are no exception. Such agricultural experiments are
within the scope of the Brazilian Bioethanol Science and Technology Laboratory – CTBE.
With this motivation, we developed the CTBE Database of Agricultural Experiments
(Banco de Dados de Experimentos Agrícolas – BDAgro), which is described in the
present Technical Memorandum.
BDAgro was developed with the following specific aims: (i) to store data from
CTBE agricultural experiments in structured form, ensuring long-term data
readability; (ii) to enable statistical analysis and knowledge discovery integrated
with the database; and (iii) to pave the way for data-driven collaboration with other
research groups in Brazil and abroad.
BDAgro was developed jointly by the CTBE e-Science and sugarcane precision
agriculture research groups, drawing on the experience of the e-Science group of
the Institute of Mathematics and Statistics of the University of São Paulo (IME-USP).
Although database development and the initial data sets were associated with
research in sugarcane precision agriculture, BDAgro was modelled to support all
CTBE agricultural experiments.
2. Standard tools for software development
BDAgro, as well as other databases and software developed within the CTBE
e-Science group, will preferentially use a selected set of tools. Free, open-source
platforms are always preferred and adopted whenever possible.
Concretely, BDAgro was developed with the following tools:
PostgreSQL as the relational database management system;
pgAdmin as the database administration and development platform;
R programming language for statistical computing integrated with the database;
Python as an auxiliary programming language, employed primarily to create SQL
scripts that input data into the database.
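Python's auxiliary role can be illustrated with a minimal sketch. It turns a small CSV file of field measurements into an SQL script of INSERT statements. The table name produtividade echoes the harvest yield data type listed in Section 3. Everything else is an assumption for illustration and does not come from the actual BDAgro scripts: the columns (x, y, valor), the CSV layout, and the helper names.

```python
import csv
import io


def sql_literal(value):
    """Render a Python value as an SQL literal: NULL, a number, or a quoted string."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return repr(value)
    # Double embedded single quotes so the text is a valid SQL string literal.
    return "'" + str(value).replace("'", "''") + "'"


def read_points(csv_text):
    """Parse CSV text with an x, y, valor header; empty cells become None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{k: float(v) if v else None for k, v in row.items()} for row in reader]


def build_insert_script(table, columns, rows):
    """Return an SQL script with one INSERT statement per row."""
    col_list = ", ".join(columns)
    statements = []
    for row in rows:
        values = ", ".join(sql_literal(row[c]) for c in columns)
        statements.append(f"INSERT INTO {table} ({col_list}) VALUES ({values});")
    return "\n".join(statements) + "\n"


# Hypothetical yield measurements; the second point has no yield value.
sample = "x,y,valor\n250010.5,7480020.0,82.3\n250020.5,7480020.0,\n"
script = build_insert_script("produtividade", ["x", "y", "valor"], read_points(sample))
print(script, end="")
```

Generating a script file, rather than writing directly to the database, leaves a reviewable record of each data load. When the input is not trusted, a database driver with parameterized queries is the safer choice.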
In addition to these selected tools, it is important to mention that the glossary
of BDAgro is in Portuguese. Furthermore, BDAgro was developed on an IME-USP
computer server and will soon be migrated to a local CNPEM server.
3. Conceptual data model
The BDAgro conceptual model, i.e., its entity-relationship model (ELMASRI &
NAVATHE, 2010), is shown in Figure 1. The model comprises the following entities:
Project (projeto): One project is defined by the contract terms with a research
granting agency (CNPq, FAPESP, etc.) or a company, or an internal project
of CTBE/CNPEM.
Experiment (experimento): One experiment is defined by a certain land area
during a certain period of time. The land area is most often an open agricultural
field, but may also be inside closed environments such as greenhouses.
Event (evento): One event is a relevant occurrence within one experiment. Events
may be of three types: (i) intervention, associated with a change in the
experimental land area (e.g., harvest); (ii) characterization, associated with
data acquisition without change in the land area (e.g., characterization of soil
granulometry); and (iii) planning, representing a record associated with
neither a physical change in the land area nor new data acquisition (e.g., nutrient
application recipes).
Person (pessoa): One person is an individual that may be responsible for a
project, an event, or the data of an event.
Static data (dado estático): Data generated by events are termed static
because events are defined at specific moments within one experiment.
Static data have x and y spatial coordinates as attributes. Additional attributes
depend on the type of static data, with each type of static data stored in a
dedicated table. Soil granulometry (granulometria solo), soil apparent
electrical conductivity (condutividade elétrica aparente), and harvest yield
(produtividade) are examples of static data types.
Dynamic data (dado dinâmico): Data acquired continuously during the course
of one experiment are termed dynamic data. Date is one attribute of dynamic
data; additional attributes depend on the type of dynamic data. Meteorological
information is one example of dynamic data.
Figure 1: Conceptual data model of the CTBE Database of Agricultural Experiments
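The entity definitions of the conceptual model can be summarized in a minimal Python sketch. It encodes one reading of the event typology:

- an event that changes the land area is an intervention, even if it also produces data (as a harvest producing yield data does);
- otherwise, data acquisition makes it a characterization;
- otherwise, it is planning.

The sketch also separates static data, located by x and y, from dynamic data, indexed by date. All class and field names are hypothetical and do not reproduce the BDAgro tables.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class EventType(Enum):
    INTERVENTION = "intervention"          # changes the experimental land area
    CHARACTERIZATION = "characterization"  # acquires data, land area unchanged
    PLANNING = "planning"                  # neither change nor new data


def classify_event(changes_land_area: bool, acquires_data: bool) -> EventType:
    """Apply the event typology; a land-area change takes precedence."""
    if changes_land_area:
        return EventType.INTERVENTION
    if acquires_data:
        return EventType.CHARACTERIZATION
    return EventType.PLANNING


@dataclass
class Person:
    name: str


@dataclass
class StaticPoint:
    """Static data: located by x, y; other attributes depend on the data type."""
    x: float
    y: float
    attributes: dict


@dataclass
class DynamicRecord:
    """Dynamic data: indexed by date, e.g. daily meteorological readings."""
    day: date
    attributes: dict


@dataclass
class Event:
    description: str
    event_type: EventType
    data_responsible: Person
    static_data: list = field(default_factory=list)


@dataclass
class Experiment:
    """A land area observed over a period of time."""
    name: str
    start: date
    end: date
    events: list = field(default_factory=list)
    dynamic_data: list = field(default_factory=list)


@dataclass
class Project:
    title: str
    experiments: list = field(default_factory=list)


# Hypothetical usage: a harvest that changes the land area and yields data.
analyst = Person("hypothetical analyst")
harvest = Event(
    "harvest",
    classify_event(changes_land_area=True, acquires_data=True),
    analyst,
    [StaticPoint(250010.5, 7480020.0, {"produtividade": 82.3})],
)
experiment = Experiment("hypothetical experiment", date(2013, 4, 1), date(2014, 3, 31), [harvest])
experiment.dynamic_data.append(DynamicRecord(date(2013, 4, 2), {"precipitacao_mm": 12.0}))
project = Project("hypothetical project", [experiment])
```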
4. Logical data model
The logical data model of BDAgro is presented in Figure 2, representing data
tables, their attributes, and relationships.
Figure 2: Logical data model of the CTBE Database of Agricultural Experiments
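Figure 2 is not reproduced in this text, so the following sketch illustrates how such a logical model maps to tables and foreign keys. Names are drawn as follows:

- The table and column names for evento, pessoa, experimento, and tp_evento come from the query in Section 6.
- The projeto columns, the link from experimento to projeto, and the layout of the produtividade table are assumptions.

SQLite, from the Python standard library, stands in for PostgreSQL so the example runs without a server. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

DDL = """
CREATE TABLE projeto (
    id_projeto   INTEGER PRIMARY KEY,
    nome_projeto TEXT NOT NULL                 -- assumed column
);
CREATE TABLE experimento (
    id_experimento   INTEGER PRIMARY KEY,
    nome_experimento TEXT NOT NULL,
    id_projeto       INTEGER NOT NULL REFERENCES projeto(id_projeto)  -- assumed link
);
CREATE TABLE pessoa (
    id_pessoa   INTEGER PRIMARY KEY,
    nome_pessoa TEXT NOT NULL
);
CREATE TABLE tp_evento (
    id_tp_evento INTEGER PRIMARY KEY,
    descricao    TEXT NOT NULL
);
CREATE TABLE evento (
    id_evento           INTEGER PRIMARY KEY,
    data                TEXT,
    id_experimento      INTEGER NOT NULL REFERENCES experimento(id_experimento),
    id_tp_evento        INTEGER NOT NULL REFERENCES tp_evento(id_tp_evento),
    id_responsavel_dado INTEGER REFERENCES pessoa(id_pessoa),
    detalhamento_evento TEXT
);
CREATE TABLE produtividade (                   -- hypothetical static-data layout
    id_evento INTEGER NOT NULL REFERENCES evento(id_evento),
    x         REAL NOT NULL,
    y         REAL NOT NULL,
    valor     REAL
);
"""


def create_schema():
    """Create the sketch schema in an in-memory database with FK checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    return conn


conn = create_schema()
conn.execute("INSERT INTO projeto VALUES (1, 'hypothetical project')")
conn.execute("INSERT INTO experimento VALUES (1, 'hypothetical experiment', 1)")
conn.execute("INSERT INTO tp_evento VALUES (1, 'intervention')")
conn.execute(
    "INSERT INTO evento (id_evento, data, id_experimento, id_tp_evento) "
    "VALUES (1, '2014-04-01', 1, 1)"
)

try:
    # Experiment 99 does not exist, so the foreign key rejects this event.
    conn.execute(
        "INSERT INTO evento (id_evento, id_experimento, id_tp_evento) VALUES (2, 99, 1)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Keeping every event tied to an existing experiment, and every static data point tied to an event, is what lets the query in Section 6 resolve each foreign key to a readable description.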
5. Physical data model
The physical model is encoded in the SQL script that created BDAgro. The
script is available to CNPEM personnel at the following location and will be
provided to external researchers upon request:
Central de Documentos > Central de Documentos > Programa de Pesquisa Básica > e-Science > BDAgro > Memorando Tecnico > BDAgro_141002.sql
6. Examples of query and table
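Information is retrieved from BDAgro through explicit SQL queries. As an
example, the following query lists the recorded events. It replaces the foreign keys
of table evento with readable descriptions from the related tables: activity, persons
responsible for the data and for the field work, object, event type, laboratory or
sensor, experiment, and data acquisition mode. Figure 3 shows the result for one
selected experiment.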
SELECT id_evento,
       E.data,
       AT.descricao AS atividade,
       RD.nome_pessoa AS responsavel_dado,
       RC.nome_pessoa AS responsavel_campo,
       OB.descricao AS objeto,
       TE.descricao AS evento,
       TS.descricao AS laboratorio_sensor,
       EX.nome_experimento AS experimento,
       DB.descricao AS tp_modo_aquisicao_dado,
       detalhamento_evento
FROM evento E
INNER JOIN tp_atividade AT ON AT.id_tp_atividade = E.id_tp_atividade
INNER JOIN pessoa RD ON RD.id_pessoa = E.id_responsavel_dado
INNER JOIN pessoa RC ON RC.id_pessoa = E.id_responsavel_campo
INNER JOIN tp_objeto OB ON OB.id_tp_objeto = E.id_tp_objeto
INNER JOIN tp_evento TE ON TE.id_tp_evento = E.id_tp_evento
LEFT OUTER JOIN tp_laboratorio_sensor TS ON TS.id_tp_laboratorio_sensor = E.id_tp_laboratorio_sensor
INNER JOIN experimento EX ON EX.id_experimento = E.id_experimento
LEFT OUTER JOIN tp_modo_aquisicao_dado DB ON DB.id_tp_modo_aquisicao_dado = E.id_tp_modo_aquisicao_dado
ORDER BY id_evento;
Figure 3: List of events from one selected experiment, as retrieved through the SQL query
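The query above uses INNER JOIN for most lookup tables. For tp_laboratorio_sensor and tp_modo_aquisicao_dado it uses LEFT OUTER JOIN, presumably because not every event has a laboratory or sensor, or an acquisition mode.

The reduced sketch below shows the effect of that choice. It runs against a hypothetical in-memory SQLite database that stands in for PostgreSQL. The outer join keeps an event whose id_tp_laboratorio_sensor is NULL, while an inner join drops it.

The query as printed has no WHERE clause. The sketch therefore also shows one way to restrict the listing to a single experiment with a bound parameter. The experiment names and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experimento (id_experimento INTEGER PRIMARY KEY, nome_experimento TEXT);
CREATE TABLE tp_laboratorio_sensor (id_tp_laboratorio_sensor INTEGER PRIMARY KEY, descricao TEXT);
CREATE TABLE evento (
    id_evento INTEGER PRIMARY KEY,
    data TEXT,
    id_experimento INTEGER,
    id_tp_laboratorio_sensor INTEGER          -- NULL when no lab/sensor applies
);
INSERT INTO experimento VALUES (1, 'exp A'), (2, 'exp B');
INSERT INTO tp_laboratorio_sensor VALUES (1, 'hypothetical soil lab');
INSERT INTO evento VALUES
    (1, '2014-04-01', 1, 1),                  -- characterization analysed by a lab
    (2, '2014-05-10', 1, NULL),               -- planning record, no lab/sensor
    (3, '2014-06-02', 2, 1);
""")

QUERY = """
SELECT E.id_evento, E.data, TS.descricao AS laboratorio_sensor,
       EX.nome_experimento AS experimento
FROM evento E
{join} tp_laboratorio_sensor TS
       ON TS.id_tp_laboratorio_sensor = E.id_tp_laboratorio_sensor
INNER JOIN experimento EX ON EX.id_experimento = E.id_experimento
WHERE EX.nome_experimento = ?                 -- restrict the listing to one experiment
ORDER BY E.id_evento
"""


def list_events(experiment, join="LEFT OUTER JOIN"):
    """Return the events of one experiment using the given join for the lookup."""
    return conn.execute(QUERY.format(join=join), (experiment,)).fetchall()


left_rows = list_events("exp A")                      # keeps event 2 with a NULL lab
inner_rows = list_events("exp A", join="INNER JOIN")  # silently drops event 2
```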
7. Analytical module
Data analysis is performed through analytical steps recorded in tables separate
from those holding raw data. This strategy creates a distinct analytical environment
within BDAgro, which is not represented in the conceptual and logical data models
of Figures 1 and 2. Two tables associated with the analysis of static data are shown
in Figures 4 and 5. The table in Figure 4 (analise_tp_atributo) lists the types of
attributes, each identified by a primary key (id_tp_atributo). This table also
identifies the functions for linearization (column linearizacao) and filtering (column
filtro) applied to the raw data. These functions create two data states, linearized
(linearizado) and filtered (filtrado), in addition to the raw (bruto) data state.
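The three data states can be sketched as follows. The sketch assumes that the linearizacao and filtro columns of analise_tp_atributo store function names that are resolved to code at analysis time.

The actual functions used in BDAgro are not specified here. The registry below uses two hypothetical stand-ins:

- a natural logarithm as the linearization step;
- a median absolute deviation (MAD) outlier filter as the filtering step.

The filter marks rejected points as missing rather than deleting them, so the three states stay aligned point by point. Python is used only to keep the examples in one language; the memorandum itself integrates R for statistical computing.

```python
import math
import statistics


def mad_filter(values, k=3.0):
    """Mark values farther than k scaled MADs from the median as None."""
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    # 1.4826 scales the MAD to match the standard deviation for normal data.
    mad = 1.4826 * statistics.median([abs(v - med) for v in present])
    if mad == 0:
        return list(values)
    return [v if v is not None and abs(v - med) <= k * mad else None for v in values]


# Hypothetical registries: names as they might appear in the linearizacao and
# filtro columns of analise_tp_atributo, resolved to Python callables.
LINEARIZATION = {"identidade": lambda v: v, "log": math.log}
FILTERS = {"nenhum": list, "mad3": mad_filter}


def prepare_states(raw, linearizacao, filtro):
    """Return the bruto, linearizado, and filtrado states for one attribute type."""
    linearized = [LINEARIZATION[linearizacao](v) for v in raw]
    filtered = FILTERS[filtro](linearized)
    return list(raw), linearized, filtered


# Hypothetical raw yield values with one gross outlier (400.0).
raw = [80.0, 82.0, 85.0, 79.0, 81.0, 400.0]
bruto, linearizado, filtrado = prepare_states(raw, "log", "mad3")
```

Storing function names rather than transformed values alone keeps the preparation reproducible: changing a row of analise_tp_atributo and rerunning the step regenerates the linearized and filtered states.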
Each data state is recorded as one attribute (column) of the data preparation table
(analise_preparacao_pontosxy) shown in Figure 5. Note that this data preparation
step maps raw data from distinct tables and attributes into a single column (bruto) of
the data preparation table. Such data preparation tables will be the basis for
constructing data analysis workflows. In a recent publication (DRIEMEIER et al.,
2014), we described a version of the analysis workflow for experiments in sugarcane
precision agriculture.
Figure 4: Table of types of attributes specifying the applied linearization and filtering
functions, as visualized in pgAdmin
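The mapping of raw data from distinct tables into the single bruto column can be sketched with a UNION ALL. Only some names come from the text: analise_tp_atributo, id_tp_atributo, analise_preparacao_pontosxy, bruto, linearizado, and filtrado. The following are hypothetical:

- the source tables granulometria_solo and condutividade_eletrica_aparente, and their columns;
- the x and y columns of the preparation table;
- the data values.

SQLite again stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical static-data tables with type-specific attribute columns.
CREATE TABLE granulometria_solo (x REAL, y REAL, argila REAL);
CREATE TABLE condutividade_eletrica_aparente (x REAL, y REAL, ce REAL);
INSERT INTO granulometria_solo VALUES (1.0, 1.0, 32.5), (2.0, 1.0, 35.1);
INSERT INTO condutividade_eletrica_aparente VALUES (1.0, 1.0, 4.2);

CREATE TABLE analise_tp_atributo (id_tp_atributo INTEGER PRIMARY KEY, descricao TEXT);
INSERT INTO analise_tp_atributo VALUES (1, 'argila'), (2, 'ce');

CREATE TABLE analise_preparacao_pontosxy (
    id_tp_atributo INTEGER REFERENCES analise_tp_atributo(id_tp_atributo),
    x REAL, y REAL,
    bruto REAL, linearizado REAL, filtrado REAL
);

-- Map attributes from distinct tables into the single bruto column;
-- linearizado and filtrado are left NULL for a later preparation step.
INSERT INTO analise_preparacao_pontosxy (id_tp_atributo, x, y, bruto)
SELECT 1, x, y, argila FROM granulometria_solo
UNION ALL
SELECT 2, x, y, ce FROM condutividade_eletrica_aparente;
""")

rows = conn.execute("""
SELECT A.descricao, P.x, P.y, P.bruto
FROM analise_preparacao_pontosxy P
JOIN analise_tp_atributo A ON A.id_tp_atributo = P.id_tp_atributo
ORDER BY P.id_tp_atributo, P.x
""").fetchall()
```

This long format, with one row per attribute type and point, lets a single analysis routine process every attribute type without knowing which source table each value came from.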
8. Conclusion
The CTBE Database of Agricultural Experiments (BDAgro) was created and is
described in the present Technical Memorandum. This database will store data from
CTBE agricultural experiments in structured form, enabling long-term data
readability, statistical analysis integrated with the database, and data-driven
collaboration with other research groups.
Figure 5: Table of data preparation for analysis, as visualized in pgAdmin. The table contains
three data states, raw (bruto), linearized (linearizado), and filtered (filtrado), for all types of
attributes
9. References
DRIEMEIER, C. E.; LING, L. Y.; PONTES, A. O.; SANCHES, G. M.; FRANCO, H. C. J.; MAGALHÃES, P. S. G.; FERREIRA, J. E. Data analysis workflow for experiments in sugarcane precision agriculture. In: IEEE 10th International Conference on e-Science (e-Science), São Paulo, 2014.
ELMASRI, R.; NAVATHE, S. B. Fundamentals of Database Systems. 6. ed. Addison-Wesley, 2010.
FERREIRA, J. E.; FINGER, M. Controle de concorrência e distribuição de dados: a teoria clássica, suas limitações e extensões modernas. XII Escola de Computação, IME-USP, 2000.
HEUSER, C. A. Projeto de Banco de Dados. 6. ed. Bookman, 2008.
HEY, A. J. G.; TREFETHEN, A. E. The data deluge: an e-science perspective. 2003.
TANSLEY, S.; TOLLE, K. M. (Eds.). The fourth paradigm: data-intensive scientific discovery. 2009.