The Calflora Project: Goals and Achievements

John Malpas, February 2003
 
 
 

 
Objectives and Vision

The Calflora project is a digital library of information about California native plants. This note is an analysis of various aspects of the project for the benefit of other groups wishing to make a similar library of regional plant information, and for the future management of Calflora itself.

Calflora's primary objectives, paraphrased from the bylaws of The Calflora Database (a non-profit organization set up in 1999 to administer the project), are:

  • to serve as a repository for information on California wild plants in electronic formats from diverse sources, including public agencies, academic institutions, private organizations, and individuals;
  • to provide this information in readily usable, electronic formats for scientific, conservation, and educational purposes;
  • to serve public information needs related to scientific study, land management, environmental analysis, education, and appreciation of California plant life;
  • to coordinate and integrate efforts towards these objectives undertaken by scientists, public agencies, private organizations, and members of the public.
To expand on the last two points, the project sees itself as responsible to the larger botanical community interested in California native plants. This responsibility manifests as a commitment to provide the highest possible quality of scientific information, with full source attribution, and to involve interested professional botanists in setting policy (e.g. serving on the Advisory Board) or quality control (e.g. reviewing photographic plant identification). Together, these objectives elaborate the vision of the project.

Fulfilling this vision immediately brings up three practical questions:

  • What specific information will the project collect and disseminate?
  • How will that information be acquired and maintained? and
  • How will that information be made available?
The sections below discuss how the project has answered each of these questions to date.
 

What Information is Collected

Calflora organizes plant data into four interconnected components.

    Species
    The species database contains records for each of 7660 plant taxa (8363 records including species level taxa where there is more than one var. or ssp.). Each record includes summary geographic and ecological distribution information. See Appendix I The Calflora Species Database for a description of how this list of plants was compiled.

    Occurrence
    The occurrence database includes over 800,000 observations of plants at locations within California. By integrating data from public agencies, herbaria, private organizations, and individuals, the database enables users to aggregate data in a variety of ways (e.g. across taxonomic groups, geographical areas of interest, or type of observation). See Appendix II About the Occurrence Database for a description of how this database has been collected and maintained.

    Because the records come from diverse sources, Calflora has chosen to add metadata to each record so that datasets within the database can be differentiated. For instance, there are fields describing the source, observation type (e.g. plotlist or undirected search), and documentation type (e.g. specimen or reported). For an example of how such metadata can be used and interpreted, see Appendix III Using Calflora's Composite Data for Estimating County-level Species Richness. For an explanation of data quality standards with respect to various kinds of occurrence data, see Appendix IV Why not stick to specimens?.

    Nomenclature
    The nomenclature database describes relationships between plant names throughout the history of botany in California. Suppose a user wants to search the occurrence database for all records concerning a certain species. With the aid of this database, it is possible to search not only for all records referring to the plant by its current taxonomic name, but also for records referring to the plant by any other taxonomic name it might have been known by historically. [need more]
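
The name-expanded search described above can be sketched as follows. This is a minimal illustration, not Calflora's implementation: the synonymy table, the record layout, and the function names are all hypothetical, and it assumes each historical name maps to exactly one current name (real synonymy, with lumps and splits, is not that simple).

```python
from collections import defaultdict

# Hypothetical synonymy table: (historical name, current name) pairs.
# The names are placeholders, not records from the nomenclature database.
SYNONYMS = [
    ("Genus alpha", "Genus beta"),
    ("Genus gamma", "Genus beta"),
]


def build_name_index(pairs):
    """Map each current name to the set of all names it has been known by."""
    index = defaultdict(set)
    for old, current in pairs:
        index[current].update({old, current})
    return index


def expand_name(name, pairs):
    """Return every name that refers to the same taxon as `name`."""
    index = build_name_index(pairs)
    # If `name` is a historical name, resolve it to the current name first.
    current = next((cur for old, cur in pairs if old == name), name)
    return index.get(current, {name})


def search_occurrences(records, name, pairs):
    """Return the records filed under any name of the requested taxon."""
    names = expand_name(name, pairs)
    return [r for r in records if r["name"] in names]
```

A query for either "Genus beta" or "Genus alpha" would then return the records filed under both names, which is the behavior the paragraph describes.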

    Photography
    Calflora has collaborated closely with the CalPhotos library of plant images managed by the UC Berkeley Digital Library Project. Calflora has made a focused effort to build this photo collection to represent all California species. There are currently about 30,000 images of California plants in the database.

How Information is Acquired and Maintained

    Data Contributors
    The species and nomenclature databases both represent the work of a small number of people, and are relatively static. The occurrence and photograph databases, on the other hand, represent in their current form the active collaboration of a large number of partners, both institutions and individuals. A lesson of the Calflora experience to date is that such collaboration must be actively solicited. Some large institutions agreed to share only after they received assurances that 1) the data would be presented with attribution and caveats in a scientifically appropriate way, and 2) the data would be actively maintained. For a discussion of the challenges and issues of assimilating data from diverse institutions, see Appendix V How we get data, Appendix VI Observations on Interagency Data Sharing, and Appendix VII Comments on Data Mining.

    Sensitive Information
    Information on the location of rare plants is considered sensitive if the plants might be particularly endangered by making that information public. This is an issue important to many would-be contributors. For a discussion of Calflora's policies with respect to sensitive information, and how they were developed, see Appendix VIII From Calflora Data Policy: Sensitive Information.

    Assimilation and Quality Control
    Calflora assimilates occurrence data from contributors in a rigorous way, consistent with a set of policies. Each record is researched so that metadata fields can be filled in appropriately, and so that field meaning is preserved across the entire occurrence database. The contributor has many opportunities to correct possibly erroneous values, both during the assimilation process and afterwards. The role of Calflora in this process is consistent with that of a library: values may be translated (e.g. from UTM to latitude/longitude) or standardized, but are never "corrected" without instruction from the contributor. In particular, the taxonomic name on an occurrence record may have minor spelling errors fixed with the consent of the contributor, but otherwise is not changed.

    Calflora's approach to quality control is based on thorough documentation (of each field in the database, and of each contributed dataset) and attribution of data, in a manner parallel with the publication of scientific research.

    With respect to the photograph database, Calflora added a system whereby botanical experts can review the contributing photographer's identification of the plant in a photograph, and either confirm or reject it.

    Contributions from Individuals
    For several years, photographers have been able to register with CalPhotos and submit plant photos online. Since August 2002, individuals have been able to register with Calflora and submit plant observations online. The observations are added to the occurrence database, as if the individual were a contributing institution. See Appendix IX Why does Calflora need your plant observations? for a description of this system.

How Information is Made Available

    Data Available for Download
    The species database and the occurrence database are available for download in a delimited text form. The nomenclature database is not available in this way because of an agreement with its principal author.
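
A downloaded delimited text file can be read with standard tools. The sketch below is illustrative only: it assumes a tab delimiter and a header row, and the column names (taxon, county, doc_type) are hypothetical rather than the actual field names of the Calflora download.

```python
import csv
import io


def read_delimited(text, delimiter="\t"):
    """Parse delimited text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


# Hypothetical sample content standing in for a downloaded file.
SAMPLE = (
    "taxon\tcounty\tdoc_type\n"
    "Genus beta\tMarin\tspecimen\n"
    "Genus beta\tNapa\treported\n"
)

rows = read_delimited(SAMPLE)
```

Each row comes back as a dictionary keyed by column name, so later filtering or aggregation does not depend on column order.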

    Web Interface
    Calflora has an extensive website which makes information from the databases available in discrete chunks. The presentation of data on the web is designed to be both easy to use and scientifically rigorous.

      Species Information
      On the website, species can be queried by almost any of the fields in the species database. It is particularly easy to query by taxonomic name or common name. The result of a query shows a list of species with a thumbnail photograph of each (if a photograph of the species is available from CalPhotos). The detail page for a particular species shows the history of the species in scientific literature, and a map of California counties showing the distribution of the species. There are links to many other online resources concerning the species.

      Occurrence Information
      Occurrence records can be queried by almost any of the fields in the occurrence database. A query by taxonomic name is routed through the nomenclature database, so that the result includes records referring to the plant by any taxonomic name it might have been known by historically. The result of a query shows a list of occurrences and, if any of those occurrences contain geo-referenced locations, may also include the GIS Viewer applet (also from the UC Berkeley Digital Library Project). The GIS Viewer shows a set of occurrences on a layered map.

Conclusion

The Calflora project has both a scientific aspect and a social aspect. One thing that differentiates Calflora from similar projects is how successful it has been in the social domain. By positioning itself as responsible to the larger botanical community, it has successfully attracted the contribution of data from many institutions and people.

In a project like this, effort spent on data maintenance and quality control is critical to building a reputation for reliability within the community. To date, Calflora has probably not devoted enough resources to these activities: they require both staff time and the volunteered time of outside botanical experts (for instance, to review the identification of plant photographs). If Calflora can continue to be responsive to the community in a scientifically responsible way, its reputation for legitimacy and accuracy will grow.


Appendices

Appendix I

The Calflora Species Database

This database contains summary geographic and ecological distribution information for 7660 California vascular plant taxa (8363 records including species level taxa where there is more than one var. or ssp.), as well as additional habitat information for rare taxa and species of the Sierra Nevada. It was originally compiled from two major literature sources: an electronic transcription of distribution and lifeform data from A California Flora and Supplement (Munz 1959 and 1968) and the CNPS Inventory of Rare and Endangered Vascular Plants of California (Skinner and Pavlik 1994, electronic version). These data were supplemented with information from The Jepson Manual (Hickman 1993). Species summaries now include county distributions compiled from current contents of the Calflora Occurrence Database, with links to the specific observations. Species summaries are illustrated in the online version with images from the UC Berkeley Digital Library Project - CalPhotos California Plants and Habitats collection.

Information Sources

The transcription of distribution information from Munz (1968) was created by Kwei-Lin Lum as part of her thesis work with Peter J. Richerson at U.C. Davis (Lum 1975; Richerson and Lum 1980). Lum's database was subsequently modified by Richard Walker in the course of his thesis work at U.C. Santa Barbara (Walker 1992). For Sierran species, Walker added elevation limits and habitat descriptors, and also replaced Lum's 'many plant communities' entries with specific community lists. The Lum/Walker/Munz68 data covered all but about 24 of the species-level taxa included in A California Flora and Supplement. In compiling species summaries, we translated the abbreviated names and coded data of the Lum/Walker/Munz68 database. We then used data from the CNPS Inventory to update distribution and habitat data for 893 species in the Lum/Walker/Munz68 database and to add records for 849 taxa not already included in that database (that is, infraspecific and new taxa). Using the CNPS Inventory data, we also added fields for Rarity and Listing Status as well as a field for matching records in this database with records in the CNPS Inventory. Listing Status data has been updated to match the 1997 CNPS electronic inventory.

The database was subsequently expanded to include all infraspecific taxa recognized as occurring in California and updated to reflect current nomenclatural usage. Adjustments to distributions for taxa affected by changes in taxonomic delineation (lumps and splits) were made using information from the Jepson Manual (Hickman 1993) to modify existing records. Literature-based distribution data for taxa not previously included in Calflora (mostly infraspecific taxa that are not rare) represent our interpretation of distribution descriptions in the Jepson Manual.

Distribution Information Errors

Error characteristics of the occurrence-based distribution data are unknown. Users should make their own decisions on the reliability of data of different types and from different sources. Documentation and source contact information are provided for each occurrence observation.

For literature-based distribution data, it is important to note that the type and rate of errors in geographic distribution are very different for the three major literature sources. Lum was transcribing generalized and sometimes vague range descriptions into concrete presence/absence data for specific county and subcounty regions. For most species, data were not verified with any other source of information and, in addition to coding errors, represent only a best estimate of the range of the species in question. Lum (1975) showed that while her data are far from error-free, errors of assigning taxa to regions where they do not occur are about equal in number to errors of not assigning taxa to regions where they actually do occur. Consequently, her data produce unbiased and quite accurate estimates of species numbers but somewhat inaccurate species lists for given regions. The CNPS Inventory data, on the other hand, represent only documented natural occurrences of the taxon in question, are not intended to imply that the taxon does not occur elsewhere, and hence represent a minimum estimate of the full range. CNPS data underestimate the number of species present in a given region but have a very low rate of error in assigning taxa to counties where they do not occur. Regions used to describe plant distributions in The Jepson Manual span multiple counties. Consequently, interpretation of Jepson Manual descriptions in terms of presence and absence in particular counties and subcounties is indefinite. We have identified all items of distribution data that are based on the Jepson Manual to alert the user to this fact.
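
The balanced-error property described for Lum's data can be illustrated with a small worked example. The numbers are invented for illustration and are not taken from Lum (1975): a region with 100 taxa actually present, 10 of them wrongly omitted, and 10 absent taxa wrongly listed.

```python
# Illustrative numbers only; not taken from Lum (1975).
true_species = set(range(100))     # taxa actually present in a region
omitted = set(range(10))           # false negatives: present but not listed
added = set(range(100, 110))       # false positives: listed but not present

listed = (true_species - omitted) | added

# The two kinds of error cancel in the species count...
count_error = len(listed) - len(true_species)

# ...but every one of them is an error in the species list itself.
list_errors = len(listed ^ true_species)
```

Under these assumed numbers the count is exact while the list contains 20 wrong entries, which is why such data can suit richness estimates better than checklists.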

For species level taxa, the composite literature-based distribution summaries presented in Calflora can be expected to have error characteristics much like those Lum described. They can be expected to perform well for analyzing broad patterns and general relationships, as demonstrated by both Lum and Walker. They clearly perform less well in generating accurate species lists for particular locations, but remain useful for producing preliminary checklists for a variety of applications. For infraspecific taxa, the composite literature-based distribution summaries combine data that tend to underestimate range (CNPS) with data that overestimate range (Jepson Manual). Error characteristics of the combined data are unknown.

Downloading the Calflora Species Database

The species database is available for download in text form for serious users. For more information, please contact Calflora, describing very briefly how you hope to use the database.

Ann Dennis,
Director, The Calflora Database,
August 1999


Appendix II

About the Occurrence Database

Objectives

  • integrate plant observation data from disparate sources so users can create composite summaries for taxonomic groups or geographic areas of interest
  • provide centralized access to plant occurrence data in a form that can be readily imported for use in analytical and modeling applications
  • provide limited visualization tools for web access to databased information.
Our Collection

We have assembled and continue to build a collection that includes various types of data--herbarium specimen records, species lists compiled by professional botanists for known locations, rare plant occurrences documented by the Natural Diversity Database, checklists for nature preserves, parks and other geographic areas, and plot species lists from sampling projects carried out by land management agencies.

Each type of data has strengths and weaknesses. Specimen data have high reliability on plant identification, but generally have imprecise or inaccessible location and habitat data. Plot species lists have lower reliability of plant identification, but generally have high location precision and tend to be a rich source of readily usable habitat and co-occurrence data. Checklist data, used in conjunction with specimen and plot data, provide a basis for analyzing patterns of abundance and rarity within the range of a species.

By providing ready access to all these types of occurrence data, we seek to facilitate research on questions related to biodiversity, ecology, and conservation, and help researchers use the full power of their geographic analysis and modeling tools.

Special Features

Name data--We explicitly indicate the relationship of current names to the taxon indicated by the original observer. The user can see the original name, our interpretation of the original name in terms of current usage, and an indication of whether or not changes in taxonomic delineation or name usage since the date of the observation affect certainty of assignment of given observations to particular current names.

Documentation--We indicate the type of plant ID documentation provided for individual occurrence observations. We also assess precision of original date and location data, and label individual observations accordingly. We recognize that 'high quality' to one user may mean 'precise location', while to another it may mean 'specimen exists'. Rather than filtering information before it reaches the user, we label information in a way that allows users to apply filters they consider appropriate.
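
The label-then-filter approach can be sketched as follows. The record fields (doc_type, location_precision_m) and the helper names are hypothetical, chosen only to show two users applying different definitions of 'high quality' to the same labeled records.

```python
# Hypothetical labeled records; field names are illustrative assumptions.
RECORDS = [
    {"id": 1, "doc_type": "specimen", "location_precision_m": 5000},
    {"id": 2, "doc_type": "reported", "location_precision_m": 30},
    {"id": 3, "doc_type": "specimen", "location_precision_m": 100},
]


def select(records, *predicates):
    """Keep the records that satisfy every user-supplied predicate."""
    return [r for r in records if all(p(r) for p in predicates)]


def has_specimen(record):
    """One user's notion of quality: a specimen exists."""
    return record["doc_type"] == "specimen"


def precise_within(metres):
    """Another user's notion of quality: location known to within `metres`."""
    return lambda record: record["location_precision_m"] <= metres
```

Nothing is discarded at load time; each user composes the predicates that match the analysis at hand.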

How We Process Data

Our intent is to provide a core set of data about each occurrence observation and enough information for the user to select only those records that are suitable for a given type of analysis, directing the user to the data source for additional information that may be available. We examine source data sets and extract:

  • a plant name and a location
  • basic information about the site if available
  • the source of the observation
  • the id used by the source to identify this observation
We then process that data, adding fields to express:
  • name, location, and observation event characteristics in common formats
  • precision and level of documentation of the observation
We maintain each data set as a coherent and identifiable block that can be replaced or updated at any time. For each data set, we document source contact information, acquisition and update history, details on steps we took in moving data from original fields and formats into our standard formats, and any information provided by the source on data collection methods.
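
One way to picture the record structure and the replaceable-block design described above is the sketch below. It is an assumption-laden illustration: the class names, field names, and dataset keys are hypothetical and do not reflect Calflora's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    # Extracted from the source data set.
    original_name: str
    location: str
    site_info: Optional[str]
    source: str
    source_id: str
    # Added during processing (hypothetical field names).
    current_name: str
    doc_type: str
    location_precision: str


class OccurrenceLibrary:
    """Holds each contributed dataset as a separately replaceable block."""

    def __init__(self):
        self._datasets = {}

    def replace_dataset(self, dataset, records):
        """Install a dataset, or wholly replace an earlier version of it."""
        self._datasets[dataset] = list(records)

    def all_records(self):
        """Return the records of every dataset as one flat list."""
        return [r for recs in self._datasets.values() for r in recs]
```

Keeping datasets as separate blocks means a contributor's update replaces only that contributor's records, leaving the rest of the library untouched.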

Our Broader Goals

By providing ready access to plant occurrence data, we seek to facilitate:

  • Access to range and distribution information for California plants,
  • Analysis of patterns in plant distributions and species diversity,
  • Protection of plant diversity at local scales,
  • Protection of geographic range and genetic diversity of native species, both rare and common.
If you have plant occurrence data you would like to contribute, or would like to participate in the Calflora project, please CONTACT us.

A description of the structure of the data is available here.
Comments and questions are welcome: CONTACT


Appendix III

Using Calflora's Composite Data for Estimating County-level Species Richness

If you page through some distribution maps on Calflora for taxa you are familiar with, you can get an idea of the relative contribution of different kinds of data to the full picture on distribution. Each of the 4 data types we keep track of has a different color on the map. We have it set up so that more verified/verifiable types overwrite the less verified types. The question in this area centers on figuring out how much is added by including less verified data types.

It is apparent that most of the maps are predominantly dark blue ('herbarium specimen'), meaning that there are specimens on file for most counties for most taxa. The second most common color is turquoise blue ('documented', meaning vouchered or expert verified), important in filling in the actual range for undercollected species, especially common trees.

Lavender ('reported') is pretty uncommon on the maps. It just shows up for counties where the taxon is on a plot species list or checklist but has never been collected or verified by one of the sources we classify as 'expert verified'. My evaluation of these is that they're mostly errors. If a taxon is common enough to show up on plots, it's likely to have been collected by someone at some point. Again, the exception is common trees (take a look at ponderosa pine).

Yellow ('literature') is also pretty uncommon on most maps. Where you see it most is for taxa that have very general range descriptions in Munz and Jepson, like 'California Floristic Province'. Yellow is generally the part of CA-FP where the thing ISN'T. Again, if it's rare, its true range is pretty well covered by CNDDB or CNPS (turquoise). If it's not rare, it's likely that someone has collected it (dark blue).
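
The overwrite rule for map colors can be sketched as picking, for each county, the most verifiable documentation type present. The precedence list below is an assumption: the text places 'specimen' and 'documented' at the top, but the relative order of 'reported' and 'literature' is a guess, and the function name is hypothetical.

```python
# Assumed precedence, most verifiable first. The relative order of
# 'reported' and 'literature' is an assumption, not stated in the text.
PRECEDENCE = ["specimen", "documented", "reported", "literature"]

COLORS = {
    "specimen": "dark blue",
    "documented": "turquoise",
    "reported": "lavender",
    "literature": "yellow",
}


def county_colors(records):
    """Return the map color per county from (county, doc_type) pairs."""
    rank = {doc_type: i for i, doc_type in enumerate(PRECEDENCE)}
    best = {}
    for county, doc_type in records:
        # A more verifiable type overwrites a less verifiable one.
        if county not in best or rank[doc_type] < rank[best[county]]:
            best[county] = doc_type
    return {county: COLORS[doc_type] for county, doc_type in best.items()}
```

A county with both a 'reported' record and a 'specimen' record would therefore be drawn dark blue, regardless of the order in which the records are read.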

So, if the purpose is to get a best available count of species/county, I would only look at occurrence records with a documentation type of 'specimen' or 'documented'. This will likely give you a fairly low rate of error, and I would expect that error to be fairly well balanced between false positive (misidentified specimen or voucher from county where taxon does not occur) and false negative (no specimen or voucher from county where taxon does occur). I would NOT consider occurrence records with a documentation type of 'literature' or 'reported'. Adding them will increase both the error rate and the high-side bias of the data.
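
The county count recommended above can be sketched as follows. The record fields (county, taxon, doc_type) and the function name are hypothetical, and the sketch assumes the taxon field already holds the current name, so that synonyms are not counted twice.

```python
from collections import defaultdict

# Documentation types treated as reliable enough for richness counts.
TRUSTED = {"specimen", "documented"}


def species_per_county(records, trusted=TRUSTED):
    """Count distinct taxa per county, using only trusted documentation types."""
    taxa = defaultdict(set)
    for r in records:
        if r["doc_type"] in trusted:
            taxa[r["county"]].add(r["taxon"])
    return {county: len(names) for county, names in taxa.items()}
```

Passing a different set for `trusted` makes it easy to see how much the count changes when 'reported' or 'literature' records are included.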

If you examine occurrence records with a documentation type of 'specimen' for a particular county, you will likely see data contributed by about 5 herbaria. You could also specify the 'dataset' field in the query, so that you will see only records from a single herbarium.

If you examine occurrence records with a documentation type of 'documented', you will find datasets from CNDDB, Dieter Wilkin's CA County data, and the CNPS Inventory. The Lum dataset, although valuable initially when we didn't have much else, by now adds little to the distribution known from better sources, and what it does add is mostly errors.


Appendix IV

Why not stick to specimens?
Ann Dennis, Calflora

Specimen records are a valuable information source, and have the advantage that names are periodically updated based on reexamination of the original plant material. For this reason, specimen data are the main type of historic occurrence information that has been included in existing distribution databases--errors due to changes in taxonomic nomenclature and delineation are held at a minimum, and there is opportunity for verification of specimen identification.

On the other hand, data from existing specimen collections--mostly herbaria that have served as multi-purpose repositories--have a number of shortcomings. They provide a scant record of earlier times and they generally lack precise locations and consistent habitat and co-occurrence information. In addition, herbarium collections provide an inconsistent sampling of natural patterns of distribution and abundance. Further, they generally underrepresent common taxa that are difficult to collect or preserve such as large conifers--taxa that are often of particular interest as community dominants. Most of these shortcomings are an understandable result of taxonomic rather than biogeographic focus in the development of these collections. Still, we must recognize that these data are insufficient for analyzing patterns and trends in species distribution and abundance, the central issues in biogeography and conservation biology today.

In contrast to herbarium collections, vegetation sampling programs and species inventories produce data that has less spatial bias, higher data density, and far more precise location references. For many important applications, such as evaluating patterns and trends in species richness or modeling environmental relationships, unbiased sampling, balanced error properties, and precise locations are more critical than absolute rate of taxon misidentification. Analyses like these generally employ statistical models that weigh error from different sources. Misidentifications add noise, tending to obscure patterns that may actually be present. Sampling bias, on the other hand, can introduce patterns that are, in fact, not present in nature. In most situations, failing to find a pattern significant is far less serious than 'discovering' a pattern that actually doesn't exist.

Of course, without specimens, there is no opportunity for identification verification. The user is left in the position of examining credentials and methods of the observers, rather than specimens, to evaluate probable error rates and suitability of data for a particular application. We should realize that this approach to data evaluation is standard in most areas of science (e.g. your decision if and how to use the CO2 trend data from the Mauna Loa Observatory would not hinge on examining their air samples).

It is important not to confuse verifiability with low rate of identification and recording errors. We have several years' experience in exposing various kinds of data to public scrutiny and criticism. We hear many comments, but as yet nothing to suggest that the rate of ID error in herbarium datasets is notably lower than in other types of observation datasets submitted by scientists and professional botanists. We have developed a proposal to study this question further.

Specimens are essential for taxonomic work and provide valuable verification for distributions based on other types of observations. But, we recognize that to serve a broad range of purposes in taxonomy, biogeography, and conservation biology we must provide access to data of many kinds, supplying users with the information they need to evaluate the various quality attributes that determine suitability of particular data for particular applications.


Appendix V

How We Get Data:
Data Holdings

At the beginning of the project period, the Calflora system included 43,468 occurrence observations of exotic species, on the whole far fewer per species than for natives. Work under this contract was intended to improve representation of invasives and other exotics in our system by identifying new data sources and assisting their owners in preparing datasets for inclusion in Calflora. An additional work plan objective in this area was to seek nomenclature data on exotic species recently discovered in California and additional synonyms for taxa previously known.

For this project, we made contact with the major land management agencies and the agencies and organizations involved in weed research or control in California. We conducted interviews to discover the extent and nature of weed photo and data holdings, and made informational presentations to publicize our project and solicit additional leads on data sources. We have now established data exchange relationships with six major institutional entities that are likely to be producing weed information in the future, and have brought most of the major existing weed datasets into our data acquisition stream. We added 53,321 occurrence records for exotic and invasive species, obtained nomenclature data on all exotic taxa now known to be naturalized in California, and expanded our historical nomenclature data.

While bringing existing datasets into Calflora is important, we count our major achievements in terms of new relationships formed that will produce data over years to come and that will bring new participants into a circle of open data exchange. Searches conducted under this project confirm that this view is particularly apt for weed data: weed occurrence data resources are, on the whole, not well developed at present. In the past, weed-related efforts have generally been focused on direct control activities rather than surveys and monitoring, the latter being the primary source of occurrence data. New weed control funding initiatives and awareness of the need for strategic allocation of effort hold promise for increased data gathering in the next few years. Most of the new weed data we added during the contract period came to us from general floristic datasets, not weed-focused sources.

    Data Acquisition Steps
    1. Identify potential contributor
    2. Develop mutual understanding of data-related activities, motivations, and institutional constraints; identify or develop specific mutual objectives
    3. Acquire sample data set, and show contributor how their data would be prepared for and presented in the Calflora system
    4. Answer contributor's questions about policy and technical matters, help contributor become comfortable with open data sharing concept
    5. Wait for contributor to consult with other concerned parties
    6. Develop data sharing agreement
    7. Acquire full dataset, prepare and document, show results to contributor
    8. Get contributor's go-ahead to incorporate prepared dataset into online system

Appendix VI

Observations on Interagency Data Sharing

from: July 2000. Calflora: A test of NBII Biological Occurrences Data Management Strategies. Final Report to the National Biological Information Infrastructure Program. Ann Dennis, Ph.D. and Tony Morosco, The Calflora Database.

Calflora has now been serving as the web clearinghouse for California vascular plant information for a number of years. From a single online entry point, it provides simultaneous access to plant occurrence data holdings of many public agencies, academic institutions, and private contributors--holdings that span almost 150 years of plant observations. Users' queries search a library of occurrence datasets and return complete sets of relevant records in spite of disparities in structure, formats, and nomenclatural usage among the original datasets. System function rests on uniformly structured metadata and nomenclatural synonymy tables. With this emphasis on metadata and synonymy, Calflora is, in effect, a regional-scale test of the overarching NBII concept.

Our experience confirms the validity of the NBII concept, as well as the existence of an enormous and diverse audience for biological observations data. However, our experience does not support an expectation that owner-submitted metadata or adoption of data format standards will be a primary path to real-time centralized access to biological observation data. We present the following observations.

    1. The larger the institution and the richer the data holdings, the less the motivation for external data sharing. Problems with internal data management fully absorb available attention and resources.

    2. The greater the historical depth of an institution's data holdings, the less the motivation for conformity to current data standards and nomenclature. From the owner's point of view, changes in data standards and nomenclature detract from usability of long-term datasets and represent a cost without a corresponding benefit.

    3. Owner-reported metadata does not adequately disclose species occurrence data. Many of the richest sources of plant occurrence data are projects in which these observations were incidental to some other purpose (witness-tree information from 19th century land surveys is a classic example). Species occurrence data are often unmentioned or inadequately described by owners whose primary interest is in other data elements.

    4. Data owners often have serious concerns about improper use or interpretation of their data, and almost always see flaws in their data that they urgently want to bring to the attention of end users.

    5. Nomenclatural usage and synonymy must be addressed regionally. Relationships between current and past usage are key to accessing past observation data. These relationships differ substantially in different places. A single national synonymy provides an invaluable unifying framework, but is not, on its own, adequate for mediating queries to datasets with historical depth (see 1 and 2 above).

These observations lead us to the conclusion that a successful strategy must give adequate attention to the sociological aspects of data sharing: the strong forces that legitimately lead institutions with large data holdings to face inward rather than outward, and the legitimate reservations individual data owners have towards anonymous data sharing. Our experience suggests that it is unrealistic to expect institutions to put scarce resources into sharing or standardizing data to conform to external needs or standards. It is also unrealistic to expect institutional owners to be aware of other possible uses of data developed for particular internal purposes, or to readily embrace an ethic of open data access. We have addressed this lay of the land in the following ways.

    1. We take primary responsibility for data discovery and metadata documentation. We develop documentation in consultation with the data owners, but minimize need for expense and exertion on their part.

    2. We take responsibility for data standardization, while scrupulously retaining connection to unmodified original data.

    3. We provide data contributors with a means of conveying caveats to end users of individual observations.

    4. We reward data contributors with recognition and control, and give them genuine opportunities to participate in formation of data sharing and presentation policies.
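
Points 2 and 3 above can be pictured with a small Python sketch: a standardized view is derived from each record while the original is kept unmodified alongside it, together with any caveat from the contributor. This is only an illustration under assumed names. The `Observation` class, the `standardize` function, and the field names are hypothetical and do not describe Calflora's actual schema.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Observation:
    source: str                 # which contributed dataset the record came from
    original: Mapping           # read-only copy of the record exactly as received
    standardized: dict          # the same information under uniform field names
    caveat: str = ""            # contributor's note to end users, if any


def standardize(source, raw, field_map, caveat=""):
    """Build a standardized view without altering the original record.

    field_map maps each standard field name to the field name used by
    this particular source (a hypothetical, per-dataset mapping).
    """
    standardized = {std: raw[src] for std, src in field_map.items() if src in raw}
    return Observation(
        source=source,
        original=MappingProxyType(dict(raw)),  # copy, then expose read-only
        standardized=standardized,
        caveat=caveat,
    )


raw = {"TAXON": "Exampla vetus", "CNTY": "Marin", "YR": "1890"}
obs = standardize(
    "survey_1890",
    raw,
    {"taxon": "TAXON", "county": "CNTY", "year": "YR"},
    caveat="Identification not verified against a voucher specimen.",
)
```

Keeping the original behind a read-only view is one simple way to make sure later standardization steps cannot silently change what the contributor supplied.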

As we see it, the NBII strategy has not adequately addressed the sociological aspects of data sharing. However, the direction NBII is taking in developing regional and thematic nodes is an important step in addressing some of the problems we are pointing out. In our experience, regional scale and thematic focus have been key to our success. At this scale, we can form the personal and professional relationships that are essential to developing trust and motivation on the part of data contributors and responsiveness on our part to user needs.


Appendix VII

Comments on Data Mining
Ann Dennis, November 2002

Facilitating data-mining is, essentially, our primary purpose: we give online users easy access to many repositories of original biological data so they can either download it to their own applications or view it using our online tools. The challenge in this kind of work is the fact that data owners have legitimate fears about plagiarism, inappropriate use, and, most of all, lack of recognition and payment. The existence of tools for automated data harvest makes our work harder, not easier, because they reinforce those fears.

The path we're taking is to promote mechanisms and ethical standards, parallel to ones we're familiar with in print media, to protect authors and users of online data. While that's coming along, we try to keep data owners in the game by working out good-enough solutions to their worries, and giving them plenty of opportunity to control what happens to their data in our system. At a minimum, we force users to recognize that individual records have owners, authors, methods, and levels of precision and documentation. We do that by making sure those fields always show up in our online displays and user-generated download files.
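
One way to make sure those fields always appear in a user-generated download is to append them to whatever columns the user asked for. The Python sketch below illustrates that idea only. The function `export_csv` and the exact field names are assumptions for the example, not a description of Calflora's download code.

```python
import csv
import io

# Attribution fields named in the text above; the exact field names are assumed.
REQUIRED_FIELDS = ["owner", "author", "method", "precision", "documentation"]


def export_csv(records, requested_fields):
    """Write records as CSV, always including the attribution fields."""
    # Requested columns come first; any required field the user left out is appended.
    fields = list(requested_fields) + [
        f for f in REQUIRED_FIELDS if f not in requested_fields
    ]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing values are written as empty strings rather than dropped.
        writer.writerow({f: record.get(f, "") for f in fields})
    return buffer.getvalue()


records = [
    {
        "taxon": "Exampla nova",
        "county": "Napa",
        "owner": "Example Agency",
        "author": "A. Observer",
        "method": "field survey",
        "precision": "county",
        "documentation": "checklist",
    }
]
output = export_csv(records, ["taxon", "county"])
```

Here the user asked only for the taxon and county columns, but the file still carries the owner, author, method, precision, and documentation of each record.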

Biological observations data are useful, right now, to a broad range of people. Use of in-house GIS tools and predictive modeling is surprisingly widespread among county planners and HCP/NCCP participants, agency biologists, land managers, weed control groups, students, as well as scientists. While summaries, interpretations, or model output have many important applications, what these users need is original data. In order for original data to be used in a way that is both scientifically sound and respectful of data producers, it must be accompanied by appropriate documentation and usage permission.


Appendix VIII

From Calflora Data Policy: Sensitive Information
Calflora's mission is to be an online library of plant information, and by providing this information, to promote stewardship and conservation of California plants. In this, we are guided by the Code of Ethics of the American Library Association regarding freedom of access, resistance to censorship, and right to privacy, as well as by our own policy of non-discrimination.

In dealing with location information for plants that are subject to vandalism and illegal collection, we find that we must balance competing ethical principles. Our reasoning on these issues is as follows:

    1. As a general rule, information we have is to be available to the public unless we have been directed by the data contributor to withhold it.

    2. We support the right of data contributors to decide at what level of precision their data will be displayed, either by giving us data at that level of precision or by giving us specific instructions. We will vary from this policy only after due consideration of case-specific evidence that such action is necessary to protect a species or particular populations from serious damage.

    3. We support the right of users to privacy with respect to information sought or received. We will not reveal usage records that might be used to identify persons receiving information on species subject to vandalism unless required to do so by a court of law. However, we will inform users via web page display that records do exist that could in some cases be used for that purpose.

    4. We firmly adhere to a policy of non-discrimination in provision of information services. Consequently, we will not use selective permission based on professional credentials, presumed motives, or other screening criteria as a means of controlling access to sensitive information.

    5. In some cases, our display of location information at the level of precision requested by the data contributor could contribute to vandalism or illegal collecting of sensitive species. In such cases, we will reduce the level of precision we display through the following procedure:

        • A subcommittee of the Calflora Advisory Board will be established to seek out information on species affected by vandalism and illegal collecting, and on the role of Calflora's information display characteristics in incidents of illegal destruction. This committee will also accept and review submitted requests for suppression of information about particular taxa.

        • The committee will review and decide on the merits of proposals to suppress location information for particular taxa. Such proposals must be supported by identification of specific threats to that taxon or its habitat, and must be supported by justification for the position that a change in Calflora's display will materially reduce those threats.

        • The committee will submit its findings to the Calflora Board of Directors, who will, upon approval, give appropriate instructions to Calflora staff. The subcommittee will meet at least annually to review new information and proposals, and to reevaluate past findings. Restrictions will be removed after 3 years unless confirmed by the subcommittee.
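
The display rules in points 2 and 5 could be sketched in Python along the following lines. This is a simplified illustration, not Calflora's procedure. It assumes that precision is expressed as a number of decimal places on latitude and longitude, and that the 3-year period runs from the date a restriction was imposed or last confirmed; the policy text specifies neither. All function names are hypothetical.

```python
from datetime import date
from typing import Optional

RESTRICTION_YEARS = 3  # "Restrictions will be removed after 3 years unless confirmed"


def add_years(d, years):
    """Return the same calendar day a number of years later."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # February 29 in a target year that is not a leap year.
        return d.replace(year=d.year + years, day=28)


def restriction_active(imposed: date, last_confirmed: Optional[date], today: date) -> bool:
    """Assumed reading: a restriction lapses 3 years after it was imposed or last confirmed."""
    start = last_confirmed or imposed
    return today < add_years(start, RESTRICTION_YEARS)


def displayed_location(lat, lon, contributor_decimals, restricted_decimals, restricted):
    """Round coordinates to the contributor's precision, or coarser if restricted."""
    if restricted:
        decimals = min(contributor_decimals, restricted_decimals)
    else:
        decimals = contributor_decimals
    return (round(lat, decimals), round(lon, decimals))


# A restriction imposed in mid-2001 and never confirmed has lapsed by mid-2004.
still_restricted = restriction_active(date(2001, 6, 1), None, date(2004, 6, 1))
shown = displayed_location(37.87654, -122.25432, 3, 1, still_restricted)
```

Taking the coarser of the two precision levels means a restriction can only reduce detail, never reveal more than the contributor allowed.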


Note on Access to Precise Location Data for Rare Plants

This is a complex topic. We want to build a base of knowledge that allows us to better protect rare plants. We want to bring information to the public in ways that excite the imagination and build the numbers of people, young and old, who know about rare plants and value them as their own natural heritage. We want to put readily usable information into the hands of people who are using GIS tools to plan proactively for species protection at various scales throughout the state. We want college students to have the information at their fingertips in a time frame that allows them to do term projects on the ecology of rare species. On the other hand we want to make sure that our actions do not unnecessarily contribute to vandalism or destruction of vulnerable species.

The Calflora Advisory Board held two meetings in 2001 addressing this topic. One thing that was clear, beyond the fact that this is a hot-button emotional issue, was that we really lack information on the extent and nature of vandalism, destruction of plants for the purpose of evading regulation, and destructive collecting. I feel that the best way to proceed is to get more information about the extent and nature of these problems. Then, all parties interested in taking action on this matter should meet (not just people concerned with databases). Our analysis should focus on the problems: What would be the most effective steps we could take to reduce vandalism? What about horticultural collection? What about landowners fearful of regulation?

Before removing information from publicly available sources, we must assure ourselves that this is actually a positive element in a broader course of action, not simply a least-effort alternative to doing nothing. We must also be able to assure ourselves that information suppression is critically needed to prevent irreparable harm, and that benefits of suppression substantially outweigh the benefits of having this information available. These steps are required by our most basic ethical principles.


Appendix IX

Why does Calflora need your plant observations?

Scientists and plant enthusiasts have a lot to learn about where plants grow in California. Contrary to popular opinion, relatively little is known about the detailed distribution of California plants. The average species distribution in Calflora is based on fewer than 100 observations!

Development is rapid across the state. Scientists and agencies are not well equipped to predict how California plants will survive environmental change. Scientists have seen a decline in biodiversity, not only in the tropics, but locally as well. Stories of the amazing wildflower displays that used to be seen are common. And many new weed species are silently invading our parks, roadsides and wild areas almost unnoticed.

We want to gather information on California plant populations and YOU can help us. Everyone's contribution is important. We are seeking information on ALL plants that grow wild in California, both native and non-native, common and rare species. Every additional piece of information helps us make better decisions and understand our plants. The Calflora Library has many distribution gaps and much outdated information that you can help update!

It doesn't matter whether you submit just 5 observations for the weeds you saw this morning driving to work, make a special trip to visit a rare plant location, teach your child to identify a local California poppy or live oak, or make a list of hundreds of native and non-native plants while hiking in a wilderness. You can use Calflora as a tool to store your observations and share them with friends and colleagues. All of your observations are important.

Your observations can help us answer many questions:

* How far north does California Sun Cup grow?

* Is Desert Sand Verbena still growing in Los Angeles County?
(The last observation was recorded in 1935)

* Does Five-finger Fern grow in Ventura County?
(There are no direct observations, but it has been reported from surrounding counties)

* Does anyone know about that new patch of invasive Artichoke Thistle growing on my local hillside?
(New infestations reported while they are still small are easier to eradicate than well-established ones)

* What local biodiversity will be lost if the city decides to allow that new housing development on the edge of town?
(You can record the plants that are growing there now as a historical record)

The data that you collect will be combined with existing information from many different sources to give us a better picture of our native and introduced flora. Each new observation enriches the collection and helps the world make better decisions. Calflora information is used by botanists, land managers, conservationists, and state and federal agencies to make decisions on conservation and development. As we see patterns and discover new questions and insights, we will be better able to preserve the riches of the California flora. We are committed to building and improving tools that allow you to contribute knowledge, and to adding expert review of contributions as resources allow. We hope that you will participate with us and contribute your observations and support.

We encourage you to CONTACT us with questions and comments.

You can Register to become a Calflora User and Contributor.

Once you've registered, submit your plant Observations.