
The National Human Genome Research Institute (NHGRI) launched a public research consortium named ENCODE, the Encyclopedia Of DNA Elements, in September 2003 to identify all functional elements in the human genome sequence. The project is being conducted in three phases: a pilot project phase, a technology development phase, and a planned production phase.
Information about the ENCODE Project, consortium membership, data release policies, and additional background can be found by visiting the ENCODE Project page.
Data generated by members of the ENCODE Consortium is housed in a number of public databases, such as the UCSC Genome Browser and NCBI’s Gene Expression Omnibus (GEO). Since issuing queries to these databases is often not intuitive, the ENCODEdb portal was developed to allow biologists to more easily query and retrieve data generated by the ENCODE Consortium. The ENCODEdb portal provides users with a single, unified point of access to data generated by the ENCODE Consortium, regardless of which public database houses the primary data.
The ENCODEdb portal currently provides unified access to the following resources:
More information about NCBI GEO data types can be found in ENCODEdb’s Summary of GEO Terminology.
Not choosing a selection from a pull-down menu will simply return all possible values for that particular field.
Making a selection will help narrow down the results returned to those that are actually of interest to the user. Making a selection will also narrow down the choices in any remaining pull-down menus; this is done to prevent combinations that either produce no result or are contradictory. Any selections made may also change any data ranges that are displayed under the Value options, reflecting the valid range of values for the now-selected experiments.
The only place where a selection is required is on the Consortium Data query pages for GEO Datasets, GEO Profiles, and GEO Components. Here, an Experimental Group must be selected in order for ENCODEdb to issue a query against the GEO database.
ENCODEdb has been tested using Internet Explorer, Safari, and Firefox. Most pages require that JavaScript be enabled; please check your browser’s preferences to ensure that JavaScript is enabled.
Clicking the Clear button at the bottom of the page will reset all the pull-down menus, allowing the query to be re-issued from scratch.
In some instances, making a selection in one of the pull-down menus will produce additional pull-down menus or generate new fields. ENCODEdb will only display fields that directly pertain to the user’s previous choices, keeping the user interface as simple as possible.
BED stands for "browser-extensible data." The BED file format is used by UCSC to provide a flexible way of capturing disparate types of data that are to be displayed in an annotation track in a UCSC Genome Browser display. While many kinds of data can be displayed in an annotation track, the data must be specified in a rigid, consistent way.
A complete description of the BED file format, specifying what type of data is in each column of the file, can be found on the UCSC Web site.
Galaxy is a platform for interactive large-scale genome analysis. Developed at Penn State, Galaxy provides a simple Web portal that enables users to search remote resources, combine data from independent queries, and visualize the results. More information on Galaxy can be found on the Galaxy Web site at Penn State.
The BED format requires that only one data value be associated with each region (each row in the table), so only one column can be selected from the GEO hybridization tables. The same will occur if Send Query to Galaxy is selected, since Galaxy also uses the BED format.
If you wish to retrieve multiple columns, select Download Selected Columns from Data File instead. This will produce a file (in tabular text format) with all of the selected data that can be used for further analysis.
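The difference between the two output options can be illustrated with a minimal Python sketch. The rows, the column names `VALUE` and `LOG_RATIO`, and the choice to place the selected value in the fourth column are all assumptions made for illustration; they do not describe how ENCODEdb itself writes its files.

```python
# Hypothetical rows: one genomic region per row, with several data columns.
rows = [
    {"chrom": "chr7", "start": 116147883, "end": 116148000,
     "VALUE": 0.42, "LOG_RATIO": -1.25},
    {"chrom": "chr7", "start": 116148000, "end": 116148500,
     "VALUE": 0.77, "LOG_RATIO": -0.38},
]

def to_bed_lines(rows, column):
    """BED-style output: each region carries only the one selected column."""
    return [
        f"{r['chrom']}\t{r['start']}\t{r['end']}\t{r[column]}"
        for r in rows
    ]

def to_tabular_lines(rows, columns):
    """Tabular text output: each region can carry any number of columns."""
    header = "\t".join(["chrom", "start", "end", *columns])
    body = [
        "\t".join(str(r[key]) for key in ["chrom", "start", "end", *columns])
        for r in rows
    ]
    return [header, *body]

bed_lines = to_bed_lines(rows, "VALUE")
table_lines = to_tabular_lines(rows, ["VALUE", "LOG_RATIO"])
```

Here `to_bed_lines` accepts exactly one column name, mirroring the one-value-per-region restriction, while `to_tabular_lines` accepts a list of columns.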
The Galaxy server at Penn State has a flexible history system that stores the queries from each user; performs operations such as intersections, unions, and subtractions; and links to other computational tools. For any data type of interest, choose Send Query to Galaxy as the output option. You will then be able to use Galaxy’s interface to combine queries as needed.
There are two possible ways to view the GEO Component data using the UCSC Genome Browser:
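To make the idea of combining queries concrete, the following Python sketch shows what an intersection of two sets of regions means. It is not Galaxy's implementation; the function name `intersect` and the sample regions are hypothetical, and regions are assumed to be (chromosome, start, end) tuples with the end position excluded.

```python
def intersect(regions_a, regions_b):
    """Return the overlapping portions of regions on the same chromosome."""
    overlaps = []
    for chrom_a, start_a, end_a in regions_a:
        for chrom_b, start_b, end_b in regions_b:
            if chrom_a != chrom_b:
                continue  # regions on different chromosomes never overlap
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # keep only non-empty overlaps
                overlaps.append((chrom_a, start, end))
    return overlaps

# Hypothetical results of two independent queries.
query_1 = [("chr7", 100, 200), ("chr7", 500, 600)]
query_2 = [("chr7", 150, 300), ("chr8", 150, 300)]
shared = intersect(query_1, query_2)
```

This pairwise comparison is the simplest form of the operation; tools built for large datasets typically sort the regions first so that each set is scanned only once.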
To perform a query using a specific chromosome coordinate range, please use the following format:
chr#:begin-end
replacing # with the chromosome number of interest and the two values after the colon with the beginning and end points for the chromosome coordinates of interest.
For example, to specify the region on chromosome 7 between position 116147883 and 116418462, write the query as follows:
chr7:116147883-116418462
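For readers who build such queries in a script, the format above can be checked with a short Python sketch. The function name `parse_region` is hypothetical, and the pattern assumes chromosome names are numbers or X, Y, or M; ENCODEdb may accept other names.

```python
import re

# Pattern for the chr#:begin-end format.
# Assumption: the chromosome is a number or one of X, Y, M.
REGION_RE = re.compile(r"^chr([0-9]+|[XYM]):([0-9]+)-([0-9]+)$")

def parse_region(text):
    """Split a string like 'chr7:116147883-116418462' into its three parts."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in chr#:begin-end format: {text!r}")
    chrom = "chr" + match.group(1)
    begin = int(match.group(2))
    end = int(match.group(3))
    if begin > end:
        raise ValueError("begin must not be greater than end")
    return chrom, begin, end

region = parse_region("chr7:116147883-116418462")
```

Input that omits the `chr` prefix, the colon, or the hyphen raises a `ValueError` rather than being silently accepted.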
ENCODEdb Consortium Data is checked every week for available updates.
ENCODEdb Genomic Context Data is updated every month.