OpenRefine

Developer(s) Google, open source community
Initial release November 10, 2010
Stable release 2.5 / December 11, 2011 [1]
Repository github.com/OpenRefine/OpenRefine
Development status Active
Written in Java [2]
Platform Microsoft Windows, Linux, macOS
Available in English, Italian, Chinese
Type
License BSD License
Website openrefine.org

OpenRefine, formerly called Google Refine, is a standalone open source desktop application for cleaning up data and transforming it into other formats, an activity known as data wrangling.[3] It resembles a spreadsheet application (and can work with spreadsheet file formats), but it behaves more like a database.

It operates on rows of data whose cells are organized under columns, much like a relational database table. One OpenRefine project is one table. The user can filter the rows displayed using facets, which define filtering criteria (for example, showing only rows where a given column is not empty). Unlike in spreadsheets, most operations in OpenRefine apply to all visible rows at once: transforming all cells in all rows under one column,[4] creating a new column based on existing column data, and so on. All actions performed on a dataset are stored in the project and can be replayed on another dataset.
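
The model described above (one table per project, facet-style filtering, whole-column operations, and a replayable history) can be illustrated with a minimal Python sketch. This is not OpenRefine's implementation or API: the names `Project`, `transform_where` and `add_column` are hypothetical, and the sample data is invented for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Operation = Callable[[List[Row]], None]


def transform_where(column: str,
                    predicate: Callable[[str], bool],
                    fn: Callable[[str], str]) -> Operation:
    """Build an operation that rewrites `column` in every row matching the filter."""
    def operation(rows: List[Row]) -> None:
        for row in rows:
            if predicate(row[column]):      # the "facet": only matching rows are touched
                row[column] = fn(row[column])
    return operation


def add_column(new_column: str, source: str, fn: Callable[[str], str]) -> Operation:
    """Build an operation that derives a new column from an existing one."""
    def operation(rows: List[Row]) -> None:
        for row in rows:
            row[new_column] = fn(row[source])
    return operation


class Project:
    """One table plus the ordered list of operations applied to it."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = [dict(row) for row in rows]   # copy so the input is not mutated
        self.history: List[Operation] = []

    def apply(self, operation: Operation) -> None:
        operation(self.rows)
        self.history.append(operation)

    def replay_on(self, other_rows: List[Row]) -> "Project":
        """Re-run the recorded operations, in order, on a different dataset."""
        other = Project(other_rows)
        for operation in self.history:
            other.apply(operation)
        return other


project = Project([{"city": " paris "}, {"city": ""}, {"city": "ROME"}])
# Clean up only the rows where the column is not empty.
project.apply(transform_where("city", lambda v: v != "", lambda v: v.strip().title()))
# Derive a new column from the cleaned one.
project.apply(add_column("initial", "city", lambda v: v[:1]))

other = project.replay_on([{"city": "oslo"}])
```

Storing operations rather than results is what makes the replay possible: the second dataset receives the same cleanup steps without the user repeating them by hand.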

Unlike in spreadsheets, no formulas are stored in the cells; formulas are used to transform the data, and each transformation is applied only once.[5] Transformation expressions can be written in the Google Refine Expression Language (GREL),[6] Jython (i.e. Python) or Clojure.[7]
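
The difference from spreadsheet formulas can be sketched in plain Python. The example below assumes, as an illustration only, that an expression is written as a function body that reads the current cell from a variable named `value` and returns the new cell content, which is roughly how Jython expressions are commonly written. The wrapper functions `compile_expression` and `transform_once` are hypothetical and are not part of OpenRefine.

```python
import textwrap


def compile_expression(body: str):
    """Wrap an expression body in a function that receives the cell as `value`."""
    source = "def _expression(value):\n" + textwrap.indent(body, "    ")
    namespace = {}
    exec(source, namespace)          # illustrative only; never exec untrusted input
    return namespace["_expression"]


def transform_once(cells, body):
    """Evaluate the expression once per cell and keep only the resulting values."""
    expression = compile_expression(body)
    return [expression(cell) for cell in cells]


cleaned = transform_once(["  Alice ", "BOB"], "return value.strip().lower()")
```

After the call, `cleaned` holds ordinary strings. The expression is not attached to the cells, so editing a source value later does not trigger any recalculation.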

The program has a web user interface. However, it is not hosted on the web as software as a service (SaaS); it is downloaded and run on the local machine. On startup, OpenRefine launches a web server and opens a browser on the web UI served by that server.
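
The general pattern of a desktop program that serves its own web UI locally can be sketched with Python's standard library. This is an illustration of the pattern only, not OpenRefine's code (OpenRefine itself is written in Java); the names `UIHandler` and `start_local_ui` and the page content are hypothetical.

```python
import threading
import webbrowser
from http.server import BaseHTTPRequestHandler, HTTPServer


class UIHandler(BaseHTTPRequestHandler):
    """Serve a single placeholder page standing in for the application's UI."""

    def do_GET(self):
        body = b"<html><body><h1>Local data tool</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_local_ui(port=0, open_browser=True):
    """Start a server bound to the loopback interface and optionally open a browser."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", port), UIHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    if open_browser:
        webbrowser.open(url)
    return server, url
```

Calling `start_local_ui()` returns the running server and the URL of the UI. Binding to 127.0.0.1 keeps the interface reachable only from the local machine, which matches the description of a locally run, non-hosted application.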

Possible uses of software

Supported formats for import and export

Import is supported from the following formats:[13]

If input data is in a non-standard text format, it can be imported as whole lines, without splitting into columns, and the columns can then be extracted with OpenRefine's tools. Archived and compressed files are supported (.zip, .tar.gz, .tgz, .tar.bz2, .gz, or .bz2), and OpenRefine can download input files from a URL. To use web pages as input, it is possible to import a list of URLs and then invoke a URL fetch function.
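
The line-based approach (import whole lines first, extract columns afterwards) can be sketched in Python as follows. The sketch does not use OpenRefine; the line format, the pattern `LINE_PATTERN` and the functions `read_lines` and `split_line` are assumptions made up for the example.

```python
import gzip
import re


def read_lines(data: bytes, compressed: bool = False):
    """Return the input as a list of whole lines, decompressing gzip data if asked."""
    if compressed:
        data = gzip.decompress(data)
    return data.decode("utf-8").splitlines()


# Hypothetical non-standard format: "[date] LEVEL: message"
LINE_PATTERN = re.compile(r"^\[(?P<date>[^\]]+)\]\s+(?P<level>\w+):\s+(?P<message>.*)$")


def split_line(line: str):
    """Keep the original line and add extracted columns (empty when it does not match)."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return {"line": line, "date": "", "level": "", "message": ""}
    return {"line": line, **match.groupdict()}


raw = gzip.compress(b"[2012-04-18] INFO: started\nnot a record\n")
rows = [split_line(line) for line in read_lines(raw, compressed=True)]
```

Keeping the original line in its own column means rows that do not match the pattern are not lost and can be inspected or handled with a different rule later.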

Export is supported in the following formats:[15]

Whole OpenRefine projects in native format can be exported as a .tar.gz archive.

History

OpenRefine started life as Freebase Gridworks, developed by Metaweb, and has been available as open source since January 2010.[16] On 16 July 2010, Google acquired Metaweb,[17] the creators of Freebase, and on 10 November 2010 renamed the Freebase Gridworks software to Google Refine, releasing version 2.0.[18] On 2 October 2012, original author David Huynh announced that Google would soon stop its active support of Google Refine.[19][20][21] Since then, the codebase has been in transition to an open source project named OpenRefine.[22]

Books

References

  1. "Project downloads".
  2. "Google code repository viewer". Retrieved 18 April 2012.
  3. "OpenRefine Project Home".
  4. "Editing by transforming: Cell Editing wiki page from Refine documentation". Retrieved 18 April 2012.
  5. "Comparison with spreadsheet software: Cell Editing wiki page in Refine documentation". Retrieved 18 April 2012.
  6. "Google Refine expression language: OpenRefine/OpenRefine wiki". GitHub. 3 April 2013. Retrieved 16 August 2013.
  7. "Expressions: Refine documentation". Retrieved 18 April 2012.
  8. "Screencast: Google Refine 2.0 - Introduction (1 of 3) - editing government data". Retrieved 18 April 2012.
  9. "Stripping HTML: Refine documentation wiki page". Retrieved 18 April 2012.
  10. "FetchingURLsFromWebServices wiki page: Refine documentation". Retrieved 18 April 2012.
  11. "Screencast: Google Refine 2.0 - Data Augmentation (3 of 3) - using Openstreetmap Nominatim for geocoding and Freebase for augmentation". Retrieved 18 April 2012.
  12. "Schema Alignment: Refine documentation wiki page". Retrieved 18 April 2012.
  13. "Importers: Refine documentation wiki page". Retrieved 18 April 2012.
  14. "Changelog for 2.5". Retrieved 18 April 2012.
  15. "Exporting: Refine documentation wiki page". Retrieved 18 April 2012.
  16. https://code.google.com/p/google-refine/source/detail?r=2
  17. "Google Official Blog: Deeper understanding with Metaweb". Retrieved 18 April 2012.
  18. "Google Opensource blog: Announcing Google Refine 2.0, a power tool for data wranglers". Retrieved 18 April 2012.
  19. "[announcement] the future of the Refine projects".
  20. "From Freebase Gridworks to Google Refine and now OpenRefine".
  21. "OpenRefine". Retrieved 16 August 2013.
  22. "google-refine: Google Refine, a power tool for working with messy data (formerly Freebase Gridworks)". Google Project Hosting. Retrieved 16 August 2013.


This article is issued from Wikipedia - version of the 11/26/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.