Machine-Readable Documents

Machine-readable documents are documents whose content can be readily processed by computers. Such documents are distinguished from machine-readable data by having sufficient structure to provide the context needed to support the business processes for which they are created. Data without context is meaningless and lacks the four essential characteristics of trustworthy business records specified in ISO 15489, Information and documentation -- Records management: authenticity, reliability, integrity, and usability.

The vast bulk of information is unstructured data and, from a business perspective, that means it is "immature", i.e., at Level 1 (chaotic) of the Capability Maturity Model. Such immaturity fosters inefficiency, diminishes quality, and limits effectiveness. Unstructured information is also ill-suited for records management functions, provides inadequate evidence for legal purposes, drives up the cost of legal discovery in litigation, and makes access and usage needlessly cumbersome in routine, ongoing business processes.

There are at least four aspects to machine-readability:

As early as 1981, the U.S. Government Accountability Office (GAO) began reporting on the problem of inadequate record-keeping practices in the U.S. federal government.[1] Such deficiencies are not unique to government, and advances in information technology mean that most information is now "born digital" and thus potentially far more easily managed by automated means.[2] However, in testimony to Congress in 2010, GAO highlighted problems with managing electronic records, and as recently as 2015, GAO continued to report inadequacies in the performance of Executive Branch agencies in meeting records management requirements.[3][4] Moreover, more than a decade after a major and formerly highly respected auditing firm, Arthur Andersen, met its demise due to a records destruction scandal, record-keeping practices became a central issue in the 2016 Presidential election.

On January 4, 2011, President Obama signed H.R. 2142, the Government Performance and Results Act (GPRA) Modernization Act of 2010 (GPRAMA), into law as P.L. 111-352. Section 10 of GPRAMA requires U.S. federal agencies to publish their strategic and performance plans and reports in searchable, machine-readable format.[5] Additionally, in 2013, he issued Executive Order 13642, "Making Open and Machine Readable the New Default for Government Information", which applies to government information in general.[6] On July 28, 2016, the Office of Management and Budget (OMB) followed up with a revised Circular A-130 that directs agencies to use open, machine-readable formats and to publish "public information online in a manner that promotes analysis and reuse for the widest possible range of purposes", meaning that the information is both publicly accessible and machine-readable.

In support of such policy direction, technological advancement is enabling more efficient and effective management and use of machine-readable electronic records. Document-oriented databases have been developed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Extensible Markup Language (XML) is a World Wide Web Consortium (W3C) Recommendation setting forth rules for encoding documents in a format that is both human-readable and machine-readable. Many XML editor tools have been developed, and most, if not all, major information technology applications support XML to a greater or lesser degree. The fact that XML itself is an open, standard, machine-readable format makes it relatively easy for application developers to provide such support.
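
As a minimal sketch of what "machine-readable" means in practice, the following Python example parses a small XML document with the standard library and extracts its structure. The element and attribute names (plan, goal, name, objective) are hypothetical and chosen only for illustration; they do not follow any official schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical plan fragment; the vocabulary is illustrative, not an official schema.
PLAN_XML = """<plan agency="Example Agency">
  <goal id="g1">
    <name>Improve records management</name>
    <objective>Tag all new records with retention metadata</objective>
    <objective>Publish plans in machine-readable format</objective>
  </goal>
  <goal id="g2">
    <name>Reduce discovery costs</name>
    <objective>Index structured records for search</objective>
  </goal>
</plan>"""


def objectives_by_goal(xml_text):
    """Map each goal's name to the list of its objective texts."""
    root = ET.fromstring(xml_text)
    return {
        goal.findtext("name"): [obj.text for obj in goal.findall("objective")]
        for goal in root.findall("goal")
    }


summary = objectives_by_goal(PLAN_XML)
for goal_name, objectives in summary.items():
    print(f"{goal_name}: {len(objectives)} objective(s)")
```

Because the tags supply the context, a program can answer a question such as "which objectives belong to which goal" without guessing from layout or typography.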

The W3C's accompanying XML Schema (XSD) Recommendation specifies how to formally describe the elements in an XML document. With respect to the specification of XML schemas, the Organization for the Advancement of Structured Information Standards (OASIS) is a leading standards-developing organization. JSON Schema was proposed as an Internet Engineering Task Force (IETF) Internet-Draft, but the draft was allowed to expire in 2013; JSON Schema is thus less mature than, and a riskier alternative to, XSD, the most recent version of which was approved by the W3C in 2012.
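
An XSD schema is itself an XML document, so the description of a document's elements is also machine-readable. The sketch below reads a small, hypothetical schema and lists the elements it declares. It only inspects the schema; it does not validate anything, since Python's standard library has no XSD validator and a schema-aware library would be needed for that.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C XML Schema vocabulary, in ElementTree's {uri}tag form.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema for a "goal" element; names are illustrative only.
SCHEMA_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="goal">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="objective" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def declared_elements(xsd_text):
    """Return (name, type, maxOccurs) for each xs:element, in document order."""
    root = ET.fromstring(xsd_text)
    return [
        (el.get("name"), el.get("type"), el.get("maxOccurs", "1"))
        for el in root.iter(XS + "element")
    ]


elements = declared_elements(SCHEMA_XSD)
for name, type_name, max_occurs in elements:
    print(name, type_name or "(anonymous complex type)", max_occurs)
```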

The W3C's Extensible Stylesheet Language (XSL) family of languages provides for the transformation and rendering of XML documents for human-readable presentation. Machine-readable documents can be automatically rendered in human-readable form, but the reverse does not hold: documents formatted primarily for attractiveness of presentation cannot easily be processed by computers in ways that make them more usable by human beings.
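
The direction of that conversion can be illustrated with a short sketch. Note that this is plain Python standing in for an XSLT stylesheet, not XSL itself, because Python's standard library does not include an XSLT processor; the record and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured record; element names are illustrative only.
RECORD_XML = """<goal id="g1">
  <name>Improve records management</name>
  <objective>Tag all new records with retention metadata</objective>
  <objective>Publish plans in machine-readable format</objective>
</goal>"""


def render_text(xml_text):
    """Render the structured record as an indented, numbered outline."""
    goal = ET.fromstring(xml_text)
    lines = [f"Goal {goal.get('id')}: {goal.findtext('name')}"]
    for number, objective in enumerate(goal.findall("objective"), start=1):
        lines.append(f"  {number}. {objective.text}")
    return "\n".join(lines)


rendered = render_text(RECORD_XML)
print(rendered)
```

Going the other way, from the rendered outline back to goals and objectives, would require guessing at the meaning of indentation and numbering, which is the asymmetry described above.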

The Portable Document Format (PDF) is a file format used to present documents in a manner independent of application software, hardware, and operating systems. Each PDF file encapsulates a complete description of the presentation of the document, including the text, fonts, graphics, and other information needed to display it. PDF/A is an ISO-standardized version of PDF specialized for use in the archiving and long-term preservation of electronic documents. PDF/A-3 allows embedding of other file formats, including XML, into PDF/A-conforming documents, thus potentially providing the best of both human- and machine-readability. The W3C's XSL-FO (XSL Formatting Objects) markup language is commonly used to generate PDF files.

Metadata, data about data, can be used to organize electronic resources, provide digital identification, and support the archiving and preservation of resources. In well-structured, machine-readable electronic records, the content can be repurposed as both data and metadata. In the context of electronic record-keeping systems, the terms "management" and "metadata" are virtually synonymous. Given proper metadata, records management functions can be automated, thereby reducing the risk of spoliation of evidence and other fraudulent manipulations of records. Moreover, such records can be used to automate the process of auditing data maintained in databases, thereby reducing the risk of single points of failure associated with the Machiavellian concept of a single source of truth.
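
How metadata can drive an automated records management function may be sketched as follows. The field names (record_id, created, retention_years, legal_hold) and the disposal rule are assumptions made for illustration; real retention schedules are set by law and organizational policy.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RecordMetadata:
    """Hypothetical metadata attached to one record."""
    record_id: str
    created: date
    retention_years: int
    legal_hold: bool = False


def expiry_date(created, retention_years):
    """Date on which the retention period ends."""
    target_year = created.year + retention_years
    try:
        return created.replace(year=target_year)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        return created.replace(year=target_year, day=28)


def disposition(meta, today):
    """Decide what to do with a record using its metadata alone."""
    if meta.legal_hold:
        return "retain (legal hold)"
    if today >= expiry_date(meta.created, meta.retention_years):
        return "eligible for disposal"
    return "retain"


records = [
    RecordMetadata("R-001", date(2005, 3, 1), 7),
    RecordMetadata("R-002", date(2015, 6, 30), 7),
    RecordMetadata("R-003", date(2004, 1, 15), 7, legal_hold=True),
]
today = date(2016, 11, 15)
decisions = {meta.record_id: disposition(meta, today) for meta in records}
print(decisions)
```

The decision depends only on the metadata, so it can be applied uniformly and audited, rather than relying on someone reading each record.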

Blockchain is a new technology for maintaining continuously growing lists of records secured from tampering and revision. A key feature is that every node in a decentralized system has a copy of the blockchain, so there is no single point of failure subject to manipulation and fraud.
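
The tamper-evidence property can be illustrated with a simple hash chain. This is a single-process sketch under simplifying assumptions: it omits the decentralized replication and consensus that a real blockchain relies on, and the block layout and function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, record, prev_hash):
    """SHA-256 over a canonical JSON encoding of the block's contents."""
    payload = json.dumps(
        {"index": index, "record": record, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain, record):
    """Append a block that commits to the hash of the previous block."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "index": index,
        "record": record,
        "prev_hash": prev_hash,
        "hash": block_hash(index, record, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check that each block links to its predecessor."""
    prev_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, block["record"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


chain = []
for record in ("record A filed", "record B filed", "record C filed"):
    append_record(chain, record)

# Simulate tampering with a copy: change a record without recomputing hashes.
tampered = [dict(block) for block in chain]
tampered[1]["record"] = "record B altered"

print(is_valid(chain), is_valid(tampered))
```

Altering any earlier record changes its hash, which no longer matches the value stored in the block, so the revision is detectable by anyone holding a copy of the chain.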

This article is issued from Wikipedia - version of the 11/15/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.