updated specs, removed the foreign keys hack, updated README, table API feedback #28

Merged
merged 5 commits into from
Jul 13, 2017
241 changes: 163 additions & 78 deletions README.md
@@ -10,123 +10,208 @@
A utility library for working with [Table Schema](https://specs.frictionlessdata.io/table-schema/) in php.


## Features
## Features summary and Usage guide

### Installation

```bash
$ composer require frictionlessdata/tableschema
```

### Schema

A model of a schema with helpful methods for working with the schema and supported data.
The Schema class provides helpful methods for working with a table schema and related data.

`use frictionlessdata\tableschema\Schema;`

Schema objects can be constructed using any of the following:

* php object
```php
$schema = new Schema((object)[
'fields' => [
(object)[
'name' => 'id', 'title' => 'Identifier', 'type' => 'integer',
'constraints' => (object)[
"required" => true,
"minimum" => 1,
"maximum" => 500
]
],
(object)['name' => 'name', 'title' => 'Name', 'type' => 'string'],
],
'primaryKey' => 'id'
]);
```

* string containing json
* string containing a value supported by [file_get_contents](http://php.net/manual/en/function.file-get-contents.php)
```php
$schema = new Schema("{
\"fields\": [
{\"name\": \"id\"},
{\"name\": \"height\", \"type\": \"integer\"}
]
}");
```

You can use the Schema::validate static function to load and validate a schema.
It returns a list of loading or validation errors encountered.
* string containing a value supported by [file_get_contents](http://php.net/manual/en/function.file-get-contents.php)
```
$schema = new Schema("https://raw.githubusercontent.com/frictionlessdata/testsuite-extended/ecf1b2504332852cca1351657279901eca6fdbb5/datasets/synthetic/schema.json");
```

### Table
The schema is loaded, parsed and validated and will raise exceptions in case of any problems.

Provides methods for loading any fopen compatible data source and iterating over the data.
Access the schema data, which is guaranteed to conform to the specs.

* Data is validated according to a given table schema
* Data is converted to native types according to the schema
```
$schema->missingValues(); // [""]
$schema->primaryKey(); // ["id"]
$schema->foreignKeys(); // []
$schema->fields(); // ["id" => IntegerField, "name" => StringField]
$field = $schema->field("id");
$field->format(); // "default"
$field->name(); // "id"
$field->type(); // "integer"
$field->constraints(); // (object)["required"=>true, "minimum"=>1, "maximum"=>500]
$field->enum(); // []
$field->required(); // true
$field->unique(); // false
```

The validate function accepts the same arguments as the Schema constructor but returns a list of errors instead of raising exceptions.
```
// the validate function accepts the same arguments as the Schema constructor
$validationErrors = Schema::validate("http://invalid.schema.json");
foreach ($validationErrors as $validationError) {
print($validationError->getMessage());
}
```

## Important Notes
Validate and cast a row of data according to the schema:
```
$row = $schema->castRow(["id" => "1", "name" => "First Name"]);
```

- Table Schema is in transition to v1, but many data packages in the wild are still pre-v1
- At the moment I am developing this library with support only for v1
- See [this Gitter discussion](https://gitter.im/frictionlessdata/chat?at=58df75bfad849bcf423e5d80) about this transition
castRow raises an exception if the row fails validation.

Otherwise it returns the row with all values cast to native types:

## Getting Started
```
$row // ["id" => 1, "name" => "First Name"];
```
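
The cast-or-raise behaviour described above can be sketched in plain PHP. This is only an illustration under stated assumptions: `castRowSketch` and its `$fieldTypes` map are hypothetical names invented here, not part of the library, and the sketch handles only the `integer` and `string` types.

```php
<?php
// Hypothetical stand-in for illustration only; this is NOT the library's castRow().
// $fieldTypes maps a field name to a type name ("integer" or "string").
function castRowSketch(array $fieldTypes, array $row): array
{
    $cast = [];
    foreach ($fieldTypes as $name => $type) {
        if (!array_key_exists($name, $row)) {
            throw new InvalidArgumentException("$name is required");
        }
        $value = $row[$name];
        if ($type === 'integer') {
            // FILTER_VALIDATE_INT returns false when the value is not an integer
            $int = filter_var($value, FILTER_VALIDATE_INT);
            if ($int === false) {
                throw new InvalidArgumentException("$name is not an integer");
            }
            $value = $int;
        }
        $cast[$name] = $value;
    }
    return $cast;
}

// The string "1" becomes the native integer 1; "name" is passed through unchanged.
$row = castRowSketch(
    ['id' => 'integer', 'name' => 'string'],
    ['id' => '1', 'name' => 'First Name']
);
var_dump($row);
```

Raising on the first failure mirrors the exception-based flow; collecting messages into an array instead would give the error-list flow.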

### Installation
Validate the row to get a list of errors:

```bash
$ composer require frictionlessdata/tableschema
```
$schema->validateRow(["id" => "foobar"]); // ["id is not numeric", "name is required" .. ]
```

### Table

### Usage
The Table class allows iterating over data that conforms to a table schema.

Instantiate a Table object based on a data source and a table schema.

```php
use frictionlessdata\tableschema\Schema;
use frictionlessdata\tableschema\Table;

// construct schema from json string
$schema = new Schema('{
"fields": [
{"name": "id"},
{"name": "height", "type": "integer"}
]
}');
$table = new Table("tests/fixtures/data.csv", ["fields" => [
["name" => "first_name"],
["name" => "last_name"],
["name" => "order"]
]]);
```

// schema will be parsed and validated against the json schema (under src/schemas/table-schema.json)
// will raise exception in case of validation error
Schema can be any parameter valid for the Schema object, so you can use a url or filename which contains the schema

// access in php after validation
$schema->descriptor->fields[0]->name == "id"
```php
$table = new Table("tests/fixtures/data.csv", "tests/fixtures/data.json");
```

// validate a schema from a remote resource and getting list of validation errors back
$validationErrors = tableschema\Schema::validate("https://raw.githubusercontent.com/frictionlessdata/testsuite-extended/ecf1b2504332852cca1351657279901eca6fdbb5/datasets/synthetic/schema.json");
foreach ($validationErrors as $validationError) {
print($validationError->getMessage());
}
Iterate over the data; all values are cast and validated according to the schema.

// validate and cast a row according to schema
$schema = new Schema('{"fields": [{"name": "id", "type": "integer"}]}');
$row = $schema->castRow(["id" => "1"]);
// raise exception if row fails validation
// returns row with all native values

// validate a row
$validationErrors = $schema->validateRow(["id" => "foobar"]);
// error that id is not numeric

// iterate over a remote data source conforming to a table schema
$table = new tableschema\Table(
new tableschema\DataSources\CsvDataSource("http://www.example.com/data.csv"),
new tableschema\Schema("http://www.example.com/data-schema.json")
);
foreach ($table as $person) {
print($person["first_name"]." ".$person["last_name"]);
}

// validate a remote data source
$validationErrors = tableschema\Table::validate($dataSource, $schema);
print(tableschema\SchemaValidationError::getErrorMessages($validationErrors));

// infer schema of a remote data source
$dataSource = new tableschema\DataSources\CsvDataSource("http://www.example.com/data.csv");
$schema = new tableschema\InferSchema();
$table = new tableschema\Table($dataSource, $schema);
```php
foreach ($table as $row) {
var_dump($row); // row will be in inferred native values
var_dump($schema->descriptor()); // will contain the inferred schema descriptor
// the more iterations you make, the more accurate the inferred schema might be
// once you are satisfied with the schema, lock it
$rows = $schema->lock();
// it returns all the rows received until the lock, cast to the final inferred schema
// you may now continue to iterate over the rest of the rows
print($row["order"]." ".$row["first_name"]." ".$row["last_name"]."\n");
};
```

The validate function validates the schema and also reads a sample of the data to validate it against the schema.

```php
Table::validate(new CsvDataSource("http://invalid.data.source/"), $schema);
```

You can instantiate a Table object without a schema; in this case the schema is inferred automatically from the data.

```php
$table = new Table("tests/fixtures/data.csv");
$table->schema()->fields(); // ["first_name" => StringField, "last_name" => StringField, "order" => IntegerField]
```

// schema creation, editing and saving
Additional methods and functionality

```php
$table->headers() // ["first_name", "last_name", "order"]
$table->save("output.csv") // iterate over all the rows and save them to a csv file
$table->schema() // get the Schema object
$table->read() // returns all the data as an array
```

### EditableSchema

EditableSchema extends the Schema object with editing capabilities

```
use frictionlessdata\tableschema\EditableSchema;
use frictionlessdata\tableschema\Fields\FieldsFactory;

// EditableSchema extends the Schema object with editing capabilities
$schema = new EditableSchema();
// set fields
```

edit fields
```
$schema->fields([
"id" => FieldsFactory::field((object)["name" => "id", "type" => "integer"])
"id" => (object)["type" => "integer"],
"name" => (object)["type" => "string"],
]);
// remove field
$schema->removeField("age");
// edit primaryKey
```

appropriate field object is created according to the given descriptor
```
$schema->field("id"); // IntegerField object
```

add / update or remove fields

```
$schema->field("email", (object)["type" => "string", "format" => "email"]);
$schema->field("name", (object)["type" => "string"]);
$schema->removeField("name");
```

set or update other table schema attributes
```
$schema->primaryKey(["id"]);
```


// after every change - schema is validated and will raise Exception in case of validation errors
// finally, you can save the schema to a json file
After every change the schema is validated, and an Exception is raised in case of validation errors.

Finally, save the schema to a json file:

```
$schema->save("my-schema.json");
```


## Important Notes

- Table Schema is in transition to v1, but many data packages in the wild are still pre-v1
- At the moment I am developing this library with support only for v1
- See [this Gitter discussion](https://gitter.im/frictionlessdata/chat?at=58df75bfad849bcf423e5d80) about this transition


## Contributing

Please read the contribution guidelines: [How to Contribute](CONTRIBUTING.md)
6 changes: 3 additions & 3 deletions src/DataSources/BaseDataSource.php
@@ -13,14 +13,14 @@ public function __construct($dataSource, $options = null)
$this->options = empty($options) ? (object) [] : $options;
}

public function open()
{
}
abstract public function open();

abstract public function getNextLine();

abstract public function isEof();

abstract public function save($outputDataSource);

public function close()
{
}
10 changes: 10 additions & 0 deletions src/DataSources/CsvDataSource.php
@@ -95,6 +95,16 @@ public function close()
}
}

public function save($outputDataSource)
{
$file = fopen($outputDataSource, 'w');
fputcsv($file, $this->headerRow);
while (!$this->isEof()) {
fputcsv($file, array_values($this->getNextLine()));
}
fclose($file);
}

protected $resource;
protected $headerRow;
protected $skippedRows;
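
The header-then-rows loop in the new `save` method can be tried in isolation. The following is a minimal sketch, assuming rows are arrays keyed by header name; `saveRowsAsCsv` is a hypothetical helper written for this illustration, not a method of `CsvDataSource`, and unlike the hunk above it checks the `fopen` result before writing.

```php
<?php
// Hypothetical helper for illustration; not part of the library.
// Writes the header row first, then each data row, as CsvDataSource::save does.
function saveRowsAsCsv(array $headerRow, iterable $rows, string $outputPath): void
{
    $file = fopen($outputPath, 'w');
    if ($file === false) {
        throw new RuntimeException("unable to open $outputPath for writing");
    }
    try {
        // separator, enclosure and escape are passed explicitly to avoid
        // relying on defaults that newer PHP versions deprecate
        fputcsv($file, $headerRow, ',', '"', '\\');
        foreach ($rows as $row) {
            // rows are keyed by header name; array_values drops the keys
            fputcsv($file, array_values($row), ',', '"', '\\');
        }
    } finally {
        fclose($file);
    }
}

$path = tempnam(sys_get_temp_dir(), 'csv');
saveRowsAsCsv(
    ['first_name', 'order'],
    [['first_name' => 'Ada', 'order' => 1]],
    $path
);
echo file_get_contents($path);
```

Because `array_values` ignores the keys, the columns line up with the header only if every row lists its fields in header order.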
7 changes: 7 additions & 0 deletions src/DataSources/DataSourceInterface.php
@@ -26,6 +26,13 @@ public function open();
*/
public function getNextLine();

/**
* iterate over all rows and save to the given output data source.
*
* @param $outputDataSource
*/
public function save($outputDataSource);

/**
* @return bool
*/
10 changes: 10 additions & 0 deletions src/DataSources/NativeDataSource.php
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,11 @@
*/
class NativeDataSource extends BaseDataSource
{
public function open()
{
// no opening is needed for native data source
}

/**
* @return array
*
@@ -30,5 +35,10 @@ public function isEof()
return $this->curRowNum >= count($this->dataSource);
}

public function save($output)
{
// no point in saving for native data source
}

protected $curRowNum = 0;
}
13 changes: 11 additions & 2 deletions src/EditableSchema.php
@@ -2,6 +2,8 @@

namespace frictionlessdata\tableschema;

use frictionlessdata\tableschema\Fields\FieldsFactory;

class EditableSchema extends Schema
{
public function __construct($descriptor = null)
@@ -12,9 +14,16 @@ public function __construct($descriptor = null)
public function fields($newFields = null)
{
if (!is_null($newFields)) {
$this->fieldsCache = $newFields;
$this->descriptor()->fields = [];
foreach ($newFields as $field) {
$this->fieldsCache = [];
foreach ($newFields as $name => $field) {
if (!is_a($field, 'frictionlessdata\\tableschema\\Fields\\BaseField')) {
if (!isset($field->name)) {
$field->name = $name;
}
$field = FieldsFactory::field($field);
}
$this->fieldsCache[$name] = $field;
$this->descriptor()->fields[] = $field->descriptor();
}

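
The key-to-name fill-in that the updated `EditableSchema::fields()` performs can be illustrated on plain descriptors. This is a simplified sketch: `normalizeFieldDescriptors` is a hypothetical function, it works on `stdClass` descriptors only, and it skips the `FieldsFactory::field()` step that the real method uses to build field objects. It also clones each descriptor, whereas the method in the diff sets `name` on the object it was given.

```php
<?php
// Hypothetical sketch for illustration; not part of the library.
// Fills in a missing "name" from the array key and returns a plain list of descriptors.
function normalizeFieldDescriptors(array $newFields): array
{
    $descriptors = [];
    foreach ($newFields as $name => $field) {
        // clone so the caller's descriptor objects are left untouched
        $field = clone $field;
        if (!isset($field->name)) {
            $field->name = $name;
        }
        $descriptors[] = $field;
    }
    return $descriptors;
}

$fields = normalizeFieldDescriptors([
    'id' => (object) ['type' => 'integer'],
    'name' => (object) ['type' => 'string'],
]);
echo json_encode($fields);
```

This is what lets the README pass `"id" => (object)["type" => "integer"]` without repeating the name inside the descriptor.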