Each dataset defines modules to read and write its data.
For most datasets the reading module only contains additional
metadata like class labels and distributions.
Let’s consider the MIT1003 dataset as an example.
MIT1003_write is an executable that creates the dataset files.
It can be called directly:

```shell
python -m datadings.sets.MIT1003_write
```
Three files will be written:

- `MIT1003.msgpack` contains the sample data
- `MIT1003.msgpack.index` contains the index for random access
- `MIT1003.msgpack.md5` contains MD5 hashes of both files
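The `.md5` file can be used to check the other two files for corruption. A minimal sketch, assuming the common `<hexdigest>  <filename>` per-line format (the helper names here are illustrative, not part of the datadings API):

```python
import hashlib


def md5sum(path, chunk=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(chunk), b''):
            h.update(block)
    return h.hexdigest()


def verify(md5file):
    """Check every '<hexdigest>  <filename>' line in an .md5 file."""
    with open(md5file) as f:
        return all(
            md5sum(name) == digest
            for digest, name in (line.split() for line in f if line.strip())
        )
```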
Reading all samples sequentially, using the
MsgpackReader as a context manager:

```python
from datadings.reader import MsgpackReader

with MsgpackReader('MIT1003.msgpack') as reader:
    for sample in reader:
        # do dataset things!
        print(sample['key'])
```
This standard iterator returns dictionaries.
Use the rawiter() method to get samples as msgpack-encoded
bytes instead.
Reading specific samples:

```python
reader.seek_key('i14020903.jpeg')
print(reader.next()['key'])
reader.seek_index(100)
print(reader.next()['key'])
```
Reading samples as raw msgpack-encoded bytes:

```python
raw = reader.rawnext()
for raw in reader.rawiter():
    print(type(raw), len(raw))
```
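Raw samples can be turned back into dictionaries with the msgpack package. A small sketch; the sample contents below are made up for illustration, standing in for bytes as returned by rawnext() or rawiter():

```python
import msgpack

# illustrative stand-in for the raw bytes a reader would yield
raw = msgpack.packb({'key': 'i14020903.jpeg', 'label': 3})

# unpack back into a dictionary; raw=False decodes strings as str
sample = msgpack.unpackb(raw, raw=False)
print(sample['key'])
```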
The total number of samples is returned by `len(reader)`.
Samples can also be read in random order with the Shuffler:

```python
from datadings.reader import MsgpackReader, Shuffler

with Shuffler(MsgpackReader('MIT1003.msgpack')) as reader:
    for sample in reader:
        # do dataset things, but in random order!
        print(sample['key'])
```
A common use case is to iterate over the whole dataset multiple times.
This can be done with the Cycler:

```python
from datadings.reader import Cycler, MsgpackReader

with Cycler(MsgpackReader('MIT1003.msgpack')) as reader:
    for sample in reader:
        # do dataset things, but FOREVER!
        print(sample['key'])
```