All AI and machine learning projects rely on data to learn, train and operate.
In an effort to help developers more easily benefit from labeled datasets and machine learning models for computer vision, Roboflow today announced an expansion of the datasets and AI models available as part of its Roboflow Universe initiative, which the company is positioning as one of the largest open-source repositories of its kind. Roboflow claims that Roboflow Universe, which launched in August 2021, already hosts more than 90,000 datasets comprising more than 66 million images.
Roboflow was founded in 2019 and raised $20 million in a Series A funding round in September 2021. Alongside the Universe open-source repository of computer vision datasets and models, the company provides data labeling, model development and hosting capabilities. Roboflow’s business model is to offer a free tier for entry-level users; as usage grows, or for organizations working on proprietary datasets, the company offers paid support and service options.
Roboflow Universe isn’t about just providing images that can be used by a developer; it’s about providing images that are curated in a way that makes the datasets usable for AI-powered applications.
“A project is basically something that involves a dataset that can be used by one person and a trained model on top of the dataset,” Joseph Nelson, co-founder and CEO of Roboflow, told VentureBeat. “The dataset is the images as well as the annotations.”
The better the data, the better the labels
Nelson said organizations often spend a lot of time preparing machine learning data.
The data preparation process involves labeling and classifying the data so that a model can be trained effectively. Nelson said labeling in Roboflow Universe goes beyond a simple description of an image.
Labels that can be included in a Roboflow Universe dataset include bounding boxes, which draw a rectangle around an object and help a model detect that object in a crowded scene. Another type of labeling that Roboflow supports is instance segmentation, which traces a polygon-shaped outline that closely follows the object of interest.
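To make the difference between the two label types concrete, here is a minimal sketch in plain Python. It is not Roboflow code: the function names and the triangular example object are assumptions made for illustration. It derives a bounding box from a segmentation polygon and compares how much area each label covers.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def polygon_to_bbox(polygon: List[Point]) -> Box:
    """Return the tightest axis-aligned box that encloses the polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def polygon_area(polygon: List[Point]) -> float:
    """Area of a simple polygon, computed with the shoelace formula."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bbox_area(box: Box) -> float:
    """Area of an axis-aligned box."""
    xmin, ymin, xmax, ymax = box
    return (xmax - xmin) * (ymax - ymin)


# Hypothetical triangular object outlined by an instance segmentation label.
mask = [(10.0, 10.0), (50.0, 10.0), (30.0, 40.0)]
box = polygon_to_bbox(mask)

print("bounding box:", box)
print("polygon area:", polygon_area(mask))
print("box area:", bbox_area(box))
```

In this example the polygon covers only half of its bounding box, which illustrates why a segmentation outline can describe an object more tightly than a box can.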
The data labeling formats used in machine learning are often complex and varied. To address that, Nelson said Roboflow supports exporting data in 36 annotation formats. Among the supported formats are COCO JSON, VOC XML and the YOLO Darknet TXT format.
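The formats differ mainly in how they encode the same box. The sketch below is an illustration and not Roboflow's exporter: it converts one VOC-style corner box into the COCO convention (top-left corner plus width and height, in pixels) and the YOLO Darknet convention (center and size, normalized by the image dimensions). The function names and sample numbers are assumptions chosen for the example.

```python
from typing import List, Tuple

VocBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def voc_to_coco(box: VocBox) -> List[float]:
    """COCO stores a box as [x, y, width, height] in absolute pixels."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def voc_to_yolo(box: VocBox, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """YOLO Darknet stores (x_center, y_center, width, height), each scaled to 0..1."""
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return (x_center, y_center, width, height)


# Hypothetical annotation on a 1000x800 image.
voc_box = (100.0, 200.0, 300.0, 400.0)
coco_box = voc_to_coco(voc_box)
yolo_box = voc_to_yolo(voc_box, img_w=1000, img_h=800)

print("COCO:", coco_box)
print("YOLO:", yolo_box)
```

In a real YOLO Darknet TXT file each line also begins with an integer class ID, which is omitted here to keep the focus on the box geometry.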
“Making image data widely available and applicable means that someone can easily find a dataset, pull it into their training pipeline, and get up and going,” Nelson said.
How developers integrate Roboflow Universe datasets into applications
Bringing computer vision data and models into AI-driven applications can often be a complex undertaking.
Nelson’s goal at Roboflow is to help reduce that complexity. He said Roboflow Universe datasets can be accessed through open APIs. For example, he noted that Roboflow has a Python package hosted on the Python Package Index (PyPI) that allows developers to programmatically pull images, annotations and models and then embed those components directly into an application.
Deploying a Roboflow Universe model to popular cloud machine learning services, including Amazon SageMaker or Google’s Vertex AI, is also a straightforward operation via an API call, according to Nelson. In addition, Roboflow makes datasets and models available as Docker containers, enabling deployment to edge devices. There is also a software development kit (SDK) for Apple iOS devices.
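As a rough sketch of what that programmatic access can look like, the snippet below wraps the `roboflow` package from PyPI. It assumes the package's workspace, project, version and download call chain; the `workspace/project/version` reference string, the helper names and the placeholder identifiers are hypothetical and only serve the example. The package is imported inside the function so the helper can be loaded without it installed.

```python
from typing import Tuple


def parse_dataset_ref(ref: str) -> Tuple[str, str, int]:
    """Split a hypothetical 'workspace/project/version' string into its parts."""
    workspace, project, version = ref.strip("/").split("/")
    return workspace, project, int(version)


def download_dataset(api_key: str, ref: str, fmt: str = "coco"):
    """Download one dataset version in the requested annotation format.

    Assumes the `roboflow` package is installed (pip install roboflow) and
    that `api_key` is a valid key for the account being used.
    """
    from roboflow import Roboflow  # imported lazily; requires network access to use

    workspace, project, version = parse_dataset_ref(ref)
    rf = Roboflow(api_key=api_key)
    return rf.workspace(workspace).project(project).version(version).download(fmt)


# Example call with placeholder values (not executed here):
# dataset = download_dataset("YOUR_API_KEY", "my-workspace/my-project/1", fmt="coco")
```

The workspace and project names shown are placeholders; the real values come from the dataset's page on Roboflow Universe.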
“If we make it easy to use a model wherever you want to use it, then an engineer can focus their time on the thing that actually drives their business,” Nelson said.
The intersection of open source models and AI bias
Making it easier to access computer vision datasets and models for building applications is an important goal for Roboflow. Another effect of having such a large corpus of open-source data is that it can help address concerns about AI bias.
“AI bias is never a solvable problem,” Nelson said. “But providing clarity, access and discovery can help.”
Nelson explained that addressing AI bias is partly a matter of understanding why a model makes a particular decision. Fundamentally, a model’s behavior is shaped by the data it is trained on. With a larger dataset that includes more diversity, a model can be more representative, with less risk of bias.
“Ultimately a lot of AI bias problems come from low representation,” Nelson said. “The way to fix underrepresentation is to enable the active collection of datasets of the underrepresented class, and make that data accessible, searchable and usable.”
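One simple way to spot low representation before training is to count how often each class appears in a dataset's annotations. The sketch below is illustrative only and is not a method described by Roboflow: the function name, the 10% threshold and the sample labels are all assumptions.

```python
from collections import Counter
from typing import Iterable, List


def underrepresented_classes(labels: Iterable[str], min_share: float = 0.1) -> List[str]:
    """Return classes whose share of all annotations falls below min_share."""
    counts = Counter(labels)
    total = sum(counts.values())
    # An empty input yields no classes, so the division is never reached.
    return sorted(name for name, n in counts.items() if n / total < min_share)


# Hypothetical annotation labels from a traffic dataset.
labels = ["car"] * 70 + ["truck"] * 25 + ["bicycle"] * 5
flagged = underrepresented_classes(labels, min_share=0.1)

print("classes needing more data:", flagged)
```

Here only "bicycle" falls under the threshold, which would make it a candidate for the kind of targeted data collection Nelson describes.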