FileMarket | Text Recognition Data | 50,000 Images | Computer Vision Data | AI Model Training Data | Textual data | Annotated Imagery Data
Pre-collected OCR datasets include images of natural scenes, handwritten texts, bills and documents, and test papers. The AI training data spans 20 languages, various natural environments, and diverse photographic angles.
Annotated Imagery Data: FileMarket provides a robust Annotated Imagery Data set designed for a range of computer vision and machine learning tasks. This dataset is part of our extensive offerings, which also include Textual Data, Object Detection Data, Large Language Model (LLM) Data, and Deep Learning (DL) Data. Each category is meticulously crafted to ensure high-quality and comprehensive datasets that empower AI development.
Specifications:
- Data Size: 50,000 images
- Collection Environment: The images cover a wide array of real-world scenarios, including shop signs, stop boards, posters, tickets, road signs, comics, cover pictures, prompts/reminders, warnings, packaging instructions, menus, building signs, and more.
- Diversity: The dataset spans 5 languages and includes images from various natural scenes captured at multiple photographic angles (looking up, looking down, eye-level).
- Devices Used: Images are captured using cellphones and cameras, reflecting real-world usage.
- Image Parameters: All images are provided in .jpg format, and the corresponding annotation files are in .json format.
- Annotation Details: The dataset includes line-level quadrilateral bounding box annotations and text transcriptions.
- Accuracy: The error margin for each vertex of the quadrilateral bounding box is within 5 pixels, ensuring bounding box accuracy of at least 97%. The text transcription accuracy also meets or exceeds 97%.
Unique Data Collection Method: FileMarket uses a community-driven approach to collect data, leveraging our extensive network of over 700k users across various Telegram apps. This method ensures that our datasets are diverse, real-world applicable, and ethically sourced, with full participant consent.
This approach allows us to provide datasets that are both comprehensive and reflective of real-world scenarios, ensuring that your AI models are trained on the most relevant and diverse data available. By integrating our unique data collection method with the specialized categories we offer, FileMarket is committed to providing high-quality data solutions that support and enhance your AI and machine learning projects.
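The listing describes line-level quadrilateral boxes with transcriptions stored in .json files and a per-vertex error margin of 5 pixels. As a rough sketch of how a buyer might read and spot-check such annotations: the schema is not published in the listing, so the field names `lines`, `points`, and `text`, and the sample record itself, are assumptions made for illustration only.

```python
import json
import math

# Hypothetical annotation record: the listing does not publish its JSON schema,
# so the field names "lines", "points", and "text" are assumptions.
SAMPLE = """
{"image": "sign_0001.jpg",
 "lines": [{"points": [[10, 20], [210, 22], [209, 60], [11, 58]],
            "text": "OPEN 24 HOURS"}]}
"""

TOLERANCE_PX = 5  # per-vertex error margin stated in the listing


def load_lines(raw):
    """Parse one annotation file and return (quadrilateral, text) pairs."""
    record = json.loads(raw)
    result = []
    for line in record["lines"]:
        quad = [tuple(p) for p in line["points"]]
        if len(quad) != 4:
            raise ValueError("expected 4 vertices per line-level box")
        result.append((quad, line["text"]))
    return result


def within_tolerance(quad, reference, tol=TOLERANCE_PX):
    """True if every vertex lies within `tol` pixels of its reference vertex."""
    return all(math.dist(a, b) <= tol for a, b in zip(quad, reference))
```

A spot check of this kind compares a delivered box against an independently re-annotated reference box for the same text line; it does not by itself confirm the 97% figure.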
FileMarket | Diverse Human Face Data | 20,000 IDs | Face Recognition Data | Image/Video AI Training Data | Biometric Data
Our pre-compiled biometric data set (human faces) includes comprehensive features such as 3D depth, segmentation of facial organs and accessories, key points, facial expressions, alpha matte, and a range of ages. All biometric data is gathered with signed authorization agreements.
Biometric Data: FileMarket provides a comprehensive Biometric Data set, ideal for enhancing AI applications in security, identity verification, and more. In addition to Biometric Data, we offer specialized datasets across Object Detection Data, Machine Learning (ML) Data, Large Language Model (LLM) Data, and Deep Learning (DL) Data. Each dataset is meticulously crafted to support the development of cutting-edge AI models.
- Data Size: 20,000 IDs
- Race Distribution: The dataset encompasses individuals from diverse racial backgrounds, including Black, Caucasian, Indian, and Asian groups.
- Gender Distribution: The dataset equally represents all genders, ensuring a balanced and inclusive collection.
- Age Distribution: The data spans a broad age range, including young, middle-aged, and senior individuals.
- Collection Environment: Data has been gathered in both indoor and outdoor environments, ensuring variety and relevance for real-world applications.
- Data Diversity: This dataset includes a rich variety of face poses, racial backgrounds, age groups, lighting conditions, and scenes, making it well suited for robust biometric model training.
- Device: All data has been collected using mobile phones, reflecting common real-world usage scenarios.
- Data Format: The data is provided in .jpg and .png formats, ensuring compatibility with various processing tools and systems.
- Accuracy: The labels for face pose, race, gender, and age exceed 95% accuracy, making this dataset reliable for training high-performance biometric models.
High-quality mobility data aggregated from multiple location-aware mobile apps and SDKs globally. This dataset provides comprehensive insights into movement patterns with daily updates. All data is collected with explicit user consent and anonymized following privacy standards.
Key features:
- 90 billion location records globally
- Daily data collection and delivery
- Complete device movement data, including coordinates, timestamps, and accuracy metrics
- Geographic data, including country, state, city, and postal codes
- Detailed device information, including carriers and user agents
- Advanced location encoding via geohash, hex8, and hex9 systems
Perfect for consumer insight analysis, market intelligence, targeted advertising, and retail analytics applications. Data is available in daily/weekly/monthly/quarterly delivery options.
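The listing mentions geohash as one of its location encodings. For readers unfamiliar with it, the following is a minimal sketch of the standard public geohash algorithm, which alternately bisects the longitude and latitude ranges and packs every five bits into a base-32 character. It illustrates the general technique, not the vendor's pipeline; the function name and default precision are choices made here. The listing does not define hex8 and hex9, so they are not covered.

```python
# Standard geohash alphabet (omits the letters a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string of `precision` chars."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash starts with a longitude bit, then alternates

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

Because nearby points usually share a prefix, truncating a geohash is a cheap way to group records into coarser cells, for example `geohash_encode(lat, lon, 5)` for roughly neighbourhood-sized buckets.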
Factori Visit Data connects people's movements to over 200 million physical locations globally, powering geographical information system (GIS) tools and providing data-driven insights across multiple industries. These aggregated and anonymized data points offer valuable context for the volume and patterns of visits to locations worldwide.
Key features:
- POI/Place/OOH level insights based on Factori's Mobility & People Graph data
- Foot-traffic attribution using combined location attributes
- Time-based analysis by day of week and part of day
- Home and work location distribution of visitors
- Visitor country origin and travel patterns
- Visitor demographic breakdowns including age and gender
- Device brand, model, and carrier information
- Place category and brand affinity metrics
- Geo-behavioral interest mapping
Perfect for credit scoring applications, retail analytics, market intelligence, and urban planning. Financial services can validate locations for alternative credit scoring, retailers can analyze footfall trends, marketers can study competitive landscapes, and urban planners can build cases for development based on fresh population data. Data is collected dynamically and provided through flexible delivery schedules.
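To make the "day of week and part of day" feature concrete, here is a small sketch that buckets visit timestamps into those two dimensions. The part-of-day boundaries, the ISO timestamp format, and the function names are assumptions for illustration; the listing does not say how Factori defines them.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def part_of_day(hour):
    """Map an hour (0-23) to a coarse bucket; the boundaries are assumed."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 21:
        return "evening"
    return "night"


def visit_profile(timestamps):
    """Count visits per (day of week, part of day) from ISO 8601 strings."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        counts[(DAYS[dt.weekday()], part_of_day(dt.hour))] += 1
    return counts
```

Calling `visit_profile` on the visit timestamps for one place yields a small table that can be compared across locations or weeks.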
FileMarket | 20,000 Photos of Palms | AI Training Data | Large Language Model (LLM) Data | Machine Learning (ML) Data | Deep Learning (DL) Data
Overview: FileMarket's dataset provides 20,000 high-resolution images of palms, captured in a controlled environment to ensure consistent lighting and clarity. The dataset features a variety of palm types, from different angles and lighting conditions, making it a useful resource for training AI models in areas such as object detection, plant recognition, and environmental applications.
What Makes This Data Unique? This dataset is distinctive for its comprehensive and diverse representation of palms. The images were captured by professional photographers in a studio setting, ensuring uniformity in quality and lighting. The wide range of palm types, along with various angles and poses, allows for nuanced model training, including distinguishing between species, leaf shapes, and growth patterns. The consistency of the imagery reduces the need for preprocessing, enabling quicker integration into machine learning and deep learning workflows.
Data Sourcing: The palm images were sourced through professional shoots in a studio environment, giving consistency across the dataset. Each image is shot with lighting and framing chosen to enhance visual clarity. The photographers have experience in nature and botanical photography, so the photos are suited for scientific and technical applications.
Primary Use-Cases: This dataset can be used in a wide array of AI and machine learning contexts, including:
- Object Detection Data: The high clarity and consistent imagery suit training models that focus on detecting palm trees, their leaves, and different types of foliage.
- Machine Learning (ML) Data: The diversity of palm species and the variety of captured angles provide a robust dataset for training models aimed at plant identification, classification, and recognition.
- Deep Learning (DL) Data: The multi-angle shots of palms suit deep learning applications that require complex features, such as image segmentation, object tracking, and even 3D reconstruction of plant structures.
- Environmental AI Applications: With detailed imagery, this dataset is suited for models used in environmental analysis, where palm trees play a role in ecosystem recognition or climate change studies.
Broader Data Offering: This dataset is a valuable addition to FileMarket’s extensive data offerings. It can be integrated with other datasets, such as those related to geography, climate, or biodiversity, creating more holistic AI models. Whether you are developing applications for botany research, environmental monitoring, or advanced plant recognition, this dataset is a foundational asset for AI training.
Factori Web Data contains fresh web browsing data of users across desktop and mobile devices, indicating search intent, purchase intent, and online category interests. This comprehensive dataset tracks user activity across popular websites worldwide, delivered as a daily feed via server-to-server transfer.
Key features:
- Over 2 billion records of web browsing activity
- Daily data collection with daily delivery frequency
- Six months of historical data accessible
- Anonymous user identification across devices
- IP address data for geographic segmentation
- Search query capture for intent analysis
- Website category classification
- Cross-device browsing behavior patterns
- Interest and intent indicators from browsing activity
Perfect for personalized targeting applications, data enrichment projects, market intelligence analysis, and fraud detection/cybersecurity initiatives. This dataset allows organizations to analyze web behavior patterns and build audience segments for ad targeting based on interest categories and search/browsing intent. Data is collected dynamically and provided through suitable delivery methods on daily, weekly, or monthly intervals.
Factori High Fidelity Mobility data is collected from location-aware partner mobile apps. This dataset includes advanced attributes such as vertical accuracy and altitude measurements to provide exceptionally accurate location intelligence.
Key features:
- Over 200 billion high-precision location records
- Daily data collection with same-day delivery frequency
- Enhanced vertical accuracy for multi-level location determination
- Altitude measurements for 3D positioning
- Comprehensive user demographic information
- Anonymous ID linking for privacy protection
- Detailed device information
- Affluence indicators and economic attributes
- Interest categorization and behavioral segments
- Travel patterns including visited countries
This dataset is used in a wide range of business applications, such as consumer insights applications, targeted advertising campaigns, and retail analytics. All data is collected with explicit opt-in consent and anonymized to ensure privacy compliance. Data is delivered daily with options for custom delivery schedules.
Factori cross-device data identifies customers across multiple internet-connected devices through encrypted MAIDs and IP addresses. We build comprehensive cross-device data by identifying household users across devices and analyzing their data characteristics to determine intent and interests, enabling advertisers to target viewers on Connected TV platforms effectively.
Key features:
- 3 billion records of cross-device connections
- Daily data collection with same-day delivery frequency
- One full year of historical data accessible
- Household-level device mapping
- Anonymous user identification across multiple devices
- IP address mapping for household determination
- Connected TV viewing patterns and preferences
- Cross-platform identity resolution
- Comprehensive device usage patterns
It supports a wide range of business applications, such as consumer analytics, cross-device attribution, sales forecasting, and coordinated advertising campaigns. This dataset allows organizations to determine household-level reach, conduct aggregated analysis of household device identities, and ensure consumers don't receive repetitive messaging across platforms. Data is collected dynamically and delivered daily via server-to-server transfer.
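The idea of household-level device mapping via IP addresses can be sketched in a few lines. This is a deliberately simplified illustration, assuming that devices observed on the same IP address belong to one household; the function names, the truncated SHA-256 key, and the sample identifiers are all hypothetical, and a production graph would need to handle shared, mobile, and rotating IPs.

```python
import hashlib
from collections import defaultdict


def household_key(ip):
    """Derive an opaque household key from an IP address (assumed scheme)."""
    return hashlib.sha256(ip.encode("utf-8")).hexdigest()[:16]


def group_devices(observations):
    """Group (device_id, ip) observations into households keyed by hashed IP."""
    households = defaultdict(set)
    for device_id, ip in observations:
        households[household_key(ip)].add(device_id)
    return dict(households)


# Example with documentation-range IP addresses and made-up device IDs.
observations = [
    ("maid-a", "203.0.113.7"),
    ("ctv-1", "203.0.113.7"),
    ("maid-b", "198.51.100.9"),
]
groups = group_devices(observations)
```

Here the phone `maid-a` and the Connected TV `ctv-1` land in the same group, which is the kind of link that allows frequency capping across platforms.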
Factori people data links anonymous IDs to multiple attributes related to demographics, device ownership, audience segments, and key locations. This comprehensive dataset helps companies enrich their existing databases or identify new customer profiles with similar interests to their current customers.
Key features:
- 900 million unique user profiles updated monthly
- Comprehensive user demographic information
- Mobile advertising ID (MAID) linkages
- Detailed device ownership information
- Location data and movement patterns
- Affluence indicators and economic attributes
- Interest categorization and behavioral segments
- Travel patterns including visited countries
Perfect for gaining 360-degree consumer insights, data enrichment projects, sales forecasting, and retail analytics. This dataset enables partner brands to develop a holistic view of consumers based on their personas and gain actionable insights. Data is collected dynamically and provided through suitable delivery methods on daily, weekly, monthly, or quarterly intervals.
Factori's comprehensive US consumer graph database provides detailed profiles on over 300 million individuals with more than 100 variables covering location, demographics, lifestyle, hobbies, and behaviors. This data enrichment solution helps fill gaps in customer data for deeper consumer understanding.
Key features:
- Geographic data including city, state, ZIP, county, CBSA, and census tract
- Demographic profiles with gender, age group, marital status, and language preferences
- Financial attributes including income range, credit rating, and net worth information
- Persona classifications with consumer types, communication preferences, and family structure
- Interest mapping across content preferences, brand affinities, shopping habits, and lifestyle choices
- Household data including family composition and IP addresses
- Behavioral insights covering brand affinity, app usage, and web browsing patterns
- Firmographic details including industry, company, occupation, and revenue information
- Retail purchase history across stores, categories, brands, and SKUs
- Automotive ownership data with make, model, type, and year
- Housing information including home type, value, ownership status, and year built
Perfect for creating 360-degree customer views, data enrichment projects, fraud detection applications, and targeted advertising and marketing campaigns. Data is collected dynamically and provided through suitable delivery methods on daily, weekly, or monthly intervals.
Access to over 1.5 billion unique mobile users across APAC, EU, North America, and MENA regions. This dataset combines raw data signals from 900+ global sources, validated, modeled, and segmented into thousands of mobile audience segments.
Key features:
- Comprehensive audience categorization including interests, demographics, behavioral, and geographic data
- Brand Shoppers segments based on real-world visits to brand outlets
- Place Category Visitors segments reflecting high-intent location visits
- Demographic data including gender, age, marital status, education (US-specific enhanced data)
- Geo-Behavioral segments based on frequency of visits to specific locations
- Interest and Intent data derived from online browsing and shopping behavior
- Event-related audience segments for sports, culture, and gaming
- App usage categorization based on installed mobile applications
- Specialized US segments including auto ownership, financial behavior, and B2B audiences
Perfect for advertising and marketing campaign ideation, personalized messaging, market intelligence, credit scoring, and retail analytics. Data is collected dynamically and provided monthly with options for pre-made audience segments or custom segments tailored to specific requirements.
Factori Identity Graph data helps brands enhance their first-party data in a privacy-compliant manner, enabling user reach across new platforms and channels. This dataset facilitates matching customer IDs with identities across multiple platforms and devices through deterministic or probabilistic matching methods.
Key features:
- Over 500 million device data records linked to hashed email data
- Comprehensive identity resolution across platforms and devices
- Multiple data points for accurate user matching
- Privacy-compliant data linking methodology
- Historical data covering the past 6 months
- Monthly updates ensuring data freshness
Perfect for identity resolution use cases to create unified client profiles (B2B/B2C) and data enrichment applications that leverage first-party data to build holistic audience segments for improved campaign targeting. Data is delivered through privacy-compliant data clean rooms, enriching your data based on specific requirements.
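Deterministic matching against hashed email data generally means hashing a normalized first-party email and looking for an exact match. The sketch below illustrates that general technique only. The listing does not state which hash function or normalization Factori uses, so SHA-256 over a trimmed, lowercased address is an assumption, as are the function names and the dictionary standing in for the identity graph.

```python
import hashlib


def hash_email(email):
    """Normalize an email (trim, lowercase) and return its SHA-256 hex digest."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def match_devices(first_party_emails, graph):
    """Deterministic match: exact equality of hashed emails.

    `graph` is a hypothetical stand-in mapping hashed email -> list of device IDs.
    """
    matches = {}
    for email in first_party_emails:
        devices = graph.get(hash_email(email))
        if devices:
            matches[email] = devices
    return matches


# Made-up data for illustration.
graph = {hash_email("ana@example.com"): ["device-1", "device-2"]}
result = match_devices([" Ana@Example.com ", "bo@example.com"], graph)
```

Normalizing before hashing matters because a hash changes completely with a single stray space or capital letter. Probabilistic matching, also mentioned in the listing, would instead score partial signals and is not shown here.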
Pixta AI | Face recognition | Human | Face ID + 106 key points facial landmark images | 30,000 Stock Images
by Pixta Inc
1. Overview
This dataset is a collection of 30,000+ images with Face ID and 106-key-point facial landmark annotations, ready to use for optimizing the accuracy of computer vision models. The dataset consists of images of people that meet the following requirements:
- Age: above 20
- Race: various
- Angle: no more than 90 degrees
All content is sourced from PIXTA's stock library of 100M+ Asian-featured images and videos.
2. Annotated Imagery Data of Face ID + 106 key points facial landmark
This dataset contains 30,000+ images of Face ID + 106 key points facial landmark. The dataset has been annotated with face bounding boxes, attributes of race, gender, age, and skin tone, and 106 facial landmark key points. Each data set is supported by both AI and human review processes to ensure labelling consistency and accuracy.
3. About PIXTA
PIXTASTOCK is the largest Asian-featured stock platform providing data, contents, tools and services since 2005. PIXTA has 15 years of experience integrating advanced AI technology in managing, curating, and processing over 100M visual materials and serving leading global brands for their creative and data demands.
1. Overview
This dataset is a collection of 10,000+ images of humans wearing a mask in multiple scenes, ready to use for optimizing the accuracy of computer vision models. All content is sourced from PIXTA's stock library of 100M+ Asian-featured images and videos. PIXTA is the largest platform of visual materials in the Asia Pacific region, offering fully-managed services, high-quality content and data, and powerful tools for businesses & organisations to enable their creative and machine learning projects.
2. Annotated Imagery Data of humans wearing a mask
This dataset contains 10,000+ images of humans wearing a mask. The dataset has been annotated with face bounding boxes, body bounding boxes, and attributes of mask, wheelchair, stroller, umbrella, suitcase, bag, backpack, laptop, cellphone, hat, suits, and glasses/sunglasses.
1. Overview
This dataset is a collection of 50,000+ full-body human images with multiple attributes, ready to use for optimizing the accuracy of computer vision models. All content is sourced from PIXTA's stock library of 100M+ Asian-featured images and videos. PIXTA is the largest platform of visual materials in the Asia Pacific region, offering fully-managed services, high-quality content and data, and powerful tools for businesses & organisations to enable their creative and machine learning projects.
2. Annotated Imagery Data of human in full body images
This dataset contains 50,000+ full-body human images. The dataset has been annotated with face bounding boxes, body bounding boxes, and attributes of mask, wheelchair, stroller, umbrella, suitcase, bag, backpack, laptop, cellphone,...
Pixta AI | Object Detection | Computer vision | E-commerce apparel dataset | 10,000 Stock Images
by Pixta Inc
1. Overview
This dataset is a collection of 10,000+ e-commerce apparel images, ready to use for optimizing the accuracy of computer vision models. All content is sourced from PIXTA's stock library of 100M+ Asian-featured images and videos. PIXTA is the largest platform of visual materials in the Asia Pacific region, offering fully-managed services, high-quality content and data, and powerful tools for businesses & organisations to enable their creative and machine learning projects.
2. Annotated Imagery Data of e-commerce apparel images
This dataset contains 10,000+ e-commerce apparel images. Each data set is supported by both AI and human review processes to ensure labelling consistency and accuracy.
1. Overview
This dataset is a collection of 10,000+ images of damaged cars in multiple scenes, ready to use for optimizing the accuracy of computer vision models. The dataset includes car images with 9 types of small damage (dent, scratch, gouge, crack, glass_shatter, lamp_broken, tire_flat, hail, rust) and a balanced class distribution.
2. Annotated Imagery Data of damaged car images
This dataset contains 10,000+ images of damaged cars. The dataset has been annotated with classification (9 car damage labels) and instance segmentation. Each data set is supported by both AI and human review processes to ensure labelling consistency and accuracy. Contact us for more custom datasets.
This dataset is a collection of 100,000+ images of mixed-race human faces with various expressions & emotions, ready to use for optimizing the accuracy of computer vision models. All content is sourced from PIXTA's stock library of 100M+ Asian-featured images and videos. PIXTA is the largest platform of visual materials in the Asia Pacific region, offering fully-managed services, high-quality content and data, and powerful tools for businesses & organisations to enable their creative and machine learning projects.