
Wind Tunnel Dataset

The Wind Tunnel Dataset contains 19,812 OpenFOAM simulations of 1,000 unique automobile-like objects placed in a virtual wind tunnel measuring 20 meters long, 10 meters wide, and 8 meters high.

Each object was tested under 20 different conditions: 4 random wind speeds ranging from 10 to 50 m/s, and 5 rotation angles (0°, 180° and 3 random angles).
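
As a rough illustration of how 4 wind speeds and 5 angles give 20 conditions per object, the sketch below draws one set of conditions for a single object. The sampling scheme (uniform integers), the function name `sample_conditions`, and the seed are assumptions made for illustration; the card does not state how the random values were actually drawn.

```python
import itertools
import random


def sample_conditions(rng, n_speeds=4, n_random_angles=3):
    """Return (wind_speed, rotate_angle) pairs for one object.

    Assumption: speeds and angles are drawn as uniform integers; the
    actual sampling scheme is not documented here.
    """
    # 4 random wind speeds in the 10-50 m/s range
    speeds = [rng.randint(10, 50) for _ in range(n_speeds)]
    # 2 fixed angles (0 and 180 degrees) plus 3 random ones
    angles = [0, 180] + [rng.randint(1, 359) for _ in range(n_random_angles)]
    # Every wind speed is combined with every angle
    return list(itertools.product(speeds, angles))


conditions = sample_conditions(random.Random(0))
print(len(conditions))         # 4 speeds x 5 angles = 20
print(1000 * len(conditions))  # 20 conditions x 1,000 objects = 20000
```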

The object meshes were generated using InstantMesh based on images sourced from the Stanford Cars Dataset. To make sure the results are stable and reliable, each simulation runs for 300 iterations.

The entire dataset is organized into three subsets: 70% for training, 20% for validation, and 10% for testing.

The data generation process itself was orchestrated using the Inductiva API, which allowed us to run hundreds of OpenFOAM simulations in parallel on the cloud.

Motivation: Addressing the Data Gap in CFD

Recently, there’s been a lot of interest in using machine learning (ML) to speed up CFD simulations. Research has shown that for well-defined scenarios—like a virtual wind tunnel—you can train an ML model to “predict” the results of a simulation much faster than traditional methods, while still keeping the accuracy close to what you’d expect from classical simulations.

That said, the ML/CFD communities are still lacking enough training data for their research. We’ve identified two main reasons for this.

First, there’s a shortage of datasets with high-quality 3D meshes needed for running CFD simulations. Existing 3D object datasets have a lot of limitations: they are too small, closed-source, or made up of low-quality meshes. Without this input data, it’s been really hard to generate large-scale training datasets for realistic CFD scenarios, which almost always involve 3D meshes.

Second, even if you had all the 3D meshes you needed, setting up and running thousands of CFD simulations to generate a large, diverse dataset isn’t easy. To create a dataset like this, you’d need to define an initial simulation scenario (like the wind tunnel setup) and then run enough variations—different meshes, wind speeds, and so on—to cover a wide range of data points for training a robust ML model.

The problem is, running a single CFD simulation can be tricky enough with most software. Orchestrating thousands of simulations and handling all the resulting data? That’s a whole new level of challenge.

While both of these problems are difficult to solve in general, we decided to focus on one common CFD scenario: a virtual wind tunnel for static automobiles. Using the popular OpenFOAM simulation package, we produced a large dataset of CFD simulations.

Next, we’ll explain how we tackled the challenges of generating the data and orchestrating the simulations.

Generating a Large Quantity of Automobile-like 3D Meshes

Since there aren’t many publicly available 3D meshes of automobiles, we decided to use recent image-to-mesh models to generate meshes from freely available car images.

We specifically used the open-source InstantMesh model (Apache-2.0), which is currently state-of-the-art in image-to-mesh generation. We generated the automobile-like meshes by running InstantMesh on 1,000 images from the publicly available Stanford Cars Dataset (Apache-2.0), which contains 16,185 images of automobiles.

Running the image-to-mesh model naturally results in some defects, like irregular surfaces, asymmetry, holes, and disconnected components. To address these issues, we implemented a custom post-processing step to improve mesh quality. We used PCA to align the meshes with the main axes and removed any disconnected components.
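
The two clean-up operations mentioned above can be sketched with NumPy as follows. This is a generic illustration, not the post-processing code used for the dataset: the function names are hypothetical, triangular faces are assumed, and "largest component" is taken to mean the one with the most faces. Note that PCA leaves the sign of each axis undetermined, so the aligned mesh may come out mirrored.

```python
import numpy as np


def largest_component(vertices, faces):
    """Keep only the faces (and vertices) of the largest connected component."""
    parent = list(range(len(vertices)))

    def find(i):
        # Union-find lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Vertices that share a face belong to the same component
    for a, b, c in faces:
        parent[find(b)] = find(a)
        parent[find(c)] = find(a)

    roots = np.array([find(i) for i in range(len(vertices))])
    face_roots = roots[faces[:, 0]]
    # Largest component = the root that owns the most faces
    values, counts = np.unique(face_roots, return_counts=True)
    keep_root = values[np.argmax(counts)]

    kept_faces = faces[face_roots == keep_root]
    used = np.unique(kept_faces)
    # Re-index the surviving vertices so the faces stay valid
    remap = -np.ones(len(vertices), dtype=int)
    remap[used] = np.arange(len(used))
    return vertices[used], remap[kept_faces]


def pca_align(vertices):
    """Center the vertices and rotate them onto their principal axes."""
    centered = vertices - vertices.mean(axis=0)
    # Eigenvectors of the 3x3 covariance matrix are the principal axes
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    order = np.argsort(eigvals)[::-1]  # largest variance first
    return centered @ eigvecs[:, order]


# Toy mesh: a strip of 4 triangles plus one stray triangle far away
toy_vertices = np.array(
    [[0, 0, 0], [1, 1, 0], [2, 2, 0], [0, 1, 0], [1, 2, 0], [2, 3, 0.5],
     [9, 9, 9], [9, 8, 9], [8, 9, 9]],
    dtype=float,
)
toy_faces = np.array([[0, 1, 3], [1, 4, 3], [1, 2, 4], [2, 5, 4], [6, 7, 8]])

clean_vertices, clean_faces = largest_component(toy_vertices, toy_faces)
aligned_vertices = pca_align(clean_vertices)
print(clean_vertices.shape, clean_faces.shape)  # (6, 3) (4, 3)
```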

The resulting set of meshes still contains minor defects, like “spikes” or “cavities” in flat areas, unexpected holes, and asymmetry issues. However, we see these imperfections as valuable features of the dataset. From a machine learning perspective, they bring challenges that can help prevent overfitting and contribute to building more robust and generalizable models.

Orchestrating 20k Cloud Simulations—Using Just Python

To tackle the challenge of orchestrating 20,000 OpenFOAM simulations, we resorted to the Inductiva API. The Inductiva platform offers a simple Python API for running simulation workflows in the cloud and supports several popular open-source packages, including OpenFOAM. Here’s an example of how to run an OpenFOAM simulation using Inductiva.

With the Inductiva API, it’s easy to parameterize specific simulation scenarios and run variations of a base case by programmatically adjusting the input parameters and starting conditions of the simulation. More details here. Additionally, users can create custom Python classes that wrap these parameterized simulations, providing a simple Python interface for running simulations—no need to interact directly with the underlying simulation packages.

We used the Inductiva API to create a Python class for the Wind Tunnel scenario, which allowed us to run 20,000 simulations across a range of input parameters.

For more on how to transform your complex simulation workflows into easy-to-use Python classes, we wrote a blog post all about it.

How Did We Generate the Dataset?

Generate Input Meshes: We first generated input meshes using the InstantMesh model with images from the Stanford Cars Dataset, followed by post-processing to improve mesh quality.
Run OpenFOAM Simulations: Using the Inductiva API, we ran OpenFOAM simulations on the input meshes under different wind speeds and angles. The result is an output mesh, openfoam_mesh.obj, that contains all the relevant simulation data.
Post-process OpenFOAM Output: We post-processed the OpenFOAM output to generate streamlines and pressure map meshes.

The code we used to generate and post-process the meshes is available on GitHub.

Dataset Structure

data
├── train
│   ├── <SIMULATION_ID>
│   │   ├── input_mesh.obj
│   │   ├── openfoam_mesh.obj
│   │   ├── pressure_field_mesh.vtk
│   │   ├── simulation_metadata.json
│   │   └── streamlines_mesh.ply
│   └── ...
├── validation
│   └── ...
└── test
    └── ...
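
Once part of the dataset is on disk, the layout above can be checked with a few lines of standard-library Python. The sketch below assumes `root` is the local folder that contains the `data` directory; the helper name `find_incomplete_simulations` is hypothetical.

```python
from pathlib import Path

# File names taken from the directory tree above
EXPECTED_FILES = {
    "input_mesh.obj",
    "openfoam_mesh.obj",
    "pressure_field_mesh.vtk",
    "simulation_metadata.json",
    "streamlines_mesh.ply",
}


def find_incomplete_simulations(root):
    """Map each split to the simulation folders that miss an expected file."""
    incomplete = {}
    for split in ("train", "validation", "test"):
        split_dir = Path(root) / "data" / split
        if not split_dir.is_dir():
            continue  # this split has not been downloaded
        incomplete[split] = sorted(
            sim_dir.name
            for sim_dir in split_dir.iterdir()
            if sim_dir.is_dir()
            and not EXPECTED_FILES <= {p.name for p in sim_dir.iterdir()}
        )
    return incomplete
```

Calling `find_incomplete_simulations("local_folder")` returns an empty list for every split that was downloaded completely. A partial download (for example, metadata only) will list every simulation folder.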

Dataset Files

Each simulation in the Wind Tunnel Dataset is accompanied by several key files that provide both the input and the output data of the simulations. Here’s a breakdown of the files included in each simulation:

input_mesh.obj: OBJ file with the input mesh.
openfoam_mesh.obj: OBJ file with the OpenFOAM mesh.
pressure_field_mesh.vtk: VTK file with the pressure field data.
streamlines_mesh.ply: PLY file with the streamlines.
simulation_metadata.json: JSON with metadata about the input parameters and about some output results such as the force coefficients (obtained via simulation) and the path of the output files.

input_mesh.obj

The input mesh, which we generated with the InstantMesh model from images in the Stanford Cars Dataset, serves as the starting point for the OpenFOAM simulation.

Details on the mesh generation process can be found here.

Figures: example input mesh; histogram of the number of points in the input meshes.

import pyvista as pv

# Load the mesh
mesh_path = "input_mesh.obj"
mesh = pv.read(mesh_path)

# Get the vertices (points)
vertices = mesh.points

# Get the faces (connections)
# The faces array contains the number of vertices per face followed by the vertex indices.
# For example: [3, v1, v2, v3, 3, v4, v5, v6, ...] where 3 means a triangle.
faces = mesh.faces
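
The flat faces layout described in the comments above can be unpacked into one index list per face. The helper below is a small pure-Python sketch (the name `split_faces` is hypothetical) that works on any sequence with that layout, including `mesh.faces`.

```python
def split_faces(flat_faces):
    """Turn [n, v1, ..., vn, n, ...] into a list of per-face index lists."""
    faces = []
    i = 0
    while i < len(flat_faces):
        n = int(flat_faces[i])  # number of vertices in this face
        faces.append([int(v) for v in flat_faces[i + 1 : i + 1 + n]])
        i += n + 1  # jump to the next face header
    return faces


print(split_faces([3, 0, 1, 2, 3, 2, 1, 3]))  # [[0, 1, 2], [2, 1, 3]]
```

If a mesh contains only triangles, `mesh.faces.reshape(-1, 4)[:, 1:]` gives the same indices as a NumPy array and is much faster on large meshes.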

openfoam_mesh.obj

This mesh is the result of the OpenFOAM simulation. The number of points is reduced compared to the input_mesh.obj due to mesh refinement and processing steps applied by OpenFOAM during the simulation.

Figures: example OpenFOAM mesh; histogram of the number of points in the OpenFOAM meshes.

import pyvista as pv

# Load the mesh
mesh_path = "openfoam_mesh.obj"
mesh = pv.read(mesh_path)

# Get the vertices (points)
vertices = mesh.points

# Get the faces (connections)
# The faces array contains the number of vertices per face followed by the vertex indices.
# For example: [3, v1, v2, v3, 3, v4, v5, v6, ...] where 3 means a triangle.
faces = mesh.faces

pressure_field_mesh.vtk

Pressure values were extracted from the openfoam_mesh.obj and interpolated onto the input_mesh.obj using the closest point strategy. This approach allowed us to project the pressure values onto a higher-resolution mesh. As shown in the histogram, the point distribution matches that of the input_mesh.obj.

More details can be found here.
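
A closest-point interpolation can be sketched in a few lines of NumPy. This is a generic brute-force illustration, not the code used to build the dataset; the function name and the toy coordinates are made up for the example.

```python
import numpy as np


def interpolate_closest_point(source_points, source_values, target_points):
    """Give each target point the value of its nearest source point."""
    source_points = np.asarray(source_points, dtype=float)
    target_points = np.asarray(target_points, dtype=float)
    # Pairwise squared distances, shape (n_target, n_source)
    diff = target_points[:, None, :] - source_points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    # Index of the closest source point for every target point
    nearest = sq_dist.argmin(axis=1)
    return np.asarray(source_values)[nearest]


# Two coarse points carrying pressure values, three finer target points
coarse_points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
coarse_pressure = [10.0, -5.0]
fine_points = [[0.1, 0.0, 0.0], [0.4, 0.0, 0.0], [0.9, 0.0, 0.0]]

fine_pressure = interpolate_closest_point(coarse_points, coarse_pressure, fine_points)
print(fine_pressure.tolist())  # [10.0, 10.0, -5.0]
```

The pairwise distance matrix grows with the product of the two point counts, so for full-size meshes a spatial index such as `scipy.spatial.cKDTree` is the more practical choice.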

Figures: example pressure field mesh; histogram of the number of points in the pressure field meshes.

import pyvista as pv

mesh_path = "pressure_field_mesh.vtk"
mesh = pv.read(mesh_path)

# The vertices array contains the coordinates of each point in the mesh.
vertices = mesh.points

# The faces array contains the number of vertices per face followed by the vertex indices.
# For example: [3, v1, v2, v3, 3, v4, v5, v6, ...] where 3 means a triangle.
faces = mesh.faces

# Get the pressure data (scalar named "p")
# This retrieves the pressure values associated with each vertex in the mesh.
pressure_data = mesh.point_data["p"]

streamlines_mesh.ply

Streamlines visually represent the flow characteristics within the simulation, illustrating how air flows around the object.

More information can be found here.

Figures: example streamlines mesh; histogram of the number of points in the streamlines meshes.

simulation_metadata.json

This file contains metadata related to the simulation, including input parameters such as wind_speed, rotate_angle, num_iterations, and resolution. Additionally, it includes output parameters like drag_coefficient, moment_coefficient, lift_coefficient, front_lift_coefficient, and rear_lift_coefficient. The file also specifies the locations of the generated output meshes.

{
  "id": "1w63au1gpxgyn9kun5q9r7eqa",
  "object_file": "object_24.obj",
  "wind_speed": 35,
  "rotate_angle": 332,
  "num_iterations": 300,
  "resolution": 5,
  "drag_coefficient": 0.8322182,
  "moment_coefficient": 0.3425206,
  "lift_coefficient": 0.1824983,
  "front_lift_coefficient": 0.4337698,
  "rear_lift_coefficient": -0.2512715,
  "input_mesh_path": "data/train/1w63au1gpxgyn9kun5q9r7eqa/input_mesh.obj",
  "openfoam_mesh_path": "data/train/1w63au1gpxgyn9kun5q9r7eqa/openfoam_mesh.obj",
  "pressure_field_mesh_path": "data/train/1w63au1gpxgyn9kun5q9r7eqa/pressure_field_mesh.vtk",
  "streamlines_mesh_path": "data/train/1w63au1gpxgyn9kun5q9r7eqa/streamlines_mesh.ply"
}
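
A minimal sketch for collecting these records from a local copy is shown below. It assumes the file is named simulation_metadata.json, as in the directory tree and the download snippet; the helper names are hypothetical. In the sample above, the front and rear lift coefficients add up to the total lift coefficient; whether that holds for every record is an assumption you may want to verify with the second helper.

```python
import json
from pathlib import Path


def load_metadata(root):
    """Read every simulation_metadata.json found below `root`."""
    records = []
    for path in sorted(Path(root).rglob("simulation_metadata.json")):
        with path.open() as f:
            records.append(json.load(f))
    return records


def lift_is_consistent(record, tol=1e-6):
    """Check that front + rear lift add up to the total lift coefficient."""
    total = record["front_lift_coefficient"] + record["rear_lift_coefficient"]
    return abs(total - record["lift_coefficient"]) <= tol
```

For example, `[r["drag_coefficient"] for r in load_metadata("local_folder")]` gathers the drag coefficients of all downloaded simulations.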

Dataset Statistics

The dataset includes 19,812 valid samples out of 20,000 simulations, with 188 simulations failing due to numerical errors in OpenFOAM.

The full dataset requires about 300 GB of storage, but you can also download smaller portions if needed.

Downloading the Dataset:

To download the dataset, you’ll need to install the Datasets package from Hugging Face, which also installs the huggingface_hub package used by snapshot_download() below:

pip install datasets

1. Using snapshot_download()

import huggingface_hub

dataset_name = "inductiva/windtunnel-20k"

# Download the entire dataset
huggingface_hub.snapshot_download(repo_id=dataset_name, repo_type="dataset")

# Download to a specific local directory
huggingface_hub.snapshot_download(
    repo_id=dataset_name, repo_type="dataset", local_dir="local_folder"
)

# Download only the simulation metadata across all simulations
huggingface_hub.snapshot_download(
    repo_id=dataset_name,
    repo_type="dataset",
    local_dir="local_folder",
    allow_patterns=["*/*/*/simulation_metadata.json"]
)
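
The same mechanism can restrict a download to one split or to a few file types. The sketch below builds patterns that follow the data/<split>/<SIMULATION_ID>/<file> layout; the helper names are hypothetical, and `fnmatch` is used only as an offline approximation of how the glob-style patterns are matched against repository paths.

```python
from fnmatch import fnmatch


def build_patterns(splits, filenames):
    """Build allow_patterns entries of the form data/<split>/*/<filename>."""
    return [f"data/{split}/*/{name}" for split in splits for name in filenames]


def download_subset(splits, filenames, local_dir="local_folder"):
    """Download only the selected files of the selected splits."""
    import huggingface_hub  # imported lazily; build_patterns has no dependency

    return huggingface_hub.snapshot_download(
        repo_id="inductiva/windtunnel-20k",
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=build_patterns(splits, filenames),
    )


patterns = build_patterns(["test"], ["input_mesh.obj", "simulation_metadata.json"])

# Offline check with a made-up simulation ID
example_path = "data/test/some_simulation_id/input_mesh.obj"
print(any(fnmatch(example_path, p) for p in patterns))  # True
```

Calling `download_subset(["test"], ["input_mesh.obj", "simulation_metadata.json"])` would then fetch only the input meshes and metadata of the test split.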

2. Using load_dataset()

import datasets

# Load the dataset (streaming is supported)
dataset = datasets.load_dataset("inductiva/windtunnel-20k", streaming=False)

# Display dataset information
print(dataset)

# Access a sample from the training set
sample = dataset["train"][0]
print("Sample from training set:", sample)

OpenFOAM Parameters

We used the Inductiva Template Manager to parameterize the OpenFOAM configuration files.

Below are some snippets from the templates used in the wind tunnel simulations.

initialConditions.jinja

flowVelocity         ({{ wind_speed }} 0 0);

controlDict.jinja

endTime         {{ num_iterations }};

forceCoeffs.jinja

magUInf         {{ wind_speed }};
lRef            {{ length }};        // Wheelbase length
Aref            {{ area }};        // Estimated
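
For reference, these three parameters enter the standard force-coefficient definitions used by OpenFOAM's forceCoeffs function object, with $U_\infty$ = magUInf, $l_{\mathrm{ref}}$ = lRef and $A_{\mathrm{ref}}$ = Aref. The reference density $\rho_\infty$ is set elsewhere in the configuration and is not part of the snippet above, so treat its presence here as an assumption about the full template.

```latex
\[
\begin{aligned}
C_d &= \frac{F_d}{\tfrac{1}{2}\,\rho_\infty\,U_\infty^2\,A_{\mathrm{ref}}}, &
C_l &= \frac{F_l}{\tfrac{1}{2}\,\rho_\infty\,U_\infty^2\,A_{\mathrm{ref}}}, &
C_m &= \frac{M}{\tfrac{1}{2}\,\rho_\infty\,U_\infty^2\,A_{\mathrm{ref}}\,l_{\mathrm{ref}}}, \\
C_{l,\mathrm{front}} &= \tfrac{1}{2}\,C_l + C_m, &
C_{l,\mathrm{rear}} &= \tfrac{1}{2}\,C_l - C_m .
\end{aligned}
\]
```

The sample metadata record shown earlier is consistent with the last two relations: 0.1824983 / 2 + 0.3425206 ≈ 0.4337698 and 0.1824983 / 2 - 0.3425206 ≈ -0.2512715.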

snappyHexMeshDict.jinja

geometry
{
    object
    {
        type triSurfaceMesh;
        file "object.obj";
    }

    refinementBox
    {
        type searchableBox;
        min ({{ x_min }} {{ y_min }} {{ z_min }});
        max ({{ x_max }} {{ y_max }} {{ z_max }});
    }
};

features
(
    {
        file "object.eMesh";
        level {{ resolution + 1  }};
    }
);


refinementSurfaces
{
    object
    {
        // Surface-wise min and max refinement level
        level ({{ resolution }} {{ resolution + 1 }});
    }
}

refinementRegions
{
    refinementBox
    {
        mode inside;
        levels ((1E15 {{ resolution - 1 }}));
    }
}

locationInMesh ({{ x_min }} {{ y_min }} {{ z_min }});
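
To see what the resolution parameter does to the refinement levels, the relevant expressions can be rendered with the jinja2 package directly. This sketch does not use the Inductiva Template Manager, whose API is not shown here; the value resolution=5 comes from the sample metadata record.

```python
from jinja2 import Template

# Two expressions copied from the snappyHexMeshDict template above
snippet = Template(
    "level ({{ resolution }} {{ resolution + 1 }});\n"
    "levels ((1E15 {{ resolution - 1 }}));"
)

rendered = snippet.render(resolution=5)
print(rendered)
# level (5 6);
# levels ((1E15 4));
```

With resolution 5, the object surface is refined between levels 5 and 6, while the surrounding refinement box stays one level coarser at 4.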

You can find the full OpenFOAM configuration on GitHub: https://github.com/inductiva/wind-tunnel/tree/main/windtunnel/templates

What's Next?

If you encounter any issues with this dataset, feel free to reach out at support@inductiva.ai.

If you spot any problematic meshes, let us know so we can fix them in the next version of the Windtunnel-20k dataset.

To learn more about how we created this dataset—or how you can generate synthetic datasets for Physics-AI models—check out our well-tested 4-step recipe for generating synthetic data or discover how to transform your own complex simulation workflows into easy-to-use Python classes.

You may also be interested in reading our blog post, The 3D Mesh Resolution Threshold - 5k Points is All You Need!, where we explore just how much you can reduce the level of detail in a 3D object while still maintaining accurate aerodynamic results in a virtual wind tunnel built with OpenFOAM.
