Data Per Vertex and Data Per Streamline#

This tutorial demonstrates how to work with metadata in TRX files. TRX supports two types of metadata:

  • Data Per Vertex (dpv): Information attached to each point along streamlines

  • Data Per Streamline (dps): Information attached to entire streamlines

By the end of this tutorial, you will know how to:

  • Access dpv and dps data in a TRX file

  • Understand the data shapes and organization

  • Use metadata for filtering and analysis

Understanding DPV and DPS#

Data Per Vertex (dpv):

  • Attached to each individual point (vertex) in all streamlines

  • Shape: (NB_VERTICES, 1) for scalar data or (NB_VERTICES, N) for vector data

  • Common uses: FA values at each point, RGB colors, local orientations

Data Per Streamline (dps):

  • Attached to entire streamlines (one value per streamline)

  • Shape: (NB_STREAMLINES, 1) for scalar data or (NB_STREAMLINES, N) for vector data

  • Common uses: bundle ID, mean FA, streamline length, tracking algorithm ID

Loading a TRX file with metadata#

Let’s load a TRX file and explore its metadata.

import os

import numpy as np

from trx.fetcher import fetch_data, get_home, get_testing_files_dict
from trx.trx_file_memmap import load

# Download test data
fetch_data(get_testing_files_dict(), keys="gold_standard.zip")
trx_home = get_home()
trx_path = os.path.join(trx_home, "gold_standard", "gs.trx")

# Load the TRX file
trx = load(trx_path)

print(f"Loaded TRX with {len(trx)} streamlines")
print(f"Total vertices: {trx.header['NB_VERTICES']}")
Loaded TRX with 13 streamlines
Total vertices: 104

Exploring Data Per Vertex (dpv)#

Let’s see what dpv data is available.

print("Data Per Vertex keys:", list(trx.data_per_vertex.keys()))

# Examine each dpv field
for key in trx.data_per_vertex:
    data = trx.data_per_vertex[key]
    print(f"\n  {key}:")
    print(f"    Shape: {data._data.shape}")
    print(f"    Dtype: {data._data.dtype}")
    print(f"    Sample values: {data._data[:3].flatten()}")
Data Per Vertex keys: ['color_y', 'color_z', 'color_x']

  color_y:
    Shape: (104, 1)
    Dtype: float32
    Sample values: [20. 20. 20.]

  color_z:
    Shape: (104, 1)
    Dtype: float32
    Sample values: [60. 60. 60.]

  color_x:
    Shape: (104, 1)
    Dtype: float32
    Sample values: [220. 220. 220.]

Accessing dpv for a specific streamline#

The dpv data is organized to match the streamlines. You can access the dpv values for a specific streamline using the same indices.

if len(trx.data_per_vertex) > 0:
    first_dpv_key = list(trx.data_per_vertex.keys())[0]
    dpv_data = trx.data_per_vertex[first_dpv_key]

    # Get dpv values for the first streamline
    first_streamline_dpv = dpv_data[0]
    print(f"DPV '{first_dpv_key}' for first streamline:")
    print(f"  Shape: {first_streamline_dpv.shape}")
    print(f"  Values: {first_streamline_dpv.flatten()}")
DPV 'color_y' for first streamline:
  Shape: (8, 1)
  Values: [20. 20. 20. 20. 20. 20. 20. 20.]

Exploring Data Per Streamline (dps)#

Now let’s examine the dps data.

print("Data Per Streamline keys:", list(trx.data_per_streamline.keys()))

# Examine each dps field
for key in trx.data_per_streamline:
    data = trx.data_per_streamline[key]
    print(f"\n  {key}:")
    print(f"    Shape: {data.shape}")
    print(f"    Dtype: {data.dtype}")
    print(f"    First 5 values: {data[:5].flatten()}")
Data Per Streamline keys: ['random_coord']

  random_coord:
    Shape: (13, 3)
    Dtype: float32
    First 5 values: [ 7.  1.  5.  2.  4.  5.  9.  8. 10.  4.  6.  3.  6.  3.  5.]

DPS for filtering streamlines#

A common use case is filtering streamlines based on dps values. For example, selecting streamlines with high FA values.

if len(trx.data_per_streamline) > 0:
    # Use the first dps key for demonstration
    first_dps_key = list(trx.data_per_streamline.keys())[0]
    dps_data = trx.data_per_streamline[first_dps_key]

    # Calculate some statistics
    print(f"\nStatistics for '{first_dps_key}':")
    print(f"  Min: {np.min(dps_data):.4f}")
    print(f"  Max: {np.max(dps_data):.4f}")
    print(f"  Mean: {np.mean(dps_data):.4f}")
    print(f"  Std: {np.std(dps_data):.4f}")
Statistics for 'random_coord':
  Min: 0.0000
  Max: 10.0000
  Mean: 5.3077
  Std: 2.6617

File structure for dpv and dps#

In the TRX format, dpv and dps are stored in separate directories:

my_tractogram.trx/
|-- dpv/
|   |-- fa.float16              # FA values per vertex
|   |-- colors.3.uint8          # RGB colors (3 values per vertex)
|   +-- curvature.float32       # Curvature per vertex
|-- dps/
|   |-- bundle_id.uint8         # Bundle assignment per streamline
|   |-- length.uint16           # Length per streamline
|   +-- mean_fa.float32         # Mean FA per streamline
+-- ...

The filename format is: name.dtype or name.dimension.dtype

Working with multi-dimensional data#

Both dpv and dps can have multiple dimensions. For example, RGB colors have 3 values per vertex.

print("\nDemonstrating multi-dimensional data:")

# Check for any multi-dimensional dpv
for key in trx.data_per_vertex:
    data = trx.data_per_vertex[key]
    if len(data._data.shape) > 1 and data._data.shape[1] > 1:
        print(f"  {key}: {data._data.shape[1]}D data per vertex")

# Check for any multi-dimensional dps
for key in trx.data_per_streamline:
    data = trx.data_per_streamline[key]
    if len(data.shape) > 1 and data.shape[1] > 1:
        print(f"  {key}: {data.shape[1]}D data per streamline")
Demonstrating multi-dimensional data:
  random_coord: 3D data per streamline

Relationship between dpv and streamlines#

It’s important to understand how dpv data maps to individual streamlines. Each streamline’s dpv values can be accessed using the streamline’s vertex indices.

# Get vertex counts for first few streamlines
print("\nVertex distribution for first 5 streamlines:")
for i in range(min(5, len(trx))):
    streamline = trx.streamlines[i]
    print(f"  Streamline {i}: {len(streamline)} vertices")

# Total vertices should match
total_from_streamlines = sum(len(trx.streamlines[i]) for i in range(len(trx)))
print(f"\nTotal vertices from streamlines: {total_from_streamlines}")
print(f"Total vertices in header: {trx.header['NB_VERTICES']}")
Vertex distribution for first 5 streamlines:
  Streamline 0: 8 vertices
  Streamline 1: 8 vertices
  Streamline 2: 8 vertices
  Streamline 3: 8 vertices
  Streamline 4: 8 vertices

Total vertices from streamlines: 104
Total vertices in header: 104

Summary#

In this tutorial, you learned how to:

  • Access dpv data using trx.data_per_vertex[key]

  • Access dps data using trx.data_per_streamline[key]

  • Understand the shape conventions for scalar and vector data

  • Use metadata for statistical analysis

  • Understand the file structure for dpv and dps

The TRX format’s metadata system is designed for flexibility, allowing you to attach any kind of information to vertices or streamlines.

Total running time of the script: (0 minutes 0.009 seconds)

Gallery generated by Sphinx-Gallery