GEOSPATIAL DATA RELATIONSHIPS IN PERSONAL TRACKER
==================================================

This document explains how the geospatial datasets in the timeline_csv folder are interconnected
and structured to provide a complete picture of location-based personal tracking data.

OVERVIEW
--------
The location tracking system uses a hierarchical approach, with semantic segments serving as the
master index that coordinates the other types of location data. The data is organized into
movement periods (travel) and stationary periods (visits), which together form a chronological
timeline.

CORE DATASETS AND THEIR RELATIONSHIPS
=====================================

1. SEMANTIC_SEGMENTS.CSV - The Master Index
--------------------------------------------
Purpose: Acts as the central orchestrator that defines time-based segments
Key Fields:
- segment_index: Unique segment identifier; the join key used by timeline_path_points and visits
- startTime/endTime: Time boundaries for each segment
- has_visit: Flag (1/0) indicating whether the segment contains visit data
- has_timeline_path: Flag (1/0) indicating whether the segment contains movement data

This dataset defines the temporal structure and determines which other datasets contain
data for each time period.
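
A minimal Python sketch of how a consumer might route each semantic_segments row by its two flags. It assumes the flags are stored as the text "1"/"0" (csv.DictReader returns every field as a string); is_set, classify_segment and load_segments are hypothetical helper names, and the sample rows are modelled on the example segments described in EXAMPLE DATA FLOW, with a +00:00 offset assumed.

```python
import csv
import io

def is_set(value: str) -> bool:
    """Interpret a flag column; "1" is the documented encoding, "true" is tolerated."""
    return value.strip().lower() in ("1", "true")

def classify_segment(row: dict) -> str:
    """Return "movement", "visit", "both" or "gap" for one semantic_segments row."""
    has_visit = is_set(row["has_visit"])
    has_path = is_set(row["has_timeline_path"])
    if has_visit and has_path:
        return "both"
    if has_visit:
        return "visit"
    if has_path:
        return "movement"
    return "gap"  # neither flag set: no movement or visit data for this period

def load_segments(f) -> dict:
    """Index semantic_segments rows by integer segment_index."""
    return {int(r["segment_index"]): r for r in csv.DictReader(f)}

# Illustrative rows (hypothetical timestamp format with an assumed +00:00 offset).
SAMPLE = """segment_index,startTime,endTime,has_visit,has_timeline_path
0,2013-12-31T22:00:00+00:00,2014-01-01T00:00:00+00:00,0,1
1,2013-12-31T22:29:00+00:00,2014-01-01T17:10:00+00:00,1,0
"""
segments = load_segments(io.StringIO(SAMPLE))
kinds = {i: classify_segment(row) for i, row in segments.items()}
```

Treating "neither flag set" as a tracking gap follows the DATA COMPLETENESS notes; a segment with both flags set is kept as its own case rather than forced into one bucket.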

2. TIMELINE_PATH_POINTS.CSV - Movement Data
--------------------------------------------
Purpose: GPS tracking data during travel/movement periods
Key Fields:
- segment_index: Links to semantic_segments
- point_index: Order of GPS points within a segment
- time: Precise timestamp for each GPS reading
- lat/lon: Geographic coordinates
- raw_point: Original coordinate string

Relationship: Contains data ONLY for segments where has_timeline_path=1 in semantic_segments.
These represent periods when the person was moving between locations.
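
Because point_index, not file order, defines the sequence of a path, a consumer would typically group rows by segment_index and sort on point_index. A minimal sketch; group_path_points is a hypothetical name, only the columns it reads appear in the sample, and the coordinates are made up.

```python
import csv
import io
from collections import defaultdict

def group_path_points(f) -> dict:
    """Map segment_index -> [(lat, lon), ...] ordered by point_index."""
    groups = defaultdict(list)
    for r in csv.DictReader(f):
        groups[int(r["segment_index"])].append(
            (int(r["point_index"]), float(r["lat"]), float(r["lon"]))
        )
    # Sort on point_index (first tuple element) instead of trusting row order.
    return {
        seg: [(lat, lon) for _, lat, lon in sorted(points)]
        for seg, points in groups.items()
    }

SAMPLE = """segment_index,point_index,time,lat,lon
0,1,2013-12-31T22:10:00+00:00,51.5010,-0.1420
0,0,2013-12-31T22:00:00+00:00,51.5000,-0.1400
"""
paths = group_path_points(io.StringIO(SAMPLE))
```

raw_point is deliberately left unparsed here, since its exact string format is not specified in this document.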

3. VISITS.CSV - Stationary Location Data
-----------------------------------------
Purpose: Information about places where the person stayed for an extended period
Key Fields:
- segment_index: Links to semantic_segments
- top_place_id: Google Places API identifier
- top_semantic_type: Category (HOME, WORK, UNKNOWN, etc.)
- top_lat/top_lon: Geographic coordinates of the visit location
- startTime/endTime: Start and end of the visit (their difference is the visit duration)
- visit_probability: Confidence that this was actually a visit

Relationship: Contains data ONLY for segments where has_visit=1 in semantic_segments.
These represent periods when the person was stationary at a specific location.
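
The "ONLY for segments where has_visit=1" rule, and the matching has_timeline_path=1 rule for timeline_path_points, can be checked after ingest. A minimal sketch, assuming segments are already indexed by integer segment_index with flags stored as "1"/"0" text; check_flag_consistency is a hypothetical helper and the rows are invented to show both failure kinds.

```python
def check_flag_consistency(segments: dict, child_rows, flag: str):
    """Compare a child dataset with the flag that is supposed to announce it.

    segments:   semantic_segments rows keyed by int segment_index
    child_rows: rows of visits.csv or timeline_path_points.csv
    flag:       "has_visit" or "has_timeline_path"
    Returns (orphans, missing):
      orphans: segment_index values used by child rows whose segment lacks
               the flag (or does not exist in semantic_segments at all)
      missing: flagged segments that have no child rows
    """
    child_segments = {int(r["segment_index"]) for r in child_rows}
    flagged = {i for i, row in segments.items() if row[flag] == "1"}
    return sorted(child_segments - flagged), sorted(flagged - child_segments)

segments = {
    0: {"has_visit": "0", "has_timeline_path": "1"},
    1: {"has_visit": "1", "has_timeline_path": "0"},
    2: {"has_visit": "1", "has_timeline_path": "0"},
}
visits = [{"segment_index": "1"}, {"segment_index": "0"}]
orphans, missing = check_flag_consistency(segments, visits, "has_visit")
```

Here segment 0 is an orphan (a visit row on a movement-only segment) and segment 2 is flagged as a visit but has no row in visits.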

4. FREQUENT_PLACES.CSV - Location Reference Data
------------------------------------------------
Purpose: Registry of commonly visited locations with semantic labels
Key Fields:
- placeId: Google Places API identifier (links to visits.top_place_id)
- label: Semantic meaning (HOME, WORK, or empty for unlabeled)
- lat/lon: Geographic coordinates

Relationship: Acts as a lookup table for visits.csv (and is also referenced by
frequent_trip_waypoints.waypoint_id). Joining visits.top_place_id to placeId identifies and
categorizes frequently visited locations; visits to places that are not in this registry
have no matching row.
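
A minimal sketch of that lookup with left-join semantics, so visits to places outside the registry are kept rather than dropped. enrich_visits and the added place_label key are hypothetical; "place-b" and "place-c" are made-up IDs, while the first placeId is the one from the EXAMPLE DATA FLOW section.

```python
def enrich_visits(visits, places):
    """Return copies of the visit rows with a "place_label" key added.

    place_label is the frequent_places label ("HOME", "WORK", or "" for an
    unlabeled frequent place), or None when top_place_id has no match.
    """
    labels = {p["placeId"]: p["label"] for p in places}
    return [dict(v, place_label=labels.get(v["top_place_id"])) for v in visits]

places = [
    {"placeId": "ChIJyaJWtZVqdkgRZHVIi0HKLto", "label": "HOME"},
    {"placeId": "place-b", "label": ""},
]
visits = [
    {"segment_index": "1", "top_place_id": "ChIJyaJWtZVqdkgRZHVIi0HKLto"},
    {"segment_index": "3", "top_place_id": "place-b"},
    {"segment_index": "5", "top_place_id": "place-c"},
]
enriched = enrich_visits(visits, places)
```

Keeping "" and None distinct preserves the difference between "frequent but unlabeled" and "not a frequent place".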

5. RAW_SIGNALS.CSV - Raw GPS Data
---------------------------------
Purpose: Unprocessed GPS signals from the device
Key Fields:
- raw_index: Sequential identifier
- timestamp: When the GPS signal was recorded
- lat/lon: Geographic coordinates
- accuracyMeters: Estimated accuracy of the reading, in meters
- altitudeMeters: Altitude, in meters
- speedMetersPerSecond: Movement speed, in meters per second
- source: Data source type

Relationship: This is the foundation data that is processed into timeline_path_points
and visits. It represents the raw GPS signals before semantic interpretation. Since no
segment_index is listed among its fields, raw signals are related to segments by time: a signal
belongs to any segment whose startTime/endTime range contains its timestamp.
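
A minimal sketch of that time-based match, assuming every timestamp is an ISO 8601 string with a UTC offset (as TIME ZONES states). segments_containing is a hypothetical helper; it returns every matching segment because segment ranges can overlap, and it treats both boundaries as inclusive, which is a choice rather than a documented rule.

```python
from datetime import datetime

def segments_containing(ts: str, segments: list) -> list:
    """Return segment_index values whose [startTime, endTime] range contains ts.

    Offset-aware datetimes compare by the underlying instant, so +00:00 and
    +01:00 values mix safely. Linear scan; index by start time for big files.
    """
    t = datetime.fromisoformat(ts)
    return [
        int(s["segment_index"])
        for s in segments
        if datetime.fromisoformat(s["startTime"]) <= t <= datetime.fromisoformat(s["endTime"])
    ]

# Ranges from the EXAMPLE DATA FLOW section, with an assumed +00:00 offset.
segments = [
    {"segment_index": "0",
     "startTime": "2013-12-31T22:00:00+00:00", "endTime": "2014-01-01T00:00:00+00:00"},
    {"segment_index": "1",
     "startTime": "2013-12-31T22:29:00+00:00", "endTime": "2014-01-01T17:10:00+00:00"},
]
```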

SUPPORTING DATASETS
===================

6. FREQUENT_TRIPS.CSV - Trip Pattern Analysis
----------------------------------------------
Purpose: Analysis of regular travel patterns (such as commutes)
Key Fields:
- trip_index: Unique identifier for each trip pattern
- startTimeMinutes/endTimeMinutes: Typical time-of-day pattern of the trip, expressed in minutes
- durationMinutes: Typical trip duration
- commuteDirection: HOME_TO_WORK or WORK_TO_HOME
- waypoint_count: Number of stops in the trip

7. FREQUENT_TRIP_WAYPOINTS.CSV - Trip Waypoint Details
------------------------------------------------------
Purpose: Specific locations that are part of frequent trips
Key Fields:
- trip_index: Links to frequent_trips.csv
- waypoint_order: Position of the stop within the trip
- waypoint_id: Links to frequent_places.placeId
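
The two links above let a trip be read as an ordered list of named stops. A minimal sketch; trip_routes is a hypothetical helper and the place IDs are made up. It only relies on waypoint_order being sortable, not on whether it starts at 0 or 1.

```python
def trip_routes(waypoints, places):
    """Map trip_index -> stops ordered by waypoint_order.

    Each stop is shown by its frequent_places label, falling back to the raw
    waypoint_id when the place is unlabeled ("") or not in frequent_places.
    """
    labels = {p["placeId"]: p["label"] for p in places}
    stops_by_trip = {}
    for w in waypoints:
        stops_by_trip.setdefault(int(w["trip_index"]), []).append(
            (int(w["waypoint_order"]), w["waypoint_id"])
        )
    return {
        trip: [labels.get(pid) or pid for _, pid in sorted(stops)]
        for trip, stops in stops_by_trip.items()
    }

places = [
    {"placeId": "place-home", "label": "HOME"},
    {"placeId": "place-work", "label": "WORK"},
    {"placeId": "place-x", "label": ""},
]
waypoints = [
    {"trip_index": "0", "waypoint_order": "1", "waypoint_id": "place-work"},
    {"trip_index": "0", "waypoint_order": "0", "waypoint_id": "place-home"},
    {"trip_index": "1", "waypoint_order": "0", "waypoint_id": "place-work"},
    {"trip_index": "1", "waypoint_order": "1", "waypoint_id": "place-x"},
    {"trip_index": "1", "waypoint_order": "2", "waypoint_id": "place-home"},
]
routes = trip_routes(waypoints, places)
```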

8. FREQUENT_TRIP_MODE_DISTRIBUTION.CSV - Transportation Analysis
---------------------------------------------------------------
Purpose: Analysis of transportation methods used
Key Fields:
- trip_index: Links to frequent_trips.csv
- mode: Transportation type (WALKING, DRIVING, etc.)
- percentage: How often this mode was used for this trip
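
A common first question for this table is which mode dominates each trip. A minimal sketch with a hypothetical dominant_modes helper and made-up rows; because it only compares values within one trip, it works whether percentage is stored on a 0-100 or a 0-1 scale (the document does not say which).

```python
def dominant_modes(rows) -> dict:
    """Return {trip_index: (mode, percentage)} for each trip's most-used mode.

    Ties keep the first row seen for that trip.
    """
    best = {}
    for r in rows:
        trip = int(r["trip_index"])
        pct = float(r["percentage"])
        if trip not in best or pct > best[trip][1]:
            best[trip] = (r["mode"], pct)
    return best

rows = [
    {"trip_index": "0", "mode": "WALKING", "percentage": "20"},
    {"trip_index": "0", "mode": "DRIVING", "percentage": "80"},
    {"trip_index": "1", "mode": "WALKING", "percentage": "100"},
]
modes = dominant_modes(rows)
```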

9. TRAVEL_MODE_AFFINITIES.CSV - User Preferences
------------------------------------------------
Purpose: User's preferred transportation methods
Key Fields:
- mode: Transportation type
- affinity: Preference score

DATA FLOW AND RELATIONSHIPS
============================

1. RAW COLLECTION:
raw_signals.csv contains the raw GPS pings recorded by the device

2. TEMPORAL SEGMENTATION:
semantic_segments.csv divides time into logical periods based on movement patterns

3. MOVEMENT vs. STATIONARY CLASSIFICATION:
- Movement periods → timeline_path_points.csv (detailed GPS tracking)
- Stationary periods → visits.csv (location identification and categorization)

4. LOCATION IDENTIFICATION:
frequent_places.csv provides semantic meaning to visited locations

5. PATTERN ANALYSIS:
frequent_trips.csv, frequent_trip_waypoints.csv, and frequent_trip_mode_distribution.csv
analyze regular trip patterns, while travel_mode_affinities.csv captures overall
transportation preferences

EXAMPLE DATA FLOW
==================

Segment 0 (Movement): 2013-12-31 22:00 - 2014-01-01 00:00
- semantic_segments: has_timeline_path=1, has_visit=0
- timeline_path_points: Contains GPS coordinates during this travel period
- visits: No data for this segment

Segment 1 (Visit): 2013-12-31 22:29 - 2014-01-01 17:10
- semantic_segments: has_timeline_path=0, has_visit=1
- timeline_path_points: No data for this segment
- visits: Shows a visit to place ChIJyaJWtZVqdkgRZHVIi0HKLto (HOME)
- frequent_places: Confirms this placeId is labeled "HOME"

As this example shows, segment time ranges can overlap (here from 22:29 to 00:00), so a single
timestamp may fall inside more than one segment.

QUERYING STRATEGIES
===================

To get complete journey information:
1. Query semantic_segments for the time range of interest
2. For movement segments: Join with timeline_path_points on segment_index
3. For visit segments: Join with visits on segment_index
4. Enhance visit data by joining visits.top_place_id with frequent_places.placeId
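
These four steps map naturally onto SQL. A minimal sketch using Python's built-in sqlite3 module, assuming the CSVs have been loaded into tables named after the files (only the columns used are declared) and that timestamps are ISO 8601 strings with a "+HH:MM" offset; journey is a hypothetical helper and the coordinates are made up. Movement and visit segments are queried separately, as in steps 2 and 3, so a segment carrying both flags does not produce a cross product of points and visits.

```python
import sqlite3

# Only the columns used below are declared; the real CSVs have more.
SCHEMA = """
CREATE TABLE semantic_segments (segment_index INTEGER, startTime TEXT, endTime TEXT,
                                has_visit INTEGER, has_timeline_path INTEGER);
CREATE TABLE timeline_path_points (segment_index INTEGER, point_index INTEGER,
                                   lat REAL, lon REAL);
CREATE TABLE visits (segment_index INTEGER, top_place_id TEXT, top_semantic_type TEXT);
CREATE TABLE frequent_places (placeId TEXT, label TEXT);
"""

# Step 1: segments overlapping the requested range. julianday() applies the
# "+HH:MM" suffix and works in UTC, so mixed offsets compare correctly.
IN_RANGE = """
WITH in_range AS (
  SELECT * FROM semantic_segments
  WHERE julianday(endTime) >= julianday(:range_start)
    AND julianday(startTime) <= julianday(:range_end)
)
"""
# Step 2: movement segments -> ordered GPS points.
MOVEMENT_SQL = IN_RANGE + """
SELECT s.segment_index, p.point_index, p.lat, p.lon
FROM in_range AS s
JOIN timeline_path_points AS p ON p.segment_index = s.segment_index
WHERE s.has_timeline_path = 1
ORDER BY s.segment_index, p.point_index
"""
# Steps 3-4: visit segments -> visits -> frequent_places. The LEFT JOIN keeps
# visits whose place is not a frequent place (label comes back as NULL/None).
VISIT_SQL = IN_RANGE + """
SELECT s.segment_index, v.top_place_id, v.top_semantic_type, f.label
FROM in_range AS s
JOIN visits AS v ON v.segment_index = s.segment_index
LEFT JOIN frequent_places AS f ON f.placeId = v.top_place_id
WHERE s.has_visit = 1
ORDER BY s.segment_index
"""

def journey(conn, start, end):
    """Return (movement_rows, visit_rows) for segments overlapping [start, end]."""
    params = {"range_start": start, "range_end": end}
    return (conn.execute(MOVEMENT_SQL, params).fetchall(),
            conn.execute(VISIT_SQL, params).fetchall())

# Illustrative data modelled on the EXAMPLE DATA FLOW section.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO semantic_segments VALUES (?, ?, ?, ?, ?)", [
    (0, "2013-12-31T22:00:00+00:00", "2014-01-01T00:00:00+00:00", 0, 1),
    (1, "2013-12-31T22:29:00+00:00", "2014-01-01T17:10:00+00:00", 1, 0),
])
conn.executemany("INSERT INTO timeline_path_points VALUES (?, ?, ?, ?)",
                 [(0, 1, 51.501, -0.142), (0, 0, 51.5, -0.14)])
conn.execute("INSERT INTO visits VALUES (1, 'ChIJyaJWtZVqdkgRZHVIi0HKLto', 'HOME')")
conn.execute("INSERT INTO frequent_places VALUES ('ChIJyaJWtZVqdkgRZHVIi0HKLto', 'HOME')")
moves, stays = journey(conn, "2013-12-31T23:00:00+00:00", "2014-01-01T12:00:00+00:00")
```

Comparing the raw strings instead of julianday() values would misorder timestamps written with different offsets.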

To analyze location patterns:
1. Use frequent_places for location categories
2. Use frequent_trips for commute patterns
3. Use travel_mode_affinities for transportation preferences

COORDINATE SYSTEMS
==================
All latitude/longitude data uses WGS84 decimal degrees:
- Latitude: Positive = North, Negative = South
- Longitude: Positive = East, Negative = West
- Precision: Typically 6-7 decimal places. With one degree of latitude spanning roughly 111 km,
  that is a resolution of about 0.1 m (6 places) to 1 cm (7 places), far finer than the real
  accuracy of a GPS fix, which is reported per reading in raw_signals.accuracyMeters
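
The resolution figures follow from simple arithmetic. A minimal sketch using a spherical approximation of about 111.32 km per degree (real meridian degrees vary slightly with latitude); decimal_place_resolution is a hypothetical helper and 51.5 is just an illustrative UK latitude. East-west resolution shrinks with the cosine of the latitude.

```python
import math

# Spherical approximation: one degree of latitude is roughly 111.32 km.
METERS_PER_DEGREE = 111_320

def decimal_place_resolution(places: int, lat_deg: float) -> tuple:
    """Ground size (north-south m, east-west m) of one unit in the given decimal place."""
    step_deg = 10 ** -places
    north_south = step_deg * METERS_PER_DEGREE
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west

for places in (5, 6, 7):
    ns, ew = decimal_place_resolution(places, lat_deg=51.5)
    print(f"{places} places: {ns:.3f} m N-S, {ew:.3f} m E-W")
```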

TIME ZONES
==========
All timestamps include a UTC offset; for UK data this is typically +00:00 (GMT, winter) or
+01:00 (BST, summer). Time ranges in semantic_segments define the boundaries for linking the
other datasets, so timestamps should be compared as instants (or normalized to UTC) rather than
as plain strings.
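
A small illustration of why plain string comparison is unsafe once both offsets appear. to_utc is a hypothetical helper and the two readings are invented; it rejects timestamps without an offset rather than guessing one.

```python
from datetime import datetime, timezone

def to_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp carrying a UTC offset and normalize it to UTC."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {ts!r}")
    return dt.astimezone(timezone.utc)

a = "2014-06-01T12:30:00+00:00"  # 12:30 UTC
b = "2014-06-01T13:00:00+01:00"  # 12:00 UTC, although the text "looks" later
string_order_a_first = a < b                   # plain text comparison
instant_order_b_first = to_utc(b) < to_utc(a)  # comparison of actual instants
```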

DATA COMPLETENESS
=================
- Not all segments have both movement and visit data
- Some segments may have neither (gaps in tracking)
- Visit probability scores (visits.visit_probability) indicate confidence levels
- Missing coordinates in raw_signals are represented as empty fields
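
Empty fields make a naive float() call raise ValueError, so a reader of raw_signals needs to handle them explicitly. A minimal sketch; optional_float, usable_fixes and its max_accuracy_m filter are hypothetical, and the rows are invented.

```python
def optional_float(value):
    """Convert a CSV field to float, mapping an empty or missing field to None."""
    value = (value or "").strip()
    return float(value) if value else None

def usable_fixes(rows, max_accuracy_m=None):
    """Yield (timestamp, lat, lon) for raw_signals rows that have coordinates.

    max_accuracy_m is an optional, hypothetical filter: rows whose
    accuracyMeters exceeds it are skipped; rows with no accuracy are kept.
    """
    for r in rows:
        lat, lon = optional_float(r.get("lat")), optional_float(r.get("lon"))
        if lat is None or lon is None:
            continue  # missing coordinates arrive as empty fields
        accuracy = optional_float(r.get("accuracyMeters"))
        if max_accuracy_m is not None and accuracy is not None and accuracy > max_accuracy_m:
            continue
        yield r["timestamp"], lat, lon

rows = [
    {"timestamp": "2014-01-01T10:00:00+00:00", "lat": "51.5", "lon": "-0.14", "accuracyMeters": "12"},
    {"timestamp": "2014-01-01T10:01:00+00:00", "lat": "", "lon": "", "accuracyMeters": ""},
    {"timestamp": "2014-01-01T10:02:00+00:00", "lat": "51.6", "lon": "-0.15", "accuracyMeters": "800"},
]
all_fixes = list(usable_fixes(rows))
precise_fixes = list(usable_fixes(rows, max_accuracy_m=100))
```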

This hierarchical structure allows for both detailed movement tracking and high-level
pattern analysis while preserving the semantic meaning of the places visited.