From be20ba8c41ad668ce0423402a1a3301d1ee977b7 Mon Sep 17 00:00:00 2001
From: Azeem Fidahusein
Date: Thu, 25 Sep 2025 21:01:15 +0100
Subject: [PATCH] geotxt

---
 .../ingest/geospatial_data_relationships.txt | 185 ++++++++++++++++++
 1 file changed, 185 insertions(+)
 create mode 100644 scripts/ingest/geospatial_data_relationships.txt

diff --git a/scripts/ingest/geospatial_data_relationships.txt b/scripts/ingest/geospatial_data_relationships.txt
new file mode 100644
index 0000000..b8f05d9
--- /dev/null
+++ b/scripts/ingest/geospatial_data_relationships.txt
@@ -0,0 +1,185 @@
+GEOSPATIAL DATA RELATIONSHIPS IN PERSONAL TRACKER
+==================================================
+
+This document explains how the geospatial datasets in the timeline_csv folder are interconnected
+and structured to provide a complete picture of location-based personal tracking data.
+
+OVERVIEW
+--------
+The location tracking system uses a hierarchical approach with semantic segments as the master
+index that coordinates different types of location data. The data is organized into movement
+periods (travel) and stationary periods (visits), creating a complete chronological timeline.
+
+CORE DATASETS AND THEIR RELATIONSHIPS
+=====================================
+
+1. SEMANTIC_SEGMENTS.CSV - The Master Index
+--------------------------------------------
+Purpose: Acts as the central orchestrator that defines time-based segments
+Key Fields:
+  - segment_index: Unique identifier linking all other datasets
+  - startTime/endTime: Time boundaries for each segment
+  - has_visit: Boolean indicating if segment contains visit data
+  - has_timeline_path: Boolean indicating if segment contains movement data
+
+This dataset defines the temporal structure and determines which other datasets contain
+data for each time period.
+
+2. TIMELINE_PATH_POINTS.CSV - Movement Data
+--------------------------------------------
+Purpose: GPS tracking data during travel/movement periods
+Key Fields:
+  - segment_index: Links to semantic_segments
+  - point_index: Order of GPS points within a segment
+  - time: Precise timestamp for each GPS reading
+  - lat/lon: Geographic coordinates
+  - raw_point: Original coordinate string
+
+Relationship: Contains data ONLY for segments where has_timeline_path=1 in semantic_segments.
+These represent periods when the person was moving between locations.
+
+3. VISITS.CSV - Stationary Location Data
+-----------------------------------------
+Purpose: Information about places where the person stayed for extended periods
+Key Fields:
+  - segment_index: Links to semantic_segments
+  - top_place_id: Google Places API identifier
+  - top_semantic_type: Category (HOME, WORK, UNKNOWN, etc.)
+  - top_lat/top_lon: Geographic coordinates of the visit location
+  - startTime/endTime: Duration of the visit
+  - visit_probability: Confidence that this was actually a visit
+
+Relationship: Contains data ONLY for segments where has_visit=1 in semantic_segments.
+These represent periods when the person was stationary at a specific location.
+
+4. FREQUENT_PLACES.CSV - Location Reference Data
+------------------------------------------------
+Purpose: Registry of commonly visited locations with semantic labels
+Key Fields:
+  - placeId: Google Places API identifier (links to visits.top_place_id)
+  - label: Semantic meaning (HOME, WORK, or empty for unlabeled)
+  - lat/lon: Geographic coordinates
+
+Relationship: Acts as a lookup table for visits.csv. The placeId field provides
+cross-references to identify and categorize frequently visited locations.
+
+5. RAW_SIGNALS.CSV - Raw GPS Data
+---------------------------------
+Purpose: Unprocessed GPS signals from the device
+Key Fields:
+  - raw_index: Sequential identifier
+  - timestamp: When the GPS signal was recorded
+  - lat/lon: Geographic coordinates
+  - accuracyMeters: GPS accuracy measurement
+  - altitudeMeters: Elevation data
+  - speedMetersPerSecond: Movement speed
+  - source: Data source type
+
+Relationship: This is the foundation data that gets processed into timeline_path_points
+and visits. It represents the raw GPS signals before semantic interpretation.
+
+SUPPORTING DATASETS
+===================
+
+6. FREQUENT_TRIPS.CSV - Trip Pattern Analysis
+----------------------------------------------
+Purpose: Analysis of regular travel patterns (like commutes)
+Key Fields:
+  - trip_index: Unique identifier for trip patterns
+  - startTimeMinutes/endTimeMinutes: Time of day patterns
+  - durationMinutes: Typical trip duration
+  - commuteDirection: HOME_TO_WORK or WORK_TO_HOME
+  - waypoint_count: Number of stops in the trip
+
+7. FREQUENT_TRIP_WAYPOINTS.CSV - Trip Waypoint Details
+------------------------------------------------------
+Purpose: Specific locations that are part of frequent trips
+Key Fields:
+  - trip_index: Links to frequent_trips.csv
+  - waypoint_order: Sequence of stops in the trip
+  - waypoint_id: Links to frequent_places.placeId
+
+8. FREQUENT_TRIP_MODE_DISTRIBUTION.CSV - Transportation Analysis
+---------------------------------------------------------------
+Purpose: Analysis of transportation methods used
+Key Fields:
+  - trip_index: Links to frequent_trips.csv
+  - mode: Transportation type (WALKING, DRIVING, etc.)
+  - percentage: How often this mode was used for this trip
+
+9. TRAVEL_MODE_AFFINITIES.CSV - User Preferences
+------------------------------------------------
+Purpose: User's preferred transportation methods
+Key Fields:
+  - mode: Transportation type
+  - affinity: Preference score
+
+DATA FLOW AND RELATIONSHIPS
+============================
+
+1. RAW COLLECTION:
+   raw_signals.csv contains all GPS pings from the device
+
+2. TEMPORAL SEGMENTATION:
+   semantic_segments.csv divides time into logical periods based on movement patterns
+
+3. MOVEMENT vs. STATIONARY CLASSIFICATION:
+   - Movement periods → timeline_path_points.csv (detailed GPS tracking)
+   - Stationary periods → visits.csv (location identification and categorization)
+
+4. LOCATION IDENTIFICATION:
+   frequent_places.csv provides semantic meaning to visited locations
+
+5. PATTERN ANALYSIS:
+   frequent_trips.csv, frequent_trip_waypoints.csv, and frequent_trip_mode_distribution.csv
+   analyze regular patterns and transportation preferences
+
+EXAMPLE DATA FLOW
+==================
+
+Segment 0 (Movement): 2013-12-31 22:00 - 2014-01-01 00:00
+- semantic_segments: has_timeline_path=1, has_visit=0
+- timeline_path_points: Contains GPS coordinates during this travel period
+- visits: No data for this segment
+
+Segment 1 (Visit): 2013-12-31 22:29 - 2014-01-01 17:10
+- semantic_segments: has_timeline_path=0, has_visit=1
+- timeline_path_points: No data for this segment
+- visits: Shows visit to place ChIJyaJWtZVqdkgRZHVIi0HKLto (HOME)
+- frequent_places: Confirms this placeId is labeled as "HOME"
+
+QUERYING STRATEGIES
+===================
+
+To get complete journey information:
+1. Query semantic_segments for time range
+2. For movement segments: Join with timeline_path_points on segment_index
+3. For visit segments: Join with visits on segment_index
+4. Enhance visit data by joining visits.top_place_id with frequent_places.placeId
+
+To analyze location patterns:
+1. Use frequent_places for location categories
+2. Use frequent_trips for commute patterns
+3. Use travel_mode_affinities for transportation preferences
+
+COORDINATE SYSTEMS
+==================
+All latitude/longitude data uses WGS84 decimal degrees:
+- Latitude: Positive = North, Negative = South
+- Longitude: Positive = East, Negative = West
+- Precision: Typically 6-7 decimal places (sub-meter coordinate resolution, roughly 0.1 m to 0.01 m)
+
+TIME ZONES
+==========
+All timestamps include timezone information (typically +00:00 or +01:00 for UK data).
+Time ranges in semantic_segments define the boundaries for linking other datasets.
+
+DATA COMPLETENESS
+=================
+- Not all segments have both movement and visit data
+- Some segments may have neither (gaps in tracking)
+- Visit probability scores indicate confidence levels
+- Missing coordinates in raw_signals are represented as empty fields
+
+This hierarchical structure allows for both detailed movement tracking and high-level
+pattern analysis while maintaining semantic meaning about the places visited.
\ No newline at end of file
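Reviewer sketch (not part of the patch): the four-step journey query under QUERYING STRATEGIES could look roughly like the Python below. It uses only the standard library and tiny in-memory stand-ins for the CSV files. The column names come from the field lists in the document; the sample coordinates, the timestamp format, the storage of the boolean flags as "1"/"0", and the helper names read_rows and build_journey are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample rows. Only the column names and the place ID are taken
# from the document; coordinates and timestamp formatting are made up.
SEGMENTS = """segment_index,startTime,endTime,has_visit,has_timeline_path
0,2013-12-31T22:00:00+00:00,2014-01-01T00:00:00+00:00,0,1
1,2013-12-31T22:29:00+00:00,2014-01-01T17:10:00+00:00,1,0
"""
PATH_POINTS = """segment_index,point_index,time,lat,lon
0,1,2013-12-31T22:10:00+00:00,51.5010000,-0.1010000
0,0,2013-12-31T22:05:00+00:00,51.5000000,-0.1000000
"""
VISITS = """segment_index,top_place_id,top_semantic_type,top_lat,top_lon
1,ChIJyaJWtZVqdkgRZHVIi0HKLto,HOME,51.5020000,-0.1020000
"""
FREQUENT_PLACES = """placeId,label,lat,lon
ChIJyaJWtZVqdkgRZHVIi0HKLto,HOME,51.5020000,-0.1020000
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def build_journey(segments, path_points, visits, places):
    """Attach ordered path points and/or visit details to each segment."""
    # Index the child tables by their join keys.
    points_by_segment = {}
    for point in path_points:
        points_by_segment.setdefault(point["segment_index"], []).append(point)
    visit_by_segment = {v["segment_index"]: v for v in visits}
    label_by_place = {p["placeId"]: p["label"] for p in places}

    journey = []
    for seg in segments:
        idx = seg["segment_index"]
        entry = {
            "segment_index": int(idx),
            "start": seg["startTime"],
            "end": seg["endTime"],
        }
        # Step 2: movement segments join timeline_path_points on segment_index.
        if seg["has_timeline_path"] == "1":
            ordered = sorted(points_by_segment.get(idx, []),
                             key=lambda p: int(p["point_index"]))
            entry["points"] = [(float(p["lat"]), float(p["lon"])) for p in ordered]
        # Steps 3 and 4: visit segments join visits, then frequent_places.
        if seg["has_visit"] == "1" and idx in visit_by_segment:
            visit = visit_by_segment[idx]
            entry["visit"] = {
                "place_id": visit["top_place_id"],
                "semantic_type": visit["top_semantic_type"],
                # Empty string when the place is not in frequent_places.
                "label": label_by_place.get(visit["top_place_id"], ""),
            }
        journey.append(entry)
    return journey


journey = build_journey(
    read_rows(SEGMENTS),
    read_rows(PATH_POINTS),
    read_rows(VISITS),
    read_rows(FREQUENT_PLACES),
)
for entry in journey:
    print(entry)
```

A segment with neither flag set keeps only its time range, which matches the "gaps in tracking" case under DATA COMPLETENESS. For the real files, read_rows would be replaced by csv.DictReader over the opened CSVs.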