GEOSPATIAL DATA RELATIONSHIPS IN PERSONAL TRACKER
==================================================

This document explains how the geospatial datasets in the timeline_csv folder are interconnected and structured to provide a complete picture of location-based personal tracking data.

OVERVIEW
--------
The location tracking system uses a hierarchical approach in which semantic segments act as the master index that coordinates the different types of location data. The data is organized into movement periods (travel) and stationary periods (visits), creating a complete chronological timeline.

CORE DATASETS AND THEIR RELATIONSHIPS
=====================================

1. SEMANTIC_SEGMENTS.CSV - The Master Index
--------------------------------------------
Purpose: Acts as the central orchestrator that defines time-based segments
Key Fields:
- segment_index: Unique identifier linking all other datasets
- startTime/endTime: Time boundaries for each segment
- has_visit: Boolean indicating if segment contains visit data
- has_timeline_path: Boolean indicating if segment contains movement data

This dataset defines the temporal structure and determines which other datasets contain data for each time period.

2. TIMELINE_PATH_POINTS.CSV - Movement Data
--------------------------------------------
Purpose: GPS tracking data during travel/movement periods
Key Fields:
- segment_index: Links to semantic_segments
- point_index: Order of GPS points within a segment
- time: Precise timestamp for each GPS reading
- lat/lon: Geographic coordinates
- raw_point: Original coordinate string

Relationship: Contains data ONLY for segments where has_timeline_path=1 in semantic_segments. These represent periods when the person was moving between locations.

3. VISITS.CSV - Stationary Location Data
-----------------------------------------
Purpose: Information about places where the person stayed for extended periods
Key Fields:
- segment_index: Links to semantic_segments
- top_place_id: Google Places API identifier
- top_semantic_type: Category (HOME, WORK, UNKNOWN, etc.)
- top_lat/top_lon: Geographic coordinates of the visit location
- startTime/endTime: Start and end of the visit
- visit_probability: Confidence that this was actually a visit

Relationship: Contains data ONLY for segments where has_visit=1 in semantic_segments. These represent periods when the person was stationary at a specific location.

4. FREQUENT_PLACES.CSV - Location Reference Data
------------------------------------------------
Purpose: Registry of commonly visited locations with semantic labels
Key Fields:
- placeId: Google Places API identifier (links to visits.top_place_id)
- label: Semantic meaning (HOME, WORK, or empty for unlabeled)
- lat/lon: Geographic coordinates

Relationship: Acts as a lookup table for visits.csv. The placeId field provides cross-references to identify and categorize frequently visited locations.

5. RAW_SIGNALS.CSV - Raw GPS Data
---------------------------------
Purpose: Unprocessed GPS signals from the device
Key Fields:
- raw_index: Sequential identifier
- timestamp: When the GPS signal was recorded
- lat/lon: Geographic coordinates
- accuracyMeters: GPS accuracy measurement
- altitudeMeters: Elevation data
- speedMetersPerSecond: Movement speed
- source: Data source type

Relationship: This is the foundation data that gets processed into timeline_path_points and visits. It represents the raw GPS signals before semantic interpretation.

SUPPORTING DATASETS
===================

6. FREQUENT_TRIPS.CSV - Trip Pattern Analysis
----------------------------------------------
Purpose: Analysis of regular travel patterns (like commutes)
Key Fields:
- trip_index: Unique identifier for trip patterns
- startTimeMinutes/endTimeMinutes: Time of day patterns
- durationMinutes: Typical trip duration
- commuteDirection: HOME_TO_WORK or WORK_TO_HOME
- waypoint_count: Number of stops in the trip

7. FREQUENT_TRIP_WAYPOINTS.CSV - Trip Waypoint Details
------------------------------------------------------
Purpose: Specific locations that are part of frequent trips
Key Fields:
- trip_index: Links to frequent_trips.csv
- waypoint_order: Sequence of stops in the trip
- waypoint_id: Links to frequent_places.placeId

8. FREQUENT_TRIP_MODE_DISTRIBUTION.CSV - Transportation Analysis
---------------------------------------------------------------
Purpose: Analysis of transportation methods used
Key Fields:
- trip_index: Links to frequent_trips.csv
- mode: Transportation type (WALKING, DRIVING, etc.)
- percentage: How often this mode was used for this trip

9. TRAVEL_MODE_AFFINITIES.CSV - User Preferences
------------------------------------------------
Purpose: User's preferred transportation methods
Key Fields:
- mode: Transportation type
- affinity: Preference score

DATA FLOW AND RELATIONSHIPS
============================

1. RAW COLLECTION: raw_signals.csv contains all GPS pings from the device
2. TEMPORAL SEGMENTATION: semantic_segments.csv divides time into logical periods based on movement patterns
3. MOVEMENT vs. STATIONARY CLASSIFICATION:
   - Movement periods → timeline_path_points.csv (detailed GPS tracking)
   - Stationary periods → visits.csv (location identification and categorization)
4. LOCATION IDENTIFICATION: frequent_places.csv provides semantic meaning to visited locations
5. PATTERN ANALYSIS: frequent_trips.csv, frequent_trip_waypoints.csv, and frequent_trip_mode_distribution.csv analyze regular patterns and transportation preferences

EXAMPLE DATA FLOW
==================

Segment 0 (Movement): 2013-12-31 22:00 - 2014-01-01 00:00
- semantic_segments: has_timeline_path=1, has_visit=0
- timeline_path_points: Contains GPS coordinates during this travel period
- visits: No data for this segment

Segment 1 (Visit): 2013-12-31 22:29 - 2014-01-01 17:10
- semantic_segments: has_timeline_path=0, has_visit=1
- timeline_path_points: No data for this segment
- visits: Shows visit to place ChIJyaJWtZVqdkgRZHVIi0HKLto (HOME)
- frequent_places: Confirms this placeId is labeled as "HOME"

QUERYING STRATEGIES
===================

To get complete journey information:
1. Query semantic_segments for time range
2. For movement segments: Join with timeline_path_points on segment_index
3. For visit segments: Join with visits on segment_index
4. Enhance visit data by joining visits.top_place_id with frequent_places.placeId

To analyze location patterns:
1. Use frequent_places for location categories
2. Use frequent_trips for commute patterns
3. Use travel_mode_affinities for transportation preferences

COORDINATE SYSTEMS
==================
All latitude/longitude data uses WGS84 decimal degrees:
- Latitude: Positive = North, Negative = South
- Longitude: Positive = East, Negative = West
- Precision: Typically 6-7 decimal places (a numeric resolution of roughly 0.1 m or finer; the real accuracy of a GPS fix is usually coarser, see accuracyMeters in raw_signals)

TIME ZONES
==========
All timestamps include timezone information (typically +00:00 or +01:00 for UK data). Time ranges in semantic_segments define the boundaries for linking other datasets.

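Because offsets can differ between rows (+00:00 in winter, +01:00 in summer for UK data), it is safer to compare timestamps after converting them to UTC than to compare the raw strings. The following is a minimal Python sketch, assuming the timestamps are ISO 8601 strings such as 2014-06-01T12:30:00+01:00; the exact on-disk format is not stated in this document. The helper names to_utc and in_segment are hypothetical, and treating segment boundaries as inclusive is also an assumption.

```python
from datetime import datetime, timezone


def to_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp that carries a UTC offset and convert it to UTC."""
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is None:
        # A timestamp without an offset cannot be placed on the timeline safely.
        raise ValueError(f"timestamp has no UTC offset: {ts!r}")
    return parsed.astimezone(timezone.utc)


def in_segment(point_time: str, start_time: str, end_time: str) -> bool:
    """Return True if point_time lies within [start_time, end_time] (inclusive, by assumption)."""
    return to_utc(start_time) <= to_utc(point_time) <= to_utc(end_time)


# 12:30 at +01:00 is 11:30 UTC, so it falls inside an 11:00-12:00 UTC segment.
print(to_utc("2014-06-01T12:30:00+01:00").isoformat())
print(in_segment("2014-06-01T12:30:00+01:00",
                 "2014-06-01T11:00:00+00:00",
                 "2014-06-01T12:00:00+00:00"))
```

Comparing timezone-aware datetime objects avoids the off-by-one-hour errors that plain string comparison would introduce around daylight saving changes.
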
DATA COMPLETENESS
=================
- Not all segments have both movement and visit data
- Some segments may have neither (gaps in tracking)
- Visit probability scores indicate confidence levels
- Missing coordinates in raw_signals are represented as empty fields

This hierarchical structure allows for both detailed movement tracking and high-level pattern analysis while maintaining semantic meaning about the places visited.
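
The querying strategy described earlier (start from semantic_segments, join movement segments to timeline_path_points and visit segments to visits on segment_index, then look up visits.top_place_id in frequent_places.placeId) can be sketched with Python's standard csv module. This is an illustration under stated assumptions, not the tracker's actual implementation: the inline rows use only a subset of the documented columns, the coordinate values and segment 2 are made up for the example, flags are assumed to be stored as 0/1, and each segment is assumed to have at most one visit row. The names read_rows and build_timeline are hypothetical.

```python
import csv
import io

# Illustrative data only. Segments 0 and 1 mirror the example in this document;
# segment 2 and all coordinates are invented to show a tracking gap.
SEGMENTS_CSV = """segment_index,startTime,endTime,has_visit,has_timeline_path
0,2013-12-31T22:00:00+00:00,2014-01-01T00:00:00+00:00,0,1
1,2013-12-31T22:29:00+00:00,2014-01-01T17:10:00+00:00,1,0
2,2014-01-01T17:10:00+00:00,2014-01-01T18:00:00+00:00,0,0
"""
PATH_POINTS_CSV = """segment_index,point_index,time,lat,lon
0,1,2013-12-31T22:20:00+00:00,51.501,-0.121
0,0,2013-12-31T22:05:00+00:00,51.5,-0.12
"""
VISITS_CSV = """segment_index,top_place_id,top_semantic_type
1,ChIJyaJWtZVqdkgRZHVIi0HKLto,HOME
"""
PLACES_CSV = """placeId,label
ChIJyaJWtZVqdkgRZHVIi0HKLto,HOME
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def build_timeline(segments, points, visits, places):
    """Attach path points and visit details to each segment via segment_index."""
    points_by_segment = {}
    for point in points:
        points_by_segment.setdefault(point["segment_index"], []).append(point)
    visit_by_segment = {v["segment_index"]: v for v in visits}
    label_by_place = {p["placeId"]: p["label"] for p in places}

    timeline = []
    for seg in segments:
        idx = seg["segment_index"]
        has_path = seg["has_timeline_path"] == "1"
        has_visit = seg["has_visit"] == "1"
        entry = {"segment_index": idx, "points": [], "place_id": None, "place_label": None}
        if has_path:
            # point_index gives the order of GPS points within a segment.
            ordered = sorted(points_by_segment.get(idx, []),
                             key=lambda p: int(p["point_index"]))
            entry["points"] = [(float(p["lat"]), float(p["lon"])) for p in ordered]
        if has_visit and idx in visit_by_segment:
            place_id = visit_by_segment[idx]["top_place_id"]
            entry["place_id"] = place_id
            # An empty label means the place is unlabeled in frequent_places.
            entry["place_label"] = label_by_place.get(place_id) or None
        if has_path and has_visit:
            entry["kind"] = "both"
        elif has_path:
            entry["kind"] = "movement"
        elif has_visit:
            entry["kind"] = "visit"
        else:
            entry["kind"] = "gap"  # neither flag set: a gap in tracking
        timeline.append(entry)
    return timeline


timeline = build_timeline(read_rows(SEGMENTS_CSV), read_rows(PATH_POINTS_CSV),
                          read_rows(VISITS_CSV), read_rows(PLACES_CSV))
for entry in timeline:
    print(entry["segment_index"], entry["kind"], entry["points"], entry["place_label"])
```

Driving the loop from semantic_segments rather than from the detail files keeps segments with no movement or visit data visible in the result, which matters for the completeness caveats listed above.
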