Project Kids Methodology
Project KIDS has developed (and is continually refining) the methodology to tie the expense and student data together, to allocate resources to each individual student in each district. This heavily involved process requires data collection from multiple sources. The data are then cleaned individually, and merged together at different stages of the process. Algorithms are used to impute missing values, and supplement information in these datasets. A summary of each step of the process to create the visualizations for every district is as follows:
- Data Collection Collect all necessary (or available) data.
- Class Data and Methodology Clean all student-level course data from USBE and create a unique class ID for each unique class.
- Student-level Data Merge Merge all student-level data (including courses, demographics, and performance data) together and export individual files for districts.
- Nutrition Data and Imputation Clean student-level nutrition data (which indicates whether a student ate school meals and if so, how many) and impute missing values using a machine learning algorithm.
- Transportation Data and Imputation Clean A1 reports as well as any bus eligibility data shared by the LEA and impute missing values.
- Teacher Instructional Compensation Assignment Clean payroll data, identify the teachers in the data (using both USBE’s Cactus and UTREx course data), and label each transaction for those teachers as either instructional or non-instructional. Label all compensation for individuals who are not in both Cactus and the course data as non-instructional.
- Expense and Non-Instructional Compensation Assignment Clean expense data (including any payroll data that was marked as non-instructional), and use an algorithm to allocate each expense to spending categories used to allocate the money to each individual student.
- Student Allocation – Instructional Compensation Tie teacher instructional compensation to students through the classes they teach in the course data.
- Student Allocation – Expense and Non-Instructional Compensation Tie expenses and non-instructional compensation to students through the categories each transaction was assigned to. Different rules are created for each category to determine which students receive which resources, using information such as their school meal counts, bus eligibility, demographic information, and course taking patterns.
- Visualization Dashboards Create interactive visualizations for each district, aggregated at different levels, but based on the student-level resource allocation.
The remaining portions of the Detailed Processes section follow the structure summary laid out above, but with more detail for each step of the process.
3.1 Expense and Non-Instructional Compensation
Expense data is pulled from the Transparent Utah website. Once preliminary data quality and location checks have been completed, the expense data is further cleaned and allocated to specific spending categories, or subcodes. Project KIDS has created around 200 unique subcodes that these expenses in the final expense file are each assigned to. Each subcode has a unique rule for which expenses get assigned to it, and a unique rule for how to assign those expenses to the individual students. The spending categories were created with the sole purpose being to assign expenses to students as specifically as possible (see Appendix I for each category’s rules). This means that when an expense could fit in multiple categories, the methodology assigns the expense to the category that would assign the expense to the most specific group of students.
3.1.2 Transparency Query
Detailed expense data is pulled from the Transparent Utah website. This transaction-level data is processed to split out account numbers into their different aspects: fund, location, program, function, and object. Also contained in the expense data are names, aspect-specific descriptions, and a general description for each transaction. Below is an explanation of a common account code, with descriptions corresponding to the USBE standard chart of accounts:
3.1.3 Preliminary Analysis
The expense data is then analyzed at a high level. The data is aggregated to the fund level and totals by year are cross-checked against the fund totals stated in the Comprehensive Annual Financial Report, or CAFR. These typically include funds such as the general fund, the non K-12 fund, the school food services fund, the capital fund, and so on. Any significant differences in totals between the two data sources are flagged, and some school districts were contacted at this point to correct any large discrepancies. Below is an example of a data issue that was caught during this phase and then corrected:
At this point, employee compensation totals are also compared between data sources. This is because the employee compensation data in expense is later swapped with the payroll data received from school districts. Totals by year and by type (pay or benefit) are ensured to be similar during this step.