Monthly Archives: November 2006

Eager Finder SQL

EagerFinderSql allows custom SQL to be specified when doing eager loading of associations through the :include option to find. This allows for purpose-constructed queries to be used and still result in a fully linked object model.

Background

ActiveRecord constructs SQL to satisfy the requirements of a find request. Associations allow for customized SQL to be specified, using the :finder_sql option, but this has not been available when performing eager loading using the :include option. The result is that a standardized query is constructed to bring in the associated tables using LEFT OUTER JOIN. For some queries, this can be result in expensive queries and potentially very large result sets.

Custom SQL

EagerFinderSql addresses this problem by allowing :finder_sql to be added to find options when the :include option is also specified. Columns in the result set are mapped to attributes in the resulting object model through the :column_mapping option. The :column_mapping option specifies a hash containing the following entries:

  • primary_key – indicates the column alias associated with the classes id attribute.
  • columns – is a hash of attribute to column alias associations.
  • associations – is a hash of association mappings for each directly included model. The key for each entry is the name of the association, while the value is a hash similar to the hash for the parent class.

Example

Consider the following query relating authors with many books:

Author.find(:all, :include => :books)

This would likely produce the following SQL:

SELECT
    authors.id AS t0_c0, authors.name AS t0_c1,
    books.id AS t1_c0, books.author_id AS t1_c1, books.title AS t1_c2
  FROM authors
  LEFT OUTER JOIN books ON books.author_id = authors.id

The query might be written with books as the driver, rather than authors as follows:

 SELECT
    authors.id, authors.name,
    books.id AS book_id, books.name AS book_name
  FROM books
  JOIN authors ON authors.id = books.author_id

The Rails query would then be written as follows:

Author.find(:all, :include => books,
           :finder_sql => "
                SELECT
                    authors.id, authors.name,
                    books.id AS book_id, books.name AS book_name
                  FROM books
                  JOIN authors ON authors.id = books.author_id".
           :column_mapping => {
             :primary_key => 'id',
             :columns => {
                  'id'  => 'id',
                  'name => 'name'
              },
              :associations => {
                :books => {
                  :primary_key => book_id,
                  :columns => {
                      'id'        => 'book_id',
                      'author_id' => 'id',
                      'name'      => 'book_name
                  }
                }
              }
            })

This is more verbose, but allows for absolute control of the SQL used to return results across multiple model associations. A more complicated example would be the following:

QUERY = "
SELECT
    a.id AS author_id,
    a.name AS author_name,
    p.id AS post_id,
    p.title AS post_title,
    p.body AS post_body,
    p.type AS post_type,
    c.id AS comment_id,
    c.body AS comment_body,
    c.type AS comment_type
  FROM authors a
  LEFT JOIN posts p ON p.author_id = a.id
  LEFT JOIN comments c ON c.post_id = p.id"

MAPPING = {
  :primary_key => 'author_id',
  :columns => {
    'id' => 'author_id',
    'name' => 'author_name'
  },                         
  :associations=> {
    :posts => {
      :primary_key  => 'post_id',
      :columns => {
        'id' => 'post_id',
        'title' => 'post_title',
        'author_id' => 'author_id',
        'body' => 'post_pody',
        'type' => 'post_type'
      },
      :associations => {
        :comments => {
          :primary_key  => 'comment_id',
          :columns => {
            'id' => 'comment_id',
            'post_id' => 'post_id',
            'author_id' => 'author_id',
            'body' => 'comment_body',
            'type' => 'comment_type'
          }
        }
      }
    }
  }
}
Author.find(:all,
            :include=>{:posts=>:comments},
            :order=>"authors.id",
            :finder_sql => QUERY,
            :column_mapping => MAPPING)

This example shows the recursive nature of associations. It can also be used to render deeper structures that may reference the same model multiple times, such as the following:

Firm.find(:all,
          :include=>{:account=>{:firm=>:account}},
          :order=>"companies.id")

(Here, the query and mapping is left to the reader; or, you can look at the unit test which documents many more possibilities). Install using

script/plugin install svn://rubyforge.org/var/svn/eagerfindersql

The plugin is managed on RubyForge. Rdoc is available.

Project Communication

Many software projects fail to deliver on-time and on-budget and a factor in that is normally inefficient project communication. Studies have shown that software teams that consistently deliver on-time and on-budget communicate in an effective manner. These teams stay in contact constantly, but wisely use each other’s time during the communication process and are careful not to waste other people’s time. Below are some tips for enhancing project communication when working in software development projects:

  1. Set up an online repository for documentation – For effective communication, all your project documents should be online, up-to-date, and available at everyone’s finger tips. This includes requirement documents, detailed designs, test plans, project plans, status reports, user acceptance plans, post mortem documents, etc. Having all of these documents at everyone’s disposal ensures that everyone is working on the same set of deliverables. Many groups usefully solve this problem by setting up a department Wiki that acts as a convenient location for storing and updating important project information such as documentation.
  2. Specify Clear Roles and Responsibilities – For effective communication, everyone on the team should know what their role is and what they are accountable for. Each role should be documented during the initial phases of the project and every team member should sign off on their responsibilities. Post these in your online documentation repository.
  3. Monitor Employee Performance – Each employee should have defined goals and should be measured against their goals monthly or quarterly (depending on project duration). Goals should be specific, measurable and achievable. Progress to goals should be based on objective measurement.
  4. Progress Reports – Progress reports should be created weekly. This can be as informal as creating a weekly status report for management review or as formal as creating reports using a project planning tool. Consider using a tool for this, some good ones are Software Planner and Microsoft Project.
  5. Make Decisions based on Facts – All decisions should be objective and should not be self-serving. Decisions should be based on facts, performance and in the spirit of improving overall team performance.
  6. Project Tracking should be done Online – All phases of the Software Lifecycle should be managed online, preferably via the web. This includes managing of requirements, tasks, issues, defects, test cases. sharing knowledge with discussion forums and your document repository.
  7. Be Careful of People’s Time – It is good to have periodic meetings, but the presence of solid project tracking tools eliminates many of the long-winded meetings that teams have. Team members are most productive when they know their roles, responsibilities and have a way of showing progress online. This can greatly reduce the number of face-to-face meetings that are needed. (Thanks to Steve Miller of Pragmatic Software for insights into these issues.) Gregg Kellogg

Web Design Goals

Working with legacy software systems provides its own considerations, but there are some general goals that can be stated for web-based software applications. To the degree that the nature of system upgrades substantially touches these areas of system design, it is important to consider the following goals within the system design. Well designed applications should meet the following goals in order to provide lasting value:

  • Be robust – Enterprise software is important to an organization. Users expect it to be reliable and bug-free. Therefore, it is incumbent on the design team to use the best possible software practices build robust solutions and ensure that the code is of the highest quality.
  • Be performant and scalable – Enterprise systems must meet the performance expectations of their users. They must also exhibit sufficient scalability – the potential for an application to support increased load, given appropriate hardware.
  • Take advantage of OO design principles – Object Oriented (OO) design principles offer proven benefits for complex systems. Good OO design practice is promoted by the use of proven design patterns – recurring solutions to common problems. The concept of design patterns was popularized in OO software development in Design Patterns: Elements of Reusable Object-Oriented Software.
  • Avoid unnecessary complexity – Practitioners of Extreme Programming (XP) advocate doing the simplest thing that could possibly work. The lesson is to be wary of excessive complexity. Complexity adds to costs throughout the software lifecycle. On the other hand, thorough analysis must ensure that we don’t have a naive and simplistic view of requirements.
  • Be maintainable and extensible – Maintenance is by far the most expensive phase of the software lifecycle. It’s particularly important to consider maintainability when designing Enterprise systems. A well designed application will be useful to an organization for years and must be able to accommodate new business needs. Maintainability and extensibility depend largely on clean design. We need to ensure that each component of the application has a clear responsibility, and that maintenance is not hindered by tightly-coupled components.
  • Be delivered on time – The objectives of clean design must not eclipse the need for an efficient and productive development cycle. The use of development environments and frameworks that ease the development process and help implement proven design patterns is an important part in considering a development project.
  • Be easy to test – Testing is an essential activity throughout the software lifecycle. We should consider the implications of design decisions for ease of testing.
  • Promote reuse – Enterprise software must fit into an organization’s long term strategy. Thus it’s important to foster reuse, so that code duplication is minimized. Code reuse usually results from good OO design practice, while we should also consistently use valuable infrastructure provided by the application server where it simplifies application code.

(Portions of this entry are derived from J2EE Design and Development by Rod Johnson.)

Thoughts on Extreme Programming

A popular trend in systems development circles is the so-called Extreme Programming Paradigm. XP has proven to be quite effective at rapidly producing system functionality, and it includes many practices that should be considered by any development team (e.g., Test Driven Development or TDD). However, there are many pitfalls to XP that don’t necessarily make it the best technique for many organizations. Extreme Programming is a technique that is used to much profit by many organizations today. In contrast to the more “rigid” documentation guidelines outlined above, Extreme Programming is characterized by a much more streamlined time-bounded development process. Typically developers work in teams and communicate directly with the stakeholders. They then choose a part of the problem to focus on and develop something in a fixed amount of time; to the degree that features cannot be accommodated within the stated time frame (often weekly, sometimes monthly), these features are ignored in order to have something to show. The stakeholders then inspect the result and suggest modifications or improvements to the developers who continue with the process until the stakeholders are satisfied. While this can be very effective at showing progress in the short-term, it suffers from a lack of long-term memory. Over time, stakeholders change or original goals were forgotten. Extreme Programming is best suited for ephemeral projects, such as a consumer-facing website, that often don’t have a lifetime beyond several months. It is inappropriate for corporate development of mission critical systems that have complicated requirements and whose stakeholders can change over time. Successful corporate development efforts borrow the best practices from Extreme Programming, such as Test Driven Design and Pair Programming along with the most useful of the Waterfall design practices, such as the reliance on use cases and functional specifications, to create lasting products.